Skip to content

Import Excel spreadsheets into MySQL tables

License

Notifications You must be signed in to change notification settings

Deltaco-AB/excel-to-mysql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Excel to MySQL

A strict one-to-one importer of Excel .xlsx files to MySQL tables.

The post-processor functions enables fine tuning of the imported data such as type casting and indexing.

Installation

Prerequisites

You can install pip3 on Debian/Ubuntu with

sudo apt-get install python3-pip

Debian/Ubuntu

  1. Clone or download a zip of this repo
git clone https://github.com/Deltaco-AB/excel-to-mysql.git
  1. cd into the project directory and run the following to install all system dependencies
cat requirements.system | xargs sudo apt-get install
  1. Install Python dependencies with pip3
python3 -m pip install -r requirements.txt

Basic usage (Quick import)

Import one or more Excel file(s).

  1. Place your .xlsx file(s) in the sheets/ directory.
  2. Run import_all.py
python3 import_all.py
  1. A config file called mysql_config.json has been created in the project root dir.
  2. Enter your MySQL credentials under the server section. Everything else can be left as default.
"server": {
  "mysql_host": "IP_OR_HOSTNAME:CUSTOM_PORT",
  "mysql_user": "MYSQL_USERNAME",
  "mysql_passwd": "MYSQL_PASSWORD",
  "mysql_db": "MYSQL_DATABASE"
}

If rebuild_tables is true (which is the default setting), a MySQL user with these privileges are required

Setting this value to false will append columns and rows to an already existing database and table, in which case; these privileges are required.


  1. Run import_all.py again and if everything is configured correctly, the import should now start.
python3 import_all.py

Basic import with post-processing

Perform the steps in the basic usage guide first. These functions are run after each imported Excel


Type casting

This script uses pandas.DataFrame.dtypes to determine the data type for each column.

A translation is performed to convert pandas data types to SQL data types (datetime64[ns] becomes DATE etc.). It's more than likely that the data type provided by pandas and the translation done on it afterwards can differ from the actual data type. You can therefore define columns in your mysql_config.json where the data type is known, and will be changed after insertion is done.

example:

"post_processing": {
  "change": {
    "I_am_a_String": "VARCHAR(256)",
    "I_am_not_a_Float": "DOUBLE"
  }
}

Indexing

Primary key

The table created will exist without a primary key. You can define a column name in your mysql_config.json to use as primary key

example:

"post_processing": {
  "index": {
    "primary": "uuid"
  }
}

Additional indices

You can define columns to create indices for by specifying them by name in your mysql_config.json

example:

"post_processing": {
  "index": {
    "columns": [
      "ItemID",
      "SearchKeywords",
      "AvailableFlag"
    ]
  }
}

MySQL Privileges

Base Privileges

The following MySQL privileges are required to run this script.

Data Structure Administraton
INSERT ALTER
UPDATE

Standard Privileges

The following MySQL privileges are required to run this script with rebuild_tables, which is recommended and enabled by default.

Data Structure Administraton
INSERT ALTER
UPDATE INDEX
DROP
CREATE