This project provides an easy set of tools to transfer data between a SQL database and Google Drive. It was developed at Portland Public Schools (PPS) in Portland, OR to help administrators and analysts programatically share sensitive data with schools. For example, this interface allows analysts to create simple python scripts that pull data from an internal database and populate individual school folders on the Google Drive. Users can format spreadsheets, upload graphs, change permissions, and so forth.
The advantage of using Google Drive to communicate with schools is:
- School principals, secretaries, and administrators are familiar with Google Drive
- School districts frequently already have security contracts in place with Google
These instructions will get you a copy of the project up and running on your local machine. Note that you need Python 3.6 or higher (Python 2.7 is not supported.)
Getting up an running takes three steps:
- First, we need to install the actual Python package (and dependencies)
- Next we need to obtain credentials from Google so you can access and modify server resources
- Lastly, we need to create a text file containing information for connecting to your internal database
First, open a command prompt and install the package using PIP. This will automatically install dependencies (which are listed below if you'd rather install them independently):
pip install sql-google-interface
You can check that the package has been installed by examining your installed python packages:
pip freeze
Go to the Google API console. If your organization already has a project, ask the project owner to grant you member permissions. Otherwise, create a new project for your organization and call it something like "SQL-google-interface".
Navigate to the project, and enable the Google Drive and Google Sheets API's.
Create credentials!
- Go to the "Credentials" tab on the left-hand sidebar
- Click "Create Credentials" -> "OAuth client ID"
- Select "Other" as the Application Type. For the name, use something like "your-name-client-id".
- After clicking create, download the client ID, rename it to
client_secret.json
, and place it in yourC:/
drive.
Create a text file called server_connection_data.txt
and place it in a directory like C:\connection_data\
.
This file should look like:
server = [server_name]
database = [database_name]
All done! You are ready to run the basic tests.
If you're interested, here is a list of package dependencies
backoff
google-api-python-client
numpy
pyodbc
pandas
This repository comes with some basic tests to make sure your server connections and Google credentials are working appropriately.
Download ./tests/basic_test.py
. Make sure the script contains the correct paths to your client_secret.json
and server_connection_data.txt
files.
client_secret_file = "C:/client_secret.json"
connection_data = "C:/connection_data/server_connection_data.txt"
Next, cd
into the appropriate directory and run the basic_test.py
file. (The first time this is run, you'll need to authorize your credentials by following the OAuth authorization flow. This only happens once, after which you can delete your client_secret.json
file.)
Here is a short example script that takes data from a SQL database and uploads it into a google sheet. If it has already been run today, it deletes the current spreadsheet and creates a new one. Notice how custom metadata is used to identify files.
from datetime import date
from sql_google_interface import interface
server = <your-server-name>
database_name = <your-database-name>
parent_folder_id = <parent-folder-id>
SQL_filepath = "./data.sql"
client_secret_file = "C:/client_secret.json"
# get the connection to the SQL server and get data
cnn = interface.get_server_connection(server, database_name)
dataframe = interface.get_data_from_server(cnn, SQL_filepath)
cnn.close()
# get credentials for uploading data
credentials = interface.get_credentials(client_secret_file)
# create a drive and sheets service to interact with Google
drive_service = interface.get_drive_service(credentials=credentials, service_type='drive')
sheets_service = interface.get_drive_service(credentials=credentials, service_type='sheets')
# Create filename using month abbreviation, day, and year
today = date.today().strftime("%b-%d-%Y")
data_filename = "Data run {}".format(today)
# search to see if a file has already been uploaded today; if so delete it
file_data_from_search = interface.get_files_from_drive(drive_service=drive_service,
parent_id=parent_folder_id,
custom_metadata={"data-date" : today})
file_ids_from_search = [item['id'] for item in file_data_from_search]
if file_ids_from_search:
print("There has already been a file created today. I will overwrite it with new data.")
interface.delete_drive_files_by_ID(drive_service=drive_service, list_of_file_ids=file_ids_from_search)
sheet_id = interface.create_spreadsheet(drive_service=drive_service,
spreadsheet_name=data_filename,
parent_folder_list=[parent_folder_id],
custom_metadata={"data-date" : today})
print("Created spreadsheet with id {}".format(sheet_id))
# upload data to the spreadsheet
interface.populate_spreadsheet_from_df(sheets_service, sheet_id, dataframe)
# format the spreadhseet
interface.format_spreadsheet(sheets_service, sheet_id, wrap_strategy="WRAP")
- Google Drive API - The API used to communicate with Google Drive
- Google Sheets API - The API used to interact with the contents of Google sheets
- Will Kearney - Initial work - GitHub
This project is licensed under the MIT License - see the LICENSE.md file for details
- Shawn Helm - Principal Analyst at PPS - PPS Analytics Homepage