Name | Description |
---|---|
config.py | Configuration variables |
requirements.txt | Required Python packages |
service.py | Script for model deployment |
train_model.py | Script for model training |

To avoid conflicts between package versions across projects, keep a separate environment for each project. Once a conda environment is created and activated, the packages installed in it are available only within that environment.
- Install Anaconda
- Create conda environment: `conda create -y --name <env-name> python=3.7`
- Install the packages from `requirements.txt`: `conda install --force-reinstall -y -q --name <env-name> -c conda-forge --file requirements.txt`
- Activate the environment: `conda activate <env-name>`
The workflow consists of the following steps:
- Data preparation
  - Vectorizer data
  - Classifier data
- Training
  - Run `python train_model.py`
- Deployment
  - Run `python service.py`
`train_model.py` expects two CSV files: the first for training the vectorizer and the second for training the classifier.
SQL Query to fetch the data:
SELECT q.text AS question, mp.id AS spec_id FROM question q JOIN medical_problem mp ON q.medical_problem_id = mp.id;
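The export of this query to a CSV file can be sketched as follows. This is only an illustration: the README does not say which database is used, so an in-memory SQLite database stands in for the real one, and `export_query_to_csv` and `make_demo_database` are hypothetical helpers, not part of the repository.

```python
import csv
import io
import sqlite3

# The query from the README, unchanged apart from line breaks.
QUERY = """
SELECT q.text AS question, mp.id AS spec_id
FROM question q
JOIN medical_problem mp ON q.medical_problem_id = mp.id;
"""


def export_query_to_csv(connection, out_file):
    """Run QUERY and write the result to out_file as CSV with a header row."""
    cursor = connection.execute(QUERY)
    writer = csv.writer(out_file)
    # Column names come from the AS aliases: question, spec_id.
    writer.writerow([column[0] for column in cursor.description])
    row_count = 0
    for row in cursor:
        writer.writerow(row)
        row_count += 1
    return row_count


def make_demo_database():
    """Build a tiny stand-in database (assumed schema, made-up rows)."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(
        """
        CREATE TABLE medical_problem (id INTEGER PRIMARY KEY);
        CREATE TABLE question (text TEXT, medical_problem_id INTEGER);
        INSERT INTO medical_problem (id) VALUES (4), (7);
        INSERT INTO question (text, medical_problem_id) VALUES
            ('Trápí mě zubní kaz', 4),
            ('Placeholder question', 7);
        """
    )
    return connection


if __name__ == "__main__":
    buffer = io.StringIO()
    exported = export_query_to_csv(make_demo_database(), buffer)
    print(f"exported {exported} rows")
    print(buffer.getvalue())
```

With a real database, the same function could write to a file opened with `newline=""` and `encoding="utf-8"`, provided the driver exposes a DB-API style `execute` and `description`.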
Place the datasets in the root folder, using the file names defined in `config.py`:
- `VOCABULARY_DATA_PATH = 'data_vectorizer.csv'`
- `CLASSIFIER_DATA_PATH = 'data_classifier.csv'`
For the vectorizer training, provide as much data as possible. For the classifier training, you can provide a smaller but cleaner dataset (double-check that the specialization IDs are assigned correctly) to improve accuracy. Keep in mind that reducing the amount of data too far can itself lower the final accuracy.
Expected CSV format:

Name | Type | Description |
---|---|---|
question | string | Question asked by the client |
spec_id | number | Specialization ID |

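Before training, it can help to check that a dataset matches this format. The following is a minimal sketch, not code from the repository: `load_training_rows` is a hypothetical helper, and it assumes that `spec_id` is an integer and that the file has a header row with the two column names.

```python
import csv
import io

EXPECTED_COLUMNS = ("question", "spec_id")


def load_training_rows(csv_file):
    """Read (question, spec_id) pairs; raise ValueError on malformed input."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")

    rows = []
    for record in reader:
        # reader.line_num is the source line on which this record ended.
        question = (record["question"] or "").strip()
        if not question:
            raise ValueError(f"line {reader.line_num}: empty question")
        try:
            spec_id = int(record["spec_id"])
        except (TypeError, ValueError):
            raise ValueError(
                f"line {reader.line_num}: spec_id is not an integer"
            ) from None
        rows.append((question, spec_id))
    return rows


if __name__ == "__main__":
    sample = io.StringIO("question,spec_id\nTrápí mě zubní kaz,4\n")
    print(load_training_rows(sample))
```

For a file on disk, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to the function.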
To train the model, run `python train_model.py`.
The training script expects these configuration variables:

Name | Type | Description | Example |
---|---|---|---|
PICKLE_VECTORIZER_NAME | string | Vectorizer file name | |
PICKLE_CLASSIFIER_NAME | string | Classifier file name | |
VOCABULARY_DATA_PATH | string | Vectorizer data file name | 'data_vectorizer.csv' |
CLASSIFIER_DATA_PATH | string | Classifier data file name | 'data_classifier.csv' |
RANDOM_STATE | number | Random number to ensure consistent training results | |
MAPPING | dictionary | Mapping of the specialization IDs | |

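The training script produces two files:
- Classifier model, stored under the file name given by `PICKLE_CLASSIFIER_NAME`
- Vectorizer model, stored under the file name given by `PICKLE_VECTORIZER_NAME`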
To add another specialization, add a key-value pair to the `MAPPING` dictionary. The key is the source specialization ID and the value is the target specialization ID (e.g. to map from ID 11 to ID 4, add the pair `{ 11: 4 }`).
There are two rules to follow:
- Target ID 0 is reserved for the default/unmapped classes.
- Target IDs must form a contiguous sequence from 1 to N (no number between 1 and N may be skipped).
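These two rules can be checked mechanically. The sketch below is an illustration under stated assumptions, not the repository's code: `validate_mapping` and `map_spec_id` are hypothetical helpers, the example dictionary is made up, and it is assumed that a source ID missing from the mapping falls back to target ID 0.

```python
def validate_mapping(mapping):
    """Raise ValueError unless the non-zero target IDs are exactly 1..N."""
    # Target 0 is reserved for default/unmapped classes, so it is left out.
    non_default_targets = sorted(set(mapping.values()) - {0})
    expected = list(range(1, len(non_default_targets) + 1))
    if non_default_targets != expected:
        raise ValueError(
            f"target IDs must be 1..N without gaps, got {non_default_targets}"
        )


def map_spec_id(spec_id, mapping):
    """Translate a source ID to its target ID (assumption: unmapped -> 0)."""
    return mapping.get(spec_id, 0)


if __name__ == "__main__":
    # Made-up example: sources 1, 2, 3 keep their IDs, source 11 maps to 4.
    example_mapping = {1: 1, 2: 2, 3: 3, 11: 4}
    validate_mapping(example_mapping)
    print(map_spec_id(11, example_mapping))
    print(map_spec_id(99, example_mapping))
```

Running such a check right after editing the dictionary catches a skipped target ID before a training run is started.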
To deploy the model, run `python service.py`. This script serves the model as a web service using the Flask micro web framework and the Waitress WSGI server. The host, port, API version and prefix can be configured in `config.py`.

URL | /api/v1/predictions/specialization |
---|---|
Query param | question: string |
Method | GET |
Response | 200 OK - returns number (ID of the predicted specialization); 500 Internal Server Error |
Example | Request: `http://localhost:5000/api/v1/predictions/specialization?question='Trápí mě zubní kaz'` Response: `4` |

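A client for this endpoint might look like the sketch below. It uses only the Python standard library; `build_prediction_url` and `predict_specialization` are hypothetical helper names, the defaults mirror the example request above, and it is assumed that the response body contains just the number. The question is URL-encoded rather than wrapped in quotes.

```python
import urllib.parse
import urllib.request


def build_prediction_url(question, host="localhost", port=5000, prefix="/api/v1"):
    """Build the GET URL, percent-encoding the question text."""
    query = urllib.parse.urlencode({"question": question})
    return f"http://{host}:{port}{prefix}/predictions/specialization?{query}"


def predict_specialization(question, **url_options):
    """Call the running service and return the predicted specialization ID.

    Assumes the body of a 200 response is the bare number; a 500 response
    makes urlopen raise urllib.error.HTTPError.
    """
    url = build_prediction_url(question, **url_options)
    with urllib.request.urlopen(url, timeout=10) as response:
        return int(response.read().decode("utf-8").strip())


if __name__ == "__main__":
    # Only builds the URL; call predict_specialization() once the service runs.
    print(build_prediction_url("Trápí mě zubní kaz"))
```

If the host, port, API version or prefix are changed in `config.py`, pass the matching values as keyword arguments.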
Copyright (c) 2021 Adam Jankovec
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.