Project Description: This project is a Python FastAPI application that exposes an HTTP API for automatic feature extraction from a dataset. It is meant to be one link in the chain of a complete ML project. The API offers four endpoints (/, /status, /features_file, /features_json) in addition to the built-in Swagger documentation.
- Clone the repo:
git clone https://github.com/mordilos/feature_engineering.git
- Navigate to the repo dir:
cd feature_engineering
- Use the compose.yml to build and run the app:
docker-compose -f compose.yml up
- Clone the repository:
git clone https://github.com/mordilos/feature_engineering.git
- Navigate to the project directory:
cd feature_engineering
Now you can choose your next step:
- Locally install everything
- Create new venv:
python3 -m venv venv_name
- Activate the venv:
source venv_name/bin/activate
- Install the dependencies:
pip install -r requirements.txt
- Start the FastAPI server:
python src/main.py
- Access the endpoints using a web browser or an API client.
- You can also run the tests in test_main.py, located in the test folder.
- Use the Dockerfile to build the image:
docker build -t feature_engineering_image .
- Then create and run the container:
docker run -d --name fe_container -p 8000:8000 feature_engineering_image
- Description: Returns the built-in Swagger documentation.
- URL:
http://localhost:8000/docs
- Method:
GET
- Description: Returns all the available endpoints of the app.
- URL:
http://localhost:8000/
- Method:
GET
- Response:
{ "endpoints": [ "/", "/status", "/features_file", "/features_json" ] }
- Description: Returns the status of the application.
- URL:
http://localhost:8000/status
- Method:
GET
- Response:
{ "status": "UP" }
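A client can check the status endpoint before submitting data. The following is a minimal sketch using only the Python standard library. The helper names `parse_status` and `is_service_up` are illustrative and not part of the project, and the base URL assumes the server runs locally on port 8000.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: server runs locally on port 8000


def parse_status(body: bytes) -> bool:
    """Return True only when the response body is a JSON object with status "UP"."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


def is_service_up(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Send GET /status and report whether the app answers with status "UP"."""
    try:
        with urllib.request.urlopen(f"{base_url}/status", timeout=timeout) as resp:
            return parse_status(resp.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error, and similar failures.
        return False
```

Calling `is_service_up()` returns `False` instead of raising when the server is not reachable, which makes it convenient as a guard in scripts.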
- Description: Automatic feature extraction for data given in a JSON file.
- URL:
http://localhost:8000/features_file
- Method:
POST
- Request Body:
{ "file": "<path-to-json-file>", "feature_selection": ["keyword1,keyword2"] }
- file (required): Path to the JSON file containing user data.
- feature_selection (optional): List of strings specifying the methods used to filter the features. Allowed values are highly_null_features, single_value_features, and highly_correlated_features (the user can choose between 0 and 3 of them), based on https://featuretools.alteryx.com/en/stable/guides/feature_selection.html
- Response:
{ "feature_matrix": "<extracted-feature-matrix>" }
- feature_matrix: JSON representation of the extracted feature matrix.
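The request body for this endpoint can be assembled and checked on the client side before it is sent. The sketch below is only an illustration: `build_features_file_body` is a hypothetical helper, and it assumes each selection method is passed as its own list element. The server may instead expect a single comma-separated string, as the request body above suggests.

```python
import json

# Selection methods named in this README.
ALLOWED_METHODS = (
    "highly_null_features",
    "single_value_features",
    "highly_correlated_features",
)


def build_features_file_body(file_path, feature_selection=None):
    """Build the JSON body for POST /features_file (hypothetical helper)."""
    methods = list(feature_selection or [])
    unknown = [m for m in methods if m not in ALLOWED_METHODS]
    if unknown:
        raise ValueError(f"unknown feature selection method(s): {unknown}")
    if len(set(methods)) != len(methods):
        raise ValueError("each feature selection method may be listed only once")
    return {"file": file_path, "feature_selection": methods}


body = build_features_file_body(
    "cvas_data.json",
    ["highly_null_features", "single_value_features"],
)
print(json.dumps(body, indent=2))
```

Rejecting unknown method names locally gives a clearer error than waiting for the server to respond.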
- Description: Automatic feature extraction for data given directly as JSON in the request body.
- URL:
http://localhost:8000/features_json
- Method:
POST
- Request Body:
{ "data": [ { "customer_ID": "string", "loans": [ { "customer_ID": "string", "loan_date": "string", "amount": "string", "fee": "string", "loan_status": "string", "term": "string", "annual_income": "string" } ] } ], "feature_selection": [ "string" ]
}
- `data` (required): The input data in JSON format.
- `feature_selection` (optional): List of strings specifying the methods used to filter the features. Allowed values are highly_null_features, single_value_features, and highly_correlated_features (the user can choose between 0 and 3 of them), based on https://featuretools.alteryx.com/en/stable/guides/feature_selection.html
- Response:
```json
{
  "feature_matrix": "<extracted-feature-matrix>"
}
```
- `feature_matrix`: JSON representation of the extracted feature matrix.
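As an illustration of the request schema for this endpoint, the sketch below builds a payload and prepares the POST request with the Python standard library. The helpers `make_customer` and `build_request` are hypothetical, the sample values are made up, and the expected formats of fields such as `loan_date` or `loan_status` are not specified in this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default host and port

# Field names follow the request body schema above; every value is sent as a string.
LOAN_FIELDS = ("customer_ID", "loan_date", "amount", "fee",
               "loan_status", "term", "annual_income")


def make_customer(customer_id, loans):
    """Build one entry of "data", converting every loan value to a string."""
    rows = []
    for loan in loans:
        row = {"customer_ID": customer_id}
        for field in LOAN_FIELDS[1:]:
            row[field] = str(loan[field])
        rows.append(row)
    return {"customer_ID": customer_id, "loans": rows}


def build_request(payload, base_url=BASE_URL):
    """Prepare (but do not send) the POST request for /features_json."""
    return urllib.request.Request(
        f"{base_url}/features_json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Made-up placeholder values, for illustration only.
payload = {
    "data": [make_customer("c-001", [{
        "loan_date": "2021-03-15", "amount": 1000, "fee": 50,
        "loan_status": "paid", "term": "short", "annual_income": 12000,
    }])],
    "feature_selection": ["single_value_features"],
}
request = build_request(payload)
# To send it while the server is running: urllib.request.urlopen(request).read()
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call happens.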
- Start the FastAPI server.
- Open a web browser or an API client.
- Go to http://localhost:8000/docs for the built-in Swagger documentation. From there you can test all the other endpoints.
- Send a GET request to http://localhost:8000/ to get all available endpoints.
- Send a GET request to http://localhost:8000/status to check the application status.
- Send a POST request to http://localhost:8000/features_file with a JSON body containing the path to the data file and optional keywords to extract features.
CLI example that uses all the feature selection algorithms:
curl -X 'POST' \
'http://localhost:8000/features_file' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@cvas_data.json;type=application/json' \
-F 'feature_selection=highly_null_features,single_value_features,highly_correlated_features'
- Receive the extracted feature matrix in the response.
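The curl command above uploads the data file as multipart/form-data. A rough Python equivalent, using only the standard library, might look like the sketch below. The helpers `encode_multipart` and `build_upload_request` are hypothetical, the endpoint path `/features_file` is taken from the endpoint list and is an assumption here, and the file content is a placeholder.

```python
import urllib.request
import uuid

# Same comma-separated value as the -F 'feature_selection=...' form field.
ALL_METHODS = "highly_null_features,single_value_features,highly_correlated_features"


def encode_multipart(fields, file_field, filename, file_bytes,
                     file_type="application/json"):
    """Encode text fields plus one file; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        chunks.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_header = (f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="{file_field}"; '
                   f'filename="{filename}"\r\n'
                   f"Content-Type: {file_type}\r\n\r\n")
    chunks.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_upload_request(url, filename, file_bytes, feature_selection=ALL_METHODS):
    """Prepare (but do not send) the multipart POST request."""
    body, content_type = encode_multipart(
        {"feature_selection": feature_selection}, "file", filename, file_bytes
    )
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )


# Placeholder bytes; in practice read them from the data file,
# e.g. pathlib.Path("cvas_data.json").read_bytes().
request = build_upload_request(
    "http://localhost:8000/features_file", "cvas_data.json", b"[]"
)
# To send it while the server is running: urllib.request.urlopen(request).read()
```

A third-party HTTP client would make the upload shorter. The manual encoding is shown only to keep the sketch free of extra dependencies.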