Sustainable Product Database (User Guide)

Using the Sustainable Product Database

To use the Sustainable Product Database, the user must:

Download the database (a gzip-compressed .gz file) from the GitHub repository and decompress it.
Install the sqlite3 and pandas Python modules if they are not already installed.
Use the UseSustainableProductDB.ipynb notebook to load the view containing sustainable products.
Additionally, the data in the database can be queried directly with standard SQL.
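
The download-and-query steps above can be scripted with Python's standard library. This is a minimal sketch, assuming the downloaded file is the compressed database sustainable_product_db_final.db.gz from the repository. The helper names (decompress_db, list_tables_and_views, run_query) are hypothetical. The name of the sustainable-products view is not given in this guide, so the sketch lists the available tables and views instead of guessing it.

```python
import gzip
import shutil
import sqlite3
from contextlib import closing


def decompress_db(gz_path, db_path):
    """Decompress the downloaded .db.gz file into a plain SQLite file."""
    with gzip.open(gz_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large files need little memory
    return db_path


def list_tables_and_views(db_path):
    """Return (name, type) for every table and view, sorted by name."""
    sql = ("SELECT name, type FROM sqlite_master "
           "WHERE type IN ('table', 'view') ORDER BY name")
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(sql).fetchall()


def run_query(db_path, sql, params=()):
    """Run a parameterised SQL query and return all result rows as tuples."""
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(sql, params).fetchall()
```

For example, call decompress_db("sustainable_product_db_final.db.gz", "sustainable_product_db.db") once, then list_tables_and_views("sustainable_product_db.db") to find the view. run_query accepts any SELECT statement with ? placeholders. If pandas is installed, pandas.read_sql_query(sql, conn) on an open sqlite3 connection returns the same rows as a DataFrame.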

Building a Data Pipeline from Scratch

The design described in the project report outlines how a data pipeline is built. The MainProjectPipeline.ipynb notebook implements one such pipeline from scratch. The steps are summarized below:

Read the product dataset from any source and explore its fields.
Design the pre-processing pipeline for that data, using the ColumnDropper, RowDropper and StringCleaner classes.
Load the ontology data (already processed; vocab_updated.xlsx is available in the GitHub repository).
Implement the keyword extraction step using the KeywordExtractor class.
Implement the keyword mapping step using the KeywordMapper class.
Investigate the results and tune the vocabulary as required.
Insert the data generated at different stages into the sustainable product database using the DBWriter class.
Save the keyword extractor and mapper objects with Python's pickle module so that they can be reused later.
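
The from-scratch steps can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the project's code. The names clean_text, preprocess, VocabKeywordExtractor, DictKeywordMapper, run_pipeline and save_objects are hypothetical. They only play the roles of StringCleaner, ColumnDropper/RowDropper, KeywordExtractor and KeywordMapper, whose real interfaces are not described in this guide and may differ. The sketch assumes the ontology has already been loaded into a dict that maps lower-case terms to concepts. For extraction it uses simple n-gram matching against that vocabulary, which may not be the technique KeywordExtractor uses.

```python
import pickle
import re
from dataclasses import dataclass, field


def clean_text(text):
    """StringCleaner-like role: lower-case, drop punctuation, collapse spaces."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def preprocess(rows, keep_columns, text_column):
    """ColumnDropper/RowDropper-like role: keep only selected columns, drop
    rows whose text column is empty, and clean the text column."""
    cleaned = []
    for row in rows:
        if not row.get(text_column):
            continue  # drop rows with no usable text
        kept = {k: row[k] for k in keep_columns if k in row}
        kept[text_column] = clean_text(row[text_column])
        cleaned.append(kept)
    return cleaned


@dataclass
class VocabKeywordExtractor:
    """Hypothetical extractor: returns vocabulary phrases of up to max_ngram
    words found in the text, trying longer phrases first."""
    vocabulary: set
    max_ngram: int = 3

    def extract(self, text):
        tokens = text.split()
        found = []
        for n in range(self.max_ngram, 0, -1):
            for i in range(len(tokens) - n + 1):
                phrase = " ".join(tokens[i:i + n])
                if phrase in self.vocabulary and phrase not in found:
                    found.append(phrase)
        return found


@dataclass
class DictKeywordMapper:
    """Hypothetical mapper: maps each product's keywords to ontology concepts."""
    ontology: dict                                # term -> concept
    keywords: dict = field(default_factory=dict)  # product id -> keywords

    def map(self):
        return {pid: sorted({self.ontology[k] for k in kws if k in self.ontology})
                for pid, kws in self.keywords.items()}


def run_pipeline(rows, ontology):
    """Pre-process rows, extract keywords, then map them to ontology concepts."""
    rows = preprocess(rows, keep_columns=("id", "description"),
                      text_column="description")
    extractor = VocabKeywordExtractor(vocabulary=set(ontology))
    mapper = DictKeywordMapper(ontology=ontology)
    mapper.keywords = {r["id"]: extractor.extract(r["description"]) for r in rows}
    return extractor, mapper, mapper.map()


def save_objects(path, extractor, mapper):
    """Pickle the extractor and mapper together so a later pipeline can reuse them."""
    with open(path, "wb") as f:
        pickle.dump({"extractor": extractor, "mapper": mapper}, f)
```

Longer phrases are tried first so that a multi-word term such as "recycled plastic" is matched as a unit. vocab_updated.xlsx could be read with pandas.read_excel, but its column layout is not described here, so the sketch starts from an already-built dict.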

Building a Data Pipeline Using Pre-Existing Data

If a data pipeline already exists and its keyword extractor and mapper objects have been saved, the same objects can be reused on a different product dataset. SecondDataPipeline.ipynb is an example of this design. The steps are listed below:

Read the product dataset from any source and explore its fields.
Design the pre-processing pipeline for that data, using the ColumnDropper, RowDropper and StringCleaner classes.
Load the saved keyword extractor object and extract keywords from the new product data.
Load the saved keyword mapper object, pass it the keywords extracted in the previous step, and map them using the ontology data already stored in the object.
Investigate the results and tune the vocabulary as required.
Insert the data generated at different stages into the sustainable product database using the DBWriter class.

GitHub repository: AnushreeSS/SustainableProductDatabase