Merge pull request #1 from fabiobove-dr/develop
Develop
Showing 30 changed files with 1,009 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
[run] | ||
omit = *tests*, commons.py, *config.py, !src/commons.py, src/logger/*, app.py, __init__.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
[flake8] | ||
max-line-length = 120 | ||
exclude = | ||
.venv, | ||
__init__.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# Auto detect text files and perform LF normalization | ||
* text=auto |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
name: Unit Tests | ||
|
||
on: | ||
push: | ||
branches: | ||
- main | ||
|
||
jobs: | ||
test: | ||
name: Run tests and collect coverage | ||
runs-on: ubuntu-latest | ||
|
||
steps: | ||
- name: Checkout | ||
uses: actions/checkout@v4 | ||
with: | ||
fetch-depth: 0 | ||
|
||
- name: Set up Python | ||
uses: actions/setup-python@v4 | ||
with: | ||
python-version: "3.10" | ||
|
||
- name: Install dependencies (pip) | ||
run: | | ||
python -m pip install --upgrade pip | ||
# Install main dependencies | ||
pip install -e . | ||
# Install testing dependencies explicitly | ||
pip install .[test] | ||
- name: Run tests | ||
run: pytest --cov --cov-report=xml | ||
|
||
- name: Upload results to Codecov | ||
uses: codecov/codecov-action@v4 | ||
with: | ||
token: ${{ secrets.CODECOV_TOKEN }} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
name: Publish to PyPI | ||
|
||
on: | ||
push: | ||
tags: | ||
- '*' # Triggers on any tag (e.g., 1.0.0, 1.1.0) | ||
|
||
jobs: | ||
release: | ||
runs-on: ubuntu-latest | ||
|
||
steps: | ||
- name: Checkout Code | ||
uses: actions/checkout@v3 | ||
|
||
- name: Set up Python | ||
uses: actions/setup-python@v4 | ||
with: | ||
python-version: '3.9' | ||
|
||
- name: Install Dependencies | ||
run: | | ||
python -m pip install --upgrade pip setuptools wheel build twine | ||
- name: Extract Version from Tag | ||
id: get_version | ||
run: echo "version=${GITHUB_REF##*/}" >> "$GITHUB_OUTPUT" | |
|
||
- name: Update Version in pyproject.toml | ||
run: | | ||
sed -i "s/^version = .*/version = \"${{ steps.get_version.outputs.version }}\"/" pyproject.toml | ||
- name: Build the Package | ||
run: python -m build | ||
|
||
- name: Publish to PyPI | ||
env: | ||
TWINE_USERNAME: "__token__" | ||
TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }} | ||
run: python -m twine upload dist/* |
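The "Extract Version from Tag" and "Update Version in pyproject.toml" steps above can be sketched in Python to make their behavior easier to check. This is an illustration only: the function names `version_from_ref` and `set_static_version` are hypothetical and are not part of the repository. Note that the `pyproject.toml` added in this commit declares `dynamic = ["version"]`, so it has no line starting with `version = ` for the substitution to match; in that case the file passes through unchanged.

```python
import re


def version_from_ref(github_ref: str) -> str:
    """Mirror the shell expansion ${GITHUB_REF##*/}: keep the text after the last slash."""
    return github_ref.rsplit("/", 1)[-1]


def set_static_version(pyproject_text: str, version: str) -> str:
    """Mirror the sed call: rewrite every line that starts with 'version = '."""
    pattern = re.compile(r"^version = .*$", flags=re.MULTILINE)
    # A function replacement avoids backslash or group escapes inside `version`.
    return pattern.sub(lambda _match: f'version = "{version}"', pyproject_text)


# Two hypothetical pyproject fragments: one with a static version, one dynamic.
static = 'name = "demo"\nversion = "0.0.0"\n'
dynamic = 'name = "demo"\ndynamic = ["version"]\n'

tag = version_from_ref("refs/tags/1.2.0")
print(set_static_version(static, tag))
# The dynamic fragment has nothing to rewrite, so it comes back unchanged.
print(set_static_version(dynamic, tag) == dynamic)
```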
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,123 @@ | ||
 | ||
|
||
<hr> | ||
|
||
# better-moles-patent-finder | ||
A tool designed to enhance patent discovery by leveraging MongoDB for efficient storage, querying, and analysis of patent data. This repository includes features to streamline patent searches, improve retrieval accuracy, and support advanced filtering and indexing capabilities. | ||
|
||
[](https://codecov.io/gh/fabiobove-dr/better-moles-patent-finder) | |
[](https://pypi.org/project/better-moles-patent-finder/) | ||
<br> | ||
[]() | ||
[](https://github.com/fabiobove-dr/better-moles-patent-finder/blob/main/LICENSE)<br> | |
|
||
|
||
## Overview | ||
This project offers a powerful platform for patent research, combining advanced search features with a MongoDB backend to store, retrieve, and analyze patent-related data efficiently. It allows users to search for patents associated with chemical compounds, leveraging SMILES, InChI, and other molecular representations. The system also supports filtering by molecular structure, patent ID, and other criteria. | ||
<hr> | ||
|
||
This project is based on the **PatCID** paper, which focuses on the identification and classification of patent data related to molecular structures. The techniques and methodologies from the PatCID framework are utilized to enhance patent search results by leveraging chemical informatics and advanced query techniques. The core concept of this project builds upon PatCID's ability to match molecular structures with relevant patent information, improving the overall efficiency and accuracy of patent searches. | ||
To check out their incredible work, visit the [PatCID GitHub repository](https://github.com/DS4SD/PatCID). | ||
<hr> | ||
|
||
**Key Features:** | ||
- **Patent Search**: Search patents by their ID or associated molecular properties. | ||
- **Advanced Filtering**: Filter patents based on molecular structure, chemical formula, and other relevant fields. | ||
- **Efficient Querying**: Use MongoDB's indexing and querying capabilities to retrieve patents quickly. | ||
- **Data Model**: The system stores patents and associated molecules in a structured format, making it easy to extend and scale. | ||
|
||
|
||
## Authors: | ||
- Fabio Bove | fabio.bove.dr@gmail.com<br> | ||
<hr> | ||
|
||
## What is it? | ||
This tool is designed to assist researchers and patent professionals in finding relevant patents related to chemical compounds using molecular representations like SMILES and InChI. By using MongoDB as the backend, it efficiently stores and indexes large volumes of patent and molecular data. Users can easily query patents, filter based on molecular structures, and retrieve precise results with high speed. | ||
|
||
<hr> | ||
|
||
## Mongo Documents Format | ||
|
||
The MongoDB documents used by this project follow the structure below, which includes information about the molecule (using SMILES, InChI, etc.) and the associated patent IDs: | ||
|
||
```json | ||
{ | ||
"molecule": { | ||
"smiles": "Brc1cc(-c2ccccc2)nc(-c2ccc3c4ccccc4c4ccccc4c3c2)c1", | ||
"inchi": "InChI=1S/C29H18BrN/c30-21-17-28(19-8-2-1-3-9-19)31-29(18-21)20-14-15-26-24-12-5-4-10-22(24)23-11-6-7-13-25(23)27(26)16-20/h1-18H", | ||
"inchikey": "UPAWJZOAEGLCFP-UHFFFAOYSA-N", | ||
"sum_formula": "C29H18BrN", | ||
"conf": 0.57 | ||
}, | ||
"patents": [ | ||
{"id": "US20200136057A1"}, | ||
{"id": "US20200136057"} | ||
] | ||
} | ||
``` | ||
|
||
## Document Fields | |
|
||
- **molecule**: Contains the molecular data (SMILES, InChI, InChIKey, sum formula, and the `conf` value). | |
- **patents**: A list of objects, each holding the `id` of a patent associated with the molecule. | |
|
||
<hr> | ||
|
||
## Usage | ||
|
||
### Installation | ||
|
||
You can install the `better-moles-patent-finder` package via `pip` from PyPI or clone the repository to run locally: | ||
|
||
#### Install from PyPI: | ||
```bash | ||
pip install better-moles-patent-finder | ||
``` | ||
|
||
#### Basic Usage | ||
|
||
Once installed, you can query patents either through the Python API or by running the package as a script. | |
|
||
You can run the project as a script by passing a configuration file path: | ||
```bash | ||
better-moles-patent-finder --config-path /path/to/config_file.yaml | ||
``` | ||
|
||
```python | ||
from better_moles_patent_finder import PatentFinder | ||
|
||
# Create a PatentFinder instance | ||
pf = PatentFinder() | ||
|
||
# Search for a patent by ID | ||
result = pf.search_by_patent_id('US20200136057A1') | ||
|
||
# Search for patents by molecule structure (SMILES) | ||
result = pf.search_by_smiles('Brc1cc(-c2ccccc2)nc(-c2ccc3c4ccccc4c4ccccc4c3c2)c1') | ||
|
||
# Print the result | ||
print(result) | ||
``` | ||
|
||
#### MongoDB Connection | |
|
||
Ensure MongoDB is running and reachable. The project ships with a default connection string; to change it, edit the `mongo_connector.py` file. | |
```python | ||
from better_moles_patent_finder import MongoConnector | ||
|
||
# Connect to the MongoDB database | ||
mongo = MongoConnector() | ||
mongo.connect() | ||
|
||
# Perform queries and operations | ||
``` | ||
|
||
--- | ||
## License | ||
This project is licensed under the terms of the GNU General Public License, Version 3. |
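The document layout described in the README suggests what MongoDB filters for it could look like. The sketch below builds such filters as plain dictionaries. The helper names (`filter_by_smiles`, `filter_by_patent_id`, `patent_ids`) are hypothetical and do not reflect the package's actual API; only the field names come from the README example.

```python
from typing import Any


def filter_by_smiles(smiles: str) -> dict[str, Any]:
    """Match documents whose embedded molecule has exactly this SMILES string."""
    return {"molecule.smiles": smiles}


def filter_by_patent_id(patent_id: str) -> dict[str, Any]:
    """Match documents where any element of the patents array has this id."""
    return {"patents.id": patent_id}


def patent_ids(document: dict[str, Any]) -> list[str]:
    """Flatten the patents array of one document into a list of id strings."""
    return [entry["id"] for entry in document.get("patents", [])]


# Trimmed copy of the README example document.
example = {
    "molecule": {
        "smiles": "Brc1cc(-c2ccccc2)nc(-c2ccc3c4ccccc4c4ccccc4c3c2)c1",
        "inchikey": "UPAWJZOAEGLCFP-UHFFFAOYSA-N",
        "sum_formula": "C29H18BrN",
        "conf": 0.57,
    },
    "patents": [{"id": "US20200136057A1"}, {"id": "US20200136057"}],
}

print(filter_by_smiles(example["molecule"]["smiles"]))
print(filter_by_patent_id("US20200136057A1"))
print(patent_ids(example))
```

With `pymongo`, a filter like these would typically be passed to `collection.find_one(...)` or `collection.find(...)`. An index on `molecule.smiles` or `patents.id` would be the usual way to keep such lookups fast, although this commit does not show which indexes the project creates.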
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
{ | ||
"MONGO_CONNECTION": { | ||
"host": "", | ||
"port": 27017, | ||
"db_name": "", | ||
"collection": "", | ||
"username": "", | ||
"password": "", | ||
"auth_db": "" | ||
}, | ||
"REPORT_FOLDER": "examples/outputs", | ||
"DATA_FOLDER": "examples/data.csv" | ||
} |
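The `MONGO_CONNECTION` block above carries everything needed to build a MongoDB connection URI. Below is a minimal sketch of that step, assuming the keys shown in the sample config. The function `build_mongo_uri` and the sample values (`localhost`, `reader`, `p@ss`, `admin`) are hypothetical and are not taken from the repository.

```python
import json
from urllib.parse import quote_plus


def build_mongo_uri(conn: dict) -> str:
    """Build a mongodb:// URI from a MONGO_CONNECTION-style mapping."""
    credentials = ""
    if conn.get("username"):
        # Percent-encode credentials so characters such as '@' or ':' stay valid.
        user = quote_plus(conn["username"])
        password = quote_plus(conn.get("password", ""))
        credentials = f"{user}:{password}@"
    uri = f"mongodb://{credentials}{conn['host']}:{conn.get('port', 27017)}/"
    if conn.get("auth_db"):
        uri += f"?authSource={quote_plus(conn['auth_db'])}"
    return uri


# Hypothetical filled-in version of the sample config.
SAMPLE = """
{
  "MONGO_CONNECTION": {
    "host": "localhost",
    "port": 27017,
    "db_name": "patents",
    "collection": "molecules",
    "username": "reader",
    "password": "p@ss",
    "auth_db": "admin"
  }
}
"""

config = json.loads(SAMPLE)
print(build_mongo_uri(config["MONGO_CONNECTION"]))
```

The resulting string is the kind of value `pymongo.MongoClient` accepts as its first argument; `db_name` and `collection` would then select the database and collection on the client.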
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
version: '3.1' | ||
|
||
services: | ||
mongo: | ||
image: mongo | ||
container_name: molecules_mongo | ||
restart: always | ||
env_file: | ||
- ./environment/.env | ||
networks: | ||
- mongo-network | ||
ports: | ||
- "27017:27017" | ||
|
||
mongo-express: | ||
image: mongo-express | ||
container_name: molecules_mongo_express | ||
restart: always | ||
ports: | ||
- "127.0.0.1:8081:8081" | ||
env_file: | ||
- ./environment/.env | ||
networks: | ||
- mongo-network | ||
|
||
networks: | ||
mongo-network: | ||
driver: bridge | ||
|
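The compose file publishes MongoDB on port 27017 and mongo-express on 127.0.0.1:8081. A quick way to confirm the containers are reachable before running queries is a plain TCP check. The helper below is a hypothetical sketch using only the standard library and is not part of the project.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# Ports taken from the compose file; results depend on whether the stack is up.
for name, port in (("mongo", 27017), ("mongo-express", 8081)):
    print(f"{name}: {'reachable' if is_port_open('127.0.0.1', port) else 'not reachable'}")
```

A successful TCP connection only shows that something is listening on the port; it does not verify credentials or that the database has finished starting.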
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
[project] | ||
name = "better-moles-patent-finder" | ||
dynamic = ["version"] # This relies on git tags for versioning | ||
authors = [ | ||
{ name = "Fabio Bove", email = "fabio.bove.dr@gmail.com" } | ||
] | ||
description = "A tool designed to enhance patent discovery by leveraging MongoDB for efficient storage, querying, and analysis of patent data. This repository includes features to streamline patent searches, improve retrieval accuracy, and support advanced filtering and indexing capabilities." | ||
readme = "README.md" | ||
license = { file = "LICENSE" } | ||
keywords = ["molecules", "bioinformatics", "patent-finder", "molecules-evaluation", "mongo_db"] | ||
|
||
# Specify the required dependencies with correct versions | ||
dependencies = [ | ||
"numpy>=1.0.0,<2.0.0", | ||
"pandas==2.2.3", | ||
"pymongo==4.9", | ||
"rdkit==2024.3.6", | ||
"tqdm==4.67.1", | ||
] | ||
|
||
# Define optional dependencies like testing | ||
[project.optional-dependencies] | ||
test = [ | ||
"pytest==7.2.1", | ||
"pytest-cov==5.0.0", | ||
"pytest-mock==3.14.0", | ||
] | ||
|
||
[project.urls] | ||
Homepage = "https://github.com/fabiobove-dr/better-moles-patent-finder" | ||
Issues = "https://github.com/fabiobove-dr/better-moles-patent-finder/issues" | ||
|
||
# Setuptools git versioning configuration | ||
[tool.setuptools-git-versioning] | ||
enabled = true | ||
|
||
[build-system] | ||
requires = ["setuptools>=61.0", "setuptools-git-versioning>=2.0,<3", "wheel"] | ||
build-backend = "setuptools.build_meta" | ||
|
||
[project.scripts] | ||
better-moles-patent-finder = "mol_patents_find:main" |
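The `[project.scripts]` entry maps the `better-moles-patent-finder` command to a `main` function in the `mol_patents_find` module. That module is not shown in this part of the diff, so the following is only a sketch of what such an entry point might look like, assuming the `--config-path` option mentioned in the README; the body of `main` is a placeholder.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the command line; argv=None makes argparse read sys.argv[1:]."""
    parser = argparse.ArgumentParser(prog="better-moles-patent-finder")
    parser.add_argument(
        "--config-path",
        required=True,
        help="Path to the configuration file.",
    )
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Hypothetical entry point: a console script calls this with no arguments."""
    args = parse_args(argv)
    # Placeholder for the real work (loading the config, querying MongoDB, ...).
    print(f"Using configuration file: {args.config_path}")
    return 0
```

The generated console script calls `main()` with no arguments and uses its return value as the process exit status, which is why the sketch returns `0` on success.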
Empty file.