Merge pull request #1 from fabiobove-dr/develop
Develop
fabiobove-dr authored Dec 7, 2024
2 parents 3d487da + 1c9531b commit cc48816
Showing 30 changed files with 1,009 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .coveragerc
@@ -0,0 +1,2 @@
[run]
omit = *tests*, commons.py, *config.py, !src/commons.py, src/logger/*, app.py, __init__.py
5 changes: 5 additions & 0 deletions .flake8
@@ -0,0 +1,5 @@
[flake8]
max-line-length = 120
exclude =
    .venv,
    init.py
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
# Auto detect text files and perform LF normalization
* text=auto
38 changes: 38 additions & 0 deletions .github/workflows/codecov.yml
@@ -0,0 +1,38 @@
name: Unit Tests

on:
  push:
    branches:
      - main

jobs:
  test:
    name: Run tests and collect coverage
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install dependencies (pip)
        run: |
          python -m pip install --upgrade pip
          # Install main dependencies
          pip install -e .
          # Install testing dependencies explicitly
          pip install .[test]
      - name: Run tests
        run: pytest --cov --cov-report=xml

      - name: Upload results to Codecov
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
40 changes: 40 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,40 @@
name: Publish to PyPI

on:
  push:
    tags:
      - '*' # Triggers on any tag (e.g., 1.0.0, 1.1.0)

jobs:
  release:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Code
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install Dependencies
        run: |
          python -m pip install --upgrade pip setuptools wheel build twine
      - name: Extract Version from Tag
        id: get_version
        run: echo "::set-output name=version::${GITHUB_REF##*/}"

      - name: Update Version in pyproject.toml
        run: |
          sed -i "s/^version = .*/version = \"${{ steps.get_version.outputs.version }}\"/" pyproject.toml
      - name: Build the Package
        run: python -m build

      - name: Publish to PyPI
        env:
          TWINE_USERNAME: "__token__"
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
        run: python -m twine upload dist/*
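The two version steps in this workflow can be sketched in Python to make their behavior easier to check. This is an illustrative approximation only: `version_from_ref` and `set_version` are hypothetical helper names, not part of the repository. The first mirrors the shell expansion `${GITHUB_REF##*/}` (keep everything after the last slash); the second mirrors the `sed` substitution on lines that start with `version = `. Note that the `pyproject.toml` added in this same commit declares `dynamic = ["version"]` and has no `version = ` line, so a substitution of this shape would leave that file unchanged.

```python
import re


def version_from_ref(github_ref: str) -> str:
    """Mimic ${GITHUB_REF##*/}: keep the text after the last slash."""
    return github_ref.rsplit("/", 1)[-1]


def set_version(pyproject_text: str, version: str) -> tuple[str, int]:
    """Mimic the sed call: rewrite every line starting with 'version = '.

    Returns the new text and the number of lines that were rewritten.
    """
    pattern = re.compile(r"^version = .*", flags=re.MULTILINE)
    return pattern.subn(lambda _match: f'version = "{version}"', pyproject_text)


# A static version line is rewritten.
static_text = '[project]\nname = "demo"\nversion = "0.0.0"\n'
# A dynamic version declaration does not match the pattern.
dynamic_text = '[project]\nname = "demo"\ndynamic = ["version"]\n'

tag = version_from_ref("refs/tags/1.2.0")
updated_static, static_hits = set_version(static_text, tag)
updated_dynamic, dynamic_hits = set_version(dynamic_text, tag)
print(tag, static_hits, dynamic_hits)
```

Returning the substitution count alongside the text makes a silent no-op visible, which a plain `sed -i` does not report.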
8 changes: 8 additions & 0 deletions .gitignore
@@ -1,3 +1,11 @@
# Additional
.idea
dataset
.env
reports
report.csv
/configs.json

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
121 changes: 121 additions & 0 deletions README.md
@@ -1,2 +1,123 @@
![icon](icon.png)

<hr>

# better-moles-patent-finder
A tool designed to enhance patent discovery by leveraging MongoDB for efficient storage, querying, and analysis of patent data. This repository includes features to streamline patent searches, improve retrieval accuracy, and support advanced filtering and indexing capabilities.

[![Coverage](https://codecov.io/github/fabiobove-dr/better-moles-patent-finder/coverage.svg?branch=main)](https://codecov.io/gh/tacclab/bio_dataset_manager)
[![PyPI Latest Release](https://img.shields.io/pypi/v/better-moles-patent-finder.svg)](https://pypi.org/project/better-moles-patent-finder/)
![Unit Tests](https://github.com/fabiobove-dr/better-moles-patent-finder/actions/workflows/main.yml/badge.svg)<br>
[![Powered by Fabio](https://img.shields.io/badge/powered%20by-Fabio-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)]()
[![License](https://img.shields.io/github/license/fabiobove-dr/better-moles-patent-finder.svg)](https://github.com/tacclab/bio_dataset_manager/blob/main/LICENSE)<br>


## Overview
This project offers a powerful platform for patent research, combining advanced search features with a MongoDB backend to store, retrieve, and analyze patent-related data efficiently. It allows users to search for patents associated with chemical compounds, leveraging SMILES, InChI, and other molecular representations. The system also supports filtering by molecular structure, patent ID, and other criteria.
<hr>

This project is based on the **PatCID** paper, which focuses on the identification and classification of patent data related to molecular structures. The techniques and methodologies from the PatCID framework are utilized to enhance patent search results by leveraging chemical informatics and advanced query techniques. The core concept of this project builds upon PatCID's ability to match molecular structures with relevant patent information, improving the overall efficiency and accuracy of patent searches.
To check out their incredible work, visit the [PatCID GitHub repository](https://github.com/DS4SD/PatCID).
<hr>

**Key Features:**
- **Patent Search**: Search patents by their ID or associated molecular properties.
- **Advanced Filtering**: Filter patents based on molecular structure, chemical formula, and other relevant fields.
- **Efficient Querying**: Use MongoDB's indexing and querying capabilities to retrieve patents quickly.
- **Data Model**: The system stores patents and associated molecules in a structured format, making it easy to extend and scale.


## Authors:
- Fabio Bove | fabio.bove.dr@gmail.com<br>
<hr>

## What is it?
This tool is designed to assist researchers and patent professionals in finding relevant patents related to chemical compounds using molecular representations like SMILES and InChI. By using MongoDB as the backend, it efficiently stores and indexes large volumes of patent and molecular data. Users can easily query patents, filter based on molecular structures, and retrieve precise results with high speed.

<hr>

## Mongo Documents Format

The MongoDB documents used by this project follow the structure below, which includes information about the molecule (using SMILES, InChI, etc.) and the associated patent IDs:

```json
{
  "molecule": {
    "smiles": "Brc1cc(-c2ccccc2)nc(-c2ccc3c4ccccc4c4ccccc4c3c2)c1",
    "inchi": "InChI=1S/C29H18BrN/c30-21-17-28(19-8-2-1-3-9-19)31-29(18-21)20-14-15-26-24-12-5-4-10-22(24)23-11-6-7-13-25(23)27(26)16-20/h1-18H",
    "inchikey": "UPAWJZOAEGLCFP-UHFFFAOYSA-N",
    "sum_formula": "C29H18BrN",
    "conf": 0.57
  },
  "patents": [
    {"id": "US20200136057A1"},
    {"id": "US20200136057"}
  ]
}
```

### Document Fields

- **molecule**: Contains the molecular data (`smiles`, `inchi`, `inchikey`, `sum_formula`) and a `conf` value.
- **patents**: A list of objects, each holding the `id` of a patent associated with the molecule.
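As a minimal sketch of how a document in this shape might be queried, the helpers below build a filter on the nested `molecule.inchikey` field and read the patent IDs out of a matching document. The names `build_inchikey_filter` and `extract_patent_ids` are hypothetical and are not part of the package; the filter relies on MongoDB dot notation for nested fields and could be passed to a pymongo call such as `Collection.find_one`.

```python
from typing import Any


def build_inchikey_filter(inchikey: str) -> dict[str, str]:
    """Build a MongoDB filter that matches the nested molecule.inchikey field."""
    return {"molecule.inchikey": inchikey}


def extract_patent_ids(document: dict[str, Any]) -> list[str]:
    """Return the patent IDs stored in a document's 'patents' list."""
    return [patent["id"] for patent in document.get("patents", [])]


# Sample document following the format described above.
sample_document = {
    "molecule": {
        "inchikey": "UPAWJZOAEGLCFP-UHFFFAOYSA-N",
        "sum_formula": "C29H18BrN",
        "conf": 0.57,
    },
    "patents": [{"id": "US20200136057A1"}, {"id": "US20200136057"}],
}

query = build_inchikey_filter("UPAWJZOAEGLCFP-UHFFFAOYSA-N")
patent_ids = extract_patent_ids(sample_document)
print(query, patent_ids)
```

Querying by InChIKey rather than by SMILES is a design choice in this sketch: one molecule can be written as several different SMILES strings, while the InChIKey is a fixed-length identifier that is better suited to exact matching.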

<hr>

## Usage

### Installation

You can install the `better-moles-patent-finder` package via `pip` from PyPI or clone the repository to run locally:

#### Install from PyPI:
```bash
pip install better-moles-patent-finder
```

#### Basic Usage

Once installed, you can query patents either by running the project as a script or through the Python API.

To run the project as a script, pass a configuration file path:
```bash
better-moles-patent-finder --config-path /path/to/config_file.yaml
```

To query patents through the Python API:
```python
from better_moles_patent_finder import PatentFinder

# Create a PatentFinder instance
pf = PatentFinder()

# Search for a patent by ID
result = pf.search_by_patent_id('US20200136057A1')

# Search for patents by molecule structure (SMILES)
result = pf.search_by_smiles('Brc1cc(-c2ccccc2)nc(-c2ccc3c4ccccc4c4ccccc4c3c2)c1')

# Print the result
print(result)
```

#### MongoDB Connection

Ensure MongoDB is running and accessible. The default connection string is configured in the project; if necessary, you can change it in the `mongo_connector.py` file.
```python
from better_moles_patent_finder import MongoConnector

# Connect to the MongoDB database
mongo = MongoConnector()
mongo.connect()

# Perform queries and operations
```

---
## License
This project is licensed under the terms of the GNU General Public License, Version 3.
Empty file added examples/__init__.py
Empty file.
13 changes: 13 additions & 0 deletions examples/configs_example.json
@@ -0,0 +1,13 @@
{
  "MONGO_CONNECTION": {
    "host": "",
    "port": 27017,
    "db_name": "",
    "collection": "",
    "username": "",
    "password": "",
    "auth_db": ""
  },
  "REPORT_FOLDER": "examples/outputs",
  "DATA_FOLDER": "examples/data.csv"
}
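To illustrate how the `MONGO_CONNECTION` block might be turned into a connection string, here is a minimal sketch. The function `build_mongo_uri` is hypothetical and is not taken from the package, and the filled-in values are invented for the example. It assumes the standard `mongodb://` URI form, percent-encodes the credentials with `urllib.parse.quote_plus`, and passes `auth_db` as the `authSource` option.

```python
import json
from urllib.parse import quote_plus


def build_mongo_uri(conn: dict) -> str:
    """Build a mongodb:// URI from a MONGO_CONNECTION-style mapping."""
    credentials = ""
    if conn.get("username"):
        user = quote_plus(conn["username"])
        password = quote_plus(conn.get("password", ""))
        credentials = f"{user}:{password}@"
    uri = f"mongodb://{credentials}{conn['host']}:{conn.get('port', 27017)}/"
    if conn.get("auth_db"):
        uri += f"?authSource={quote_plus(conn['auth_db'])}"
    return uri


# Example config with hypothetical values filled in.
config_text = """
{
  "MONGO_CONNECTION": {
    "host": "db.example.org",
    "port": 27017,
    "db_name": "patents",
    "collection": "molecules",
    "username": "alice",
    "password": "p@ss/word",
    "auth_db": "admin"
  }
}
"""

config = json.loads(config_text)
uri = build_mongo_uri(config["MONGO_CONNECTION"])
print(uri)
```

Encoding the username and password matters because characters such as `@` and `/` would otherwise be read as URI delimiters.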
29 changes: 29 additions & 0 deletions examples/docker-compose.yml
@@ -0,0 +1,29 @@
version: '3.1'

services:
  mongo:
    image: mongo
    container_name: molecules_mongo
    restart: always
    env_file:
      - ./environment/.env
    networks:
      - mongo-network
    ports:
      - "27017:27017"

  mongo-express:
    image: mongo-express
    container_name: molecules_mongo_express
    restart: always
    ports:
      - "127.0.0.1:8081:8081"
    env_file:
      - ./environment/.env
    networks:
      - mongo-network

networks:
  mongo-network:
    driver: bridge

Binary file added icon.png
42 changes: 42 additions & 0 deletions pyproject.toml
@@ -0,0 +1,42 @@
[project]
name = "better-moles-patent-finder"
dynamic = ["version"] # This relies on git tags for versioning
authors = [
{ name = "Fabio Bove", email = "fabio.bove.dr@gmail.com" }
]
description = "A tool designed to enhance patent discovery by leveraging MongoDB for efficient storage, querying, and analysis of patent data. This repository includes features to streamline patent searches, improve retrieval accuracy, and support advanced filtering and indexing capabilities."
readme = "README.md"
license = { file = "LICENSE" }
keywords = ["molecules", "bioinformatics", "patent-finder", "molecules-evaluation", "mongo_db"]

# Specify the required dependencies with correct versions
dependencies = [
"numpy>=1.0.0,<2.0.0",
"pandas==2.2.3",
"pymongo==4.9",
"rdkit==2024.3.6",
"tqdm==4.67.1",
]

# Define optional dependencies like testing
[project.optional-dependencies]
test = [
"pytest==7.2.1",
"pytest-cov==5.0.0",
"pytest-mock==3.14.0",
]

[project.urls]
Homepage = "https://github.com/fabiobove-dr/better-moles-patent-finder"
Issues = "https://github.com/fabiobove-dr/better-moles-patent-finder/issues"

# Setuptools git versioning configuration
[tool.setuptools-git-versioning]
enabled = true

[build-system]
requires = ["setuptools>=61.0", "setuptools-git-versioning>=2.0,<3", "wheel"]
build-backend = "setuptools.build_meta"

[project.scripts]
better-moles-patent-finder = "mol_patents_find:main"
Empty file added src/__init__.py
Empty file.