NaturalSQL for Materials Science is a project that integrates natural language processing capabilities with AiiDA (Automated Interactive Infrastructure and Database for Computational Science) to enable researchers to query computational materials science data using plain English instead of complex SQL queries.
The project leverages AiiDA's powerful provenance tracking and database capabilities to make scientific data more accessible through natural language queries, helping researchers focus on science rather than database query syntax.
- Natural Language Queries: Query your AiiDA database using plain English
- Automatic SQL Generation: Converts natural language to optimized SQL queries
- PDF Report Generation: Creates comprehensive reports from query results
- AiiDA Integration: Works with AiiDA's provenance graph to provide context-aware results
- Materials Science Focus: Tailored for computational materials science terminology and workflows
- Python 3.8+
- AiiDA 2.5.0+ (2.6.0 recommended)
- PostgreSQL (for production use) or SQLite (for testing)
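The version requirements above can be checked from Python before going further. The following is a minimal sketch that uses only the standard library; the helper names `parse_version`, `meets_minimum`, and `check_environment` are illustrative and are not part of this project or of AiiDA. Pre-release suffixes such as `rc1` are ignored, which is an intentional simplification.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Turn '2.6.0' or '2.6.0.post0' into a tuple of leading integers, e.g. (2, 6, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component that does not start with a digit
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.5' and '2.5.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Report whether Python and aiida-core satisfy the minimums listed above."""
    python_ok = sys.version_info >= (3, 8)
    try:
        aiida_version = metadata.version("aiida-core")
        aiida_ok = meets_minimum(aiida_version, "2.5.0")
    except metadata.PackageNotFoundError:
        aiida_version, aiida_ok = None, False
    return {"python_ok": python_ok, "aiida_version": aiida_version, "aiida_ok": aiida_ok}


if __name__ == "__main__":
    print(check_environment())
```

If `aiida-core` is not installed yet, `aiida_version` is reported as `None` rather than raising an error.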
- Clone the repository:
git clone https://github.com/your-username/NaturalSQL-for-Material-Science.git
cd NaturalSQL-for-Material-Science
- Set up a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install AiiDA:
pip install aiida-core
- Install additional dependencies:
pip install -r requirements.txt
- Configure AiiDA (if not already set up):
verdi setup
If you're starting from scratch, create and configure an AiiDA profile:
verdi quicksetup # For quick setup with default values
Or for more control:
verdi setup
Set up a compute resource:
verdi computer setup -L mycomputer -H localhost -T core.local -S core.direct -w /path/to/work/dir
verdi computer configure core.local mycomputer --safe-interval 0
Register computational codes:
verdi code create core.code.installed --label mycode --computer=mycomputer --default-calc-job-plugin plugin.name --filepath-executable=/path/to/executable
Run the demo workflow script from the repository root:
verdi run nl_query_demo/run_workflow.py
The system can handle queries like:
- "Show me all calculations that failed last week"
- "Find structures with more than 50 atoms"
- "List all workflows related to band structure calculations"
- "Count the number of calculations per computer used"
- "What is the average calculation runtime for quantum espresso jobs?"
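To illustrate the kind of translation involved, the sketch below pairs one of the example questions with SQL that could plausibly be generated for it. It runs against a toy in-memory SQLite table. The `calculations` table, its columns, and the sample rows are invented for this illustration; they do not reflect AiiDA's actual database schema or the SQL this project generates.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only (not AiiDA's real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calculations (id INTEGER PRIMARY KEY, computer TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO calculations (computer, status) VALUES (?, ?)",
    [("localhost", "finished"), ("localhost", "failed"), ("cluster_a", "finished")],
)

# The natural language question and one possible SQL rendering of it.
question = "Count the number of calculations per computer used"
sql = """
    SELECT computer, COUNT(*) AS n_calculations
    FROM calculations
    GROUP BY computer
    ORDER BY computer
"""

rows = conn.execute(sql).fetchall()
for computer, count in rows:
    print(f"{computer}: {count}")
conn.close()
```

Grouping by the computer column and counting rows is the standard SQL shape for a "count per X" question; a real AiiDA database would need joins between its own tables to answer the same question.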
Reports are generated automatically when a query runs and are saved to the nl_query_reports folder with timestamped filenames:
nl_query_report_YYYYMMDD_HHMMSS.pdf
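The naming scheme above can be reproduced with the standard library. This is a small sketch, not the project's own code; the function name `report_path` and its parameters are assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path


def report_path(now=None, folder="nl_query_reports"):
    """Build a report path of the form nl_query_report_YYYYMMDD_HHMMSS.pdf."""
    now = now or datetime.now()
    # datetime supports strftime codes directly inside an f-string format spec.
    return Path(folder) / f"nl_query_report_{now:%Y%m%d_%H%M%S}.pdf"


# A fixed timestamp makes the result reproducible.
print(report_path(datetime(2024, 3, 5, 14, 7, 9)))
```

Because the timestamp is zero-padded and ordered from year down to second, sorting the filenames alphabetically also sorts the reports chronologically.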
To create your own custom natural language query workflows:
- Extend the base workflow in nl_query_workflow.py
- Define your specific query patterns in the workflow
- Register your workflow in workflow.py
- Execute using nl_query_demo/run_workflow.py
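As a rough idea of what a query pattern could look like, the sketch below maps regular expressions to parameterized SQL templates. It is entirely hypothetical: `QUERY_PATTERNS`, `match_query`, and the table and column names are placeholders and are not taken from nl_query_workflow.py, whose actual interface may differ.

```python
import re

# Hypothetical pattern table: each entry pairs a regular expression with an SQL template.
# Table and column names are placeholders, not AiiDA's schema.
QUERY_PATTERNS = [
    (
        re.compile(r"structures with more than (?P<n>\d+) atoms", re.IGNORECASE),
        "SELECT id FROM structures WHERE n_atoms > :n",
    ),
    (
        re.compile(r"calculations that failed", re.IGNORECASE),
        "SELECT id FROM calculations WHERE status = 'failed'",
    ),
]


def match_query(text):
    """Return (sql_template, parameters) for the first matching pattern, or None."""
    for pattern, template in QUERY_PATTERNS:
        found = pattern.search(text)
        if found:
            # Named groups become bound parameters; here every group is numeric.
            params = {key: int(value) for key, value in found.groupdict().items()}
            return template, params
    return None


print(match_query("Find structures with more than 50 atoms"))
```

Keeping the captured values as bound parameters (`:n`) instead of formatting them into the SQL string avoids injection problems if the template is later executed against a database.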
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project builds upon the AiiDA framework, a workflow manager for computational science with a strong focus on provenance, performance, and extensibility.
Please cite the following when using this project:
- S. P. Huber et al., "AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance", Scientific Data 7, 300 (2020); DOI: 10.1038/s41597-020-00638-4
- M. Uhrin et al., "Workflows in AiiDA: Engineering a high-throughput, event-based engine for robust and modular computational workflows", Computational Materials Science 187, 110086 (2021); DOI: 10.1016/j.commatsci.2020.110086
For questions and support, please open an issue in the GitHub repository or contact the development team at mrebaal14@gmail.com.