This project enhances an RDF (Resource Description Framework) graph by generating and adding context-aware related terms, hypernyms, and speculative related terms using BERT-based models. The augmentation process is driven by natural language processing techniques to enrich the graph with semantically relevant information.
The primary goal of this project is to augment RDF graphs using pre-trained BERT models. The program parses a given RDF graph, identifies specific literals (such as occupations), and enriches them with contextually relevant synonyms, hypernyms (broader types), and speculative related terms. This is useful for knowledge graph construction, ontology building, and other applications that need richer semantic relationships.
- Contextual Term Generation: Uses BERT to generate related terms based on the context of the existing triples in the RDF graph.
- Hypernym and Speculative Term Augmentation: Adds hypernyms and speculative related terms to the graph, enriching its semantic depth.
- POS Tagging for Validation: Ensures that only contextually valid triples are added by comparing the part of speech (POS) tags of terms.
- Customizable Augmentation: Allows the customization of the predicates and relation types to be augmented.
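The POS check listed above could look roughly like the following. This is a minimal sketch, not the project's actual code: `same_pos`, `coarse_tag`, and the `tagger` parameter are hypothetical names, and the default tagger assumes NLTK's `pos_tag` with its tagger data already downloaded. Treating the last word of a multi-word term as its head is also an assumption.

```python
from typing import Callable, List, Tuple

# A tagger maps a list of tokens to (token, tag) pairs, like nltk.pos_tag.
Tagger = Callable[[List[str]], List[Tuple[str, str]]]


def nltk_tagger(tokens: List[str]) -> List[Tuple[str, str]]:
    """Default tagger; assumes NLTK and its POS tagger data are installed."""
    import nltk  # imported lazily so the check can be exercised with a stub
    return nltk.pos_tag(tokens)


def coarse_tag(tag: str) -> str:
    """Collapse Penn Treebank tags to two letters (NN, NNS, NNP -> NN)."""
    return tag[:2]


def same_pos(original: str, candidate: str, tagger: Tagger = nltk_tagger) -> bool:
    """Return True if both terms share a coarse POS tag on their last word.

    Assumption: the last token of a multi-word term is treated as its head.
    """
    orig_tags = tagger(original.split())
    cand_tags = tagger(candidate.split())
    if not orig_tags or not cand_tags:
        return False
    return coarse_tag(orig_tags[-1][1]) == coarse_tag(cand_tags[-1][1])
```

Passing the tagger as a parameter keeps the comparison logic independent of NLTK, so a different tagger can be swapped in without touching the check itself.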
- Clone the Repository:
    git clone https://github.com/your-username/rdf-augmentation.git
    cd rdf-augmentation
- Create and Activate a Virtual Environment (Optional but Recommended):
    python3 -m venv venv
    source venv/bin/activate
- Install Required Packages:
    pip install -r requirements.txt
- Prepare RDF Data: Modify the rdf_data string in the code to include your RDF triples in Turtle format.
- Run the Script: Execute the main script to process and augment your RDF graph:
    python rdf_augmenter.py
- View the Output: The augmented RDF graph will be serialized and printed in Turtle format.
The script follows these steps:
- RDF Parsing: The input RDF graph is parsed using rdflib.
- BERT Model Initialization: The BERT model and tokenizer are loaded for generating context-aware embeddings.
- Contextual Term Generation: For each relevant literal in the RDF graph, the script generates context-aware synonyms, hypernyms, and speculative related terms.
- POS Validation: Ensures that newly generated terms are contextually appropriate based on their POS tags.
- Graph Augmentation: Adds the valid terms to the RDF graph using predefined predicates.
- Serialization: Outputs the augmented RDF graph in Turtle format.
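One way to approximate the term generation step is with a masked language model: verbalize the literal as a sentence, mask one slot, and keep the top predictions. The sketch below is an assumption about how such a step could work, not the script's actual implementation. `hypernym_candidates`, the sentence template, the `min_score` threshold, and the `fill_mask` parameter are hypothetical; `load_bert_fill_mask` uses the `transformers` fill-mask pipeline with `bert-base-uncased`, which is one possible model choice.

```python
from typing import Callable, Dict, List

# fill_mask(text, top_k=...) returns a list of dicts with "token_str" and
# "score" keys, which matches the transformers fill-mask pipeline output.
FillMask = Callable[..., List[Dict]]


def load_bert_fill_mask() -> FillMask:
    """Build a fill-mask pipeline (downloads the model on first use)."""
    from transformers import pipeline  # lazy import: heavy dependency
    return pipeline("fill-mask", model="bert-base-uncased")


def hypernym_candidates(term: str, fill_mask: FillMask, top_k: int = 5,
                        min_score: float = 0.01) -> List[str]:
    """Suggest broader terms that the model finds plausible for `term`."""
    # Hypothetical template that turns the literal into a sentence.
    prompt = f"A {term} is a kind of [MASK]."
    predictions = fill_mask(prompt, top_k=top_k)
    candidates: List[str] = []
    for pred in predictions:
        token = pred["token_str"].strip()
        # Skip low-confidence guesses and word-piece fragments.
        if pred["score"] < min_score or token.startswith("##"):
            continue
        # Skip predictions that just repeat a word of the original term.
        if token.lower() in term.lower().split():
            continue
        if token not in candidates:
            candidates.append(token)
    return candidates
```

With the real model, `hypernym_candidates("software engineer", load_bert_fill_mask())` returns the model's top suggestions; the exact terms depend on the model and template, so candidates should still pass POS validation before being added to the graph.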
The script includes logging to track the augmentation process:
- Logs are written to the console and detail the terms generated and added to the RDF graph.
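A console logging setup of this kind could be configured with Python's standard `logging` module. The logger name `rdf_augmenter`, the helper `log_added_term`, and the message wording are assumptions for illustration, not the script's actual log format.

```python
import logging
import sys

# Send INFO-level messages and above to the console.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("rdf_augmenter")  # hypothetical logger name


def log_added_term(subject: str, predicate: str, term: str) -> None:
    """Record one triple that was added during augmentation."""
    logger.info("Added triple: %s %s %r", subject, predicate, term)


log_added_term("ex:Alice", "ex:relatedTerm", "developer")
```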
- Python 3.7+
- rdflib
- transformers
- torch
- scikit-learn (imported as sklearn)
- nltk
To install all dependencies, use the following command:
pip install -r requirements.txt
@article{martinez2022kgaugmentation,
author = {Jorge Martinez-Gil and
Shaoyi Yin and
Josef K{\"{u}}ng and
Franck Morvan},
title = {Knowledge Graph Augmentation for Increased Question Answering Accuracy},
journal = {Trans. Large Scale Data Knowl. Centered Syst.},
volume = {52},
pages = {70--85},
year = {2022},
url = {https://doi.org/10.1007/978-3-662-66146-8\_3},
doi = {10.1007/978-3-662-66146-8\_3}
}
This project is licensed under the MIT License.