This project implements a novel watermarking technique for graph databases, embedding ownership indicators within the graph structure using pseudo-nodes. The method ensures data integrity and provenance without compromising database performance. It is resilient to ownership attacks and provides an efficient way of verifying data authenticity through watermark extraction.
The system utilizes Neo4j for graph database management and is implemented in Python 3.12. The process involves creating a fake graph that mimics the original, and using a private key to match embedded pseudo-nodes, proving the authenticity of the data.
- Pseudo-Node Watermarking: Embeds ownership markers into the graph structure using pseudo-nodes, maintaining operational integrity.
- Attack Resilience: Protects against ownership attacks such as guessing and deletion attacks.
- Efficient Watermark Extraction: Verifies data ownership by matching pseudo-nodes with a private key, without compromising performance.
- Graph Database Verification: Uses Neo4j as the graph database for storing and querying data.
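To make the idea of key-matched pseudo-nodes concrete, here is a minimal sketch. It is not the project's actual algorithm: it assumes a pseudo-node's numerical field is derived from the private key with HMAC-SHA256, and the names `keyed_value`, `make_pseudo_node`, `matches_key`, and the `employees` field are hypothetical.

```python
import hashlib
import hmac


def keyed_value(key: bytes, node_id: str, low: int, high: int) -> int:
    """Map HMAC-SHA256(key, node_id) onto the inclusive range [low, high]."""
    digest = hmac.new(key, node_id.encode("utf-8"), hashlib.sha256).digest()
    return low + int.from_bytes(digest[:8], "big") % (high - low + 1)


def make_pseudo_node(key: bytes, node_id: str, field: str, low: int, high: int) -> dict:
    """Create a pseudo-node whose numerical field is derived from the key."""
    return {"id": node_id, field: keyed_value(key, node_id, low, high)}


def matches_key(key: bytes, node: dict, field: str, low: int, high: int) -> bool:
    """Return True if the node's field equals the value the key would produce."""
    return node.get(field) == keyed_value(key, str(node["id"]), low, high)


# Hypothetical key and field name, for illustration only.
key = b"example-private-key"
node = make_pseudo_node(key, "pseudo-1", "employees", 10, 500)
print(matches_key(key, node, "employees", 10, 500))  # True
```

Because the value depends on the key, only the key holder can recompute it and show that a node is a watermark. Without the key, the pseudo-node looks like any other node with a plausible field value.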
- Python 3.12 or higher
- Neo4j
- Libraries:
  - py2neo
  - pandas
  - hashlib (part of the Python standard library; no installation needed)
git clone https://github.com/yourusername/graph-watermarking.git
cd graph-watermarking
pip install -r requirements.txt
Or, to install inside a pipenv virtual environment:
python3.12 -m pipenv shell
pip install -r requirements.txt
- Download and Install Neo4j
  - Download Neo4j from its official website.
  - Follow the installation instructions for your operating system.
- Install Plugins
  - Install the APOC plugin for advanced procedures.
  - Enable the plugin by adding the following lines to the neo4j.conf file:
    dbms.security.procedures.unrestricted=apoc.*
    dbms.security.procedures.allowlist=apoc.*
- Set Up the Database
  - Start the Neo4j database server.
  - Access the Neo4j browser at http://localhost:7474.
  - Set up the database credentials (default username: neo4j). Update the password as prompted.
- Using Py2neo for Queries
  - Install Py2neo using the following command:
    pip install py2neo
  - Update the connection details in your Python scripts to match your Neo4j database credentials.
  - Example connection code:
    from py2neo import Graph
    graph = Graph("neo4j://localhost:7687", auth=("<Your Username>", "<Your Password>"))
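If you would rather not hard-code credentials in your scripts, one option is to read them from environment variables. The sketch below is a suggestion, not part of the project: the variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` and the helpers `connection_settings` and `connect` are assumptions.

```python
import os


def connection_settings(env=None) -> dict:
    """Read Neo4j connection details from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    return {
        "uri": env.get("NEO4J_URI", "neo4j://localhost:7687"),
        "user": env.get("NEO4J_USER", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", ""),
    }


def connect(settings=None):
    """Open a py2neo Graph using the settings above."""
    # Imported lazily so connection_settings() works even without py2neo installed.
    from py2neo import Graph

    settings = connection_settings() if settings is None else settings
    return Graph(settings["uri"], auth=(settings["user"], settings["password"]))
```

With the server running, `connect().run("RETURN 1").evaluate()` is a quick way to confirm that the credentials are accepted.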
Please download the UKCompanies dataset and import it into Neo4j by following the import instructions at the same link.
Once the dataset import has been verified, run the following script in the virtual environment:
python3 driver.py
- The script first prints the database summary and information useful for schema analysis in this project.
- The script then prompts you to select the node types. Type "all" to watermark all node types.
- Next, you will be prompted to choose the minimum and maximum group size for pseudo-node generation.
- Then, for each node type, select the required and optional fields for the pseudo-nodes. Required fields must be numerical; optional fields may also be numerical.
- After this, the program watermarks the pseudo-nodes and inserts them back into the original data. The script prints the Watermark Secret (Private Key K) and the watermarked node IDs along with the hashed secret.
- You will then be asked for the total number of nodes to generate in the suspected fake data.
- You will be prompted for the real-to-fake data ratio, a value between 0 and 1. Here, the real data contains the watermarked nodes.
- After this, the watermark validation script will run to search for watermarks in the suspected data.
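The last three steps can be pictured with the following sketch. It is only an illustration under stated assumptions, not the logic in driver.py: it treats the real portion of the suspected data as a sample of watermarked nodes, fills the rest with random fake nodes, and counts how many nodes match the key. The HMAC-based check, the helper names, and the `employees` field are hypothetical.

```python
import hashlib
import hmac
import random


def keyed_value(key: bytes, node_id: str, low: int, high: int) -> int:
    """Map HMAC-SHA256(key, node_id) onto the inclusive range [low, high]."""
    digest = hmac.new(key, node_id.encode("utf-8"), hashlib.sha256).digest()
    return low + int.from_bytes(digest[:8], "big") % (high - low + 1)


def build_suspect_data(watermarked, total, real_ratio, field, low, high, rng):
    """Mix a sample of watermarked nodes with randomly generated fake nodes."""
    n_real = min(len(watermarked), round(total * real_ratio))
    real = rng.sample(watermarked, n_real)
    fake = [
        {"id": f"fake-{i}", field: rng.randint(low, high)}
        for i in range(total - n_real)
    ]
    suspect = real + fake
    rng.shuffle(suspect)
    return suspect


def count_watermarks(key, nodes, field, low, high) -> int:
    """Count the nodes whose field value matches what the key would produce."""
    return sum(
        1
        for node in nodes
        if node.get(field) == keyed_value(key, str(node["id"]), low, high)
    )


# Hypothetical key, field name, and value range, for illustration only.
key = b"example-private-key"
low, high = 0, 10**9
watermarked = [
    {"id": f"pseudo-{i}", "employees": keyed_value(key, f"pseudo-{i}", low, high)}
    for i in range(20)
]
suspect = build_suspect_data(
    watermarked, total=50, real_ratio=0.2, field="employees",
    low=low, high=high, rng=random.Random(0),
)
print(len(suspect), count_watermarks(key, suspect, "employees", low, high))
```

A wide value range keeps the chance of a fake node matching the key by accident very small, so a high match count is strong evidence that the suspected data contains the owner's watermarked nodes.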