FEAT: Added import queries and updated steps in README.md
DavidBakerEffendi committed Mar 4, 2020
1 parent 79ddea2 commit 4a670cb
Showing 4 changed files with 90 additions and 12 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -128,4 +128,7 @@ dmypy.json
# Pyre type checker
.pyre/

.idea
.idea

# Yelp files
*.json
53 changes: 44 additions & 9 deletions README.md
@@ -4,29 +4,64 @@ Contains Python scripts to import and model the Yelp challenge dataset into Neo4

## Getting Started

### Step 1:
### Step 1: Getting Neo4j Community Edition

Download the [Neo4j Community Edition](https://neo4j.com/download-thanks/?edition=community&release=4.0.1&flavour=unix) ZIP or tarball or start the Neo4j Docker container with the `docker-compose.yml` file.
Download the [Neo4j Community Edition](https://neo4j.com/download-thanks/?edition=community&release=4.0.1&flavour=unix)
ZIP or tarball or start the Neo4j Docker container with the `docker-compose.yml` file.

If using the ZIP or tarball, extract the archive to a directory e.g. `$HOME`:
```bash
$ tar -xvzf neo4j-community-4.0.1-unix.tar.gz -C ~/.
```
tar -xvzf neo4j-community-4.0.1-unix.tar.gz -C ~/.
```

### Step 2:
### Step 2: Server Plugins and Configuration
The `neo4j-community.x.x.x` directory in this project (where `x.x.x` is the version you are using) shows which files
need to be changed on the server. Note that the changes must be made to your server's files, not to this project;
the directory here only serves as a demonstration.

The following files must be changed on the server:

* `plugins/apoc-x.x.x.x-all.jar`: Download the latest [APOC plugin](https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases) and place it under the plugins directory on your server.
* `conf/neo4j.conf`: This has been configured to whitelist the APOC functions we use in the import process.
* `import/{business, review, user}.json`: All Yelp files one wishes to import must be placed here. This script only
considers the three JSON files listed here.

### Step 3: Starting Neo4j

If not using the Docker image, start Neo4j using the `neo4j` binary in the `neo4j-community-4.0.1/bin` directory. Example:
```bash
$ ~/neo4j-community-4.0.1/bin/neo4j start
```
~/neo4j-community-4.0.1/bin/neo4j start
Optionally, add the Neo4j binaries to your `PATH` by appending the following line to `~/.profile`, `~/.bashrc`, `~/.zshrc`, or similar:
```bash
export PATH=/home/david/neo4j-community-4.0.1/bin:$PATH
```
Then one can simply use:
```bash
$ neo4j start
```
Note that Neo4j needs to run with Oracle Java 11 or OpenJDK 11. I recommend using [AdoptOpenJDK 11](https://adoptopenjdk.net/installation.html?variant=openjdk11&jvmVariant=hotspot) and setting `$JAVA_HOME` to its installation directory, e.g. `export JAVA_HOME=/home/david/Downloads/jdk-11.0.5+10`.

The Neo4j Browser should now be reachable at `http://localhost:7474`. The default username and password are both `neo4j`.
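
Since a missing plugin or dataset file from Step 2 typically only surfaces once the import fails, a quick check of the server directory can save a round trip. The following is a minimal sketch and not part of this project: the function name `missing_server_files` is hypothetical, and it assumes the layout described in Step 2 (`plugins/`, `conf/`, and `import/` under the Neo4j home directory).

```python
from pathlib import Path

# The three Yelp files the import script reads (see Step 2).
REQUIRED_IMPORT_FILES = ("business.json", "review.json", "user.json")


def missing_server_files(neo4j_home):
    """Return the relative paths from Step 2 that are absent under neo4j_home."""
    home = Path(neo4j_home).expanduser()
    missing = []

    plugins = home / "plugins"
    # Any APOC jar counts; the exact version suffix depends on the release used.
    if not plugins.is_dir() or not any(plugins.glob("apoc-*.jar")):
        missing.append("plugins/apoc-*.jar")

    if not (home / "conf" / "neo4j.conf").is_file():
        missing.append("conf/neo4j.conf")

    for name in REQUIRED_IMPORT_FILES:
        if not (home / "import" / name).is_file():
            missing.append("import/" + name)

    return missing
```

For example, `missing_server_files("~/neo4j-community-4.0.1")` returns an empty list when everything is in place. It only checks that the files exist, not that `neo4j.conf` actually whitelists the APOC procedures.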

### Step 3:
### Step 4: Python Dependencies

Before running the `neo4j_yelp.py` script, make sure that you have installed all of the dependencies and edited `config.py` to contain your credentials. To download all the dependencies you can simply type:
```
pip3 install -r requirements.txt --user
```bash
$ pip3 install -r requirements.txt --user
```
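
The `config.py` file mentioned above is not shown here. As a rough sketch only, it might look like the following, assuming the script picks up a connection URI named `uri` (the name passed to `Graph(uri)` in `neo4j_yelp.py`). All other names, the Bolt scheme, and port 7687 (Neo4j's default Bolt port) are assumptions made for illustration.

```python
# config.py (hypothetical sketch; the real file in this project may differ)

# Assumed credentials: replace with the ones set for your Neo4j instance.
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "change-me"

# Assumed endpoint: a local server on Neo4j's default Bolt port.
NEO4J_HOST = "localhost"
NEO4J_BOLT_PORT = 7687

# Connection URI with the credentials embedded in it.
uri = "bolt://{}:{}@{}:{}".format(
    NEO4J_USER, NEO4J_PASSWORD, NEO4J_HOST, NEO4J_BOLT_PORT
)
```

Neo4j asks for a new password on first login, so the default `neo4j` password will usually not be the one to put here.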

### Step 5: Import the Dataset

Now that everything is configured and ready, the import script can be run with:
```bash
$ python3 neo4j_yelp.py
```
If all goes well, you will see the following output in the terminal:
```bash
[INFO] Clearing graph of any existing data
[INFO] Asserting schema
[INFO] Loading businesses
[INFO] Loading users
[INFO] Loading reviews
```
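
To confirm that the import did more than print its log lines, the graph can be queried for counts afterwards. This is a minimal sketch, assuming a `graph` object that offers py2neo's `evaluate` method; the helper name `import_summary` is hypothetical. Reviews are counted as `REVIEWS` relationships because that is how `neo4j_yelp.py` stores them.

```python
# Cypher count queries keyed by a human-readable name. The labels and the
# relationship type are the ones neo4j_yelp.py creates.
COUNT_QUERIES = {
    "Business nodes": "MATCH (b:Business) RETURN count(b)",
    "User nodes": "MATCH (u:User) RETURN count(u)",
    "Category nodes": "MATCH (c:Category) RETURN count(c)",
    "REVIEWS relationships": "MATCH ()-[r:REVIEWS]->() RETURN count(r)",
}


def import_summary(graph):
    """Run each count query; evaluate() yields the first value of the first record."""
    return {name: graph.evaluate(query) for name, query in COUNT_QUERIES.items()}
```

With the script's own connection this would be called as `import_summary(graph)`; a count of zero for any entry suggests that the corresponding load step did not run.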
42 changes: 41 additions & 1 deletion neo4j_yelp.py
@@ -5,4 +5,44 @@

graph = Graph(uri)

print(graph.evaluate("MATCH (tom {name: \"Tom Hanks\"}) RETURN tom"))
print("[INFO] Clearing graph of any existing data")
graph.evaluate("MATCH (n) DETACH DELETE n")

print("[INFO] Asserting schema")
graph.evaluate("CALL apoc.schema.assert({Category:['name']},{Business:['id'],User:['id'],Review:['id']})")

print("[INFO] Loading businesses")
graph.evaluate('CALL apoc.periodic.iterate("'
'CALL apoc.load.json(\'file:///business.json\') YIELD value RETURN value '
'"," '
'MERGE (b:Business{id:value.business_id}) '
'SET b += apoc.map.clean(value, [\'business_id\',\'categories\',\'address\',\'postal_code\'],[]) '
'WITH b,value.categories as categories '
'UNWIND categories as category '
'MERGE (c:Category{id:category}) '
'MERGE (b)-[:IN_CATEGORY]->(c)"'
',{batchSize: 10000, iterateList: true});')

print("[INFO] Loading users")
graph.evaluate('CALL apoc.periodic.iterate("'
'CALL apoc.load.json(\'file:///user.json\') '
'YIELD value RETURN value '
'"," '
'MERGE (u:User{id:value.user_id}) '
'SET u += apoc.map.clean(value, [\'friends\',\'user_id\'],[0]) '
'WITH u,value.friends as friends '
'UNWIND friends as friend '
'MERGE (u1:User{id:friend}) '
'MERGE (u)-[:FRIEND]-(u1) '
'",{batchSize: 100, iterateList: true});')

print("[INFO] Loading reviews")
graph.evaluate('CALL apoc.periodic.iterate("'
'CALL apoc.load.json(\'file:///review.json\') '
'YIELD value RETURN value '
'"," '
'MERGE (b:Business{id:value.business_id}) '
'MERGE (u:User{id:value.user_id}) '
'MERGE (u)-[r:REVIEWS]->(b) '
'SET r += apoc.map.clean(value, [\'business_id\',\'user_id\',\'review_id\'],[0])'
'",{batchSize: 10000, iterateList: true});')
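
The three `apoc.periodic.iterate` calls above are assembled from adjacent string literals, where a misplaced quote or a missing space is easy to overlook. One possible alternative, sketched here with a hypothetical helper `periodic_iterate` that is not part of this commit, builds the call from two plain Cypher fragments. It assumes the fragments use single quotes only, as the queries above do, and that no string literal inside them depends on exact whitespace.

```python
def periodic_iterate(outer, inner, batch_size=10000, iterate_list=True):
    """Build a CALL apoc.periodic.iterate(...) statement from two Cypher fragments."""
    for fragment in (outer, inner):
        # The fragments are wrapped in double quotes below, so they must not contain any.
        if '"' in fragment:
            raise ValueError("use single quotes inside the Cypher fragments")
    return 'CALL apoc.periodic.iterate("{}", "{}", {{batchSize: {}, iterateList: {}}})'.format(
        " ".join(outer.split()),  # collapse newlines and indentation
        " ".join(inner.split()),
        int(batch_size),
        "true" if iterate_list else "false",
    )


# Example: the user-loading query from above, without the FRIEND handling.
USER_QUERY = periodic_iterate(
    """
    CALL apoc.load.json('file:///user.json')
    YIELD value RETURN value
    """,
    """
    MERGE (u:User{id:value.user_id})
    SET u += apoc.map.clean(value, ['friends','user_id'],[0])
    """,
    batch_size=100,
)
```

The result would be passed to `graph.evaluate(USER_QUERY)` in the same way as the hand-built strings.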
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1 +1 @@
py2neo
py2neo==4.3.0
