The drive contains the information about the complete project (without the source code) and some Elasticsearch indices.
The new files are in a few locations. The import files for MongoDB and Elasticsearch have been added to narcisweb/bin/, together with the script that creates the new author-country dataset. The rest of the code is written in Jupyter notebooks, which can be found in notebooks/persistent/.
The import scripts can be run once the DOIBoost2017, DOIBoost2018 and GRID datasets have been downloaded and unzipped in /data/original/. These are very basic scripts. The script that creates the author-country dataset has been commented, and all notebooks have extensive documentation and comments. The notebooks should therefore be self-explanatory.
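Before running the import scripts, it can help to confirm that the three datasets are actually in place. The following is a minimal sketch, not part of the project: the function name `missing_datasets` is hypothetical, and it assumes each unzipped dataset appears as a file or folder whose name starts with the dataset name, which may not match the real archive layout.

```python
from pathlib import Path

# Dataset names taken from the text above. How the unzipped files are
# actually named is an assumption made for this sketch.
REQUIRED_DATASETS = ("DOIBoost2017", "DOIBoost2018", "GRID")


def missing_datasets(data_dir, required=REQUIRED_DATASETS):
    """Return the required dataset names with no matching entry in data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        # Nothing has been downloaded yet, so everything is missing.
        return list(required)
    present = [entry.name for entry in root.iterdir()]
    # A dataset counts as present if any file or folder name starts with it.
    return [
        name
        for name in required
        if not any(entry.startswith(name) for entry in present)
    ]
```

Calling `missing_datasets` with the directory that holds the unzipped data returns an empty list when all three datasets are found, and the names still to be downloaded otherwise.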
Download the NARCIS data into the current folder (augmenting-narcis).
Get the subset of NARCIS publications from 2017:
grep '"date": \["2017' ./data/original/harvest.2019-05-15 | grep 'info:eu-repo/semantics/article'
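The same filter can be applied from a notebook. This is a minimal Python sketch of the shell command above, assuming (as the command itself does) that the harvest file holds one record per line. The function name `filter_articles` and its `year` parameter are hypothetical.

```python
def filter_articles(lines, year="2017"):
    """Yield lines whose date starts with `year` and that are typed as articles.

    This is plain substring matching, equivalent to the two chained grep
    calls. No JSON parsing is done.
    """
    date_marker = '"date": ["' + year
    type_marker = "info:eu-repo/semantics/article"
    for line in lines:
        if date_marker in line and type_marker in line:
            yield line
```

To use it, open ./data/original/harvest.2019-05-15 as a text file and pass the file object to `filter_articles`. Lines are yielded lazily, so the harvest file is never loaded into memory as a whole.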
All data must be uploaded to MongoDB and Elasticsearch before you can use it. Run the upload script first:
docker-compose build
docker-compose up -d
docker-compose down
It will take about 3-4 hours until all data is available in the ./data folder.
To get the NARCIS infrastructure up and running, execute the following commands:
docker-compose build
docker-compose up
The infrastructure consists of the following components:
Please note that the Jupyter notebook server runs on port 8880. To access it, open it in your browser and copy/paste the token shown in the output of the "docker-compose up" command.
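If the startup output is long, the token can be picked out programmatically. This is a minimal sketch that assumes the Jupyter server prints its access URL with a `token=` query parameter holding a hexadecimal token, which is the usual Jupyter behaviour. The function name `extract_token` is hypothetical.

```python
import re

# Assumption: the token appears as a hexadecimal "token=" query parameter
# in a URL printed in the server log.
TOKEN_PATTERN = re.compile(r"[?&]token=([0-9a-fA-F]+)")


def extract_token(log_text):
    """Return the first Jupyter token found in log_text, or None if absent."""
    match = TOKEN_PATTERN.search(log_text)
    return match.group(1) if match else None
```

The function takes the captured log text as a plain string, for example output saved to a file while the containers start.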