This guide will help you to install and setup a production deploy of Knowledgehub CKAN portal on Ubuntu Server 18.04 or CentOS 7 (2003).
- Deployment setup
- Install CKAN
- Install Required CKAN Extensions
- Install Knowledhub Extension
- Configure Production Deployment
The installation and deployment is on a single machine. The following components will be installed and configured:
- PostgreSQL with PostGIS and Redis
- System provided packages, running as
systemd
services
- System provided packages, running as
- Solr, version 6.5.1
- Custom installation, running as
systemd
service
- Custom installation, running as
- CKAN installation with Knowledgehub portal extension
- Source install, running on top of Apache with
mod_wsgi
- Source install, running on top of Apache with
- CKAN Datapusher
- Source install, running on top of Apache with
mod_wsgi
- Source install, running on top of Apache with
- Scheduled periodic jobs - cron
- CKAN Workers
- Running as
systemd
services
- Running as
- Apache2
- to handle CKAN and Datapusher via
mod_wsgi
- to handle CKAN and Datapusher via
- Nginx
- Reverse proxy to Knowledgehub portal
Full installation guide documentation is available on CKAN docs:
- Installing CKAN
- Deployment
- Installing CKAN on CentOS 7 - as a reference, because some modifications are required.
Before we start, we must install the required system packages. These will install PostgreSQL (+Postgis), Redis, JDK for running Solr and git so we can fetch the source code of CKAN and the extensions.
Ubuntu Server 18.04
sudo apt install postgresql libpq-dev python-pip python-virtualenv git redis-server openjdk-8-jdk postgis
CentoOS 7
sudo yum install -y python-devel postgresql-server postgresql-contrib python-pip python-virtualenv postgresql-devel git redis postgis wget lsof policycoreutils-python java-1.8.0-openjdk
sudo yum groupinstall 'Development Tools'
Once installed, we must initialize PostgreSQL database - otherwise the service will refuse to start.
sudo postgresql-setup initdb
sudo systemctl start postgresql
# Let it start always on system start
sudo systemctl enable postgresql
Also enable and start other service, they are not enabled by default on Cent OS:
sudo systemctl enable redis
sudo systemctl enable httpd
sudo systemctl enable nginx
sudo systemctl start redis
sudo systemctl start httpd
sudo systemctl start nginx
There are two main routes we can take here - either we do a source install (as in this guide), or we do package install. We're going down the source install path, as it gives more flexibility when installed on different platforms.
From here on, for a source install, we might do one the following:
- We can set up CKAN in its own home directory, that will contain the source code and all configurations. This is better
for development and bug fixing, as we have everything in a single home dir. The Ubuntu setup closely matches this type of
installation, although it uses a special user called
ckan
. - Or we can set up a special user
ckan
, that has no login shell, has less privileges and its home is in/usr/lib/ckan
. This is more desireable for a production setup (although the first option works just as well), especially on CentOS, where we have strict policies (selinux) that will prevent us from deploying from a regular user home. This is the option taken for CentOS.
Either of the above options would work equally well for Ubuntu Server setup, although the second one is recommended.
Ubuntu Server (option one)
First let's create a user for CKAN. This will create a regular user (option one) in /home
:
sudo useradd -m -s /bin/bash ckan
Then switch to that user:
sudo -i -u ckan
And create the folders:
mkdir -p ~/ckan/lib
mkdir -p ~/ckan/etc
Exit to the user with sudo privileges, then create the necessary soft links:
sudo ln -s /home/ckan/ckan/lib /usr/lib/ckan
sudo ln -s /home/ckan/ckan/etc /etc/ckan
Create new Python virtualenv:
sudo mkdir -p /usr/lib/ckan/default
sudo chown ckan /usr/lib/ckan/default # Important to make 'ckan' the owner of the virtualenv
# Then switch to 'ckan' user and create the virtualenv
sudo -i -u ckan
virtualenv --no-site-packages /usr/lib/ckan/default
Then activate the virtualenv. From this point on, make sure you have the virtualenv activated when executing the commands. Important Note: Make sure you have activated the virutalenv with user ckan.
. /usr/lib/ckan/default/bin/activate
CentOS 7 & Ubuntu Server (option two)
This creates a special user (option two), with home at /var/lib/ckan
. This directory will contain all of the CKAN installation data.
We also must create another directory in /etc/ckan
, that will hold configurations.
First, let's create the user:
useradd -m -s /sbin/nologin -d /usr/lib/ckan -c "CKAN User" ckan
Then give 755
permission to the whole directory, so Apache can access it later on:
chmod 755 /usr/lib/ckan
Create the configuration directories and change ownership:
sudo mkdir -p /etc/ckan/default
sudo chown -R ckan /etc/ckan
Now, create the storage directory for CKAN uploads:
sudo mkdir /usr/lib/ckan/data
sudo chown apache:apache /usr/lib/ckan/data
sudo chmod 755 /usr/lib/ckan/data
Then, we must add the correct permission labels for selinux
:
sudo semanage fcontext -a -t httpd_sys_rw_content_t "/usr/lib/ckan/data(/.*)?"
sudo restorecon -R /usr/lib/ckan/data
Switch to ckan user and create virtualenv:
sudo su -s /bin/bash - ckan
cd /usr/lib/ckan/
virtualenv --no-site-packages default
Then you can activate the virtualenv:
. /usr/lib/ckan/default/bin/activate
We're installing CKAN from source. This means that we're going to use pip
to install CKAN in the virtualenv that
we created above. Also, we're installing with -e
option, meaning that the unpacked files will be placed under
src/
directory in the virtualenv, instead mixed in with the other Python site packages.
First make sure you have the correct version of setuptools
:
pip install setuptools==36.1
Then, install CKAN source:
pip install -e 'git+https://github.com/ckan/ckan.git@ckan-2.8.2#egg=ckan'
Important Note: We're using version
2.8.2
, so make sure you have the correct version in the Git url.
Install the required Python packages for CKAN:
pip install -r /usr/lib/ckan/default/src/ckan/requirements.txt
Deactivate and reactivate your virtualenv, to make sure you’re using the virtualenv’s copies of commands like paster rather than any system-wide installed copies:
deactivate
. /usr/lib/ckan/default/bin/activate
First go back and login with the sudoer user - we're going to need to run commands and super user.
Check that PostgreSQL is running properly:
sudo -u postgres psql -l
Check that the encoding of databases is UTF8
, if not internationalization may be a problem. Since changing the encoding of PostgreSQL may mean deleting existing databases, it is suggested that this is fixed before continuing with the CKAN install.
Now, create the user for CKAN, and enter the user password when prompted for it:
sudo -u postgres createuser -S -D -R -P ckan_default
Finally, enable the PostGIS extension on the ckan_default database:
sudo -i -u postgres
Create the database:
sudo -u postgres createdb -O ckan_default ckan_default -E utf-8
Install the extension via psql:
psql -d ckan_default
In psql console, run the SQL expressions create extension postgis
and to check that it went well, run select PostGIS_version()
:
create extension postgis;
select PostGIS_version();
You should see output like this:
postgis_version
---------------------------------------
2.4 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
(1 row)
This creates the Data Store database and creates two types of access to it: read-only and read-write.
First create the user that has read-only access to the datastore:
sudo -u postgres createuser -S -D -R -P -l datastore_default
Then create the database, owned by ckan_default
, as created in the previous step (this user haw write privileges):
sudo -u postgres createdb -O ckan_default datastore_default -E utf-8
Note that this is a bit different from the official CKAN installation gude. We install Solr manually to get the same version across different distributions instead relying on the provided version. Different distros provide varying versions of Solr, usually deployed differently on each. We want to make sure we have the same version everywhere. If you already have a Solr setup deployed (single instance or Solr Cloud), you can skip the installation part, and configure a Solr Core (or collection) with the provided Solr schema.
Create a user for Solr, and a directory where we're going to install it:
sudo mkdir -p /opt/apps/solr
sudo useradd -m -d /opt/apps/solr -s /bin/bash solr
sudo chown -R solr /opt/apps/solr
Switch to user solr and to the installation location:
sudo -i -u solr
cd /opt/apps/solr
Download version 6.5.1 of Solr and unpack it:
wget https://archive.apache.org/dist/lucene/solr/6.5.1/solr-6.5.1.tgz
tar xzvf solr-6.5.1.tgz
mv solr-6.5.1 6.5.1 # Rename to the version so we get nice path `/opt/apps/solr/6.5.1`
rm solr-6.5.1.tgz # Remove the downloaded archive
We're going to use systemd
service to manage the state of Solr, thus integrating it as a system service.
First we need to create a Service file.
Create new file at /etc/systemd/system/solr.service
sudo touch /etc/systemd/system/solr.service
sudo chmod 664 /etc/systemd/system/solr.service
Add the following content of the file (run the editor in sudo mode - for example sudo vim /etc/systemd/system/solr.service
):
[Unit]
Description=Apache Solr
After=network.target
[Service]
Type=forking
ExecStart=/opt/apps/solr/6.5.1/bin/solr start
ExecReload=/opt/apps/solr/6.5.1/bin/solr restart
ExecStop=/opt/apps/solr/6.5.1/bin/solr stop
User=solr
[Install]
WantedBy=multi-user.target
Save and close.
Enable the systemd service - this will run Solr on startup:
sudo systemctl enable solr
Then run the service:
sudo systemctl start solr
To confirm that it is running:
sudo systemctl status solr
● solr.service - Apache Solr
Loaded: loaded (/etc/systemd/system/solr.service; disabled; vendor preset: disabled)
Active: active (running) since нед 2020-08-09 14:21:48 CEST; 4s ago
Process: 25527 ExecStart=/opt/apps/solr/6.5.1/bin/solr start (code=exited, status=0/SUCCESS)
Main PID: 25601 (java)
CGroup: /system.slice/solr.service
└─25601 java -server -Xms512m -Xmx512m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMark...
авг 09 14:21:42 centos-srv systemd[1]: Starting Apache Solr...
авг 09 14:21:48 centos-srv solr[25527]: [134B blob data]
авг 09 14:21:48 centos-srv solr[25527]: Started Solr server on port 8983 (pid=25601). Happy searching!
авг 09 14:21:48 centos-srv systemd[1]: Started Apache Solr.
To create CKAN Core, on a local installation, run:
sudo -u solr /opt/apps/solr/6.5.1/bin/solr create_core -c ckan
This will create a new Solr Core, with name ckan.
Then we must configure the core to use the specific CKAN configuration.
For this step, you'll need the config file from ckanext-knowledgehub
. You can get these either from the github master branch, or from the local repository clone (see the steps for installing ckanext-knowledgehub).
sudo -i -u solr
cd /opt/apps/solr/6.5.1/server/solr/ckan/conf/
wget https://raw.githubusercontent.com/keitaroinc/ckanext-knowledgehub/master/ckanext/knowledgehub/schema.xml
Then edit the configuration file solrconfig.xml
.
First set the Schema Factory to classic factory. This disables the automatic schema update, as we provide the schema ourselves.
Add the following to the main section:
<!-- Classic Schema Factory-->
<schemaFactory class="ClassicIndexSchemaFactory"/>
just above <codecFactory class="solr.SchemaCodecFactory"/>
.
Then comment out the following processor configuration:
<!--processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
<str name="defaultFieldType">strings</str>
<lst name="typeMapping">
<str name="valueClass">java.lang.Boolean</str>
<str name="fieldType">booleans</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.util.Date</str>
<str name="fieldType">tdates</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.lang.Long</str>
<str name="valueClass">java.lang.Integer</str>
<str name="fieldType">tlongs</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.lang.Number</str>
<str name="fieldType">tdoubles</str>
</lst>
</processor-->
Alternatively, you can just copy the provided solrconfig.xml file from the docs.
Save the changes, and restart Solr:
sudo systemctl restart solr
Create the directory that will contain the configuration files (in /etc
):
sudo mkdir -p /etc/ckan/default
sudo chown -R ckan /etc/ckan/
# This is for Ubuntu Only
sudo chown -R ckan /home/ckan/ckan/etc
Switch to ckan
user, and activate the virtualenv:
sudo -i -u ckan
sudo su -s /bin/bash - ckan # Centos
. /usr/lib/ckan/default/bin/activate
Create the storage directory for uploaded files (make sure CKAN has write privileges to this dir):
mkdir /home/ckan/data
On CentOS:
mkdir /usr/lib/ckan/data
Create the CKAN config file:
paster make-config ckan /etc/ckan/default/production.ini
Once the file is generated, we need to edit the file and set up these basic values:
sqlalchemy.url
must point to the CKAN database that was created in the previous step:
sqlalchemy.url = postgresql://ckan_default:pass@localhost/ckan_default?sslmode=disable
ckan.site_id=unhcr_knowledgehub
- Each site should have a unique ID.ckan.site_url = http://knowledgehub.unhcr.org
- the real URL of the site, as it will be accessed later on. This is used to generate URLs inside CKAN.
DataStore DB urls:
ckan.datastore.write_url
- This is the read-write access to the datastore database. Use the CKAN user here.
ckan.datastore.write_url = postgresql://ckan_default:pass@localhost/datastore_default
ckan.datastore.read_url
- Read-only access to the datastore database here (created in the previous step):
ckan.datastore.read_url = postgresql://datastore_default:pass@localhost/datastore_default
solr_url
- The url to the configured Solr with CKAN core.
solr_url = http://127.0.0.1:8983/solr/ckan
Link to who.ini
file:
ln -s /usr/lib/ckan/default/src/ckan/who.ini /etc/ckan/default/who.ini
This step is important to be performed immeditely after installing CKAN, before any additional extensions are installed.
This installs the core CKAN tables. If additional estensions are enabled at this point, there is a possibility that those plugins would require the base tables to be installed and may cause an error to be thrown.
To create the core tables, run (make sure the virtualenv is activated):
. /usr/lib/ckan/default/bin/activate
cd /usr/lib/ckan/default/src/ckan
paster db init -c /etc/ckan/default/production.ini
You should get:
Initialising DB: SUCCESS
On CentOS 7 You will need to update the pg_hba.conf
file to enable password authentication from localhost.
Edit file /var/lib/pgsql/data/pg_hba.conf
to look like this:
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all peer
# IPv4 local connections:
host all all 127.0.0.1/32 md5
# IPv6 local connections:
host all all ::1/128 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication postgres peer
host replication postgres 127.0.0.1/32 md5
host replication postgres ::1/128 md5
Then save and restart PostgreSQL:
sudo systemctl restart postgresql
CKAN provides paster command that generates the SQL for setting permissions on the datastore database.
You can pipe the SQL to psql
connected to the database, by executing the following command:
paster --plugin=ckan datastore set-permissions -c /etc/ckan/default/production.ini | psql -h localhost -U ckan_default -d ckan_default
Run CKAN in development mode with paster:
. /usr/lib/ckan/default/bin/activate
cd /usr/lib/ckan/default/src/ckan
paster serve /etc/ckan/default/production.ini
There should be no error in the output of this command.
Then open the CKAN url in browser (still running on port 5000): http://<server-ip>:5000/
You should see the CKAN welcome page.
CentOS7 Note
You may need to add the port to the firewall.
sudo firewall-cmd --zone=public --add-port=5000/tcp --permanent sudo firewall-cmd --reload
Knowledgehub depends on a couple of CKAN Extensions:
Switch to user ckan
, activate the virtualenv then install the extension using pip
:
sudo -i -u ckan
sudo su -s /bin/bash - ckan # Centos
. /usr/lib/ckan/default/bin/activate
pip install --no-cache-dir -e "git+https://github.com/frictionlessdata/ckanext-validation.git#egg=ckanext-validation"
cd /usr/lib/ckan/default/
echo "pyrsisten==0.16.0" >> src/ckanext-validation/requirements.txt
pip install -r src/ckanext-validation/requirements.txt
Validation extenstion requires database initialization. So after the installation run:
cd src/ckanext-validation/
paster validation init-db -c /etc/ckan/default/production.ini
Edit the production configuration file (/etc/ckan/default/production.ini
) and add the validation plugin:
ckan.plugins = recline_view validation stats
Save, then start CKAN with paster:
paster serve /etc/ckan/default/production.ini
If not already switched, then switch to user ckan
and activate the virtualenv:
sudo -i -u ckan
sudo su -s /bin/bash - ckan # Centos
. /usr/lib/ckan/default/bin/activate
Install the extension using pip
:
pip install --no-cache-dir -e "git+https://github.com/keitaroinc/ckanext-datarequests.git@kh_stable#egg=ckanext-datarequests"
Edit the production configuration file (/etc/ckan/default/production.ini
) and add the validation plugin:
ckan.plugins = recline_view validation stats datarequests
Test the installation:
paster serve /etc/ckan/default/production.ini
Note
Due to a bug in
ckanext-datarequests
, you need to installhumanize
package manually at this step:pip install humanize==1.0.0
If not already switched, then switch to user ckan
and activate the virtualenv:
sudo -i -u ckan
sudo su -s /bin/bash - ckan # Centos
. /usr/lib/ckan/default/bin/activate
Install the extension using pip
:
pip install --no-cache-dir -e "git+https://github.com/keitaroinc/ckanext-oauth2.git@kh_stable#egg=ckanext-oauth2"
OAuth2 requires additional configuration to run properly. You need to have set up application and have generated keys and tokens for the OAuth2 protocol. Then those values have to be set up in the production.ini
file.
Open /etc/ckan/default/production.ini
and add these values (replace your real production keys and URLs):
# OAuth2 settings
ckan.oauth2.register_url = https://YOUR_OAUTH_SERVICE/users/sign_up
ckan.oauth2.reset_url = https://YOUR_OAUTH_SERVICE/users/password/new
ckan.oauth2.edit_url = https://YOUR_OAUTH_SERVICE/settings
ckan.oauth2.authorization_endpoint = https://YOUR_OAUTH_SERVICE/authorize
ckan.oauth2.token_endpoint = https://YOUR_OAUTH_SERVICE/token
ckan.oauth2.profile_api_url = https://YOUR_OAUTH_SERVICE/user
ckan.oauth2.client_id = YOUR_CLIENT_ID
ckan.oauth2.client_secret = YOUR_CLIENT_SECRET
ckan.oauth2.scope = profile other.scope
ckan.oauth2.rememberer_name = auth_tkt
ckan.oauth2.profile_api_user_field = JSON_FIELD_TO_FIND_THE_USER_IDENTIFIER
ckan.oauth2.profile_api_fullname_field = JSON_FIELD_TO_FIND_THE_USER_FULLNAME
ckan.oauth2.profile_api_mail_field = JSON_FIELD_TO_FIND_THE_USER_MAIL
ckan.oauth2.authorization_header = OAUTH2_HEADER
Refer to the OAuth2 setup document for exact values for these fields.
Add the oauth2
plugin to the list of plugins:
ckan.plugins = recline_view validation stats datarequests oauth2
Test the istallation by running paster:
paster serve /etc/ckan/default/production.ini
Edit the production config file (/etc/ckan/default/production.ini
), and add datastore
to the list of plugins (the order is important):
ckan.plugins = recline_view validation stats datastore datarequests oauth2
Remember to also check if the database URLs are set up properly as well:
ckan.datastore.write_url = postgresql://ckan_default:pass@localhost/datastore_default
ckan.datastore.read_url = postgresql://datastore_default:pass@localhost/datastore_default
Save, then test the installation running paster
:
paster serve /etc/ckan/default/production.ini
Switch to user ckan
then activate the virtualenv:
# Ubuntu
sudo -i -u ckan
. /usr/lib/ckan/default/bin/activate
# Centos
sudo su -s /bin/bash - ckan
. /usr/lib/ckan/default/bin/activate
Install the extension and required Python packages with pip
:
pip install --no-cache-dir -e "git+https://github.com/keitaroinc/ckanext-knowledgehub.git#egg=ckanext-knowledgehub"
# Due to some of the packages requiring newer version of setuptools, we need to uninstall, then reinstall setuptools
pip uninstall setuptools -y
pip install setuptools
pip install --no-cache-dir -r "/usr/lib/ckan/default/src/ckanext-knowledgehub/requirements.txt"
Due to mismatch of PostgreSQL driver (psycopg2) version used in CKAN and HDX API library, we must reinstall the correct version:
pip uninstall psycopg2-binary -y pip uninstall psycopg2 -y pip install --no-cache-dir psycopg2==2.7.3.2
Download language model for Spacy:
python -m spacy download en_core_web_sm
Make sure you're swtched to user ckan
and the virtualenv is activated.
To initialize knowledgehub database tables:
knowledgehub -c /etc/ckan/default/production.ini db init
Then we need to add the extension to the list of CKAN plugins.
Open /etc/ckan/default/production.ini
and add knowledgehub
to the list of plugins:
ckan.plugins = recline_view validation knowledgehub stats datastore datapusher datarequests oauth2
Then we need to configure some basic properties. In the same file:
# HDX API keys
ckanext.knowledgehub.hdx.api_key = <HDX_API_KEY>
ckanext.knowledgehub.hdx.site = test
ckanext.knowledgehub.hdx.owner_org = <HDX_OWNER_ORG_ID>
ckanext.knowledgehub.hdx.dataset_source = knowledgehub
ckanext.knowledgehub.hdx.maintainer = <HDX_USER_NAME>
# SMTP Server config
smtp.server = <SMTP_SERVER_URL>
smtp.starttls = True
smtp.user = <SMTP_SERVER_USERNAME>
smtp.password = <SMTP_SERVER_PASSWORD>
smtp.mail_from = <KNOWLEDGEHUB_PORTAL_EMAIL>
Where:
- HDX_API_KEY - is the HDX Portal API Key
- HDX_OWNER_ORG_ID - is the registered organization on HDX Portal (the id)
- HDX_USER_NAME - the username of the maintainer on HDX Portal (for Knowledgehub)
- SMTP_SERVER_URL - the URL of your SMTP relay server
- SMTP_SERVER_USERNAME - SMTP username
- SMTP_SERVER_PASSWORD - SMTP password
- KNOWLEDGEHUB_PORTAL_EMAIL - the official email address of Knowledgehub portal.
There are other configuration options that can be used to fine-tune different parts of the portal. To configure these please check the README file on Github.
Restart apache and nginx, then check the site (if apache2 and ngix aleady set up, otherwise test with paster serve
).
# Ubuntu
sudo systemctl restart apache2
sudo systemctl restart nginx
# Centos
sudo systemctl restart httpd
sudo systemctl restart nginx
- Install Apache, nginx and related libraries
Ubuntu Server:
sudo apt-get install -y nginx apache2 libapache2-mod-wsgi libapache2-mod-rpaf libpq5
CentOS 7:
sudo yum install -y nginx httpd mod_wsgi
Note that
mod_rpaf
is not available in the stadard CentOS repositories. We'll skip for this installation.
- Create wsgi file
/etc/ckan/default/apache.wsgi
:
import os
activate_this = os.path.join('/usr/lib/ckan/default/bin/activate_this.py')
execfile(activate_this, dict(__file__=activate_this))
from paste.deploy import loadapp
config_filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'production.ini')
from paste.script.util.logging_config import fileConfig
fileConfig(config_filepath)
application = loadapp('config:%s' % config_filepath)
Make the user ckan
owner of the file:
sudo chown ckan /etc/ckan/default/apache.wsgi
- Create the Apache config file at
/etc/apache2/sites-available/ckan_default.conf
:
Ubuntu Only
<VirtualHost 127.0.0.1:8080>
ServerName knowledhehub.ubuntu.unhcr.org
ServerAlias www.knowledhehub.ubuntu.unhcr.org
WSGIScriptAlias / /etc/ckan/default/apache.wsgi
# Pass authorization info on (needed for rest api).
WSGIPassAuthorization On
# Deploy as a daemon (avoids conflicts between CKAN instances).
WSGIDaemonProcess ckan_default display-name=ckan_default processes=2 threads=15
WSGIProcessGroup ckan_default
ErrorLog /var/log/apache2/ckan_default.error.log
CustomLog /var/log/apache2/ckan_default.custom.log combined
<IfModule mod_rpaf.c>
RPAFenable On
RPAFsethostname On
RPAFproxy_ips 127.0.0.1
</IfModule>
<Directory />
Require all granted
</Directory>
</VirtualHost>
Replace knowledhehub.ubuntu.unhcr.com
with the actual doman name of the site.
CentOS 7:
Centos locations are different from Ubuntu. We're going to create new VirtualHost definition in /etc/httpd/conf.d/knowledhehub.centos.unhcr.org.conf
.
First, remove the welcome page from the apache server:
sudo mv /etc/httpd/conf.d/welcome.conf /etc/httpd/conf.d/welcome.conf_backup
Then create the file: /etc/httpd/conf.d/knowledhehub.centos.unhcr.org.conf
with the following content:
<VirtualHost 127.0.0.1:8080>
ServerName knowledhehub.centos.unhcr.org
ServerAlias www.knowledhehub.centos.unhcr.org
WSGIScriptAlias / /etc/ckan/default/apache.wsgi
# Pass authorization info on (needed for rest api).
WSGIPassAuthorization On
# Deploy as a daemon (avoids conflicts between CKAN instances).
WSGIDaemonProcess ckan_default display-name=ckan_default processes=2 threads=15
WSGIProcessGroup ckan_default
ErrorLog /var/log/httpd/ckan_default.error.log
CustomLog /var/log/httpd/ckan_default.custom.log combined
<IfModule mod_rpaf.c>
RPAFenable On
RPAFsethostname On
RPAFproxy_ips 127.0.0.1
</IfModule>
<Directory />
Require all granted
</Directory>
# Add this to avoid Apache show error:
# "AH01630: client denied by server configuration: /etc/ckan/default/apache.wsgi"
<Directory /etc/ckan/default>
Options All
AllowOverride All
Require all granted
</Directory>
</VirtualHost>
Remember to replace
knowledhehub.centos.unhcr.org
with the correct domain name both in the config file and as the file name.
Site is enabled by default.
Reload the service:
sudo systemctl restart httpd
Open http port through firewall
Run the following command:
sudo firewall-cmd --permanent --add-service=http
- Modify the Apache ports file
/etc/apache2/ports.conf
. We want the apache to listen to port 8080 instead of port 80.
On CentOS edit file
/etc/httpd/conf/httpd.conf
Change
Listen 80
To
Listen 8080
- Create
nginx
configuration file at/etc/nginx/sites-available/ckan
:
proxy_cache_path /tmp/nginx_cache levels=1:2 keys_zone=cache:30m max_size=250m;
proxy_temp_path /tmp/nginx_proxy 1 2;
server {
client_max_body_size 100M;
location / {
proxy_pass http://127.0.0.1:8080/;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header Host $host;
proxy_cache cache;
proxy_cache_bypass $cookie_auth_tkt;
proxy_no_cache $cookie_auth_tkt;
proxy_cache_valid 30m;
proxy_cache_key $host$scheme$proxy_host$request_uri;
# In emergency comment out line to force caching
# proxy_ignore_headers X-Accel-Expires Expires Cache-Control;
}
}
CentOS
We must create the config directories sites-available
and sites-enabled
:
sudo mkdir /etc/nginx/sites-available /etc/nginx/sites-enabled
Edit /etc/nginx/nginx.conf
, and add the following inside the http
block:
include /etc/nginx/sites-enabled/*.conf
server_names_hash_bucket_size 64;
or use the following configuration if only using it for CKAN deployment:
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
# Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;
events {
worker_connections 1024;
}
http {
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
include /etc/nginx/mime.types;
default_type application/octet-stream;
# Load modular configuration files from the /etc/nginx/conf.d directory.
# See http://nginx.org/en/docs/ngx_core_module.html#include
# for more information.
include /etc/nginx/conf.d/*.conf;
# Load enabled sites.
include /etc/nginx/sites-enabled/*;
server_names_hash_bucket_size 64;
}
Add the following configuration at /etc/nginx/sites-available/ckan.conf
:
proxy_cache_path /tmp/nginx_cache levels=1:2 keys_zone=cache:30m max_size=250m;
proxy_temp_path /tmp/nginx_proxy 1 2;
server {
client_max_body_size 100M;
location / {
proxy_pass http://127.0.0.1:8080/;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header Host $host;
proxy_cache cache;
proxy_cache_bypass $cookie_auth_tkt;
proxy_no_cache $cookie_auth_tkt;
proxy_cache_valid 30m;
proxy_cache_key $host$scheme$proxy_host$request_uri;
# In emergency comment out line to force caching
# proxy_ignore_headers X-Accel-Expires Expires Cache-Control;
}
}
Enable the site:
sudo ln -s /etc/nginx/sites-available/ckan.conf /etc/nginx/sites-enabled/ckan.conf
Restart nginx:
sudo systemctl restart nginx
If SE Linux, you must allow nginx to connect to upstream server. Execute:
sudo setsebool httpd_can_network_connect on -P
- Enable CKAN site with Apache and nginx
Ubuntu Server
sudo a2ensite ckan_default
sudo a2dissite 000-default
sudo rm -vi /etc/nginx/sites-enabled/default
sudo ln -s /etc/nginx/sites-available/ckan /etc/nginx/sites-enabled/ckan_default
sudo service apache2 reload
sudo service nginx reload
CentOS
sudo systemctl restart httpd
sudo systemctl restart nginx
- Install required libraries
sudo apt install build-essential libxslt1-dev libxml2-dev git libffi-dev
CentOS
sudo yum install -y libxslt-devel libxml2-devel libffi-devel
- switch to user ckan
sudo -i -u ckan # Ubuntu
sudo su -s /bin/bash - ckan # Centos
- create a virtualenv for datapusher
virtualenv /usr/lib/ckan/datapusher
- create a source directory and switch to it
mkdir /usr/lib/ckan/datapusher/src
cd /usr/lib/ckan/datapusher/src
- clone the stable version of datapusher
git clone -b 0.0.16 https://github.com/ckan/datapusher.git
- install the requirements and datapusher itself
. /usr/lib/ckan/datapusher/bin/activate
cd datapusher
pip install -r requirements.txt
python setup.py develop
- Deploy datapusher to run with apache mod_wsgi
This assumes that Apache has been set up. If not please read the section Deployment - Installing Apache and nginx at the bottom of this document.
Ubuntu Server First copy the site configuration for the datapusher:
sudo cp /usr/lib/ckan/datapusher/src/datapusher/deployment/datapusher.conf /etc/apache2/sites-available/datapusher.conf
Edit the datapusher.conf
file and add permissions for /etc/ckan
directory. The full file should look like this:
<VirtualHost 0.0.0.0:8800>
ServerName ckan
# this is our app
WSGIScriptAlias / /etc/ckan/datapusher.wsgi
# pass authorization info on (needed for rest api)
WSGIPassAuthorization On
# Deploy as a daemon (avoids conflicts between CKAN instances)
WSGIDaemonProcess datapusher display-name=demo processes=1 threads=15
WSGIProcessGroup datapusher
ErrorLog /var/log/apache2/datapusher.error.log
CustomLog /var/log/apache2/datapusher.custom.log combined
<Directory /etc/ckan>
Options All
AllowOverride All
Require all granted
</Directory>
</VirtualHost>
Copy the provided WSGI script:
sudo cp /usr/lib/ckan/datapusher/src/datapusher/deployment/datapusher.wsgi /etc/ckan/
Copy the default setting script:
sudo cp /usr/lib/ckan/datapusher/src/datapusher/deployment/datapusher_settings.py /etc/ckan/
Then add the port 8800
to the Apache ports file:
sudo vim /etc/apache2/ports.conf
and add:
Listen 8800
Enable the datapusher site, then reload:
sudo a2ensite datapusher
sudo systemctl restart apache2
Test the setup by calling:
curl http://localhost:8800
You should get:
{"help":"\n Get help at:\n http://ckan-service-provider.readthedocs.org/."}
CentOS
First copy the site configuration for the datapusher:
sudo cp /usr/lib/ckan/datapusher/src/datapusher/deployment/datapusher.conf /etc/httpd/conf.d/datapusher.conf
Edit the datapusher.conf
file and add permissions for /etc/ckan
directory. The full file should look like this:
<VirtualHost 0.0.0.0:8800>
ServerName ckan
# this is our app
WSGIScriptAlias / /etc/ckan/datapusher.wsgi
# pass authorization info on (needed for rest api)
WSGIPassAuthorization On
# Deploy as a daemon (avoids conflicts between CKAN instances)
WSGIDaemonProcess datapusher display-name=demo processes=1 threads=15
WSGIProcessGroup datapusher
ErrorLog /var/log/httpd/datapusher.error.log
CustomLog /var/log/httpd/datapusher.custom.log combined
<Directory /etc/ckan>
Options All
AllowOverride All
Require all granted
</Directory>
</VirtualHost>
Copy the provided WSGI script:
sudo cp /usr/lib/ckan/datapusher/src/datapusher/deployment/datapusher.wsgi /etc/ckan/
Copy the default settings script:
sudo cp /usr/lib/ckan/datapusher/src/datapusher/deployment/datapusher_settings.py /etc/ckan/
Then add the port 8800
to the Apache config file (/etc/httpd/conf/httpd.conf
):
sudo vim /etc/httpd/conf/httpd.conf
Find the Listen
directive and add another one (bellow the one for 8080
):
Listen 8800
Change SELinux policy to allow the Apache to bind to port 8800
:
sudo semanage port -a -t http_port_t -p tcp 8800
Reload apache:
sudo systemctl restart httpd
Test the setup by calling:
curl http://localhost:8800
You should get:
{"help":"\n Get help at:\n http://ckan-service-provider.readthedocs.org/."}
Add datapusher
to the list of active plugins in /etc/ckan/default/production.ini
:
ckan.plugins = recline_view validation stats datastore datapusher datarequests oauth2
Note the order of plugins must be as specified above.
In the same file, set the datapusher URL:
ckan.datapusher.url = http://127.0.0.1:8800/
Save the file, then restart Apache:
# Ubuntu
sudo systemctl restart apache2
# Centos
sudo systemctl restart httpd
Make sure you're using ckan
user and have the virtualenv activated.
Then to add new admin user:
cd /usr/lib/ckan/default/src/ckan
# Add the user
paster --plugin=ckan user add ckan_admin email=ckan_admin@knowledgehb.org -c /etc/ckan/default/production.ini
# Set the user `ckan_admin` as sysadmin
paster --plugin=ckan sysadmin add ckan_admin -c /etc/ckan/default/production.ini
Then try to login with the set username and password.
There are a couple of jobs that should be run periodically:
- predictive search ML model training - runs once per day, or once per week
- user intents extraction job - at least once per day
For adding the cron jobs, first make sure that cron is installed:
crontab -l
If not installed, install it:
# Ubuntu
sudo apt install cron
# Centos
sudo yum install cron
Create directory that will hold the logs for the scheduled CKAN jobs:
sudo mkdir /var/log/ckan
sudo chown -R ckan /var/log/ckan
Switch to user ckan
, as we want to run the cron jobs with this user:
# Ubuntu
sudo -i -u ckan
# Centos
sudo su -s /bin/bash - ckan
Then edit the crontab for this user:
crontab -e
Insert the following line for the crontab, then save and exit:
0 0 * * * /usr/lib/ckan/default/bin/knowledgehub -c /etc/ckan/default/production.ini predictive_search train >/var/log/ckan/cronjob_predictive_search.log 2>&1
Runs the predictive_search
train command once per day.
Insert the following line for the crontab, then save and exit:
0 0 * * * /usr/lib/ckan/default/bin/knowledgehub -c /etc/ckan/default/production.ini intents update >/var/log/ckan/cronjob_intents_update.log 2>&1
Runs the intents
update command once per day.
We're going to configure the CKAN workers to run as systemd
services.
There are 3 workers, one for each messaging topic of ckan: default
, priority
and bulk
.
Default Worker
Create the service file
sudo touch /etc/systemd/system/ckan-worker.default.service
sudo chmod 664 /etc/systemd/system/ckan-worker.default.service
Add the following content of the file (run the editor in sudo mode - for example sudo vim /etc/systemd/system/ckan-worker.default.service
):
[Unit]
Description=CKAN Worker Default
After=network.target
[Service]
WorkingDirectory=/usr/lib/ckan/default/src/ckan
ExecStart=/usr/lib/ckan/default/bin/paster --plugin=ckan jobs worker --config=/etc/ckan/default/production.ini
User=ckan
[Install]
WantedBy=multi-user.target
Save and close.
Enable and start the service:
sudo systemctl enable ckan-worker.default
sudo systemctl start ckan-worker.default
Check status to make sure it is running:
sudo systemctl status ckan-worker.default
To view the logs, use journalctl
:
sudo journalctl -u ckan-worker.default
Priority Worker
Create the service file
sudo touch /etc/systemd/system/ckan-worker.priority.service
sudo chmod 664 /etc/systemd/system/ckan-worker.priority.service
Add the following content of the file (run the editor in sudo mode - for example sudo vim /etc/systemd/system/ckan-worker.priority.service
):
[Unit]
Description=CKAN Worker Priority
After=network.target
[Service]
WorkingDirectory=/usr/lib/ckan/default/src/ckan
ExecStart=/usr/lib/ckan/default/bin/paster --plugin=ckan jobs worker priority --config=/etc/ckan/default/production.ini
User=ckan
[Install]
WantedBy=multi-user.target
Save and close.
Enable and start the service:
sudo systemctl enable ckan-worker.priority
sudo systemctl start ckan-worker.priority
Check status to make sure it is running:
sudo systemctl status ckan-worker.priority
To view the logs, use journalctl
:
sudo journalctl -u ckan-worker.priority
Bulk Worker
Create the service file
sudo touch /etc/systemd/system/ckan-worker.bulk.service
sudo chmod 664 /etc/systemd/system/ckan-worker.bulk.service
Add the following content of the file (run the editor in sudo mode - for example sudo vim /etc/systemd/system/ckan-worker.bulk.service
):
[Unit]
Description=CKAN Worker Bulk
After=network.target
[Service]
WorkingDirectory=/usr/lib/ckan/default/src/ckan
ExecStart=/usr/lib/ckan/default/bin/paster --plugin=ckan jobs worker bulk --config=/etc/ckan/default/production.ini
User=ckan
[Install]
WantedBy=multi-user.target
Save and close.
Enable and start the service:
sudo systemctl enable ckan-worker.bulk
sudo systemctl start ckan-worker.bulk
Check status to make sure it is running:
sudo systemctl status ckan-worker.bulk
To view the logs, use journalctl
:
sudo journalctl -u ckan-worker.bulk