
Optimize seamark imports in AWS land. #3

Open
erictheise opened this issue Jan 31, 2018 · 1 comment

@erictheise (Member)

The first imposm3 load of seamark data ran for roughly a day and a half and, of course, by the time it finished it was found to be lacking. The import ran over the wire from the EC2 machine where the planet file resides to the PostgreSQL server on RDS.

I don't know how the initial planet file was imported, but it'd be great to be able to speed up the process. Options include PostgreSQL config tweaks and having the planet file on the same host as the database server, but I'm rusty on what RDS allows.

@jj0hns0n, how'd you do it the first time?


erictheise commented Feb 14, 2018

I'll just sketch out what I've been doing in case I get clocked by a distracted Uber driver. This is about rerunning an imposm3 import when we find the import schema has been incorrect for our needs. I assume the cache is invalid because the schema has changed; when the schema becomes stable we can use the -import and -deployproduction flags to imposm3, which do a brisk rotation of the tables.

In dev or prod, I import to `osm_seamark_staging` and, upon completion, `DROP DATABASE osm_seamark` and `ALTER DATABASE osm_seamark_staging RENAME TO osm_seamark`.
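The promotion can be sketched as a small shell function, assuming `psql` access to the database host; the hostname and user below are placeholders, not the project's real values:

```shell
# Hedged sketch: promote the freshly imported staging database.
# Note that DROP DATABASE fails if any sessions are still connected
# to osm_seamark, so disconnect tegola (or other clients) first.
promote_staging() {
  local host="$1" user="$2"   # placeholders; supply the real RDS host/user
  psql -h "$host" -U "$user" -d postgres <<'SQL'
DROP DATABASE osm_seamark;
ALTER DATABASE osm_seamark_staging RENAME TO osm_seamark;
SQL
}
# Usage: promote_staging rds-host.example.com some_user
```

Defining it as a function keeps the destructive DROP behind an explicit call rather than running on paste.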

1. I begin by running a local import against a country-sized Geofabrik extract. Locally, this takes 5+ minutes. If the data looks good there, I'll skip step 2.

   ```
   imposm3 import -connection postgis://user:password@localhost/osm_seamark_staging -mapping seamark.yml -read wkg/italy-latest.osm.pbf -write -overwritecache -deployproduction
   ```
2. If there's not enough data in the country-sized extract, I'll DROP the staging database and rerun using a continent-sized extract. Locally, this takes 2-3 hours.

   ```
   imposm3 import -connection postgis://user:password@localhost/osm_seamark_staging -mapping seamark.yml -read wkg/europe-latest.osm.pbf -write -overwritecache -deployproduction
   ```
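One way to judge whether an extract had "enough data" is a quick row count against the freshly loaded staging database. The table name below is an assumption about what `seamark.yml` maps to, not a confirmed name:

```shell
# Hedged sketch: count rows in one staging table to sanity-check an
# extract import. The table name is hypothetical; adjust it to match
# the tables the seamark.yml mapping actually creates.
staging_count() {
  local table="${1:-osm_seamark_point}"   # assumed table name
  psql -d osm_seamark_staging -t -A -c "SELECT count(*) FROM ${table};"
}
# Usage: staging_count osm_seamark_point
```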
3. Assuming step 1 or 2 passes, I'll start up an EC2 c4.8xlarge instance with virtually nothing on it except the mechanism for imposm3 reads and writes. The current planet is in the home directory; the original 2017 planet is in ~/orig/. I use wget to get the raw seamark.yml from GitHub. In production the import takes 5-6 hours.
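The planet-scale run on the import instance presumably mirrors the extract commands above. A sketch, with the connection string and planet filename as placeholders (the source doesn't give the exact values):

```shell
# Hedged sketch of the planet import on the c4.8xlarge instance.
# The connection string and planet path are placeholders; the flags
# simply mirror the country/continent extract runs.
planet_import() {
  local db_host="$1" planet="$2"
  imposm3 import \
    -connection "postgis://user:password@${db_host}/osm_seamark_staging" \
    -mapping seamark.yml \
    -read "${planet}" \
    -write -overwritecache -deployproduction
}
# Usage: planet_import rds-host.example.com ~/planet-latest.osm.pbf
```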

4. Shut down the c4.8xlarge seamark import instance.

5. On the actual openseamap webserver, `git pull` all the related changes to the tegola config and style.

6. Connect to the database host and promote the `_staging` database via the DROP and ALTER commands above.

7. Restart the server.

   ```
   sudo systemctl stop openseamap
   sudo systemctl start openseamap
   ```

It writes to syslog, so `tail -f`'ing that will reveal any blockers. Nine times out of ten (actually ten times out of ten) the problem is a mismatch between the states of the tegola `.toml` config and the database.
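The restart-and-watch step can be bundled into one helper, assuming the `openseamap` unit name used above and the conventional syslog location:

```shell
# Hedged sketch: bounce the service and follow syslog for startup
# errors (the usual failure being a tegola .toml / database mismatch).
# Assumes syslog lands in /var/log/syslog on this host.
restart_and_watch() {
  sudo systemctl stop openseamap
  sudo systemctl start openseamap
  tail -f /var/log/syslog
}
```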
