
Optimize seamark imports in AWS land. #3

Open
erictheise opened this issue Jan 31, 2018 · 1 comment

@erictheise (Member)

The first imposm3 load of seamark data ran for roughly a day and a half and, of course, by the time it finished it was found to be lacking. The import ran over the wire from the EC2 machine where the planet file resides to the PostgreSQL server on RDS.

I don't know how the initial planet file was imported, but it'd be great to be able to speed up the process. Options include PostgreSQL config tweaks and having the planet file on the same host as the database server, but I'm rusty on what RDS allows.

@jj0hns0n, how'd you do it the first time?


erictheise commented Feb 14, 2018

I'll just sketch out what I've been doing in case I get clocked by a distracted Uber driver. This is about rerunning an imposm3 import when we find the import schema has been incorrect for our needs. I assume the cache is invalid because the schema has changed; when the schema becomes stable we can use the -import and -deployproduction flags to imposm3, which do a brisk rotation of the tables.

In dev or prod, I import to `osm_seamark_staging` and, upon completion, `DROP DATABASE osm_seamark` and `ALTER DATABASE osm_seamark_staging RENAME TO osm_seamark`.
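The promotion can be sketched as a small shell function, assuming `psql` access to the database host; the hostname and user below are placeholders, not the project's real values:

```shell
# Hedged sketch: promote the freshly imported staging database.
# Note that DROP DATABASE fails if any sessions are still connected
# to osm_seamark, so disconnect tegola (or other clients) first.
promote_staging() {
  local host="$1" user="$2"   # placeholders; supply the real RDS host/user
  psql -h "$host" -U "$user" -d postgres <<'SQL'
DROP DATABASE osm_seamark;
ALTER DATABASE osm_seamark_staging RENAME TO osm_seamark;
SQL
}
# Usage: promote_staging rds-host.example.com some_user
```

Defining it as a function keeps the destructive DROP behind an explicit call rather than running on paste.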

1. I begin by running a local import against a country-sized Geofabrik extract. Locally, this takes 5+ minutes. If the data looks good there, I'll skip step 2.

   ```
   imposm3 import -connection postgis://user:password@localhost/osm_seamark_staging -mapping seamark.yml -read wkg/italy-latest.osm.pbf -write -overwritecache -deployproduction
   ```
2. If there's not enough data in the country-sized extract, I'll DROP the staging database and rerun using a continent-sized extract. Locally, this takes 2-3 hours.

   ```
   imposm3 import -connection postgis://user:password@localhost/osm_seamark_staging -mapping seamark.yml -read wkg/europe-latest.osm.pbf -write -overwritecache -deployproduction
   ```
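One way to judge whether an extract had "enough data" is a quick row count against the freshly loaded staging database. The table name below is an assumption about what `seamark.yml` maps to, not a confirmed name:

```shell
# Hedged sketch: count rows in one staging table to sanity-check an
# extract import. The table name is hypothetical; adjust it to match
# the tables the seamark.yml mapping actually creates.
staging_count() {
  local table="${1:-osm_seamark_point}"   # assumed table name
  psql -d osm_seamark_staging -t -A -c "SELECT count(*) FROM ${table};"
}
# Usage: staging_count osm_seamark_point
```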
3. Assuming step 1 or 2 passes, I'll start up an EC2 c4.8xlarge instance with virtually nothing on it except the mechanism for imposm3 reads and writes. The current planet is in the home directory; the original 2017 planet is in ~/orig/. I use wget to get the raw seamark.yml from GitHub. In production the import takes 5-6 hours.
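The planet-scale run on the import instance presumably mirrors the extract commands above. A sketch, with the connection string and planet filename as placeholders (the source doesn't give the exact values):

```shell
# Hedged sketch of the planet import on the c4.8xlarge instance.
# The connection string and planet path are placeholders; the flags
# simply mirror the country/continent extract runs.
planet_import() {
  local db_host="$1" planet="$2"
  imposm3 import \
    -connection "postgis://user:password@${db_host}/osm_seamark_staging" \
    -mapping seamark.yml \
    -read "${planet}" \
    -write -overwritecache -deployproduction
}
# Usage: planet_import rds-host.example.com ~/planet-latest.osm.pbf
```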

4. Shut down the c4.8xlarge seamark import instance.

5. On the actual openseamap webserver, `git pull` all the related changes to the tegola config and style.

6. Connect to the database host and promote the `_staging` database via the DROP and ALTER commands above.

7. Restart the server.

   ```
   sudo systemctl stop openseamap
   sudo systemctl start openseamap
   ```

It writes to syslog, so `tail -f`'ing that will reveal any blockers. Nine times out of ten (actually ten times out of ten) the problem is a mismatch between the states of the tegola `.toml` config and the database.
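The restart-and-watch step can be bundled into one helper, assuming the `openseamap` unit name used above and the conventional syslog location:

```shell
# Hedged sketch: bounce the service and follow syslog for startup
# errors (the usual failure being a tegola .toml / database mismatch).
# Assumes syslog lands in /var/log/syslog on this host.
restart_and_watch() {
  sudo systemctl stop openseamap
  sudo systemctl start openseamap
  tail -f /var/log/syslog
}
```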
