You are browsing a read-only backup copy of Wikitech. The primary site can be found at

Difference between revisions of "Maps/OSM Database"

From Wikitech-static
Jump to navigation Jump to search
(→‎The initial-import-script: depooling, empty DB, note about import times)
(MSantos moved page Maps/OSM Database to Maps/OSM Database Legacy: This documentation is deprecated and will be kept for historical reasons)
Line 1: Line 1:
= OSM Database =
#REDIRECT [[Maps/OSM Database Legacy]]
The Maps OSM database is an important piece of the infrastructure, it's a PostgreSQL database that hosts the data source for tile generation and the [[Geoshapes service]].
The OSM Database needs two processes that are fundamental in order to keep it available with the most updated data from OSM: initial import and replication.
== Initial Importing of OSM data into PostgreSQL database ==
This process needs to be done in all clusters available, one at a time.
=== When you need to do initial import? ===
The following scenarios require a fresh install of the OSM database:
* First setup of the cluster
* Any upgrades or maintenance that require deleting the PostgreSQL data folder, for example:
** OS migration [ T198622]
** Disk space in critical condition because of OSM replication
* OSM replication is disabled for a long time range and the database is outdated
** Depending on the server condition, is safer to re-import OSM data instead of roll replication, because it can overload Tilerator and increase significantly the CPU usage
=== Importing ===
==== Checklist ====
# Create a Phab task to track this work. [ Example].
# Depool the maps master.
# Isolate the maps master from the rest of the cluster. [ Example].
# Disable Tilerator in puppet. [ Example].
# Delete postgres data, recreate the empty DB.
# Run `osm-initial-import` script to reimport data from OSM, needs root access.
# Re-enable replication
# Repool the maps master
# Re-init all slaves (cookbook)
# Re-enable tilerator
==== Depooling the maps master ====
From the puppetmaster, depooling maps1004 in eqiad: 
<code>sudo confctl select "dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet" set/pooled=no</code>
<code>sudo confctl select "dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet" set/pooled=no</code>
==== Re-creating an empty Postgres DB ====
Simply remove the database and let puppet handle the rest:
<code>rm -rf /srv/postgresql/9.6/main/* && run-puppet-agent && service postgresql restart</code>
==== Re-creating Postgres DB from backup ====
* Clear the [[Postgres]] data directory and init the database from backup (replace <code>maps2001.codfw.wmnet</code> by the postgres master):
<code>rm -rf /srv/postgresql/9.6/main/* && sudo -u postgres pg_basebackup -X stream -D /srv/postgresql/9.6/main/ -h maps2001.codfw.wmnet -U replication -W</code>
==== The initial-import-script ====
Initial data load of OSM into postgresql is done by running <code>/usr/local/bin/osm-initial-import</code> on the postgresql master node. This command can take over 12 hours or more, so running in <code>screen</code> or <code>tmux</code> is advised.
<syntaxhighlight lang="bash">
osm-initial-import \
    -d <date_of_import> \
    -s  <state_file_url> \
    -x webproxy.eqiad.wmnet:8080
* '''date_of_import:''' find the latest dump at Example: <code>160530</code>. Do NOT use "latest", as that might change at any moment.
* '''state_file_url:''' The URL to the state file corresponding to the dump, find the correct one at (the state file must be older than the dump). Example: <code></code>.
==== Troubleshouting ====
* If the slave initialization timeouts in puppet, it needs to be run manually:
<syntaxhighlight lang="bash">
service postgresql@9.6-main stop
rm -rf /srv/postgresql/9.6/main
sudo -u postgres /usr/bin/pg_basebackup -X stream -D /srv/postgresql/9.6/main -h <maps_master fqdn> -U replication -w
# Run puppet to make sure recovery.conf file is created
service postgresql@9.6-main start
== OSM replication ==
In order to keep the OSM Database updated, we use the following tools: [ osmosis] and [ osm2pgsql]. These tools fetch data from OSM servers and apply the requested changesets in the PostgreSQL database. When it's complete, the [ replication script] will trigger a request to Tilerator, sending a list of tiles that needs to be regenerated.
== Monitoring ==
You can monitor PostgreSQL activities of the Maps clusters in the following dashboards:
* [ General PostgreSQL metrics] from [ prometheus postgres exporter]
* OSM sync lag: [ eqiad], [ codfw]

Latest revision as of 15:53, 16 September 2021