Osm: Difference between revisions

From Personal wiki
Revision as of 17:06, 25 November 2022

OpenStreetMap container holding a postgresql cartography database of the planet, the renderd daemon, and an apache2 webserver that serves the rendered tiles.

Cartography pipeline

postgresql

Data storage, mostly in planet_osm_* tables, but some smaller tables for e.g. coastlines and contours for the /contours/ layer, all in dbname=gis.

The filesystem is a separate zfs dataset with zstd-19 compression, which reaches a 3.77x compression ratio and uses about 560 GB of disk space. For comparison, lz4 attained about 2.5x and gzip-9 about 3.5x; the additional CPU load is worth the disk space headroom. To allow regular updates, the binary file /mnt/maps/planet.bin.nodes also needs to be kept, though it is not used for rendering; it uses about 50 GB.
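As a rough cross-check of the headroom argument: the figures above imply a logical (uncompressed) size of about 560 GB x 3.77, roughly 2.1 TB. The sketch below estimates what the same data would occupy with the other two compressors, assuming their quoted ratios would hold for the whole dataset. The helper name estimate_disk_gb is made up for illustration.

```shell
# Hypothetical helper: estimated on-disk size in GB for a logical size and a ratio.
estimate_disk_gb() {
  awk -v logical="$1" -v ratio="$2" 'BEGIN { printf "%.0f\n", logical / ratio }'
}

# Logical size implied by the figures above: 560 GB on disk at 3.77x.
logical_gb=$(awk 'BEGIN { printf "%.0f\n", 560 * 3.77 }')

echo "logical size:  ${logical_gb} GB"
echo "lz4 (2.5x):    $(estimate_disk_gb "$logical_gb" 2.5) GB"
echo "gzip-9 (3.5x): $(estimate_disk_gb "$logical_gb" 3.5) GB"
```

Under these assumptions zstd-19 saves roughly 280 GB against lz4 and roughly 40 GB against gzip-9.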

How to import planet.osm.pbf

The import process may overuse memory and crash the server. To mitigate this, the following loop was run while nodes, ways and relations were being processed (until 2022-04-05 08:35), but not during the subsequent database operations:

$ while true;do while [ "$(free --giga|grep Mem|xargs|cut -d' ' -f7)" -gt 2 ];do printf '\033[2K\r%s' "$(free --giga|grep Mem|xargs|cut -d' ' -f7)";sleep 1;done;sudo kill -9 "$(pidof /usr/local/bin/renderd)";free -h;sleep 5;free -h;done

This loop did not engage once, but the import had consistently crashed the server before. Call me crazy but I think it worked by osmosis.
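For readability, the one-liner above can be unrolled into functions. This is only a sketch of the same idea: the function names parse_avail, avail_gb and watchdog are made up here, and the threshold and process path are passed as arguments instead of being hard-coded.

```shell
# Read "free" output on stdin and print the "available" column of the Mem: row
# (field 7, the same field the one-liner extracts with grep|xargs|cut).
parse_avail() {
  awk '/^Mem:/ { print $7 }'
}

# Available memory in whole GB.
avail_gb() {
  free --giga | parse_avail
}

# watchdog THRESHOLD_GB PROCESS
# Show available memory once per second; when it is no longer above
# THRESHOLD_GB, kill PROCESS, print memory before and after a short pause,
# and start watching again.
watchdog() {
  local threshold="$1" victim="$2"
  while true; do
    while [ "$(avail_gb)" -gt "$threshold" ]; do
      printf '\033[2K\r%s' "$(avail_gb)"   # clear the line, reprint the value
      sleep 1
    done
    sudo kill -9 "$(pidof "$victim")"
    free -h; sleep 5; free -h
  done
}
```

It would be started as watchdog 2 /usr/local/bin/renderd in a separate terminal and stopped with Ctrl-C.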

Also, it seemed that limiting CPU usage helped mitigate RAM overuse:

$ cpulimit -fl 400 -- osm2pgsql --create --multi-geometry --slim --hstore --style openstreetmap-with-more.style --tag-transform-script openstreetmap-adjusted-for-cyclosm.lua -d gis -C 14000 --number-processes 4 --flat-nodes /mnt/maps/planet.bin.nodes /mnt/maps/planet-220328.osm.pbf
Process 835 detected
2022-04-04 12:16:47  osm2pgsql version 1.5.2 (1.5.2-15-g25a1e9d1)
2022-04-04 12:16:47  Database version: 12.10 (Ubuntu 12.10-1.pgdg20.04+1)
2022-04-04 12:16:47  PostGIS version: 3.2
2022-04-04 12:16:47  Setting up table 'planet_osm_point'
2022-04-04 12:16:47  Setting up table 'planet_osm_line'
2022-04-04 12:16:47  Setting up table 'planet_osm_polygon'
2022-04-04 12:16:48  Setting up table 'planet_osm_roads' 
Processing: Node(1640k 546.7k/s) Way(0k 0.00k/s) Relation(0 0.0/s) 
Processing: Node(1002480k 1842.8k/s) Way(0k 0.00k/s) Relation(0 0.0/s) 
Processing: Node(4575560k 1902.5k/s) Way(0k 0.00k/s) Relation(0 0.0/s) 
Processing: Node(7587737k 1861.6k/s) Way(91455k 10.28k/s) Relation(0 0.0/s) 
Processing: Node(7587737k 1861.6k/s) Way(262832k 13.34k/s) Relation(0 0.0/s) 
Processing: Node(7587737k 1861.6k/s) Way(847173k 18.33k/s) Relation(1786720 216.9/s) 
2022-04-05 08:35:27 Reading input files done in 73119s (20h 18m 39s).
2022-04-05 08:35:27   Processed 7587737772 nodes in 4076s (1h 7m 56s) - 1862k/s
2022-04-05 08:35:27   Processed 847173340 ways in 46225s (12h 50m 25s) - 18k/s
2022-04-05 08:35:27   Processed 9773277 relations in 22818s (6h 20m 18s) - 428/s
2022-04-05 08:35:28  Clustering table 'planet_osm_point' by geometry... 
2022-04-05 08:35:28  Clustering table 'planet_osm_polygon' by geometry... 
2022-04-05 08:35:28  Clustering table 'planet_osm_line' by geometry... 
2022-04-05 08:35:28  Clustering table 'planet_osm_roads' by geometry... 
2022-04-05 10:22:51  Creating geometry index on table 'planet_osm_roads'... 
2022-04-05 10:50:28  Creating osm_id index on table 'planet_osm_roads'... 
2022-04-05 10:56:22  Creating geometry index on table 'planet_osm_point'... 
2022-04-05 10:58:21  Analyzing table 'planet_osm_roads'...
2022-04-05 10:58:34  Done postprocessing on table 'planet_osm_nodes' in 0s
2022-04-05 10:58:34  Building index on table 'planet_osm_ways'
2022-04-05 13:38:32  Creating osm_id index on table 'planet_osm_point'... 
2022-04-05 13:55:29  Analyzing table 'planet_osm_point'...
2022-04-05 13:55:45  Building index on table 'planet_osm_rels'
2022-04-05 15:38:19  Creating geometry index on table 'planet_osm_line'...
2022-04-05 21:40:06  Creating osm_id index on table 'planet_osm_line'...
2022-04-05 22:12:14  Analyzing table 'planet_osm_line'...
2022-04-06 00:58:09  Creating geometry index on table 'planet_osm_polygon'...
2022-04-06 18:13:34  Creating osm_id index on table 'planet_osm_polygon'...
2022-04-06 18:55:31  Analyzing table 'planet_osm_polygon'...
2022-04-07 10:36:05  Done postprocessing on table 'planet_osm_ways' in 171451s (47h 37m 31s)
2022-04-07 10:36:05  Done postprocessing on table 'planet_osm_rels' in 1214s (20m 14s)
2022-04-07 10:36:05  All postprocessing on table 'planet_osm_point' done in 19216s (5h 20m 16s).
2022-04-07 10:36:05  All postprocessing on table 'planet_osm_line' done in 49019s (13h 36m 59s).
2022-04-07 10:36:05  All postprocessing on table 'planet_osm_polygon' done in 123615s (34h 20m 15s).
2022-04-07 10:36:05  All postprocessing on table 'planet_osm_roads' done in 8586s (2h 23m 6s).
2022-04-07 10:36:05  osm2pgsql took 253158s (70h 19m 18s) overall.
Child process is finished, exiting...

Reconciliation of database schemas for rendering of two layer types

Layers /plain*/ and /cyclosm/ differ in their assumptions about the database schema. These assumptions are specified to osm2pgsql at import time: which columns are extracted from planet.osm.pbf, and how they are converted, is determined by --style and --tag-transform-script. The current files are: [osm2pgsql] --style openstreetmap-with-more.style --tag-transform-script openstreetmap-adjusted-for-cyclosm.lua. The .style file specifies which columns to take into the database. To unite both layers, it changes some columns from the /plain*/ layers' .style to apply not only to nodes but to nodes and ways, and adds a few new columns specific to /cyclosm/. There was one conflict: the type of the layer column. The cyclosm mapnik.xml stylesheet mentions the column only four times and assumes it is text, whereas /plain*/ takes it as int4. Cyclosm converts layer to integer anyway, so only a few manual edits in its stylesheet and in views.sql (the SQL index builder) were necessary:

(cyclosm stylesheet.xml)

CASE WHEN layer~E'^\\d+$' THEN layer::integer ELSE 0 END

replaced by

CASE WHEN layer IS NOT NULL THEN layer ELSE 0 END

and (views.sql)

CASE WHEN layer~E'^\\d+$' THEN 100*layer::integer+199 ELSE 199 END

replaced by

CASE WHEN layer IS NOT NULL THEN 100*layer+199 ELSE 199 END

renderd

/osm/plain/12/2145/1434.png

Rendering daemon. The config file at /usr/local/etc/renderd.conf specifies which layers exist and their stylesheets. It uses the mapnik renderer, accessed as a python module.

Layer /plain/

The standard OSM map layer. The stylesheet is generated by cartocss and located at ~/src/openstreetmap-carto/mapnik_plain.xml.

Layer /plain_overlay/

/osm/plain_overlay/12/2145/1434.png

Modified /plain/ layer that changes the colours of areal elements (elements that have area, as opposed to point- or line-like ones) to "transparent", so that it can be laid over a second "base" layer, for example satellite imagery. The stylesheet is generated by $ python3 add_more_to_overlay.py {mapnik_plain}.xml > {mapnik_plain-overlay}.xml, where the python script only replaces a few of the <PolygonSymbolizer> fill colour attributes by fill="transparent". This works well, helped by the fact that the rendered raster tiles are in .png format, which supports transparency.
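The python script itself is not reproduced here. As an illustration of the kind of substitution it performs, the following sed sketch sets the fill of every <PolygonSymbolizer> to transparent. Note the assumptions: the real script only replaces a few selected fills, not all of them; each symbolizer is assumed to sit on its own line of the XML; and the function name make_overlay is made up.

```shell
# make_overlay [FILE...]
# On lines containing a <PolygonSymbolizer> tag, replace the fill="..."
# attribute by fill="transparent". Other symbolizers are left untouched.
# Reads stdin when no FILE is given and writes the result to stdout.
make_overlay() {
  sed -E '/<PolygonSymbolizer/ s/fill="[^"]*"/fill="transparent"/g' "$@"
}
```

Usage would mirror the python call, e.g. make_overlay {mapnik_plain}.xml > {mapnik_plain-overlay}.xml.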

Layer /cyclosm/

/osm/cyclosm/12/2145/1434.png

Cycling-optimized map layer. The database schema is widened to contain all necessary data for both /plain*/ and /cyclosm/, see Reconciliation_of_database_schemas_for_rendering_of_two_layer_types. The stylesheet is generated in the working directory ~/src, where the node module kosmtik is available, though it is only executable with a statically installed LTS version of nodejs in /root/ (which has been made world-readable as a consequence).

$ export PATH="/root/node-v16.14.1-linux-x64/bin:$PATH"
$ node_modules/kosmtik/index.js export cyclosm-cartocss-style/project.mml --output cyclosm-cartocss-style/mapnik.xml

The style does support a few rendering settings, specified in cyclosm-cartocss-style/localconfig.js, but changing the database name there did not work. So, to replace the default database name osm with gis:

$ sed -e 's_name="dbname"><!\[CDATA\[osm\]\]>_name="dbname"><!\[CDATA\[gis\]\]>_g' cyclosm-cartocss-style/mapnik.xml > cyclosm-cartocss-style/cyclosm-dbname.xml

Then, also edit the layer column usage as in Reconciliation_of_database_schemas_for_rendering_of_two_layer_types:

$ cp cyclosm-cartocss-style/cyclosm-dbname.xml cyclosm-cartocss-style/cyclosm-dbname-layer.xml
$ vim cyclosm-cartocss-style/cyclosm-dbname-layer.xml
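The manual vim edit could probably be scripted. The sketch below applies the two replacements listed in Reconciliation_of_database_schemas_for_rendering_of_two_layer_types with sed. It assumes the expressions appear in the files exactly as quoted there (same spacing, apostrophes not XML-escaped); the function name fix_layer_cast is made up.

```shell
# fix_layer_cast [FILE...]
# Rewrite the text-to-integer casts of the "layer" column so that they work
# with an int4 column. Reads stdin when no FILE is given, writes to stdout.
# In the patterns, \^ and \$ match a literal ^ and $, and \\\\ matches the
# two literal backslashes of the E'^\\d+$' string.
fix_layer_cast() {
  sed \
    -e 's/CASE WHEN layer~E'\''\^\\\\d+\$'\'' THEN layer::integer ELSE 0 END/CASE WHEN layer IS NOT NULL THEN layer ELSE 0 END/g' \
    -e 's/CASE WHEN layer~E'\''\^\\\\d+\$'\'' THEN 100\*layer::integer+199 ELSE 199 END/CASE WHEN layer IS NOT NULL THEN 100*layer+199 ELSE 199 END/g' \
    "$@"
}
```

For example, fix_layer_cast cyclosm-cartocss-style/cyclosm-dbname.xml > cyclosm-cartocss-style/cyclosm-dbname-layer.xml, followed by grep -c 'layer~E' on the result to confirm that no occurrence was missed.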

Layer /contours/

/osm/contours/12/2145/1434.png

Displays contour lines of equal altitude. Similarly to #Layer_/plain_overlay/, the background colour is transparent so that the contour lines can be used on top of another map. The data is imported into the table contours by:

$ gdal_contour -i 10 -snodata 32767 -a height {in}.tif {out}.tif.shp
$ shp2pgsql -a -g way {shapefile}.shp contours|psql -d gis

Most guidance on this is from the osm wiki page for contours[1]. The ~/contours.xml also comes from there. The final shell loop with progress reports is:

$ { t=$(ls -w1 /mnt/18a/raster/SRTM_GL3_srtm/*.shp|wc -l);for i in /mnt/18a/raster/SRTM_GL3_srtm/*.shp;do printf '\033[2K\r%s\n' "$((c++))/$t $i">&2;shp2pgsql -a -g way "$i" contours 2>stderr;done; }|grep -v 'ANALYZE'|dd status=progress|sudo lxc-attach -n osm -- sudo -u user -- psql -d gis >stdin

Note the removal of the ANALYZE "contours" statements that would otherwise run after every shapefile. ANALYZE was instead run once at the end. This raised the throughput from 3 MB/s and declining to 12 MB/s and increasing, reaching 18 MB/s at the end.

In total, postgres reports that the contours table uses 223 GB. This data does not update so nothing more is needed for the future, except maybe instruct the rendering daemon to never re-render any /contours/ tiles (currently it will re-render them whenever they expire).

apache2

Asks the rendering daemon for tiles that are not present in the cache /var/cache/renderd/tiles/{layer_name} and serves them on port 80. This is coordinated by the apache module mod_tile, which reads renderd's config file.

Nominatim

API for processing search queries on the database. It uses a fundamentally different database schema than the rendering pipeline, but some tables hold the same data and might work just as views. Updating the database could also become problematic... Only theoretical for now.

Data updates

/home/user/udpate.py imports any .osc.gz files in /mnt/maps/tmp in ascending alphabetical order and removes each one once it has imported successfully. See Perun#/home/user/osm_update.py, which downloads the update files into /mnt/maps/tmp/.
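The update script is written in python and is not shown here. Its described behaviour can be sketched in shell as follows. The function name import_changefiles is made up, the import command is passed in as arguments rather than hard-coded, and stopping at the first failure is an assumption (changefiles presumably have to be applied in order, so later files are left in place for the next run).

```shell
# import_changefiles DIR CMD [ARGS...]
# Run "CMD ARGS... FILE" for every DIR/*.osc.gz in glob (alphabetical) order.
# A file is removed only after its import succeeded; on the first failure the
# file is kept and the function returns 1 without touching later files.
import_changefiles() {
  local dir="$1" f
  shift
  for f in "$dir"/*.osc.gz; do
    [ -e "$f" ] || continue          # the glob matched nothing
    if "$@" "$f"; then
      rm -- "$f"
    else
      printf 'import failed, keeping %s\n' "$f" >&2
      return 1
    fi
  done
}
```

In practice CMD would presumably be an osm2pgsql --append --slim invocation with the same --style, --tag-transform-script and --flat-nodes options as the initial import; that is an assumption, not a transcript of the actual script.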

Container rights

  • IP 10.0.3.10
  • read-write to /mnt/maps/(planet.bin.nodes) for database imports and as temporary storage (/mnt/maps/tmp) of changefiles
  • read-write to /var/lib/postgresql/ to keep the database on a separate zfs dataset