Rigel

Server that runs all applications and manages all data. It does not run the router.

regular.py

Python script to coordinate recurring automation tasks. It reads the config file regular.csv and executes commands inside LXC containers at specified times and intervals.

regular.csv

";"-delimited CSV file, header line is "id;runatstart;interval;start;depends;lxc_name;username;command". Each subsequent line specifies a command and how and when it has to be run. lxc_name specifies the LXC container under which to run, username the user within that container and command the exact command, which cannot have spaces in its arguments (true '1 2' 3 will evaluate to "true" "'1" "2'" "3"):

$ sudo lxc-attach -n {lxc_name} -- sudo -u {username} -- {command.split(' ')}
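In Python terms, the no-spaces-in-arguments restriction follows from the naive split; a minimal sketch (not the actual regular.py code):

import subprocess

def run_task(lxc_name, username, command):
    # command.split(' ') cannot preserve quoted arguments, hence the restriction above
    subprocess.run(['sudo', 'lxc-attach', '-n', lxc_name, '--',
                    'sudo', '-u', username, '--'] + command.split(' '),
                   check=True)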

start is a 4-digit string in %H%M format giving the time at which the command should be executed, and interval is a number of hours specifying how many hours after start the command should be run again, e.g. 24, 12 or 8 for one, two and three runs a day respectively. The depends column specifies whether the task is a parent task (empty depends) or a child task. Child tasks are executed immediately after their parent, identified by depends containing the id of the parent, finishes execution. The parent must be on a line above the child's, and the child's runatstart, interval and start values are ignored and can be empty. If multiple children depend on the same parent, they are run in the order they are listed in the .csv file from top to bottom. The runatstart column is a boolean (only checked for ==1): if 1, a parent task is also run at program startup; otherwise the start+n*interval time is awaited.
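A hypothetical regular.csv illustrating the format (the ids, scripts and schedule are made up; the perun container is real). The first task runs once at startup and then daily at 04:00; the second runs immediately after the first finishes:

id;runatstart;interval;start;depends;lxc_name;username;command
sync;1;24;0400;;perun;user;/home/user/sync.sh
postprocess;;;;sync;perun;user;/home/user/postprocess.py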

sponsorblock

rsyncs the 1GB-sized sponsorTimes.csv through the perun lxc container. Re-mounts the youtube filesystem to be sure updates are propagated through, and repeats rsync+remount a second time to smooth out network errors. Then runs the python script /home/user/addcsv.py to read the .csv file and load it into the RAM-based database. A C program previously filtered out some data columns, which reduced the actual data by about 5x, but it was discontinued because it tended to crash on malformed csv input, which would sometimes happen when the network was unstable. Instead, the whole file is read line-by-line in python with a few error checks, allowing it to fail gracefully. The python script then enforces the key constraint, which differs from the one in the source csv: key(line_uuid) for the csv becomes key(start,end,video_id) for the database. This means some lines are thrown away, and some side effects should be expected: updates of already-existing lines are inconsistent/non-idempotent, and the floating-point types of the durations introduce slight rounding errors. Consult the script itself for details.
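A minimal sketch of the line-by-line loading approach (the column order, table schema and database location are assumptions, not the real addcsv.py):

import csv
import sqlite3

conn = sqlite3.connect('/dev/shm/sponsorblock.db')  # assumed RAM-backed path
conn.execute('CREATE TABLE IF NOT EXISTS sponsor ('
             'start_time REAL, end_time REAL, video_id TEXT, '
             'PRIMARY KEY (start_time, end_time, video_id))')

with open('sponsorTimes.csv', newline='') as f:
    for row in csv.reader(f):
        try:
            video_id, start, end = row[0], float(row[1]), float(row[2])
        except (IndexError, ValueError):
            continue  # skip malformed lines instead of crashing
        # INSERT OR IGNORE enforces key(start,end,video_id): duplicate keys are
        # dropped, which is also why updates of existing lines are non-idempotent
        conn.execute('INSERT OR IGNORE INTO sponsor VALUES (?,?,?)',
                     (start, end, video_id))
conn.commit()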

repo

rsyncs from archlinux, archzfs and artix servers every 12 hours. See repository for how to use it.
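A sketch of one such sync (the flags follow common mirroring practice; the mirror URL and destination path are assumptions):

import subprocess

subprocess.run(['rsync', '-rtlH', '--delete-after', '--safe-links',
                'rsync://mirror.example.org/archlinux/',  # assumed upstream mirror
                '/mnt/repo/archlinux/'],                  # assumed local path
               check=True)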

lectures

Uses the feed2exec python module to fetch RSS feeds of ETHZ video lectures and extract links to the video files and their upload dates, then downloads the links with wget into files named by date, e.g. "%Y-%m-%d.mp4". All specified in /home/user/updatefeeds.sh and run in the perun lxc container.
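The same pipeline sketched in plain Python, using feedparser instead of feed2exec purely for illustration (the feed URL is an assumption):

import os
import subprocess
import time

import feedparser

feed = feedparser.parse('https://video.ethz.ch/lectures/example/feed.rss')  # assumed URL
for entry in feed.entries:
    if not getattr(entry, 'published_parsed', None):
        continue  # skip entries without an upload date
    date = time.strftime('%Y-%m-%d', entry.published_parsed)
    url = entry.enclosures[0].href if entry.enclosures else entry.link
    filename = date + '.mp4'
    if not os.path.exists(filename):  # don't re-download already-fetched lectures
        subprocess.run(['wget', '-O', filename, url], check=True)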

git

Daily synchronization of selected source-code repositories. See /mnt/software/git_clone.sh in the perun lxc container.
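A minimal sketch of the usual clone-or-fetch pattern such a script implements (the repository list and target directory are assumptions; see git_clone.sh for the real one):

import os
import subprocess

repos = ['https://github.com/example/project.git']  # assumed repository list
base = '/mnt/software/git'                          # assumed target directory

for url in repos:
    dest = os.path.join(base, os.path.basename(url))
    if os.path.isdir(dest):
        subprocess.run(['git', '-C', dest, 'remote', 'update', '--prune'], check=True)
    else:
        subprocess.run(['git', 'clone', '--mirror', url, dest], check=True)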

stats

Hourly synchronization over ssh from nyx. Extracts data as json with:

$ ssh nyx vnstat -i usb0 --json f

The .json file is then modified to make it JavaScript, see prepending.
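The usual trick here (assumed; see the linked prepending section for the actual one) is to prepend a variable assignment so a static page can load the data with a plain script tag. A sketch, with the output path and variable name as assumptions:

import subprocess

json_text = subprocess.run(
    ['ssh', 'nyx', 'vnstat', '-i', 'usb0', '--json', 'f'],
    capture_output=True, text=True, check=True).stdout
with open('/var/www/stats/vnstat.js', 'w') as f:   # assumed output path
    f.write('var vnstat_data = ' + json_text + ';')  # assumed variable name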

osm.py

Python script that manages the rendering daemon, the pre-rendering of cached low-zoom tiles and the database data updates in the osm lxc container. The rendering daemon keeps a lot of data in RAM, which manages to crash the server by memory starvation after about 12-24h of rendering, e.g. when pre-rendering low-zoom tiles for the cache. This script monitors available RAM, kills renderd if it gets too low and then resumes pre-rendering tiles for the cache. At 00:30 it stops all rendering anyway and runs /home/renderaccount/update.py, see Osm#Data_updates, resuming rendering when the script exits. This is to allow most resources to be available to osm2pgsql during the import.
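A minimal sketch of the RAM-watchdog part (the threshold, sleep interval and the way renderd is killed are assumptions):

import subprocess
import time

LOW_RAM_KB = 2 * 1024 * 1024  # assumed threshold: 2 GiB

def available_kb():
    # MemAvailable from /proc/meminfo, in kB
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('MemAvailable:'):
                return int(line.split()[1])
    return 0  # treat a parse failure as low memory

while True:
    if available_kb() < LOW_RAM_KB:
        # kill renderd inside the osm container before the host starves
        subprocess.run(['sudo', 'lxc-attach', '-n', 'osm', '--',
                        'pkill', 'renderd'])
        # ... restart renderd and resume pre-rendering here ...
    time.sleep(60)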

PLANNED: maybe periodically, once every 2-3 days, also restart the database and make a zfs snapshot of the offline database for backup, to allow restoring. It is possible to make a snapshot, but the database needs to be shut down manually beforehand:

$ sudo /root/createsp.py ps1a dbp

This specific command, which makes the next sp[a-z][a-z] snapshot for exactly the ps1a/dbp dataset, is allowed without a password in the sudoers file (via visudo).
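The corresponding sudoers entry would look roughly like this (the invoking user is an assumption):

user ALL=(root) NOPASSWD: /root/createsp.py ps1a dbp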