Rigel

Server that runs all applications and manages all data. It does not run the router.


== cron jobs ==
Use the script <code>./runtask.sh {name}</code> to look up the task <code>{name}</code> in <code>tasks</code> and run it. Every task has a name that determines its executable file, its logfile, and its lockfile. Each task has a language, either <code>sh</code> or <code>py</code>: with <code>py</code> it executes <code>python3 {name}.py</code>, with <code>sh</code> it executes <code>./{name}.sh</code>. Each task also has an environment to run in: an lxc container or <code>rigel</code> directly. In this way, <code>runtask.sh</code> provides a standard framework that manages logs and locking (a task never runs multiple copies of itself) and describes every task by its name alone.
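A minimal sketch of the dispatch logic, assuming a per-task directory layout under <code>tasks/</code> with <code>lang</code> and <code>env</code> files, <code>flock</code>-based locking and an in-container user named <code>user</code> (the real <code>runtask.sh</code> may differ in these details):<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch of runtask.sh: run each named task in sequence.
# The tasks/ layout and the locking mechanism shown here are assumptions.
for name in "$@"; do
    dir="tasks/$name"
    lang=$(cat "$dir/lang")   # "sh" or "py"
    env=$(cat "$dir/env")     # an lxc container name, or "rigel"

    case "$lang" in
        py) cmd="python3 $name.py" ;;
        sh) cmd="./$name.sh" ;;
    esac

    (
        # The lockfile guarantees a task never runs multiple copies of itself.
        flock -n 9 || { echo "$name is already running" >&2; exit 1; }
        if [ "$env" = "rigel" ]; then
            (cd "$dir" && $cmd)
        else
            sudo lxc-attach -n "$env" -- sudo -u user -- $cmd
        fi
    ) 9>"$dir/$name.lock" >>"$dir/$name.log" 2>&1
done
</syntaxhighlight>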


To execute multiple tasks at once, just use <code>./runtask.sh {name1} {name2} {name3} [...]</code>. This is used for several tasks that each download something from the internet. Instead of scheduling each of them offset by some guessed interval, as in<syntaxhighlight lang="bash">
"<code>;</code>"-delimited CSV file, header line is "<code>id;runatstart;interval;start;depends;lxc_name;username;command</code>". Each subsequent line specifies a command and how and when it has to be run. <code>lxc_name</code> specifies the LXC container under which to run, <code>username</code> the user within that container and <code>command</code> the exact command, which cannot have spaces in its arguments (<code>true '1 2' 3</code> will evaluate to <code>"true" "'1" "2'" "3"</code>):<syntaxhighlight lang="shell">
04 03 * * * /home/user/runtask.sh yt
04 04 * * * /home/user/runtask.sh repos
54 04 * * * /home/user/runtask.sh rsseth
16 05 * * * /home/user/runtask.sh yttmp
</syntaxhighlight>just schedule the start of the chain:<syntaxhighlight lang="bash">
04 03 * * * /home/user/runtask.sh yt repos rsseth yttmp
</syntaxhighlight>


==== spball ====
Sponsorblock all: <code>rsync</code>s the 1GB-sized <code>sponsorTimes.csv</code> through the <code>[[perun]]</code> lxc container. Then runs the python script <code>/home/user/addcsv.py</code> in [[lada]] to read the <code>.csv</code> file and load it into the [[Lada#RAMdb|ram-based database]]. There used to be a C program to filter out some data columns, which reduced the actual data by about 5x, but it was discontinued because it tended to crash on malformed csv input, which happens because <code>sponsorTimes.csv</code> itself is malformed. Instead, the whole file is read line-by-line in python with a few error checks, allowing it to fail somewhat gracefully. The python script also enforces a key constraint that differs from the one in the source csv: key(line_uuid) for the csv becomes key(start,end,video_id) for the database. This means some lines are thrown away, and some side effects should be expected: updates of already-existing lines are inconsistent/non-idempotent, and the floating-point duration types introduce slight rounding errors. Consult the script itself for details.
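As an illustration of the key change only, deduplicating on (start,end,video_id) could be sketched with <code>awk</code>; the column positions here are assumptions, not the real layout of <code>sponsorTimes.csv</code>, and <code>addcsv.py</code> does this in python with proper error checks:<syntaxhighlight lang="bash">
# Keep only the first row per (startTime,endTime,videoID) triple, mimicking
# the database key. Assumed columns: 1=videoID 2=startTime 3=endTime.
# NF >= 3 crudely skips malformed lines instead of crashing on them.
awk -F',' 'NF >= 3 && !seen[$2 FS $3 FS $1]++' sponsorTimes.csv > deduped.csv
</syntaxhighlight>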


==== bwusage ====
Quarter-hourly synchronization through <code>ssh</code> from [[nyx]]. Extracts data as json with:<syntaxhighlight lang="shell">
ssh nyx vnstat -i usb0 --json f
</syntaxhighlight>The <code>.json</code> file is then converted to multiple files to summarize and sum on time, also see [[Lada#Stats|prepending to make it Javascript]].
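One such summarizing step could look like this <code>jq</code> sketch; the json field names (<code>interfaces</code>, <code>traffic.day</code>, <code>rx</code>/<code>tx</code>) are assumptions about the vnstat 2.x output, and the output filename is made up:<syntaxhighlight lang="shell">
# Sum rx+tx per day into one summary file (field names assumed).
ssh nyx vnstat -i usb0 --json d \
  | jq '[.interfaces[0].traffic.day[] | {date, total: (.rx + .tx)}]' \
  > bwusage-daily.json
</syntaxhighlight>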
==== repo ====
<code>rsync</code>s from archlinux, archzfs and artix servers every 12 hours. See [[Lada#Arch and artix package mirror|repository]] for using it.
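One such mirror pass might look like the following; the upstream host and module path are placeholders, not the actual mirrors used:<syntaxhighlight lang="bash">
# Mirror the archlinux package repo (placeholder host and paths).
# --delete keeps the local copy exact, --delay-updates shortens the
# window of inconsistency while the sync is running.
rsync -rtlH --delete --delay-updates --safe-links \
    rsync://mirror.example.org/archlinux/ /mnt/repo/archlinux/
</syntaxhighlight>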
 
==== lectures ====
Uses the <code>feed2exec</code> python module to fetch RSS feeds of ETHZ video lectures and extract links to the video files and their upload dates. The links are then downloaded with <code>wget</code> into files named by date, e.g. "<code>%Y-%m-%d.mp4</code>". All specified in <code>/home/user/updatefeeds.sh</code> and run in the [[perun]] lxc container.
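The download step could look roughly like this; the variables are hypothetical stand-ins for what <code>feed2exec</code> would substitute from each RSS entry:<syntaxhighlight lang="bash">
#!/bin/bash
# Download one lecture into a file named by its upload date.
# $FEED_ITEM_DATE and $FEED_ITEM_URL are hypothetical placeholders.
out="$(date -d "$FEED_ITEM_DATE" +%Y-%m-%d).mp4"
[ -e "$out" ] || wget -O "$out" "$FEED_ITEM_URL"
</syntaxhighlight>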
 
==== git ====
Daily synchronization of select repositories for source code. See <code>/mnt/software/git_clone.sh</code> in [[perun]] lxc container.
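A typical entry in such a script might be the following; the repository URL and target path are placeholders:<syntaxhighlight lang="bash">
# First run clones a bare mirror; later runs only fetch what changed.
# Placeholder URL and path, not an actual mirrored repository.
repo=/mnt/software/git/example.git
if [ -d "$repo" ]; then
    git -C "$repo" remote update --prune
else
    git clone --mirror https://example.org/some/repo.git "$repo"
fi
</syntaxhighlight>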
 
 


== ramusage.py ==
Python script that monitors RAM usage and attempts to restart the <code>renderd</code> daemon. Now that [[Osm#Tirex|Tirex]] is in use, it should not have a memory leak.
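The core check, as a shell sketch of what the python script does; the one-minute poll, the 1 GiB threshold and the restart command (assuming <code>renderd</code> runs as a systemd service inside the [[osm]] container) are all assumptions:<syntaxhighlight lang="bash">
#!/bin/bash
# Restart renderd when available RAM drops too low.
# Threshold, poll interval and restart command are assumptions.
THRESHOLD_KB=$((1024 * 1024))  # 1 GiB
while sleep 60; do
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    if [ "$avail_kb" -lt "$THRESHOLD_KB" ]; then
        sudo lxc-attach -n osm -- systemctl restart renderd
    fi
done
</syntaxhighlight>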
