Rigel
Server that runs all applications and manages all data. It does not run the router.
LXC host
LXC containers are a virtualization environment made possible by namespace isolation of processes. A program inside a container still runs on the same (Linux) kernel, but sees a different init process than the host's original one, and it cannot see other processes in the host's process tree, only the processes in its own namespace, somewhat like a subtree. LXC enables isolation of different resources and is mainly used here for network and data access controls.
The Rigel server manages the isolated LXC containers Lada, Perun and Osm. Some mechanisms shared by all of them are described below:
Mounted filesystems
To make filesystems accessible, the directory /mnt/{lxc_general}/{lxc_name}/mounts is --rbind mounted inside the container. Its subdirectories then automatically show up in the container's /mnt. A filesystem is mounted at /mnt/{lxc_general}/{lxc_name}/mounts/{dir} and becomes accessible in the container under /mnt/{dir}. This two-layered mount makes mounting filesystems independent of rebooting the container: mounts are no longer specified as lxc config lines; only the toplevel lxc.mount.entry=/mnt/{lxc_general}/{lxc_name}/mounts/ mnt/ none rw,rbind 0 0 remains. Note that Lada uses both a mnt_mounts and a www_mounts directory, mapping to /mnt and /var/www/html/data respectively, so that webserver contents can also be accessed directly.
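The layering above can be sketched as follows. This is a dry run: the paths, the device name and the pictures filesystem are illustrative stand-ins for {lxc_general}/{lxc_name}/{dir}, and the commands are echoed rather than executed, so no root is needed to inspect it.

```shell
#!/bin/sh
# Sketch of the two-layer mount scheme (illustrative paths, dry-run only).

setup_mounts() {
    lxc_base=$1                       # e.g. /mnt/lxc_general/lada
    rootfs=$lxc_base/rootfs

    # One-time rbind of the staging dir into the container's /mnt.
    # In practice this is the single lxc config line:
    #   lxc.mount.entry=/mnt/{lxc_general}/{lxc_name}/mounts/ mnt/ none rw,rbind 0 0
    echo mount --rbind "$lxc_base/mounts" "$rootfs/mnt"

    # Attaching a filesystem later only touches the staging dir on the host;
    # it shows up in the container under /mnt/pictures without a restart.
    echo mkdir -p "$lxc_base/mounts/pictures"
    echo mount /dev/disk/by-label/pictures "$lxc_base/mounts/pictures"
}

setup_mounts /mnt/lxc_general/lada
```

Because the rbind is recursive, anything mounted below the staging directory on the host propagates into the container's view of it.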
Modularity
The containers are modular in design: their mountpoints are managed by the host, so they can run on a different host than rigel, for example in the offline offsite backup environment. This also means they may run in a degraded environment where not all filesystems are available: to serve family pictures as a website, only Lada (≈8 GB) plus the pictures themselves are needed in terms of storage, and the multi-terabyte youtube archive will probably not be connected.
In addition to /mnt/{lxc_general}/{lxc_name}/rootfs, which stores its root filesystem, each container also stores some files in /mnt/{lxc_general}/{lxc_name} to keep track of mount and filesystem metadata that makes running on different hosts easier.
Systemd units
A simple heuristic to check for a degraded environment is whether a particular filesystem is mounted. In systemd unit files that depend on a filesystem being connected, the clause AssertPathIsMountPoint=/mnt/{dir} is added.
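A hypothetical unit for a webserver component backed by such a filesystem could then look like this (the service, binary and path names are made up for illustration; only the Assert clause is from the setup described above):

```ini
[Unit]
Description=Photo gallery webapp (example)
# Fail the start job immediately if the backing filesystem is not connected.
AssertPathIsMountPoint=/mnt/pictures
After=network.target

[Service]
ExecStart=/usr/local/bin/gallery --data /mnt/pictures

[Install]
WantedBy=multi-user.target
```

Using Assert rather than Condition makes the missing filesystem a loud failure instead of a silently skipped start.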
This mechanism is mostly used by Lada, which runs webserver components and cannot run a component if its filesystem is missing. Perun, being mainly a downloading container, needs no custom systemd services because no custom daemons are necessary for downloading data.
open.sh
For each container, a script that is always run before lxc-start-ing the container.
- It uses the contents of /etc/hostname on the host to determine which environment it is running in (e.g. hostname rigel -> full environment) and therefore whether to run mounts.sh, backup_mounts.sh or other_mounts.sh.
- It sets up container-specific networking if needed, e.g. Lada's port forwarding with iptables:

      in_iface=eth0
      for port in 80 443 8080 7000; do
          iptables -t nat -A POSTROUTING -o lxcbr0 -p tcp --dport $port -d {lxc_ip} -j SNAT --to-source {lxcbr0_host_gateway}
          iptables -t nat -A PREROUTING -i "${in_iface}" -p tcp --dport $port -j DNAT --to 10.0.3.2:$port
          iptables -A FORWARD -p tcp -d {lxc_ip} --dport $port -j ACCEPT
      done
mounts.sh
This is a script called by open.sh (see above) for each container before starting it. It has hardcoded mountpoints for all needed {dir}s, but also reads the file mount.json to resolve which host path each dir should be taken from. This makes configurable both where filesystems are (in mount.json) and which filesystems to actually mount (with different mounts.sh versions: backup_mounts.sh, other_mounts.sh).

The file mount.json is common to all containers; each container only keeps a symlink to a central location. This does not affect mounts.sh, because it only reads the file, but edits to it need to be made only once.
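The resolution step could be sketched as below. The flat JSON layout {"pictures": "/mnt/hdd1/pictures", ...} is an assumption about mount.json, python3 is borrowed for JSON parsing since plain shell has none, and the mount commands are echoed rather than executed:

```shell
#!/bin/sh
# Sketch of how mounts.sh could resolve host paths from mount.json.
# Assumed layout: a flat object mapping {dir} names to host paths.

resolve_and_mount() {
    conf=$1        # path to mount.json
    lxc_base=$2    # /mnt/{lxc_general}/{lxc_name}
    shift 2
    for dir in "$@"; do               # the hardcoded {dir}s, passed as arguments
        src=$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1])).get(sys.argv[2], ""))' "$conf" "$dir")
        if [ -n "$src" ]; then
            echo mount --bind "$src" "$lxc_base/mounts/$dir"
        else
            echo "skipping $dir: not configured in $conf" >&2
        fi
    done
}
```

A backup_mounts.sh variant would simply call resolve_and_mount with a different list of dirs against the same shared mount.json.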
cron jobs
Use the script ./runtask.sh {name} to look up said task in tasks. Every possible task has a name that determines the executable file, the logfile, and the lockfile. Each task has a language of either sh or py: py runs python3 {name}.py and sh runs ./{name}.sh. Each task also has an environment to run in: an lxc container or rigel directly. In this way, runtask.sh provides a standard framework to manage logs and locking (a task never runs multiple copies of itself), and every task is described by only its name.
To execute multiple tasks at once, just use ./runtask.sh {name1} {name2} {name3} [...]. This is used for several different tasks that download something from the internet. Instead of scheduling them each offset by some guessed interval:
04 03 * * * /home/user/runtask.sh yt
04 04 * * * /home/user/runtask.sh repos
54 04 * * * /home/user/runtask.sh rsseth
16 05 * * * /home/user/runtask.sh yttmp
Just schedule the start of the chain:
04 03 * * * /home/user/runtask.sh yt repos rsseth yttmp
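The core of such a wrapper could look like the following simplified sketch. The tasks directory layout is an assumption, language dispatch is reduced to a file-existence check, and the real runtask.sh additionally runs tasks inside lxc containers:

```shell
#!/bin/sh
# Minimal runtask.sh-style wrapper: the name alone determines the
# executable, the logfile and the lockfile.
TASKS=${TASKS:-$HOME/tasks}

runtask() {
    name=$1
    lock=$TASKS/$name.lock
    log=$TASKS/$name.log

    # mkdir is atomic, so it doubles as a lockfile:
    # a task never runs multiple copies of itself.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "$name: already running" >&2
        return 1
    fi
    (
        cd "$TASKS" || exit 1
        if [ -f "$name.py" ]; then
            python3 "$name.py"
        else
            sh "./$name.sh"
        fi
    ) >>"$log" 2>&1
    rmdir "$lock"
}

# Chained invocation: each task starts only after the previous one finishes,
# so no guessed cron offsets are needed.
for t in "$@"; do runtask "$t"; done
```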
Some concrete tasks are described below; see Perun for more.
spball
Sponsorblock all: rsyncs the 1 GB-sized sponsorTimes.csv through the perun lxc container, then runs the python script /home/user/addcsv.py in Lada to read the .csv file and load it into the RAM-based database. There used to be a C program that filtered out some data columns, reducing the actual data by about 5x, but it was discontinued because it liked to crash on sponsorTimes.csv's malformed csv (stray newlines as if they were just quotable characters; I hope that's not valid csv). Instead, the whole file is read line-by-line in python with a few error checks, allowing it to fail somewhat gracefully. The python script also enforces a key constraint different from the one in the source csv: key(line_uuid) in the csv becomes key(start, end, video_id) in the database. This means some lines are thrown away, and some side effects should be expected: updating of already-existing lines is inconsistent/non-idempotent, and the floating-point types of durations introduce slight rounding errors. Consult the script itself for details.
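The effect of the key-constraint change can be illustrated with a one-liner. The column positions (videoID, startTime, endTime first) and header names are assumptions about sponsorTimes.csv, and the real addcsv.py does this line-by-line in python with extra error checks:

```shell
#!/bin/sh
# Keep only the first line per (videoID, startTime, endTime) triple;
# later duplicates (distinct UUIDs in the source csv) are dropped,
# which is why updates of existing lines are not idempotent.
dedup_triples() {
    awk -F, 'NR == 1 || !seen[$1 FS $2 FS $3]++'
}
```

Anything beyond the first occurrence of a triple is silently discarded, matching the behaviour described above.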
bwusage
Quarter-hourly synchronization through ssh from nyx. Extracts data as json with:

ssh nyx vnstat -i usb0 --json f

The .json file is then converted to multiple files that summarize and sum over time; see also the prepending that turns it into Javascript.
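The "prepend to make it Javascript" step amounts to wrapping the json dump in a variable assignment, so a static page can load it with a plain script tag instead of fetching and parsing json. A sketch (the variable name bwdata is an assumption):

```shell
#!/bin/sh
# Turn a json file into a loadable Javascript file by prepending an
# assignment and appending a semicolon.
jsonify() {
    printf 'var bwdata = '
    cat "$1"
    printf ';\n'
}
```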
ramusage.py
Python script that monitors RAM usage and attempts to restart the renderd
daemon. Now that Tirex is in use, it should not have a memory leak.
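The check itself is small; a shell sketch of the same logic follows. The meminfo path is a parameter so the logic can be exercised against a fake file, and the threshold and restart command are illustrative, not taken from ramusage.py:

```shell
#!/bin/sh
# Read MemAvailable and report when it drops below a threshold, which is
# when the real script would restart the leaking daemon.
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' "$1"
}

check_ram() {
    meminfo=$1       # normally /proc/meminfo
    threshold_kb=$2
    avail=$(mem_available_kb "$meminfo")
    if [ "$avail" -lt "$threshold_kb" ]; then
        echo "low memory (${avail} kB): would restart renderd"
        # systemctl restart renderd   # the real script's action
    fi
}
```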