Perun
LXC container, connected to the internet and used for downloading everything. It has read-write access to the relevant mountpoints and has wget, yt-dlp, rsync and lftp installed as download software.
/video/lectures/updatefeeds.sh
Completely automatic ETHZ video lecture recording downloader. It reads dir;url lines from urls.csv, where dir is a directory under /video/lectures and url is an RSS feed of an ETHZ video lecture series. It fetches the RSS feed through the feed2exec Python module, which converts every item to a CSV line. Those lines are interpreted internally to extract the published date in %Y-%m-%d format, and the video URL is then downloaded to {dir}/%Y-%m-%d.mp4. It prints a final line reporting the work done: the number of feeds fetched, files detected (mentioned in the feeds but already downloaded) and new files downloaded.
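A minimal sketch of that loop, assuming feedparser as a stand-in for the feed2exec-based fetching (the CSV layout, paths and date handling follow the description above; the enclosure field carrying the video URL is an assumption):

<pre>
#!/usr/bin/env python3
# Hypothetical re-implementation sketch of updatefeeds.sh.
import csv
import datetime
import os
import urllib.request

import feedparser  # assumption: stand-in for feed2exec

BASE = "/video/lectures"

feeds = seen = new = 0
with open(os.path.join(BASE, "urls.csv")) as fh:
    for dirname, url in csv.reader(fh, delimiter=";"):
        feeds += 1
        for item in feedparser.parse(url).entries:
            # published_parsed is a time.struct_time; take year/month/day
            day = datetime.date(*item.published_parsed[:3]).isoformat()
            target = os.path.join(BASE, dirname, f"{day}.mp4")
            if os.path.exists(target):
                seen += 1          # mentioned in the feed, already downloaded
                continue
            # assumption: the enclosure link is the actual video URL
            urllib.request.urlretrieve(item.enclosures[0].href, target)
            new += 1

print(f"{feeds} feeds fetched, {seen} files detected, {new} new files downloaded")
</pre>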
/software/git_clone.sh
Fetches archived git repositories; run automatically.
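The text gives no more detail, but the usual shape of such a job is: mirror-clone each repository once, then fetch on every run. A hypothetical sketch; the URL list file and target directory are placeholders:

<pre>
#!/usr/bin/env python3
# Hypothetical sketch of git_clone.sh's logic.
import os
import subprocess

DEST = "/software/git"                        # placeholder target directory

with open("/software/git_repos.txt") as fh:   # placeholder list, one URL per line
    for url in map(str.strip, fh):
        if not url:
            continue
        path = os.path.join(DEST, url.rstrip("/").rsplit("/", 1)[-1])
        if os.path.isdir(path):
            # already cloned: just pick up new commits
            subprocess.run(["git", "-C", path, "fetch", "--all", "--prune"], check=True)
        else:
            subprocess.run(["git", "clone", "--mirror", url, path], check=True)
</pre>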
/software/syncrepo-template.sh
Fetches, through rsync, the entire package repositories of archlinux or artixlinux (depending on $1). It also fetches the database files, allowing completely functional package mirror operation; see Lada#Arch and artix package mirror.
/home/user/osm_update.py
Fetches the newest .osc.gz daily changefiles from openstreetmap for the osm container into /mnt/maps/tmp/, to be processed by Osm#Data_updates. The latest already-imported changefile is identified by its sequence number, which is written to /mnt/maps/state.txt. The state number therefore records which changefiles have already been downloaded; whether they are in the database is indicated by whether they have been removed from /mnt/s1a/maps/tmp or not. Because this file presence or absence shows whether a changefile was successfully imported into the database, the maps dataset is a subdataset of dbp, which hosts the entire database: a recursive snapshot always contains a self-consistent state of the data.
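A sketch of the fetch loop under two assumptions: the local state.txt holds just the bare sequence number, and the changefiles come from the standard daily replication tree on planet.openstreetmap.org:

<pre>
#!/usr/bin/env python3
# Hypothetical sketch of osm_update.py's download loop.
import urllib.request

STATE = "/mnt/maps/state.txt"   # local: last downloaded sequence number
TMP = "/mnt/maps/tmp"           # changefiles waiting for database import
BASE = "https://planet.openstreetmap.org/replication/day"

def seq_dir(seq: int) -> str:
    # sequence 4242 lives at 000/004/242 in the replication tree
    s = f"{seq:09d}"
    return f"{s[:3]}/{s[3:6]}/{s[6:]}"

with open(STATE) as fh:
    last = int(fh.read().strip())   # assumption: file holds the bare number

# the server's state.txt reports the newest available daily sequence
newest = last
with urllib.request.urlopen(f"{BASE}/state.txt") as resp:
    for line in resp.read().decode().splitlines():
        if line.startswith("sequenceNumber="):
            newest = int(line.split("=", 1)[1])

for seq in range(last + 1, newest + 1):
    urllib.request.urlretrieve(f"{BASE}/{seq_dir(seq)}.osc.gz",
                               f"{TMP}/{seq}.osc.gz")
    with open(STATE, "w") as fh:    # advance the state after each download
        fh.write(f"{seq}\n")
</pre>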
/home/user/youtube_downloader.py
For regularly downloading youtube videos to be displayed in Lada#Youtube. It can be called in several different modes: downloading a single video, its smaller 1080p version, downloading an entire channel (in update mode it only downloads the channel's not-yet-downloaded videos), or downloading the playlists of a channel. Only the mode is specified on the command line; the IDs are then written to stdin, one per line if there are several.
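A minimal sketch of that calling convention using yt-dlp's Python API; the mode names and format string here are assumptions (the playlist mode is omitted), not the script's actual interface:

<pre>
#!/usr/bin/env python3
# Hypothetical sketch: mode on the command line, IDs one per line on stdin.
import sys

from yt_dlp import YoutubeDL  # yt-dlp's embedding API

OPTS = {
    "video": {},  # default: best available quality
    "1080p": {"format": "bestvideo[height<=1080]+bestaudio/best[height<=1080]"},
    # update mode: remember finished IDs and skip them on the next run
    "channel": {"download_archive": "/mnt/video/yt_archive.txt"},  # placeholder path
}

mode = sys.argv[1]
with YoutubeDL(OPTS[mode]) as ydl:
    for line in sys.stdin:
        vid = line.strip()
        if not vid:
            continue
        if mode in ("video", "1080p"):
            url = f"https://www.youtube.com/watch?v={vid}"
        else:
            url = f"https://www.youtube.com/channel/{vid}"
        ydl.download([url])
</pre>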
/home/user/audio_podcasts_fetch2db.py
Multistep process to update a podcast. Has database access on lada to dbname=audio. Can either:

- Fetch a podcast url={dbname=audio:podcasts:url} to a specified output file (the XML should then be manually edited and converted to JSON)
- Import a given .json file as the next episode for a podcast (then only the file needs to be downloaded to the correct location)
- Run a fully automated url={dbname=audio:podcasts:url} fetch, XML parse of the latest item, import into the database and download to the correct location (see the sketch after this list). This also requires the given podcast to be specified in /home/user/audio_podcasts_parse.py, which defines which XML keys should be translated or converted into data for the database. It also checks database constraints like length in advance. If the database import succeeded, a wget subprocess downloads the file.
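A sketch of the fully automated path, assuming the audio database is PostgreSQL reachable through psycopg2; every table and column name, the episodes schema and the output path are placeholders:

<pre>
#!/usr/bin/env python3
# Hypothetical sketch of the fully automated mode: feed URL from the
# database, parse the newest item, import it, then download the audio.
import subprocess
import sys
import urllib.request
import xml.etree.ElementTree as ET

import psycopg2  # assumption: PostgreSQL client for dbname=audio on lada

podcast = sys.argv[1]
conn = psycopg2.connect(dbname="audio", host="lada")
cur = conn.cursor()
cur.execute("SELECT url FROM podcasts WHERE name = %s", (podcast,))
(feed_url,) = cur.fetchone()

with urllib.request.urlopen(feed_url) as resp:
    channel = ET.parse(resp).getroot().find("channel")
item = channel.find("item")          # assumption: newest episode comes first
title = item.findtext("title")
audio_url = item.find("enclosure").get("url")

# check a database constraint like length in advance, as described above
if len(title) > 200:                 # placeholder limit
    sys.exit(f"title too long: {title!r}")

cur.execute("INSERT INTO episodes (podcast, title) VALUES (%s, %s)",
            (podcast, title))
conn.commit()

# only once the database import succeeded, run a wget subprocess
subprocess.run(["wget", "-O", f"/mnt/audio/{podcast}/{title}.mp3", audio_url],
               check=True)
</pre>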
For the third option, /home/user/sanitize_eschtml.py can parse HTML, remove attributes of some tags, and remove some tags entirely (while preserving the children of those tags). It can be used to compress the descriptions, as it removes cluttering attributes and <span> tags that have no attributes left. This script takes escaped HTML as both input and output.
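A sketch of that behaviour with lxml; it assumes every tag loses all of its attributes, whereas the real script presumably keeps a whitelist for the "some tags" mentioned above:

<pre>
#!/usr/bin/env python3
# Hypothetical sketch of sanitize_eschtml.py: escaped HTML in, attributes
# dropped, attribute-less <span> tags unwrapped, escaped HTML out.
import html
import sys

from lxml import etree
from lxml.html import fragment_fromstring  # assumption: lxml available

raw = html.unescape(sys.stdin.read())           # input arrives HTML-escaped
tree = fragment_fromstring(raw, create_parent="div")

for el in tree.iter():                          # drop cluttering attributes
    el.attrib.clear()

# <span> tags with no attributes left carry no information: strip_tags
# removes the tags themselves while preserving their children and text
etree.strip_tags(tree, "span")

inner = (tree.text or "") + "".join(
    etree.tostring(child, encoding="unicode") for child in tree)
sys.stdout.write(html.escape(inner))            # output is escaped again
</pre>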