Log files written by cyclog
and many of the alternatives one can use in place of it are sequences of zero or more variable-length records that end with linefeeds.
There are a number of tools that can be used to post-process the contents of log directories.
One can "follow" a log with the tail -F (not -f) command applied to its current file.
Note:
This does not "catch up" with old logs, and only processes the current file.
The -n +0 option can be added to follow from the beginning of the current file.
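A toy illustration of the -n +NUM addressing (the file name here is just a stand-in for a real current file):

```shell
# Create a stand-in "current" file with some illustrative content.
printf 'first\nsecond\nthird\n' > current
# -n +0 makes tail print from the very beginning of the file.
tail -n +0 current
# In real use one would also follow the live file:
#   tail -F -n +0 /var/log/sv/some-service/current
```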
Also note that tail has a race condition: it will miss data if (for whatever reason) rotation from one file to the next happens faster than tail can switch files to keep up, and tail then ends up skipping whole files.
(This is fairly easy to trigger in practice if the output of tail goes to something slow, such as a slowly scrolling remote terminal over a low-bandwidth or lossy network connection.)
GNU tail and BSD tail have a multiplicity of problems handling rotated log files.
One can convert the contents of a log file from external TAI64N form to human-readable timestamps in the current timezone with the tai64nlocal tool, which can be used both as a filter within a post-processing pipeline and as a straight file-reading utility.
With GNU awk, tai64nlocal can be used as an awk "co-process" to convert timestamps:
print |& "tai64nlocal"
"tai64nlocal" |& getline
Over the years, many log administration and analysis tools, from logstash to Sawmill (to name but two), have gained the ability to understand TAI64N directly, without needing tai64nlocal as an intermediary.
Log file directories are locked by the log writing programs with the conventional daemontools lockfile mechanism.
One can arrange to execute tasks, interlocked with the logging service not running, using the setlock
tool.
For example, this arranges to temporarily stop the log service connected to the local-syslog-read
service and archive a snapshot of its log directory:
setlock /var/log/sv/local-syslog-read/lock sh -c 'pax -w /var/log/sv/local-syslog-read/@*.[su] /var/log/sv/local-syslog-read/current > snapshot.tar' &
system-control condrestart cyclog@local-syslog-read
(Note the subtlety of wildcard expansion being deferred until setlock has acquired the lock.)
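The deferral comes from ordinary shell quoting rather than from setlock itself: the glob sits inside single quotes, so it is expanded by the inner sh -c only after the outer command has run. A stand-in demonstration, with illustrative file names and no setlock involved:

```shell
# Set up a couple of stand-in files.
mkdir -p demo
touch demo/a.log demo/b.log
# The quoted glob is expanded by the inner shell, not the outer one.
sh -c 'ls demo/*.log'
```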
All log lines begin with a timestamp; TAI64N timestamps sort lexically into chronological order; and each log file is by its nature already sorted into chronological order.
This means that log files are suitable for the -m option to the sort command, in order to merge sort multiple log files together into a single log.
The following example makes use of this to sort all of the last hour's logs from all (nosh managed, system-wide) services together:
find /var/log/sv/*/ -type f \( -name current -o -name '@*.[su]' \) -mmin -60 -print0 | xargs -0 sort -m -- | tai64nlocal | less -S +G
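A toy demonstration of why -m suffices here; the two files stand in for already-sorted log files, with made-up TAI64N-style labels:

```shell
# Two pre-sorted stand-in log files.
printf '@4000000000000001 one\n@4000000000000003 three\n' > a.log
printf '@4000000000000002 two\n@4000000000000004 four\n'  > b.log
# Each input is already sorted, so a pure merge yields chronological order.
sort -m a.log b.log
```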
The follow-log-directories and export-to-rsyslog tools understand the structure of log directories, and can thus do things that tail -F cannot:
They know the naming scheme for old rotated log files and know to scan through them, reading any that are newer than the point last recorded in their "log cursor", before reading the current file.
They know to skip through any given log file, to the next entry after the point last recorded in their "log cursor".
They know to read all of the way to the end of the current file before taking note of a rename notification triggered by a file rotation.
Their "log cursors" are persistent and tweakable, and not just transient state held within some kernel open file description that is lost when the process holding it open terminates. They will "remember" where they last left off if terminated and then re-invoked. An administrator can hand-edit the "cursor" file with the TAI64N timestamp of the desired place in the log to (re-)start from.
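A sketch of hand-setting such a cursor. The path here is purely hypothetical, since the real cursor location depends on how the service was set up; only the idea of writing a TAI64N timestamp into the cursor file comes from the description above:

```shell
# Hypothetical cursor location; adjust for the actual service setup.
cursor=./cursors/local-syslog-read
mkdir -p "$(dirname "$cursor")"
# An illustrative TAI64N label: @, 16 hex digits of seconds, 8 of nanoseconds.
printf '@4000000063b0cd0000000000\n' > "$cursor"
cat "$cursor"
```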
With export-to-rsyslog one can build a de-coupled log export service that exports a machine's logs over the network to an RSYSLOG server without skipping log entries. The log directory itself acts as a buffering mechanism, allowing the logged service to pull ahead of the export service, so that short-lived temporary network glitches do not block the logged service with a full output pipe.
With follow-log-directories one can build de-coupled log post-processing services that do things like pass logs through awk or perl scripts and perform tasks when log entries match particular patterns.
Again, the log directory itself acts as a buffer that allows the logged service to pull ahead if the tasks happen to take a long time.
There are many log post-processing tools for specific services, written by various people. Here are just a couple, in no particular order:
Skaarup's tools for analyzing the logs of various Bernstein software packages, from axfrdns to publicfile httpd.
Erwin Hoffmann's newanalyse that analyses qmail log files specifically.