A few days ago I wrote about filtering referer spam from my server logs before they're processed by goaccess. goaccess doesn't support log filtering out of the box, but tools like grep or sed can be used to filter the log before it's processed by goaccess.
There are two main reports I use for this blog: today's log, and the last 30 days of activity.
Viewing today's log
This is nice and simple. It searches the log for entries that contain today's date string (for example 15/Nov/2020), and the matching lines are then piped to goaccess.
grep "$(date '+%d/%b/%Y' -d 'today')" path/to/access_log \
    | goaccess -a
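To see exactly what that pattern matches, here is a small self-contained sketch that writes a few made-up entries in the common log format to a temporary file and runs the same kind of grep over them. It assumes GNU date (for -d) and uses a fixed date instead of 'today' so the result is reproducible. Setting LC_ALL=C is an addition here: %b prints the month name in the current locale, while the log uses English abbreviations like Nov.

```shell
#!/usr/bin/env bash
# Sketch: show which lines a date-based grep keeps.
# Assumes GNU date (for -d); the log entries below are invented.
set -euo pipefail
export LC_ALL=C   # force English month abbreviations for %b, as in the log

log=$(mktemp)
trap 'rm -f "$log"' EXIT

cat > "$log" <<'EOF'
192.0.2.1 - - [14/Nov/2020:23:59:58 +0000] "GET / HTTP/1.1" 200 512
192.0.2.2 - - [15/Nov/2020:00:00:03 +0000] "GET /about HTTP/1.1" 200 1024
192.0.2.3 - - [15/Nov/2020:12:30:00 +0000] "GET /feed HTTP/1.1" 200 2048
EOF

# A fixed date stands in for 'today' so the output is predictable.
day=$(date '+%d/%b/%Y' -d '2020-11-15')   # 15/Nov/2020

# Prints the two 15/Nov/2020 entries; the real pipeline sends them to goaccess -a.
grep "$day" "$log"
```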
This can also be adapted to show log entries for a specific month (or year). I use this when generating monthly reports on how "Writing PHP with Emacs" is performing:
grep 'Nov/2020' path/to/access_log \
    | grep "books/php-with-emacs" \
    | goaccess -a
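Note: This could also be done in a single pass with one pattern that requires both strings, such as 'Nov/2020.*books/php-with-emacs' (the date comes before the request path in each line), but I find this version more readable.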
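To check that a single pattern gives the same result as the two chained greps, this sketch runs both over a few invented log lines and compares them. The combined pattern relies on the timestamp appearing before the request path in each line, which holds for the common and combined log formats; the entries themselves are made up.

```shell
#!/usr/bin/env bash
# Sketch: chained greps vs. one combined pattern on an invented log.
set -euo pipefail

log=$(mktemp)
trap 'rm -f "$log"' EXIT

cat > "$log" <<'EOF'
192.0.2.1 - - [02/Nov/2020:10:00:00 +0000] "GET /books/php-with-emacs HTTP/1.1" 200 4096
192.0.2.2 - - [03/Nov/2020:11:00:00 +0000] "GET /about HTTP/1.1" 200 1024
192.0.2.3 - - [28/Oct/2020:09:00:00 +0000] "GET /books/php-with-emacs HTTP/1.1" 200 4096
192.0.2.4 - - [30/Nov/2020:18:00:00 +0000] "GET /books/php-with-emacs/buy HTTP/1.1" 200 512
EOF

# Two passes: keep November 2020, then keep the book's pages.
piped=$(grep 'Nov/2020' "$log" | grep 'books/php-with-emacs')

# One pass: the date precedes the path, so ".*" can join the two strings.
single=$(grep 'Nov/2020.*books/php-with-emacs' "$log")

printf '%s\n' "$piped"
if [ "$piped" = "$single" ]; then
  echo "same result"
else
  echo "results differ"
fi
```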
Viewing this month's log
This one is a little trickier, as we need to filter out everything before a specific date. Thankfully, sed can do this:
sed -n '/'$(date '+%d\/%b\/%Y' -d '1 month ago')'/,$ p' path/to/access_log \
    | goaccess -a
There are a few parts here:
-n - This flag stops sed from printing every line automatically. Normally sed prints each line it processes, which in this case would output the entire log file.
$(date '+%d\/%b\/%Y' -d '1 month ago') - The shell evaluates this and replaces it with the date one month ago, such as 15\/Oct\/2020. The slashes are escaped so they don't end sed's /.../ pattern early.
'/'$(date '+%d\/%b\/%Y' -d '1 month ago')'/,$ p' - This is the expression that sed evaluates. It selects a range of lines: the comma is the range delimiter, the start of the range is the first line matching the date, and $ is the end of the range, meaning the last line of the file. Together they select every entry from the start date onward.
- This final modifier tells
sed
to output lines that matched.
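Here is the range expression run against a handful of invented entries, with a fixed start date standing in for '1 month ago' so the output is predictable. It assumes GNU date and sed, and shows that date passes the \/ escapes in its format string straight through, so the slashes in the date can't end sed's /.../ address early.

```shell
#!/usr/bin/env bash
# Sketch: sed's /start/,$ range on an invented log.
# Assumes GNU date (for -d); LC_ALL=C keeps %b in English.
set -euo pipefail
export LC_ALL=C

log=$(mktemp)
trap 'rm -f "$log"' EXIT

cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2020:08:00:00 +0000] "GET /old HTTP/1.1" 200 100
192.0.2.2 - - [15/Oct/2020:09:00:00 +0000] "GET /start HTTP/1.1" 200 200
192.0.2.3 - - [02/Nov/2020:10:00:00 +0000] "GET /newer HTTP/1.1" 200 300
192.0.2.4 - - [15/Nov/2020:11:00:00 +0000] "GET /newest HTTP/1.1" 200 400
EOF

# Fixed start date instead of '1 month ago'. date copies the \/ escapes
# literally, giving 15\/Oct\/2020, which sed reads as plain slashes.
start=$(date '+%d\/%b\/%Y' -d '2020-10-15')

# -n: no automatic printing; /start/,$ selects from the first matching
# line through the last line; p prints the selected lines.
sed -n "/${start}/,\$ p" "$log"
```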
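All together, this keeps only the log entries from the same date last month onward (roughly the last 30 days) before goaccess processes them. GNU date's -d option understands other relative dates too, such as '1 week ago' or '3 months ago', so the same approach can cover the last week, the last quarter, or any other period that runs up to the end of the log.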
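One practical wrinkle: the range only starts at a line that matches the start date, so if there were no requests on exactly that day, sed prints nothing at all. The sketch below reproduces that on invented data, then tries one possible workaround: a hypothetical first_present helper (not part of the setup above) that moves the start date forward a day at a time, up to 31 days, until it finds a date that appears in the log. It assumes GNU date, whose -d option accepts expressions like '2020-10-15 + 2 days'.

```shell
#!/usr/bin/env bash
# Sketch: /start/,$ prints nothing if the start date never appears, plus a
# hypothetical helper that moves the start date forward until it matches.
# Assumes GNU date and sed; the log entries are invented.
set -euo pipefail
export LC_ALL=C

log=$(mktemp)
trap 'rm -f "$log"' EXIT

cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2020:08:00:00 +0000] "GET /old HTTP/1.1" 200 100
192.0.2.2 - - [17/Oct/2020:09:00:00 +0000] "GET /a HTTP/1.1" 200 200
192.0.2.3 - - [02/Nov/2020:10:00:00 +0000] "GET /b HTTP/1.1" 200 300
EOF

# No entries on 15/Oct/2020, so the range never starts.
start=$(date '+%d\/%b\/%Y' -d '2020-10-15')
naive=$(sed -n "/${start}/,\$ p" "$log")
if [ -z "$naive" ]; then echo "naive range: no output"; fi

# Hypothetical helper: try the base date and up to 31 later days, returning
# the first sed-escaped date that occurs somewhere in the log.
first_present() {
  local base=$1 i pat
  for ((i = 0; i <= 31; i++)); do
    pat=$(date '+%d\/%b\/%Y' -d "$base + $i days")
    # {p;q} prints the first matching line and stops reading.
    if [ -n "$(sed -n "/${pat}/{p;q}" "$log")" ]; then
      printf '%s\n' "$pat"
      return 0
    fi
  done
  return 1
}

start=$(first_present 2020-10-15)   # 17\/Oct\/2020
sed -n "/${start}/,\$ p" "$log"
```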