Filtering referral spam from my server logs, part 2

There are two tasks that never seem to end: removing dog hair from my clothes, and filtering spam.

Last year I implemented a basic system for filtering my logs, but it has become less effective over time. I wanted to create something that:

  • Could be reused across all of my sites.
  • Could be configured through files instead of having to update the report generation script.
  • Could utilize third-party spam lists.

Before implementing any changes my goaccess reports looked like this:

Referer spam REALLY ruins everything

Nearly everything on that list is spam, and that's only the top 10. For monthly and yearly reports this amount of spam makes it hard to find pages that are actually doing well.

In the end I modified my existing grep setup with additional passes. It's a little slow on large logs, so I usually save the filtered log to speed up subsequent reports.

I created three files with keywords to remove:

  • bots.list contains a list of bot user agents that I want to ignore. This can also be done from within goaccess, but I wanted them completely scrubbed from my logs.
  • global-spammers.list is the contents of src/domains.txt from the referrer-spam-blocker project. It's a pretty hefty list.
  • spammers.list contains a list of spam domains that aren't in the global list. I normally add new items once or twice a week.

All of these files live in my "~/.config/" directory so I can use them on different projects. They're simple lists that have one entry per line:

spammersdomain.com
spammersdomain.org
anotherspammer.net
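
Because new entries get added by hand fairly often, a small helper along these lines could keep duplicates and blank lines out of a list. It's only a sketch: add_spammer is a made-up name, and the default path assumes the list lives in ~/.config/ as described above.

```shell
#!/bin/sh
# Hypothetical helper: append DOMAIN to LIST unless it is already there.
# LIST defaults to the personal spammers.list (an assumed location).
add_spammer() {
    domain=$1
    list=${2:-$HOME/.config/spammers.list}

    # Refuse empty input: a blank line in a grep pattern file matches
    # every line, so "grep -v -f" would then discard the whole log.
    [ -n "$domain" ] || return 1

    # -x matches whole lines and -F disables regex handling, so
    # "spam.com" is not mistaken for "notspam.com" or "spamXcom".
    if [ -f "$list" ] && grep -q -x -F -e "$domain" "$list"; then
        return 0
    fi

    printf '%s\n' "$domain" >> "$list"
}
```

Calling add_spammer spammersdomain.com twice leaves a single entry in the file.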

Filtering the access log looks like this:

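# Drop redirect/HEAD noise, then strip spammers and bots before reporting.
# Use $HOME rather than "~": a tilde inside double quotes is not expanded.
grep -v -e '"GET / HTTP/1.1" 301 194' -e '"HEAD / HTTP/1.1"' access_log \
    | grep -v -f "$HOME/.config/spammers.list" \
    | grep -v -f "$HOME/.config/bots.list" \
    | grep -v -f "$HOME/.config/global-spammers.list" \
    | goaccess -a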

Which results in a report like this (over 50% of the original hits were from bots or spammers):

A yearly report with 90% less spam

No system will be 100% perfect, but so far this has cut out a large portion of the noise from my logs.
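
Since the filtering is slow on large logs, it can help to wrap it in a function and save the result once. The sketch below is one way to do that, not the exact setup I run: filter_log is a hypothetical name, it merges the lists into a single pattern file so the log is only scanned once, and it uses -F to treat every entry as a fixed string (which assumes none of the lists rely on regular expressions).

```shell
#!/bin/sh
# Sketch: strip bot and spam lines from an access log.
# Usage: filter_log LOG LIST...   (surviving lines go to stdout)
filter_log() {
    log_file=$1
    shift

    # Merge every readable list into one pattern file, dropping blank
    # lines (an empty pattern would match, and so remove, every line).
    patterns=$(mktemp) || return 1
    for list in "$@"; do
        if [ -r "$list" ]; then
            grep -v '^[[:space:]]*$' "$list" || true
        fi
    done > "$patterns"

    status=0
    if [ -s "$patterns" ]; then
        # -F: fixed strings, so the "." in a domain only matches a dot.
        grep -v -F -f "$patterns" "$log_file" || status=$?
    else
        # No patterns at all: pass the log through untouched.
        cat "$log_file" || status=$?
    fi
    rm -f "$patterns"

    # grep exits with 1 when nothing survives; that is not an error here.
    [ "$status" -le 1 ]
}
```

Running filter_log access_log ~/.config/bots.list ~/.config/spammers.list ~/.config/global-spammers.list > access_log.filtered writes the cleaned log to a file, and later reports can read access_log.filtered instead of repeating the filter.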


Groundhog Day Resolutions - March 2021

February is the shortest month of the year, so naturally I tried to cram in as much as possible. It actually went pretty well.

February's primary goals

1. Finish version 0.3 of Craft Roulette

Ooof, straight into a failure. I didn't even touch this during February.

2. Finish version 0.7 of Writing PHP with Emacs

I published a new version that contains all of the core content. There are still places I'd like to improve, but this was a big step.

3. Complete one session of deliberate practice

My original plan was to experiment with a fullscreen version of Splodey Boats, but I was feeling all 'sploded out so I went for something completely different: setting up my Atari ST development environment.

I'm still writing up the notes for this, but having a plan before I started helped keep me focused and made things much easier.

February's secondary goals

1. Run 90 miles

I had to skip two runs due to being iced in, but I still managed to hit this target. Just.

2. Create detail pages for all my major goals

All of my 2021 major goals now have detail pages.

3. Write a post-mortem for Splodey Boats 2000

Read the Splodey Boats 2000 Post Mortem. It's not very exciting.

Primary goals for March

1. Finish version 1.0 of Writing PHP with Emacs

There are 6 "recipe" chapters to write and then version 1.0 is complete. I still have more ideas for version 1.1 - and beyond - but this version will contain everything I originally had in mind when I started.

2. Complete another session of deliberate practice

I'm on a roll with the Atari development stuff, so I'm going to continue on that path for now.

3. Finish version 0.3 of Craft Roulette

Let's try this again.

4. Contribute to a free software project

This is one of my goals for 2021. I've already contributed a couple of bug fixes this year and I'd like to do the same in March.

Secondary goals for March

1. Run 120 miles

Everything aches, but the training must continue.

2. Read a book

"Read 6 books" is one of my goals for 2021, but I need to actually read books if I want to complete it.

3. Add detail pages to my secondary goals

Not all of them require much explanation, but I'd like to clarify some of the larger ones.

February felt like a really unproductive month, but now that I've looked over everything I'm a little more positive.

My first session of deliberate practice was short but productive. I ended up taking a lot more notes than I expected, so that's something I need to be aware of for future sessions. Having a list of questions I wanted to answer helped keep my focus.

I've also increased my writing output to 3 days a week. My goal for 100 posts meant I would need to publish on different days each week which doesn't really fit my style. I feel much better about a Monday-Wednesday-Friday schedule.

I still have a lot that I want to get done, but I don't think things will really kick in until after my marathon in May. The longer runs are sapping my weekend energy which makes it harder to really sit down and work on things.


Keeping a changelog with Emacs and change-log-mode

There are usually two files I create when starting a new project: a TODO.org file for organizing tasks, and a CHANGELOG file for keeping tabs on what I've done. I don't limit myself to only writing about direct file changes; I also use it to keep notes about overall changes I've made.

Here's a snippet from the CHANGELOG I use for this site:

An example CHANGELOG file

It's not exactly thrilling stuff, but it's handy when looking over traffic numbers to see if any changes I made worked. Or not.

change-log-mode works fine without additional configuration, but there are a few options to tweak. I also wrote some functions to add projectile integration.

Configuring your name and email address

Calling add-change-log-entry will insert a new date heading with your name and email address.

There are two variables to modify what is inserted:

  • add-log-full-name - The full name to use.
  • add-log-mailing-address - The email address to use.

Something like this will set both variables:

(setq add-log-full-name       "My name"
      add-log-mailing-address "test@example.org")

Emacs will attempt to figure out your name or email address if either of these is nil.

Configuring the default file name

change-log-mode looks for "ChangeLog" by default, but I use "CHANGELOG". Setting change-log-default-name will modify this behaviour:

(setq change-log-default-name "CHANGELOG")

Configuring indentation sizes

The default indentation width is a little large for my tastes, so I add this to my configuration file to narrow things:

(add-hook 'change-log-mode-hook
	  (lambda ()
	    (make-local-variable 'tab-width)
	    (make-local-variable 'left-margin)
	    (setq tab-width   2
		  left-margin 2)))

Alternatively these can be set inside a ".dir-locals.el" file if you don't want to change the global behaviour:

((change-log-mode . ((tab-width   . 2)
		     (left-margin . 2))))

Adding projectile integration

I use projectile a lot, so I wanted to make it easier to add log entries from within a project. I wrote two functions for this:

  • sodaware/projectile-goto-project-changelog - Opens the CHANGELOG file for the current projectile project.
  • sodaware/projectile-add-to-project-changelog - Works the same way as add-change-log-entry, but automatically selects the project CHANGELOG instead of prompting for a file name.

The full code is below:

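(defun sodaware/project-changelog-path ()
  "Get the full path to the current project's changelog file, or nil if not found."
  (let* ((root (projectile-project-root))
         ;; `change-log-default-name' is nil unless configured, so fall
         ;; back to the stock "ChangeLog" name instead of signalling an error.
         (name (file-name-nondirectory
                (or change-log-default-name "ChangeLog")))
         ;; ROOT is nil outside of a project; don't build a path then.
         (change-log-file (and root (expand-file-name name root))))
    (when (and change-log-file (file-exists-p change-log-file))
      change-log-file)))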

(defun sodaware/projectile-goto-project-changelog ()
  "Open the CHANGELOG for the current project."
  (interactive)
  (let ((change-log-file (sodaware/project-changelog-path)))
    (if change-log-file
	(find-file change-log-file)
	(message "Project does not contain a changelog file."))))

(defun sodaware/projectile-add-to-project-changelog ()
  "Add a new entry to the CHANGELOG for the current project."
  (interactive)
  (let ((change-log-file (sodaware/project-changelog-path)))
    (if change-log-file
	(progn
	  (find-file change-log-file)
	  (add-change-log-entry))
	(message "Project does not contain a changelog file."))))

I added key bindings for both of these functions because I don't want to type that much:

(use-package projectile
  :diminish projectile-mode
  :bind
  (:map projectile-mode-map
	("C-c p G" . sodaware/projectile-goto-project-changelog)
	("C-c p L" . sodaware/projectile-add-to-project-changelog))
  :config
  (projectile-mode)
  :custom
  (projectile-keymap-prefix     (kbd "C-c p"))
  (projectile-enable-caching    t)
  (projectile-completion-system 'default))

Creating animated gifs using emacs-director

My last post about sorting lines had a couple of animated gifs to show how the feature worked. They were created using a combination of emacs-director, asciinema, and asciicast2gif. Here's how I made them.

The process

Creating a gif using this setup works like this:

  1. Create an elisp file containing instructions for emacs-director to run.
  2. Run emacs with this file and the emacs-director bootstrap file.
  3. Capture the output with asciinema.
  4. Convert the asciinema file to an animated gif using asciicast2gif.

I created a small Makefile to handle these steps, so all I need to do is write the instruction file and then use make to create the actual gif.

Let's create a really simple gif that says "Hello, World!".

Step 1 - Creating the instruction file

This is divided into two parts: a director-bootstrap command that tells emacs-director some basic information about the environment, and the director-run section that contains a full list of instructions.

For the bootstrap we can either use an existing user directory - useful if things need to be set up a certain way - or use a directory under /tmp.

The fun part is all contained in director-run, which tells Emacs exactly which steps to follow. Let's say hello!

(director-bootstrap
 :user-dir "/tmp/director-demo")

(director-run
 :version 1
 :before-start (lambda ()
		 (switch-to-buffer (get-buffer-create "*say-hello*"))
		 (menu-bar-mode -1))
 :steps '((:type "Hello, World!"))
 :typing-style 'human
 :delay-between-steps 1
 :after-end (lambda () (kill-emacs 0))
 :on-error  (lambda () (kill-emacs 1)))

The :before-start section can be used to set up buffers, switch the major mode, and run any other code that we don't want to look like it is being typed.

Step 2 - Turning it into a gif

Here's the Makefile I use:

# Recipe lines must start with a tab character, not spaces.
%.gif: %.cast
	asciicast2gif $< $@

%.cast: %.el
	asciinema rec $@ -c 'emacs -nw -Q -l util/director-bootstrap.el -l $<'

Calling make hello-world.gif will convert a file called hello-world.el into a .cast file, which will then be turned into an animated gif.

The finished image looks like this:

Saying hello with emacs

There are a few little gotchas:

  • The finished gif will be based on the size of the terminal where make was executed. If you create the gif from a fullscreen terminal you'll end up with a pretty big image.
  • This process uses the terminal version of Emacs, so showing things involving the gui is not possible.
  • When running code in :before-start there will be a frame or two before it is executed which can look a little weird.

None of these detract from the overall package, and it's a really easy way to create animated gifs of Emacs behaviour.

We'll finish with a slightly more complex gif:

Saying hello with emacs lisp

This uses the same director-bootstrap as before, but has a few more steps:

(director-run
 :version 1
 :before-start (lambda ()
		 (menu-bar-mode -1)
		 (switch-to-buffer (get-buffer-create "*say-hello*"))
		 (emacs-lisp-mode))
 :steps '((:type "(defun say-hello ()\r")
	  (:type "\"Say hello to the world.\"\r")
	  (:type "(interactive)\r")
	  (:type "(message \"Hello, World!\"))\r")
	  (:type "\C-x\C-e")
	  (:wait 2)
	  (:type "\M-x")
	  (:type "say-hello")
	  (:type [return]))
 :typing-style 'human
 :delay-between-steps 1
 :after-end (lambda () (sleep-for 5) (kill-emacs 0))
 :on-error  (lambda () (kill-emacs 1)))

Normally to create a gif like this I would type everything by hand and record the screen with licecap or recordMyDesktop, so being able to automate things is a huge time saver.


How to sort lines of text with Emacs

Alphabetically sorting a bunch of text is something I need to do fairly regularly. Thankfully Emacs has a built-in function to help with that: sort-lines.

It looks like this:

Sorting some lines

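Calling M-x sort-lines will sort all lines in the current region. Lines are compared as text, character by character, so digits sort before letters and capital letters sort before lower-case ones (e.g. A will come before a). This is not a numeric sort, so a line starting with 10 will come before one starting with 9.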
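
To get a feel for that ordering outside of Emacs, sort in the C locale behaves in a comparable way for plain ASCII text. This is only an analogy (it isn't what sort-lines runs internally), and the sample words are made up:

```shell
#!/bin/sh
# Byte-order sort: digits first (compared as characters, not by value),
# then capital letters, then lower-case letters.
sorted=$(printf '%s\n' banana apple Apple 9 10 | LC_ALL=C sort)
printf '%s\n' "$sorted"
# Prints:
# 10
# 9
# Apple
# apple
# banana
```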

Emacs will only sort text in the highlighted region, so it's important to mark everything that needs sorting. Otherwise it can end up like this:

Incorrectly sorting some lines in a region

sort-lines is one of those handy little functions that I never knew I wanted, until I was sitting with a pile of unsorted text and thinking "I wonder if Emacs can help".