Categories
System Administration

Rate-limit for swatch

At work, we installed swatch to have a look at our combined logfiles. (see techrepublic or linsec for a swatch intro.)

But contrary to most of the examples, we’re using swatch not to check for known events, but to look out for unexpected entries. So basically our config is “ignore the known, send mail for the rest”:

ignore=/…/
ignore=/…/

watchfor=/./
mail=…..

This has one severe drawback: every single unexpected line in a logfile will send one mail. This just doesn’t scale.

The threshold feature won’t really help us, as it rejects notifications over its limit, whereas for email notifications it’s better to collect more messages into a single email.

So I dived into the code and added a ratelimit feature for the mail Action.

Apply the patch in Actions.pm.diff and then you can write:

watchfor=/./
mail=addresses=joe\@example.com,subject=”swatch alert”,ratelimit=600,ratetag=foo

and joe will get no more than one mail per 10 mins, without missing a single message.

As written, this config has one problem: I need to flush the messages I held back once I’m allowed to send mail again. In theory, I should have added some sort of timer-based event-handling to swatch, but I considered that to be overkill. Especially if you have multiple mail statements with different rate-limits. So I added another option to the mail Action that tells it just to flush spooled messages and do nothing more. You should trigger that option frequently, e.g. with a stanza like this at the top of your config-file:

watchfor=/./
mail=addresses=joe\@example.com,subject=”swatch alert”,ratelimit=600,ratetag=foo,rateflush=1
continue

ignore=/ /
ignore=/…/

watchfor=/./
mail=addresses=joe\@example.com,subject=”swatch alert”,ratelimit=600,ratetag=foo