Friday, July 12, 2013

Do not assume existence of any data when creating an uptime monitoring sensor.

AlertFox

I've been evaluating the AlertFox monitoring service lately, and I like it a lot.
It has awesome features that instantly beat services like pingdom.com.
I can do anything on my site that a real user can, so JavaScript pitfalls are not a problem.

I also get a screenshot of the problematic situation, which is priceless in case of a 500 error: it contains the error_id that leads programmers to the stack trace. Pretty useful, right?

To monitor whether the website was working properly, I created a script that:
  • opens the website and uses the search bar
  • checks whether a specific product was found

It worked like a charm until yesterday at 10:00 AM, when I got an alert e-mail saying that the site was down 50% of the time. So I went to customer service with that info to notify them of the problem.
Sadly, it turned out to be a false alarm: the product was no longer available because it had been deleted.

Lessons learned

  • Do not treat the existence of data or (editable) labels as constant when creating an uptime monitoring sensor
  • Rely instead only on code features, and even then - watch out for system updates
  • Simpler is again better
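
To make the lessons concrete, here is a minimal sketch in Python of a check that relies on code features instead of data. It is not AlertFox's scripting language, and the element ids "search-form" and "search-results" are assumptions made up for illustration; substitute whatever your own templates always render. The idea is that the check passes when the search page responds and renders its skeleton, whether or not any particular product still exists.

```python
from html.parser import HTMLParser


class _IdCollector(HTMLParser):
    """Collects every id attribute found in the page markup."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def search_page_healthy(status_code, html,
                        required_ids=("search-form", "search-results")):
    """Return True when the search page responded and rendered its skeleton.

    The ids in required_ids are hypothetical. The check deliberately
    ignores which products (if any) were found.
    """
    if status_code != 200:
        return False
    collector = _IdCollector()
    collector.feed(html)
    return all(element_id in collector.ids for element_id in required_ids)


if __name__ == "__main__":
    # The product is gone, but the page itself works: no false alarm.
    page = ('<form id="search-form"></form>'
            '<div id="search-results"><p>No products found</p></div>')
    print(search_page_healthy(200, page))  # True
    # A real outage still fails the check.
    print(search_page_healthy(500, "<h1>Internal Server Error</h1>"))  # False
```

Even this kind of check can break when a system update renames those elements, so it still needs a review after each release.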