20100104

Cron, diff & wget: Watch changes in a webpage



A few months ago, I realised I was frequently checking some pages for changes. They were conference pages, and I was waiting for them to add information about registration and such.

Then I realised I could write a script to do it, using diff and wget. You can get it below. First edit it to add the pages you want to follow, then run it once with the "write" option to download the first version of each page. After that, edit your crontab file (crontab -e) to run it on a schedule with the "diff" option. For example:
00 13 1,7,14,21,28 * * /home/user/PageDiffs.sh diff
will run it on the 1st, 7th, 14th, 21st and 28th of every month, at 13:00. Be sure to run it with "write" first, because "diff" compares each fresh download against the files that "write" saved.
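
The crontab line above can also be built and installed from the shell instead of through the editor. What follows is a minimal sketch, not part of the original script: the helper names cron_entry and install_cron_entry and the working directory /home/user/pagediffs are assumptions made for illustration. The script saves its copies under relative names (File0, File1, ...), and cron typically starts jobs in the user's home directory, so the sketch changes into a dedicated directory first.

```shell
#!/bin/sh
# Sketch only: cron_entry and install_cron_entry are hypothetical helpers.

SCRIPT="/home/user/PageDiffs.sh"    # script path, as in the example above
WORKDIR="/home/user/pagediffs"      # assumed directory for the saved copies

# Print one crontab line. Fields: minute, hour, day of month, month,
# day of week, then the command to run.
cron_entry() {
    printf '00 13 1,7,14,21,28 * * cd %s && %s diff\n' "$1" "$2"
}

# Append the line to the current user's crontab unless it is already there.
install_cron_entry() {
    entry=$(cron_entry "$1" "$2")
    if crontab -l 2>/dev/null | grep -qxF -- "$entry"; then
        echo "Entry already installed"
    else
        { crontab -l 2>/dev/null; echo "$entry"; } | crontab -
    fi
}

# Show the line without touching the crontab.
cron_entry "$WORKDIR" "$SCRIPT"
# To install it for real: install_cron_entry "$WORKDIR" "$SCRIPT"
```

If you use a working directory like this, run the "write" step from that same directory, so that both steps read and write the same files.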

#!/bin/bash

# Copyright 2009 Ruben Berenguel

# ruben /at/ maia /dot/ ub /dot/ es

# PageDiffs: Fill in the array of webpages below. The "write" option
# downloads them; the "diff" option re-downloads them and checks the
# new copies against the old ones for differences. With the "diff mail"
# options, it also sends an email to $MAILRECIPIENT, assuming mail works.
# You can find the most up to date version of this file (and the GPL) at
# http://rberenguel.googlecode.com/svn/trunk/Bash/PageDiffs.sh

# 20091226@00:24

MAILRECIPIENT="mail@mail.com"

j=0
Pages[j++]="http://www.maia.ub.es/~ruben/"
Pages[j++]="http://www.google.es"
#Add more pages as above

if [ "$1" = "write" ]; then
    echo Generate files
    count=0
    for i in "${Pages[@]}"
    do
        echo Getting "$i" into File$count
        wget "$i" -v -O "File$count"
        count=$((count + 1))
    done
fi
if [ "$1" = "diff" ]; then
    count=0
    mail=0
    for i in "${Pages[@]}"
    do
        # echo Getting "$i" into Test$count
        wget "$i" -q -O "Test$count"
        # diff exits with 1 when the files differ (0: identical, 2: trouble),
        # so check the exit status instead of grepping its localized output
        diff -q "Test$count" "File$count" > /dev/null
        Result=$?
        if [ "$Result" = "1" ]; then
            if [ "$2" = "mail" ]; then
                echo Page at "$i" has changed since last check! >> MailCont
                mail=1
            fi
            echo Page at "$i" has changed since last check!
        else
            echo Page at "$i" has not changed since last check!
        fi
        #rm Test$count
        count=$((count + 1))
    done
    if [ "$mail" = "1" ]; then
        # mail reads the message body from standard input
        mail -s "Page changed alert!" "$MAILRECIPIENT" < MailCont
        # start the next run with an empty message
        rm MailCont
    fi
fi
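
The comparison step can be tried without any network access. The sketch below is an illustration under stated assumptions, not the script's own code: compare_snapshot is a hypothetical helper, and two throwaway files stand in for File0 (the saved copy) and Test0 (the fresh download). It maps the exit status of diff (0 for identical files, 1 for different files, anything else for trouble) to a short message.

```shell
#!/bin/sh
# Sketch only: compare_snapshot is a hypothetical helper name.

# Print "unchanged", "changed" or "error" for a saved/fresh file pair.
compare_snapshot() {
    diff -q "$1" "$2" > /dev/null 2>&1
    case $? in
        0) echo "unchanged" ;;
        1) echo "changed" ;;
        *) echo "error" ;;      # for example, one of the files is missing
    esac
}

# Local stand-ins for File0 (saved copy) and Test0 (fresh download).
tmp=$(mktemp -d)
printf 'Registration opens soon\n' > "$tmp/File0"
printf 'Registration is open\n'    > "$tmp/Test0"

compare_snapshot "$tmp/File0" "$tmp/Test0"     # prints: changed
compare_snapshot "$tmp/File0" "$tmp/File0"     # prints: unchanged
compare_snapshot "$tmp/File0" "$tmp/missing"   # prints: error
rm -r "$tmp"
```

Telling "error" apart from "unchanged" matters mostly for the first run: if the "write" step was skipped, the saved copy does not exist, and that should not be reported as a page that has not changed.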


Written by Ruben Berenguel