Wednesday, 18 May 2011

Broken link checker

Getting my websites up and running again and creating new ones has led to the need for some webmaster tools. One thing that's needed is a way to monitor whether links are still valid, and once again Unix provides a program, "linkchecker", that performs this task in a simple way, so that its input and output can be manipulated by other programs, each doing their own little thing.
You can find more information on the program at sourceforge.net

I have made a setup with a shell script that generates an index page listing the websites to monitor, each with a short description of its check result.
From there you can go to the full check result for the website in question.

You can see my setup at AOit.dk and the script below. In the script I have restricted the recursion level to 1. Look further down for an example of automatic FTP upload that could be added to the script.
#!/bin/bash

title="YOUR TITLE";
sites="WEBSITE.TO.MONITOR0 WEBSITE.TO.MONITOR1";

for site in $sites
do
# Check one level deep (-r1) and strip LinkChecker's banner and license text from the HTML report.
# The heading pattern accepts any version number, not only 6.2.
linkchecker --no-status -r1 -ohtml "http://${site}" | sed \
-e "s/<h2>LinkChecker [0-9.]*<\/h2>/<h3>AOit link check<\/h3><b>Current website $site<\/b>/g" \
-e 's/LinkChecker comes with ABSOLUTELY NO WARRANTY!//g' \
-e 's/This is free software, and you are welcome to redistribute it//g' \
-e "s/under certain conditions\. Look at the file \`LICENSE' within this//g" \
-e 's/distribution\.//g' \
-e 's/<br><br>Start/Start/g' > "link_chk_${site}.html"
done

# Write the head of the index page; the per-site list is appended below.
echo '
<html>
<head>
<title>
'"${title}"'
</title>
</head>
<body>
<h3>'"${title}"'</h3>
<hr>
Last run time:<pre> '"$(date -R)"'</pre><p>
Sites monitored:
<ul>
<pre>' > link_chk.html

for site in $sites
do
# Link to the site's report, followed by the summary LinkChecker prints after "That's it."
echo -n '<li><a href="link_chk_'"${site}"'.html">'"${site}"'</a></li>' >> link_chk.html
grep "That's it\." "link_chk_${site}.html" | sed "s/That's it\.//g" >> link_chk.html
echo '' >> link_chk.html
done

# Close the <pre> and the list opened in the page head, then finish the page.
echo '
</pre>
</ul>
<hr>
<a href="http://aoit.dk/redirect.html">www.AOit.dk</a>
</body>
</html>
' >> link_chk.html
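
The index page depends on the one line in each report that starts with "That's it.". As a rough sketch of how that line could be reused, the helper below pulls the summary out of a report and tells you whether it mentions any errors. The function names `report_summary` and `report_has_errors` are made up for this example, and it assumes the summary contains text of the form "N errors found", so check what your LinkChecker version actually prints before relying on it.

```shell
#!/bin/bash

# Print the text that follows "That's it." in a LinkChecker HTML report,
# with any HTML tags removed.
report_summary() {
    grep "That's it\." "$1" | sed "s/.*That's it\. *//; s/<[^>]*>//g"
}

# Succeed (exit status 0) when the summary reports at least one error.
# Assumption: the summary contains "<number> errors found".
report_has_errors() {
    report_summary "$1" | grep -Eq '(^|[^0-9])[1-9][0-9]* errors? found'
}
```

Inside the second loop one could then write, for instance, `report_has_errors "link_chk_${site}.html" && echo ' <b>needs attention</b>' >> link_chk.html` to flag a site on the index page.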

If you would like the script to upload the link check results to your website, you could add something like the following to it.

ftp -i -n << eof
open HOSTNAME
user USERNAME PASSWORD
cd PUBLIC_DIR/link_chk
mput *.html
bye
eof
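
If the host name and login are kept in variables, the same command list can be produced by a small function and inspected before it is piped into `ftp`. This is only a sketch: the function name `ftp_commands` and its argument order are invented for the example, and it assumes none of the values contain spaces.

```shell
#!/bin/bash

# Print the command list for ftp; this function itself sends nothing.
# Arguments: host, user, password, remote directory (hypothetical order).
ftp_commands() {
    local host=$1 user=$2 password=$3 remote_dir=$4
    printf 'open %s\n' "$host"
    printf 'user %s %s\n' "$user" "$password"
    printf 'cd %s\n' "$remote_dir"
    printf 'mput *.html\n'
    printf 'bye\n'
}

# Example call with the placeholders from the post:
# ftp_commands HOSTNAME USERNAME PASSWORD PUBLIC_DIR/link_chk | ftp -i -n
```

Running the function without the pipe prints the commands to the terminal, which is a cheap way to check them before anything is uploaded.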

It can take time to check all the URLs, so I also added an email notice that is sent when the run finishes.
cat link_chk.html | mail -s 'Broken links report' -aFrom:linkchecker@aoit.dk abo@aoit.dk
Now all that's left to do is to add it to the crontab, if one fancies.
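
Since a run can take a while, a cron job might start again before the previous run has finished. The sketch below guards against that with a lock directory and shows what a crontab entry could look like. The function name `run_locked`, the paths under /home/USER and the 03:15 schedule are all placeholders picked for the example.

```shell
#!/bin/bash
# Hypothetical wrapper, e.g. saved as /home/USER/bin/link_chk_cron.sh

# Run a command only if no earlier run still holds the lock directory.
# mkdir either creates the directory or fails, so two runs cannot both get it.
run_locked() {
    local lock_dir=$1
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    "$@"
    local status=$?
    rmdir "$lock_dir"
    return "$status"
}

# The wrapper script would end with a call such as:
# run_locked /tmp/link_chk.lock /home/USER/bin/link_chk.sh
#
# Example crontab entry (edit with "crontab -e"), every night at 03:15:
# 15 3 * * * /home/USER/bin/link_chk_cron.sh
```

If a run is killed before it removes the lock, the directory has to be deleted by hand before the next run will start.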
I hope this saves you from spending hours looking for the right link checker, or from spending hours writing your own, as "linkchecker" can probably do the job for you as it did for me.


The background for the posts and articles here on AOit is my more than ten years of experience operating and using the LAMP platform to solve countless tasks. It is a platform consisting of Linux, Apache, MySQL and PHP, all four of them free and open-source software. To present data as information I use HTML for formatting and CSS for layout.