Offline backup mediawiki with httrack

I recently needed to restore the contents of a wiki that ran MediaWiki. Unfortunately there were no backups, and my only option was to restore from an outdated version that was available in Google’s cache.

The problem was that I only had the HTML “output” version, and copy-pasting it into the wiki sources at restore time lost all formatting and links.

So I came up with the following script, which is run from cron to make systematic backups in the background: both an offline-viewable version of the wiki, as static HTML pages, and the wiki pages’ sources, for eventual restoration.

It uses the marvelous httrack and wget tools.

Here we go:

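#! /bin/sh

# Host name of the wiki, and the URL it is served from
site=wiki.my.site
topurl=http://$site

# Directory that receives the mirror
backupdir=/home/me/backup-websites/$site

# Mirror the wiki as static HTML, starting from the list of all pages.
# Filters starting with "-" exclude matching URLs, those starting with "+" include them.
httrack -%i -w "$topurl/index.php/Special:Allpages" \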
-O "$backupdir" -%P -N0 -s0 -p7 -S -a -K0 -%k -A25000 \
-F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F '' \
-%s -x -%x -%u \
"-$site/index.php/Special:*" \
"-$site/index.php?title=Special:*" \
"+$site/index.php/Special:Recentchanges" \
"-$site/index.php/Utilisateur:*" \
"-$site/index.php/Discussion_Utilisateur:*" \
"-$site/index.php/Aide:*" \
"+*.css" \
"-$site/index.php?title=*&oldid=*" \
"-$site/index.php?title=*&action=edit" \
"-$site/index.php?title=*&curid=*" \
"+$site/index.php?title=*&action=history" \
"-$site/index.php?title=*&action=history&*" \
"-$site/index.php?title=*&curid=*&action=history*" \
"-$site/index.php?title=*&limit=*&action=history"

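# Fetch the wikitext source of every mirrored page (except Special: pages),
# using the page names that httrack recorded in its log.
for page in $(grep "link updated: $site/index.php/" "$backupdir/hts-log.txt" | sed "s,^.*link updated: $site/index.php/,," | sed 's/ ->.*//' | grep -v Special:)
do
    wget -nv -O "$backupdir/$site/index.php/${page}_raw.txt" "$topurl/index.php?title=$page&action=raw"
done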
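
To see what the page-name extraction in the loop produces, here is a small sketch that runs the same grep/sed pipeline on a few sample log lines. The sample lines are made up for illustration: their layout is only assumed from the patterns the script searches for, and the real hts-log.txt may look different. The extract_pages function name is also just a helper for this example.

```shell
#!/bin/sh
# Sketch: run the page-name extraction pipeline on sample input.
# The log lines below are invented; the real hts-log.txt format is
# assumed from the patterns used in the backup script.

site=wiki.my.site

# Read log lines on stdin, print one wiki page name per line.
extract_pages() {
    grep "link updated: $site/index.php/" \
        | sed "s,^.*link updated: $site/index.php/,," \
        | sed 's/ ->.*//' \
        | grep -v 'Special:'
}

sample_log='12:00:01 Info: link updated: wiki.my.site/index.php/Main_Page -> wiki.my.site/index.php/Main_Page.html
12:00:02 Info: link updated: wiki.my.site/index.php/Special:Recentchanges -> wiki.my.site/index.php/Special_Recentchanges.html
12:00:03 Info: link updated: wiki.my.site/index.php/Install_Guide -> wiki.my.site/index.php/Install_Guide.html'

pages=$(printf '%s\n' "$sample_log" | extract_pages)
printf '%s\n' "$pages"
```

With these sample lines, the Special: page is dropped and only Main_Page and Install_Guide remain, one per line. Those are the names the loop would then hand to wget.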

Hope this helps,

4 thoughts on “Offline backup mediawiki with httrack”

  1. Hi Olivier,

    Thanks for sharing your script. It helps me towards a solution for an offline wiki.
    The only problem I have is that the files attached in the wiki are not available. They are just exported as FILENAME.html.

    I just want to create an offline know-how database for our field engineers, to be used on their Android tablets.

    Do you have an idea? What could I change in the script?

    Kind Regards,

    Robert
