MyFamily Archive
Last Updated: 16-May-2008 09:08:57
$Log: how.php $
Revision 1.1 2008/05/16 13:08:57 tc
Initial revision
This is a short description of how I tackled archiving a MyFamily.com site.
The archive itself is an almost flat store of static files. The mapping of URIs from the original MyFamily site to names in the archive is maintained in a database:
CREATE DATABASE `MyFamily`;
USE `MyFamily`;
CREATE TABLE `Uris`
  (`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
   `sUri` VARCHAR(767) NOT NULL UNIQUE KEY,
   `sAttribute` VARCHAR(255) NOT NULL,
   `sTag` VARCHAR(255) NOT NULL,
   `sContext` VARCHAR(255) NOT NULL,
   `idUriSubstitute` INT UNSIGNED NOT NULL);
CREATE TABLE `UriSubstitutes`
  (`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
   `sUri` VARCHAR(767) NOT NULL UNIQUE KEY,
   `vCollected` BOOLEAN NOT NULL,
   `vConverted` BOOLEAN NOT NULL,
   `sMD5Original` CHAR(22) DEFAULT NULL,
   `iLengthOriginal` INT UNSIGNED DEFAULT NULL,
   `sMD5Converted` CHAR(22) DEFAULT NULL,
   `iLengthConverted` INT UNSIGNED DEFAULT NULL);
The UriSubstitutes table is seeded with two special substitutes. The first, with id 1 and an empty sUri, is the "unassigned" substitute.
INSERT INTO `UriSubstitutes` (`id`, `sUri`, `vCollected`, `vConverted`)
VALUES (1, '', 0, 0), (2, '-', 0, 0);
The Uris table is seeded with the front page of the MyFamily site, which is mapped to the unassigned substitute.
INSERT INTO `Uris` (`sUri`, `sAttribute`, `sTag`, `sContext`, `idUriSubstitute`)
VALUES ('http://www.myfamily.com/isapi.dll?c=site&htx=main&MemberID=______&SiteID=QAR',
'href', 'a', 'Login Page', 1);
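As a rough illustration of how the two tables relate, here is a minimal sketch that rebuilds the schema and seed rows in an in-memory database and lists the URIs still mapped to the unassigned substitute. It is not part of the original tool set: Python with SQLite stands in for Perl with MySQL so that the example runs without a server, the column types are loosened to SQLite's, and the names build_demo_db, unassigned_uris, FRONT_PAGE and UNASSIGNED_ID are hypothetical.

```python
import sqlite3

# The seed URI from the text above, placeholders left exactly as published.
FRONT_PAGE = ("http://www.myfamily.com/isapi.dll"
              "?c=site&htx=main&MemberID=______&SiteID=QAR")

# Assumption: id 1 (empty sUri) is the "unassigned" substitute, as the text says.
UNASSIGNED_ID = 1


def build_demo_db():
    """Create an in-memory SQLite approximation of the MyFamily database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Uris (
            id              INTEGER PRIMARY KEY AUTOINCREMENT,
            sUri            TEXT NOT NULL UNIQUE,
            sAttribute      TEXT NOT NULL,
            sTag            TEXT NOT NULL,
            sContext        TEXT NOT NULL,
            idUriSubstitute INTEGER NOT NULL
        );
        CREATE TABLE UriSubstitutes (
            id               INTEGER PRIMARY KEY AUTOINCREMENT,
            sUri             TEXT NOT NULL UNIQUE,
            vCollected       INTEGER NOT NULL,
            vConverted       INTEGER NOT NULL,
            sMD5Original     TEXT DEFAULT NULL,
            iLengthOriginal  INTEGER DEFAULT NULL,
            sMD5Converted    TEXT DEFAULT NULL,
            iLengthConverted INTEGER DEFAULT NULL
        );
        INSERT INTO UriSubstitutes (id, sUri, vCollected, vConverted)
        VALUES (1, '', 0, 0), (2, '-', 0, 0);
    """)
    # Seed the front page, mapped to the unassigned substitute.
    db.execute(
        "INSERT INTO Uris (sUri, sAttribute, sTag, sContext, idUriSubstitute) "
        "VALUES (?, ?, ?, ?, ?)",
        (FRONT_PAGE, "href", "a", "Login Page", UNASSIGNED_ID),
    )
    db.commit()
    return db


def unassigned_uris(db):
    """Return the original URIs that have no archive name assigned yet."""
    rows = db.execute(
        "SELECT sUri FROM Uris WHERE idUriSubstitute = ? ORDER BY id",
        (UNASSIGNED_ID,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling unassigned_uris(build_demo_db()) on the freshly seeded database returns a one-element list holding the front page URI, since nothing has been assigned a substitute yet.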
The database is populated and maintained by a number of Perl programs including:
The archive is built by running a number of iterations of (Assign, Synchronize, Cull x2, Collect) and then, finally, Convert.
In the course of this project, a number of problems were encountered. While my own mistakes consumed their fair share of time, by far the largest time sinks were associated with the MyFamily site itself. Of note were these:
Here is one program that demonstrates usage of the database.
#!/usr/bin/perl -w
# $Id: how.php 1.1 2008/05/16 13:08:57 tc Exp tc $
use strict;
use lib "lib";
use DBI;
use Common;
use tc;
# --------------------------------------------- main ---------------------------------------------
my $hDB;
my $pRows;
my $pFields;
my $id;
my $sUri;
my $sAttribute;
my $sTag;
my $sContext;
my $idUriSubstitute;
my $sUriSubstitute;
my $vCollected;
my $vConverted;
my $sMD5Original;
my $iLengthOriginal;
my $sMD5Converted;
my $iLengthConverted;
my $i;
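# Connect with DBI's automatic error and warning printing switched off (0);
# every failure is reported explicitly through die instead.
$hDB = DBI->connect ("dbi:mysql:MyFamily", "localuser", "", {PrintError => 0, PrintWarn => 0})
  || die ("Failed to open MyFamily Database - $DBI::errstr");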
$hDB->begin_work () || die ("Begin_Work failed - " . $DBI::errstr);
# Join every original uri to its substitute. The table name is spelled `UriSubstitutes`,
# exactly as created, because MySQL table names are case sensitive on some platforms.
$pRows = $hDB->selectall_arrayref
  ( 'SELECT `Uris`.`id`, `Uris`.`sUri`, `sAttribute`, `sTag`, `sContext`, `idUriSubstitute`, '
  . '`UriSubstitutes`.`sUri`, `vCollected`, `vConverted`, `sMD5Original`, `iLengthOriginal`, '
  . '`sMD5Converted`, `iLengthConverted` '
  . 'FROM `UriSubstitutes` INNER JOIN `Uris` '
  . 'ON (`idUriSubstitute` = `UriSubstitutes`.`id`)')
  || die ("Collection Select failed - $DBI::errstr");
# Print one line per joined row.
foreach $pFields (@{$pRows})
  {# Unpack the row into named variables to document the column order.
   ($id, $sUri, $sAttribute, $sTag, $sContext, $idUriSubstitute, $sUriSubstitute, $vCollected, $vConverted,
    $sMD5Original, $iLengthOriginal, $sMD5Converted, $iLengthConverted) = @{$pFields};
   # SQL NULLs arrive as undef; show them as the string "NULL".
   for ($i = 0; $i < scalar (@{$pFields}); $i++)
     {if (!defined (${$pFields} [$i]))
        {${$pFields} [$i] = "NULL";};};
   print "\n", join (" - ", @{$pFields});};
print "\n\nTotal records: " . scalar (@{$pRows});
$hDB->commit () || die ("Commit failed - " . $DBI::errstr);
$hDB->disconnect || die ("Disconnect failed - " . $DBI::errstr);
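The sMD5Original and sMD5Converted columns printed by the program above are declared CHAR(22), which happens to be the length of an MD5 digest in base64 with the trailing "==" padding dropped (the form produced by md5_base64 in Perl's Digest::MD5). Whether the archive programs computed the values that way is an assumption. The sketch below, in Python rather than Perl and using the hypothetical helper name fingerprint, shows one way such a digest and length pair could be derived from a collected file's bytes.

```python
import base64
import hashlib


def fingerprint(data: bytes):
    """Return (22-character unpadded base64 MD5, length in bytes) for data.

    Assumption: this mirrors how sMD5Original / iLengthOriginal (and the
    Converted counterparts) might be filled; the source does not say.
    """
    digest = hashlib.md5(data).digest()            # 16 raw bytes
    md5_b64 = base64.b64encode(digest).decode("ascii").rstrip("=")
    return md5_b64, len(data)
```

Sixteen digest bytes always encode to 24 base64 characters, two of which are padding, so the stored value is always exactly 22 characters whatever the size of the file.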
Contents copyright © 1999-2010 Terrance R. Cassidy, Merrimack, New Hampshire, USA - all rights reserved.