Various family members, including my mom, have been compiling family tree information over the past several years. This information is interesting from several perspectives: It is nice to know something about one's family, and the structure of the data itself presents several interesting computer-science related questions.

There seems to be a fairly well-accepted standard for family tree data interchange. This format, called GEDCOM, can be exported by most family-tree programs. It consists of plain text with markup characters, almost like an old version of XML. (The format really should be XML, as that would greatly ease parsing. The GEDCOM standards body would then simply release an XML schema (XSD) that describes the grammar.) There is a Perl module, provided by Paul Johnson, which parses the file and provides a programming interface for dealing with the information. The data itself is straightforward, consisting of individual records and family records, with a referencing syntax for relating individuals and families (replaced with XPath in the XML world?).
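To make the format concrete, here is a minimal Python sketch of how GEDCOM's line structure and cross-references fit together. Each line carries a level number, an optional identifier such as `@I1@`, a tag, and an optional value. The sample data, the `parse_gedcom` function, and the dictionary layout are my own illustrative assumptions. This is not the interface of Paul Johnson's Perl module, and the sketch ignores deeper nesting, continuation lines, and character-set handling.

```python
import re

# level, optional @XREF@ identifier, tag, optional value
LINE_RE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def parse_gedcom(text):
    """Return {xref: {"tag": ..., "fields": [(tag, value), ...]}}.

    Simplified: only level-0 records that have an identifier are kept,
    and only their level-1 lines are collected.
    """
    records = {}
    current = None
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip blank or malformed lines
        level = int(match.group(1))
        xref, tag, value = match.group(2), match.group(3), match.group(4) or ""
        if level == 0:
            # A new top-level record starts; HEAD and TRLR have no identifier.
            current = {"tag": tag, "fields": []} if xref else None
            if xref:
                records[xref] = current
        elif level == 1 and current is not None:
            current["fields"].append((tag, value))
    return records


# Hypothetical sample data, not taken from any real family tree.
SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Mary /Jones/
1 FAMS @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
0 TRLR
"""

records = parse_gedcom(SAMPLE)
family = dict(records["@F1@"]["fields"])
# Follow the family's HUSB pointer back to the individual record.
husband_name = dict(records[family["HUSB"]]["fields"])["NAME"]
print(husband_name)
```

The family record holds only pointers (`@I1@`, `@I2@`), so relating individuals to families is a matter of dictionary lookups on those identifiers.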

So, there is a well-known and standardized data format. There are utilities to parse the data and some APIs for dealing with it programmatically. There are lots of user interfaces (in the form of commercial programs), and there are even some (Perl) CGI scripts for web-based interfaces. (It seems like many of the commercial programs also provide server space and hosting for family trees, which may be useful for some, but I simply cannot imagine surrounding my family information with advertisements or useless claptrap.)

The display of family information on the web brings up some privacy issues. Sometimes, organizations use family information (e.g. mother’s maiden name) to verify identity. (Of course, the world is foolish… Some form of public/private key pair should be used for identity verification, not something as crackable as publicly available information.) Medical history information would be very useful for the family to compile (as genetic information is more useful than incidental facts like birth dates and death dates), but would need to be kept very private. Living family members may not want that kind of information displayed.

It would seem that the privacy issues dictate that family information should not be publicly available. (It might still be posted via the internet if appropriate security measures, such as password-protected accounts, are put into place. The internet can be used in many ways, and private collaboration is an important one.) Is there any utility that can be derived from public display of family history information? It would be interesting to search trees developed by strangers, to see if there are any interconnections. Long-lost branches could be united, or at least new leads developed. The information would also be useful from a sociological point of view: With detailed and well-maintained data, and standard ways of interfacing with it programmatically, researchers would be able to study society in innovative ways.

Humanity can be thought of as a massive family tree, with many individuals and families, all interconnected. Some people are researching and documenting their “local” areas of the overall tree. How can these people work together? (Imagine that there are no privacy concerns, for a moment.) Some of the local areas overlap, and some people have developed those overlap areas better than others. Combining them is a difficult computer science problem (but tractable, I think). There need to be automatic ways of merging “local tree” information together. There need to be search utilities, naming conventions, discovery protocols, etc. The Gendex project seems to be addressing some of these issues.
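As a rough illustration of the first step in merging two “local trees,” here is a small Python sketch that looks for people who might appear in both. Everything in it is an assumption of mine: the `Person` type, the `match_key` and `find_overlaps` helpers, and the naive rule that two entries are candidates when their normalized names and birth years agree. A real tool would need fuzzier matching (spelling variants, missing dates, comparing relatives) and a human to confirm each candidate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    ident: str                       # identifier local to one tree
    name: str                        # may use GEDCOM-style /Surname/ slashes
    birth_year: Optional[int] = None


def match_key(person):
    """Naive matching key: lowercased name without slashes, plus birth year."""
    normalized = " ".join(person.name.replace("/", " ").lower().split())
    return (normalized, person.birth_year)


def find_overlaps(tree_a, tree_b):
    """Return (id_in_a, id_in_b) pairs that share a matching key.

    People without a birth year are skipped, since a name alone is
    too weak to suggest a match.
    """
    index = {}
    for person in tree_a:
        if person.birth_year is not None:
            index.setdefault(match_key(person), []).append(person)
    pairs = []
    for other in tree_b:
        if other.birth_year is None:
            continue
        for person in index.get(match_key(other), []):
            pairs.append((person.ident, other.ident))
    return pairs


# Hypothetical data for two independently researched trees.
tree_a = [Person("A1", "John /Smith/", 1850), Person("A2", "Mary /Jones/", 1855)]
tree_b = [Person("B7", "john smith", 1850), Person("B8", "Mary Jones", 1901)]

candidates = find_overlaps(tree_a, tree_b)
print(candidates)
```

Each candidate pair is only a lead for someone to check, which is about as far as an automatic first pass can reasonably go.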

I might try playing around with some of these concepts. At the least, I will set up some private accounts, and use Paul Johnson's Perl CGI scripts with the family tree data that my mom compiled.


