They did in their sh*te

Hard-bitten data wranglers know full well that the price you pay for corralling research data into a database is a certain amount of wastage. The grinding boredom of record transcription creates tempting opportunities to doze off, take shortcuts or just make things up. Believe me, I know.

Data wrangling

So when we started work on the database transcripts of Irish General Register Office marriage and death indexes supplied to us by FamilySearch, the website of the Latter-day Saints, we were prepared for a certain amount of mistranscription. And we certainly got it. For all its size, FamilySearch essentially depends on volunteers, so allowances need to be made.

More interesting than the gory details of omission and commission (for some of which see below) is what happened to the transcripts when they were licensed to the fully commercial sites Ancestry and FindMyPast. Surely they checked the data before putting it live? They did in their sh*te.

Some examples: a FamilySearch volunteer working on the 1880s and 1890s used the “Copy down” function in Excel as a shortcut, which meant that a single mistranscription was copied down the column and multiplied. Here are six non-existent “O’Flaberty” deaths in sequence in 1888 from FamilySearch, and the same six on Ancestry and on FindMyPast. Here are the eight non-existent “M’Cood” deaths from 1892, on Ancestry and FindMyPast, and as they should be on IrishGenealogy. Here are the Earins who should be Eakins, the Gamas who should be Garas, the Hehies who should be Hehirs …

Maybe I’m being harsh about individual mistakes, but there are larger issues as well. The whole of 1886 is in the death index (and on FamilySearch and Ancestry and FindMyPast) twice. There are huge numbers of duplicate death records in 1882. Extra copies are a nuisance, to be sure, but not fatal in terms of findability. Much more serious are the gaping holes. Three index volumes for 1897 marriages are just omitted completely, almost 65% of the total. Quarter 1 of 1900 marriages is listed under 1899, making those 15,000 marriages unfindable. The same goes for Quarter 1 of 1900 deaths. And don’t look for a death in Quarter 3 of 1898 or Quarter 2 of 1903.
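None of these larger problems needs a human eye to spot. As a rough sketch only (this is not what we, FamilySearch or anyone else actually ran), a few lines of Python can count records per quarter and flag exact duplicates. The record layout, the field names and the function name `audit_index` are all assumptions made up for illustration, and the sample rows are invented, not taken from the real index.

```python
from collections import Counter
from typing import NamedTuple


class IndexEntry(NamedTuple):
    # Hypothetical layout for one index row; the real transcripts may differ.
    year: int
    quarter: int
    surname: str
    forename: str
    district: str
    volume: int
    page: int


def audit_index(entries, first_year, last_year):
    """Return (missing_quarters, duplicate_rows) for a list of index entries."""
    entries = list(entries)

    # How many rows were transcribed for each (year, quarter)?
    per_quarter = Counter((e.year, e.quarter) for e in entries)

    # Any quarter in the expected range with no rows at all is a gaping hole.
    missing = [
        (year, quarter)
        for year in range(first_year, last_year + 1)
        for quarter in range(1, 5)
        if per_quarter[(year, quarter)] == 0
    ]

    # Rows that are identical in every field and appear more than once.
    row_counts = Counter(entries)
    duplicates = {row: n for row, n in row_counts.items() if n > 1}

    return missing, duplicates


# Invented sample rows: one duplicated entry and nothing at all for Quarter 3.
sample = [
    IndexEntry(1898, 1, "Eakins", "John", "Cork", 5, 101),
    IndexEntry(1898, 2, "Hehir", "Mary", "Ennis", 4, 212),
    IndexEntry(1898, 2, "Hehir", "Mary", "Ennis", 4, 212),
    IndexEntry(1898, 4, "Gara", "Patrick", "Boyle", 3, 77),
]

missing, duplicates = audit_index(sample, 1898, 1898)
print("Missing quarters:", missing)
print("Duplicated rows:", len(duplicates))
```

A check like this would not catch a quarter filed under the wrong year, but comparing the per-quarter counts between neighbouring years would at least show one year suspiciously fat and the next suspiciously thin.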

This goes well beyond normal error rates. There seems to have been a breakdown in quality control for at least some portion of the FamilySearch digitisation process. Maybe, given the nature of FamilySearch, this could be forgiven.

What’s not forgivable is that no attempt was made by Ancestry and FindMyPast to check the data they got. The contempt for researchers is palpable.