They did in their sh*te

Hard-bitten data wranglers know full well that the price you pay for corralling research data into a database is a certain amount of wastage. The grinding boredom of record transcription creates tempting opportunities to doze off, take shortcuts or just make things up. Believe me, I know.

Data wrangling

So when we started work on the database transcripts of Irish General Register Office marriage and death indexes supplied to us by FamilySearch, the website of the Latter-day Saints, we were prepared for a certain amount of mistranscription. And we certainly got it. For all its size, FamilySearch essentially depends on volunteers, so allowances need to be made.

More interesting than the gory details of omission and commission (for some of which see below) is what happened to the transcripts when they were licensed to the fully commercial sites Ancestry and FindMyPast. Surely they checked the data before putting it live? They did in their sh*te.

Some examples: a FamilySearch volunteer transcribing the 1880s and 1890s indexes used the “Copy down” function in Excel as a shortcut, meaning that they copied down their mis-transcriptions and spread them around. Here are six non-existent “O’Flaberty” deaths from FamilySearch in sequence in 1888, on Ancestry and on FindMyPast. Here are the eight non-existent “M’Cood” deaths from 1892 on Ancestry and FindMyPast, and as they should be on IrishGenealogy. Here are the Earins who should be Eakins, the Gamas who should be Garas, the Hehies who should be Hehirs …

Maybe I’m being harsh about individual mistakes, but there are larger issues as well. The whole of 1886 is in the death index (and on FamilySearch, Ancestry and FindMyPast) twice. There are huge numbers of duplicate death records in 1882. Extra copies are a nuisance, to be sure, but not fatal in terms of findability. Much more serious are the gaping holes. Three index volumes for 1897 marriages, almost 65% of the total, are omitted completely. Quarter 1 of 1900 marriages is listed under 1899, making those 15,000 marriages unfindable. The same goes for Quarter 1 of 1900 deaths. And don’t look for a death in Quarter 3 of 1898 or Quarter 2 of 1903.
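Structural faults like these are probably the kind that simple counting would catch. As a rough illustration only, here is a minimal Python sketch, assuming the index is held as one row per entry with year and quarter fields. The `IndexEntry` layout, the function names and the 0.5/1.5 thresholds are hypothetical, not anything used by FamilySearch, Ancestry or FindMyPast. It flags exact duplicate rows and any quarter that is empty or far from the median quarter size.

```python
from collections import Counter
from statistics import median
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """One row of a quarterly civil-registration index (hypothetical layout)."""
    surname: str
    forename: str
    district: str
    year: int
    quarter: int  # 1-4
    volume: int
    page: int


def exact_duplicates(entries):
    """Return {entry: count} for every entry that appears more than once."""
    counts = Counter(entries)
    return {entry: n for entry, n in counts.items() if n > 1}


def quarter_anomalies(entries, first_year, last_year, low=0.5, high=1.5):
    """Flag quarters that are empty, or much smaller or larger than usual.

    "Usual" is the median size of the non-empty quarters in the range; the
    low/high multipliers are arbitrary illustrative thresholds.
    Returns {(year, quarter): reason}.
    """
    counts = Counter((e.year, e.quarter) for e in entries)
    expected = [(y, q) for y in range(first_year, last_year + 1) for q in range(1, 5)]
    sizes = [counts[key] for key in expected if counts[key] > 0]
    typical = median(sizes) if sizes else 0
    flags = {}
    for key in expected:
        n = counts[key]  # Counter returns 0 for quarters with no entries
        if n == 0:
            flags[key] = "empty"
        elif n < low * typical:
            flags[key] = f"only {n} entries (median {typical})"
        elif n > high * typical:
            flags[key] = f"{n} entries (median {typical})"
    return flags
```

On a duplicated year, all four quarters should come out at roughly double the median; a quarter filed under the wrong year should show up as an empty quarter next to an oversized one. The thresholds are deliberately loose to allow for seasonal variation in deaths, and anything flagged would still need a human to check it against the original index volumes.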

This goes well beyond normal error rates. There seems to have been a breakdown in quality control for at least some portion of the FamilySearch digitisation process. Maybe, given the nature of FamilySearch, this could be forgiven.

What’s not forgivable is that no attempt was made by Ancestry and FindMyPast to check the data they got. The contempt for researchers is palpable.

21 thoughts on “They did in their sh*te”

  1. Sloth exists across the human species, in the paid staffer and the eager volunteer alike. Fortunately there are a variety of ways to tweak a database search and hit pay dirt. No system is perfect. Dare I say that there are omissions in the original records which, since they were missed at the time of the events, are lost forever. The pastors and civil clerks were not immune to mess-ups.

    1. Nor doctors when filling out death certificates! Edward became “Edwin”. “Edwin’s” wife’s maiden surname was entered as “Through”, a non-existent Irish surname! I searched for “Threw” and “Trew”, and found the latter in Griffith’s Valuation, as well as in the 1851 Irish Census for County Antrim. Looking at the original entries, it sure looked like “Frew”, and that is the correct surname, based on DNA matches with living Frews in County Antrim as well as a Scottish death certificate for one of Edward’s sisters.

      It would, of course, be more helpful if death certificates identified who provided the information to the doctor!

  2. Well this at least makes the dull pain from constantly hitting my head against the brick wall a bit more understandable. Alas.

  3. The “copy down” feature certainly is a problem. I was looking in Family Search for a church record in Ontario that I had previously found on microfilm. The film is now digitized and online, but when I brought up its images, the index repeated the same date for at least seven years of records.

  4. Ah, I’m suddenly a lot more glad that I tend to use Irish Genealogy and Roots Ireland directly rather than FMP or Ancestry, since their transcriptions *tend* to be less actively awful in my somewhat limited experience. It also explains some of the troubles I had when I did try to use Ancestry, so this has been a very enlightening read!

  5. Given that anyone can add to others’ trees on Family Search, because they believe we are all one tree, it is understandable that errors have been made over and over. If these were copied from many years back, sadly that could be part of the problem. Believe me, I have first-hand experience of ending up with a paternal line completely messed up, and it was not easy to have it deleted either.

  6. Human error has always been a problem with data entry. In business, a common approach was to enter the data twice using different people, and compare the results, correcting any discrepancies. But no genealogy organisation can afford those costs. What annoys me is when obvious errors are not corrected – such as thousands of Irish census records where first and last names were transposed on the form, and then transcribed without correction, making them unsearchable. And in the case of the records you have mentioned, a number of automated checks could certainly have been done to pick up major errors or omissions.

  7. Hi John,
    What is the project you are working on that has you looking at the transcripts? Is it for FamilySearch itself? And will these errors be “fixed” and the updated indexes accessible at some later date? And if so, where?

    Many thanks for your ongoing work to make Irish family research easier and resources more accessible and more accurate. Over the years of my own research I have consistently benefited from the work you do. I’m very grateful to you.

  8. Being a Mackey myself, I can happily add a surname variant to this list of Ulster Scots and Tipperary Celts: the phonetically similar McKeen/McKeans also get jumbled in at times with my Mackey kin, at least in the Burren and north-west Clare. My 3rd great-grandfather Michael Mackey, a hedge-school teacher in Carran from 1824 until the national schools in 1830, would not be pleased. His son, though, made the civil registration agent in Tasmania strike out the Scottish misspelling in his first son’s entry and put it back to Mackey. The joys of being a Mackey, in all its variants! Regards, Ralph Mackey

  9. I only found my great-grandfather’s baptism record by paging through the microfilm on irishgenealogy.ie. Ancestry had indexed the entire book for that parish in Donaghmore, Tyrone, as pertaining to Donaghmore, Donegal, if that even exists. But every time I searched on their site for my ancestor in County Tyrone, their mistake hid him from me.

    1. Ancestry has mistakenly labeled parish records from Ballina Cathedral in Ballina, County Mayo as being from Ballina in County Tipperary. When I contacted them about this, they seemed unconcerned.

  10. It’s like the “repeats” in the Tithe Applotment records: you try to come up with a plausible list of people with the same name to hunt down, and find that of your list of 126 or so, there are maybe only 95 separate individuals…

  11. Around 2015 FamilySearch changed the way their volunteer-driven FamilySearch Indexing program worked, in my opinion for the worse.

    Previously, two indexers transcribed each document/record. An arbitrator compared their transcriptions and made corrections. Arbitrators were appointed by FamilySearch and usually had indexed thousands of records before being appointed (I think I had done about 5,000 records when I was invited to become an arbitrator). Indexers could see the results of the arbitration, and thus learn from their mistakes.

    Now, in most projects, only one indexer transcribes the records. A reviewer, who attains that status automatically after indexing 1,000 records (across all projects), checks their work and makes corrections. If more than a certain percentage of fields are changed by the reviewer, the record is reviewed again. Indexers cannot see the results of reviewing.

  12. I have found that while Family Search is often unreliable, it does turn up ancestors not contained in the commercial databases, and although the transcriptions are inaccurate you can sort of work out what the names are. They had a Jonet Mc Lum; if you use other names in the area, you get Mc Callum. Or a John Mc Umygh married to a Gibsone: by knowing the real name of the John Mc Umygh who married her, I found a whole list of descendants not recorded elsewhere. Another well-known database, with an important link to the family, recorded the forename as Mcby; good thing I had other records of the parish or I would have missed it.
    If you are at a dead end, use FS; you never know what will pop up.

  13. The FamilySearch “system” for volunteer transcription also involves a person (or persons) at the next level who is/are supposed to proofread the transcriptions and correct the errors. Apparently this step is being skipped in the interest of speed, to get the record sets published online, or else they have changed their process since I did some transcriptions of US record sets in the past.

  14. I hit this issue in a big way trying to find ancestors in the Tithe Applotment Books, also transcribed by the volunteers in Utah. Not only were some names mutilated (I submitted corrections for many names I saw in passing that included a letter “z”), but townlands were incorrectly identified. From the names at the tops of the individual page images, I could see that one group belonged in Co. Tipperary, not Clare at all, and the already confusing set of Clare townlands called Kilmurry (there are three) was seriously mixed up.
    Happily, though, Clare Library has a set of corrected Co. Clare entries for the TAP on their website. To my mind, that’s the fix.
    First, couldn’t every county in Ireland undertake this sort of project, through a county library or otherwise?
    And second, I believe the only people who should do transcriptions are Irish, and preferably local. Spread out over time, with production expectations realistic, this is the only way to clean up the records in a landscape whose multi-locational place names are truly difficult to sort out for outsiders, in addition to a variety of unusual surnames and surname spellings. No other way.

  15. As a computer programmer, I can see that this is one area where AI will always prove problematic, if there were to be an automated attempt to find these issues. It’s much more obvious to the human eye that someone’s name is just “wrong”. Further, as Deirdre has noted, even if a computer were able to identify a lot of the issues, it would really need a native Irish person to recognise that you just don’t get certain surnames as they have been transcribed – notwithstanding census or register takers writing down names as they heard them uttered in “foreign” (Irish) accents.

  16. Yes, the transcriptions can be laughable. The missing records are heartbreakers. But at least Family Search is free, and the search feature (once you figure it out) is pretty good – better than Ancestry’s. I can’t wait for your take on Find My Past’s treatment of the 1921 English Census. You’ve been surprisingly quiet. I think they have done a terrible job and are ripping people off – $179 doesn’t even get you the actual census, it just allows you to search it. The search is clunky and not intuitive. Such an important source, and the English government flubbed it.
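The double-entry approach described in comments 6 and 11 (two people key the same records independently, and an arbitrator resolves every point where they differ) is simple to automate on the comparison side. A minimal Python sketch, assuming each keying is a list of records in the same order, stored as dicts of field name to value; `compare_keyings`, `normalise` and the record layout are hypothetical illustrations, not FamilySearch's actual system.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]  # field name -> transcribed value (hypothetical layout)


def normalise(value: str) -> str:
    """Collapse runs of whitespace so stray spaces are not counted as disagreements."""
    return " ".join(value.split())


def compare_keyings(first: List[Record], second: List[Record]) -> List[Tuple[int, str, str, str]]:
    """Compare two independent keyings of the same batch of records.

    Returns (record index, field, first value, second value) for every
    disagreement, i.e. the cases an arbitrator would check against the image.
    """
    if len(first) != len(second):
        raise ValueError("both keyings must cover the same records")
    disagreements = []
    for index, (a, b) in enumerate(zip(first, second)):
        # Compare every field present in either keying; a missing field counts as "".
        for field in sorted(a.keys() | b.keys()):
            va, vb = a.get(field, ""), b.get(field, "")
            if normalise(va) != normalise(vb):
                disagreements.append((index, field, va, vb))
    return disagreements
```

For example, if one indexer keys “Eakin” and the other “Earin” for the same entry, that field is sent to the arbitrator, while a trailing space after a forename is not. Case differences are deliberately not ignored here, since capitalisation can matter in surnames such as M’Cood.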
