I went on the Guinness

You may have noticed the absence of a post last week. I’d like to be able to say that I was deep in my vast library thinking profound thoughts about the future of Irish genealogy. But no, the sun came out so I took the dog to the beach.

I wasn’t completely idle, though. I went on the Guinness.

At a professional development presentation organised by my accrediting association, Accredited Genealogists Ireland, and delivered by Guinness archivists Alex Marcus and Fergus Brady. It was a revelation.

Like most researchers, I’ve long been aware of the archive’s online catalogue (also available on Ancestry) and have found it interesting but a bit scanty. With 8,697 entries, it is obviously not a listing of all employees, and each result leads only to a very bare-bones record.

The archive itself is something else entirely. It holds more than 20,000 personnel files relating to deceased employees. For the majority of these, tradesmen or labourers, the files can include dates of birth, addresses, references, education, accident reports, changes of job, pension forms, medical information, physical descriptions, widows’ and orphans’ allowances, details of wives and children, even birth and baptism certificates. For the expanded family history of anyone who worked in Guinness’s in Dublin after the 1870s, these records are a goldmine.

A week after the Easter Rising, this discreetly redacted 46-year-old drayman dropped a barrel on his foot and bruised his big toe.

One of the most interesting parts of the presentation was the explanation of the reasons behind the lack of online detail. Guinness was a spectacularly benevolent cradle-to-grave employer. Once a family got a foothold, they tended to bring in other members, with children, grandchildren and great-grandnephews following each other into the firm (see the remarkably guff-free Cillian-Murphy-narrated ad on this). So even the files of those who died a century ago might hold sensitive information relating to living people.

But there are no Germanic GDPR-style prohibitions. The archivists just check the files you’re interested in and apply good old-fashioned common sense. They’ll do this by email, but there’s no substitute for handling the files onsite yourself, and they welcome visitors (by appointment).

Apart from the files, the archive also holds a spectacular digitised set of the in-house publications, in particular The Guinness Harp Magazine, which documents the extraordinary range of extracurricular activities sponsored by the company, from choirs to chess clubs to tug-of-war teams, complete with professional-quality journalism and photography.

Brewery People from The Harp magazine

And there’s also a lovely collection of the customisable labels used to reassure drinkers that the pub-bottled stout they were drinking was genuine Guinness.

‘WHO SELLS NO OTHER BROWN STOUT IN BOTTLE’

My lifetime supply can be parked just outside the house, please.

Genealogy and the Golden State serial killer

Most people will have heard about the recent arrest of the so-called Golden State serial killer, responsible for 50 rapes and 12 murders committed across California in the 1970s and 80s. He was identified by matching his DNA with samples taken from the crime scene, as in many other cases. What really caught my attention was the role genealogy played.

Police uploaded the crime-scene DNA to GEDmatch, the open-access genealogical DNA website, and compared it with the three-quarters of a million samples already on the site. This identified what looked like some third- and fourth-cousin matches. They then followed the accompanying family history information on the site and used it to build out more than twenty-five multigenerational family trees, using the research techniques any genealogist would.

Unlike a genealogist, though, they were aiming to winnow the family trees down to men living in the right areas at the times the crimes were committed. This is what ultimately led them to Joseph James DeAngelo in Sacramento. A covertly taken sample of his DNA matched that from the crime scene.

Joseph James DeAngelo

By all accounts the crimes were horrific, and if DeAngelo committed them, it’s wonderful he has been caught, and equally wonderful that genealogy helped do it.

But one of the problems that bedevils familial DNA matching is the number of false positives. My own test results are up on GEDmatch and I regularly get approached by people who see a distant match but have absolutely no discernible connection to my family. And that happened in this case. Before finding DeAngelo, at least one and almost certainly more than one suspect was compulsorily tested and ruled out. In the United Kingdom, a 2014 study found that just 17 percent of familial DNA searches resulted in the identification of a relative of the true offender. If 17 searches in every hundred hit the mark, 83 don’t, which means roughly five innocent suspects for every guilty one.

And that’s not the most troubling part of the story. By putting my sample into a publicly available database like GEDmatch, I have made the DNA of hundreds of related people also available for searching. I consented, they didn’t. The same is true for the tens of millions connected to the other GEDmatch profiles.

Finding a serial killer is a wonderful, exceptional case. But what about a health insurance company using these records to weed out the potentially ill across extended families? What about being flagged as a suspect for a crime you have no connection with, because someone very distantly related has uploaded their DNA? What about employers screening for the “right” kind of family background?

As evidence in the most serious crimes, familial DNA should certainly continue to be available to police, with safeguards. But the kind of no-rules access to DNA samples provided by GEDmatch looks increasingly unwise.

I don’t know whether to leave my sample there or not. I suspect that cat is long out of the bag.

FindMyPast’s unmarked elephant traps

Last March I wrote about the fresh transcripts of historic General Register Office birth and marriage records that had appeared on FindMyPast and issued my usual cheery “Come on in and thrash around, sure aren’t there mistakes in everything?” Now that I’ve been using them for a while, it’s time to advise a little more caution.

Apart from the peculiar transcription errors pointed out by Claire Santry, there also seem to be great gaping holes in the records. For the births, there appear to be no transcripts at all from the following Superintendent Registrars’ Districts:

Clogher, Clones, Clonmel, Coleraine, Cookstown, Cootehill, Croom, Dungannon, Dunshaughlin, Edenderry, Ennis, Enniscorthy, Ennistymon, Fermoy, Glin, Kildysart, Killala, Kilmallock, Kilrush, Limavady, Lisburn, Londonderry, Macroom, Magherafelt, Mallow, Manorhamilton, Midleton, Mitchelstown, Mohill, Mountmellick, Newcastle West, Omagh, Oughterard, Portumna, Rathdown, Rathdrum, Rathkeale, Scarriff, Schull, Skibbereen, Sligo, Tralee.

That’s 43 out of a total of 163, a whopping 26%. I haven’t actually checked every image on IrishGenealogy.ie against the transcripts – hey, I have to take the dog out for a walk occasionally – but none of these SRDs appear in the FindMyPast filters and any record from these areas that I’ve checked on IrishGenealogy is missing in the fresh transcript.
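For anyone who wants to repeat the check without eyeballing two long lists, the comparison boils down to a set difference. What follows is only a rough Python sketch: the function names and the short sample lists are made up for illustration, and the real lists would have to be copied by hand from the IrishGenealogy.ie district listing and the FindMyPast filter.

```python
def normalise(name):
    """Collapse stray whitespace and ignore case so 'Galway ' matches 'galway'."""
    return " ".join(name.split()).casefold()


def missing_districts(all_districts, filter_districts):
    """Return, in alphabetical order, the districts absent from the filter list."""
    present = {normalise(d) for d in filter_districts}
    return sorted(d for d in set(all_districts) if normalise(d) not in present)


def share_missing(missing_count, total_count):
    """Percentage of districts missing, rounded to the nearest whole number."""
    return round(100 * missing_count / total_count)


# Placeholder lists only, not real data from either site.
all_srds = ["Ennis", "Sligo", "Tralee", "Dublin South", "Galway"]
site_filter = ["dublin south", "Galway "]

print(missing_districts(all_srds, site_filter))  # ['Ennis', 'Sligo', 'Tralee']
print(share_missing(43, 163))                    # 26
```

Normalising the names first matters more than it looks, since a trailing space or a difference in capitalisation would otherwise show up as a phantom missing district.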

The marriages seem to be relatively less flawed, missing only the SRDs of Dublin North, Ennistymon, Gortin and Strokestown. Relatively less flawed. Dublin North alone has 54,297 images on IrishGenealogy not transcribed here.

Every set of transcripts has its flaws and there’s nothing wrong with putting up partial datasets. But there is something very wrong about putting up partial datasets without any indication of what’s missing. It’s the equivalent of blindfolding users and having them cross a landscape riddled with giant unmarked traps.

What’s all the more peculiar is that one of FindMyPast’s signal virtues has long been the detailed information it supplies about its sources. It’s possible that the mistake is with the site’s coding, not the transcription. I just don’t know. But as things stand, some serious background detail (and a health warning) is needed.