Genealogy and the Golden State serial killer

Most people will have heard about the recent arrest of the so-called Golden State serial killer, responsible for 50 rapes and 12 murders committed across California in the 1970s and 80s. He was identified by matching his DNA with samples taken from the crime scene, as in many other cases. What really caught my attention was the role genealogy played.

Police uploaded the crime-scene DNA to GEDmatch, the open-access genealogical DNA website, and compared it with the three-quarters of a million samples already on the site. This identified what looked like some third- and fourth-cousin matches. They then followed the accompanying family history information on the site and used it to build out more than twenty-five multigenerational family trees, using the research techniques any genealogist would.

Unlike a genealogist, though, they were aiming to winnow the family trees down to men living in the right areas at the times the crimes were committed. This is what ultimately led them to Joseph James DeAngelo in Sacramento. A covertly taken sample of his DNA matched that from the crime scene.

Joseph James DeAngelo

By all accounts the crimes were horrific, and if DeAngelo committed them, it’s wonderful he has been caught, and equally wonderful that genealogy helped do it.

But one of the problems that bedevils familial DNA matching is the number of false positives. My own test results are up on GEDmatch and I regularly get approached by people who see a distant match but have absolutely no discernible connection to my family. And that happened in this case. Before finding DeAngelo, at least one and almost certainly more than one suspect was compulsorily tested and ruled out. In the United Kingdom, a 2014 study found that just 17 percent of familial DNA searches resulted in the identification of a relative of the true offender. Which means nearly five innocent suspects for every guilty one.
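For anyone who wants to check the arithmetic behind that ratio, here is a minimal sketch. It assumes, as a simplification, that every search which fails to identify a relative of the offender produces one innocent lead. The function name is my own invention for illustration.

```python
def failures_per_success(success_rate: float) -> float:
    """Return how many unsuccessful searches there are per successful one."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be greater than 0 and at most 1")
    return (1 - success_rate) / success_rate


# 17 percent is the success rate quoted from the 2014 UK study.
ratio = failures_per_success(0.17)
print(f"{ratio:.1f} unsuccessful searches per successful one")
```

With a 17 percent success rate, the remaining 83 percent of searches miss, and 83 divided by 17 comes to a little under five.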

And that’s not the most troubling part of the story. By putting my sample into a publicly available database like GEDmatch, I have made the DNA of hundreds of related people also available for searching. I consented, they didn’t. The same is true for the tens of millions connected to the other GEDmatch profiles.

Finding a serial killer is a wonderful, exceptional case. But what about a health insurance company using these records to weed out the potentially ill across extended families? What about being flagged as a suspect for a crime you have no connection with, because someone very distantly related has uploaded their DNA? What about employers screening for the “right” kind of family background?

As evidence in the most serious crimes, familial DNA should certainly continue to be available to police, with safeguards. But the kind of no-rules access to DNA samples provided by GEDmatch looks increasingly unwise.

I don’t know whether to leave my sample there or not. I suspect that cat is long out of the bag.


FindMyPast’s unmarked elephant traps

Last March I wrote about the fresh transcripts of historic General Register Office birth and marriage records that had appeared on FindMyPast and issued my usual cheery “Come on in and thrash around, sure aren’t there mistakes in everything?” Now that I’ve been using them for a while, it’s time to advise a little more caution.

Apart from the peculiar transcription errors pointed out by Claire Santry, there also seem to be great gaping holes in the records. For the births, there appear to be no transcripts at all from the following Superintendent Registrars’ Districts:

Clogher, Clones, Clonmel, Coleraine, Cookstown, Cootehill, Croom, Dungannon, Dunshaughlin, Edenderry, Ennis, Enniscorthy, Ennistymon, Fermoy, Glin, Kildysart, Killala, Kilmallock, Kilrush, Limavady, Lisburn, Londonderry, Macroom, Magherafelt, Mallow, Manorhamilton, Middleton, Mitchelstown, Mohill, Mountmellick, Newcastle West, Omagh, Oughterard, Portumna, Rathdown, Rathdrum, Rathkeale, Scarriff, Schull, Skibbereen, Sligo, Tralee.

That’s 43 out of a total of 163, a whopping 26%. I haven’t actually checked every image on IrishGenealogy against the transcripts – hey, I have to take the dog out for a walk occasionally – but none of these SRDs appear in the FindMyPast filters and any record from these areas that I’ve checked on IrishGenealogy is missing in the fresh transcript.
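The share of districts missing is easy to sanity-check by counting the names in the list and dividing by the total. A minimal sketch, assuming the list as printed above is complete and taking the total of 163 from this post; the variable and function names are just for illustration.

```python
# District names copied from the list of birth SRDs with no transcripts.
MISSING_BIRTH_SRDS = """
Clogher, Clones, Clonmel, Coleraine, Cookstown, Cootehill, Croom, Dungannon,
Dunshaughlin, Edenderry, Ennis, Enniscorthy, Ennistymon, Fermoy, Glin,
Kildysart, Killala, Kilmallock, Kilrush, Limavady, Lisburn, Londonderry,
Macroom, Magherafelt, Mallow, Manorhamilton, Middleton, Mitchelstown, Mohill,
Mountmellick, Newcastle West, Omagh, Oughterard, Portumna, Rathdown, Rathdrum,
Rathkeale, Scarriff, Schull, Skibbereen, Sligo, Tralee
"""

TOTAL_SRDS = 163  # total number of districts quoted in the post


def missing_share(names_csv: str, total: int) -> float:
    """Return the fraction of districts that appear in a comma-separated list."""
    names = [name.strip() for name in names_csv.split(",") if name.strip()]
    return len(names) / total


share = missing_share(MISSING_BIRTH_SRDS, TOTAL_SRDS)
print(f"{share:.0%} of districts have no birth transcripts")
```

The same function would work for the marriages list, given the total number of districts that have marriage records.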

The marriages seem to be relatively less flawed, missing only the SRDs of Dublin North, Ennistymon, Gortin and Strokestown. Relatively less flawed. Dublin North alone has 54,297 images on IrishGenealogy not transcribed here.

Every set of transcripts has its flaws and there’s nothing wrong with putting up partial datasets. But there is something very wrong about putting up partial datasets without any indication of what’s missing. It’s the equivalent of blindfolding users and having them cross a landscape riddled with giant unmarked traps.

What’s all the more peculiar is that one of FindMyPast’s signal virtues has long been the detailed information it supplies about its sources. It’s possible that the mistake is with the site’s coding, not the transcription; I just don’t know. But as things stand some serious background detail (and a health warning) is needed.