Tuesday 14 May 2019

DNA Genealogy. Some Notes on the Reasons for Uploading your Raw Data to Multiple Sites.

I know I have been blogging a lot lately, and this post is very different from my other recent ones, but I have returned to one of my other main themes: genetic genealogy and what you can do with your autosomal DNA test results. This post comes mostly from having spent hours writing out the reasons for uploading copies of your raw data to multiple sites, and to GEDmatch in particular, for a couple of keen new matches I have had recently, and I thought that if I turned it into a blog post I wouldn't have to keep rewriting it! So this is fairly up to date at the time of posting, but how long it will stay that way I don't know. I will keep it in its conversational style for now, though I may have to change it later, so remember that the context was messaging a DNA match on AncestryDNA.

I'm afraid I won't be covering all the basic terminology here, as otherwise I won't get on to the meat of the topic.


I mentioned GEDmatch.com to you early on, and I will start pestering you about it soon. Someone else I have recently had a lot of contact with has uploaded to GEDmatch, and it helped clarify some questions and assumptions we had about how we, and some of our shared matches, were related to each other. 'GEDmatch Genesis', as it is currently called in its new form (it will be changing back to simply 'GEDmatch' soon), is free for much of its use, and is a serious research site. However, some of the more advanced tools cost $10 for a month's access. I tend to pay for this about 3 or 4 times a year, especially when important or close matches have come into my life.

So! You, like me, tested through AncestryDNA, as I also got my Mother and Aunt to do. I tested my brother via LivingDNA, because they have the most detailed information on the British Isles, more or less county by county. You may have seen the articles a couple of years ago about research by Oxford University on British DNA that reflected the Dark Age kingdoms; well, that data was used to create LivingDNA. They have a DNA map showing where your ancestry comes from over various, mostly ancient, periods, but also within roughly the last 300 years, which is what mostly interests us.

(This is a little out of date now. https://stevethegreenman.blogspot.com/2017/09/finding-your-own-personal-ancestors.html). 

In the 4 years I have been working on my DNA genealogy, I have seen most of the companies' 'Ethnicity' results change. They used to reflect your very distant ancestry, as in more than 1500 years ago, which most people can't relate to, don't want, or are confused by. Now AncestryDNA, certainly, is showing more about your recent family history, based on where your matches come from (via their public trees?). So LivingDNA and AncestryDNA now give similar geographical results. In fact, all the companies are at war with each other, each trying to find a quirk or trick the others aren't offering, and over the months they all introduce their own version of it.

Oh! And LivingDNA also gives you your basic Y group (not applicable to you as a woman, of course) and your basic mt group (which applies to both men and women). Some sites give you information on your X chromosome, but AncestryDNA doesn't (even though the X data is there in your test results), as it can be a bit confusing for some people to get their heads round.

Each testing company has its pluses and minuses, and its own database, which will include matches not available on other sites, plus a few people like me who have their data on multiple sites.

This may be making you think something like: "But I can't afford to pay for a test on all the different sites!" or "I have other family members I want to test too, now that I have some idea of what I am doing and realise the importance of having close family members to confirm or rule out lines of research, but I can't afford to test them all on all the different sites!" Well, you don't have to: you can often get the equivalent of testing with them for free, or for very little.

Whoever you test through, the resulting raw data belongs to you. You hold the rights to it, and you have given the company you tested through permission to use it according to your agreement. You can also download a copy of that raw data as a zip file (whatever you do, don't open it; you will regret it, as it is HUGE!), and then upload it to various other sites. The testing company will whinge about it and warn you off, and you will rightly have to go through some security hoops, but there are lots of things you can do with your raw data on other sites.

Ignoring the health-related sites you can upload your data to, these are some of the genealogy ones. Basically, you can upload most test results to most other sites, except that you can't upload anyone else's data to AncestryDNA. Because AncestryDNA has the biggest database, some people who tested via other sites (especially where AncestryDNA wasn't available in the past) later test with them too.

Sometimes it depends on how old your test data is. Companies have used different 'chipsets' (no, I don't know exactly what these are either; it is computer stuff) to test data over the years, so not all data may be fully compatible with all companies all of the time, but generally most companies change their software over time to allow cross-company compatibility. Basically, I have noticed that over time the different companies have learnt that they don't need to test all areas of all your chromosomes, or not at such a high quality, to get the results needed. By reducing the quality and/or amount of data that needs to be tested, they can speed up processing (along with software and hardware improvements) and therefore make testing cheaper! It is all part of the huge commercial war going on to get us to test.
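To make the compatibility point a little more concrete: a raw data file is essentially a long list of tested markers (SNPs, each usually identified by an 'rs' number), and two files from different chips can only be compared on the markers both chips actually tested. The short Python sketch below measures that overlap. The helper name `marker_overlap` and the tiny marker lists are hypothetical, made up for illustration; real files hold hundreds of thousands of markers, and each site deals with the gaps in its own way.

```python
def marker_overlap(markers_a, markers_b):
    """Hypothetical helper: return the markers tested in both files, and the
    overlap as a fraction of the smaller file's marker count."""
    set_a, set_b = set(markers_a), set(markers_b)
    shared = set_a & set_b
    smaller = min(len(set_a), len(set_b))
    # Guard against an empty file so we never divide by zero
    fraction = len(shared) / smaller if smaller else 0.0
    return shared, fraction


# Made-up marker lists standing in for an older and a newer testing chip
older_chip = ["rs1", "rs2", "rs3", "rs4", "rs5"]
newer_chip = ["rs3", "rs4", "rs5", "rs6"]

shared, fraction = marker_overlap(older_chip, newer_chip)
print(sorted(shared))     # ['rs3', 'rs4', 'rs5']
print(f"{fraction:.0%}")  # 75%
```

The lower that overlap, the less two files can be compared directly, which is one reason older tests sometimes don't work fully on every site.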

So! You can upload your data to:

Family Tree DNA (FTDNA) for free, although it costs a small one-off payment to get full access to the results. Not the best of sites, rather dated now, but a lot of people tested via them in the past before AncestryDNA was available, so you will find a lot of Australian and New Zealand matches on there. FTDNA is still the main testing site for Y and mt tests, but to be honest they are of very limited use in genetic genealogy.

MyHeritage is relatively new to DNA, only a couple of years, and is proving to be a very vigorous alternative to Ancestry, with a lot of good DNA tools. It was initially free to upload your data to, as they built up their database, but now there is a small fee (perhaps the initial upload is still free, as overall use of the site is, or was?). It is worth getting some membership, though. MyHeritage have been concentrating on pushing their services in countries not directly covered by AncestryDNA, so you will get a lot of European matches, for example.

There is a site called DNA.Land, but it has never really gone anywhere.

You can upload your raw data file to LivingDNA for free, as they are desperate to build up a good database, but the information is rather limited if you didn't test through them, because things like the Y and mt parts aren't tested by all the other companies. (AncestryDNA did, or does, test Y for males, but the information is not detailed and not directly available, and I have lost the link to the site you could check it on.)

23andMe doesn't seem to allow uploading of data to their site, and seems to be mainly a health-related site.

You can upload the raw data from just about all the companies (including 23andMe), for free, to GEDmatch! It is a very dry site, for serious research by those wanting in-depth information on how they are genetically related to their matches. There are various useful free tools, and some more advanced tools for a small monthly fee.


So why upload to other companies? After all, you are flooded and overwhelmed by all the matches you have on AncestryDNA. Well, there are a number of reasons. Remember that any matches you HAVE are just down to the random chance that a relative, close or distant, has tested, and any one of them may prove to be an important link in your research and/or help confirm or refute information in your tree. That person may have tested via another company, so comparing your results in as many different databases as possible increases your chance of finding them. For example, early on I was able to start tracking down my Mother's previously unknown paternal family with more certainty via someone who had tested on FTDNA, but I only noticed her because she was top of my list (at the time) on GEDmatch. And because people on GEDmatch tend to be more serious and curious about how they are related to others, she quickly answered my message; she had already noticed me!

Also, one of the main things you can do on all the sites EXCEPT AncestryDNA is look in detail at the segment(s) you share with your matches on chromosomes 1 to 22, AND on the 23rd chromosome, the X chromosome (which can give some specific information I won't go into here). The shared matches facility on Ancestry is mostly restricted to 4th-6th cousins and closer (mostly due to the sheer quantity of matches), although sometimes when looking at a 5th-8th cousin you will see shared 4th-6ths. AncestryDNA, again partly to reduce the sheer quantity of matches, also uses different matching thresholds from some other sites with smaller databases. This is partly to reduce the number of matches by pure chance (Identical By State, IBS), and, I'm sure, to reduce the chance of litigation, but it means there are plenty of Ancestry users you DO have a genuine match with (Identical By Descent, IBD) who are not showing. If one of them has also uploaded to one of the other sites, you may find a DNA match with them there, while on AncestryDNA you only have a paper tree link to them.
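The effect of different matching thresholds can be sketched in a few lines of Python. The rule below (drop segments shorter than a minimum length in centimorgans, then list the match only if what is left reaches a minimum total) is a simplified, hypothetical rule, and the threshold numbers are made up for illustration, NOT any company's real settings. It simply shows how the same shared DNA can be listed as a match on one site and hidden on another.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    chromosome: int
    length_cm: float  # genetic length of the shared segment in centimorgans


def is_listed(segments, min_segment_cm, min_total_cm):
    """Hypothetical matching rule: keep segments at or above min_segment_cm,
    then list the match only if some survive and they add up to min_total_cm."""
    kept = [s for s in segments if s.length_cm >= min_segment_cm]
    return bool(kept) and sum(s.length_cm for s in kept) >= min_total_cm


# Made-up shared segments with one match
shared = [Segment(3, 9.5), Segment(11, 5.0)]

# Two made-up site settings, NOT any real company's thresholds
print(is_listed(shared, min_segment_cm=7, min_total_cm=8))    # True
print(is_listed(shared, min_segment_cm=10, min_total_cm=20))  # False
```

Under the looser setting the 9.5 cM segment alone is enough; under the stricter one nothing survives, so the same relative would simply not appear in your match list.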

If you can examine your matches in detail (whether they show on AncestryDNA or not), and you find a number of matches (at least three) who share the same segment with you, that segment is called a triangulated segment. It more or less HAS to be a genuine match, and will relate to a specific Most Recent Common Ancestor (MRCA), or to a group of ancestors with a common ancestor.
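Finding matches who share 'the same segment' is essentially an interval-overlap search, sketched below in Python. It assumes you have already exported each match's shared segments as a chromosome number with start and end positions (the kind of segment detail the non-Ancestry sites show); the function name, the tuple layout and the example data are all hypothetical. It only finds stretches covered by at least three matches; a full triangulation also checks that those matches share the segment with each other, which this sketch does not attempt.

```python
from collections import defaultdict


def shared_regions(segments, min_matches=3):
    """Hypothetical helper. segments: list of (match_name, chromosome, start, end).
    Returns (chromosome, start, end, names) for every stretch covered by the
    segments of at least min_matches different matches."""
    by_chrom = defaultdict(list)
    for name, chrom, start, end in segments:
        by_chrom[chrom].append((start, end, name))

    regions = []
    for chrom, segs in by_chrom.items():
        # Each start or end is a point where the set of covering matches can change
        points = sorted({p for s, e, _ in segs for p in (s, e)})
        for left, right in zip(points, points[1:]):
            names = {n for s, e, n in segs if s <= left and e >= right}
            if len(names) < min_matches:
                continue
            last = regions[-1] if regions else None
            # Extend the previous region if it continues with the same matches
            if last and last[0] == chrom and last[2] == left and last[3] == names:
                regions[-1] = (chrom, last[1], right, names)
            else:
                regions.append((chrom, left, right, names))
    return regions


# Made-up segment data for four matches
matches = [
    ("Match A", 7, 20_000_000, 35_000_000),
    ("Match B", 7, 25_000_000, 40_000_000),
    ("Match C", 7, 28_000_000, 33_000_000),
    ("Match D", 12, 5_000_000, 9_000_000),
]

for chrom, start, end, names in shared_regions(matches):
    print(chrom, start, end, sorted(names))
# 7 28000000 33000000 ['Match A', 'Match B', 'Match C']
```

Only the stretch of chromosome 7 where all three segments overlap is reported; Match D, alone on chromosome 12, is not.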

GEDmatch is the easiest place to get this sort of data, derived from all the main testing companies wherever people have uploaded to it, although the full triangulation service is only available in the advanced, paid-for tools. But for $10 for a month's access to them, you can do as much data processing as you can manage in that month.

This creates masses more data, which takes a lot of work to organise, but it is possible, and then you can start to see how groups of matches relate to each other.

Now for a long time I have been building up a master spreadsheet based upon my triangulated and/or significant matches on GEDmatch (as well as many other spreadsheets based on the other specific companies, and I try to add cross reference details from each where I can, especially GEDmatch derived grouping information, etc.). 

It has been an ongoing process for years, and there is just so much data to keep on top of, but it recently became much easier thanks to Rootsfinder.com, which has many DNA tools that can do much of this work in minutes and display it in ways I had long dreamed of. And it is constantly improving. It is free for initial access, but there is a small annual fee for full access, which is VERY well worth it.

(This is also a little out of date now. https://stevethegreenman.blogspot.com/2018/10/organising-my-dna-matches-trying-out.html).

The most important thing you can do is tie your DNA matches into your family tree (via a GEDCOM upload). You can then link known people directly to their DNA information, or to the known common ancestor, or assign a DNA match to a known route, such as your maternal or paternal side, or to a more specific line, for example where you know the link has to be via your paternal grandmother. And it is all colour coded to fan charts to help visualise things. It works best and most easily with triangulated matches from GEDmatch, but you can upload a lot of data from other companies too (even if it is somewhat long-winded to do, but that isn't Rootsfinder's fault). The interactive graphic displays of how matches relate to each other can be confusing at first, but once you learn to filter out some of the noise they are great and useful fun. And there is a wonderful Facebook group too, where you can directly influence the development of the program's features.

Because of some issues I have found with my tree that I want to sort out first, I have not been doing so much on Rootsfinder recently, but I intend to start over with a major overhaul of my tree GEDCOM, with fewer errors in it. Then I intend to be busy on Rootsfinder again very soon (famous last words!).


That's enough again for the moment.
