Friday, January 9, 2015

Simple Geographic Projects

While running raw data analysis I have discovered that I love making maps. The heat map in the previous post was overlaid on a known births versus population matching chart. Having found yet another cool map-making method, I wanted to share it with you all! The goal is to find a common origin based on your largest matching populations. I had to run it twice myself because the first attempt came out as a strange collapsing triangle.

1. Visit GEDmatch (this presumes you already have genomic raw data and an account; if not, acquire these first).
2. Select Admixture on the main menu.  
3. Insert your ID and select any of the open-source projects (MDLP, Eurogenes, Dodecad, HarappaWorld) and then any of the calculators. 
4. Once the calculator has finished, click the Oracle button (not Oracle-4) underneath your generated component scores. 
5. Scroll down to the bottom and inspect the "Mixed Mode Population Sharing:" results. Pick one, preferably the one with the lowest genetic difference (GD) for better accuracy, and one that includes only non-diaspora populations without recent admixture (placing Ashkenazi Jewish or African American donor populations on a map is difficult because where to put them is subjective).
6. Repeat the above with at least two other calculators and keep note of the results. For a minimalist approach, Europeans are better off using Eurogenes, Dodecad and MDLP. South Asians are recommended to have HarappaWorld included. Those from elsewhere in the world are free to use any combination, as none of these are specific for other regions. 
7. Download this map (from Wikipedia) or the map below (for the McDonald BGA version) and Paint.Net (a free image editor). Feel free to use other editing software. I prefer Paint.Net because it indicates the 1/3 increments along any line drawn.
8. Open the map with Paint.Net/another image editor. Pinpoint your McDonald BGA average spot or physical ancestral location if desired. 
9. With a colour specific to the open-source calculator you're going to use, pinpoint the location where each donor population for your selected Oracle result comes from. If uncertain, look up roughly where they're from (e.g. Pakistani Pashtuns will be around NW Pakistan close to the Afghan border). If the population is a national average (e.g. German_Dodecad), place it in the middle of the country.
10. Draw a line between the two donor populations and estimate where on the line you fall. Note that the numbers are flipped round in practice: if the Oracle result is 70% German + 30% Ukrainian, the spot ends up about 30% of the way along the line, measured from the German end. Mark a spot on the line at that position.
11. Repeat steps 9+10 for all the other Oracle runs, remembering to use different colours for the calculators to keep track. 
12. Join these spots together with a different coloured line, forming the "bounded area" to which your ancestry can be narrowed down.
13. Completed. Make all the relevant inferences from the results and compare them to the additional data from step 8 if present.
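
For anyone who would rather calculate the spots than eyeball them, steps 10 and 12 can be sketched in a few lines of Python. This is only a rough sketch under my own assumptions: the function names `mix_point` and `centroid` are made up for illustration, the coordinates are approximate country midpoints I picked by hand, and the maths is a plain straight-line weighted average of latitude and longitude (the same thing as drawing a line on a flat map, not a true great-circle calculation).

```python
def mix_point(donor_a, share_a, donor_b, share_b):
    """Return the spot on the straight line between two donor locations.

    donor_a, donor_b: (latitude, longitude) pairs, hand-picked estimates.
    share_a, share_b: Oracle percentages for each donor population.
    The spot sits closer to the donor with the larger share.
    """
    total = share_a + share_b
    if total <= 0:
        raise ValueError("shares must add up to a positive number")
    weight_a = share_a / total
    weight_b = share_b / total
    lat = weight_a * donor_a[0] + weight_b * donor_b[0]
    lon = weight_a * donor_a[1] + weight_b * donor_b[1]
    return (lat, lon)


def centroid(points):
    """Return the simple average of the spots, one per calculator.

    This is just the mean of the corner points of the bounded area,
    used here as a rough single "middle" of that area.
    """
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


if __name__ == "__main__":
    # Illustrative only: rough midpoints for Germany and Ukraine.
    german = (51.0, 10.0)
    ukrainian = (49.0, 32.0)
    spot = mix_point(german, 70, ukrainian, 30)
    print("Spot for 70% German + 30% Ukrainian:", spot)
```

With these made-up coordinates the spot lands at about 50.4 N, 16.6 E, which is 30% of the way along the line from the German end, matching the "flipped" rule in step 10. Run `mix_point` once per calculator, then pass the resulting spots to `centroid` if you want a single middle point for the bounded area.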

My sample using Google Maps

Disadvantages of Generalized Consumer Information

Recently I was introduced to two sources: Snpedia.com and James Lick's haplogroup reader. Originally I began an Excel chart, looking for a map or result correlating my major rCRS differences to the known markers of each subclade. When I completed my cursory search and had found only some related to the in typed mutation, I was quite disappointed. Through a happy coincidence I searched for that marker and came across a blog post pointing to James Lick's haplogroup reader, which uses PhyloTree data. Indeed I had found what I was looking for! When I did a third-party transfer and took my mtDNA test with Family Tree DNA, they had not yet differentiated the basic and full sequence tests, so while I thought I was getting an awesome deal I was in fact being shortchanged. For most people, knowing your major haplogroup is probably very helpful. The general information will no doubt apply to at least part of your research and you may choose to look no further. In researching H, I began to try to guess which subclade I might be. I began to notice that much of the research on Haplogroup H was inconsistent. When I first looked it up I was told H stood for Helena, with most of its women found in the area of Greece and Turkey. More recent clippings will tell you it is actually a young line found in Norway and Scotland. The inconsistency is whether it is an eastern or a western haplogroup.

Running the James Lick emulator for my true subclade has been invaluable, not only for discerning my origins but also for understanding why information on line H has suddenly become so distorted. The result from inputting my HVR1 and HVR2 differences was H2a2a1g. Major research has been done recently to recover that haplogroup from the Eurocentric viewpoint and possible selection biases. My own upper subclade of H2 is perhaps one of the least European of all the H derivatives, with H2a2a1 found at its highest frequency among Saudi Arabian women. H2a is also the only one of the H2 subclades to have integrated back into Asian phylogeography after the initial migration towards Europe. [Correction: As of Fall 2015 that build was replaced for giving false positives related to H2a2a. That is not my haplogroup.]

A more seasoned genetic genealogist than I advised charting matches to the most recent common female ancestor in the States. Of course, for me this actually means Canada. Indeed, the female immigrant ancestor of my mtDNA line is Elizabeth 'Betty' Beck (1814-1874), who came from Dumfries-shire, Scotland to settle in Grey, Ontario, Canada with her husband John Swanston (1808-1891). From there I am to work backwards into Europe, but I have a feeling the separation between North America and Europe might be better served by a more popular female such as Sebithy Ann Coultis (1857-1951) of Manitoulin Island, who married William Henry Bryant (1864-1939).

Conversing on Ancestry.com has become even more limited without a subscription, much to my annoyance, so it will be hard work to find people matching my MRCA to compare mtDNA results. Incidentally, I noticed that the interactive genealogy map I made some time ago overlaps strangely with the known path for the development of the H haplogroup, for both the predominant Eurasian and European subclades. Heat-mapping the sources of my major subclade H2a2a has also been helpful, though I intend to revise it further with matching recent populations, excluding deep ancestry.


_______________________________________________________

http://dna.jameslick.com/mthap/