Family Finder Timothy Peterman: Methodology
by Timothy E. Peterman, Mar. 25, 2017, updated Mar. 14, 2020
Introduction: The goal of this project is to identify DNA indicated cousins (ie, matches) that are related to the author regardless of side of the family, with each narrowed, as best as the data will take us, to the particular side of family. As administrator of this project, I have persuaded scores of carefully selected participants to join this project. I download the relative match lists for each, as well as the chromosome browser lists for each (ie, shared segments), into Genome Mate Pro. Each participant’s genome can be likened to a radio telescope, scanning the genetic horizon for a signal that will identify distant cousins. Why bother with this? A careful study of these distant matches offers the potential to both verify well known parts of my ancestry with a clean paper trail, and to explore those branches that remain a mystery. My approach has been to amplify the signal and enhance the resolution.
The signal is amplified by getting more participants involved. By sampling different lines of descent from a common ancestor, we expand the diversity of the project, which translates into greater coverage of the common ancestor’s inherited genome. To emphasize the lack of diversity in this project, lines with no participation are highlighted in gray.
The resolution is enhanced whenever I persuade a cousin, related at a more distant degree, to participate in the project. The new participant will share a number of matches with those already participating. These matches will be considered to be resolved to a more distant degree; a narrower slice of my ancestry.
All testing here is done by Family Tree DNA. Of over 6 billion autosomal nucleotides in the genome (over 3 billion from each parent), about 710,000 are used in this study. The 710,000 loci represent places where SNPs are known to occur (ie, variation across the human population). The 710,000 doesn’t represent all known SNPs, but is a carefully selected subset that is known to be medically neutral. Ancestry can be ascertained through these, but no medical condition or likelihood can be determined. Each participant probably has a few “no calls”. These happen when the Illumina OmniExpress chip fails to record which nucleotide is present at the locus. These can create an odd condition, where a match shares a segment with me and one of my paternal uncles (or more distant paternal relatives), but NOT with my father.
Another thing to be mindful of is “back channel kinships”. These occur when a match is related to two or more participants through totally different lines of descent on other sides of their family. By only considering shared matches where segments overlap, we vastly reduce the likelihood of shared matches being the result of back channel kinships. If two participants, related to me on the same side of the family, are found to be related to each other through another side of their family, matches that only the two share will NOT be assigned to a group.
In order for a match to be assigned to a group, the match must share segments with at least one participant to be grouped with Robinson-Peterman+, Hall-Peterman+, Ellis-Robinson+, or Coffey-Robinson+, and with at least two participants in different primary lines of descent to be grouped with any of the rest.
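The grouping rule above can be written as a short check. This is only a sketch: the function name, the input format, and the line labels in the usage note are hypothetical, and only the four group names are taken from the text.

```python
# Groups that need only ONE sharing participant (names taken from the text).
SINGLE_PARTICIPANT_GROUPS = {
    "Robinson-Peterman+",
    "Hall-Peterman+",
    "Ellis-Robinson+",
    "Coffey-Robinson+",
}


def qualifies_for_group(group, sharing_lines):
    """Return True if a match may be assigned to `group`.

    `sharing_lines` (hypothetical input format) holds one entry per
    participant who shares a segment with the match; each entry is a label
    for that participant's primary line of descent.
    """
    if group in SINGLE_PARTICIPANT_GROUPS:
        # One sharing participant is enough for these four groups.
        return len(sharing_lines) >= 1
    # All other groups: at least two participants in DIFFERENT primary lines.
    return len(set(sharing_lines)) >= 2
```

For example, `qualifies_for_group("Wilson-Ellis+", ["line A", "line A"])` is False, because both participants descend through the same primary line.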
There are several matters that must be considered. There are a number of first or second cousin marriages in the deeper part of the ancestry. These might make it challenging to determine which of the married cousins the segment is derived from. There are a number of cases where several siblings married into the same nearby family. Matches between these siblings alone can’t be considered to be resolved any deeper than the siblings themselves. These cases should be noted, along with siblings that married into different families; participants from the latter can be very helpful in resolving matches with the former to another degree.
Autosomal DNA testing can be a powerful tool for kinship analysis, but it does have its limitations. One must understand where advantages can be found, and one must accept the fact that there are some limitations that can’t be overcome. Each person inherits 50% of each parent’s genome. The other 50% is irrelevant to the child’s genome. A grandchild inherits roughly 25% of his genome from each grandparent, but thanks to the randomness of recombination, the figure is seldom exactly 25%. The percentages of both paternal grandparents, when added together, come to 50%; the same for the maternal grandparents. The further back one goes in considering ancestral contribution, the more the numbers vary from the genealogical expectations: 12.5% from each great grandparent; 6.25% from each great-great grandparent; 3.125% from each great-great-great grandparent. At some point, as one goes further back, some ancestral contributions fall to 0%. Once a part of the genome is lost, it can’t be recovered in future generations.
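The genealogical expectations quoted above follow from halving at each generation. A minimal sketch (the function name is my own, and these are expected values only; actual inheritance varies around them because of recombination):

```python
def expected_share(generations_back):
    """Expected fraction of the genome inherited from one ancestor who is
    `generations_back` generations removed (1 = parent, 2 = grandparent)."""
    return 0.5 ** generations_back


ANCESTORS = [
    "parent",
    "grandparent",
    "great grandparent",
    "great-great grandparent",
    "great-great-great grandparent",
]

# Print the expected percentage for each generation named in the text.
for generations_back, label in enumerate(ANCESTORS, start=1):
    print(f"{label}: {expected_share(generations_back):.3%} expected")
```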
DNA testing collapses the data from the maternal autosome & the paternal autosome together. It can be teased apart by testing first cousins from both sides of the family. Take any single location on the genome & consider this: it is derived from both parents (no big surprise there), as well as only one paternal grandparent and one maternal grandparent. It is only derived from one paternal great grandparent and one maternal great grandparent. Relative to that location the other 6 great grandparents don’t even exist. This same logic holds true for earlier generations. Each location represents one paternal great-great-great-great-great grandparent, and one maternal great-great-great-great-great grandparent. Relative to that location, the other 126 5th great grandparents don’t even exist. Locations (or loci) are strung together to form segments, which are measured both in nucleotide length and in centimorgans. The segments are actually what we look at to ascertain which remote ancestor a chunk of DNA came from. By testing ever more distant cousins, we can determine which segments are inherited from ever more remote sources. Let’s say we test a couple of 3rd cousins and they share several segments. We know that those segments came from the common set of great-great grandparents. When looked at in Genome Mate, we can see that each of the 3rd cousins has a number of more distant matches that overlap each of the shared segments. The matches are related through either a set of paternal great-great grandparents, or a set of maternal great-great grandparents. When we compare the matches shared by each of the 3rd cousins in that segment, only about half are actually shared by both 3rd cousins. Voila! We know that those matches, even if they have no trees at Family Tree DNA, or have trees that are obviously wrong, or just list one or two surnames, or state that they are adopted, are related to both of the 3rd cousins through the shared great-great grandparents.
The other 14 great-great grandparents are simply irrelevant here.
Consider that, on average, siblings share about 50% of their genome. Each gets a random inheritance from their parents. Odds are, about 50% will be shared by two siblings and 50% unique to each sibling. One person represents 50% of his parents’ combined genome. Two siblings represent about 75% of their parents’ combined genome. Three siblings represent about 87.5% of their parents’ combined genome, and so forth. Statistically, you never get to 100%, but in practice, if there are enough children, we might come close. Each child captures a different mix of the diversity in the parents’ genome. A person can’t pass to his children the part of his parents’ genome that he didn’t inherit.
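The 50%, 75%, 87.5% progression can be reproduced with one line of arithmetic: each additional sibling is expected to capture half of whatever the earlier siblings missed. A small sketch (the function name is my own, and it assumes each sibling's inheritance is independent of the others):

```python
def expected_parental_coverage(num_siblings):
    """Expected fraction of the parents' combined genome represented by
    `num_siblings` tested siblings, assuming independent inheritance."""
    return 1.0 - 0.5 ** num_siblings


# Coverage approaches, but never reaches, 100% as siblings are added.
for n in range(1, 11):
    print(f"{n} sibling(s): {expected_parental_coverage(n):.2%}")
```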
First cousins share, on average, 12.5% of their genomes. Second cousins share, on average, 3.125% of their genomes. Third cousins share, on average, about 0.78125% of their genomes. Fourth cousins share, on average, about 0.1953125% of their genomes. Fifth cousins share, on average, about 0.048828125% of their genomes. Measured in centimorgans, first cousins share about 845 cms; second cousins share about 211 cms; third cousins share about 53 cms; fourth cousins share about 13 cms; fifth cousins share about 3 cms. The variance gets wider with each increasing degree of distance between the cousins.
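These averages follow a simple pattern: each further degree of cousinship divides the expected sharing by four. The sketch below assumes a total of about 6,800 cms, the figure used elsewhere in this document; the function names are my own. With that total, first cousins come out at 850 cms rather than the 845 quoted above, a difference that only reflects the total assumed.

```python
TOTAL_CM = 6800  # assumed total, taken from the figure used in this document


def shared_fraction(cousin_degree):
    """Average fraction of the genome shared by full cousins of the given
    degree (1 = first cousins): 1/8, 1/32, 1/128, ..."""
    return 0.5 ** (2 * cousin_degree + 1)


def expected_shared_cm(cousin_degree, total_cm=TOTAL_CM):
    """Average shared centimorgans for cousins of the given degree."""
    return shared_fraction(cousin_degree) * total_cm


for degree in range(1, 6):
    print(
        f"cousin degree {degree}: {shared_fraction(degree):.7%}, "
        f"about {expected_shared_cm(degree):.0f} cms"
    )
```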
The Family Tree DNA standard for identifying matches is either one shared segment that is 9 cms in length, or one shared segment that is 7.69 cms in length with a total of 20 or more shared cms. Considering the variation, a few third cousins will fall below the threshold, but the vast majority (greater than 95%) will share qualifying segments. Not true with 4th cousins; only a minority share enough DNA to qualify. Even fewer 5th cousins. If we were to test two people known to be 5th cousins, the odds of them appearing on each other’s match lists are close to nil. Yet, most of the people on the match lists for each participant share only 7.7 cms, or 8 cms, or 9, or 10 cms with each participant. The vast bulk of the matches are related at a far more distant degree. How can this be? Consider the powers of 10. Many people have about 30 relatives that are within the first cousin range (descent from a set of grandparents), 300 relatives within the 2nd cousin range, 3,000 relatives within the 3rd cousin range, 30,000 relatives within the 4th cousin range, 300,000 within the 5th cousin range, 3 million within the 6th cousin range, 30 million within the 7th cousin range, etc., etc.
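The matching standard described at the start of this paragraph can be expressed as a small test. This is a sketch of the rule as worded here, not Family Tree DNA's actual implementation; the function name is my own, and I assume "9 cms in length" means 9 cms or longer and that the total is simply the sum of the segments supplied.

```python
def is_reported_match(segments_cm):
    """Apply the threshold described above to a list of shared segment
    lengths (in cms) between two people."""
    if not segments_cm:
        return False
    longest = max(segments_cm)
    total = sum(segments_cm)
    # Either one segment of 9 cms or more, or one segment of 7.69 cms or
    # more together with 20 or more shared cms in total.
    return longest >= 9.0 or (longest >= 7.69 and total >= 20.0)
```

For example, `is_reported_match([7.7, 5.0, 4.0, 3.5])` is True (20.2 cms in total), while `is_reported_match([8.5])` is False.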
Many of the actually identified distant matches represent the slim percentage within the vast pool of actual cousins. DNA successfully discovers cousins who are as distant as 10th cousin & maybe even 15th cousin. But it only discovers infinitesimally small droplets within the ocean of cousins who actually exist at these degrees. But these small droplets may be all that we need to identify entire lines of ancestry.
For some reason (unknown to me, but likely understood by geneticists and biochemists who specialize in cellular energy), when producing gametes for the next generation, the paternal & maternal chromosomes cross over or recombine. This happens an average of about 34 times across all autosomal chromosomes. (Note: David Reich, author of “Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past”, 2018 states on page 11 “Females create an average of about 45 new splices when producing eggs, while males create about 26 new splices when producing sperm, for a total of about 71 new splices per generation.” The female count includes the 23rd or X chromosome, since she has two of them. This difference can have huge implications in determining how distant matches are. A lot more maternal sides of family from earlier generations should be represented, with matches sharing shorter segments, for the same degree of kinship, than those found on the paternal side.) Some of the shorter chromosomes may skip a generation in recombining, while the longer chromosomes might cross over in as many as three different places. Whenever this happens, it both slices through the DNA in each chromosome, thus dividing that derived from earlier ancestors in two, and it stitches together two totally unrelated pieces of DNA. This can create a false positive that could then be shared among future generations. Some reported matches for every participant, even those shared between cousins, are created this way. Even if the stitching hasn’t occurred yet, one can still have false positives because the maternal results are superimposed with the paternal results. In this case, false positives won’t be shared among first cousins.
Let’s explore the concept of centimorgans. A morgan unit is the length of a segment over which one crossover is expected, on average, in each generation. Centimorgans, as the name implies, are 1% of a morgan unit. Each 1 centimorgan length has a 1% likelihood of incurring a crossover in each generation. This means that a 1 centimorgan (cm) segment can be expected to incur a crossover, on average, once every 100 generations. A 2 cms segment can be expected to incur a crossover once every 50 generations. A 4 cms segment can be expected to incur a crossover once every 25 generations. An 8 cms segment can be expected to incur a crossover once every 12.5 generations, etc.
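The arithmetic behind these figures can be sketched as follows. The function names are my own, and the model is a simplification: it treats a segment's length in cms as its percentage chance of a crossover in each generation, which is only reasonable for short segments. Under that assumption, the number of generations quoted for each length is an average wait rather than a certainty. A 1 cm segment, for instance, still has roughly a 37% chance of coming through 100 generations intact.

```python
def expected_generations_to_crossover(length_cm):
    """Average number of generations before a segment of this length is
    hit by a crossover (simplified model: length in cms = % chance per
    generation)."""
    return 100.0 / length_cm


def prob_intact(length_cm, generations):
    """Chance that a segment passes through `generations` generations
    without any crossover, under the same simplified model."""
    return (1.0 - length_cm / 100.0) ** generations


for length in (1, 2, 4, 8):
    wait = expected_generations_to_crossover(length)
    print(f"{length} cms: average wait of {wait} generations")
```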
100 generations represents ancestors who lived about 3,000 years ago, on average. Take 2 to the 100th power & we get an idea about the number of ancestral couples who could have contributed 1 cm segments. Each of us only has about 6,800 cms of DNA in our entire autosome. If each contributed a 1 cm segment, no more than about 6,800 of those 2 to the 100th power ancestors could be represented in the genome. This tells us that many of the 1 cm segments or 2 cms segments that are reported as shared originated many, many generations ago & have simply been spread across many populations. It is no surprise that small segments can be superimposed, with a 2 cms segment (50 generations old) landing at the end of a 5 cms segment (20 generations old), creating the illusion of a 7 cms segment that will be shared with someone. This is why Family Tree DNA wisely rules out shared cms that are below designated thresholds.
When dealing with DNA segments that are maybe 10, 15, or 20 cms in length, they are usually not crossed over between generations. They will be either eliminated entirely, or retained entirely. At the 7.69 cms level, segments of that length could have originated as far back as 13 generations ago. Considering that most who dabble in genealogy haven’t identified more than a tiny fraction of their total ancestors 13 generations ago (8,192 ancestors), it is no wonder that they can never see the common ancestors.
Each person, as a standalone participant, won’t get much from autosomal testing. But by testing a field of carefully selected relatives, we can identify a lot more.
When dealing with an ancestral generation where the MRCAs had a large family of perhaps 10 children, the MRCAs’ combined genome was split in about 10 different ways. Descendants of each of the 10 likely inherited a different mix of the ancestral genome. Testing each line of descent can get at this mix and produce broadened match lists. Segments shared by matches with participants from two or more lines of descent can be identified, even if the root participant of the project (myself) doesn’t share the match at all. If the actual participants are removed from the 10 siblings by more than a couple of generations, most of the MRCA genome is beyond the reaches of testing, although the genomes of the participants descended from the 10 siblings will represent a good swath of diversity.
When dealing with an ancestral generation where the parents had an only child, 50% of the combined genome is forever lost. When dealing with these cases, we simply can’t make up for the lost DNA. Even if we are lucky enough to test DNA from descendants of several first cousins of the only child, chances are, we will never make up for the loss.
Multiple marriages can create an interesting possibility. Segments shared by descendants of the first wife with descendants of the second wife are derived from the husband that both wives shared. Each marriage created a different primary line of descent. However, unlike with siblings, who share only about 50% of their autosome, the common parent is like an identical twin with himself, sharing 100% of the autosome.
In the case of my family, the 2nd degree (my parents & their siblings) has been thoroughly tested. The 3rd degree (first cousins of my parents) has been tested reasonably well, although a few improvements could be made. Some limitations, such as the fact that my maternal grandmother had only one sibling with one surviving child at the time of the testing, mean that much data is forever lost among those who would be assigned to the Wilson-Ellis+ group. The 4th degree could use some additional testing, although many lines of descent are well covered.
As will become evident on the following pages, when we get to the 5th degree, most lines of descent have NOT been tested. If they were better tested, each would match against the appropriate participants in the 4th degree, as well as with other lines of descent, who were merely cousins of the 4th degree. If the 5th degree were more properly populated with participants, those at the 6th degree would likely find a lot of shared matches, and so forth. Populate the field of potential participants with actual participants and we will see autosomal testing go to work at seeing further on the genealogical horizon and resolving them in an ever more refined way to different parts of my ancestry.
Techniques: With the passage of time, I have discovered or improvised several methods of resolving segments to a more refined degree. These are probably not new to the discipline, but are worth noting here, for those who are following this avenue of research:
1. Positive sharing with a cousin. This is the most straightforward & obvious method. Test two cousins. They share matches in the same segment. You resolve the match to the MRCA of the two cousins.
2. Extended match overlap. Let’s say that two second cousins share a segment from pos 20 to 50. One of the matches is close enough to extend from 20 to 70. If matches in the upper area (50 to 70), have been isolated to the same side of family for the participant, we know that the matches are derived from the same source (ie, MRCA) as the shared segment. The matches in the extended segment will be noted with a reference to the overlapping match.
3. Sibling inconsistency. If two siblings share a match in the same segment, the siblings’ side of family for the match can be determined, by identifying the inconsistent side of family for that segment. In order for the match to be shared by both in that segment, one side MUST be consistent. For example, two siblings RWP & MLP share a match at a particular segment. RWP’s maternal segment is isolated to Horr-Hall & MLP’s maternal segment is isolated to Roley-Eggleson, as determined by tested second cousins. The maternal side for RWP & MLP is inconsistent in the segment where the match is found. Therefore, we know the match must be paternal for RWP & MLP, and will be resolved to Eagleton-Peterman + (inf).
4. Parent/ child comparison. If a parent & child have participated, the parent may have a match who overlaps cousins on one or both sides of the parent’s family, but whom the parent shares with no one but the child. Look at the match through the child. The child will overlap one of the parent’s sides of family, but not both. Mark the side of family for the parent based on what the child reports. In some cases, this will be a negative. The parent will overlap a cousin on one side of the family. If the child does not overlap the cousin, the match is from the other side of the parent’s family.
4. Sibling comparison by side of family. Map a chromosome for a set of siblings. Limit the study to either the paternal or maternal side of the siblings’ family. Chart which grandparent various segments have been resolved to. Do this for each sibling. For example, my father’s maternal side is called Eggleson-Hall, which means related through either Edwin Hall or Sarah Elizabeth Eggleson, the parents of Clara Lenora Hall, who was the mother of participants RWP, MLP & PEP. We map each of the three brothers. Thanks to the participation of cousins of RWP, MLP & PEP, descended from six other siblings of Clara: Homer, Charles, Nellie, Alice, Orris and Marion, many of the maternal segments of RWP, MLP & PEP can be identified using Method 1 (above) and can be resolved to Eggleson-Hall. Thanks to the participation of second cousins of RWP, MLP & PEP, descended from the siblings of either Edwin Hall or Sarah Elizabeth Eggleson, some of the maternal segments can be further resolved to Horr-Hall or Roley-Eggleson. When RWP, MLP & PEP were each conceived, the only DNA that Clara had to offer for each segment was either 1) her paternal DNA (ie, Horr-Hall) or 2) her maternal DNA (ie, Roley-Eggleson). Let’s say that a particular segment has been resolved to Roley-Eggleson for RWP, but has only been resolved to Eggleson-Hall for MLP & PEP. We know this because RWP has a different set of maternal matches in this segment, when compared to MLP & PEP. What does this tell us about MLP & PEP? This tells us that MLP & PEP must be Horr-Hall in this maternal segment. There simply is no other alternative.
5. First cousin comparison by side of family. This employs the same logic as Method 4, but takes it to the next level. We know that when each of the nine children of Edwin Hall & Sarah Eggleson were conceived (there were two others, Edith and Enos, with no living descendants), for each segment, Edwin Hall could offer DNA from either his father, James William Hall or his mother, Sarah Benjamin Horr. Sarah Eggleson could offer DNA from either her father, Asa William Eggleson or her mother, Sarah Margaret Roley. When we map the seven primary lines of descent from Edwin Hall & Sarah Eggleson for a chromosome, we will see a number of places that have been resolved to Horr-Hall or Roley-Eggleson. We will also see places that have been resolved to Wickham-Horr (ie, Sarah Benjamin Horr), Rogers-Eggleson (ie, Asa William Eggleson), or Daugherty-Roley (ie, Sarah Margaret Roley). Let’s say that two of the primary lines of descent show a particular segment to be Wickham-Horr, and that two of the primary lines of descent show the same segment to be Horr-Hall, but not Wickham-Horr. What do we know about the Horr-Hall segment that has NOT been resolved to Wickham-Horr? Think for a minute. The segment must be Easton-Hall (ie, James William Hall). This is of paramount importance since James William Hall was an only child and most of his ancestry is untraceable. Distant cousins who are related to us through James William Hall are sending their DNA signal to us across this void of genealogical emptiness. Once we know who they are, we might begin to figure out how they are related.
6. Segment studies. Across the genome, we have identified a number of segments corresponding to a large number of different sides of the family. Some of these segments may be so short that they contain only 20 or 30 million nucleotides. We have identified the segment by patterns of inheritance from the various participants. Segments are defined by common descent from a couple. If I have labelled a segment as Rogers-Eggleson, it means that living participants have inherited it from Asa William Eggleson. There are a number of Rogers-Eggleson segments sprinkled across the genome. These segments originated long before Asa William Eggleson; each such segment represents one of Asa William Eggleson’s ancestors from earlier. As we examine a segment, we usually note matches that share the segment. A few overlap most of the segment. Most overlap just a tiny part of the segment and are likely related through an even earlier ancestor. If one match in the segment can be confirmed as being related through Thomas Rogers, maternal grandfather of Asa William Eggleson, it means that ALL matches within that segment are related through Thomas Rogers. However, we can’t be certain until several matches can be confirmed as related through Thomas Rogers.
7. Contacting matches (or just looking at their online family trees). We try to let the DNA do the talking. The DNA, if properly studied using the various methods above, can tell us a lot about the side of family. Let’s say we have a segment narrowed to a great-great grandparent. We then look at the trees of several of the matches. Some of them show descent from other known ancestors. For example, a match I will only call Angela was shown by AncestryDNA as having a suggested common ancestor on the Vermillion side of my family. Ancestry made this suggestion because Angela & I have family trees showing descent from the Vermillion family, which is on my mother’s side of the family. When studying the segments through Family Tree DNA & Genome Mate Pro, I see that Angela shares the segment with my father & paternal second cousin. Turns out, the segment has been isolated to my great-great-great-great grandfather, Richard Bauguess, on my father’s side. Turns out that Angela is descended from Richard’s probable nephew, Joseph Boggess. We let the DNA do the talking & ignore the trees. Most of one’s 7th or 8th cousins are related through several different sets of common ancestors. Our interest here is in the ancestors that are the source of the DNA segment.
Once we have narrowed a segment to a side of family, we can begin contacting matches. Most newcomers do the opposite. They send e-mails with messages like, “Family Finder tells us that we are related. Here is my tree. Do you see the common ancestor?” They should isolate segments to side of family before contacting anyone. I prefer to send messages that 1) identify the match, 2) identify the side of family (ie, if I have narrowed a match to Edwin Hall, I don’t need to have the match considering my Bauguess ancestry as a source), 3) offer a brief list of ancestors, usually in the ahnentafel format, including places of residence. I get a lot more responses to messages that are narrow & specific, including a number who have looked at the small subset of my ancestry & confirmed the identity of the MRCA. Always begin with those who share the most DNA within a segment. The MRCA with those with large overlaps will be a lot more informative than the myriad of tiny segments shared within the larger segment.
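Two of the techniques above, the extended match overlap and the sibling comparison by side of family, come down to simple logic that can be sketched in code. This is an illustration only, not how Genome Mate Pro works. The function names and input formats are hypothetical, positions are treated as plain numbers, and the extension is only handled on the upper end of the shared segment, as in the 20 to 50 to 70 example.

```python
def overlaps(a, b):
    """True if two (start, end) ranges overlap."""
    return a[0] < b[1] and b[0] < a[1]


def matches_in_extension(shared, bridging, candidates):
    """Extended match overlap.

    shared:     (start, end) of the segment shared by the two cousins
    bridging:   (start, end) of the match that extends past the shared segment
    candidates: list of (name, (start, end), same_side), where same_side is
                True if the match is already isolated to the same side of
                family for the participant
    Returns the names that can take the same source (MRCA) as `shared`.
    """
    if not overlaps(shared, bridging) or bridging[1] <= shared[1]:
        return []
    extension = (shared[1], bridging[1])
    return [
        name
        for name, segment, same_side in candidates
        if same_side and overlaps(segment, extension)
    ]


def infer_side(known_side, grandparent_sides, shares_matches_with_known):
    """Sibling comparison by side of family.

    One sibling is resolved to `known_side` in a segment. A second sibling
    whose matches in that segment differ must carry the other of the two
    sides the parent could offer.
    """
    if known_side not in grandparent_sides:
        raise ValueError("known_side must be one of grandparent_sides")
    first, second = grandparent_sides
    other = second if known_side == first else first
    return known_side if shares_matches_with_known else other


# RWP is Roley-Eggleson here; MLP has different maternal matches.
print(infer_side("Roley-Eggleson", ("Horr-Hall", "Roley-Eggleson"), False))
```

The final line prints Horr-Hall, the conclusion reached for MLP & PEP in the sibling comparison example.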
The groups are described as follows, beginning with the closest and moving to the more distant.