About us
This page is dedicated to posting Sproul information that you might find interesting. If you have anything you would like to share with the group that can be posted here, please send it to the administrator email.
Posted: 04 October 2020
Recently there has been some discussion on the U106 Message Board about how Big Y data, and WGS data as a whole, are analyzed. Since this project relies heavily on Y-DNA to achieve our project goals, I thought I might include a response from Dr Iain McDonald, lead administrator for the U106 project. I highly encourage all of our Y-DNA testers to join the U106 project and sign up for the U106 Message Board. There is a tremendous amount of knowledge to be gained from the experts in the field of Y-DNA analysis. With this said, here is his explanation of how Y-DNA is analyzed in the lab:
Hi folks,
Following Myles's comments, I thought I'd branch this off to explain how BigY works, what all these terms mean, why all the different numbers of SNPs don't always match up to your expectations, and what it means for a SNP to be "real" or "private".
In an ideal world, you send your DNA off to a company who reads the entire sequence you pay for. They compare that exact same sequence to everyone else's and return the SNPs that differ between you.
Spoiler alert: we don't live in an ideal world. Welcome to the complex world of data science. You might have met this world recently: How many people die of COVID? It sounds straightforward, but it isn't. Does it count if you were terminally ill anyway and otherwise asymptomatic? Or if you got hit by a bus after a positive result? Or if you weren't tested before you died? There are too many edge cases to give a precise number. Most statistics like this are the same. The challenge for data scientists and communicators (including the media) is to portray these complex data in a format that is easily understood without the viewer needing two weeks of background reading, thus skipping a lot of potentially important details. BigY is no different, so what follows is a condensed version of those two weeks. Important points for the impatient are in bold italics.
1. Why Y-DNA is hard to read. Around 166 million years ago, an X chromosome developed a gene called SRY. This let sex become genetically determined at conception, and the Y chromosome was born. The Y chromosome retains great similarity to the X chromosome, and bits of it have swapped with the X and other chromosomes over the years. The last major one, around two million years ago, gave us the X-transposed region. This region is 99.3% identical to the corresponding X chromosome, and large parts of it are read by the BigY-700 upgrade. Because it no longer recombined with X, Y suffers from Muller's ratchet: it is prone to being deleted. So evolution selected for protective mechanisms, including making the Y chromosome highly repetitive, repeating key genes. This makes much of the Y chromosome also very similar to itself - it has large palindromes and large repeating structures. These factors mean a chunk of DNA in isolation often cannot reliably be mapped back onto a specific Y chromosome location.
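As a rough illustration of that last point, the sketch here searches a made-up reference string for two short chunks. The sequence, the repeated unit and the helper name `find_all` are all invented for the example; real mapping software is far more sophisticated, but the underlying ambiguity is the same.

```python
# Toy illustration only: a made-up "reference" containing a repeated block,
# standing in for the palindromes and repeats on the real Y chromosome.

def find_all(reference: str, chunk: str) -> list[int]:
    """Return every start position at which chunk occurs in reference."""
    positions = []
    start = reference.find(chunk)
    while start != -1:
        positions.append(start)
        start = reference.find(chunk, start + 1)
    return positions

REPEAT = "GATTACAGATTACA"  # hypothetical repeated unit
REFERENCE = "ACGTTGCA" + REPEAT + "TTCCGGAA" + REPEAT + "CATGCATG"

unique_chunk = "TTCCGGAA"    # taken from the non-repeated middle section
repeated_chunk = "GATTACAG"  # taken from inside the repeated unit

unique_hits = find_all(REFERENCE, unique_chunk)
repeated_hits = find_all(REFERENCE, repeated_chunk)

print("unique chunk maps to:", unique_hits)
print("repeated chunk maps to:", repeated_hits)
```

The first chunk has a single possible home, while the second fits equally well in two places, so a read like it cannot be assigned to one location with confidence.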
2. The Y chromosome and base pairs. DNA is made up of four chemical bases (adenine, guanine, cytosine and thymine: A, G, C and T) connected in pairs by a double-helix. One out of each base pair is read and returned by a genetic test. You have about 57 million base pairs on your Y chromosome, of which BigY will confidently read about 15 million, and tentatively read data on around 23 million. That 15 million is only 0.24% of your entire diploid DNA, so it is easy for one part of the Y chromosome to be mistaken for another.
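The proportions quoted above can be checked with a few lines of arithmetic. The Y-chromosome and BigY figures come from the text; the total diploid genome size of roughly 6.2 billion base pairs is an assumption added here for the calculation and is not stated in the post.

```python
# Figures from the text (approximate).
Y_TOTAL_BP = 57_000_000         # base pairs on the Y chromosome
BIGY_CONFIDENT_BP = 15_000_000  # confidently read by BigY
BIGY_TENTATIVE_BP = 23_000_000  # tentatively read by BigY

# Assumption for illustration: approximate size of the diploid human genome.
DIPLOID_GENOME_BP = 6_200_000_000

def percent(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole

confident_share_of_y = percent(BIGY_CONFIDENT_BP, Y_TOTAL_BP)
tentative_share_of_y = percent(BIGY_TENTATIVE_BP, Y_TOTAL_BP)
confident_share_of_genome = percent(BIGY_CONFIDENT_BP, DIPLOID_GENOME_BP)

print(f"Confident reads cover {confident_share_of_y:.1f}% of the Y chromosome")
print(f"Tentative reads cover {tentative_share_of_y:.1f}% of the Y chromosome")
print(f"Confident reads cover {confident_share_of_genome:.2f}% of the diploid genome")
```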
3. What it means for a SNP to be "testable". (Thomas, Astrid or someone can doubtless correct me if I'm wrong, as this is not my forte.) Sanger sequencing, the "gold standard" method of SNP identification, is used by YSeq. This targets a specific location on the Y chromosome, of order 100 base pairs long, and looks for mutations within that region. That targeting allows a very-high-confidence result to be assigned to that specific SNP: Sanger sequencing is the sniper of the DNA world. However, if the SNP falls within one of these regions that could be mapped to multiple locations, it can't reliably be tested by Sanger sequencing, and Thomas or whoever will say they don't recommend it for testing.
BigY and similar sequencing tests (YElite/WGS/etc.) aren't so targeted. Rather than being a sniper, they take a shotgun approach to sequencing. DNA is broken into arbitrary chunks of similar length (maybe 100-150 base pairs), and chunks are picked at random. If they can be read and mapped back onto the Y chromosome, great. If not, try again. With millions of these reads, you build up your whole Y chromosome: well-mapped regions are clearly read, while poorly mapped regions aren't. A primary difference here is that the chunks don't fully overlap, so we can successfully read slightly into regions where DNA mapping is poor if we are lucky with where the chunks are cut. The longer the chunks (read length) and the more times we look (average read depth), the more likely we are to be lucky and the more of the chromosome we'll read. This process means that BigY can cover regions that are not viable for Sanger sequencing. However, the random luck means that no two BigY tests cover exactly the same regions.
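The effect of random sampling on coverage can be sketched with a small simulation. This is a toy model under stated assumptions: a 2,000-position "chromosome", fixed 150-base reads placed uniformly at random, and no modelling of mapping quality at all. The function names and numbers are invented for the example and do not describe any lab's actual pipeline.

```python
import random

def simulate_depth(chrom_len: int, read_len: int, n_reads: int, seed: int) -> list[int]:
    """Place n_reads reads at random start positions and return per-position read depth."""
    rng = random.Random(seed)  # seeded so that each simulated "test" is repeatable
    depth = [0] * chrom_len
    for _ in range(n_reads):
        start = rng.randrange(chrom_len - read_len + 1)
        for pos in range(start, start + read_len):
            depth[pos] += 1
    return depth

def covered(depth: list[int], min_depth: int = 1) -> set[int]:
    """Positions read at least min_depth times."""
    return {pos for pos, d in enumerate(depth) if d >= min_depth}

CHROM_LEN = 2_000  # toy chromosome length (assumption)
READ_LEN = 150     # toy read length (assumption)

test_a = simulate_depth(CHROM_LEN, READ_LEN, n_reads=20, seed=1)
test_b = simulate_depth(CHROM_LEN, READ_LEN, n_reads=20, seed=2)
deeper = simulate_depth(CHROM_LEN, READ_LEN, n_reads=200, seed=1)

mean_depth_a = sum(test_a) / CHROM_LEN

print("mean depth, test A:", mean_depth_a)
print("positions covered, test A:", len(covered(test_a)))
print("positions covered, test B:", len(covered(test_b)))
print("positions covered, deeper test:", len(covered(deeper)))
print("covered by A but not by B:", len(covered(test_a) - covered(test_b)))
```

Two shallow runs with different seeds will generally leave different gaps, and raising the number of reads fills more of the chromosome, which is the behaviour described above.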
4. How do I know if a SNP is real? Cogito ergo sum - the only thing one can know with certainty is that one exists. For everything else, one should have some measure of doubt, however small. BigY quantifies this doubt in a number of ways: some are public knowledge, some we mere mortals aren't privy to. Four of the most important are the clarity with which the base is read (base quality), the reliability with which it is mapped onto the correct location (mapping quality), the number of reads (read depth) and the consistency of those reads being of one particular base. These doubts are expressed in formal probabilities. Any mutation passing a certain probability threshold is marked as a SNP. Typically, SNPs might only be called if (say) they had a less than one-in-a-million chance of being wrong.
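One widely used convention for expressing such error probabilities is the Phred scale, Q = -10 log10(P), on which a one-in-a-million chance of error is Q60. The sketch here converts between the two and applies a deliberately simple filter on the four quantities listed above. The thresholds and the function `toy_snp_call` are hypothetical; the actual criteria used for BigY are not given in the post.

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Convert an error probability to a Phred-scaled quality score."""
    return -10 * math.log10(p)

def toy_snp_call(ref_base, read_bases, base_quals, map_quals,
                 min_depth=10, min_base_q=20, min_map_q=30, min_consistency=0.9):
    """Toy caller with made-up thresholds; not any lab's real method."""
    # Keep only reads whose base quality and mapping quality both pass.
    good = [b for b, bq, mq in zip(read_bases, base_quals, map_quals)
            if bq >= min_base_q and mq >= min_map_q]
    # Read depth: too few usable reads means no decision either way.
    if len(good) < min_depth:
        return "no call"
    # Consistency: the fraction of usable reads agreeing on the commonest base.
    top_base = max(sorted(set(good)), key=good.count)
    if good.count(top_base) / len(good) < min_consistency:
        return "no call"
    if top_base == ref_base:
        return "matches reference"
    return f"{ref_base}>{top_base}"

print(phred_to_error_prob(60))
print(toy_snp_call("C", ["T"] * 12, [35] * 12, [60] * 12))
print(toy_snp_call("C", ["T"] * 5, [35] * 5, [60] * 5))
```

A production caller combines these quantities into a single probability rather than applying separate cut-offs; the separate thresholds here are only meant to make each source of doubt visible.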
These probability thresholds are very accurate but make certain assumptions. For example, they assume the rest of your DNA exactly matches the reference DNA. If one part of your DNA mutates to look like another part of your DNA, the analysis might map it to the wrong position and not notice, and you might wrongly be flagged with a SNP, or be flagged with a SNP in the wrong position. There are various ways to look for these false positives, and they primarily affect highly repetitive regions of DNA. These include regions like the centromere, DYZ19 and the Yq12 arm. False positives and false negatives can also be caused by a variety of other reasons. However, typical false-positive rates are much less than one SNP per test, so any individual SNP recorded in your test is likely to be real unless you know otherwise. You may or may not be able to confirm or refute a SNP with another test.
5. Variant lists and "private" variants. SNPs selected by this process become a list of genetic variants in your DNA. These variants are compared with the existing haplotree. Any that are on it are ticked off, your place on the haplotree is assigned, you are given a haplogroup, and you are left with a list of variants that don't occur on the haplotree. Nominally these are private variants.
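The ticking-off process can be pictured as a walk down a tree. The sketch here uses a tiny invented haplotree: the haplogroup names, SNP names and the `place` function are all hypothetical and are not FTDNA's data or algorithm. Variants matched along the tester's own path are ticked off, and whatever remains is treated as nominally private.

```python
# Hypothetical toy haplotree: each haplogroup has a parent and a set of defining SNPs.
TOY_TREE = {
    "ROOT": {"parent": None, "snps": set()},
    "A": {"parent": "ROOT", "snps": {"s1", "s2"}},
    "A1": {"parent": "A", "snps": {"s3"}},
    "A2": {"parent": "A", "snps": {"s4", "s5"}},
}

def place(variants: set[str]) -> tuple[str, set[str]]:
    """Return (haplogroup, private variants) for a tester's variant list."""
    current = "ROOT"
    ticked_off = set()
    moved = True
    while moved:
        moved = False
        for name, node in TOY_TREE.items():
            shared = node["snps"] & variants
            # Step down into a child branch if the tester has any of its SNPs.
            if node["parent"] == current and shared:
                ticked_off |= shared
                current = name
                moved = True
                break
    # Anything not ticked off along the tester's own path is nominally private.
    return current, variants - ticked_off

haplogroup, private = place({"s1", "s3", "x9"})
print(haplogroup, sorted(private))
```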
Most men will have no Y-SNPs that are truly private to themselves - i.e. they did not occur in their father and they have not passed them on to any sons. What "private" means here is that you are the only person in your haplogroup with that mutation. It can occur elsewhere on the haplotree - this makes it a recurrent mutation, and every SNP ultimately falls into this category - but what makes it private is that the people you are most closely related to do not have it.
This is not quite the same list as you see in your results. What you get in your list of private SNPs is filtered. Any SNPs in problematic regions like the centromere, DYZ19 and Yq12 are removed. I think SNPs occurring within STRs are removed too, but I'm not certain. Also removed are any mutations that are not SNPs: these may include some MNPs (two or more SNPs occurring at adjacent positions, but marked as individual SNPs by FTDNA), insertions and deletions (indels). I anticipate that further SNPs may be removed manually during the quality control process if they look like they might be problematic. Rare errors can, of course, still remain. However, so do some of the weirder, complex genetic variants that can look like errors.
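A minimal sketch of this kind of filtering is shown here. The region coordinates are placeholders invented for the example, not real chromosome positions, and the `Variant` class and `filter_private` function are hypothetical. The sketch also leaves out MNP handling, STR handling and manual quality control.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    position: int
    ref: str  # reference allele
    alt: str  # allele seen in the tester

# Placeholder coordinates for illustration only; NOT real genomic positions.
EXCLUDED_REGIONS = {
    "centromere": (1_000, 2_000),
    "DYZ19": (5_000, 6_000),
    "Yq12": (9_000, 12_000),
}

def is_snp(v: Variant) -> bool:
    """A single-base substitution: one base replaced by one different base."""
    return len(v.ref) == 1 and len(v.alt) == 1 and v.ref != v.alt

def in_excluded_region(v: Variant) -> bool:
    return any(start <= v.position <= end for start, end in EXCLUDED_REGIONS.values())

def filter_private(variants: list[Variant]) -> list[Variant]:
    """Keep SNPs outside the excluded regions; drop indels and everything else."""
    return [v for v in variants if is_snp(v) and not in_excluded_region(v)]

unfiltered = [
    Variant(300, "A", "G"),    # SNP in a clean region: kept
    Variant(1_500, "C", "T"),  # SNP inside the toy centromere: removed
    Variant(3_200, "AT", "A"), # deletion: removed
    Variant(7_000, "G", "GC"), # insertion: removed
]
filtered = filter_private(unfiltered)
print(filtered)
```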
5. Comparing your DNA to others. Comparisons between your DNA and others' are done on the unfiltered list of private variants. This ensures that all possible variants that are consistently recorded make it onto the haplotree: in many cases (e.g. Z2265 and BY30097 between U106 and Z381) SNPs from poorly read parts of the chromosome make a big difference in the haplotree structure. Each position in your unfiltered list of private variants is compared to the other person's test, and vice versa. Positions where there is a clear mismatch are flagged as non-matching, but a null return is given if that position isn't read in the other person's test. So your non-matching variants are not filtered as your private SNPs are, but they don't contain positions read by only one test. For these SNPs, it's not possible to tell whether they are found in the second test. Family Tree DNA keeps them in your private variants until it can work out where on the haplotree they should go. It maintains this consistency up the tree: SNPs are placed as far down the tree as possible. Unfortunately, they don't flag ambiguous positions - this is one of the simplifications they've chosen, but it can cause problems for people like me.
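The three possible outcomes of a position-by-position comparison (match, mismatch, or no answer because the other test did not read that position) can be sketched as follows. The data layout, a mapping from position to base, and the function names are assumptions made for the example and do not describe FTDNA's internal format.

```python
from typing import Optional

def compare_variants(mine: dict[int, str],
                     other_reads: dict[int, str]) -> dict[int, Optional[str]]:
    """Compare each of my variant positions against the bases read in another test.

    mine:        position -> base I carry at that position
    other_reads: position -> base read in the other test (absent if not read)
    """
    result: dict[int, Optional[str]] = {}
    for pos, base in mine.items():
        if pos not in other_reads:
            result[pos] = None        # null return: the other test did not read it
        elif other_reads[pos] == base:
            result[pos] = "match"
        else:
            result[pos] = "mismatch"  # clear mismatch: flagged as non-matching
    return result

def non_matching(mine: dict[int, str], other_reads: dict[int, str]) -> list[int]:
    """Positions with a clear mismatch; unread positions are left out."""
    statuses = compare_variants(mine, other_reads)
    return sorted(pos for pos, status in statuses.items() if status == "mismatch")

my_variants = {101: "T", 202: "G", 303: "A"}
other_test = {101: "T", 202: "C"}  # position 303 was not read in the other test

print(compare_variants(my_variants, other_test))
print(non_matching(my_variants, other_test))
```

Keeping the unread case separate from a mismatch is the point of the exercise: position 303 here is neither confirmed nor refuted by the second test.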
It's important to remember that the haplotree is a manually curated entity. A human eye helps to avoid errors creeping in from various sources, and the curation of Family Tree DNA's haplotree is the full-time job of one marvellous man (who isn't customer-facing, so I won't mention him by name). What this means is there is a lag between results being finalised and the haplotree being manually updated to reflect those results. During this period, private variants remain private as they are not on the haplotree yet, and may therefore remain on non-matching variant lists unexpectedly while these final stages are completed. What you will typically find is that common private variants will be grouped together to form a new haplogroup, and you will be assigned to that haplogroup.
6. Summary. Looks can be deceptive. This very long post is a gross simplification of the very complex process that gives you a list of numbers and a position on a tree. There are lots of reasons why those lists of numbers might not conform to your expectations, and they are usually down to the way the system has been designed to either reduce errors or make the results comprehensible to the average person. Ultimately, we don't care about the lists themselves, but rather the relationships they trace.
If you are agonising about the individual numbers in those tests, it is probably time to consider whether they are really important for tracing your relationship to others. Perhaps they are, in which case you should consider advancing up the learning curve and investing time in understanding the raw data that lies behind those numbers, which is a much more powerful tool. It's for these reasons that we ask users to share the raw data from their BigY and other sequencing tests with us, through the Data Warehouse ( https://ydna-warehouse.org/submit.php ), so that we can look more closely at individual mutations and look at data across multiple haplogroups with proper statistical methods.
Best wishes,
Iain.
Posted: 06 January 2020
2019 was an excellent year for the Sproul project. We were able to establish many lines in Ulster, particularly in County Tyrone. This would not have been possible without having Irish testers participating in the project. An incredible amount of information was learned and we continue to uncover more as testing is ongoing. If you are an Irish Sproul and have not tested, please consider doing so. It is vitally important that as many of you as possible participate so that we can truly understand how we are all related!
Our project goal for 2020 is to locate as many Scottish Sprouls as possible who are willing to participate in the Sproul project. We are at a point now in the project where we need to start linking these lines back to Scotland. We cannot do this without Scottish testers! If you are a male Scottish Sproul, please consider joining us!
Posted: 17 March 2017
Ancient historical background on Walter Spreul and his time can be found here:
https://archive.org/stream/bookofdumbartons02irvi/bookofdumbartons02irvi_djvu.txt
Posted: 16 March 2017
A short story about experiences in Cauldhame:
http://www.scottishbooktrust.com/reading/stories-of-home/story/cauldhame
Interesting reading from:
A History of the County of Renfrew from the Earliest Times
Chapter X.—Families
http://www.electricscotland.com/history/renfrew/chapter10.htm
The Spreulls have long ceased to own their ancestral estate of Cowdon, in the parish of Neilston. During the period here dealt with, they were a family of considerable note. The first of them I have been able to meet with is Walter Spreull, Lord of Coldame, in the shire of Dumbarton, who, about the year 1294, was seneschal or steward to the Earl of Lennox, and with others, at the Earl’s direction, was holding courts in the Lennox property of the monks of Paisley, and in various ways seeking to deprive them of the donations conferred upon their house by one of the Earl’s predecessors. The cause of the monks was espoused by Robert Wishart, the famous Bishop of Glasgow, and afterwards “the best perjured man in Scotland.” The Earl and his Steward paid no attention to the appeals of the Bishop. Robert at last directed the vicars of Carmunnock, Cathcart, Pollok, Kilmacolm and Kilbarchan to attend the courts the Earl and his steward were holding, and to warn them against interfering in the affairs of the Abbot and Convent “contrary to God and to justice,” and in the event of their paying no heed to their warning, the vicars were, with all due solemnity, to proceed to excommunicate the Earl and his steward and all who adhered to them, and to lay the churches and chapels in the district under an interdict. This was only the beginning of the trouble between the Earl and the monastery. The controversy continued for many years, and was not settled until after the close of the English wars.
In 1296, Walter Spreull signed the Ragman Roll and took the oath of fealty to Edward I. of England. Among the garrison holding the Castle of Edinburgh for the English in 1335 was one Thomas Spreull, an esquire; but whether he belonged to the Cowdon family is uncertain, though it is not unlikely that he did. The same, or another, Thomas Spreull (“Sproule”) is mentioned in the Exchequer Rolls for 1368, 1368, 1372, as the receiver of stores for the Castle of Edinburgh. Under the year 1366, in the same Rolls, a Walter Spreull is mentioned as paying into the Exchequer the contribution of the barony of Glasgow towards the King’s ransom. At Bar, on August 29, 1483, Master William Spreull witnessed a charter whereby Hugh Lord Montgomery and Giffying gave to Alexander Montgomery, son and apparent heir of Robert Montgomery of Giffyng and his spouse Jonet of Dunlop, the five merk land of Bar, lying within the lordship of Giffyng and in the bailiary of Cunningham. With the exception of Sir Thomas Petcon, chaplain, the witnesses were Montgomeries.
In 1531 “the laird of Cowdoun” was engaged in a feud with the laird of Colgrane. One of the witnesses to the charter by which Alexander Porterfield sold his lands of Porterfield in the barony of Renfrew to his brother germane, Master John Porterfield, and his wife Beatrice Cunningham, on August 16, 1540, was Thomas Sprewill de Coldon (Cowdon).
According to Nisbet there were several branches of the family, as the Spruells of Ladymuir, of Castlehill, and of Blachairne.
John Spreul, a younger son of the Cowdon family, was, in 1507, made vicar of Dundonald. At the same time he was one of the professors of philosophy in the University of Glasgow. Afterwards he was appointed Rector of the University. Subsequently he was advanced by Bishop Dunbar to be one of the prebends of his Cathedral Church, and in virtue of his prebendary became vicar of Ancrum. In 1541 he was a canon of Glasgow, and is so designated in a charter, according to which he bought from Lord Lyle, on the 25th August in that year, the lands of lie King’s Meadow, King’s Orchard, and Castlemilk, all lying within the territory of the burgh of Renfrew. Two years after this, he bought from Gabriel Semple of Ladymure, his wife Jonet Spreul consenting thereto, the lands of Ladymure in the lordship of Duchal and the parish of Kilmacolm, then occupied and cultivated by John Cochrane, George Lyle, and Jonet Caldwell. The contract was signed at Cathcart, April 25, 1543. On November 25 in the same year, Gabriel Semple and his wife Jonet Spreule purchased the lands and town of Cathcart, and assumed the designation of Cathcart. On July 27, 1545, the Queen granted to James Stewart of Cardonald, together with other lands, those of Dalmore and Dalquhorne in the lordship of Coldame in Dumbartonshire, the latter of which Walter Spreull of Cowdon had received from the Earl of Lennox in the time of Alexander III. In addition to the lands above mentioned, Master John Spreull is said to have purchased the lands of Blachairn in the lordship of Provan, and “a fair lodging” within the city of Glasgow. He died in the year 1555, leaving the whole of his property to John Spreull, his nephew, and son of his brother Robert, a burgess of Glasgow. At the Reformation, John is said to have become rector of Cambuslang.
In 1610, James Spreull of Cowdon was witness to a precept of clare constat by James Earl of Glencairn, dated at Glasgow, June 12. John Spreull, his successor, sold the lands of Cowdon to William Lord Cochrane, father of the first Earl of Dundonald, in 1622. John Spreull, the vicar of Cambuslang, was succeeded by his son and heir, whose son was Provost of Renfrew, and attended the Parliament of 1630 as one of the Commissioners of the Royal Burghs. The Provost was succeeded by his son, who was bred to the law, and was appointed Town Clerk of the city of Glasgow, and subsequently was one of the principal clerks of the Court of Session. In the Parliament of 1645 Renfrew was again represented by a John Spreull.