Taking My Own Advice 4C; DNA Research; Fundamental Research Techniques

Writer: Vance Hawkins

Updated: Sep 29, 2022

4C. DNA Research
Before You Start, Educate Yourself on Basic Research Techniques
I grew tired of people telling me I just didn't know enough to understand their "superior minds" at work. I, being just a simple man, supposedly couldn't comprehend their DEEPER thoughts. That's why I created this blog entry.
Research
A million people do "research" online, and many of them don't know what they are doing. Here is a quick study of the principles I use to keep me grounded. I am upset by people saying the "Melungeons" of southwestern Virginia and northeastern Tennessee, including parts of Kentucky and North Carolina, are Gypsies, or Portuguese, or something else, with only rudimentary evidence that could mean anything. They DESPERATELY need to learn how to conduct research.
Learn These Concepts
Reliability -- the degree to which the result of a measurement, calculation, or specification can be depended upon to be accurate.
Validity -- how accurately a method measures what it is intended to measure.
Occam’s Razor – the principle (attributed to William of Occam) that in explaining a thing no more assumptions should be made than are necessary. The principle is often invoked to defend reductionism or nominalism.
Proof – the action or process of establishing the truth of a statement.
An abundance of circumstantial evidence – Wikipedia (https://en.wikipedia.org) defines circumstantial evidence as "evidence that relies on an inference to connect it to a conclusion of fact," for example, a fingerprint at the scene of a crime.
An abundance of such evidence may be enough to convict a person of a crime, yet it can still fall short of actual proof.
To fall short of actual proof -- say you are trying to prove that x = y in every case. If we can show there exists even one value of x for which x does not equal y, then we have shown that we can NOT prove x = y, because the statement is false. One counterexample outweighs any number of agreeing cases. To prove or disprove a thing is not easy.
Standard Deviation -- If your research includes analyzing quantifiable data, know a little mathematics, especially some "Statistics and Probability Theory." At a minimum, you should know how to derive the standard deviation. Study the variables defined below to learn how to solve the equation. If you can't perform this simple procedure, DON'T even mention "statistical noise"! It is IMPOSSIBLE for you to comprehend!

The formula for the standard deviation is:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$
σ represents the standard deviation.
N represents the total number of data points.
Σ means "the sum of." The lower-case "i" below the sigma symbol is a discrete index, and the entire phrase is read "the sum of all values from 1 to N" (1, 2, 3, . . ., N).
μ represents the mean, or average value.
x_i represents the actual value of the i-th measured data point.
To Derive the Standard Deviation . . .
1. Work out the Mean (the simple average of the numbers).
2. Then for each number: subtract the Mean and square the result.
3. Then work out the mean of those squared differences.
4. Take the square root of that and we are done!
EXAMPLE -- Say we have a bunch of numbers like 9, 2, 5, 4, 12, 7, 8, 11.
STEP 1. Since the mean is the average, simply add them together and divide by "N." This gives us 9+2+5+4+12+7+8+11 = 58. Since there are 8 numbers, our "N" is 8. So 58/8 = 7.25, and 7.25 is our mean.
STEP 2. Subtract the mean from each number and square the result. So . . .
9.00 - 7.25 = 1.75
2.00 - 7.25 = -5.25
5.00 - 7.25 = -2.25
4.00 - 7.25 = -3.25
12.00 - 7.25 = 4.75
7.00 - 7.25 = -0.25
8.00 - 7.25 = 0.75
11.00 - 7.25 = 3.75
Now we need to square the results:
1.75 * 1.75 = 3.0625
-5.25 * -5.25 = 27.5625
-2.25 * -2.25 = 5.0625
-3.25 * -3.25 = 10.5625
4.75 * 4.75 = 22.5625
-0.25 * -0.25 = 0.0625
0.75 * 0.75 = 0.5625
3.75 * 3.75 = 14.0625
This is tedious, but it is also very simple. It only requires adding, subtracting, multiplying and dividing. By the end of third grade, most children have learned these skills.
STEP 3. Then work out the mean of those squared differences. This is just like STEP 1: add them up and divide by N. 3.0625+27.5625+5.0625+10.5625+22.5625+0.0625+0.5625+14.0625 = 83.5, and 83.5/8 = 10.4375. (This mean of the squared differences is called the variance.)
STEP 4. Take the square root of that and we are done! The square root of 10.4375 is about 3.2307, so our standard deviation is about 3.23 -- qed. (quod erat demonstrandum -- "thus it is shown," or "thus it is demonstrated.") I have just demonstrated how to find a standard deviation. It is used in all kinds of research to discover how compact your data is. A professional researcher has had at least one course in Probability and Statistics and can perform the computations required to calculate standard deviations.
Every good researcher will follow and be familiar with these SEVEN concepts. Be wary of any researchers who don't understand the seven concepts listed above.
Example of determining if your hypothesis is both reliable and valid
My MDLP-World-22 option on gedmatch.com gave me a substantial ancestry from India, at least 12.5%! That is 1/8th from the Indian subcontinent! That's a great-grandparent! But all my great-grandparents were born in America, so I know that part of their algorithm is not reliable. In fact, only one of my ancestors, a g-g-g-g-grandpa on one side of my family, was born in Europe: Nevil Wayland Sr. was born in Cashel, County Tipperary, Ireland, in 1745. All my other ancestors were born in America by 1745. The GEDmatch website offers an option to write to the creators of its free data-analysis software.
I emailed them. I told the developer I am mostly European, but my results showed I have some African and Native American autosomal DNA, and I gave him my GEDmatch number showing this. I asked: "My question is this: Why does my admixture show so much ancestry from the Indian subcontinent? I suspect Native American mixed with African and Caucasian might 'appear' similar, in a genetic sense, to the DNA of the Indian subcontinent . . ." My thinking was this: there are over a billion Asian Indians. South India has a large Black population. Northeastern India, near Tibet, has more of an East Asian component. Hindi descends from Indo-European languages that arrived in India from outside, so some early-day invaders may have had European features as well. This is the exact composition of the Melungeon peoples of the Southern Appalachian Mountains: part Native American (Siberian before coming to America), African and Caucasian. Since there are just a few of us and there are over a BILLION Asian Indians, they just lumped us in with them. I received a short reply: "I guess that your Indian score is just a noise ("for some unknown reasons MDLP World tends to give some Indian % to persons of European descent")." Now some people are out there preaching that Melungeons descend from Gypsies! Also, he used the word "noise" in a manner that bothers me. Having a background in mathematics, I know something of the origin of the term "noise" in research. A more accurate term might be "reliability" or "validity." He might know he was really referring to the reliability and validity of his data, but the person he is talking to might not. To me, hearing him use the word "noise" in that manner is like a door squeaking every time it is opened or closed. Note to self: add a paragraph on the use of the concept of "statistical noise."
His point, in effect, was an admission that there are "wrinkles" in his algorithm that need to be ironed out. I have demonstrated that the part of his algorithm measuring the percentage of ancestry from India is not "reliable." Because the Romani (Gypsies) trace their origins to the Indian subcontinent, a spurious Indian score is exactly the kind of evidence offered for Gypsy ancestry. Thus the claim of "Gypsy" ancestry for my Melungeon ancestors, resting on that score, is not "valid" either. Reliability and validity often travel together. Again -- qed.
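Running the four standard-deviation steps by computer is a good way to catch slips in a long hand calculation. Below is a minimal Python sketch of the same procedure, applied to the example numbers 9, 2, 5, 4, 12, 7, 8, 11. The function names are illustrations of my own, not from any particular tool, and the code divides by N, matching the formula and steps in this entry (the "population" standard deviation).

```python
import math


def mean(values):
    """Step 1: the simple average of the numbers."""
    return sum(values) / len(values)


def population_std_dev(values):
    """Steps 2-4: standard deviation dividing by N (population formula)."""
    mu = mean(values)
    squared_diffs = [(x - mu) ** 2 for x in values]  # Step 2: subtract the mean, square
    variance = sum(squared_diffs) / len(values)      # Step 3: mean of the squared differences
    return math.sqrt(variance)                       # Step 4: square root


data = [9, 2, 5, 4, 12, 7, 8, 11]
print(mean(data))                                 # 7.25
print(sum((x - mean(data)) ** 2 for x in data))   # 83.5
print(round(population_std_dev(data), 4))         # 3.2307
```

Python's standard library also has `statistics.pstdev`, which should agree with this function. `statistics.stdev` divides by N - 1 instead (the "sample" standard deviation), so it gives a somewhat larger number for the same data.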
Statistical Noise
One problem people often run into is the claim that their data is just "statistical noise." A definition of this term can be found here: https://www.techtarget.com/whatis/definition/statistical-noise#:~:text=Statistical%20noise%20is%20unexplained%20variability,quality%20of%20signals%20and%20data.
Statistical noise is "unexplained variability within a data sample." They consider the example of an experiment in which an electrical current is used to measure data: the electricity coming from the wall circuit can distort your measurement by some small amount. However, some researchers use the term "statistical noise" incorrectly, to refer to small quantities of unexpected results when analyzing a sample of DNA. Some have used it to dismiss Native American DNA found in small quantities in an individual. But a small quantity is exactly what one would expect when going back many generations.
For example, I have a great-grandpa, Jefrrey Hoten Richey (b. Ar 1851, d. Ok 1926), and I was born in 1952. So about 100 years spans four generations of my family. Taking roughly 25 years per generation, 10 generations cover about 250 years. 1 generation => 1/2 of the Native DNA material remaining; 2 generations => 1/4; 3 => 1/8; 4 => 1/16; 5 => 1/32; 6 => 1/64; 7 => 1/128; 8 => 1/256; 9 => 1/512; and 10 generations => 1/1024 of the original Native DNA remaining. That's under one tenth of one percent! And that's only going back 250 years. I am writing this in the year 2022, so 250 years takes us back to the year 1772. You might have multiple mixed-race ancestors in multiple generations, which would complicate the calculations. But if we know the mixture back to the first mixed-race ancestor, the calculations are simple enough.
The problem is we probably don't know this. What gets labeled "statistical noise" here comes from this lack of knowledge, and from a misunderstanding of what "statistical noise" means, not from the empirical data. If we BELIEVE the small amount of Native DNA is real, it can help us pinpoint which generation of our ancestors included a Native person.
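The halving rule in this section can be tabulated with a short Python sketch. It assumes, as the text does, a single full-blood Native ancestor, roughly 25 years per generation, and the year 2022; the constant and function names are hypothetical. The fractions are averages, and any one person can receive somewhat more or less.

```python
from fractions import Fraction

YEARS_PER_GENERATION = 25  # assumption: rough rule of thumb used in the text
CURRENT_YEAR = 2022        # the year this entry was written


def expected_fraction(generations):
    """Average share of one full-blood ancestor's DNA after `generations` halvings."""
    return Fraction(1, 2 ** generations)


for g in range(1, 11):
    share = expected_fraction(g)
    year = CURRENT_YEAR - g * YEARS_PER_GENERATION
    print(f"{g:2d} generations: {share} = {float(share) * 100:.4f}%  "
          f"(reaches back to about {year})")
# The last line reports 1/1024, about 0.0977%, reaching back to about 1772.
```

Using exact fractions keeps the 1/2, 1/4, 1/8 pattern visible instead of turning it into long decimals.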
What might be mistaken for "statistical noise" is that we seldom, if ever, inherit EXACTLY half of each parent's ancestry. We receive about half of our DNA from each parent, but which half is random, so a parent who is 1/4 Native might pass on more or less than 1/8 Native DNA. The split might be 55/45 or something similar rather than exactly 50/50. This is a matter of the "reliability" of the data (a concept I covered earlier in this blog entry), and NOT of "statistical noise"!
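To illustrate why the amount actually inherited wanders around the expected value, here is a toy Python simulation. It is a hypothetical simplification, not how GEDmatch or any testing company models inheritance: the genome is treated as 300 independent segment pairs, and the function name and parameters are my own. Real chromosomes are passed down in long linked blocks, so real-world variation tends to be larger than this toy shows.

```python
import random
import statistics


def simulate_shares(generations, pairs=300, trials=1000, seed=1):
    """Toy model: share of a descendant's DNA traced to one full-blood ancestor.

    The genome is modeled as `pairs` independent segment pairs (a hypothetical
    simplification). Returns one share per simulated family line.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    shares = []
    for _ in range(trials):
        # Generation 1: the child gets one copy of every pair from the ancestor,
        # so exactly half of the child's segment copies come from the ancestor.
        ancestral = pairs
        # Each later generation passes each ancestral copy on with probability 1/2.
        for _ in range(generations - 1):
            ancestral = sum(1 for _ in range(ancestral) if rng.random() < 0.5)
        shares.append(ancestral / (2 * pairs))
    return shares


for g in (1, 2, 3, 6, 10):
    shares = simulate_shares(g)
    print(f"{g:2d} generations: expected {1 / 2 ** g:.5f}, "
          f"average {statistics.mean(shares):.5f}, "
          f"range {min(shares):.5f} to {max(shares):.5f}")
```

The averages land close to 1/2, 1/4, 1/8 and so on, while the range shows how far individual family lines stray from them.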
Remember, "statistical noise" is UNEXPECTED variability in your results. If you expect to find small amounts of Native DNA and that is what you discover, it is NOT statistical noise! It becomes noise when you find something you didn't expect, such as Asian Indian DNA when you have no known ancestor from the Indian subcontinent in the time frame the result points to. That time frame can be determined from the percentage of Asian DNA they find. Thus if they tell you that you are 12.5% Asian Indian, that's 1/8th, which implies that a great-grandparent, born roughly 100 years before you (my own great-grandpa was born a century before me), was fully Asian Indian. If this is NOT the case with you, that data is just "noise."
It does NOT alter the fact that Native DNA is PRESENT in the DNA sample, and that you expected this result. Since it is expected, it is NOT just "noise."
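Working backward from a reported percentage to a generation can be sketched in Python as well. This is a hypothetical helper, assuming the whole percentage comes from one full-blood ancestor and that each generation halves it; several partial ancestors in different generations, as mentioned in this entry, would break that assumption.

```python
import math


def generations_back(percent):
    """Whole number of generations g for which 100 / 2**g is closest to percent.

    Assumes a single full-blood ancestor and exact halving each generation.
    """
    if not 0 < percent <= 50:
        raise ValueError("percent must be greater than 0 and at most 50")
    return round(math.log2(100 / percent))


def ancestor_label(g):
    """1 -> parent, 2 -> grandparent, 3 -> great-grandparent, and so on."""
    if g == 1:
        return "parent"
    return "great-" * (g - 2) + "grandparent"


for percent in (50, 25, 12.5, 3.125, 0.1):
    g = generations_back(percent)
    print(f"{percent}% -> about {g} generations back: one full-blood {ancestor_label(g)}")
```

For 12.5% it returns 3 generations, a great-grandparent, which is the reasoning applied to the MDLP result in this entry.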
Example of Occam's Razor
Everything about the Melungeon families found on the Tennessee-Virginia border can be explained by our having English/Scots-Irish, African and Native American ancestry. To complicate this by adding Gypsy, Portuguese, Turkish, or Arabic ancestry in large numbers is simply unnecessary and doesn't agree with known historical facts. Efforts to find facts supporting these claims have been inconclusive. So, using Occam's Razor, researchers need to "shave away" those unproven allegations until their claims can be shown to be more probable than the simple explanation that we descend from KNOWN populations of the region. The same is true of the origins of the word "Melungeons." It IS from a French verb form, mélangeons, meaning "we mix." Any Arabic, Turkish, Angolan or Portuguese word that looks a little bit similar, but is NOT the EXACT same word, must, by the principle of Occam's Razor, be "shaved off" as irrelevant.
Conclusion
I use the above as a guideline to keep my research grounded in reason. Please keep these principles in mind as you look through the blog entries. So many people perform "research" with profound biases they don't even realize they have. I hope knowing these research techniques will lead to better research in the future.