Finding John Costello – A DNA Journey: Isidore Fried & Sarah Esther Salzman Data Analysis

Last week, I wrote a quick update on my John Costello work.  I shared that my Auntie V’s DNA results had come in and that as I updated my data tables, her results helped eliminate all but two possibilities.  What a huge and exciting step forward!

In sharing my news, I got some great questions and some requests to outline my steps.  One of my readers, Amy of Brotmanblog: A Family Journey, shared that in studying DNA posts that walk through steps, “I just [get] too overwhelmed by numbers and [get] a little lost”.  So this post is an attempt to answer all questions about my own steps in a way that hopefully won’t make DNA beginners share Amy’s feelings about getting lost in the numbers.  There will be numbers, but not until we establish the steps.

 

First, a few disclaimers.

 

Disclaimer one:  I am not an expert in using DNA data with my genealogy research.  I am a genealogist who is learning to understand DNA data and its nuances so that I can accurately analyze some of my tougher research problems and see new research possibilities that come from the DNA analysis.

I am currently reading Blaine Bettinger’s book, The Family Tree Guide to DNA Testing and Genetic Genealogy.  I have read, and continue to read, many articles, blog posts, etc. from leading genetic genealogists.  I have watched, and continue to watch, presentations on DNA from leading genetic genealogists.  And lastly, I spend time working with my DNA data, the DNA data of my family members, and that of various friends and patrons at my local Family History Center.

Disclaimer two:  This research problem is impacted by endogamy.  Endogamy, the practice of marrying within the same community over many generations, can inflate the amount of DNA that two people share, and it is something I am still working on understanding.  I am particularly puzzled by the most accurate way to factor endogamy in when analyzing the numbers.  Lara Diamond’s endogamy work is helping a lot, but I still need to learn more about this.

Okay, disclaimers and preambles aside, let’s dive in!

 

Preparatory Work

 

My project has been ongoing for quite some time.  It required some preparatory work.  Some portions of that preparation I will just touch on lightly here – connecting the cousin matches into a cluster.  Others I will share in more detail – creating the list of relationship possibilities to test, setting up your charts, etc.  However, all of this part is essential to the rest of the analysis.  If you have questions about doing this correctly, ask away!

 

Step One:  Organize your two clusters.

 

You need to understand exactly how the people in each cluster connect to each other.

I have the known descendants of John Costello who have tested.  We look like this:

John Costello descendants who have tested
Tested individuals are highlighted in green.

The cluster I am comparing my known family to is made up of the descendants of Isidore Fried & Sarah Esther Salzman (cluster 1 in the rest of this post).  They look like this:

Isidore Fried descendants who have tested

Please note that all names of the living have been changed or anonymized except for myself, Vince, and Virginia.

 

Step Two:  Create a list of relationship possibilities between your two clusters.

 

I chose to use the two people in our two clusters who were the closest match to each other to create my list.  At the time I created my charts, that was Mack from my family and Sam from cluster 1.  Mack and Sam share 161.1 cMs.

When I plug that number into the Shared cM tool on DNA Painter, I get the following list:

[Screenshot: Shared cM tool results for 161.1 cMs]

If I tested every single possibility on this list, I would be checking 25 hypotheses.  But some of them are logically impossible.  For instance, all of the Great-Great-Aunt/Uncle/Niece/Nephew possibilities are simply not logical.  Mack and Sam are about the same age.  Those relationships don’t work based on the known data about the two families so I ruled them, and others, out.

I further shortened the list by taking the lowest number of shared cMs between people in the oldest tested generation and looking at their relationship possibilities with the Shared cM tool.

In my case, at the time I created my charts, the two people in the oldest generation with the lowest number of shared cMs were my Mom and Sam.  They share 132.4 cMs.  Their list looks like this:

[Screenshot: Shared cM tool results for 132.4 cMs]

I’m looking for relationship possibilities that show up on both lists.

My list making was a bit fuzzy and flexible because I wasn’t exactly sure how to account for endogamy.  In the end, I came up with nine relationship possibilities that were on both lists, and were logical, that I wanted to test.  You will see that list in the next section.  You will also see that I later added a tenth possibility that I had previously ruled out.
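For readers who like to see the logic spelled out, the list narrowing in this step boils down to "keep what is on both lists, then drop what is not logical."  Here is a minimal sketch in Python.  The function name and every relationship label in it are placeholders made up for illustration; they are NOT the actual Shared cM tool output for 161.1 or 132.4 cMs.

```python
def shortlist(closest_pair_options, oldest_gen_options, illogical):
    """Keep relationships that appear on both lists and are not ruled out by known facts."""
    on_both = set(closest_pair_options) & set(oldest_gen_options)
    return sorted(on_both - set(illogical))


# Placeholder labels only (assumption: NOT the real tool output).
list_for_closest_pair = ["Half 1C", "1C1R", "2C", "Half GG-Aunt/Uncle"]
list_for_oldest_generation = ["1C1R", "2C", "Half GG-Aunt/Uncle", "2C1R"]
ruled_out_by_known_facts = ["Half GG-Aunt/Uncle"]  # e.g. the two people are about the same age

to_test = shortlist(
    list_for_closest_pair,
    list_for_oldest_generation,
    ruled_out_by_known_facts,
)
print(to_test)
```

With real data, the two input lists would be typed in from the Shared cM tool, and the "illogical" list would come from what you already know about ages and generations in both families.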

 

Step Three:  Chart your relationship hypotheses.

 

This step can feel the most confusing until you have created several relationship charts.  Mine are in data tables, but I started by handwriting them on paper.  Here is a sample relationship chart that lines up our two families based on one hypothesized relationship:

relationship table

In this case, the hypothesis is that John Costello is the half-sibling of Isidore Fried OR Sarah Esther Salzman.  Using this hypothesis, we can then use the table to determine what genetic relationship we are testing between two people who have taken DNA tests.

For example, in this hypothesis, Sam and my Mom would be half-2nd cousins once removed or 1/2-2C1R.
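If handwriting the chart feels error prone, the same lookup can be sketched in code.  The idea is to count how many generations each tested person sits below the ancestor the two families would share under the hypothesis.  This is only a sketch: the function name is made up, and the generation numbers in the example (3 and 4) are assumptions chosen because they produce the 1/2-2C1R mentioned above, not numbers read from my actual charts.

```python
def relationship(gen_a, gen_b, half=False):
    """Name the relationship between two descendants of a shared ancestor (or couple).

    gen_a and gen_b count generations below the shared ancestor:
    1 = child, 2 = grandchild, 3 = great-grandchild, and so on.
    Set half=True when only ONE ancestor is shared (the half-sibling case).
    """
    if gen_a < 1 or gen_b < 1:
        raise ValueError("generations are counted from 1 (a child of the shared ancestor)")
    degree = min(gen_a, gen_b) - 1   # 1 = first cousins, 2 = second cousins, ...
    removed = abs(gen_a - gen_b)     # how many generations apart the two people are
    prefix = "1/2-" if half else ""
    if degree == 0:
        if removed == 0:
            return prefix + "siblings"
        return prefix + "great-" * (removed - 1) + "aunt/uncle"
    label = f"{degree}C"
    if removed:
        label += f"{removed}R"
    return prefix + label


# Half-sibling hypothesis: only one parent is shared, so half=True.
# Assumed depths: one person 3 generations down, the other 4.
print(relationship(3, 4, half=True))
```

The aunt/uncle labels do not say which person is the older generation, so the chart itself is still needed to see the direction of the relationship.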

 

Step Four:  Build a data table with the items you are interested in comparing.

 

This step is dependent on what you want to see and compare.  I choose to look at the following four items in my charts:

  • How many cMs do the two people share?  Comes from the DNA vendor.
  • What is their hypothesized relationship?  Comes from my list and the relationship hypothesis I am analyzing.
  • Does that relationship work based on the number of shared cMs?  Comes from the Shared cM Tool on DNA Painter.
  • What probability category does that relationship fall in using the Shared cM tool and what is the statistical probability of that relationship?  Comes from the Shared cM Tool on DNA Painter.

A simplified version of my data table looks like this:

Simplified Table categories

You can see that my family members will be listed in the left column and the DNA matches will be listed across the top.  This table includes my eight family members and the four categories I look at with each match from cluster 1.

Here is an expanded version that shows what the blank table will look like with the four DNA matches from cluster 1 along the top and my family down the left.  This chart includes only our names and the four columns I will fill in for each match compared to my eight family members:

simplified table categories expanded

Next, I add the (mostly) fixed data points – shared cMs, dashes for people who are not in the same data pools and can’t currently be compared, and # to indicate individuals who are in the same data pool but whose shared cMs I don’t currently have access to.  That table looks like this:

Fixed data

This table becomes my template for every single hypothesis that I test.  Please note that I also differentiate the two different generations in each cluster in this chart.  The people from the oldest tested generation are listed first and highlighted with darker blue.  The people from the younger tested generation are listed next and are highlighted with lighter blue.  This is a visual reminder to me that the data for the older generation is more meaningful.
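For anyone who would rather keep the template in code than in a spreadsheet, here is a rough sketch of the same idea.  The function names and the dictionary layout are assumptions for illustration.  The four shared cM values are the ones quoted in this post, and only three of my eight family members and two of the four matches are included.  Every other pair is filled with # purely as a placeholder in this sketch.

```python
import copy

NOT_COMPARABLE = "-"  # not in the same data pool, can't currently be compared
NOT_AVAILABLE = "#"   # same data pool, but I don't have access to the shared cMs


def build_template(family, matches, shared_cm):
    """One cell per (family member, match) pair, holding the four items I compare."""
    table = {}
    for person in family:
        for match in matches:
            table[(person, match)] = {
                "shared_cm": shared_cm.get((person, match), NOT_AVAILABLE),
                "relationship": None,  # from the relationship chart for one hypothesis
                "works": None,         # yes / no / soft no
                "probability": None,   # category and percentage from the Shared cM tool
            }
    return table


def new_hypothesis_table(template):
    """Copy the template so each hypothesis gets its own table to fill in."""
    return copy.deepcopy(template)


# Oldest tested generation first, then the younger generation.
family = ["Mom", "Auntie V", "Mack"]
matches = ["Sam", "Bernard"]
fixed = {
    ("Mack", "Sam"): 161.1,
    ("Mom", "Sam"): 132.4,
    ("Auntie V", "Bernard"): 264,
    ("Mom", "Bernard"): 177,
}

template = build_template(family, matches, fixed)
theory_one = new_hypothesis_table(template)
theory_one[("Mom", "Sam")]["relationship"] = "1/2-2C1R"
print(theory_one[("Mom", "Sam")])
```

Copying the template for each hypothesis means the fixed data only has to be typed (and double checked) once.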

Please note:  When comparing DNA between family members, if you have the DNA of a parent, there is usually no reason to look at the DNA of their child compared to the same DNA match.  As long as the match is related only through that parent, the child’s data adds nothing to the analysis.  I originally left the two parent/child relationships in my chart because, when I began my project, each of the two children was in a pool that their parent was not in, and their numbers to some matches were relevant.  As I progressed and gathered more data, I kept the two children in because it has been interesting to look at the differences.  It is highlighting several principles for me – test the oldest living generation, the more distant a match the more work it takes to identify the relationship, etc.

 

 

Charting and Testing Each Relationship Hypothesis Between the Two Clusters

 

Charting & Color-coding the Data

 

If you have done a good job with your preparatory work, these steps will be pretty fast.  Please note that this is a place where you can easily make mistakes as you add data to your table.  Figure out a way to double or triple check your reading of the information on the Shared cM Tool’s tables and your input of that data into your own tables.

Please also note that I found a mistake in my original fixed data many weeks ago.  I had typed the longest segment of shared DNA between my Mom and Faith instead of their total shared cMs as seen on MyHeritage.  When pulling data from multiple vendors, you may want to double and triple check your fixed data as well.

I already had my list of relationship possibilities I wanted to test; now it was time to create two charts for each hypothesis: one relationship chart and one updated data table.

The first theory I tested was that John Costello and Isidore Fried OR Sarah Esther Salzman were half-siblings.  That relationship table and data table look like this:

Theory One

You can see that I color coded the items in two columns.  The does-the-proposed-relationship-work column (Works?) has three color coding options.  Yes is light green, no is red, and what I call “soft nos” are light red.  [Yes, I know that “light red” is usually called pink.  I call it light red to myself because it is in the red color fill column and that works in my brain for these tables.]

What are “soft nos”?  Those are relationships that are showing up as possible but either have a zero probability or have not been recorded in the collected data for the Shared cM tool.

The second color-coded column is the “Probability” column.  There are two numbers in this column.  First I list which probability category the relationship falls in and second I list what the statistical likelihood of that relationship is from the Shared cM Tool.  So, based on this chart:

[Screenshot: Shared cM tool probability chart]

. . . a hypothesized relationship of half-2nd cousins, or 1/2 2C, would be listed as being in probability category 1 with the statistical likelihood of 49.93%.

This column is color coded in this way:

  • Probability category 1 is bright green
  • Probability category 2 is bright yellow
  • Probability category 3 is orange
  • Probability category 4 is light gray
  • Probability category 5 is darker gray
  • Probability category 6 is even darker gray
  • 0% probability items are red

The color coding helps me take in the entire data table and visually “weigh” the DNA evidence for that hypothesis.
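The two color-coded columns follow simple rules, so they can be written down as a small lookup.  This is only a sketch of my own rules.  The function names are made up, and how you read "possible" and the percentage off the Shared cM tool is an assumption here: listed_as_possible means the relationship shows up for that number of shared cMs, and percent is None when the tool has no recorded data for it.

```python
WORKS_COLORS = {"yes": "light green", "no": "red", "soft no": "light red"}

CATEGORY_COLORS = {
    1: "bright green",
    2: "bright yellow",
    3: "orange",
    4: "light gray",
    5: "darker gray",
    6: "even darker gray",
}


def works(listed_as_possible, percent):
    """Fill in the Works? column: yes, no, or soft no."""
    if not listed_as_possible:
        return "no"
    if percent is None or percent == 0:
        return "soft no"  # shows up as possible, but 0% or no recorded data
    return "yes"


def probability_color(category, percent):
    """Fill color for the Probability column; 0% is always red."""
    if percent == 0:
        return "red"
    return CATEGORY_COLORS[category]


# The half-2nd cousin example above: probability category 1 at 49.93%.
answer = works(True, 49.93)
print(answer, WORKS_COLORS[answer], probability_color(1, 49.93))
```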

 

My Relationship Hypotheses

 

Auntie V’s shared cMs with Bernard were very helpful in reducing the number of possible relationships between my family and cluster 1.  Let’s look at each relationship hypothesis.

 

Hypothesis one, that John Costello and Isidore Fried OR Sarah Esther Salzman are half-siblings, is shown above and is not possible based on Auntie V’s shared cMs with Bernard.

 

Hypothesis 2 – John Costello & Isidore Fried OR Sarah Esther Salzman are siblings:

Theory 2

When you focus on the comparison of the people in the oldest tested generation, this hypothesis looks really good with a strong probability.

 

Hypothesis 3 – Isidore Fried OR Sarah Esther Salzman is the aunt/uncle of John Costello:

Theory 3

Again, Auntie V’s comparison to Bernard allows us to rule out this hypothesis, and the remaining data between the people in the oldest tested generation already showed this theory to be less likely.

 

Hypothesis 4 – Isidore Fried OR Sarah Esther Salzman is the 1/2 Aunt/Uncle of John Costello:

Theory 4

This hypothesis had been ruled out before I was able to add Auntie V’s data, but having a hard no in the oldest tested generation was valuable to me.  Please note that the other four comparisons in that oldest tested generation were not very likely at all.

 

Hypothesis 5 – John Costello is the uncle of Isidore Fried OR Sarah Esther Salzman:

Theory 5

Again, Auntie V and Bernard’s shared cMs rule out this possibility, but it wasn’t looking particularly strong in the oldest tested generation before their data was added.

 

Hypothesis 6 – John Costello is the half-uncle of Isidore Fried OR Sarah Esther Salzman:

Theory 6

Just like with hypothesis four, this hypothesis had been ruled out before I was able to add Auntie V’s data, but having a hard no in the oldest tested generation was valuable to me.  Please note that the other four comparisons in that oldest tested generation were not very likely at all.

This example also highlights the fact that for every relationship you are testing that is not in the same generation, the relationship tables should be reversed.  Technically, the numbers are identical and the data table itself doesn’t need to be created again, but as a beginner, I didn’t want to overlook anything and went for some redundancy rather than missing anything.  You can also rule out some reversed possibilities because they aren’t logical.

 

Hypothesis 7 – John Costello and Isidore Fried OR Sarah Esther Salzman are first cousins once-removed, or 1C1R:

Theory 7

This theory was fully ruled out before adding Auntie V’s data, but I added it anyway.  I really like keeping my hard nos in my report for now.  It gives me more confidence when I look at what is left over as possible.

 

Hypothesis 8 – This is the same as hypothesis 7, but the relationship table is reversed:

Theory 8

 

Hypothesis 9 – John Costello and Isidore Fried OR Sarah Esther Salzman are first cousins, or 1C:

Theory 9

This hypothesis was looking very weak, but adding Auntie V’s data fully ruled it out.

At this point, I was left with only one possibility.  But the records about Isidore Fried are quite thin and he disappears in 1911, while John appears in 1917.  So I got to thinking, what if John Costello IS Isidore Fried?  So I added a tenth hypothesis to test.

 

Hypothesis 10 – John Costello IS Isidore Fried:

Theory 10

While this hypothesis isn’t nearly as strong as the hypothesis that John is the sibling of Isidore or Sarah, it is still possible.  Right now, I don’t have any documentary evidence that Isidore has a brother.  Additionally, the members of cluster 2 are also Frieds, and their family includes a brother named Isidore.  Isidore appears to be the connection between the two clusters.  If I am correct in my tentative conclusion that Isidore connects my two closest clusters together, then if John is the sibling of Isidore OR Sarah, it has to be Isidore and not Sarah.  Since right now I have no proof that Isidore has a brother, it makes me extra curious about the weaker possibility that John IS Isidore.

 

Factoring in Endogamy

 

Oh boy.  This is the area where I can’t seem to find clear, consistent direction.  I’ve heard a few times that you should throw out segments that are under 7 cMs, but when I did a quick check I couldn’t find the source of that idea.

In case it is correct, I ran some handwritten updates to my ten hypotheses.  I adjusted the shared cM totals for Auntie V to Bernard from 264 to 245.84 cMs.  I adjusted the shared cM totals for my Mom to Bernard from 177 to 165.5 cMs.  These changes to my tables had minimal impact.  Only three probability categories changed.  But the overall assessment of each table did not change.  Hypotheses that were ruled out were still ruled out and hypotheses that were possible were still possible.  (Please note that I can’t update all of the numbers because not every vendor shows segment data.  There are additional numbers I could update, but by checking the highest shared cMs and the lowest shared cMs between members of the oldest tested generation I feel like I have considered the “toss every shared segment below 7 cMs” idea.)
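The "toss every shared segment below 7 cMs" adjustment is just a filtered sum, which can be sketched like this.  The segment lengths below are invented for illustration; they are NOT the real segments behind any of my numbers, and the 7 cM cutoff is the rule of thumb I mentioned above, not an established standard.

```python
def total_shared_cm(segments_cm, min_segment_cm=7.0):
    """Add up shared segments, ignoring any segment shorter than the cutoff."""
    kept = [cm for cm in segments_cm if cm >= min_segment_cm]
    return round(sum(kept), 2)


# Made-up segment lengths (assumption, for illustration only).
segments = [52.3, 31.0, 12.4, 6.8, 5.1]

original_total = total_shared_cm(segments, min_segment_cm=0)
adjusted_total = total_shared_cm(segments)
print(original_total, adjusted_total)
```

The adjusted total is the number you would then plug back into the Shared cM tool to see whether any probability category changes.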

Studying Lara Diamond’s endogamy tables is helping me form some generalized ideas about how endogamy impacts the numbers.  Based on comparing my data to her tables, I feel like my ruled out and possible theories are still correct.  However, I don’t know how to quantify that, or if it’s even possible.

 

Conclusion

 

After running all of the numbers, I have two theories to try to prove and disprove.  That is an awesome feeling!

This method is not something I read or saw somewhere; it’s just how my brain settled on trying to sort out what the data was really telling me.  It is definitely a work in progress and the DNA alone will not answer this question.  Now that I have two theories to look at, I’m moving on to records.  Can I prove or disprove either theory?

I’m still trying to gather more DNA data as well, but in the end, this problem – like all genealogical problems – cannot rely on DNA alone.  The answer will come in carefully analyzing and correlating the DNA data along with the documents.

 

 

ps – In my large report, I have an explanatory page for my family members.  Here is an image of that page.  It may interest you:

Reading the tables

 

Phew!  That was a doozy.  Happy Tuesday!!  I hope this answered the questions I have received and wasn’t too scary for those of you who aren’t fans of the numbers.  Please feel free to ask any questions you may have.  xoxo