
Finding John Costello – A DNA Journey: Isidore Fried & Sarah Esther Salzman Data Analysis


Last week, I wrote a quick update on my John Costello work.  I shared that my Auntie V’s DNA results had come in and that as I updated my data tables, her results helped eliminate all but two possibilities.  What a huge and exciting step forward!

In sharing my news, I got some great questions and some requests to outline my steps.  One of my readers, Amy of Brotmanblog: A Family Journey, shared that in studying DNA posts that walk through steps, “I just [get] too overwhelmed by numbers and [get] a little lost”.  So this post is an attempt to answer all questions about my own steps in a way that hopefully won’t make DNA beginners share Amy’s feelings about getting lost in the numbers.  There will be numbers, but not until we establish the steps.

 

First, a few disclaimers.

 

Disclaimer one:  I am not an expert in using DNA data with my genealogy research.  I am a genealogist who is learning to understand DNA data and its nuances so that I can accurately analyze some of my tougher research problems and see new research possibilities that come from the DNA analysis.

I am currently reading Blaine Bettinger’s book, The Family Tree Guide to DNA Testing and Genetic Genealogy.  I have read, and continue to read, many articles, blog posts, etc. from leading genetic genealogists.  I have watched, and continue to watch, presentations on DNA from leading genetic genealogists.  And lastly, I spend time working with my DNA data, the DNA data of my family members, and that of various friends and patrons at my local Family History Center.

Disclaimer two:  This research problem is impacted by endogamy.  Endogamy is something I am still working on understanding.  I am particularly puzzled by the most accurate ways to factor in endogamy when analyzing the numbers.  Lara Diamond’s endogamy work is helping a lot, but I still need to learn more about this.

Okay, disclaimers and preambles aside, let’s dive in!

 

Preparatory Work

 

My project has been ongoing for quite some time.  It required some preparatory work.  Some portions of that preparation I will just touch on lightly here – connecting the cousin matches into a cluster.  Others I will share in more detail – creating the list of relationship possibilities to test, setting up your charts, etc.  However, all of this part is essential to the rest of the analysis.  If you have questions about doing this correctly, ask away!

 

Step One:  Organize your two clusters.

 

You need to understand exactly how the people in each cluster connect to each other.

I have the known descendants of John Costello who have tested.  We look like this:

John Costello descendants who have tested
Tested individuals are highlighted in green.

The cluster I am comparing my known family to is made up of the descendants of Isidore Fried & Sarah Esther Salzman.  They look like this:

Isidore Fried descendants who have tested

Please note that all names of the living have been changed or anonymized except for myself, Vince, and Virginia.

 

Step Two:  Create a list of relationship possibilities between your two clusters.

 

I chose to use the two people in our two clusters who were the closest match to each other to create my list.  At the time I created my charts, that was Mack from my family and Sam from cluster 1.  Mack and Sam share 161.1 cMs.

When I plug that number into the Shared cM tool on DNA Painter, I get the following list:

[Screenshot: Shared cM Tool relationship list for 161.1 cMs]

If I tested every single possibility on this list, I would be checking 25 hypotheses.  But some of them are logically impossible.  For instance, all of the Great-Great-Aunt/Uncle/Niece/Nephew possibilities are simply not logical.  Mack and Sam are about the same age.  Those relationships don’t work based on the known data about the two families so I ruled them, and others, out.

I further shorten the list by taking the lowest number of shared cMs between people in the oldest tested generation and looking at their relationship possibilities with the Shared cM tool.

In my case, at the time I created my charts, the two people in the oldest generation with the lowest number of shared cMs were my Mom and Sam.  They share 132.4 cMs.  Their list looks like this:

[Screenshot: Shared cM Tool relationship list for 132.4 cMs]

I’m looking for relationship possibilities that show up on both lists.

My list making was a bit fuzzy and flexible because I wasn’t exactly sure how to account for endogamy.  In the end, I came up with nine relationship possibilities that were on both lists, and were logical, that I wanted to test.  You will see that list in the next section.  You will also see that I later added a tenth possibility that I had previously ruled out.
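For readers who like to see the narrowing written out as logic, here is a minimal Python sketch of Step Two.  The relationship names in the two example lists are placeholders made up for illustration, not the actual Shared cM Tool output for 161.1 or 132.4 cMs, and `shortlist` is a hypothetical helper name.

```python
def shortlist(closest_pair_list, oldest_pair_list, illogical=()):
    """Keep relationships that are on both lists and were not ruled out by logic."""
    on_second_list = set(oldest_pair_list)
    ruled_out = set(illogical)
    return [
        rel for rel in closest_pair_list
        if rel in on_second_list and rel not in ruled_out
    ]


# Placeholder lists (NOT the real tool output) just to show the idea.
closest_pair = ["1C1R", "Half 1C", "2C", "Half GG-Aunt/Uncle"]
oldest_pair = ["2C", "1C1R", "Half GG-Aunt/Uncle"]
not_logical = ["Half GG-Aunt/Uncle"]  # e.g. ruled out because of known ages

print(shortlist(closest_pair, oldest_pair, not_logical))  # ['1C1R', '2C']
```

In practice I did this by eye, and my own list was a bit more flexible than a strict intersection because of the endogamy question.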

 

Step Three:  Chart your relationship hypotheses.

 

This step can feel the most confusing until you have created several relationship charts.  Mine are in data tables, but I started by handwriting them on paper.  Here is a sample relationship chart that lines up our two families based on one hypothesized relationship:

relationship table

In this case, the hypothesis is that John Costello is the half-sibling of Isidore Fried OR Sarah Esther Salzman.  Using this hypothesis, we can then use the table to determine what genetic relationship we are testing between two people who have taken DNA tests.

For example, in this hypothesis, Sam and my Mom would be half-2nd cousins once removed or 1/2-2C1R.
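If reading the chart feels slippery, the same lookup can be sketched in a few lines of Python.  This is only an illustration: `relationship_label` is a hypothetical helper, and it assumes you count generations down from the shared ancestor (1 = child, 2 = grandchild, and so on).  Placing my Mom 3 generations down and Sam 4 generations down is my reading of the 1/2-2C1R label above, so treat those two numbers as assumptions.

```python
def relationship_label(gens_a, gens_b, half=False):
    """Label the relationship between two descendants of a shared ancestor.

    gens_a / gens_b: generations from the shared ancestor (or couple) down
    to each person.  half=True when only one ancestor is shared, as in a
    half-sibling hypothesis.
    """
    if gens_a < 1 or gens_b < 1:
        raise ValueError("direct-line relationships are not handled in this sketch")
    closer = min(gens_a, gens_b)
    removed = abs(gens_a - gens_b)
    if closer == 1:
        # One person is a child of the shared ancestor.
        label = "sibling" if removed == 0 else "great-" * (removed - 1) + "aunt/uncle"
    else:
        # Cousin degree is one less than the closer person's generation count.
        label = f"{closer - 1}C" + (f"{removed}R" if removed else "")
    return "1/2-" + label if half else label


# Half-sibling hypothesis: the shared ancestor is one parent of John and Isidore (or Sarah).
print(relationship_label(3, 4, half=True))  # 1/2-2C1R
```

Swapping the two generation counts gives the same label, which is why a reversed relationship table produces identical numbers.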

 

Step Four:  Build a data table with the items you are interested in comparing.

 

This step is dependent on what you want to see and compare.  I choose to look at the following four items in my charts:

  • How many cMs do the two people share?  Comes from the DNA vendor.
  • What is their hypothesized relationship?  Comes from my list and the relationship hypothesis I am analyzing.
  • Does that relationship work based on the number of shared cMs?  Comes from the Shared cM Tool on DNA Painter.
  • What probability category does that relationship fall in using the Shared cM tool and what is the statistical probability of that relationship?  Comes from the Shared cM Tool on DNA Painter.

A simplified version of my data table looks like this:

Simplified Table categories

You can see that my family members will be listed in the left column and the DNA matches will be listed across the top.  This table includes my eight family members and the four categories I look at with each match from cluster 1.

Here is an expanded version that shows what the blank table will look like with the four DNA matches from cluster 1 along the top and my family down the left.  This chart includes only our names and the four columns I will fill in for each match compared to my eight family members:

simplified table categories expanded

Next, I add the (mostly) fixed data points – shared cMs, dashes for people who are not in the same data pools and can’t currently be compared, and # to indicate individuals who are in the same data pool but whose shared cMs I don’t currently have access to.  That table looks like this:

Fixed data

This table becomes my template for every single hypothesis that I test.  Please note that I also differentiate the two different generations in each cluster in this chart.  The people from the oldest tested generation are listed first and highlighted with darker blue.  The people from the younger tested generation are listed next and are highlighted with lighter blue.  This is a visual reminder to me that the data for the older generation is more meaningful.
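My tables live in a spreadsheet, but the template idea can be sketched in Python too.  This is a rough illustration under some assumptions: it uses only the names and shared cM numbers quoted in this post (three of my eight family members and two of the four matches), the function names are hypothetical, and any pair whose number is not quoted here is filled with # purely as a stand-in.

```python
import copy

# Only the people and numbers quoted in this post; the real table is larger.
FAMILY = ["Mom", "Auntie V", "Mack"]
MATCHES = ["Sam", "Bernard"]
SHARED_CM = {
    ("Mack", "Sam"): 161.1,
    ("Mom", "Sam"): 132.4,
    ("Mom", "Bernard"): 177,
    ("Auntie V", "Bernard"): 264,
}


def build_template(family, matches, shared_cm, missing="#"):
    """One cell per family member / match pair, holding the four items I track."""
    return {
        (person, match): {
            "shared_cm": shared_cm.get((person, match), missing),
            "relationship": None,  # filled in from the relationship chart
            "works": None,         # filled in from the Shared cM Tool
            "probability": None,   # category and percent from the Shared cM Tool
        }
        for person in family
        for match in matches
    }


def table_for_hypothesis(template):
    """Copy the template so filling in one hypothesis never alters another."""
    return copy.deepcopy(template)


template = build_template(FAMILY, MATCHES, SHARED_CM)
theory_one = table_for_hypothesis(template)
theory_one[("Mom", "Sam")]["relationship"] = "1/2-2C1R"
```

The deep copy is the important part: the fixed data is typed once, and every hypothesis starts from the same numbers.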

Please note:  When comparing DNA between family members, if you have the DNA of a parent, there is no reason to look at the DNA of their child compared to the same DNA match.  The data of the child to that match adds nothing to the analysis.  I left the two parent/child relationships in my chart because when I began my project, the two children each were in pools that their parents were not in and the numbers to some matches were relevant.  However, as I progressed and gathered more data, I left the two children in because it has been interesting to look at the differences.  It is highlighting several principles for me – test the oldest living generation, the more distant a match the more work it takes to identify the relationship, etc.

 

 

Charting and Testing Each Relationship Hypothesis Between the Two Clusters

 

Charting & Color-coding the Data

 

If you have done a good job with your preparatory work, these steps will be pretty fast.  Please note that this is a place where you can easily make mistakes as you add data to your table.  Figure out a way to double or triple check your reading of the information on the Shared cM Tool’s tables and your input of that data into your own tables.

Please also note that I found a mistake in my original fixed data many weeks ago.  I had typed the longest segment of shared DNA between my Mom and Faith instead of their total shared cMs as seen on MyHeritage.  When pulling data from multiple vendors, you may want to double and triple check your fixed data as well.

I already had my list of relationship possibilities I wanted to test.  Now it was time to create two charts for each hypothesis:  one relationship chart and one updated data table.

The first theory I tested was that John Costello and Isidore Fried OR Sarah Esther Salzman were half-siblings.  That relationship table and data table look like this:

Theory One

You can see that I color coded the items in two columns.  The does-the-proposed-relationship-work column (Works?) has three color coding options.  Yes is light green, no is red, and what I call “soft nos” are light red.  [Yes, I know that “light red” is usually called pink.  I call it light red to myself because it is in the red color fill column and that works in my brain for these tables.]

What are “soft nos”?  Those are relationships that are showing up as possible but either have a zero probability or have not been recorded in the collected data for the Shared cM tool.

The second color-coded column is the “Probability” column.  There are two numbers in this column.  First I list which probability category the relationship falls in and second I list what the statistical likelihood of that relationship is from the Shared cM Tool.  So, based on this chart:

[Screenshot: Shared cM Tool probability chart]

. . . a hypothesized relationship of half-2nd cousins, or 1/2 2C, would be listed as being in probability category 1 with the statistical likelihood of 49.93%.

This column is color coded in this way:

  • Probability category 1 is bright green
  • Probability category 2 is bright yellow
  • Probability category 3 is orange
  • Probability category 4 is light gray
  • Probability category 5 is darker gray
  • Probability category 6 is even darker gray
  • 0% probability items are red

The color coding helps me take in the entire data table and visually “weigh” the DNA evidence for that hypothesis.
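Here is a small Python sketch of those color rules, in case it helps to see them as a lookup.  The color names come straight from my lists above.  The function names are hypothetical, and the rule for a hard “no” (the tool does not list the relationship as possible at all for that number of cMs) is my assumption about how to state it in code.

```python
WORKS_COLORS = {"yes": "light green", "soft no": "light red", "no": "red"}

CATEGORY_COLORS = {
    1: "bright green",
    2: "bright yellow",
    3: "orange",
    4: "light gray",
    5: "darker gray",
    6: "even darker gray",
}


def works_verdict(listed_as_possible, probability):
    """Return 'yes', 'soft no', or 'no' for the Works? column.

    probability is the percent from the tool, or None when the relationship
    has no recorded data behind it.
    """
    if not listed_as_possible:
        return "no"  # assumption: a hard no means the tool does not list it
    if probability is None or probability == 0:
        return "soft no"
    return "yes"


def probability_color(category, probability):
    """Color for the Probability column; 0% overrides the category color."""
    if probability == 0:
        return "red"
    return CATEGORY_COLORS[category]


# The half-2nd cousin example above: category 1 at 49.93%.
print(works_verdict(True, 49.93), probability_color(1, 49.93))  # yes bright green
```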

 

My Relationship Hypotheses

 

Auntie V’s shared cMs with Bernard were very helpful in reducing the number of possible relationships between my family and cluster 1.  Let’s look at each relationship hypothesis.

 

Hypothesis one, that John Costello and Isidore Fried OR Sarah Esther Salzman are half-siblings, is shown above and is not possible based on Auntie V’s shared cMs with Bernard.

 

Hypothesis Two – John Costello & Isidore Fried OR Sarah Esther Salzman are siblings:

Theory 2

When you focus on the comparison of the people in the oldest tested generation, this hypothesis looks really good with a strong probability.

 

Hypothesis Three – Isidore Fried OR Sarah Esther Salzman is the aunt/uncle of John Costello:

Theory 3

Again, Auntie V’s comparison to Bernard allows us to rule out this hypothesis, and the remaining data between the people in the oldest tested generation also points to this theory being less likely.

 

Hypothesis Four – Isidore Fried OR Sarah Esther Salzman is the 1/2 Aunt/Uncle of John Costello:

Theory 4

This hypothesis had been ruled out before I was able to add Auntie V’s data, but having a hard no in the oldest tested generation was valuable to me.  Please note that the other four comparisons in that oldest tested generation were not very likely at all.

 

Hypothesis Five – John Costello is the uncle of Isidore Fried OR Sarah Esther Salzman:

Theory 5

Again, Auntie V and Bernard’s shared cMs rule out this possibility, but it wasn’t looking particularly strong in the oldest tested generation before their data was added.

 

Hypothesis Six – John Costello is the half-uncle of Isidore Fried OR Sarah Esther Salzman:

Theory 6

Just like with hypothesis four, this hypothesis had been ruled out before I was able to add Auntie V’s data, but having a hard no in the oldest tested generation was valuable to me.  Please note that the other four comparisons in that oldest tested generation were not very likely at all.

This example also highlights the fact that for every relationship you are testing that is not in the same generation, the relationship tables should be reversed.  Technically, the numbers are identical and the data table itself doesn’t need to be created again, but as a beginner, I didn’t want to overlook anything and went for some redundancy rather than missing anything.  You can also rule out some reversed possibilities because they aren’t logical.

 

Hypothesis Seven – John Costello and Isidore Fried OR Sarah Esther Salzman are first cousins once removed, or 1C1R:

Theory 7

This theory was fully ruled out before adding Auntie V’s data, but I added it anyway.  I really like keeping my hard nos in my report for now.  It gives me more confidence when I look at what is left over as possible.

 

Hypothesis Eight – This is the same as hypothesis seven, but the relationship table is reversed:

Theory 8

 

Hypothesis Nine – John Costello and Isidore Fried OR Sarah Esther Salzman are first cousins, or 1C:

Theory 9

This hypothesis was looking very weak, but adding Auntie V’s data fully ruled it out.

At this point, I was left with only one possibility.  But the records about Isidore Fried are quite thin and he disappears in 1911, while John appears in 1917.  So I got to thinking, what if John Costello IS Isidore Fried?  So I added a tenth hypothesis to test.

 

Hypothesis Ten – John Costello IS Isidore Fried:

Theory 10

While this hypothesis isn’t nearly as strong as the hypothesis that John is the sibling of Isidore or Sarah, it is still possible.  Right now, I don’t have any documentary evidence that Isidore has a brother.  Additionally, cluster number 2 are also Frieds and they have a brother named Isidore.  Isidore appears to be the connection between the two clusters.  If I am correct in my tentative conclusion that Isidore connects my two closest clusters together, then that means that if John is the sibling of Isidore OR Sarah, it has to be Isidore and not Sarah.  Since right now I have no proof that Isidore has a brother, it makes me extra curious about the weaker possibility that John IS Isidore.

 

Factoring in Endogamy

 

Oh boy.  This is the area where I can’t seem to find clear, consistent direction.  I’ve heard a few times that you should throw out segments that are under 7 cMs, but when I did a quick check I couldn’t find the source of that idea.

In case it is correct, I ran some handwritten updates to my ten hypotheses.  I adjusted the shared cM totals for Auntie V to Bernard from 264 to 245.84 cMs.  I adjusted the shared cM totals for my Mom to Bernard from 177 to 165.5 cMs.  These changes to my tables had minimal impact.  Only three probability categories changed.  But the overall assessment of each table did not change.  Hypotheses that were ruled out were still ruled out and hypotheses that were possible were still possible.  (Please note that I can’t update all of the numbers because not every vendor shows segment data.  There are additional numbers I could update, but by checking the highest shared cMs and the lowest shared cMs between members of the oldest tested generation I feel like I have considered the “toss every shared segment below 7 cMs” idea.)
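For anyone who wants to try the same check, here is a minimal Python sketch of the “toss every segment under 7 cMs” adjustment.  The segment lengths in the example are made up for illustration (they are not anyone’s real segments), and `total_without_small_segments` is a hypothetical helper name.  The two subtractions at the end use the real before and after totals from my tables.

```python
def total_without_small_segments(segments_cm, minimum_cm=7.0):
    """Add up shared segments, skipping any that are under the minimum length."""
    return sum(length for length in segments_cm if length >= minimum_cm)


# Made-up segment list, just to show the mechanics.
example_segments = [40.0, 25.5, 12.0, 6.2, 5.1]
print(total_without_small_segments(example_segments))  # 77.5

# How many cMs the adjustment removed in my two checks:
removed_auntie_v = round(264 - 245.84, 2)  # Auntie V to Bernard
removed_mom = round(177 - 165.5, 2)        # Mom to Bernard
print(removed_auntie_v, removed_mom)
```

This only works with vendors that show segment data, which is why I could not adjust every number in my tables.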

Studying Lara Diamond’s endogamy tables is helping me form some generalized ideas about how endogamy impacts the numbers.  Based on comparing my data to her tables, I feel like my ruled out and possible theories are still correct.  However, I don’t know how to quantify that, or if it’s even possible.

 

Conclusion

 

After running all of the numbers, I have two theories to try to prove and disprove.  That is an awesome feeling!

This method is not something I read or saw somewhere; it’s just how my brain settled on trying to sort out what the data was really telling me.  It is definitely a work in progress and the DNA alone will not answer this question.  Now that I have two theories to look at, I’m moving on to records.  Can I prove or disprove either theory?

I’m still trying to gather more DNA data as well, but in the end, this problem – like all genealogical problems – cannot rely on DNA alone.  The answer will come in carefully analyzing and correlating the DNA data along with the documents.

 

 

ps – In my large report, I have an explanatory page for my family members.  Here is an image of that page.  It may interest you:

Reading the tables

 

Phew!  That was a doozy.  Happy Tuesday!!  I hope this answered the questions I have received and wasn’t too scary for those of you who aren’t fans of the numbers.  Please feel free to ask any questions you may have.  xoxo

 

 

25 thoughts on “Finding John Costello – A DNA Journey: Isidore Fried & Sarah Esther Salzman Data Analysis”

    1. Thank you, Katie! And you are most welcome. I hope it helps someone. Do you have a recent DNA challenge you are working on?

    1. Thank you! I’m so glad that most of it made sense. And, no surprise that I can’t yet write well about endogamy – it is still a bit vague for me, but I’m working on it. ❤️

  1. This is fantastic, Amberly, and I actually understood just about everything. So if you will indulge me, here are my two (for now) biggest questions: If your mother shares fewer cM with Bernard than Aunt V, why are her probabilities higher for the proposed relationships than Aunt V’s are? (This may reveal that I am still confused….numbers, numbers….). And my second question—why did you do this: “I further shorten the list by taking the lowest number of shared cMs between people in the oldest tested generation and looking at their relationship possibilities with the Shared cM tool.” What is the reason for doing that?

    One other comment—have you been in touch with Leah Larkin of the DNA Geek? I noticed that the probabilities came from her blog, and I bet she would be really interested in what you are doing. If you need contact info, let me know. She was one of the women who was helping me with my DNA questions a few years back.

    I wish I could do this with some of my bewildering matches, but I don’t have a cluster to compare with like you do. I have my known cluster and then lots of matches who don’t come with their own cluster. But this is really incredible work you have done and should be widely published to help others who are struggling to make sense of their DNA results. Brava to you! Great work!

    1. Thank you, Amy!! ❤️

      Okay, your first question – “If your mother shares fewer cM with Bernard than Aunt V, why are her probabilities higher for the proposed relationships than Aunt V’s are?”

      Okay, so first, my Mom and Auntie V are full sisters so setting the numbers aside, they share the exact same relationship with Bernard. But they share different amounts of DNA with him so they create the outer boundaries for what the possible relationships are between the two of them and Bernard. If you look at the only two relationships that are still considered possible, hypotheses 2 and 10, my Mom’s probability is higher for hypothesis 2 than Auntie V’s but lower for hypothesis 10 than Auntie V’s. What you are seeing there is the edges of the range of the whole cluster when looked at together. Another way to consider this is quite simple. Hypothesis 10 is a closer relationship than hypothesis 2, Auntie V shares more DNA with Bernard than my Mom, so the closer relationship hypothesis will have a higher probability for Auntie V. Then just reverse that. Does that make sense…?

      Then your second question about using the list of possible relationships to shorten the relationships I will test… I didn’t explain that very well. What I am looking for are relationships that occur as possible on both lists. If it is possible for one but not the other, there is no reason to check that relationship. Is that clearer? (Sorry, it’s much more difficult to write about something that is a developing body of knowledge within yourself than to write about something you are an expert on. ;))

      I am Facebook friends with Leah and I have asked her a few questions this past year. She is one of the individuals who worked on the Shared cM Tool on DNA Painter. She is awesome! I doubt very much that she is aware of my project. I’m just a rookie over here. 😉

      Thank you again for your very kind words, Amy! Have you seen the new auto-cluster tool on MyHeritage? It might be helpful for you to cluster your matches together. I have looked at it, but haven’t been able to accomplish much using it. So take my recommendation with a grain of salt. If you haven’t tested with MyHeritage, you can transfer your data there for free and then there is a small fee to turn on all of the tools (I can’t recall what the fee is, sorry).

      1. OK–thanks! I need to go back to the original post again so I can see what you mean about your mother and aunt.

        All my kits have been uploaded to MyHeritage (and FTDNA and GEDmatch). But I haven’t seen the auto-cluster tool. Will go check it out. And I am glad you have been in touch with Leah. She is awesome, and you should definitely tell her to look at what you have done! Don’t be so modest!!

        1. I found it, but have no idea what to do with it! I can see that the known relatives are all in one cluster, but no one else is, so nowhere to go with that….

        2. Wait, you only have one cluster? Or all of your known relatives are only in one cluster?

        3. No, no—all my TESTED relatives are in one cluster—my Brotman relatives, my biggest brick wall.

      2. Don’t give up on me please, but I am still confused. I see the chart for Possibility 2, and I see that your mother shares 177 cM with Bernard whereas your aunt shares 264 cM. Yet your aunt has a lower probability of being Bernard’s 2c1r than your mother does. How can that be if she shares more DNA with him? Am I just not reading the charts correctly? Or are you saying that that is too distant a relationship to exist if they share that much DNA? AH–I think that’s what it means. Am I right?

        Of course, with endogamy, I am so used to seeing shared cM numbers that do not line up with the usual expectations that I didn’t think that way. But I think I now get it. Phew!

        1. Yay!! It just struck me all of a sudden as I was writing my comment. I always say writing things out makes them more clear. 🙂 Thank you!

        2. You are welcome! And I totally agree, if you are trying to explain what you are thinking, writing it out is a great way to clarify your own thoughts. 🙂

  2. Oh my. I understand some of it. But will need to sit down with a full mind and just focus. I do understand that there is some overlap of DNA because of intermarriage. For example, two of my great grandparents were first cousins. And many cousins married each other. So that changes relations to a degree. Thank you for doing this.

    1. You are welcome, Ellen! Yes, this method would be less effective for you than it is for me. John Costello is the only person contributing Jewish DNA to his descendants so my Mom’s generation is 25% Jewish. That reduces the endogamy tangles somewhat for us. They are still there, but they aren’t as tricky to work around at this relationship distance. Good luck with your DNA learning! It has been an uphill effort for me, but it is definitely paying off! ❤️

  3. Great post Amberly. Fantastic to follow how you have been digging away at the various possibilities to form some exciting possibilities. I am so intrigued and looking forward to what further discoveries you will make.

    1. Thank you, Alex!! It is so great to be making some progress. Now I just have to get myself into Europe… 😉

  4. Wonderful post, Amberly. You do a great job of keeping track using your tables. Even though I love numbers, these are my downfall. The hardest part of working with the DNA matches is having them spread across testing companies. That’s why I use Genome Mate Pro to keep track of my matches. It’s all about what is easiest for me or for you. And we can learn from each other.

    1. Thank you, Cathy! I haven’t looked at Genome Mate Pro. I’ll have to check that out. You are exactly right about doing what works best for yourself. It is tricky to keep track of it all at the various companies.

      I doubt my method would work for very many people. It really is all about the fact that John Costello is a very recent ancestor and someone who clearly changed his identity. Most of the time, you can use the DNA to point you to records that “make sense” and help you connect to an existing group of family members. Not so with John! I haven’t come across anyone else with both of those factors – a great-grandparent who changed their identity – so I am guessing it is not super common. But who knows?!

      1. GMP has a steep learning curve and is not a program you can start using without reading the manual.

        I know what you mean with your John and his changing his identity. I suspect the same will be true for my 2nd great-grandfather. I should be seeing 4th cousin matches who also match my 2nd and 3rd cousins with the same surname but they are NOT THERE. I’m seeing two clusters of matches who do not make sense.

        These past two weeks I’ve forced myself to research in a completely different area to get my mind off of the problem. And I’ve made some great discoveries on the Luxembourg front.

        1. Good to know, thanks Cathy. Maybe it’s not for me right now, then.

          Oh! That is frustrating. About what range are the two clusters to you? Is it close enough that you expect it to be solvable? (I hope so!)

          I’m glad you have other work that is satisfying. I feel really grateful that I have so many ongoing projects. If I was only working on the John Costello problem I think I might go a little bit crazy. All of the waiting for family members to test/transfer/answer questions would be too much. 😉

        2. I fluctuate between believing the matches are significant enough to be solvable and wondering if it is maybe just a pile-up region. I’m certain these are matches who point to the area behind my (2ggf) William A. W. Dempsey brick wall. The highest ones are matches with 40 cM segments.
