TOUGH Textual Criticism math statistics question

  • Thread starter Eturnal
  • Tags
    Probability
  • #1
Eturnal
TL;DR Summary
If text A is 87% similar to text Z

And text B is 87% similar to text Z

Text A & B are 92% similar

Each text is 10000 words (approx)



What are the odds that when Text A and Text B disagree one of them will agree with Text Z?
Okay, let me rephrase this math question and frame it. It is math dealing with ancient Biblical texts and textual criticism.

Codex 01 (350AD) agrees with the MT (Majority Text) about 87% of the time.

Codex 03 (350AD) agrees with the MT about 87% of the time.

01 and 03 agree with each other about 92% of the time.

When 01 and 03 disagree, there is an 87% chance that one of them agrees with the MT.

Wouldn't you expect that number to be lower if these disagreements were random?

Please help math geniuses. Thank you!
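A hedged way to see what the three percentages do and do not pin down: treat every aligned word position as one trial and record whether A matches Z and whether B matches Z. The Python sketch below assumes the percentages are per-position rates over the same aligned positions. `p11` (the share of positions where both texts match Z) is a hypothetical extra quantity that the post does not give, and the function name is illustrative only. Whenever A and B disagree, at most one of them can match Z, so the question reduces to how many disagreements fall where exactly one matches.

```python
def share_with_one_match(p_az, p_bz, p_ab, p11):
    """Return P(exactly one of A, B matches Z | A != B), or None if inconsistent.

    Assumed per-position rates (a sketch, not a method from the thread):
      p_az = P(A matches Z), p_bz = P(B matches Z), p_ab = P(A matches B),
      p11  = P(A and B both match Z)  # hypothetical: not given in the post
    """
    tol = 1e-12
    d = 1.0 - p_ab                      # P(A != B)
    p10 = p_az - p11                    # A matches Z, B does not
    p01 = p_bz - p11                    # B matches Z, A does not
    p00 = 1.0 - p_az - p_bz + p11       # neither matches Z
    if min(p10, p01, p00) < -tol or d <= 0:
        return None
    one_match = p10 + p01               # every such position is an A/B disagreement
    # Any remaining disagreements must fit inside the "neither matches Z" positions.
    if one_match > d + tol or d - one_match > p00 + tol:
        return None
    return one_match / d


if __name__ == "__main__":
    # Sweep the unknown p11 across its admissible range for 87% / 87% / 92%.
    for p11 in (0.83, 0.8352, 0.85, 0.87):
        print(p11, share_with_one_match(0.87, 0.87, 0.92, p11))
    # Independent ("random") disagreements would force p11 = 0.87 * 0.87.
    print(share_with_one_match(0.87, 0.87, 0.92, 0.87 * 0.87))
```

Under these assumptions the share runs from 100% (p11 = 0.83) down to 0% (p11 = 0.87), so 87%/87%/92% alone do not fix it; the reported 87% corresponds to p11 ≈ 0.835. Setting p11 = 0.87 × 0.87, as fully independent disagreements would, returns None: that would put A/B agreement at about 77% at most, not 92%.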
 
  • #2
If this is a homework-type problem then there is a specific format for that. You have to show some work and then we can give hints and guidance. Can you express each problem statement in terms of conditional probabilities or other formal mathematical expressions?
 
  • #3
In reply to Eturnal (post #1):
I can't make complete sense of what you are assuming and what you want to calculate based on those assumptions. There is also, perhaps, a missing factor: how many options there are for each position in a text, which determines the underlying probability that two random texts will agree on something. Finally, you are more in the realm of statistical inference (hypothesis testing) here than in simple probabilities. More detail on this follows:

It seems you could model the situation by considering each text to be a list of things - possibly statements in this case. Perhaps each text makes one hundred statements about some things. If these are true/false statements, then each text is modelled by a binary string one hundred characters long. But, if these statements have more options than just true/false or A/B - let's say each statement has five options - then each is a string of 1/2/3/4/5 or A/B/C/D/E one hundred characters long. The first thing you need is an appropriate model like this. This is technically called a sample space.

Once you have an appropriate sample space for your problem, you can start doing some hypothesis testing. That means you need to (precisely) frame a hypothesis to test!
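To make the sample-space idea concrete, here is a minimal Python sketch. It assumes each text has already been aligned into equal-length strings of readings, which is a big assumption for real manuscripts, where collation and alignment are the hard part. If one aligned position is picked uniformly at random, "A agrees with Z" becomes an event whose probability is just the agreement fraction. The function names are illustrative, not from any library.

```python
def agreement(x: str, y: str) -> float:
    """Fraction of aligned positions where x and y carry the same symbol."""
    if len(x) != len(y) or not x:
        raise ValueError("expects two non-empty texts of equal length (already aligned)")
    return sum(p == q for p, q in zip(x, y)) / len(x)


def one_matches_when_disagree(a: str, b: str, z: str) -> float:
    """Among positions where a and b differ, the fraction where one of them matches z."""
    if not (len(a) == len(b) == len(z)):
        raise ValueError("expects three aligned texts of equal length")
    split = [(x, y, r) for x, y, r in zip(a, b, z) if x != y]
    if not split:
        raise ValueError("a and b never disagree")
    return sum(x == r or y == r for x, y, r in split) / len(split)


if __name__ == "__main__":
    # Toy five-position "texts" over a small alphabet; z plays the role of the MT.
    z = "ABCDE"
    a = "ABXDE"
    b = "ABYDQ"
    print(agreement(a, z), agreement(b, z), agreement(a, b))  # 0.8 0.6 0.6
    print(one_matches_when_disagree(a, b, z))                 # 0.5
```

In the toy example A and B disagree at two positions; at one of them (E vs Q) A matches Z, and at the other (X vs Y) neither does, giving 0.5.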
 
  • #4
Eturnal said:
What are the odds that when Text A and Text B disagree one of them will agree with Text Z?
Wouldn't you expect that number to be lower if these disagreements were random?
I don’t think that we can answer this. The similarity measure doesn’t seem like a probability. So we can’t really have any expectations on what the similarity measure should be in unmeasured situations.
 
  • #5
Thanks for your reply. Having a hard time getting started on this problem and my math is rusty. The answer is not 87% of the
 
  • #6
So there are about 7,000 words in each text in the tested section.

But we should be able to work with the percentages, so that shouldn't matter, right? We have 8% of each text where 01 and 03 disagree. Then we randomly scatter 26% between the two texts (13% in each text). But, oh yeah, we would also have to account for all the letter/word possibilities in those slots, wouldn't we?
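The number of possible readings per slot does matter, and a toy model shows how. This hedged Python sketch assumes (none of this comes from the thread) that each position has `k` possible readings, that A and B each match Z independently with probability 0.87, and that a text missing Z picks uniformly among the other `k - 1` readings. The function name is hypothetical.

```python
def independent_model(p_match: float, k: int):
    """Return (P(A != B), P(one of A, B matches Z | A != B)) under a toy model.

    Assumptions (hypothetical, for illustration only):
      * each position has k possible readings, one of which is Z's;
      * A and B each match Z independently with probability p_match;
      * a text that misses Z picks uniformly among the other k - 1 readings.
    """
    if k < 2:
        raise ValueError("need at least two possible readings per position")
    q = 1.0 - p_match
    one_match = 2.0 * p_match * q                       # exactly one text matches Z
    both_miss_differ = q * q * (1.0 - 1.0 / (k - 1))    # both miss Z and pick different readings
    p_disagree = one_match + both_miss_differ
    return p_disagree, one_match / p_disagree


if __name__ == "__main__":
    for k in (2, 5, 24):
        d, share = independent_model(0.87, k)
        print(f"k={k:2d}  P(A != B)={d:.3f}  share with one match={share:.3f}")
```

With only two readings per slot (the A/B simplification), every A/B disagreement automatically has one text matching Z, so the share is 100% by construction; with 24 readings it is roughly 93%. In every case this toy model puts P(A != B) at about 22.6% or more, far above the 8% reported, so the observed texts do not behave like independent random departures from Z.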
 
  • #7
In reply to Eturnal (post #6):
This is even less clear than your original post. Mathematics and statistics require a well-defined problem - even if the definition includes uncertainties and probabilities. This is different from the humanities where you can argue endlessly over ill-defined concepts!
 
  • #8
PeroK said:
This is even less clear than your original post. Mathematics and statistics require a well-defined problem - even if the definition includes uncertainties and probabilities. This is different from the humanities where you can argue endlessly over ill-defined concepts!
Tell me if I'm heading in the right direction here.

Take two texts of 100 characters and highlight 8% representing the disagreement between 01 and 03.

Now randomly select 13% of each text representing the disagreements between 01/03 & the MT.

What are the odds those 13 places overlap the 8%?

Now, should we account for each character having 24 different options (the letters of the Greek alphabet)? Or should we just pretend each character slot is A/B for simplicity's sake?
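The overlap question just above can be computed once it is pinned down. A hedged Python sketch, assuming exactly the procedure described (100 positions, 8 fixed "disagreement" positions, 13 positions chosen uniformly at random and independently of the first set): the overlap count then follows a hypergeometric distribution. This toy setup ignores the constraint that positions where A and B both match Z cannot be A/B disagreements, so it illustrates the counting, not the manuscripts. Function names are hypothetical.

```python
import random
from math import comb


def overlap_pmf(n: int, marked: int, drawn: int):
    """P(K = k) for k = 0..drawn, where K is how many of `drawn` randomly chosen
    positions (out of n) land on a fixed set of `marked` positions (hypergeometric)."""
    total = comb(n, drawn)
    # math.comb returns 0 when k > marked, so impossible overlaps get probability 0.
    return [comb(marked, k) * comb(n - marked, drawn - k) / total for k in range(drawn + 1)]


def simulate_mean_overlap(n: int, marked: int, drawn: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo check: average overlap between a random draw and the marked set."""
    rng = random.Random(seed)
    marked_set = set(range(marked))  # which positions are marked does not matter
    hits = 0
    for _ in range(trials):
        hits += len(marked_set.intersection(rng.sample(range(n), drawn)))
    return hits / trials


if __name__ == "__main__":
    pmf = overlap_pmf(100, 8, 13)
    print("P(no overlap)     =", round(pmf[0], 4))
    print("P(at least one)   =", round(1 - pmf[0], 4))
    print("expected overlap  =", sum(k * p for k, p in enumerate(pmf)))  # equals 13 * 8 / 100
    print("simulated overlap =", simulate_mean_overlap(100, 8, 13, 20_000))
```

The expected overlap is 13 × 8 / 100 = 1.04 positions, and the chance of at least one overlap comes out at about 69% under these assumptions.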
 
  • #9
PeroK said:
This is even less clear than your original post. Mathematics and statistics require a well-defined problem - even if the definition includes uncertainties and probabilities. This is different from the humanities where you can argue endlessly over ill-defined concepts!
Someone posited that there is an 87% chance that the disagreements between 01 and 03 land on MT just because each agrees with MT 87%. I don't feel like their math is correct. Thank you for your help!!!
 
  • #10
In reply to Eturnal (post #8):
You can only produce probabilities from a large sample of data or by understanding where the source material came from and hence you have some underlying assumptions.

It's a common misconception that probabilities can be conjured from one sample of data without underlying assumptions. I think this is what you are trying to do here: that somehow these percentages themselves will reveal something mathematically robust.

You can determine from the texts how correlated they are (and, indeed, that's what your percentages are trying to show). But, there is no magic wand that will tell you how likely that correlation was. The probability of a given correlation is not inherent in the data. It can only be calculated when you have a model for how the data was generated. The same correlations might be almost inevitable in one case and highly unlikely in another - even in cases where the raw data is the same.
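To illustrate that the probability of a correlation depends on the model, here is a hedged Python simulation with two made-up data-generating stories. The rates (87%, 91%, 96%), the 24-reading alphabet and the 7,000-position length are assumptions chosen only to roughly echo the thread's numbers, not estimates of how the codices were produced. Both stories give each text about 87% agreement with Z; only the shared-exemplar story also produces about 92% agreement between A and B.

```python
import random


def simulate_pair(model: str, n: int, k: int, rng: random.Random):
    """Generate aligned texts (z, a, b) of length n over readings 0..k-1.

    Both models are hypothetical:
      "independent": A and B each depart from Z on their own (13% of positions).
      "shared":      A and B copy a common exemplar C that departs from Z at 9% of
                     positions; each copy then errs at 4% of positions.
    """
    def other(r):  # a reading different from r, chosen uniformly among the k - 1 others
        s = rng.randrange(k - 1)
        return s if s < r else s + 1

    z = [0] * n  # label Z's reading at every position as 0 (readings are symmetric)
    if model == "independent":
        a = [r if rng.random() < 0.87 else other(r) for r in z]
        b = [r if rng.random() < 0.87 else other(r) for r in z]
    elif model == "shared":
        c = [r if rng.random() < 0.91 else other(r) for r in z]
        a = [r if rng.random() < 0.96 else other(r) for r in c]
        b = [r if rng.random() < 0.96 else other(r) for r in c]
    else:
        raise ValueError(f"unknown model: {model}")
    return z, a, b


def rate(x, y):
    """Fraction of aligned positions where x and y agree."""
    return sum(p == q for p, q in zip(x, y)) / len(x)


if __name__ == "__main__":
    rng = random.Random(1)
    for model in ("independent", "shared"):
        z, a, b = simulate_pair(model, 7000, 24, rng)
        print(f"{model:11s}  A~Z={rate(a, z):.3f}  B~Z={rate(b, z):.3f}  A~B={rate(a, b):.3f}")
```

Under the independent story, A/B agreement lands near 76%, so 92% over 7,000 positions would be wildly unlikely; under the shared-exemplar story it is the typical outcome. The percentages only become probabilities once a story like one of these is assumed.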
 
  • #11
In reply to PeroK (post #10):
Humor me. I'm sure someone can draw up a useful piece of math on this, although, yes, everything will have some assumptions plugged in.
 
  • #12
Eturnal said:
Humor me. I'm sure someone can draw up a useful piece of math on this although yes everything will have some assumptions plugged in.
Being an inveterate frequentist, I'll leave that to the Bayesians!
 
  • #13
PeroK said:
Being an inveterate frequentist, I'll leave that to the Bayesians!
It's challenging, I know.
 
  • #14
Eturnal said:
Someone posited that there is an 87% chance that the disagreements between 01 and 03 land on MT just because each agrees with MT 87%. I don't feel like their math is correct. Thank you for your help!!!
I don’t think that these percentages for agreement are probabilities. Probabilities are between zero and one, so it is common to write them as percentages. But that doesn’t imply that everything that is written as a percentage is a probability.

In particular, a probability is always a measure on some space of events. For example, if you are rolling a single die, then the space of events could be “a 1 is rolled”, “a 2 is rolled”, …, “a 6 is rolled”.

Here, I cannot see that there is a space of events. So I don’t think that “text A is 87% similar to text Z” is a probability. If it is a probability then what exactly is the event space and what sample of events is described by the statement?
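The die example can be written down directly: a probability is a function on subsets (events) of a sample space. A minimal Python sketch, assuming a fair die so that every outcome gets equal weight (the uniform measure is an assumption; a loaded die needs different weights). The function name is illustrative.

```python
from fractions import Fraction


def probability(event, sample_space):
    """P(event) under the uniform measure on a finite sample space."""
    event = set(event)
    space = set(sample_space)
    if not event <= space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(space))


if __name__ == "__main__":
    die = {1, 2, 3, 4, 5, 6}
    print(probability({6}, die))        # 1/6
    print(probability({2, 4, 6}, die))  # 1/2
```

For the manuscripts, the open question is what plays the role of `die`: which outcomes make up the sample space and with what weights.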

Eturnal said:
Humor me. I'm sure someone can draw up a useful piece of math on this although yes everything will have some assumptions plugged in.
I don’t think that will be possible without some additional information about the similarity measure. It doesn’t seem like a probability to me. So I don’t think the math of probability will apply.
 
  • #15
Dale said:
I don’t think that will be possible without some additional information about the similarity measure. It doesn’t seem like a probability to me. So I don’t think the math of probability will apply.
And if a Bayesian can't do it, then nobody can!
 
  • #16
PeroK said:
Being an inveterate frequentist, I'll leave that to the Bayesians!
I am a Bayesian, so I am happy to assign probabilities without a lot of data. But I still need an event space, just like the frequentists.
 
  • #17
(Quotes Dale's post #14 in full, with no further reply.)
 
  • #18
Anyone want to try their hand at being amazing?
 
