For much of the twentieth century it was considered an accepted fact amongst Biblical scholars that the prophecy of Isaiah was written in two or more pieces[1]. This was a line of thought that commenced in 1780 with questions regarding Isaiah 50, and which probably reached critical mass when the celebrated and conservative Franz Delitzsch conceded in 1880 that Isaiah 40 onwards was probably written at the end of the exile[2]. Thus the notion of 'Deutero-Isaiah' was born.
The disintegration process, however, went much further. By the early nineteen hundreds it was already common to slice off chapters 56 to 66 into a third piece[3], believed to have been written around 450 BC. Yet the knives were now unsheathed, and Isaiah became fertile ground for anyone with an opinion to slice and dice as they saw fit. In 1910 Professor George Robinson chronicled various 'radicals' who had left only 262 of the 1292 verses of Isaiah to the original prophet.
The modern position has now shifted to a more redaction-based model. This asserts that Isaiah is really a patchwork of many different written and oral records, all compiled and woven into one fabric. Depending upon how you look at it, this position either asserts the unity of the book or claims that the book is such a complex collection of disparate pieces that one has to treat it as a unity[4].
Of course, running entirely counter to modern critical thought, many believing Christians have been content to take the Word of God at face value and have simply trusted that the book of Isaiah was written by Isaiah. To a large extent I personally view this as the most profitable path; whilst there may be much in the discussion that is academically amusing, there is very little that would provide spiritual growth.
Nonetheless, as I have dealt with elsewhere at length[5], I do not think the contention over Isaiah is unexpected or insignificant. One of the core arguments against the unity of Isaiah is that it contains certain predictive prophecies that would be completely astounding if they were genuine. The most famous of these is the naming of Cyrus[6] in a book which claims to have been written at least a hundred years before he was born. This would be intriguing enough, but the really incredible fact is that the predictions of Cyrus are set amongst eight different places in Isaiah[7] where God explicitly states that one of the key things that differentiates Him from other 'gods' is that He knows the future and tells it to people.
Therefore we see that accepting the unity of Isaiah on faith 'despite all the evidence' may leave one in a theologically correct position; however, it robs the believer of one of the more visible proofs God has given of His character and abilities. To leave the 'evidence' out there unchallenged and unquestioned is therefore to concede an important piece of ground to the critics. It may even prove to be a piece of ground that causes some of our young people to stumble[8].
Notwithstanding the above, it is not my intent to launch a deep investigation into, or attack upon, the surgeons of Isaiah. Instead I wish to delve deeper into the exact nature of one of the subjects that often arises in these debates: The Literary Style of Isaiah. More specifically, I wish to look at what I shall call the linguistic artifacts that are present in the book. I will discuss this more mathematically under 'Linguistic Coincidence', but for this introduction a layman's view of the concept may be beneficial.
The vocabulary used by a piece of literature will be driven by three primary factors:
The argument therefore runs that if Isaiah was written by one individual about a single subject, it should have a fairly consistent vocabulary throughout, and that vocabulary may differ from the vocabulary of other books. In fact, the critics assert that their original reason for splitting Isaiah was that it addresses different subjects, and they further assert that the books were written for different peoples at different times. Given both claims, the only *legitimate* reason for any similarity between the books whatsoever is that they were written by the same individual.
It should perhaps also be noted that I italicized *legitimate* because the other possibility is a forgery or impersonation. This would be a strange claim to make, as one of the features of Deutero-Isaiah is supposed to be his advanced enlightenment compared to the previous author. Nonetheless, for completeness, it is a possibility we will consider in the following.
Many conservatives have attempted to seize upon this logic to assert unity of authorship. Robinson asserts that the divine title 'the Mighty One of Israel', occurring three times in Isaiah and nowhere else, is 'singular'. He is further impressed that 'streams of water' occurs twice in Isaiah yet nowhere else. Many others have also selected individual phrases and their repeated occurrence and claimed that they showed something.
I would suggest that these fragments of evidence do suggest something but we have no real way of knowing what they suggest until we have a much better understanding of the vocabulary overlap of the Old Testament in general. How common was it for two unrelated books to happen to share a phrase that is otherwise unique?
My intent in this paper, and the project it annotates, is to construct a set of mathematical metrics to measure the closeness of two Biblical books. In other words: to produce a set of questions which, if objectively answered, give us factual evidence of the literary similarity of two books. Secondly, I aim to execute and document these metrics against the Old Testament canon, with the hope of producing results that can become an objective factual basis for further conversations of this form.
In my opinion one of the first mistakes that many people make when attempting a linguistic analysis of Isaiah, or any other Biblical book for that matter, is that they perform their analysis on an English text. This has to be a mistake, for whilst the theological content is accurately rendered in the target language, the linguistic artifacts of culture and personal preference will be those of the translator, not those of the author in the original language. One only has to compare good, sound English translations from different centuries to see the extent to which word choice can deviate wildly even if the semantic content is relatively static.
For this reason I wish to perform this analysis at the level of the Hebrew text. Further I suggest it is useless to simply look at Isaiah because we have no point of reference to define what is reasonable; therefore the whole Hebrew canon will be considered. The New Testament is left out of the analysis because it is written in Greek and the Greek language has different properties from the Hebrew[9].
Of course the downside to studying this in Hebrew is that I don't speak Hebrew, and the programming languages that are out there do not easily lend themselves to processing Hebrew text. The problem moves from difficult to extreme when one considers that the same word may appear in different lexical forms to denote shifts of tense, plurality and gender. Fortunately these problems can be reduced or even removed using one simple device: the Strong's number.
The Strong's number is effectively a numeric encoding of the Hebrew (or Greek) words of the original-language text. It is usually used to allow easy reference from a translated English word to the original language. A number of translations are available with the Strong's numbers interspersed with the text. For our purposes it does not matter which translation we use, as we can simply remove the English text, leaving behind a sequence of numbers.
Whilst the sequence of numbers will then be entirely unreadable to a human, to a machine they will be entirely legible. Further we know that numeric equivalence at the Strong's level implies word equivalence in the original language. Thus we have turned a potentially complex parsing problem into one of simple numeric comparison.
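The Python sketch below illustrates the stripping step; the project itself was written in Perl. It assumes a hypothetical tagging convention in which a Hebrew Strong's number such as `<H7225>` follows the English word it glosses. Real Strong's-tagged texts use various conventions, so the `STRONGS_TAG` pattern is an assumption to adapt to the actual source. The sample line is a shortened illustration, not a complete tagging of Genesis 1:1.

```python
import re

# Assumed tag format: <H7225> (or <7225>) placed after the English gloss.
# Greek tags such as <G2316> deliberately do not match.
STRONGS_TAG = re.compile(r"<H?(\d+)>")


def strongs_sequence(tagged_verse: str) -> list[int]:
    """Discard the English text and return the Strong's numbers in verse order."""
    return [int(number) for number in STRONGS_TAG.findall(tagged_verse)]


verse = "In the beginning <H7225> God <H430> created <H1254> the heaven <H8064>"
print(strongs_sequence(verse))  # [7225, 430, 1254, 8064]
```

Inflected forms of one Hebrew word share a single number, so equality of these integers is the "numeric equivalence" described above.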
Perhaps the strangest feature of the methodology, at least for someone as conservative as me, is that the process will start with the presumption that Isaiah is split into two pieces: Isaiah 1-39 and Isaiah 40-66. The reason is that we wish to examine those links that exist between the two halves and see if they are stronger or weaker than the links that exist between other books in the canon. In order to count or measure the links between two entities it is first necessary to have two entities; thus Isaiah has to be split.
For those of you more mathematically or logically inclined, the process I am really using is reductio ad absurdum. By basing the process on an assumption (two Isaiahs), I aim to produce a set of statistics that may suggest that those two pieces are abnormally close. If I do so, then that suggests a flaw in the process, which I would suggest is the opening assumption that there are two Isaiahs!
Whether you agree with this approach or not: in the following I shall refer to Deutero-Isaiah by which I mean Isaiah chapters 40-66. The term Isaiah shall refer to the first 39 chapters. Again for clarity - this is an assumption I am making for analytic purposes - it need not[10] reflect my opinion of reality.
One of my principal motivations for considering this problem was the knowledge that it would be possible to produce a lot of raw investigative material using readily available resources. For this project all that is required is a list of Strong's numbers by verse. However, for the purposes of this exercise I acquired the texts as follows:
Whilst the language processing that this project attempts is theoretically advanced, the programming concepts involved are not. Reading some twenty megabytes of text is something that a home PC can do in under a second using almost any language. The process is linear and can be accomplished in batch mode. Further, the eventual amount of coding will be trivial by modern standards. Thus most of the advances made in programming language theory and practice in the last twenty years are not really going to be required to tackle this project.
As there was no compelling requirement for any given approach I instead deferred to a couple of pragmatic considerations. Firstly it may be useful to be able to showcase the results of this work; this is most readily done on the Internet. Secondly I already have a substantial amount of Perl code that processes Biblical texts that I use for my own website. The result of these two considerations is that I chose to use Perl for this exercise; almost any language could be used if the project were undertaken seriously.
There is no copyright on the KJV outside of the United Kingdom; I do not know of any restrictions placed upon the Strong's encoding of the KJV. It should be noted that the KJV text itself is only used for ease of reference so obtaining an entirely perfect version is not required for this project.
Whatever the source of the textual version a necessary precursor to performing linguistic analysis is to transform the text into a 'processing friendly' format to allow downstream processes to occur independent of the format of the input data[12]. This process is usually referred to as ingestion and the code normally has to be written for every file that is going to be used. Ideally the process is run once at the start of the project and the output of the process is then used for the remainder of the project.
The format I have personally standardized upon has one verse per line and requires the lines to be ordered to follow the Biblical sequence. Each verse is preceded by an 11 character descriptor that defines the verse that follows. The format is BB:CCC:VVV where BB is a two character book number, CCC is three characters for the chapter number and the VVV is three characters for the verse number. Thus 01:001:001 corresponds to Genesis chapter 1 verse 1. In this particular case the input text actually lists the books in alphabetic order so the text had to be processed 17 times to emit the books in the sequence I wanted.
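The verse descriptor can be built and parsed with a short Python sketch; the original code was Perl. It assumes that the descriptor is separated from the verse content by whitespace. One useful property follows from the zero-padding: a plain string sort of the lines reproduces canonical order, provided the book numbers themselves follow that order.

```python
import re

# BB:CCC:VVV descriptor (book, chapter, verse, zero-padded), assumed to be
# followed by whitespace and then the verse content.
VERSE_LINE = re.compile(r"^(\d{2}):(\d{3}):(\d{3})\s+(.*)$")


def format_descriptor(book: int, chapter: int, verse: int) -> str:
    """Build the descriptor, e.g. (1, 1, 1) -> '01:001:001'."""
    return f"{book:02d}:{chapter:03d}:{verse:03d}"


def parse_line(line: str) -> tuple[int, int, int, str]:
    """Split a verse line into (book, chapter, verse, content)."""
    match = VERSE_LINE.match(line)
    if match is None:
        raise ValueError(f"not a verse line: {line!r}")
    book, chapter, verse, content = match.groups()
    return int(book), int(chapter), int(verse), content


# Zero-padding means string order equals canonical order.
lines = ["02:001:001 ...", "01:002:001 ...", "01:001:010 ..."]
print(sorted(lines))
```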
For this particular use I changed my normal format in two ways. Firstly, I split chapter 40 onwards of Isaiah off into a new book, 67. Secondly, as I am only using the Strong's numbers, I removed all of the English text and Hebrew annotations to leave a simple stream of numbers for each verse.
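A minimal Python sketch of the first change follows. It assumes Protestant book order with Genesis as 01, which makes Isaiah book 23. It also keeps the original chapter numbers, so Isaiah 40:1 becomes `67:040:001`; the text does not say whether the project renumbered chapters. The constant names are illustrative, not taken from the author's Perl.

```python
ISAIAH = 23          # assumption: Protestant book order with Genesis as 01
DEUTERO_ISAIAH = 67  # the new book number described in the text


def split_isaiah(line: str) -> str:
    """Re-file Isaiah 40 onwards under book 67; other lines pass through unchanged.

    Assumes the line starts with a BB:CCC:VVV descriptor.
    """
    book, chapter, rest = line[:2], line[3:6], line[6:]
    if int(book) == ISAIAH and int(chapter) >= 40:
        return f"{DEUTERO_ISAIAH:02d}:{chapter}{rest}"
    return line


print(split_isaiah("23:040:001 ..."))  # 67:040:001 ...
```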
One point worth noting is that all of the observations that follow are the result of one 180 line Perl program written and tested over the course of one week. Computer science moves rapidly; sixteen years ago I spent six months compressing the Bible text down so that it could realistically be loaded onto the PC of the day. Analytic and linguistic research is now programmatically available to just about everyone.
The first part of the puzzle we need if we are to form a mathematical model for the relationships between two books is a measure of the size of each book. This is because, all other things being equal, the chance of something odd occurring in a book[13] should be proportional to the length of the book. The traditional way of measuring a book tends to be in terms of verses. This is certainly a good approximation to the length of the book, but it is also somewhat arbitrary, as the verse divisions do not appear in the original Hebrew. Instead, therefore, I intend to measure the total number of words each book contains.
There is another measure which may well be interesting too and that is the vocabulary size[14] of each book. One might expect there to be a rough correlation that longer books will have larger vocabularies. The size of vocabulary however can also be a strong indicator of literary style[15]. The aim would therefore be to plot a graph of book length to vocabulary size and see where the various books fall.
The graph above shows all of the Old Testament books plotted by book length and unique word count. For the shorter books the number of unique words is about a third of the number of words. Then, as the books get longer, vocabulary grows more slowly; Ezekiel, Genesis, Jeremiah and Psalms are the four longest books. However, in addition to the trend it is useful to note the outliers. Points towards the bottom right of the graph have abnormally few unique words for their length; thus Leviticus stands out as having a low vocabulary. Heading towards the top left we find that Isaiah is exceptional for having a very large vocabulary for its (remaining) length. 1 Chronicles and Job come second and third; then, for its length, Deutero-Isaiah stands out amongst the mid-length books for vocabulary size.
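The two quantities plotted in the graph are book length and vocabulary size. Both can be computed in one pass, as this Python sketch shows; the project itself used Perl. The sketch assumes the verses are already available as `(book, strongs_numbers)` pairs. A "word" is one Strong's number occurrence, so vocabulary is counted per lexical entry rather than per inflected form. The input below is a toy example, not real data.

```python
from collections import defaultdict


def book_statistics(verses):
    """Return {book: (total_words, unique_words)}.

    `verses` is an iterable of (book, strongs_numbers) pairs.
    """
    totals = defaultdict(int)
    vocabulary = defaultdict(set)
    for book, numbers in verses:
        totals[book] += len(numbers)
        vocabulary[book].update(numbers)
    return {book: (totals[book], len(vocabulary[book])) for book in totals}


sample = [(1, [7225, 430, 1254]), (1, [430, 776]), (19, [430])]
print(book_statistics(sample))  # {1: (5, 4), 19: (1, 1)}
```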
One of the crudest measures of distance between two books is the number of words that exist only within those two books. This will identify common subject matter, and it may point to some idiosyncrasy of the author. Of course, the chance of two books sharing an otherwise unique word increases with the lengths of the books. Thus for every pair of books I counted the number of words that they, and they alone, use. I also scaled that number by book size: I took the number of shared words, multiplied it by the square of the average book length and divided by the product of the two book lengths. Thus if the average book length were 1000 words and I had 15 co-occurrences between two books of length 1200 and 800, the scaled result would be 15 * 1000 * 1000 / (1200 * 800) = 15.625.
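Both steps can be sketched in Python; the author's implementation was Perl and is not reproduced here. The scaling is scaled = n × L̄² / (L_a × L_b). Here n is the number of exclusively shared words, L̄ is the average book length, and L_a and L_b are the lengths of the two books. When both books are of exactly average length, the scaled value equals the raw count. The function names are illustrative.

```python
from collections import defaultdict


def exclusive_pairs(vocab):
    """Count the words found in exactly two books, keyed by that pair of books.

    `vocab` maps each book to the set of words (Strong's numbers) it uses.
    """
    books_using = defaultdict(set)  # word -> books in which it appears
    for book, words in vocab.items():
        for word in words:
            books_using[word].add(book)
    counts = defaultdict(int)
    for books in books_using.values():
        if len(books) == 2:
            counts[tuple(sorted(books))] += 1
    return dict(counts)


def scaled(shared, length_a, length_b, average_length):
    """shared * average_length**2 / (length_a * length_b)."""
    return shared * average_length ** 2 / (length_a * length_b)


print(exclusive_pairs({"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 4, 5}}))
# {('A', 'B'): 1, ('B', 'C'): 1}
print(scaled(15, 1200, 800, 1000))  # 15.625
```

Word 3 appears in all three toy books, so it counts towards no pair.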
I computed these numbers for every pair of books in the Old Testament; however, I will be presenting only three tables. The first gives the top ten pair matches across all books, which will allow us to verify that the measure is in some way meaningful. The second and third give the top six pair matches for Isaiah and for Deutero-Isaiah. These will obviously allow us to see how close the two books are to each other, but also whether they are both close to the same kinds of other books.
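The tables that follow are ranking selections: the highest-scoring pairs overall, and the highest-scoring pairs that involve one given book. A hypothetical Python helper for that selection is sketched below. It works on either raw or scaled counts. The sample values are a few raw counts copied from the tables.

```python
def top_pairs(scores, k, book=None):
    """Return the k highest-scoring (pair, score) items, highest first.

    `scores` maps (book_a, book_b) to a raw or scaled count; if `book` is
    given, only pairs involving that book are considered.
    """
    candidates = [(pair, score) for pair, score in scores.items()
                  if book is None or book in pair]
    return sorted(candidates, key=lambda item: item[1], reverse=True)[:k]


raw = {("Ezra", "Daniel"): 151, ("Genesis", "1Chronicles"): 105,
       ("Ezra", "Nehemiah"): 75, ("2Kings", "Isaiah"): 19,
       ("Isaiah", "Deutero-Isaiah"): 8}
print(top_pairs(raw, 2))
print(top_pairs(raw, 2, book="Isaiah"))
```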
Book | Book | Co-occurrences | Scaled Co-occurrences | Comments |
---|---|---|---|---|
Ezra | Daniel | 151 | 300 | This is the most significant tie-up both actually and scaled. I was rather surprised when I first saw the linkage. One doesn't think of them as similar books. However they were both written by men that had spent substantial time in a Persian court. This number would suggest that the Hebrews of the exile developed a new section of vocabulary not shared with the earlier prophets. |
Genesis | 1Chronicles | 105 | 26 | Of course a sizeable genealogy or two can easily cause high correlation. |
Ezra | Nehemiah | 75 | 178 | This is the second highest scaled result and third highest result. Ezra and Nehemiah are of course known to be closely related books. An expected and encouraging result. |
2Samuel | 1Chronicles | 37 | 17 | Another confirmatory result. Both books are known to focus upon King David and therefore share some common core vocabulary. |
1Chronicles | Nehemiah | 21 | 18 | This one also caused me to frown until I checked in my study Bible[16]. 1 Chronicles is believed to have been completed around 425BC and Jewish tradition assigns it to Ezra. Whilst not the same book this statistic suggests the same time and setting. |
2Kings | Isaiah | 19 | 8 | Having just studied Isaiah I was ready for this one. The account in 2Ki of Hezekiah and that in Isaiah are clearly very close; this is reflected in an amount of common core vocabulary. |
1Chronicles | 2Chronicles | 19 | 6 | The correlation between these two books in time and content is well known; it should be no surprise that they are also linked linguistically. |
Leviticus | Deuteronomy | 18 | 7 | Again a correlation between two books known to have been written at a similar time by the same person. |
Job | Psalms | 17 | 5 | This one is interesting and may suggest that some of the Psalms came from the region of Job. It could also suggest some common wisdom vocabulary. However the low scaled result should be noted. It could just suggest that Psalms is a big book. |
Joshua | 1Chronicles | 16 | 8 | This correlation is probably historic. 1 Chronicles briefly recounts the history of the time up to David and Joshua is the only other book covering the history of the invasion of Canaan. |
In many ways this table contains no news, or at least very little that was not already known. However, this is good news: it suggests that the measurement of unique words between two books does tend to correlate with known links between the two books, which adds some validity to the measure. We have also seen linkages due to subject matter[17], culture[18] and probably author[19].
Perhaps the one slightly disappointing thing for the conservatives is that Isaiah and Deutero-Isaiah do not make it into the top 10 linked books. In fact the two books would appear at number 35 on this list. Looking at the books that are linked to Isaiah suggests why:
Book | Book | Co-occurrences | Scaled Co-occurrences |
---|---|---|---|
2Kings | Isaiah | 19 | 8 |
Isaiah | Jeremiah | 16 | 4 |
Psalms | Isaiah | 13 | 3 |
Job | Isaiah | 12 | 8 |
Proverbs | Isaiah | 10 | 7 |
Isaiah | Deutero-Isaiah | 8 | 6 |
We have already noted the 2Kings passage that corresponds to Isaiah. Then we find Jeremiah, who predicted and lived through the fall of Jerusalem; this is obviously the same fall that Isaiah predicted. We then find Isaiah drawing his vocabulary from the, admittedly large, body of wisdom literature, most of which had been written in Jerusalem some hundred and fifty years before. Bringing up sixth place, although fourth in terms of significance, is Deutero-Isaiah.
Looking at the table for Deutero-Isaiah is equally instructive:
Book | Book | Co-occurrences | Scaled Co-occurrences |
---|---|---|---|
Psalms | Deutero-Isaiah | 13 | 5 |
Isaiah | Deutero-Isaiah | 8 | 6 |
Job | Deutero-Isaiah | 7 | 6 |
Proverbs | Deutero-Isaiah | 6 | 6 |
Exodus | Deutero-Isaiah | 6 | 3 |
Deutero-Isaiah | Jeremiah | 6 | 2 |
Of the top four co-occurrences for Deutero-Isaiah we find three of the same wisdom books that featured for Isaiah. Also, in second place we find a link to Isaiah. In sixth place we find a link to Jeremiah (which was second placed for Isaiah). The one new book we find is a throwback to the book of Exodus; Exodus had occupied the eighth spot for Isaiah.
What we therefore see is that Isaiah and Deutero-Isaiah share words with each other but also with the wisdom books and Jeremiah. As we are looking for words unique between two books the fact that we appear to have a cluster of books using similar language will actually reduce the chances of any two of them having a unique pairing.
Additionally the very power of looking for unique words in pairs of books is also its greatest weakness. These words are by definition oddities: they occur in low numbers. Therefore there is a danger that the noise of randomness[20] will actually distort some of the truth in the underlying data. Fortunately both of these problems can be somewhat ameliorated by altering our concept of a word.
Counting the number of words that only occur within two books does make sense, but it assumes that words appear independently within text. This is obviously not true; any word in a given sentence often has an explicit grammatical or semantic link to the word next to it. For example, nouns are often preceded by an adjective, and verbs by an adverb. The rules for Hebrew and English are different, and frankly I do not know them well enough to produce all of the meaningful word pairs from a sentence. Given we are looking for oddities, however, we can simply produce a list of all of the word pairs; those that arise by pure chance have an extremely low probability of being found in another book.
For clarity I will give a small example of what I am doing:
The large dog bit the small cat
Will produce a sequence of word-pairs thus:
The large, large dog, dog bit, bit the, the small, small cat
A grammarian would tell you to drop the pairs using the article (the). However, my assumption is that articles are sufficiently common that the act of looking for uniqueness will implicitly drop them out, unless they are used in an odd context, in which case they are interesting anyway!
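The pairing step for the example sentence can be written as a one-line Python function. The sketch treats words as opaque tokens, so in the project the same function would apply to a verse's Strong's numbers. Each resulting pair is a tuple that can then be treated as a single "word" when searching for items unique to two books. The sketch forms pairs only within one verse; whether the original program allowed pairs to cross verse boundaries is not stated.

```python
def word_pairs(words):
    """Every adjacent pair of words, in order (n words give n - 1 pairs)."""
    return list(zip(words, words[1:]))


print(word_pairs("The large dog bit the small cat".split()))
# [('The', 'large'), ('large', 'dog'), ('dog', 'bit'), ('bit', 'the'), ('the', 'small'), ('small', 'cat')]
```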
There are far more distinct word pairs than distinct words, and thus far more instances where a word pair is extant in only two books. In fact the numbers go from about fourteen hundred instances to just over eleven thousand. This will help to even out any random noise. We shall now proceed to look at the three top tables again.
Book | Book | Occurrences | Scaled Occurrences | Comments |
---|---|---|---|---|
1Kings | 2Chronicles | 611 | 202 | Parallel narrative of same period |
2Samuel | 1Chronicles | 329 | 152 | Parallel narrative of same period |
2Kings | Isaiah | 317 | 148 | Shares narrative of Hezekiah |
2Kings | 2Chronicles | 288 | 101 | Parallel narrative of similar period |
Genesis | 1Chronicles | 252 | 62 | Sharing some major genealogies |
Ezra | Nehemiah | 209 | 496 | Accounts written at similar time about similar subject possibly with new vocabulary. |
Exodus | Numbers | 199 | 42 | Continuing narrative of same period |
Leviticus | Numbers | 176 | 55 | Parallel accounts of same period |
2Kings | Jeremiah | 174 | 40 | Cover same period |
Exodus | Leviticus | 147 | 47 | Parallel accounts of same period |
1Samuel | 2Samuel | 140 | 59 | Subsequent accounts of similar events |
To me this Top Ten table is a little breathtaking. Leaving aside the mathematics for a moment, take a look at those book pairs and ask yourselves how many times you have been searching for a fact or verse and not been sure which of a given pair of books it was in. My guess is that many of those not-quite-sure moments would involve one of the book pairs above.
The other most noteworthy change from the first table is that the Ezra-Daniel link has now dropped out[21]. This suggests that the uniqueness of the Persian-derived vocabulary is diminishing in significance compared to similarity of subject matter. The table above shows that all of the pairs are now narrating identical or immediately subsequent events. This pattern is followed if we look at the table for Isaiah:
Book | Book | Occurrences | Scaled |
---|---|---|---|
2Kings | Isaiah | 317 | 148 |
Isaiah | Jeremiah | 80 | 22 |
Psalms | Isaiah | 79 | 21 |
Isaiah | Deutero-Isaiah | 44 | 34 |
Isaiah | Ezekiel | 43 | 13 |
Job | Isaiah | 34 | 22 |
We find that Job and Proverbs both drop down a couple of places[22]; Psalms retains its place, although scaled it drops down to fifth. 2Kings and Jeremiah remain in place, and Deutero-Isaiah moves up, as does Ezekiel (which also narrates the fall of Jerusalem). It should be noted that, once scaled for book size, the Isaiah to Deutero-Isaiah link is now second only to that between 2 Kings and Isaiah.
The table for Deutero-Isaiah follows the pattern although with one interesting surprise:
Book | Book | Occurrences | Scaled |
---|---|---|---|
Psalms | Deutero-Isaiah | 130 | 50 |
Deutero-Isaiah | Jeremiah | 76 | 30 |
Isaiah | Deutero-Isaiah | 44 | 34 |
Deutero-Isaiah | Ezekiel | 37 | 16 |
Genesis | Deutero-Isaiah | 34 | 13 |
Job | Deutero-Isaiah | 32 | 31 |
Firstly we should note that five of the six links are identical to Isaiah showing that they are part of the same language clustering. In terms of significance the Deutero-Isaiah to Isaiah link is second place as in the Isaiah table. The tie to Psalms has actually strengthened whilst the links to Job and Proverbs have weakened. This could denote a move towards more poetic or even florid language.
The new book is Genesis[23]. This could just be noise; the scaled value is low as Genesis is a large book. However it may also be suggestive. An argument for Deutero-Isaiah is that it has a global view of God unseen earlier in Hebrew thought (or so it is claimed). My counter argument is that globalism is the precise view of God portrayed in the Bible up until the call of Abraham. It may well be that the latter parts of Isaiah are not introducing a new concept (and language) but simply moving back to the concepts laid out very early in scripture.
It would be nice if one could run a similar algorithm to detect a correlation between phrases or idioms. However, the question as to when a sequence of words turns into a known phrase is an area of ongoing research within the data sciences. One of the latest concepts is confabulation theory[24], which is largely the work of Hecht-Nielsen[25]. This is a mathematical model that uses conditional probability in an attempt to detect a sequence of words that is being used sufficiently often that it forms a phrase. Unfortunately it requires a corpus of billions of words to train the model well enough to make it predictive[26]!
Fortunately for us we are not trying to find phrases everyone knows; rather those known to a relatively small number of people. Therefore I will simply produce lists of all of the three word sequences and see which ones fall into two books. Then as before I will simply assert that the fact that they are used in two places suggests that they form a meaningful unit[27].
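The three-word version can be sketched in Python, generalised to runs of `n` words. The author's Perl is not shown, so this is an illustration only. The sketch assumes each book is a list of verses and each verse is a list of Strong's numbers. It builds sequences only within a verse and counts distinct sequences rather than repeated uses; both choices are assumptions. The toy input uses made-up numbers.

```python
from collections import defaultdict


def ngrams(words, n):
    """Every run of n consecutive words, as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def shared_by_exactly_two(books, n=3):
    """books: {book: [verse, ...]}, each verse a list of Strong's numbers.

    Returns {(book_a, book_b): set of n-word sequences found in those two books only}.
    """
    found_in = defaultdict(set)  # sequence -> books containing it
    for book, verses in books.items():
        for verse in verses:
            for gram in ngrams(verse, n):
                found_in[gram].add(book)
    shared = defaultdict(set)
    for gram, where in found_in.items():
        if len(where) == 2:
            shared[tuple(sorted(where))].add(gram)
    return dict(shared)


toy = {"A": [[1, 2, 3, 4]], "B": [[9, 2, 3, 4]], "C": [[1, 2, 3]]}
print(shared_by_exactly_two(toy))
```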
There are far fewer hits than for two-word pairs; this is not surprising. It would be quite a coincidence for someone to string three words together by chance and get the same sequence as another person. However, one might hope that this will not introduce as much noise as the individual word comparisons did. This is because three words in a sequence have to obey rules of grammar and semantics and will have a well-formed meaning; thus, almost by definition, they will not occur randomly.
This table is sufficiently similar to the one for two word phrases that it is worth noting the movers to get an indication as to what is occurring:
Book | Book | Occurrences | Scaled | Move |
---|---|---|---|---|
1Kings | 2Chronicles | 1022 | 338 | - |
2Kings | Isaiah | 523 | 244 | +1 |
2Samuel | 1Chronicles | 465 | 215 | -1 |
2Kings | 2Chronicles | 460 | 162 | - |
Ezra | Nehemiah | 235 | 558 | +1 |
2Kings | Jeremiah | 229 | 53 | +3 |
Genesis | 1Chronicles | 203 | 50 | -2 |
Leviticus | Numbers | 179 | 56 | - |
Exodus | Numbers | 153 | 32 | -2 |
Exodus | Leviticus | 127 | 40 | - |
We see a slight rise in those documents describing the same thing and a marginal drop in the time-based links[28]. It is also interesting to note, for reasons we will see in a moment, that with the exception of the historic links between Isaiah, Jeremiah and 2 Kings, all of the books appearing in these top ten lists are now historical. Perhaps this is to be expected; historians strive for accuracy and use a relatively sedate style. Therefore we would expect them to write similarly about the same events.
It is the Isaiah table that shows that a shift has occurred:
Book | Book | Occurrences | Scaled |
---|---|---|---|
2Kings | Isaiah | 523 | 244 |
Isaiah | Micah | 27 | 106 |
Isaiah | Jeremiah | 27 | 7 |
Isaiah | Zechariah | 13 | 24 |
Isaiah | Ezekiel | 11 | 3 |
Isaiah | Deutero-Isaiah | 9 | 7 |
The 2Kings link stands out as exceptional by any measure. We essentially have two accounts here that appear to be copies or near copies of each other. However, with that exception, Isaiah has now dropped the wisdom books in favor of the prophets. Leaping into second spot in terms of occurrences and significance is Micah, a prophet in the same place and roughly the same timeframe as Isaiah. Jeremiah and Ezekiel remain in high slots, with Deutero-Isaiah in sixth. Zechariah has also raced up the table to gain fourth position.
I suspect that something quite important has occurred here. The prophets often spoke their message and they wanted to influence people. One well known way to do that is through repeated and recurrent phrases. I suspect that Isaiah and Micah quite consciously used similar phrases to transmit known ideas or concepts to their audience. If I am correct then Zechariah may also have adopted an Isaiah style with some degree of deliberation.
The Deutero-Isaiah table shows similar transformation:
Book | Book | Occurrences | Scaled |
---|---|---|---|
Deutero-Isaiah | Jeremiah | 30 | 11 |
Psalms | Deutero-Isaiah | 27 | 10 |
Isaiah | Deutero-Isaiah | 9 | 7 |
Deutero-Isaiah | Ezekiel | 9 | 4 |
Deutero-Isaiah | Zechariah | 6 | 16 |
Deuteronomy | Deutero-Isaiah | 6 | 3 |
This time four of the entries in the table mirror Isaiah. Even the surprising appearance of Zechariah has been maintained. The omissions are 2Kings (expected) and Micah. Deutero-Isaiah has maintained a link to Psalms although it is reduced and a minor link to Deuteronomy appears. The strongest link is now to Jeremiah.
Whilst it would require much more detail to know exactly what we are being shown here, it appears that Isaiah was firmly in the style of the immediate prophets of his time and that the same style was then picked up and used by those who came after. Deutero-Isaiah blended the style of the major prophets with the more poetic or reflective style of Psalms. Both seem to have been picked up by the forward-looking Zechariah.
Having looked in detail at the peculiar coincidences between book pairs it is worth briefly investigating the general vocabulary overlap between each book pair. That is to find the percentage of the words found in one book that are also found within the other.
Whilst the number of words common between two books is easy to compute it is a little difficult to come up with a measure of the significance of the overlap that adequately deals with the vast size differences between some of the books. If you select the percentage of words in the smaller book that appear in the larger then you rapidly see a list of tiny books that derive 80+% of their vocabulary from Psalms. If you select the total number of words overlapping then the smaller books have no chance at all.
As Isaiah and Deutero-Isaiah are my main books of interest, I picked a measure that has greatest meaning for mid to large size books. Namely I compute the percentage of the total vocabulary of the two books which is common. Thus if you compare two books each with a thousand unique words and the combination has fifteen hundred unique words then the number of overlapping words is five hundred and the overlap percentage is 33.3%.
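The measure described here is the Jaccard index, |A ∩ B| / |A ∪ B|, expressed as a percentage. A minimal Python sketch follows; the project's own code was Perl. It reproduces the worked example of two 1000-word vocabularies sharing 500 words.

```python
def overlap_percentage(vocab_a, vocab_b):
    """Words common to both books as a percentage of their combined vocabulary."""
    a, b = set(vocab_a), set(vocab_b)
    combined = a | b
    if not combined:
        return 0.0
    return 100.0 * len(a & b) / len(combined)


# Two 1000-word vocabularies sharing 500 words: 500 / 1500 of the union.
print(round(overlap_percentage(range(1000), range(500, 1500)), 1))  # 33.3
```

Dividing by the union rather than by either book's own vocabulary is what keeps this measure from being dominated by small books borrowing from Psalms.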
Using this measure, I repeated the previous process of selecting the highest overlaps overall as a control set. Then I selected the seven highest overlaps for each of Isaiah and Deutero-Isaiah.
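The selection process can be sketched in Python as follows. The sketch assumes a hypothetical dictionary mapping each book name to its set of unique words; the toy vocabularies are invented purely for illustration. Scoring every unordered pair also explains the figure of 780 possibilities: the word-count table lists 40 books, and 40 × 39 / 2 = 780.

```python
from itertools import combinations


def overlap_percent(vocab_a: set[str], vocab_b: set[str]) -> float:
    """Jaccard overlap of two vocabularies, as a percentage."""
    combined = vocab_a | vocab_b
    return 100.0 * len(vocab_a & vocab_b) / len(combined) if combined else 0.0


def rank_all_pairs(vocabularies: dict[str, set[str]]) -> list[tuple[str, str, float]]:
    """Score every unordered pair of books, highest overlap first (the control set)."""
    scores = [
        (a, b, overlap_percent(vocabularies[a], vocabularies[b]))
        for a, b in combinations(vocabularies, 2)
    ]
    return sorted(scores, key=lambda row: row[2], reverse=True)


def top_partners(vocabularies: dict[str, set[str]], book: str, n: int = 7) -> list[tuple[str, float]]:
    """The n books whose vocabulary overlaps most with `book`."""
    scores = [
        (other, overlap_percent(vocabularies[book], vocab))
        for other, vocab in vocabularies.items()
        if other != book
    ]
    return sorted(scores, key=lambda row: row[1], reverse=True)[:n]


# Toy vocabularies standing in for real books (hypothetical data).
toy = {
    "A": {"x", "y", "z"},
    "B": {"x", "y"},
    "C": {"z", "q"},
}
print(rank_all_pairs(toy))
print(top_partners(toy, "C", n=1))
```

With the real 40 vocabularies, the first eleven rows of `rank_all_pairs` would correspond to the control table, and `top_partners(..., "Isaiah")` to the per-book lists.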
Book | Book | Overlap (%) | Comments |
---|---|---|---|
Job | Psalms | 40.1 | Two books of wisdom literature that run the gamut of human emotion. |
Psalms | Isaiah | 38.3 | As we saw previously, Isaiah draws heavily upon the language of the Psalms. Of course, many of the Psalms were written in the same location and within 150 years of Isaiah. |
Isaiah | Jeremiah | 37.8 | Again we see the extent to which Jeremiah lived through what Isaiah saw, in the same locality and within a couple of hundred years. |
2Kings | 2Chronicles | 37.6 | Two different views of the same subject. |
Psalms | Jeremiah | 37.5 | Clearly Isaiah, Jeremiah and the Psalmist were working from the same lexicon. |
Psalms | Deutero-Isaiah | 37.1 | Now we find, as the sixth most significant overlap of all 780 possibilities, that Deutero-Isaiah also drew from the Psalmist's lexicon[29]. |
1Samuel | 2Samuel | 37 | Consecutive accounts of a similar nature. |
Psalms | Proverbs | 36.8 | Psalms and Proverbs draw from the same lexicon; again, similar place and time. |
Exodus | Numbers | 35.8 | Exodus and Numbers come from the same author at the same time. |
Exodus | Deuteronomy | 35.7 | Exodus and Deuteronomy again share an author and time. |
1Kings | 2Kings | 35.1 | Consecutive accounts of a similar nature. |
It is immediately apparent that a couple of clusters are forming. One is centered upon the Psalms and includes Isaiah, Jeremiah, Deutero-Isaiah and Job[30]. The other is centered upon Exodus and links the Mosaic books. Looking at the top matches for Isaiah further promotes this picture:
Book | Book | Overlap (%) |
---|---|---|
Psalms | Isaiah | 38.3 |
Isaiah | Jeremiah | 37.8 |
Job | Isaiah | 33.6 |
Isaiah | Deutero-Isaiah | 33.6 |
Isaiah | Ezekiel | 31.4 |
Deuteronomy | Isaiah | 31.3 |
Proverbs | Isaiah | 30.9 |
The first two we have already seen in the global top ten. The next two further reinforce the links within the Isaiah, Jeremiah, Deutero-Isaiah, Psalms and Job group. We see a link to Ezekiel, who received a similar message to Isaiah regarding the fall of Jerusalem and future restoration. In sixth place we notice a link to Deuteronomy, which we have seen before, and finally a link to another wisdom book: Proverbs.
The table for Deutero-Isaiah is almost identical. In the fourth column I have noted the difference in placing between the given book pair and its equivalent in the Isaiah list; a positive number means the book ranks higher in the Deutero-Isaiah list.
Book | Book | Overlap (%) | Placing delta compared to Isaiah list |
---|---|---|---|
Psalms | Deutero-Isaiah | 37.1 | - |
Jeremiah | Deutero-Isaiah | 34.3 | - |
Job | Deutero-Isaiah | 33.8 | - |
Isaiah | Deutero-Isaiah | 33.6 | - |
Proverbs | Deutero-Isaiah | 31.4 | +2 |
Deuteronomy | Deutero-Isaiah | 31.1 | - |
Ezekiel | Deutero-Isaiah | 30.3 | -2 |
It is immediately apparent that Deutero-Isaiah and Isaiah are related to exactly the same books as each other and to almost exactly the same extent. The only difference is that Deutero-Isaiah appears to swap some of Ezekiel's lexicon for some of Solomon's. Note, however, that the spread across the last three places (Ezekiel, Deuteronomy and Proverbs) is only half a percentage point in the Isaiah list (31.4 to 30.9) and a little over one percentage point in the Deutero-Isaiah list (31.4 to 30.3). Thus, for all practical purposes, these lists are identical.
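The placing deltas in the fourth column, and the spread of overlap values across the last three places where the swap occurs, can be recomputed directly from the two tables. The Python sketch below uses only the figures printed in those tables. The sign convention (positive means a book ranks higher in the Deutero-Isaiah list) is inferred from the table, and the helper names are hypothetical. Because each Isaiah appears in the other's list, the two names are treated as the same entry.

```python
# Top seven vocabulary partners, copied from the two tables above (overlap %).
isaiah = [
    ("Psalms", 38.3), ("Jeremiah", 37.8), ("Job", 33.6),
    ("Deutero-Isaiah", 33.6), ("Ezekiel", 31.4), ("Deuteronomy", 31.3),
    ("Proverbs", 30.9),
]
deutero_isaiah = [
    ("Psalms", 37.1), ("Jeremiah", 34.3), ("Job", 33.8),
    ("Isaiah", 33.6), ("Proverbs", 31.4), ("Deuteronomy", 31.1),
    ("Ezekiel", 30.3),
]


def placing_deltas(reference, other, twin_names=("Isaiah", "Deutero-Isaiah")):
    """Places gained (+) or lost (-) by each partner in `other` relative to `reference`."""
    def key(name):
        # Each Isaiah lists the other as a partner, so map both names to one entry.
        return "the other Isaiah" if name in twin_names else name

    ref_place = {key(name): i for i, (name, _) in enumerate(reference)}
    return {
        name: ref_place[key(name)] - i
        for i, (name, _) in enumerate(other)
        if key(name) in ref_place
    }


def spread(table, first, last):
    """Drop in overlap % between two 1-based places in a table, to one decimal."""
    return round(table[first - 1][1] - table[last - 1][1], 1)


print(placing_deltas(isaiah, deutero_isaiah))
print(spread(isaiah, 5, 7), spread(deutero_isaiah, 5, 7))
```

Only Proverbs (+2) and Ezekiel (-2) move, and they move within bands of 0.5 and 1.1 percentage points respectively.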
Perhaps I am too easily intrigued, but I generally find that any good piece of research throws up more questions than answers. Certainly my mind is awash with ideas for how the previous work can be extended. However, my aim in this section is not so much to reach for the stars as to admit some areas where the numbers given could use further work to ensure their legitimacy.
Firstly, many of the metrics given rely upon occurrences that are unique to a pair of books. Whilst this does find certain oddities, it is biased towards sources that have two accounts of something and is totally biased against triples of books that relate heavily to the same subject. There are two things we can do to further investigate that problem:
Secondly, we have shown the validity of each metric from the Top Ten list. We have also identified a tight cluster around our two books of interest. What we have not done is investigate in any depth the other clusters we would expect to find if these metrics are functioning properly. For example, we would expect a Mosaic set and a post-exilic set. Establishing their existence would further validate the metrics we have.
The third issue is that I would like a better way of constructing a vocabulary overlap percentage that is not biased by the size of either book. The solution may be to compute a genuine percentage but then scale each result by the average overlap achieved by each of the participating books[31].
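One possible reading of that proposal is sketched below in Python, under explicit assumptions that are not a method described in this paper. First, a book's "average overlap" is taken to be the mean of its overlap percentages with every other book. Second, each pair's score is divided by the mean of the two books' averages, so a value above 1.0 means the pair overlaps more than its members typically do. The function names and toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def overlap_percent(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two vocabularies, as a percentage."""
    combined = a | b
    return 100.0 * len(a & b) / len(combined) if combined else 0.0


def normalised_overlaps(vocabularies: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Scale each pair's overlap by the typical overlap of the two books involved.

    ASSUMPTION: a book's typical overlap is the mean of its overlaps with every
    other book, and a pair is scaled by the mean of its two books' typical
    overlaps. Requires at least two books.
    """
    raw = {
        (a, b): overlap_percent(vocabularies[a], vocabularies[b])
        for a, b in combinations(vocabularies, 2)
    }
    typical = {
        book: mean(score for pair, score in raw.items() if book in pair)
        for book in vocabularies
    }
    result = {}
    for (a, b), score in raw.items():
        baseline = (typical[a] + typical[b]) / 2
        result[(a, b)] = score / baseline if baseline else 0.0
    return result


# Toy vocabularies standing in for real books (hypothetical data).
toy = {
    "A": {"x", "y", "z"},
    "B": {"x", "y"},
    "C": {"z", "q"},
}
print({pair: round(v, 2) for pair, v in normalised_overlaps(toy).items()})
```

Other baselines (for example the geometric mean of the two averages, or dividing by each book's average separately) would be equally defensible. Which one best removes the size effect would need testing against the real data.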
The aim of this project was to produce a collection of metrics and raw data around those metrics that can act as an accurate baseline for any discussions regarding the literary cohesiveness within Isaiah. The first half has already been accomplished: the data now exists and I, at least, have been able to perform some analysis upon it. However to fully achieve the paper's intent this data now has to become available to others so that it can act as such a baseline.
One step towards such a goal is to get this paper completed and proofread to a quality where it is ready for circulation. The next step is to identify and use some channels to get the paper reviewed and, where necessary, clarified. I have some contacts, both within the theological community and within the data community, who may be able to help here. It may also make sense to conduct a web search to identify other individuals interested in this area.
A complementary or even alternative step may be to self-publish the data results on the web. All of the data presented in this paper has been generated from a relatively simple and quick Perl program. I already operate a website that generates web pages using Perl scripts. It would only take a day or two's effort to make the program available and usable from a web interface. This would allow others to investigate the data. Potentially this would have the benefit of separating the data from my own interpretation of it, which may gain it wider acceptance.
The bottom line of this conclusion is that there is no conclusion. Of the five different metrics established to measure the distance between two books, not one shows Isaiah and Deutero-Isaiah to be so close that they must obviously be treated as a single work. Clearly God has not allowed this war to be ended easily.
Equally, however, to claim that literary style or subject matter mandates the separation of these two books is entirely devoid of factual support. Every single metric places Isaiah and Deutero-Isaiah within the six closest books to each other. Furthermore, the metrics all link both Isaiahs to identical clusters of other books, usually including Psalms, Job and Jeremiah.
Accepting, therefore, that the data does not produce any easy answers, it is worth remembering some of the details that the data suggests:
We see, therefore, that in terms of culture and vocabulary Deutero-Isaiah was closely related to books written or extant in Jerusalem between 900BC and 600BC. It is almost entirely devoid of the telltale signs of someone who had been to Babylon. In terms of subject matter he closely parallels Jeremiah and Isaiah and looks somewhat towards the vision of Ezekiel. In terms of style he is very much a prophet, and his style closely follows that which would have been heard in Jerusalem between 800 and 600BC.
Based purely upon literary style, it would therefore appear that, whilst Isaiah and Deutero-Isaiah may have been different people, Deutero-Isaiah lived and worked in or around Jerusalem sometime between Isaiah and Jeremiah. Of course, a perfectly acceptable alternative conclusion may be that, over the decades that Isaiah lived and worked, he slowly evolved his vocabulary and subject matter.
Book Name | Book Length | Unique Words |
---|---|---|
Genesis | 15098 | 1779 |
Exodus | 12255 | 1390 |
Leviticus | 8326 | 931 |
Numbers | 12612 | 1413 |
Deuteronomy | 9822 | 1407 |
Joshua | 7285 | 1125 |
Judges | 7224 | 1159 |
Ruth | 903 | 273 |
1Samuel | 9605 | 1205 |
2Samuel | 8063 | 1263 |
1Kings | 9672 | 1248 |
2Kings | 9101 | 1206 |
1Chronicles | 8773 | 1922 |
2Chronicles | 10246 | 1353 |
Ezra | 3266 | 943 |
Nehemiah | 4236 | 1032 |
Esther | 2291 | 432 |
Job | 6300 | 1633 |
Psalms | 15937 | 2138 |
Proverbs | 5945 | 1291 |
Ecclesiastes | 2041 | 543 |
SongofSongs | 989 | 458 |
Isaiah | 7730 | 1924 |
Deutero-Isaiah | 5354 | 1316 |
Jeremiah | 15373 | 1870 |
Lamentations | 1237 | 550 |
Ezekiel | 13756 | 1666 |
Daniel | 5062 | 1068 |
Hosea | 1768 | 681 |
Joel | 753 | 357 |
Amos | 1549 | 600 |
Obadiah | 215 | 138 |
Jonah | 498 | 215 |
Micah | 1078 | 535 |
Nahum | 458 | 325 |
Habakkuk | 525 | 357 |
Zephaniah | 584 | 338 |
Haggai | 444 | 172 |
Zechariah | 2275 | 684 |
Malachi | 626 | 271 |