Introduction

The Second Isaiah

For much of the twentieth century it was considered an accepted fact amongst Biblical scholars that the prophecy of Isaiah was written in two or more pieces[1]. This was a process of thought that commenced in 1780 with questions regarding Isaiah 50 and which probably reached critical mass when the celebrated and conservative Franz Delitzsch conceded in 1880 that Isaiah 40 and onwards was probably written at the end of the exile[2]. Thus the notion of 'Deutero-Isaiah' was born.

The disintegration process, however, went much further. By the early nineteen hundreds it was already common to slice off chapters 56 to 66 into a third piece[3] believed to have been written around 450BC. Yet the knives were now unsheathed and Isaiah became fertile ground for anyone with an opinion to slice and dice as they saw fit. In 1910 Professor George Robinson chronicled various 'radicals' that had left only 262 of the 1292 verses of Isaiah to the original prophet.

The modern position has now shifted to a more redaction-based model. This asserts that Isaiah is really a patchwork of many different written and oral records all compiled and woven into one fabric. Depending upon how you look at it this position either asserts the unity of the book or claims that the book is such a complex collection of disparate pieces that one has to treat it as a unity[4].

Of course, running entirely counter to modern critical thought, many believing Christians have been content to take the Word of God at face value and have simply trusted that the book of Isaiah was written by Isaiah. To a large extent I personally view this as the most profitable path; whilst there may be much in the discussion that is academically amusing there is very little that would provide spiritual growth.

The Value of Truth

Nonetheless, as I have dealt with elsewhere at length[5], I do not think the contention over Isaiah is unexpected or insignificant. One of the core arguments against the unity of Isaiah is that it contains certain predictive prophecies that would be completely astounding if they were genuine. The most famous of these is the naming of Cyrus[6] in a book which claims to have been written at least a hundred years before he was born. This would be intriguing enough but the really incredible fact is that the predictions of Cyrus occur in the middle of eight different places in Isaiah[7] where God explicitly states that one of the key things that differentiates Himself from other 'gods' is that he knows the future and tells it to people.

Therefore we see that accepting the unity of Isaiah on faith 'despite all the evidence' may leave one in a theologically correct position; however it has robbed the believer of one of the more visible proofs God has given of His character and abilities. Therefore to leave the 'evidence' out there unchallenged and unquestioned is to concede an important piece of ground to the critics. It may even prove to be a piece of ground that stumbles some of our young[8].

The Scope of This Essay

Notwithstanding the above it is not my intent to launch a deep investigation into, or attack upon the surgeons of Isaiah. Instead I wish to delve deeper into the exact nature of one of the subject matters that often occurs in these debates: The Literary Style of Isaiah. More specifically I wish to look at what I shall call the linguistic artifacts that are present in the book. I will discuss this more mathematically under 'Linguistic Coincidence' but for this introduction a more layman's view of the concept may be beneficial.

The vocabulary used by a piece of literature will be driven by three primary factors:

  1. The subject matter. This is the most obvious; an article about motorcycle mechanics will refer to various engineering terms, an article about theology will refer to God and Christ.
  2. The culture of the author and intended audience. As an Englishman living in America this is quite noticeable to me. There are words that are common between our languages with identical dictionary definitions in both countries and yet the term may be in common parlance in one country and an esoteric archaism in the other. Word usage often changes over time; words come into fashion and drop from it. It is also quite possible that an author will adopt a style simply for the effect it has upon the target audience.
  3. Individual mannerisms and peculiarities that each individual possesses. As any good impersonator will tell you people have physical quirks and mannerisms; some subtle and some less subtle. What is true of the body is also true of the mind. Many of our favorite preachers have phrases or expressions that make us smile because they 'remind us of them'. These phrases and the frequency of usage are also indicative of the individual.

The argument therefore runs that if Isaiah was written by an individual about a single subject it should have a fairly consistent vocabulary throughout and which may differ from the vocabulary of other books. In fact given the critics assert that their original reason for splitting Isaiah was that it addresses different subjects; and they further assert that the books were written for different peoples at different times the only legitimate reason for any similarity between the books whatsoever is that they were written by the same individual.

It should perhaps also be noted that I italicized legitimate as the other possibility is a forgery or impersonation. This would be a strange claim to make as one of the features of Deutero Isaiah is supposed to be his advanced enlightenment compared to the previous author. Nonetheless for completeness it is a possibility we will consider in the following.

The Facts

Many conservatives have attempted to seize upon this logic to assert unity of authorship. Robinson asserts that the divine title 'the Mighty One of Israel' occurring three times in Isaiah and nowhere else is 'singular'. He is further impressed that 'streams of water' occurs twice in Isaiah yet nowhere else. Many others have also selected individual phrases and repetitions of occurrence and claimed that it showed something.

I would suggest that these fragments of evidence do suggest something but we have no real way of knowing what they suggest until we have a much better understanding of the vocabulary overlap of the Old Testament in general. How common was it for two unrelated books to happen to share a phrase that is otherwise unique?

My intent in this paper and the project it annotates is to construct a set of mathematical metrics to measure the closeness of two Biblical books. In other words: to produce a set of questions which, if objectively answered, give us factual evidence of the literary similarity of two books. Secondly I aim to execute and document these metrics against the Old Testament canon with the hope of producing results which can become an objective factual basis for further conversations of this form.

Methodology

Language

In my opinion one of the first mistakes that many people make when attempting a linguistic analysis of Isaiah, or any other Biblical book for that matter, is that they perform their analysis in an English text. This has to be a mistake for whilst the theological content is accurately rendered in the target language the linguistic artifacts of culture and personal preference will be those of the translator not those of the author in the original language. One only has to compare good sound English translations from different centuries to see the extent to which word choice can deviate wildly even if the semantic content is relatively static.

For this reason I wish to perform this analysis at the level of the Hebrew text. Further I suggest it is useless to simply look at Isaiah because we have no point of reference to define what is reasonable; therefore the whole Hebrew canon will be considered. The New Testament is left out of the analysis because it is written in Greek and the Greek language has different properties from the Hebrew[9].

The Strong's Number

Of course the downside to studying this in Hebrew is that I don't speak Hebrew and the programming languages that are out there do not easily lend themselves to processing Hebrew text. The problem moves from difficult to extreme when it is considered that the same word may appear in different lexical representations to denote shift of tense, plurality and gender. Fortunately these problems can be reduced or even removed using one simple device: the Strong's Number.

The Strong's number is effectively a numeric encoding of the Hebrew (or Greek) of the original language. It is usually used to allow easy reference from a translated English word to the original language. A number of translations are available with the Strong's numbers interspersed with the text. For our purposes it does not matter which translation we use as we can simply remove the English text leaving behind a sequence of numbers.

Whilst the sequence of numbers will then be entirely unreadable to a human, to a machine they will be entirely legible. Further we know that numeric equivalence at the Strong's level implies word equivalence in the original language. Thus we have turned a potentially complex parsing problem into one of simple numeric comparison.
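As a minimal sketch of that reduction (written in Python rather than the Perl used for this project), suppose each tagged word is followed by its Strong's number in angle brackets, for example <07225>. That tag format is an assumption made purely for illustration, as is the function name; real Strong's-tagged texts vary by edition and the pattern would need adjusting to suit.

```python
import re

# Hypothetical tag format: digits inside angle brackets, e.g. "<07225>".
# Real Strong's-tagged texts differ by edition; adjust the pattern to suit.
STRONGS_TAG = re.compile(r"<(\d+)>")

def strongs_numbers(tagged_verse):
    """Return the Strong's numbers of one verse as integers, in text order."""
    return [int(number) for number in STRONGS_TAG.findall(tagged_verse)]

verse = "In the beginning <07225> God <0430> created <01254>"
print(strongs_numbers(verse))  # [7225, 430, 1254]
```

Converting to integers means that differently padded tags for the same number compare as equal, which is all the later comparisons need.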

Reductio Ad Absurdum

Perhaps the strangest feature of the methodology, at least for someone as conservative as me, is that the process will start with the presumption that Isaiah is split into two pieces: Isaiah 1-39 and Isaiah 40-66. The reason is that we wish to examine those links that exist between the two halves and see if they are stronger or weaker than the links that exist between other books in the canon. In order to count or measure the links between two entities it is first necessary to have two entities; thus Isaiah has to be split.

For those of you more mathematically or logically inclined, the process I am really using is Reductio Ad Absurdum. By basing the process on an assumption (two Isaiahs) I aim to produce a set of statistics that may suggest that those two pieces are abnormally close. If I do so then that suggests a flaw in the process, which I would suggest is the opening assumption of there being two Isaiahs!

Whether you agree with this approach or not: in the following I shall refer to Deutero-Isaiah by which I mean Isaiah chapters 40-66. The term Isaiah shall refer to the first 39 chapters. Again for clarity - this is an assumption I am making for analytic purposes - it need not[10] reflect my opinion of reality.

Available Materials[11]

Texts that are Available Electronically

One of my principal motivations to consider this problem was the knowledge that it would be possible to produce a lot of raw investigative material using readily available resources. For this project all that is required is a list of Strong's numbers by verse. However for the purposes of this exercise I acquired the texts as follows:

Programming Language Selection for Text Processing

Whilst the language processing that this project attempts is theoretically advanced the programming concepts involved are not. Reading some twenty megabytes of text is something that a home PC can do in under a second using almost any language. The process is linear and can be accomplished in batch mode. Further the eventual amount of coding will be trivial by modern standards. Thus most of the advances made in programming language theory and practice in the last twenty years are not really going to be required to tackle this project.

As there was no compelling requirement for any given approach I instead deferred to a couple of pragmatic considerations. Firstly it may be useful to be able to showcase the results of this work; this is most readily done on the Internet. Secondly I already have a substantial amount of Perl code that processes Biblical texts that I use for my own website. The result of these two considerations is that I chose to use Perl for this exercise; almost any language could be used if the project were undertaken seriously.

Legal Considerations

There is no copyright on the KJV outside of the United Kingdom; I do not know of any restrictions placed upon the Strong's encoding of the KJV. It should be noted that the KJV text itself is only used for ease of reference so obtaining an entirely perfect version is not required for this project.

Formatting for Text Processing

Whatever the source of the textual version a necessary precursor to performing linguistic analysis is to transform the text into a 'processing friendly' format to allow downstream processes to occur independent of the format of the input data[12]. This process is usually referred to as ingestion and the code normally has to be written for every file that is going to be used. Ideally the process is run once at the start of the project and the output of the process is then used for the remainder of the project.

The format I have personally standardized upon has one verse per line and requires the lines to be ordered to follow the Biblical sequence. Each verse is preceded by an 11 character descriptor that defines the verse that follows. The format is BB:CCC:VVV where BB is a two character book number, CCC is three characters for the chapter number and the VVV is three characters for the verse number. Thus 01:001:001 corresponds to Genesis chapter 1 verse 1. In this particular case the input text actually lists the books in alphabetic order so the text had to be processed 17 times to emit the books in the sequence I wanted.

For this particular use I changed my normal format in two ways. Firstly I split off chapter 40 and onwards of Isaiah and moved it into a new book 67. Secondly, as I am only using the Strong's numbers, I removed all of the English text and Hebrew annotations to leave a simple stream of numbers for each verse.
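The format and the Isaiah split can be sketched as follows. This is an illustrative Python outline, not the Perl ingestion code itself. It assumes that a single space separates the BB:CCC:VVV reference from the numbers and that Isaiah is book 23 (its position when Genesis is book 01); the function names and the numbers in the sample line are placeholders.

```python
ISAIAH = 23          # assumed: Isaiah's position when Genesis is book 01
DEUTERO_ISAIAH = 67  # the new book number described in the text

def parse_line(line):
    """Split 'BB:CCC:VVV n1 n2 ...' into (book, chapter, verse, numbers)."""
    descriptor, _, rest = line.partition(" ")
    book, chapter, verse = (int(part) for part in descriptor.split(":"))
    numbers = [int(token) for token in rest.split()]
    return book, chapter, verse, numbers

def reassign_book(book, chapter):
    """Move Isaiah 40 onwards into the separate Deutero-Isaiah book."""
    if book == ISAIAH and chapter >= 40:
        return DEUTERO_ISAIAH
    return book

# Placeholder numbers, not a real verse encoding.
book, chapter, verse, numbers = parse_line("23:040:001 100 200 300")
print(reassign_book(book, chapter), chapter, verse, numbers)  # 67 40 1 [100, 200, 300]
```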

The Analytic Results

One point worth noting is that all of the observations that follow are the result of one 180 line Perl program written and tested over the course of one week. Computer science moves rapidly; sixteen years ago I spent six months compressing the Bible text down so that it could realistically be loaded onto the PC of the day. Analytic and linguistic research is now programmatically available to just about everyone.

The Size of the Books

The first part of the puzzle we need if we are to form a mathematical model for the relationships between two books is a measure of the size of each book. This is because, all other things being equal, the chances of something odd occurring in a book[13] should be proportional to the length of a book. The traditional way of measuring a book tends to be in terms of verses. This certainly is a good approximation to the length of the book but it is also somewhat arbitrary as the verses do not appear in the original Hebrew. Instead therefore I intend to measure the total number of words each book contains.

There is another measure which may well be interesting too and that is the vocabulary size[14] of each book. One might expect there to be a rough correlation that longer books will have larger vocabularies. The size of vocabulary however can also be a strong indicator of literary style[15]. The aim would therefore be to plot a graph of book length to vocabulary size and see where the various books fall.
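Both measures fall out of a single pass over the verse data. The sketch below is in Python and assumes the verses have already been reduced to (book, list of Strong's numbers) pairs; that input shape and the function name are illustrative choices rather than a description of the actual program.

```python
from collections import defaultdict

def book_sizes(verses):
    """Map each book to (total word count, number of distinct words).

    verses: iterable of (book, [strongs numbers]) pairs, one per verse.
    """
    length = defaultdict(int)
    vocabulary = defaultdict(set)
    for book, numbers in verses:
        length[book] += len(numbers)
        vocabulary[book].update(numbers)
    return {book: (length[book], len(vocabulary[book])) for book in length}

# Tiny made-up sample: book 1 has five words, three of them distinct.
sample = [(1, [10, 20, 10]), (1, [30, 20]), (2, [10, 40])]
print(book_sizes(sample))  # {1: (5, 3), 2: (2, 2)}
```

Plotting the first number against the second for every book gives the kind of length against vocabulary graph discussed here.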

The graph above shows all of the Old Testament books plotted by Book Length and Unique Word count. For the shorter books the number of unique words is about a third of the number of words. Then as the books get longer the increase in vocabulary size decreases; Ezekiel, Genesis, Jeremiah and Psalms being the four longest books. However in addition to the trend it is useful to note the outliers. Points moving towards the bottom right of the graph have abnormally few unique words for their length. Thus Leviticus stands out as having a low vocabulary. Heading towards the top left end we find Isaiah is exceptional for having a very large vocabulary for its (remaining) length. 1 Chronicles and Job come in second and third, then for its length Deutero-Isaiah is a stand-out amongst the mid-length books for vocabulary size.

Two Book Unique Words

One of the crudest measures of distance between two books is the measure of the number of words that only exist within two given books. This will identify common subject matter and it may point to some idiosyncrasy of the author. Of course the chances of two books sharing an otherwise unique word increases with the length of the book. Thus for every pair of books, I counted the number of words that they and they alone use. I also scaled that number based upon the size of the book by taking the number of shares, multiplying it by the square of the average book length and dividing it by the two book lengths. Thus if the average book length was 1000 words and I had 15 co-occurrences between two books of length 1200 and 800 the scaled result would be 15 * 1000 * 1000 / 1200 / 800.
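The count and the scaling can be sketched in a few lines. This is a Python illustration under stated assumptions, not the original Perl: each book's vocabulary is taken to be already available as a set, and the function names are invented for the example.

```python
from collections import Counter, defaultdict

def two_book_unique(vocab):
    """Count, for each pair of books, the words used by those two books alone.

    vocab: dict mapping book -> set of words (Strong's numbers).
    """
    books_using = defaultdict(set)
    for book, words in vocab.items():
        for word in words:
            books_using[word].add(book)
    counts = Counter()
    for books in books_using.values():
        if len(books) == 2:  # exactly two books share this word
            counts[tuple(sorted(books))] += 1
    return counts

def scaled(shared, length_a, length_b, mean_length):
    """Scale a raw count by the square of the mean book length over both lengths."""
    return shared * mean_length * mean_length / length_a / length_b

# The worked example from the text: 15 shares, books of 1200 and 800 words.
print(scaled(15, 1200, 800, 1000))  # 15.625
```

Two books of exactly average length keep their raw count unchanged, a pair of long books is scaled down and a pair of short books is scaled up.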

I computed these numbers for every pair of books in the Old Testament; however I will present only three tables. The first gives the top ten pair matches across all books, which will allow us to verify that the measure is somehow meaningful. The second and third give the top six pair matches for Isaiah and for Deutero-Isaiah respectively. These will obviously allow us to see how close the books are to each other but also whether they are both close to the same kinds of other books.

Book | Book | Co-occurrences | Scaled co-occurrences | Comments
Ezra | Daniel | 151 | 300 | This is the most significant tie-up both actually and scaled. I was rather surprised when I first saw the linkage. One doesn't think of them as similar books. However they were both written by men that had spent substantial time in a Persian court. This number would suggest that the Hebrews of the exile developed a new section of vocabulary not shared with the earlier prophets.
Genesis | 1Chronicles | 105 | 26 | Of course a sizeable genealogy or two can easily cause high correlation.
Ezra | Nehemiah | 75 | 178 | This is the second highest scaled result and third highest result. Ezra and Nehemiah are of course known to be closely related books. An expected and encouraging result.
2Samuel | 1Chronicles | 37 | 17 | Another confirmatory result. Both books are known to focus upon King David and therefore share some common core vocabulary.
1Chronicles | Nehemiah | 21 | 18 | This one also caused me to frown until I checked in my study Bible[16]. 1 Chronicles is believed to have been completed around 425BC and Jewish tradition assigns it to Ezra. Whilst not the same book this statistic suggests the same time and setting.
2Kings | Isaiah | 19 | 8 | Having just studied Isaiah I was ready for this one. The account in 2Ki of Hezekiah and that in Isaiah are clearly very close; this is reflected in an amount of common core vocabulary.
1Chronicles | 2Chronicles | 19 | 6 | The correlation between these two books in time and content is well known; it should be no surprise that they are also linked linguistically.
Leviticus | Deuteronomy | 18 | 7 | Again a correlation between two books known to have been written at a similar time by the same person.
Job | Psalms | 17 | 5 | This one is interesting and may suggest that some of the Psalms came from the region of Job. It could also suggest some common wisdom vocabulary. However the low scaled result should be noted. It could just suggest that Psalms is a big book.
Joshua | 1Chronicles | 16 | 8 | This correlation is probably historic. 1 Chronicles briefly recounts the history of the time up to David and Joshua is the only other book covering the history of the invasion of Canaan.

In many ways this table contains no news; or at least very little that was not already available. However this is good news. It suggests that the measurement of unique words between two books does tend to correlate with known links between the two books. This adds some validity to the measure. We have also seen linkages due to subject matter[17], culture[18] and probably author[19].

Perhaps the one slightly disappointing thing for the conservatives is that Isaiah and Deutero-Isaiah do not make it into the top 10 linked books. In fact the two books would appear at number 35 on this list. Looking at the books that are linked to Isaiah suggests why:

Book | Book | Co-occurrences | Scaled co-occurrences
2Kings | Isaiah | 19 | 8
Isaiah | Jeremiah | 16 | 4
Psalms | Isaiah | 13 | 3
Job | Isaiah | 12 | 8
Proverbs | Isaiah | 10 | 7
Isaiah | Deutero-Isaiah | 8 | 6

We have already noted the 2Kings passage that corresponds to Isaiah. Then we find Jeremiah who predicted and lived through the fall of Jerusalem. This is obviously the same fall that Isaiah predicted. We then find Isaiah drawing his vocabulary from the, admittedly large, body of wisdom literature most of which had been written in Jerusalem some hundred and fifty years before. Bringing up sixth place, although fourth in terms of significance is then Deutero-Isaiah.

Looking at the table for Deutero-Isaiah is equally instructive:

Book | Book | Co-occurrences | Scaled co-occurrences
Psalms | Deutero-Isaiah | 13 | 5
Isaiah | Deutero-Isaiah | 8 | 6
Job | Deutero-Isaiah | 7 | 6
Proverbs | Deutero-Isaiah | 6 | 6
Exodus | Deutero-Isaiah | 6 | 3
Deutero-Isaiah | Jeremiah | 6 | 2

Of the top four co-occurrences for Deutero-Isaiah we find three of the same wisdom books that featured for Isaiah. Also in second place we find a link to Isaiah. In sixth place we find a link to Jeremiah (which was second placed for Isaiah). The one new book we find is a throwback to the book of Exodus; Exodus had occupied the eighth spot for Isaiah.

What we therefore see is that Isaiah and Deutero-Isaiah share words with each other but also with the wisdom books and Jeremiah. As we are looking for words unique between two books the fact that we appear to have a cluster of books using similar language will actually reduce the chances of any two of them having a unique pairing.

Additionally the very power of looking for unique words in pairs of books is also its greatest weakness. These words are by definition oddities: they occur in low numbers. Therefore there is a danger that the noise of randomness[20] will actually distort some of the truth in the underlying data. Fortunately both of these problems can be somewhat ameliorated by altering our concept of a word.

Two Book Unique Word Pairs

Counting the number of words that only occur within two books does make sense but it assumes that words appear independently within text. This is obviously not true; any word in a given sentence often has an explicit grammatical or semantic link to the word next to it. For example nouns are often preceded by an adjective. Verbs are often preceded by an adverb. The rules for Hebrew and English are different and frankly I do not know them well enough to produce all of the meaningful word pairs from a sentence. Given we are looking for oddities however, we can simply produce a list of all of the word pairs and those that are pure chance have an extremely low chance of being found in another book.

For clarity I will give a small example of what I am doing:

The large dog bit the small cat

Will produce a sequence of word-pairs thus:

The large, large dog, dog bit, bit the, the small, small cat

A grammarian would tell you to drop the pairs using the article (the). However my assumption is that articles are sufficiently common that the act of looking for uniqueness will implicitly drop them out unless they are used in an odd context in which case they are interesting anyway!
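The pairing step itself is tiny. A minimal Python sketch follows, using the English example above in place of Strong's numbers; the function name is illustrative and the words are lower-cased purely for readability.

```python
def word_pairs(words):
    """Return every pair of adjacent words, in order."""
    return list(zip(words, words[1:]))

sentence = "The large dog bit the small cat".lower().split()
for first, second in word_pairs(sentence):
    print(first, second)
# the large
# large dog
# dog bit
# bit the
# the small
# small cat
```

No pairs are filtered out at this stage; as argued above, the later search for uniqueness is left to discard the common ones.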

There are far more individual word pairs than individual words and thus there are far more instances where a word-pair is only extant in two books. In fact the numbers go from about fourteen hundred instances to just over eleven thousand. This will help to even out any random noise. We shall now proceed to look at the three Top tables again.

Book | Book | Occurrences | Scaled occurrences | Comments
1Kings | 2Chronicles | 611 | 202 | Parallel narrative of same period
2Samuel | 1Chronicles | 329 | 152 | Parallel narrative of same period
2Kings | Isaiah | 317 | 148 | Shares narrative of Hezekiah
2Kings | 2Chronicles | 288 | 101 | Parallel narrative of similar period
Genesis | 1Chronicles | 252 | 62 | Sharing some major genealogies
Ezra | Nehemiah | 209 | 496 | Accounts written at similar time about similar subject possibly with new vocabulary.
Exodus | Numbers | 199 | 42 | Continuing narrative of same period
Leviticus | Numbers | 176 | 55 | Parallel accounts of same period
2Kings | Jeremiah | 174 | 40 | Cover same period
Exodus | Leviticus | 147 | 47 | Parallel accounts of same period
1Samuel | 2Samuel | 140 | 59 | Subsequent accounts of similar events

To me this Top Ten table is a little breathtaking. Leaving aside the mathematics for a moment, take a look at those book pairs and ask yourselves how many times you have been searching for a fact or verse and not been sure which of a given pair of books it was in. My guess is that many of those not-quite-sure moments would involve one of the book pairs above.

The other most noteworthy change from the first table is that the Ezra - Daniel link has now dropped[21]. This suggests that the uniqueness of the Persian derived vocabulary is now diminishing in significance compared to similarity of subject matter. The table above shows that all of the pairs are now narrating identical or immediately subsequent events. This pattern is followed if we look at the table for Isaiah:

Book | Book | Occurrences | Scaled
2Kings | Isaiah | 317 | 148
Isaiah | Jeremiah | 80 | 22
Psalms | Isaiah | 79 | 21
Isaiah | Deutero-Isaiah | 44 | 34
Isaiah | Ezekiel | 43 | 13
Job | Isaiah | 34 | 22

We find that Job and Proverbs both drop down a couple of places[22]; Psalms retains its place although scaled it drops down to fifth. 2Kings and Jeremiah remain in place and Deutero-Isaiah moves up as does Ezekiel (which also narrates the fall of Jerusalem). It should be noted that in terms of book-size the Isaiah to Deutero-Isaiah link is now second only to 2 Kings and Isaiah.

The table for Deutero-Isaiah follows the pattern although with one interesting surprise:

Book | Book | Occurrences | Scaled
Psalms | Deutero-Isaiah | 130 | 50
Deutero-Isaiah | Jeremiah | 76 | 30
Isaiah | Deutero-Isaiah | 44 | 34
Deutero-Isaiah | Ezekiel | 37 | 16
Genesis | Deutero-Isaiah | 34 | 13
Job | Deutero-Isaiah | 32 | 31

Firstly we should note that five of the six links are identical to Isaiah showing that they are part of the same language clustering. In terms of significance the Deutero-Isaiah to Isaiah link is second place as in the Isaiah table. The tie to Psalms has actually strengthened whilst the links to Job and Proverbs have weakened. This could denote a move towards more poetic or even florid language.

The new book is Genesis[23]. This could just be noise; the scaled value is low as Genesis is a large book. However it may also be suggestive. An argument for Deutero-Isaiah is that it has a global view of God unseen earlier in Hebrew thought (or so it is claimed). My counter argument is that globalism is the precise view of God portrayed in the Bible up until the call of Abraham. It may well be that the latter parts of Isaiah are not introducing a new concept (and language) but simply moving back to the concepts laid out very early in scripture.

Two Book Unique Three Words Sequences

It would be nice if one could run a similar algorithm to detect a correlation between phrases or idioms. However, the question as to when a sequence of words turns into a known phrase is an area of ongoing research within the data sciences. One of the latest concepts is confabulation theory[24] which is largely the work of Hecht-Nielsen[25]. This is a mathematical model that uses conditional probability in an attempt to detect a sequence of words that is being used sufficiently often that it forms a phrase. Unfortunately it requires a corpus of billions of words to train the model well enough to make it predictive[26]!

Fortunately for us we are not trying to find phrases everyone knows; rather those known to a relatively small number of people. Therefore I will simply produce lists of all of the three word sequences and see which ones fall into two books. Then as before I will simply assert that the fact that they are used in two places suggests that they form a meaningful unit[27].
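Generating the three word sequences is the same sliding window as before with a wider frame. The Python sketch below is illustrative only: the function name is invented, and whether a sequence may run across a verse boundary is not settled here, so the function simply works on whatever list it is handed (one verse, or a whole book joined together).

```python
def sequences(words, n=3):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# Placeholder numbers standing in for the Strong's numbers of one verse.
print(sequences([11, 22, 33, 44]))  # [(11, 22, 33), (22, 33, 44)]
```

Because each sequence is a tuple it can be counted across books in exactly the same way as a single word.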

There are far fewer hits than for two word pairs; this is not surprising. It would be quite a coincidence for someone to string three words together by chance and get the same as another person. However one might hope that this will not introduce as much noise as the individual word comparisons did. This is because three words in a sequence have to obey rules of grammar and semantics and they will have a well formed meaning; thus they will not occur randomly almost by definition.

This table is sufficiently similar to the one for two word phrases that it is worth noting the movers to get an indication as to what is occurring:

Book | Book | Occurrences | Scaled | Move
1Kings | 2Chronicles | 1022 | 338 | -
2Kings | Isaiah | 523 | 244 | +1
2Samuel | 1Chronicles | 465 | 215 | -1
2Kings | 2Chronicles | 460 | 162 | -
Ezra | Nehemiah | 235 | 558 | +1
2Kings | Jeremiah | 229 | 53 | +3
Genesis | 1Chronicles | 203 | 50 | -2
Leviticus | Numbers | 179 | 56 | -1
Exodus | Numbers | 153 | 32 | -2
Exodus | Leviticus | 127 | 40 | -

We see a slight rise of those documents describing the same thing and a marginal drop in the time-based links[28]. It is also interesting to note, for reasons we will see in a moment, that with the exception of the historic links between Isaiah, Jeremiah and 2 Kings, all of the books appearing in this Top Ten list are now historical. Perhaps this is to be expected; historians strive for accuracy and use a relatively sedate style. Therefore we would expect them to write similarly about the same events.

It is the Isaiah table that shows that a shift has occurred:

Book | Book | Occurrences | Scaled
2Kings | Isaiah | 523 | 244
Isaiah | Micah | 27 | 106
Isaiah | Jeremiah | 27 | 7
Isaiah | Zechariah | 13 | 24
Isaiah | Ezekiel | 11 | 3
Isaiah | Deutero-Isaiah | 9 | 7

The 2Kings link stands out as being exceptional by any account. We essentially have two accounts here that appear to be copies or near copies of each other. However with that exception, Isaiah has now dropped the wisdom books in favor of the prophets. Leaping into second spot in terms of occurrences and significance is Micah; a prophet in the same place and roughly same timeframe as Isaiah. Jeremiah and Ezekiel remain in high slots with Deutero-Isaiah in sixth. Zechariah has also raced up the table to gain fourth position.

I suspect that something quite important has occurred here. The prophets often spoke their message and they wanted to influence people. One well known way to do that is through repeated and recurrent phrases. I suspect that Isaiah and Micah quite consciously used similar phrases to transmit known ideas or concepts to their audience. If I am correct then Zechariah may also have adopted an Isaiah style with some degree of deliberation.

The Deutero-Isaiah table shows similar transformation:

Book | Book | Occurrences | Scaled
Deutero-Isaiah | Jeremiah | 30 | 11
Psalms | Deutero-Isaiah | 27 | 10
Isaiah | Deutero-Isaiah | 9 | 7
Deutero-Isaiah | Ezekiel | 9 | 4
Deutero-Isaiah | Zechariah | 6 | 16
Deuteronomy | Deutero-Isaiah | 6 | 3

This time four of the entries in the table mirror Isaiah. Even the surprising appearance of Zechariah has been maintained. The omissions are 2Kings (expected) and Micah. Deutero-Isaiah has maintained a link to Psalms although it is reduced and a minor link to Deuteronomy appears. The strongest link is now to Jeremiah.

Whilst it would require much more detail to know exactly what we are being shown here it appears that Isaiah was firmly in the style of the immediate prophets of his time and that the same style was then picked up and used by those that came after. Deutero-Isaiah blended similarity with the major prophets with the more poetic or reflective style of Psalms. Both seem to have been picked up by the future looking Zechariah.

Inter-book Vocabulary Coincidence

Having looked in detail at the peculiar coincidences between book pairs it is worth briefly investigating the general vocabulary overlap between each book pair. That is to find the percentage of the words found in one book that are also found within the other.

Whilst the number of words common between two books is easy to compute it is a little difficult to come up with a measure of the significance of the overlap that adequately deals with the vast size differences between some of the books. If you select the percentage of words in the smaller book that appear in the larger then you rapidly see a list of tiny books that derive 80+% of their vocabulary from Psalms. If you select the total number of words overlapping then the smaller books have no chance at all.

As Isaiah and Deutero-Isaiah are my main books of interest, I picked a measure that has greatest meaning for mid-sized to large books. Namely, I compute the percentage of the combined vocabulary of the two books that is common to both. Thus if you compare two books each with a thousand unique words and the combination has fifteen hundred unique words, then the number of overlapping words is five hundred and the overlap percentage is 33.3%.
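The measure just described is what is usually called the Jaccard index: the size of the intersection of two word sets divided by the size of their union. The following is a minimal Python sketch of it, not the program used to generate the figures in this paper. The function names and the toy vocabularies are illustrative assumptions. It also includes the rejected "percentage of the smaller book" measure to show why tiny books dominate under that scheme.

```python
def overlap_percentage(vocab_a, vocab_b):
    """Percentage of the combined vocabulary of two books that is common to both."""
    a, b = set(vocab_a), set(vocab_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


def smaller_book_share(vocab_a, vocab_b):
    """Percentage of the smaller book's vocabulary that also appears in the other book."""
    a, b = set(vocab_a), set(vocab_b)
    smaller = min(a, b, key=len)
    if not smaller:
        return 0.0
    return 100.0 * len(a & b) / len(smaller)


# Hypothetical data: two books of 1000 unique words each, sharing 500 of them.
book_a = {f"w{i}" for i in range(0, 1000)}
book_b = {f"w{i}" for i in range(500, 1500)}
print(round(overlap_percentage(book_a, book_b), 1))  # 33.3

# Hypothetical data: a tiny book whose 10 words all occur in a 1000-word book.
tiny = {f"w{i}" for i in range(10)}
big = {f"w{i}" for i in range(1000)}
print(smaller_book_share(tiny, big))   # 100.0
print(overlap_percentage(tiny, big))   # 1.0
```

The last two lines show the size problem in miniature: the tiny book scores 100% on the smaller-book measure but only 1% on the combined-vocabulary measure.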

Using this measure I repeated the previous process of selecting the highest overlaps overall as a control set. Then I selected the highest seven for each of Isaiah and Deutero-Isaiah.
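The selection process can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the original program: the helper names and the three toy books are hypothetical. With the forty books listed in Appendix 1 the same pairing step yields 40 x 39 / 2 = 780 book pairs.

```python
from itertools import combinations


def overlap_percentage(vocab_a, vocab_b):
    """Shared words as a percentage of the combined vocabulary of both books."""
    union = vocab_a | vocab_b
    return 100.0 * len(vocab_a & vocab_b) / len(union) if union else 0.0


def rank_pairs(vocabularies):
    """Score every unordered pair of books and return the list, highest overlap first."""
    scored = [
        (overlap_percentage(vocabularies[a], vocabularies[b]), a, b)
        for a, b in combinations(sorted(vocabularies), 2)
    ]
    return sorted(scored, reverse=True)


def top_for_book(ranked, book, n=7):
    """Return the n highest-ranked pairs that involve the given book."""
    return [row for row in ranked if book in row[1:]][:n]


# Hypothetical toy vocabularies standing in for the real word lists.
toy = {
    "BookA": {"lord", "king", "house", "word"},
    "BookB": {"lord", "king", "house", "vine"},
    "BookC": {"lord", "ship", "fish"},
}

ranked = rank_pairs(toy)
control_set = ranked[:2]                     # global "top N" control set
book_c_top = top_for_book(ranked, "BookC")   # strongest partners of one book

# Number of unordered pairs among forty books.
print(len(list(combinations(range(40), 2))))  # 780
```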

Book     Book            Overlap  Comments
Job      Psalms          40.1     Two books of wisdom literature that run the gamut of human emotion.
Psalms   Isaiah          38.3     As we saw previously, Isaiah draws heavily upon the language of the Psalms. Of course many of them were written in the same location and within 150 years of Isaiah.
Isaiah   Jeremiah        37.8     Again we see the extent to which Jeremiah lived through what Isaiah saw. Again the same locality and within a couple of hundred years.
2Kings   2Chronicles     37.6     Two different views of the same subject.
Psalms   Jeremiah        37.5     Clearly Isaiah, Jeremiah and the Psalmist were working from the same lexicon.
Psalms   Deutero-Isaiah  37.1     Now we find, as the sixth most significant overlap of all the 780 possibilities, that Deutero-Isaiah also drew from the Psalmist lexicon[29].
1Samuel  2Samuel         37       Consecutive accounts of a similar nature.
Psalms   Proverbs        36.8     Psalms and Proverbs draw from the same lexicon; again a similar place and time.
Exodus   Numbers         35.8     Exodus and Numbers come from the same author at the same time.
Exodus   Deuteronomy     35.7     Exodus and Deuteronomy again share an author and time.
1Kings   2Kings          35.1     Consecutive accounts of a similar nature.

It is immediately apparent that a couple of clusters are forming. One is centered upon the Psalms and includes Isaiah, Jeremiah, Deutero-Isaiah and Job[30]. The other is centered upon Exodus and links the Mosaic books. Looking at the top matches for Isaiah further promotes this picture:

Book         Book            %age Overlap
Psalms       Isaiah          38.3
Isaiah       Jeremiah        37.8
Job          Isaiah          33.6
Isaiah       Deutero-Isaiah  33.6
Isaiah       Ezekiel         31.4
Deuteronomy  Isaiah          31.3
Proverbs     Isaiah          30.9

The first two we have already seen in the global top ten. The next two further reinforce the links within the Isaiah, Jeremiah, Deutero-Isaiah, Psalms and Job group. We see a linkage to Ezekiel, who received a similar message to Isaiah regarding the fall of Jerusalem and future restoration. In sixth slot we notice a link to Deuteronomy, which we have seen before, and finally a link to another wisdom book: Proverbs.

The table for Deutero-Isaiah is almost identical. I have noted in the fourth column the placing difference between the given book pair and the equivalent in the Isaiah list.

Book         Book            %age Overlap  Placing delta compared to Isaiah list
Psalms       Deutero-Isaiah  37.1          -
Jeremiah     Deutero-Isaiah  34.3          -
Job          Deutero-Isaiah  33.8          -
Isaiah       Deutero-Isaiah  33.6          -
Proverbs     Deutero-Isaiah  31.4          +2
Deuteronomy  Deutero-Isaiah  31.1          -
Ezekiel      Deutero-Isaiah  30.3          -2

It is immediately apparent that Deutero-Isaiah and Isaiah are related to exactly the same books as each other and to almost exactly the same extent. The only difference is that Deutero-Isaiah appears to swap some of Ezekiel's lexicon for some of Solomon's. Note however that the spread across places five to seven is only half a percentage point in the Isaiah list (31.4 to 30.9) and little over one percentage point in the Deutero-Isaiah list (31.4 to 30.3). Thus for all practical purposes these lists are identical.

Further Avenues to Explore

Perhaps I am too easily intrigued but I generally find that any good piece of research throws up more questions than answers. Certainly my mind is awash with ideas for how the previous work can be extended. However my aim in this section is not so much to reach for the stars as to admit some areas where the numbers given could use further work to ensure legitimacy.

Firstly, many of the metrics given rely upon occurrences that are unique to a pair of books. Whilst this does find certain oddities, it favors sources that have two accounts of something and is heavily biased against triples of books that relate closely to the same subject. There are two things we can do to further investigate that problem:

  1. We can construct a measure and a graph that shows the number of unique pairings (by word-length) that a given book has. Thus a relatively isolated book should be free to enter into unique pairings widely whereas a book which is part of a tight cluster would have few unique pairings. We can then alter the scaling to adjust for those books that are shadowed by another.
  2. We can begin to look at triples and quads of books that share unique words and count those occurrences towards each of the individual pairings. Thus if three or four books are heavily clustered the relationships between each book individually will be visible.
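Both proposals above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it works on single words rather than word pairs or triples, the function names are hypothetical, and the four toy books are invented for the example. The first function counts how many strictly pair-unique words each book takes part in; the second credits every pair inside any group of two to four books that share a word.

```python
from collections import Counter, defaultdict
from itertools import combinations


def books_per_word(vocabularies):
    """Map each word to the sorted tuple of books whose vocabulary contains it."""
    index = defaultdict(set)
    for book, words in vocabularies.items():
        for word in words:
            index[word].add(book)
    return {word: tuple(sorted(books)) for word, books in index.items()}


def unique_pairings_per_book(vocabularies):
    """Count, for each book, the words it shares with exactly one other book."""
    counts = Counter()
    for books in books_per_word(vocabularies).values():
        if len(books) == 2:
            counts.update(books)
    return counts


def pair_credit(vocabularies, max_group=4):
    """Credit each pair within every group of 2..max_group books sharing a word."""
    credit = Counter()
    for books in books_per_word(vocabularies).values():
        if 2 <= len(books) <= max_group:
            credit.update(combinations(books, 2))
    return credit


# Hypothetical toy vocabularies; "alpha" is shared by three books.
toy = {
    "BookA": {"alpha", "beta", "gamma", "delta"},
    "BookB": {"alpha", "beta", "epsilon"},
    "BookC": {"alpha", "gamma", "zeta"},
    "BookD": {"delta", "eta"},
}

print(unique_pairings_per_book(toy)["BookA"])   # 3
print(pair_credit(toy)[("BookB", "BookC")])     # 1
```

In the toy data BookB and BookC share no strictly pair-unique word, so the pair-only metric would score them at zero. Once the triple on "alpha" is credited, their relationship becomes visible.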

Secondly we have shown the validity of each metric from the Top Ten list. We have also identified a tight cluster around our two books of interest. What we have not done is significantly investigate some of the other clusters we would expect if these metrics are functioning properly. For example we would expect a Mosaic set and a post-exilic set. Establishing their existence would further validate the metrics we have.

The third issue, for which I would like a better solution, is the construction of a good vocabulary overlap percentage that is not biased by the size of either book. The solution may be to construct a genuine percentage but to scale each result based upon the average overlap achieved by each of the books participating[31].
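One possible reading of that proposal is sketched below in Python. It is an assumption on my part rather than a worked-out method: the combined-vocabulary overlap is used as the base percentage, the two books' average overlaps are combined with a plain arithmetic mean, and all names and toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def overlap_percentage(vocab_a, vocab_b):
    """Shared words as a percentage of the combined vocabulary of both books."""
    union = vocab_a | vocab_b
    return 100.0 * len(vocab_a & vocab_b) / len(union) if union else 0.0


def scaled_overlaps(vocabularies):
    """Divide each pair's overlap by the mean of the two books' average overlaps."""
    books = sorted(vocabularies)
    raw = {
        (a, b): overlap_percentage(vocabularies[a], vocabularies[b])
        for a, b in combinations(books, 2)
    }
    # Average overlap that each book achieves against every other book.
    average = {
        book: mean(score for pair, score in raw.items() if book in pair)
        for book in books
    }
    scaled = {}
    for (a, b), score in raw.items():
        baseline = (average[a] + average[b]) / 2  # assumed way of combining the two
        scaled[(a, b)] = score / baseline if baseline else 0.0
    return scaled


# Hypothetical toy vocabularies standing in for the real word lists.
toy = {
    "BookA": {"lord", "king", "house", "word"},
    "BookB": {"lord", "king", "house", "vine"},
    "BookC": {"lord", "ship", "fish"},
}

scaled = scaled_overlaps(toy)
```

Under this reading a scaled value above 1 means the pair overlaps more than those two books typically overlap with anything, and a value below 1 means less. Whether this removes the size bias in practice would need testing against the real data.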

Next Steps

The aim of this project was to produce a collection of metrics and raw data around those metrics that can act as an accurate baseline for any discussions regarding the literary cohesiveness within Isaiah. The first half has already been accomplished: the data now exists and I, at least, have been able to perform some analysis upon it. However to fully achieve the paper's intent this data now has to become available to others so that it can act as such a baseline.

One step in achieving such a goal is to get this paper completed and proofed to a quality where it becomes ready for circulation. The next step is to identify and utilize some channels to get the paper reviewed and, where necessary, clarified. I have some contacts that may be able to help here within both the theological and the data communities. It may also make sense to conduct a web search to identify other individuals interested in this area.

A complementary or even alternative step may be to self-publish the data results on the web. All of the data presented in this paper has been generated from a relatively simple and quick Perl program. I already operate a web-site that generates web-pages using Perl as a script. It would only take a day or two's effort to make the program available and usable from a web interface. This would allow others to investigate the data. Potentially this would have the benefit of separating the data from my own interpretation of it, which may gain it wider acceptance.

Conclusion

The bottom line of this conclusion is that there is no conclusion. Of the five different metrics established to measure the distance between two books, not one insists that combining Isaiah and Deutero-Isaiah should be the top priority task. Clearly God has not allowed this war to be ended easily.

Equally, however, the claim that literary style or subject matter mandates the separation of these two books is entirely devoid of factual support. Every single metric places Isaiah and Deutero-Isaiah within the closest six books of each other. Further, they all link both Isaiahs to identical clusters of other books, usually including Psalms, Job and Jeremiah.

Accepting therefore that the data does not produce any easy answers it is worth remembering some of the details that the data suggests:

  1. The book length to vocabulary size metric places Isaiah, Deutero-Isaiah, Job, Psalms and 1 Chronicles together as having abnormally large vocabularies for their size. 1 Chronicles is probably explained by genealogy leaving the other four books as the lexical heavyweights.
  2. Isaiah and 2 Kings have a very tight coupling due to what appears to be almost duplication of accounts around the time of Hezekiah. This linkage will be ignored in the remainder of this conclusion.
  3. The measure of unique words between book pairs was shown to be highly sensitive to words constructed in a particular culture and time. Using this measure Deutero-Isaiah clusters with the wisdom books Psalms, Job and Proverbs. It links tightly to Isaiah and somewhat less tightly to Jeremiah.
  4. The unique word pairings metric moves somewhat away from time and culture and shifts more towards subject matter. This loosens the links between Isaiah, Deutero-Isaiah and the wisdom books and tightens the links of both to Jeremiah. Ezekiel also comes into the frame at this point.
  5. The unique word triples metric moves further away from time and culture in favor of subject matter. However it also appears to select more towards style. The historic books link together, the law books link together, the wisdom books link together and so do the prophetic books. Both Isaiahs keep their links to Jeremiah, Ezekiel and each other. They also both gain a strong link to Zechariah. They have one notable difference: Isaiah links to Micah whereas Deutero-Isaiah retains a link to Psalms.
  6. Vocabulary overlap percentage ignores the tell-tale signs in favor of overall impression. The correlation between the overlap percentages for Isaiah and Deutero-Isaiah is startling: they produce almost identical lists. The books they show to be joined together are Psalms, Job, Jeremiah, Isaiah and Deutero-Isaiah. Further, these links form most of the seventeen strongest links in the Old Testament.

We see therefore that in terms of culture and vocabulary Deutero-Isaiah was closely related to books written or extant between 900BC and 600BC in Jerusalem. It is almost entirely devoid of any tell-tales of someone who had been to Babylon. In terms of subject matter he closely parallels Jeremiah and Isaiah and is looking somewhat towards the vision of Ezekiel. In terms of style he is very much a prophet, and his style closely followed that which would have been heard between 800BC and 600BC in Jerusalem.

Based purely upon literary style it would therefore appear that whilst Isaiah and Deutero-Isaiah may have been different people Deutero-Isaiah appears to have lived and worked in or around Jerusalem sometime between Isaiah and Jeremiah. Of course a perfectly acceptable alternate conclusion may be that over the decades that Isaiah lived and worked he slowly evolved his vocabulary and subject matter.

Appendix 1 - Book Length Results

Book Name Book Length Unique Words
Genesis 15098 1779
Exodus 12255 1390
Leviticus 8326 931
Numbers 12612 1413
Deuteronomy 9822 1407
Joshua 7285 1125
Judges 7224 1159
Ruth 903 273
1Samuel 9605 1205
2Samuel 8063 1263
1Kings 9672 1248
2Kings 9101 1206
1Chronicles 8773 1922
2Chronicles 10246 1353
Ezra 3266 943
Nehemiah 4236 1032
Esther 2291 432
Job 6300 1633
Psalms 15937 2138
Proverbs 5945 1291
Ecclesiastes 2041 543
SongofSongs 989 458
Isaiah 7730 1924
Deutero-Isaiah 5354 1316
Jeremiah 15373 1870
Lamentations 1237 550
Ezekiel 13756 1666
Daniel 5062 1068
Hosea 1768 681
Joel 753 357
Amos 1549 600
Obadiah 215 138
Jonah 498 215
Micah 1078 535
Nahum 458 325
Habakkuk 525 357
Zephaniah 584 338
Haggai 444 172
Zechariah 2275 684
Malachi 626 271
