Strong's Encoding the NKJV New Testament


Detailed Study of a Biblical Translation

One of the largest obstacles facing an English-speaking believer aiming to faithfully exegete scripture is the inability to accurately comprehend the languages in which it is written. This is exacerbated by the fact that the Bible was composed in three original languages, none of which even shares the same characters as English, let alone any common vocabulary. The Bible student is therefore faced with a choice: learning two or three very foreign languages, picking a translation or two which they hope will accurately model the original, or entirely forgoing the richness of deep Biblical exegesis.

Entirely aside from the severe damage done by individual believers abandoning detailed exegesis, this decision has some significant effects on church life. If the local body decides, as many do, that effectual Biblical interpretation can only be performed in the original languages, then a two-class hierarchy immediately develops between those that can actually understand scripture and those that are forced to ask others to settle detailed points of interpretation. This naturally leads to a form of priestcraft, and one which exalts the intellect and education above other gifts.

The evangelical reaction against this almost Catholic two-tier church is the selection of an English translation which is either implicitly or explicitly deemed to be the 'perfect' Word of God. This shift can happen in two startlingly different ways. One is for the church to move from a position where the words of God matter to one where the gist of what God is saying is adequate. This allows approved dynamic translations[1] to enter the congregation, rendering the Word entirely comprehensible without additional assistance. The alternative is to pick a translation, usually the KJV, and assert that it is the full and received Word of God, that it can itself be studied in minute detail, and that it will provide all the richness that the original autographs would have done.

One of the scariest yet most useful tools for anyone steeped in a tradition of KJV-based biblical study is the Vine's Concordance[2]. I remember to this day my shock when I first discovered the vast number of different Greek words that are often translated by a single word in the English. For example there are 34 different Greek words rendered by the English term 'take'[3]. Conversely many of those Greek words are in turn rendered by different English words on different occasions. Working down the first five entries for 'take' we find that the Greek words rendered 'take' are elsewhere rendered 'accept', 'receive', 'hold' and 'apprehend'.

Of course the benefit of Vine's concordance is that it allows the English-speaking reader to ascertain more accurately the meaning of the original: of all the shades of meaning that a given English word may have, which is the one that was intended? There are three primary downsides to the Vine's concordance. Firstly, it is keyed entirely to the KJV and is therefore of little use to students of other translations. Secondly, it is the work of one man, and it is only his opinion that one is able to obtain with regard to the meaning of the original. Thirdly, whilst it gives the possible meanings of each Greek word that a given English word can render, it does not always specify which Greek word is present in any given Bible verse[4].

The Benefit of the Strong's Number

The three principal deficiencies in Vine's concordance are all solved by the Strong's number. The Strong's number is essentially just a number used to encode every Greek word in the New Testament, with another sequence used to encode the Hebrew of the Old. Traditionally a believer could look up an English word in Strong's concordance, scan down to find the verse in question and then read off the Strong's number. This number then acted as an index into a Greek lexicon that Strong had also provided.

It will immediately be seen that Strong's has catered for the third deficiency of Vine's. However a little thought shows that the other two were solved too, although possibly by accident. The Strong's number acts as what a computer scientist would term an abstract interface between the English translation and a Greek lexicon. Whilst Mr Strong produced his index and verse list for the KJV, it could equally well be applied to any translation, or even the Greek itself. And no matter how the number is derived it still represents the same Greek word, and thus the Greek lexicon that is indexed by the number still applies.

It is however the solution to the second deficiency that has promoted Strong's from a useful trick to an invaluable resource. Once there was a way for an everyday English reader to get to a Strong's number from his authorized Bible, the way was open for other scholars of the original languages to produce works keyed to the Strong's number. They did not need to redo the work of indexing more than thirty thousand verses of Scripture; instead they could focus upon the work they wished to do in the original language.

Modern technology has taken the benefit of the Strong's number to the next level. The most laborious part of using Strong's system was the initial concordance lookup to obtain the Strong's number. Whilst I have always used a Strong's concordance, I would need to be interested in a word to do the work. Modern Bible programs show the Strong's numbers inline in the KJV text. Simply moving a mouse pointer over the number is enough to pull up the meaning of the underlying Greek word in my lexicon of choice. As the level of effort has reduced, my willingness to use the process has naturally increased.

Overview of Strong's Encoded Materials Available

Historically there have been other numbering systems in use such as the Thayer system and the Goodrick/Kohlenberger system. However these alternative systems have always produced 'maps' to allow them to be accessed via Strong's, and with the use of technology this mapping is now seamless. Perhaps more significantly, the alternative systems appear to be withering. For example Goodrick and Kohlenberger are co-authors of a new 'Strongest Strong's', which is a modernized and computerized Strong's concordance.

There is a large range of Strong's encoded reference materials available and this range appears to be growing. There are at least seventeen Greek or Hebrew lexicons or word studies available today that are keyed to Strong's numbers[5]. Particularly noteworthy is that the Brown-Driver-Briggs Hebrew-English Lexicon is Strong's encoded, as is the Theological Dictionary of the New Testament. Between the breadth of existing material and the apparent move in the direction of Strong's, I believe it is safe to say that Strong's is here to stay.

Translations Available with Strong's encoding

The original Strong's exhaustive concordance and Strong's numbers were keyed to the King James Version of the Bible. As well as still being available cheaply in paper format this is widely available online and cheaply or freely available for almost every Bible software program there is. Of the more modern versions the NASB is the only major translation with an inline Strong's capability that is available both in print and in software[6]. The NET Bible is producing a Strong's encoded version although it is currently only available in draft form[7]. In printed form there is a NIV version of the "Strongest Strong's".

Whilst this list is long enough to ensure that just about everyone may access the Strong's system if they wish, it is still the case that working extensively with Strong's numbers essentially forces one to use the KJV or NASB. In fact, given that the former is essentially a translation of the Textus Receptus and the latter of the Nestle-Aland text, it can be argued that there really is no choice. In particular there is no Strong's encoded translation based upon the Majority Text, and there is no modern translation keyed to the Textus Receptus.

Given that the Strong's system is a vital bridge between English translations and the Greek, it is a shame that the bridge only exists for a small handful of the large range of translations that exist. The purpose of this paper is to propose a mechanism that should greatly reduce the effort required to port the other English Bible translations over to using an inline Strong's encoding. In addition it documents some early experimentation done to establish the feasibility of the proposed system.

Translating English to English

The principal observation upon which this proposal is based is that there already exist three English translations which have been Strong's encoded. I believe that this significantly reduces the amount of work involved in producing the fourth and subsequent Strong's encoded Biblical texts. The reason is quite simple: for the first Strong's encoding it was necessary to establish the connection between the English language and the Greek[8]. The encoding was then performed upon the Greek and could thus be copied back into the English. The linguistic distance between Greek and English is huge; the two languages differ in script, grammar, tense and vocabulary. Bridging the gap is a significant undertaking. However the linguistic distance between two different English translations should be much smaller and thus easier to bridge.

Performing a two stage translation

Consider the problem of matching the following to the underlying Greek:

For we do not wrestle against flesh and blood, but against principalities, against powers, against the rulers of the darkness of this age, against spiritual hosts of wickedness in the heavenly places. (NKJV)

The problem is fairly daunting. However if we now consider the KJV with inline Greek[9] for the same verse:

Eph 6:12 For <hoti> we <hemin> wrestle <pale> not <ou> against <esti> <pros> flesh <sarx> and <kai> blood <haima>, but <alla> against <pros> principalities <arche>, against <pros> powers <exousia>, against <pros> the rulers <kosmokrator> of the darkness <skotos> of this <toutou> world <aion>, against <pros> spiritual <pneumatikos> wickedness <poneria> in <en> high <epouranios> places.

We immediately see that the vast majority of the words are identical and can simply be copied across. There are some differences, such as the order shift between 'we wrestle not' and 'we do not wrestle', the change of the word 'age' for 'world', the NKJV's insertion of 'hosts of' and the switch of 'heavenly' for 'high'. However an astute English reader would be able to construct the NKJV with inline Greek for this verse without significant problem.

In fact in terms of tackling the problem we can initially ignore the Greek side of the equation completely. The task becomes to match the English words to the English words; once that is done one can copy back the Greek.

Worked Example - Eph 6:16

Here is a worked example showing the intermediate stages using Eph 6:16

Eph 6:16 Above <epi> all <pas>, taking <analambano> the shield <thureos> of faith <pistis>, wherewith <en> <hos> ye shall be able <dunamai> to quench <sbennumi> all <pas> the fiery <puroo> darts <belos> of the wicked <poneros>. (KJV)

16 above all, taking the shield of faith with which you will be able to quench all the fiery darts of the wicked one. (NKJV)

Step 1: Match the NKJV words to the KJV ones:

KJV                 NKJV
Above               Above
All                 All
Taking              Taking
The shield          The shield
Of faith            Of faith
Wherewith           With which
Ye shall be able    You will be able
To quench           To quench
All                 All
The fiery           The fiery
Darts               Darts
Of the wicked       Of the wicked one

Step 2: Append Greek equivalents to the table based upon the KJV column:

KJV                 Greek        NKJV
Above               Epi          Above
All                 Pas          All
Taking              Analambano   Taking
The shield          Thureos      The shield
Of faith            Pistis       Of faith
Wherewith           En hos       With which
Ye shall be able    Dunamai      You will be able
To quench           Sbennumi     To quench
All                 Pas          All
The fiery           Puroo        The fiery
Darts               Belos        Darts
Of the wicked       Poneros      Of the wicked one

Step 3: Copy out the NKJV with Greek inlined.

Eph 6:16 above <epi> all <pas>, taking <analambano> the shield <thureos> of faith <pistis>, with which <en> <hos> you will be able <dunamai> to quench <sbennumi> all <pas> the fiery <puroo> darts <belos> of the wicked one <poneros>. (NKJV)
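The three steps above can be sketched in code. The proof of concept described later in this paper was written in Perl; this illustration uses Python, and the phrase alignment (Step 1's table) is simply supplied by hand rather than computed, so the function and variable names are my own:

```python
# Sketch: given the KJV phrase -> Greek alignment (Step 2's table) and the
# matching NKJV phrases, copy the Greek tags across (Step 3). The alignment
# itself is assumed to have been produced already, by hand or by the
# matching process described later in this paper.

kjv_to_greek = [
    ("above", "epi"), ("all", "pas"), ("taking", "analambano"),
    ("the shield", "thureos"), ("of faith", "pistis"),
    ("wherewith", "en hos"), ("ye shall be able", "dunamai"),
    ("to quench", "sbennumi"), ("all", "pas"), ("the fiery", "puroo"),
    ("darts", "belos"), ("of the wicked", "poneros"),
]

nkjv_phrases = [
    "above", "all", "taking", "the shield", "of faith", "with which",
    "you will be able", "to quench", "all", "the fiery", "darts",
    "of the wicked one",
]

def inline_greek(alignment, target_phrases):
    """Zip the target translation's phrases with the Greek tags carried by
    the source translation's alignment, producing inline markup."""
    parts = []
    for (_, greek), phrase in zip(alignment, target_phrases):
        tags = " ".join("<%s>" % g for g in greek.split())
        parts.append("%s %s" % (phrase, tags))
    return " ".join(parts)

print(inline_greek(kjv_to_greek, nkjv_phrases))
```

Punctuation and capitalization are ignored here; restoring them is a mechanical post-processing step.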

Technologically Assisted Translation

Another central tenet of this proposal is that it will be possible to use technology to perform a significant amount of the work for this mapping. Note that I am specifically not claiming that this can or will be a fully automated process. Eventually it will need human intervention, and it will probably need some intervention from skilled linguists. However the hope is that technology will reduce the need for human intervention by a sufficiently large amount that this project becomes reasonable.

There are at least two ways in which technology could assist greatly in this project. The first would be to administer a form of mass-workforce approach. One could for example split the Bible up into chunks of 20 verses and then recruit online volunteers to take these chunks, perform the mapping and then submit them. The administration program would ensure that every chunk of verses was given to at least eleven different people. A given chunk would then be deemed to have been mapped correctly if ten of the eleven return the same responses. The program could also track good and bad workers and escalate those chunks of scripture which were consistently mapped incorrectly.
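The administrative bookkeeping this would need is easy to sketch. The following assumes chunks of 20 verses, eleven workers per chunk and a 10-of-11 agreement threshold as above; the worker submissions are stand-ins, since recruitment and the mapping itself are outside the sketch:

```python
# Sketch of the chunk-administration idea: split the verse list into
# chunks of 20, hand each chunk to 11 workers, and accept a chunk when at
# least 10 of the 11 returned mappings agree.
from collections import Counter

CHUNK_SIZE = 20
WORKERS_PER_CHUNK = 11
AGREEMENT_NEEDED = 10

def make_chunks(verse_ids):
    return [verse_ids[i:i + CHUNK_SIZE]
            for i in range(0, len(verse_ids), CHUNK_SIZE)]

def accept_chunk(submissions):
    """submissions: one mapping result per worker (any hashable form).
    Returns (accepted_mapping, True) or (None, False) for escalation."""
    most_common, count = Counter(submissions).most_common(1)[0]
    return (most_common, True) if count >= AGREEMENT_NEEDED else (None, False)

chunks = make_chunks(list(range(1, 7958)))   # the 7957 NT verses
print(len(chunks))                           # 398 chunks of at most 20 verses
```

A real system would add the worker-quality tracking mentioned above, but the acceptance rule itself is this simple.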

The second way is the one I wish to look at further in this paper. The concept is that we use detailed analysis on a verse by verse basis to attempt to perform the English to English mapping automatically. The program would track how many words and verses it had managed to fully map and would be able to display the verses that it had not been able to match, to allow further rules to be written to improve the matching.

The advantage of such a rule based system is twofold. Firstly it ensures that all of the verses are mapped in a consistent fashion; this will greatly reduce the number of odd errors that linger needing to be flushed out. Secondly it produces a secondary work product; the rules themselves. These should give interesting insight into the nature of the two translations being compared.

Measuring Correlation between Two Documents

As one of the purposes of this paper is to scope the work involved in performing a Strong's encoding of an alternative English translation we need some measures to define just how much work is involved. Equally for each rule or stage in the process that we accomplish we would like to know what exactly we have achieved and what we have left to achieve. I believe there are three metrics that will combine to achieve this goal:

  1. Number of words mapped. This is arguably the most fundamental unit: of all of the words that exist to be correlated, how many have been mapped successfully? This can be a simple percentage taken as Words Mapped / Words in Document.
  2. Number of distinct words mapped. There may ultimately be a need to dig in on a distinct-word by distinct-word basis to pick or verify that a given English word is equivalent to another one. Thus a measure of the percentage of distinct English words mapped may be useful.
  3. Number of verses fully mapped. Whilst the first measure gives an accurate indication of the amount of the document mapped it gives no indication as to whether the problems are broadly scattered or focused into a small part of the text. By counting the number of verses fully mapped we have a good indication as to how widespread the problems presently are.
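All three metrics can be computed together in one pass over the mapping data. This sketch assumes a hypothetical representation in which each verse holds a list of (word, mapped?) pairs; the sample data is invented purely to exercise the calculations:

```python
# Sketch of the three progress metrics: percentage of words mapped,
# percentage of distinct words mapped, and count of fully mapped verses.

def metrics(mapping):
    total_words = mapped_words = full_verses = 0
    distinct_all, distinct_mapped = set(), set()
    for verse, words in mapping.items():
        verse_complete = True
        for word, is_mapped in words:
            total_words += 1
            distinct_all.add(word.lower())
            if is_mapped:
                mapped_words += 1
                distinct_mapped.add(word.lower())
            else:
                verse_complete = False
        if verse_complete:
            full_verses += 1
    return {
        "words_mapped_pct": 100.0 * mapped_words / total_words,
        "distinct_words_pct": 100.0 * len(distinct_mapped) / len(distinct_all),
        "verses_fully_mapped": full_verses,
    }

sample = {
    "40:001:001": [("the", True), ("book", True), ("of", True)],
    "40:001:002": [("begot", False), ("the", True)],
}
print(metrics(sample))   # 80% of words, 75% of distinct words, 1 full verse
```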

Establishing Points of Correlation

The approach that I am going to take in the early experimentation on this project is to try to establish points of clear correlation between two verses and then hopefully be able to expand from these, making low risk assignments. Again this is best seen by example, this time Eph 6:15. The opening and closing phrases are clearly equivalent and can thus be equated, leaving the two remaining sections as those that need resolving:

And your feet shod with the preparation of the gospel of peace; (KJV)

and having shod your feet with the preparation of the gospel of peace; (NKJV)

As this example shows one of the easiest ways to establish points of correlation is simply to work from both ends and see how far you can get with perfect matching. This is simple to do at a machine level and should not be prone to errors. The un-matched phrases that are left are then presumed to correlate to each other.

It is important to understand that reducing the size of these correlation groups radically improves the accuracy of translation of the harder sections. Consider for example that a single correlation group is left containing eight words in both translations. Then theoretically[10] there are 40320[11] possible mappings between the two translations. However if two correlation groups were left, each containing four words, then each correlation group could only be resolved twenty-four ways, leaving at most 576 combinations.
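The combination counts quoted above are straightforward to verify:

```python
# Checking the combinatorics: one unresolved group of eight words admits
# 8! possible mappings, while two groups of four admit only 4! x 4!.
import math

assert math.factorial(8) == 40320
assert math.factorial(4) ** 2 == 576
print(math.factorial(8), math.factorial(4) ** 2)
```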

In order to increase the number of correlation points, and thus reduce the size of any ambiguous sections, it is possible to look for relatively rare words forming the mid-point of a sizeable island of words. For example:

And, ye masters, do the same things unto them, forbearing threatening: knowing that your Master also is in heaven; neither is there respect of persons with him. (KJV)

And you, masters, do the same things to them, giving up threatening, knowing that your own Master also is in heaven, and there is no partiality with Him. (NKJV)

The need for the rare word is to reduce the chance that a repeated sequence of words is actually the translation of a different piece of Greek. The need for the sequence to be lengthy is to avoid problems where the English word order has shifted, which might otherwise leave words in the wrong correlation group.
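This island-seeding idea can be sketched as follows. The function name and the word-frequency figures are my own inventions for illustration; a real run would compute the frequency table from the whole text:

```python
# Sketch of island detection: find a rare word that occurs exactly once in
# both verses, then grow an island of exactly matching neighbours outwards
# from it. A sufficiently long island becomes a new correlation point.

def find_island(kjv_words, nkjv_words, freq, rare_below=100, min_len=4):
    for i, word in enumerate(kjv_words):
        if freq.get(word, rare_below) >= rare_below:
            continue                       # not rare enough to be safe
        if kjv_words.count(word) != 1 or nkjv_words.count(word) != 1:
            continue                       # must be unique in both verses
        j = nkjv_words.index(word)
        left = 0                           # grow while neighbours agree
        while (i - left > 0 and j - left > 0
               and kjv_words[i - left - 1] == nkjv_words[j - left - 1]):
            left += 1
        right = 0
        while (i + right + 1 < len(kjv_words)
               and j + right + 1 < len(nkjv_words)
               and kjv_words[i + right + 1] == nkjv_words[j + right + 1]):
            right += 1
        if left + right + 1 >= min_len:
            return word, i - left, j - left, left + right + 1
    return None

kjv = "knowing that your master also is in heaven".split()
nkjv = "knowing that your own master also is in heaven".split()
freq = {"master": 12}                      # pretend 'master' is rare
print(find_island(kjv, nkjv, freq))        # ('master', 3, 4, 5)
```

Here 'master' anchors a five-word island even though the NKJV's inserted 'own' breaks the match to its left.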

Translation Selection

Reasons for Choosing the NKJV

There are a number of reasons that I chose the NKJV to do my preliminary investigations. Some are related to the desirability of having a Strong's encoded NKJV and others are related to the fact that the NKJV may prove to be one of the simpler translations to encode. Notwithstanding the legitimate reasons which follow I should also probably declare a personal bias and state that I like the NKJV and therefore it is a translation that I want to work with.

The reasons that the NKJV is a particularly desirable translation to Strong's encode are:

  1. It is a major translation
  2. It is a modern translation
  3. The NKJV is a natural next translation for people that use the KJV and who are therefore used to using the Strong's system.
  4. It has many linguistic similarities to the KJV. Thus Strong's encoded resources, which often reference the KJV, will not appear unduly strange.

Factors that hopefully render the NKJV relatively easy to encode are:

  1. It is one of the more literal translations. The more literal a translation is the more tightly individual words are bound to Greek equivalents rather than entire English phrases being bound to collections of Greek words.
  2. There has clearly been a deliberate effort on the part of the NKJV translators to follow the KJV when possible. This should make a mapping from the NKJV to KJV particularly easy.
  3. The NKJV uses a Greek text which is similar to the one used by the KJV. This topic is sufficiently central that it will be dealt with under the next heading.

The Greek behind the Translation

So far this paper has made one fundamental assumption that is regrettably not entirely valid: that two English translations are attempting to render the same underlying original Greek. The Strong's numbers are numeric indicators of the underlying Greek words; thus if the underlying Greek between two verses is not the same then the Strong's numbers should not be either. There are three principal texts from which English Bibles are translated: the Textus Receptus, the Majority Text and the Nestle-Aland (or Critical) text.

The KJV is translated from the Textus Receptus (TR)[12], the NKJV is translated from the Majority Text and most other translations are derived from the Nestle-Aland text. The Majority Text and TR are the closest of the texts to each other, with 1005 translatable differences[13]. The TR and Nestle-Aland differ in a translatable manner in 3323 places. It can be seen that failure to take this into account when attempting a two-stage translation would result in a significant number of errors.

For the purposes of the remainder of this paper I am going to ignore this problem, noting simply that by picking two translations based upon the TR and Majority Text I am potentially introducing a thousand errors. Attempting to move from the KJV encoded numbers to another translation would potentially introduce more than three thousand. I will also note in passing that a mapping at an English level between the TR and Majority texts is available. I would expect that this could be used to identify the errors introduced and hopefully facilitate rectifying them.

Other Translations

Eventually it would be good if it were possible to migrate other translations to Strong's numbers. A key question that would need to be answered is: 'Can the NET and NASB Strong's encodings be legally used to do so?' If they can then the sensible approach would be to use one of those two translations as the basis for an 'English to English' mapping. If they cannot then a mapping would need to be attempted from either the KJV translation or from the NKJV one that had been produced. The latter may be easier as it will have the more modern language.

However, when considering the Strong's encoding of different translations one will also need to consider the extent to which the target translation is dynamic. The more dynamic the translation the greater the extent to which the translator has allowed themselves to translate at a phrase by phrase or even concept by concept level. This puts a greater linguistic distance between the original Greek and the English and it may make a direct mapping to Strong's numbers impossible.

Perhaps a related concept to the above is the relevance of a Strong's encoded version of a particular translation. It may reasonably be assumed that people currently attempting deep word level study in an English translation are likely to be using one of the less dynamic translations; probably something as literal or more literal than the NIV. It is possible that people that are currently doing deep study in English are most likely to benefit from considering the underlying Greek. People content to stay with a highly dynamic paraphrase are probably not going to find the Greek text books particularly fulfilling.

Available Materials

Texts that are Available Electronically

One of my principal motivations to consider this problem was the conviction that it would be relatively easy to perform a feasibility study using readily available resources. As part of a fuller project, extra time would need to be spent validating where the best copies of individual texts can be obtained. In particular it would be ideal to obtain an NKJV text directly from the publishers. However for the purposes of this exercise I acquired the texts as follows:

Programming Language Selection for Text Processing

Whilst the language processing that this project would eventually involve is theoretically advanced, the programming concepts involved are not. Reading some twenty megabytes of text is something that a home PC can do in under a second using almost any language. The process is linear and can be accomplished in batch mode. Further, the eventual amount of coding will be trivial by modern standards. Thus most of the advances made in programming language theory and practice in the last twenty years are not really going to be required to tackle this project.

As there was no compelling requirement for any given approach I instead deferred to a couple of pragmatic considerations. Firstly it may be useful to be able to showcase the results of this work; this is most readily done on the Internet. Secondly I already have a substantial amount of Perl code that processes Biblical texts that I use for my own website. The result of these two considerations is that I chose to use Perl for this proof of concept; almost any language could be used if the project were undertaken seriously.

Legal Considerations

The NKJV license permits[14] any amount of private use of the NKJV text. The text can also be published provided that no more than 50% of a book is published and that the published NKJV material does not constitute more than 50% of the published material. This proof of concept falls comfortably within those bounds. In the strictest sense it would appear possible to even publish pages of NKJV text under this license. However if one wished to produce a version of the NKJV with Strong's numbers inline it would be wise to make suitable arrangements with Thomas Nelson publishers first.

There is no copyright on the KJV outside of the United Kingdom; work would need to be done to see if the Strong's encoding of the KJV is under any form of restriction. In addition one may need to compare many copies of the KJV to ensure that the version eventually chosen was one that was deemed credible by the broadest section of the community.

Formatting for Text Processing

Whatever the source of the textual version, a necessary precursor to performing an 'English to English' mapping is to transform the texts into a common format to allow them to be readily compared. This process is usually referred to as ingestion, and the ingestion code normally has to be written afresh for every file format that is going to be used. Ideally the process is run once at the start of the project and the output of the process is then used for the remainder of the project.

The format I have personally standardized upon has one verse per line and requires the lines to be ordered to follow the Biblical sequence. Each verse is preceded by an 11 character descriptor that defines the verse that follows. The format is BB:CCC:VVV where BB is a two character book number, CCC is three characters for the chapter number and the VVV is three characters for the verse number. Thus 40:001:001 corresponds to Matthew chapter 1 verse 1.
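Helpers for this descriptor format are trivial to write; the names make_ref and parse_ref below are my own. Note that BB:CCC:VVV is ten characters; I am assuming the eleventh character of the descriptor is the space separating it from the verse text:

```python
# Helpers for the verse descriptor format described above: BB:CCC:VVV,
# where book 40 is Matthew, so 40:001:001 is Matthew chapter 1 verse 1.

def make_ref(book, chapter, verse):
    return "%02d:%03d:%03d" % (book, chapter, verse)

def parse_ref(ref):
    book, chapter, verse = ref.split(":")
    return int(book), int(chapter), int(verse)

def make_line(book, chapter, verse, text):
    # The descriptor plus its trailing space precede the verse text
    # (the space being my assumption about the eleventh character).
    return make_ref(book, chapter, verse) + " " + text

assert make_ref(40, 1, 1) == "40:001:001"
assert parse_ref("40:001:001") == (40, 1, 1)
print(make_line(40, 1, 1, "The book of the generation of Jesus Christ..."))
```

Because the format is fixed-width and one verse per line, two ingested files can be compared with a simple parallel read.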

In order to validate the KJV text that came with the Strong's encoding and to validate my code that extracts the Strong's numbers and leaves behind the KJV text I compared the result of the KJV/Strong's after extraction to an independent KJV text derived from the Power Bible. The result was that there were only 44 verses in which the two texts differ. All of those deviations were caused by differences in the underlying text rather than processing issues. A hand inspection of the problems suggests that the KJV with Strong's encoding is a slightly better text than the one in the Power Bible.

The Analytic Results

One point worth noting, as it affects the ultimate resourcing of a project such as this, is that all of the observations that follow are the result of one 337-line Perl program written and tested during the course of one afternoon. Computer science moves rapidly; fifteen years ago I spent six months compressing the Bible text down so that it could realistically be loaded onto the PC of the day. Analytic and linguistic research is now programmatically available to just about everyone.

The Size of the Problem

The initial task was simply to collect some data to allow a measurement of the success criteria defined earlier. I therefore counted for each text (KJV, NKJV, Strong's Number) the number of verses, the number of words and the number of distinct words that existed in the New Testament. For these purposes punctuation is ignored so "peoples'" and "peoples" are the same word. The results were as follows:

Count                       KJV       NKJV      Strong's Numbers
Number of Verses            7957      7957      7957
Number of Words             188522    185822    124603
Number of Distinct Words    13978     14152     5520

A few observations are in order.

The need to map between phrases of different lengths is enough of an issue that I decided to conduct further investigation to see just how often the KJV and NKJV verses differed in length. The results are as follows (here -5 means a KJV verse has five fewer words than the NKJV equivalent):

Difference   -5    -4    -3    -2    -1     0     1     2     3     4     5
Verses       27    61   191   489  1287  2886  1549   752   367   183    68

Therefore in terms of sizing the problem we have 7957 verses, 64% of which differ in length between the translations. That said, 72% of verses differ by at most one word. We may thus assume that word-by-word mapping will be possible but may need a little work.
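The tally above can be reproduced with a short histogram routine. This sketch is in Python rather than the Perl actually used, and the two one-verse dictionaries stand in for the real ingested KJV and NKJV files:

```python
# Sketch of the length-difference tally: count words per verse in each
# translation and histogram (KJV length - NKJV length) across all verses.
from collections import Counter

def word_count(verse_text):
    # strip punctuation so "peoples'" and "peoples" count alike
    words = [w.strip(".,;:!?'\"") for w in verse_text.split()]
    return len([w for w in words if w])

def length_histogram(kjv, nkjv):
    hist = Counter()
    for ref in kjv:
        hist[word_count(kjv[ref]) - word_count(nkjv[ref])] += 1
    return hist

kjv = {"49:006:012": "For we wrestle not against flesh and blood"}
nkjv = {"49:006:012": "For we do not wrestle against flesh and blood"}
print(length_histogram(kjv, nkjv))   # Counter({-1: 1})
```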

Initial Mappings

The initial pass to establish an English to English mapping between the KJV and NKJV was very simplistic and followed the pattern suggested previously. The algorithm started from either end, simply looked for words that were character for character identical, and stopped processing when a difference was encountered. The justification is that the complexity of mapping phrases comes after the first deviation, because at that point the program has to decide how to deal with the difference. If two things look identical then we can reasonably assert that they are.
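A minimal sketch of this initial pass, shown on the Eph 6:15 example from earlier. The actual program was written in Perl; the Python below and its names are my own:

```python
# Sketch of the initial pass: match identical words inward from both ends
# of a verse pair, stopping at the first disagreement on each side.
# Returns the number of words matched and whether the verse fully matched.

def two_ended_match(kjv_words, nkjv_words):
    n = min(len(kjv_words), len(nkjv_words))
    front = 0
    while front < n and kjv_words[front] == nkjv_words[front]:
        front += 1
    back = 0
    while (back < n - front
           and kjv_words[-1 - back] == nkjv_words[-1 - back]):
        back += 1
    full = (front + back >= len(kjv_words)
            and len(kjv_words) == len(nkjv_words))
    return front + back, full

kjv = "and your feet shod with the preparation of the gospel of peace".split()
nkjv = "and having shod your feet with the preparation of the gospel of peace".split()
print(two_ended_match(kjv, nkjv))   # (9, False): 9 words match, verse incomplete
```

On this verse the front scan matches only 'and' before hitting 'your'/'having', while the back scan matches eight words before hitting 'shod'/'feet', leaving the middle group unresolved exactly as described above.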

The results were a little startling in two different directions. Firstly we find that 64,077 words can be directly correlated simply working from either end of the verse. Put another way 34.5% of the NKJV text is identical[15] to the KJV equivalent. This also means that around a third of the Strong's numbers can come across without any further work. However the other result is that only 166 complete verses are identical between the two translations; that is around 2%. This would suggest that there are some broad fundamental differences between the texts.

Some early rule discovery

One of the harder decisions in performing this early analysis was to know just how hard to push the text to try to make progress rather than simply reporting on the issues left to tackle. I opted to allow myself three different rule additions in order to give some kind of indication as to how tractable the remaining two thirds of the differences are.

The three observations and rules I implemented were:

  1. The KJV often inserts the word 'and' into lists where the NKJV does not. Therefore if there was an 'and' in the KJV text at the first point of disagreement, and the following word in the KJV matched the next word in the NKJV, then I allowed the KJV 'and' to be slid past.
  2. There are two KJV verb endings which are not in use in the NKJV, corresponding to 'becometh' and 'becomest'. Any word ending in 'eth' or 'est' I allow to map onto a word with the same root ending in 'e', 'es' or 's'.
  3. In those situations where I have a disagreement between two words I record which word was compared to which other word. Then, if they are rare words (<5 occurrences) and I have seen them all, or normal words and I have seen 80% of them, or common words (>100 occurrences) and I have seen half, and they have always mapped to the same word, then I allow them to be called a match.
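Rule B in particular is easy to sketch. The following is an illustration of the suffix substitution described above, not the code actually used:

```python
# Sketch of Rule B: allow a KJV word ending in 'eth' or 'est' to match an
# NKJV word sharing the same root but ending in 'e', 'es' or 's'.

def rule_b_matches(kjv_word, nkjv_word):
    for old_suffix in ("eth", "est"):
        if not kjv_word.endswith(old_suffix):
            continue
        root = kjv_word[: -len(old_suffix)]
        for new_suffix in ("e", "es", "s"):
            if nkjv_word == root + new_suffix:
                return True
    return False

assert rule_b_matches("becometh", "becomes")
assert rule_b_matches("becomest", "become")
assert not rule_b_matches("become", "becomes")
```

As noted under 'Further Avenues to Explore', irregular forms such as 'hath' fall outside this rule and would need an explicit lookup table.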

I reran the code with these three rules in place; I also counted how many times each rule activated. It should be remembered that the code fundamentally still only moves in from the two ends until it meets a mismatch it cannot explain. Therefore at any given time a rule only applies to at most two words for each verse not yet fully mapped. In other words each rule can only apply itself to at most around 16,000 words on any given pass. The rule activation counts were:

Rule A activated 834 times

Rule B activated 461 times

Rule C activated 320 times

In total, running with these three rules, an initial pass yielded 74911 words that could be mapped and 259 'identical' verses. Thus about 40% of the English can now be translated across, although only 3% of the complete verses. In fact, because each pass gets a little further along, I was able to run a second pass which yielded 75404 words and 267 verses without any further work.

Further Avenues to Explore

The headline numbers produced in one afternoon should be sufficient to justify further exploration in this area. The remainder of this paper proposes how one might set about making that exploration happen. However, the design of the analytic phase did produce some further thinking that I wish to itemize for consideration:

  1. In addition to counting the number of English words mapped one should count the number of Strong's numbers that have been passed by the scans from either end.
  2. Currently the scanning only occurs from either end. One could also look for 'islands' of matching words in the middle of the text. A non-common word (<100 occurrences) with at least three other attached words should form a large enough match block to seed a new area of matching growth in the middle of longer verses.
  3. Loosening the 'always match' criterion for spotting common equivalences. At the moment two words always have to be compared to each other to count as an automatically spotted match. This is too strict. At the very least, if the words are equated 80% of the time, and on the other 20% of occasions the word being matched is a 'furniture[16]' word, then an automatically spotted match should be declared.
  4. Thus far only automated matching is allowed for. A fundamental enabler would be a user interface that allows a human to resolve any matching questions the system has and then applies the new match that has been learnt.
  5. There are a number of very common irregular verbs that are not covered by the verb stemming logic I put in place. 'Hath' and 'Taketh' are obvious examples. Forming a table of well known irregular verb mappings may help significantly.
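Item 2, the 'island' idea, can be made concrete with a short sketch. Everything here (the function name, the run length of four, the way commonness is tested) is an illustrative assumption layered on the description above, not the paper's implementation.

```python
def find_islands(kjv, nkjv, common_words, min_len=4):
    """Sketch of item 2: find runs of at least min_len identical words,
    anchored on at least one non-common word, that occur in both texts.
    Each island is returned as (kjv_index, nkjv_index, length)."""
    islands = []
    for i in range(len(kjv) - min_len + 1):
        run = kjv[i : i + min_len]
        # require at least one non-common anchor word in the run
        if all(w in common_words for w in run):
            continue
        for j in range(len(nkjv) - min_len + 1):
            if nkjv[j : j + min_len] == run:
                islands.append((i, j, min_len))
                break
    return islands
```

Each island found this way would give the two-ended scan a fresh frontier in the middle of a long verse, so that matching could grow outwards from the island as well as inwards from the verse ends.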

Next Steps

The Need for Credibility

As I have researched the Biblical texts that are available, the Strong's resources and some of the software initiatives that are out there, the fact that has struck me most forcibly is the need for academic, spiritual and popular credibility for this project to have any value. In addition I believe it would need commitment from a significant player in Christian publishing to get the exposure it would need to be of any genuine benefit. For example I was stunned to discover the "Strongest Strong's" during my research. For four years a key resource had been available that I hadn't known about, and I thought I kept myself up to date!

Of course with regard to the NKJV in particular the number one issue is that the publisher would need to grant permission for the results of this project to be distributed. In this regard it is a little disturbing to note that the NKJV is not one of the more widely available texts online[17]. It is also interesting that the "Strongest Strong's" publishers appear to have left the NKJV alone. Notwithstanding, the first step in realizing this project would be to present the concept to Nelson and see if they are prepared to let it happen.

One possibility is that Nelson would want to own the project to create Strong's numbers. In that case there would be no further steps other than possibly to hand over any useful information that had already been gathered. The remainder of this section however will assume that Nelson is prepared to see it happen but not help.

The Need for Human Resources

Ultimately, to make this happen, people will be needed to verify the accuracy of whatever the computer algorithms achieve. I suggest that because of the need for credibility this needs to be a fairly large and well respected group of people. Alternatively it needs to be an extremely large group of people, achieving credibility through the sheer capacity to cross-check. This is not unrealistic. There are individual NKJV based churches near here that could probably put 1,000 educated people to this task one Sunday afternoon and the project would be done[18].

Whilst the above statement may appear a little facetious, I actually think it may be a credible way to solve this problem. To recruit and organize a thousand scattered people would be a prodigious undertaking; for a mega-church to task 4% of its membership for one afternoon is by no means implausible. Additionally, if the resultant text had the mega-church's 'seal of approval', the popular side of the need for credibility would also be solved.

The Need for Visibility

For a project to gain momentum a method is needed to capture the imagination. I doubt there are a thousand people in one church that would read this paper end to end. However a website that had a couple of pages presenting the vision, and then a page where the individual could interact with the program, correct it, help it to learn and see its progress, could easily capture the imagination[19].

Technologically such a website would be easy to construct. It probably requires 7-10 days of programming and 3-5 days of presentation work. However, unless there was a significant sponsor in the form of manpower (such as a mega-church), there would be a big effort involved in driving visitors to the website. An alternative therefore would be to interest one of the big Christian websites in adopting this project as their own. Two of the more obvious candidates are and . Interestingly, the group that appears to have the best technical sophistication to do this is , although they may well be too involved in their own translation to wish to help another one.

The Need for the Actual Greek Text

One other feature of my research that caught my particular attention was the relatively large discrepancy between the Textus Receptus and the Majority Text. Whilst they may be much closer to each other than either is to the Nestle-Aland version, there are still in excess of a thousand differences. Further, a reading of the introduction to an NKJV Bible makes it abundantly clear that the translators did not faithfully follow the MT either. Thus the fact is that we don't actually have a copy of the Greek text from which the NKJV was translated.

In order for any translation method to work it is vital that an accurate copy of the Greek underlying the target translation is available. The most natural people to supply that would be the publishers themselves. Failing that, research would need to be conducted to see if anyone has reconstructed the Greek underlying the NKJV. Failing that, a book such as The NKJV Greek English Interlinear may need to be consulted; but that would be a laborious process requiring people with a knowledge of Greek.

In Summary

In summary I believe that the next major steps in making this a reality involve letter writing and phone calls rather more than technology or analysis. There are essentially three groups of people that need to be contacted: a publisher, some church leaders and some web masters. Each one would need to be convinced of the value of this project before it could easily become a reality.


This paper commenced as a plea for detailed word level exposition of Holy Scripture. It acknowledged however that one of the primary obstacles to this activity is the fact that the original Bible texts are incomprehensible to the vast majority of Christians. It was pointed out that this usually produces one of three consequences: a two tier system of believers, a lack of interest in the precise Word of God, or the veneration of a particular English translation. It then proceeded to use the KJV coupled with Vine's concordance as an example to illustrate the significant variance that can exist between even one of the very best translations and the underlying Greek text.

The notion of the Strong's number was then introduced. This is very simply a numeric encoding of every Greek[20] word that appears in the New Testament. Coupled to a concordance system or better yet to a modern computer program this system allows a very simple direct link from the KJV text to the exact meaning of the underlying Greek word that has been rendered. A brief survey was then given of resources available that use the Strong's numbering system. It was noted that the list was large and apparently growing at the expense of alternate numbering schemes. The catch however is that there are currently only three translations available that are fully Strong's encoded: the KJV, NASB and NET Bible.

A proposal was then made for a scheme to allow other translations to be Strong's encoded rapidly. It essentially involved a two stage translation; the first would go from one English translation to a close sibling which was already Strong's encoded. The second stage would then perform the Strong's encoding itself. It was further proposed that both stages of translation could be greatly assisted through the use of technology; both to track the work of individuals but also to assist and automate some of the simpler parts of the translation process.

There were two key points that were required for technology to provide genuine benefit. Firstly the technology had to be able to establish metrics to show progress; only through measurement would it be possible to see if the project were moving at the speed and with the accuracy required. The other was that the technology had to be able to establish points of correlation between the underlying texts. It was suggested that if the points of disagreement between two texts could be reduced in size then the error rate would drop radically and speed would improve significantly.

For the exploration phase of this project the NKJV was chosen as the target translation. This was because it was a major, modern translation that was linguistically similar to the KJV and which was also a natural next translation for people that had traditionally used Strong's numbers. It was also hoped that the literalness of the NKJV, the conscious effort on the part of the translators to follow KJV form and the perceived closeness between the underlying texts of the KJV and NKJV would also render this particular mapping tractable.

A brief overview of the issues raised by the differing Greek texts was also given. It is actually the case that a translator usually chooses the text he wishes to translate on a verse by verse basis; thus to perform this project with complete accuracy some manner of getting the correct underlying Greek is necessary. Research also showed that the Majority Text and Textus Receptus are actually not that close; although they are certainly closer to each other than they are to the critical text.

Whilst not considered at length, the point was also made that other translations may lend themselves more or less readily to this dual stage translation process. The relatively large number of "KJV+" translations could probably be mapped to the KJV; others based upon the critical text may well map to the NASB or NET Bible. However the point was made that the more dynamic translations would have a very loose mapping to the underlying Greek and thus the Strong's encoding would be difficult and possibly not that fruitful.

After a brief pragmatic overview, provided to allow the results to be replicated, the findings of the exploration phase of the project were presented. We found that the NKJV and KJV have very similar vocabularies and that with only the simplest of coding over one third of the KJV text could be directly mapped to the NKJV equivalent. Then a process of simple, justifiable, automatically applied translations was applied. The translations included skipping extra 'and' words, performing verb stemming for 'eth' and 'est' suffixes and spotting word pairs that always appeared in the same place in the text. These yielded a 7% lift, bringing the total amount of the text that could be translated without human intervention up to 40%.

One purpose of this paper was to explore how quickly progress could be made. The work presented was performed in one afternoon. No further refinement to the rules was attempted or presented, in keeping with the time limit imposed. However it was noted that there were at least five avenues of exploration that could be attempted to improve the automated matching process prior to requiring human intervention.

The 'Next Steps' section then focused upon four major areas: the need for credibility, manpower, visibility and the actual Greek text of the target translation. Fundamental to them all if targeting a copyrighted source is the full agreement of the publisher. If that is obtained then it is believed that a well developed website coupled to a significant visible sponsor in the form of a mega-church or major website would be adequate to get the project done.
