MAC_BuildCase - An ECL Macro to Write ECL!

Every now and then one is faced with writing some very dull bits of ECL; or of solving a trivial problem a complex way. A good example is the "number to string conversion problem". I have a string field that contains some static and relatively limited number[1] of different string values; I don't want to waste the space of a string so I want to numerically encode the string and I might need to get back from number to string. The ECL CASE statement is tailor made for this kind of opportunity: it is simple and lighting fast.

If one wishes to code and un-encode the books of the Bible one can use:

Figure 1

Easy enough to code and for a list of 66 elements I can probably get to the end without getting bored; but if the list were a hundred or thousand elements long it would rapidly become tedious. I have a computer to do the tedious jobs; so why can't the computer write this code for me? Actually, it can!

Enter the ECL macro; this is one of those capabilities that can make a structured programmer cry. It is a slightly dangerous capability; you can use them to write horrible code. But for little utilities that you might wish to execute against a wide variety of datasets they can be terrific.

In this case I want to create a macro which will take a data file that has at least two columns; one containing the strings I want converted to integers and another containing the integers I want converted to strings. I want the macro to define two labels for me: one to go from integer to string and one from string to integer. The macro definition looks remarkably like the description I just gave:

A macro executes in the lexical context from which it was called. In other words; it might be stored in a particular module but it executes as if it lives somewhere else. Therefore one of the first things you usually wish to do is make sure that all of the imports you need exist.

The other risk of this 'executing in their context' is that it is possible for labels you use to clash with labels they use; this is particularly likely if they call your macro twice in the same context. For this reason you will often see macros littered with '#uniquename' - this allows you to define a symbol which will be a unique label in the context it is used in[2].

Early on in any macro I like to convert the parameters I am expecting into the parameters I wish they would have given me[3]. In this case I am going to ensure there are no duplicates in my data and I am going to sort it so that the lowest number comes first:

If you look at my target output (Figure 1) you will see that the 'meat' of the work is converting the incoming data file into a sequence of conditions such as: 1 => 'Ge' and in reverse: 'GE' => 1. The first step in doing all of them is being able to do them one by one. Therefore we construct a record structure that can take one record and convert it into two strings; one for each conversion direction:

That may look a little ugly; but it is really just a series of strings, string functions and concatenations. Note the (STRING) casts on the FieldText field; theoretically I could assume the user would be bright enough to only pass in strings as the string field. But one of my top rules of macro writing is to assume the user is trying to generate syntax errors; and defend.

Now that I can convert each individual record into strings I have to concatenate the strings for every record: I can do this with a ROLLUP.

A couple of points to note here:

First the TABLE statement inside the ROLLUP. It is generally good practice to use one pre-defined function per line and break things into lots of definitions; but many of us 'old-pros' will ignore this practice for something as simple as a TABLE statement. One of the reasons we do this is that the ECL optimizer will often strip TABLE statements out of the graph completely and roll the functionality into a previous or subsequent function.

The second is that ROLLUP with a TRUE as the condition. On a multi-processor system this would be a moderately horrible thing to do - it is a global operation and would pull all the data onto one node. However, given this is a utility macro designed to be executed on a tiny amount of data - this is perfectly acceptable.

All that remains is to wrap the outside of the case statements around the heart that we have just produced:

Probably the most novel thing in the above is the #TEXT statement. Generally if you use a macro label inside a macro then the label is considered to be a token. Therefore if I pass in a parameter foobar and use the corresponding macro label it is exactly as if I typed foobar into that place in the code. However if I use #TEXT(macro-label) then it is exactly as if I had typed in an ECL Constant String containing the letters foobar. In the case here it allows me to give meaningful names to the functions I am creating.

The other thing that might be a little surprising is the use of %s[1].Val and %s[1].Rval. The result of the previous rollup is a dataset with 1 row; that is not quite the same as just having a row. Therefore I do %s[1] to actually extract my 1 row from the dataset - and then I can get at the two fields.

The upshot of our labors is that if save this macro in a folder called UT we can then execute:

using the datafile we ingested in our Bible Search file. I actually get back the two lines given in figure 1.

The full text of the macro is given here in cut & paste able form:

/* MAC_BuildCase takes a file with at least two columns: 
 one labeled field text
 one labelled field-num
 The result of the macro is two labels each of which contain a string which is valid ECL.
 The first will map from values of FieldNum to values of FieldText
 The second will map from values of FieldText to values of FieldNum 
Sample Usage:
 IMPORT KJV,UT;
  ut.MAC_BuildCase(KJV.File_KJV.Txt,Book,BookNum,O,RO);
  O;
  RO
*/
EXPORT MAC_BuildCase(Infile,FieldText,FieldNum,OutVal,ROutVal) := MACRO
  IMPORT Std.Str AS *;
  #uniquename(ded)
  %ded% := DEDUP(SORT(Infile,FieldNum),FieldNum);
#uniquename(r)
%r% := RECORD
	 STRING Val := (STRING)%ded%.FieldNum + ' => \'' + TRIM((STRING)%ded%.FieldText) + '\'';
	 STRING RVal := '\'' + ToUpperCase(TRIM((STRING)%ded%.FieldText)) + '\' => '+(STRING)%ded%.FieldNum;
 END;
#uniquename(Conc)
%r% %Conc%(%r% le,%r% ri) := TRANSFORM
	 SELF.Val := le.Val + ',' + ri.Val;
	 SELF.RVal := le.RVal + ',' + ri.RVal;
 END;
#uniquename(s)
%s% := ROLLUP(TABLE(%ded%,%r%),TRUE,%Conc%(LEFT,RIGHT));
OutVal := 'IntTo'+#TEXT(FieldText)+'(INTEGER i) := CASE(i,'+%s%[1].Val+',\'?\');';
ROutVal := #TEXT(FieldText)+'ToInt(STRING s) := CASE(ToUpperCase(s),'+%s%[1].RVal+',0);';
 ENDMACRO;
This macro will be used in our Bible Search series - Part III

Tweet  

The Christian Counter

The Fundamental Top 500