Arrays, strings and other Inform goodies
The Inform Designer's Manual (4th edition) dedicates sections 2.4 and 2.5 to the world of arrays. From this point on, arrays crop up sporadically when it's necessary to explain how some function works or how to achieve some programming somersault. Like many other topics covered by the manual, you may find that while all the necessary information is there, the author's need for brevity (in a 576-page book that wastes not one single paragraph) leaves the novice with a feeling of "things a-happening beyond my grasp". This article aims to provide the novice to Inform with a little background material in order to more easily understand arrays.
Since this article was written, Andrew C. Plotkin (a.k.a. Zarf) has brought forth Glulx, a new Inform Virtual Machine which lifts the memory restrictions of the Z-Machine -- among other things. This made some parts of the text incomplete and some code examples "disasters waiting to happen". We have tried to bring the article up to date to cover Glulx compatibility and reduce disasters to the minimum. Even if you don't know what we are talking about, don't worry. Nothing that you read here will be hazardous to your games.
1. Basic concepts
An array is nothing more than an area in memory where a collection of data items can be stored sequentially, one after the other. Each item is called an "entry" in the array, and can be identified by the position it holds in the list. Example:
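As a sketch of such a list, here is a hypothetical Inform array of five numbers. Only the value 127 at position 2 is taken from this article; the array name and the other four values are invented for illustration.

```inform6
! Index:              0   1    2   3   4
Array sample_data --> 12  40  127   3  99;
```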
The "position it holds" may be called "entry number" or "index" of the array, while the data "items" stored are simply "entries". Therefore we can say that, in the above example, the third entry is 127, at position/index/entry number 2. Note that in the example, the first index is 0. Some languages use 1 instead as the first index of an array. Inform uses both depending on the particularities of the array, as we shall see.
Unfortunately, in order to understand how Inform manages arrays, it's useful to know a little bit about how computers store data in memory.
You are probably aware that the most common information unit in computers is the byte. One byte equals 8 bits, and since bits can take two values (0 or 1), there are 256 different bit combinations in each byte (2 to the power 8). This implies that if we want to store numbers in one byte, we can only handle quantities between 0 and 255. If we wish to store a single character, we can only code 256 different characters (which sounds good enough for a European alphabet). In fact, when storing characters, each one is encoded through a number (its ASCII code if it's from the American alphabet, or its ISO 8859-X code for other West and Central European alphabets -- which include local variations like the Spanish 'ñ'); that means that the item stored in the byte is always a number.
Computers frequently combine several bytes to store larger data items. For example, using two bytes together gives you 16 bits instead of 8 and, therefore, 65536 combinations (2 to the power 16). This is useful to encode characters from character sets which offer more than 256 symbols, such as Unicode, or for larger numbers (in the range from 0 to 65535). Computers and modern software may use more bytes (3, which means 24 bits; or 4, which implies 32 bits). The number of bytes combined in this way is called the "word size", so we may talk about a two-byte word size (16 bits) or a four-byte word size (32 bits).
Inform may compile for the Z-machine, which is a Virtual Machine (VM) designed by Infocom in 1979 -- when memory was scarce -- or for Glulx, which is a new VM designed by Andrew Plotkin with Year-2000-Computers in mind. The Z-machine uses a two-byte word size while Glulx handles a four-byte word size. Most of your Inform code will compile fine for both (an appealing thought), but we must be careful when applying some calculations. We'll address this issue a bit later; simply bear in mind that when we mention two-byte word size or 16 bits in relation to the Z-machine, you may also read four-byte word size or 32 bits in relation to Glulx.
Inform -- which was originally conceived to compile for the Z-machine -- makes extensive use of this two-byte word size; in fact, every data type managed by Inform is stored in two bytes -- numbers, objects, some text strings, dictionary words... all of them are encoded in a 16-bit number and stored in a "word" (for those conversant with 'pointers', let us say that everything in Inform is a 16-bit pointer to a memory address on the Z-machine). Single bytes are best used to store single characters.
When we wish to store an array in memory, we'll have to tell Inform if the items will be bytes (small numbers) or words (bigger numbers). For single characters, one byte should be enough, but if we have numbers or other data types in mind, words are much better. In fact, words may handle all kinds of data; the only advantage of a byte array is that it economises on memory.
2. Array declaration and usage in Inform
To declare an array, all we have to do is type the word "Array" followed by the name we wish the array to have and the symbol -> (for byte arrays) or --> (for word arrays). Finally, we indicate the number of entries (items) that our array will handle.
2.1 Byte Arrays
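A minimal declaration of this kind, using the array name a_few_bytes that the later examples in this article rely on:

```inform6
Array a_few_bytes -> 4;
```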
would declare an array with four entries, each of which will be a byte, since we have used the -> notation.
This generates a memory area with a size of 4 bytes, ready to store four small numbers.
***IT'S VERY IMPORTANT TO REMEMBER THAT BYTE ARRAYS START TO NUMBER THEIR ENTRIES FROM 0***
Let us suppose that we wish to store the following items in the above array: 25, 130, 240 and 1000.
The last number can't be stored in a single byte, since it goes beyond the 255 limit, so we're sure to run into trouble. We'll soon see what happens.
These numbers can be assigned to the array in the following manner:
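A sketch of those assignments, assuming the values 25, 130, 240 and 1000 (the same ones used for word arrays later on). The Main routine is only a wrapper so that the fragment compiles on its own.

```inform6
Array a_few_bytes -> 4;

[ Main;
    a_few_bytes->0 = 25;
    a_few_bytes->1 = 130;
    a_few_bytes->2 = 240;
    a_few_bytes->3 = 1000;   ! larger than 255: does not fit in one byte
];
```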
We've said that the last entry is too big for a single byte. This won't cause a complaint about the overflow problem at compile time, though. Inform will accept the array if its syntax is correctly written but, because only the lowest eight bits of a number fit in a byte, our 1000 has been transformed into a different number: 232 (that is, 1000 - 3*256). This will cause bugs at run-time that will be difficult to diagnose.
That's why number storage is better reserved for word arrays. We'll keep byte arrays just for single characters -- and they're only worth the effort if you really must save on memory. Let's try another example with the same array declaration.
Now we want to store the four characters which form the phrase "Fish". To accomplish this, we assign as follows:
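Entry by entry, this could look as follows (again, the Main wrapper is an assumption that merely turns the statements into a complete program):

```inform6
Array a_few_bytes -> 4;

[ Main;
    a_few_bytes->0 = 'F';
    a_few_bytes->1 = 'i';
    a_few_bytes->2 = 's';
    a_few_bytes->3 = 'h';
];
```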
Observe the apostrophes -- or single quotation marks -- ('), as opposed to double quotation marks ("). Just like in C, a single letter surrounded by apostrophes is understood as "the ASCII code for this letter". (Note: If you put more than a single character between apostrophes, the compiler thinks you are defining a dictionary word. The use of single or double quotation marks is one of those syntactic niceties in Inform which may require another article, but until then, look up http://www.firthworks.com/roger/informfaq/ for the basics if you are interested).
We could now try to print these letters on the screen, perhaps with the aid of this loop:
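One plausible form of that loop, assuming a local variable i and the array filled as above:

```inform6
Array a_few_bytes -> 4;

[ Main i;
    a_few_bytes->0 = 'F';  a_few_bytes->1 = 'i';
    a_few_bytes->2 = 's';  a_few_bytes->3 = 'h';
    for (i=0 : i<4 : i++) print a_few_bytes->i;
];
```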
which means: set the variable i equal to 0; print the ith entry in the array and increase the value of i by one; repeat while i remains less than four.
We expect the phrase "Fish" to appear on the screen, but what we actually get is 70105115104. Hey, what's going on? Well, each of the 'F' 'i' 's' 'h' letters has been encoded in the array with a number (its ASCII code), 70 for 'F', 105 for 'i', 115 for 's' and 104 for 'h', and that's what the print statement has poured onto the screen, all glued together (print won't use spaces unless we tell it to.)
How do we effectively print the actual characters? We must politely indicate to the print statement that the item to output is a character (print is unaware of this, since we could have stored numbers in the array -- unless we state our wishes otherwise, print will always assume that we want to display a number.)
The correct loop would be:
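With the (char) modifier in place, the same sketch becomes:

```inform6
Array a_few_bytes -> 4;

[ Main i;
    a_few_bytes->0 = 'F';  a_few_bytes->1 = 'i';
    a_few_bytes->2 = 's';  a_few_bytes->3 = 'h';
    ! (char) tells print to show the character, not its numeric code
    for (i=0 : i<4 : i++) print (char) a_few_bytes->i;
];
```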
And behold! We now succeed in getting Fish.
This way of assigning data items to the array is a bit tiresome. Fortunately, Inform accepts a much more comfortable syntax to achieve the same result:
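Listing the initial entries in the declaration itself, the array might be written like this:

```inform6
Array a_few_bytes -> 'F' 'i' 's' 'h';
```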
Note that in this case we omit the number 4. Inform will count how many letters you wish to store in the array and will reserve the required space automatically.
There's even a shorter syntax for this:
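Presumably the form meant here is the one that takes the text in double quotes:

```inform6
Array a_few_bytes -> "Fish";
```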
So instead of specifying each letter separately surrounded by single quotation marks, we put them all together between double quotation marks. Once again Inform counts four letters and reserves the required space for them.
In all of these examples, it has been the programmer's task to count how many letters there are in order to code the printing loop -- you have to indicate in the loop that the index goes from 0 to 3. This is tedious and error-prone (go on: make a byte array and a printing loop for "supercalifragilisticexpialidocious" if you don't believe it) and that's why string arrays exist.
2.2 String arrays (special byte arrays)
If we put "string" (with no quotes) instead of the -> symbol, we create a byte array with a special structure which happens to be quite useful.
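A sketch of such a declaration, keeping the earlier array name:

```inform6
Array a_few_bytes string 4;
```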
This array declaration is just like the one we made before, but this time Inform creates a memory area for *5* bytes (even though we have asked for 4). The extra byte is used to store how many more bytes the array has. In our example, the extra byte would hold the number 4 to indicate that the array has four free entries.
This extra byte is always stored in the 0th entry of the index, and the remaining bytes will go between 1 and 4. ***IT'S VERY IMPORTANT TO REMEMBER THAT STRING ARRAYS NUMBER THEIR ENTRIES FROM 1*** -- unlike byte arrays, which started from 0.
Thus, if we want to store the letters "Fish" as in the previous examples, the tedious method of assignment would be:
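Spelled out (with an assumed Main wrapper), the entries now run from 1 to 4, while entry 0 is left alone because it already holds the length:

```inform6
Array a_few_bytes string 4;

[ Main;
    ! a_few_bytes->0 already contains 4, set by the compiler
    a_few_bytes->1 = 'F';
    a_few_bytes->2 = 'i';
    a_few_bytes->3 = 's';
    a_few_bytes->4 = 'h';
];
```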
which of course may be shorthanded with the known variations, letter by letter...
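By analogy with the byte array version, the letter-by-letter form would be:

```inform6
Array a_few_bytes string 'F' 'i' 's' 'h';
```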
...or "all together now":
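Again by analogy, a sketch of the double-quoted form:

```inform6
Array a_few_bytes string "Fish";
```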
With either "shorthand" method, Inform discovers that there are four letters and therefore stores number 4 in the 0th index entry of the array. This way the programmer doesn't need to know how many letters form the chosen phrase, because it can always be read from the 0th entry. The loop needed to print the stored letters would be:
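A sketch of that loop as a complete little program, assuming the array was declared with the text "Fish":

```inform6
Array a_few_bytes string "Fish";

[ Main i;
    ! entry 0 holds the length, so the letters sit in entries 1..length
    for (i=1 : i<=a_few_bytes->0 : i++) print (char) a_few_bytes->i;
];
```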
Please notice that the variable i now begins with a 1 (and not 0, like before) and goes on while its value is less than or equal to a_few_bytes->0 (the index entry that holds the length of the string). Number 4 has disappeared from the loop, which is now generic and can print strings of any length.
It's always a good idea to optimise code if you know how to. Suppose you have several arrays and you wish to print their characters: you could copy and paste the above loop and change the name of the array for each one of them, or you could create a general-purpose printing routine:
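Such a routine could be sketched as follows; the names char_array, arr and i are the ones the article goes on to describe, and the array passed in is assumed to be a string array.

```inform6
[ char_array arr i;
    for (i=1 : i<=arr->0 : i++) print (char) arr->i;
];
```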
We now have a routine named char_array that defines two variables, arr (which will be fed with the name of the current array) and i. We could make a call to the routine from any point in our code, using either of these forms:
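The two calling forms are assumed here to be a direct routine call and the use of the routine as a printing rule. A complete sketch:

```inform6
Array a_few_bytes string "Fish";

[ char_array arr i;
    for (i=1 : i<=arr->0 : i++) print (char) arr->i;
];

[ Main;
    char_array(a_few_bytes);           ! direct call
    new_line;
    print (char_array) a_few_bytes;    ! routine used as a printing rule
    new_line;
];
```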
This way, the code necessary for printing characters stored in an array only happens once in the whole program.
2.3 Word arrays
Storing single characters has its uses, but the time comes when we wish to work with numbers, pointers (to functions, objects, messages) or dictionary words... which is all the same to Inform, since everything is encoded as 16-bit numbers (when compiling for the Z-machine) or 32-bit numbers (when compiling for Glulx). A byte array is not enough to handle these data types. What we need now is a word array.
Word arrays can be declared in the same manner as byte arrays, but we use the --> notation instead of ->.
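A sketch of such a declaration, using the name a_few_words that appears later in the article:

```inform6
Array a_few_words --> 4;
```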
which creates a word array with four entries, numbered from 0 to 3.
Now Inform, when compiling for the Z-machine, reserves an 8-byte memory chunk (because we want four elements, and each one of them requires two bytes). If, on the other hand, you compile for Glulx, the memory area needed will be 16 bytes (four bytes per entry).
To assign data items to the array entries, we do as before. First, the verbose method:
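Assuming the same four values as in the byte array example, and an assumed Main wrapper, the verbose method might read:

```inform6
Array a_few_words --> 4;

[ Main;
    a_few_words-->0 = 25;
    a_few_words-->1 = 130;
    a_few_words-->2 = 240;
    a_few_words-->3 = 1000;   ! fits comfortably in a word
];
```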
You see that we now use --> instead of ->, because we are assigning words instead of bytes. This time we have no problem storing the number 1000, because it fits nicely in a word.
We can define the array in one line, by using the following syntax:
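The one-line form, sketched with the same values:

```inform6
Array a_few_words --> 25 130 240 1000;
```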
Either way, we set up the array so that entries 0 to 3 hold 25, 130, 240 and 1000.
We could display the contents of the array with a loop:
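A sketch of that loop; the space printed after each entry is an assumption, made so that the numbers do not run together:

```inform6
Array a_few_words --> 25 130 240 1000;

[ Main i;
    for (i=0 : i<4 : i++) print a_few_words-->i, " ";
];
```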
And we'd get 25 130 240 1000.
As was previously the case with byte arrays, here the programmer must remember that the array holds four entries, in order to specify the maximum possible value of i in the above loop. This is a nuisance, so Inform provides a special word array (just as string arrays are related to byte arrays) which is called a "table".
2.4 Tables (special word arrays)
When we declare the word array, we can use "table" (no quotes) in place of the --> symbol. This generates a word array with a little extra structure which (again) turns out to be quite useful.
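A sketch of the declaration being described:

```inform6
Array a_few_words table 4;
```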
In this case (analogous to the string array), Inform reserves an extra entry in the array, so that this array now needs 10 bytes = (4+1)*2 for the Z-machine or 20 bytes = (4+1)*4 for Glulx. The extra entry holds the number of entries, not counting itself, so in the example a_few_words-->0 would hold the number 4.
***IT'S IMPORTANT TO REMEMBER THAT TABLES BEGIN TO NUMBER ENTRIES FROM 1*** because the 0th entry is reserved to hold the length of the array.
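A table can also be given its contents directly. Assuming the usual four values:

```inform6
Array a_few_words table 25 130 240 1000;
```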
This too would create an array with five entries.
Now let's display the contents of the table. We need another loop:
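By analogy with the string array loop, a sketch that reads the length from entry 0:

```inform6
Array a_few_words table 25 130 240 1000;

[ Main i;
    ! entry 0 holds the number of entries; the data sits in entries 1..4
    for (i=1 : i<=a_few_words-->0 : i++) print a_few_words-->i, " ";
];
```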
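The following table summarizes the different kinds of arrays, how to access their entries and the allowed range for their entries. If the programmer ignores this range and breaks the bounds of the array (by calling a->5 in a byte array with four entries, for instance) the compiler will remain silent, but at run-time we'll get a glorious crash.

Kind of array   Declaration          Access   Usable entries   Entry 0 holds
Byte array      Array a -> 4;        a->i     0 to 3           data
String array    Array a string 4;    a->i     1 to 4           length (4)
Word array      Array a --> 4;       a-->i    0 to 3           data
Table           Array a table 4;     a-->i    1 to 4           length (4)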
4. Basic concepts with a twist
4.1 String arrays are NOT strings
This is yet another one of those things that leads programmers into confusion.
A string array is a sequence of single characters stored in memory one by one (each character takes one byte). You can access any single character through its entry number. For instance:
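The array discussed in the following paragraphs is called abc and holds the text "I like fish", so its declaration would presumably be:

```inform6
Array abc string "I like fish";
```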
If we take a look at the value stored in abc->4, we'll get the fourth letter from the text, in this case 'i' (well, we'd get its ASCII code, which is 105). In order to know the length of the text, all we have to do is check abc->0.
No surprises so far. Everything behaves as it has been explained. C programmers nod happily in understanding.
However, storing text through the use of arrays has a couple of snags:
1) They consume quite a lot of memory. A 1000 character text would take 1000 bytes. Since text adventures are basically made of text, it would be desirable to store text more sparingly. That's why Inform encodes most texts in a way that compresses them to roughly two thirds of their size, and makes them gibberish without the appropriate Z-machine interpreter.
This way of storing text is what Inform calls "strings", which is, perhaps unfortunately, the same word used for "string" arrays, even if the meaning is quite different. In this article, we shall call them "encrypted" strings as opposed to "conventional" strings (which will stand for string arrays).
Any text in double quotes, "this one, for instance", is encoded by Inform as an encrypted string, unless it's part of a string array declaration, in which case it will be encoded as a conventional string. Example:
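A sketch of three such cases. The array abc and the variable pqr with its text come from this article; the third line, including the name xyz and its text, is purely hypothetical.

```inform6
Array  abc string "I like fish";            ! conventional string
Global pqr = "But I don't like fishing.";   ! encrypted string
Global xyz = "Nor do I like anglers.";      ! encrypted string (hypothetical)
```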
In the first case, the string "I like fish" is encoded as a conventional string, letter by letter, with no compression or encryption. Each character may be read through its entries: abc->1, abc->2, etc.
In the other two examples, text is encoded as encrypted strings. They take up less memory and are unreadable by curious eyes. On the other hand, it is impossible to read each character separately or to know the length of the encrypted strings, because they are not arrays. Once the compiler has encrypted them, their values get stored in the variable pqr as 16-bit numbers (with no apparent meaning).
If we try a plain print pqr; we'd get that meaningless number. If we want to "decipher" the encrypted string and discover the text, we have to use a slightly different syntax:
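A minimal sketch, assuming pqr was set up as a global variable holding the text:

```inform6
Global pqr = "But I don't like fishing.";

[ Main;
    print (string) pqr;   ! (string) decodes the encrypted string
];
```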
Now we get "But I don't like fishing." on the screen. This syntax, however, will not work for abc, because abc is not an encrypted string. If, nevertheless, we try print (string) abc; the result will be unpredictable, because print (innocently unaware of our mischievous intent) will try to decipher the contents of abc as if it were an encrypted string. The way to get at abc has already been shown; it requires a loop:
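That loop, sketched once more for abc:

```inform6
Array abc string "I like fish";

[ Main i;
    for (i=1 : i<=abc->0 : i++) print (char) abc->i;
];
```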
Object names and descriptions are encrypted strings too, but it's the library that takes care of printing them when necessary -- using print(string), naturally.
4.2 Dictionary words are NOT strings
Inform builds up a dictionary with the words that the game will "understand" when the player types them. For instance, the words given in the <name> property of an object or the defined verbs. The dictionary will gather all the words surrounded by single quotation marks that we write in our game. For instance:
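A bare-bones object of the kind discussed next might look like this; a real game object would normally carry more properties.

```inform6
Object piranha "piranha"
  with name 'piranha';
```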
Why do we repeat piranha so much? The first piranha -- with no quotes -- is the inner name of the object. It will only be seen and used by the programmer when he/she wants to refer to the piranha object. The second "piranha" -- in double quotes -- is the word that the player will see when the game needs to mention the piranha object (say, in an inventory). It's an encrypted string. The third 'piranha' -- in single quotes -- is the word that the player will be able to type when he/she wishes to interact with the piranha. This will go in the game dictionary.
***CAUTION: Inform allows you to put double quotes around the third piranha and it will still be understood as a dictionary word. This only happens in the <name> property of an object and in the <verb> directive -- otherwise, anything you write for properties between double quotes will be understood as an encrypted string. DO NOT USE THIS UGLY AND CONFUSING PRACTICE. The clearer the syntax, the clearer the concepts.***
A dictionary word may be stored in any global variable, like this:
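For example, with a hypothetical variable name fish_word:

```inform6
Global fish_word = 'shark';
```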
In this case, 'shark' is neither a conventional string nor an encrypted string. Inform transforms it into a number (which is the place the word holds in the dictionary). If we try to print that variable with a plain print statement, we'd just get a meaningless number. If we try to print it with the (string) modifier, the result will be chaos, because print will try to decipher the dictionary number as if it were an encrypted string (which it isn't). We can't read the individual characters in 'shark' because it's not a byte array. The only way you could eventually output the word 'shark' would be through:
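A sketch using the (address) modifier, with the hypothetical variable name fish_word:

```inform6
Global fish_word = 'shark';

[ Main;
    print (address) fish_word;
];
```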
The (address) modifier tells print to display the dictionary word whose number we are providing.
4.3 Arrays as object properties
Most objects have a property which is, in fact, an array -- the "name" property. This is an array whose entries will be the dictionary words that the player may use when interacting with the object. Consult the end of section 3.5 in the Designer's Manual. Another example of an array used as a property is found_in, which lists all the locations you want a certain object to be in (useful for coding general scenery fixtures like bushes, carpeting, or even the sun -- if it's visible from various places).
When arrays are used as properties and not as global variables (which was the case of everything explained in 2), the syntax is different, because you need neither the directive Array, nor any of the -> or --> symbols, nor the modifiers "string" or "table". For instance:
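Going by the description that follows, the object would be along these lines:

```inform6
Object strange_thing
  with a_few_words 25 130 240 1000;
```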
We declare an object called strange_thing. This is not your usual text adventure object, because it lacks name and description, but from a programming point of view, it's a perfectly valid object (Inform will compile it without a whimper).
This object has just one property called a_few_words (it's perfectly legal to create our own properties, as explained in section 3.5 of the DM). It could store a number or a collection of them. The difference resides in what we write after the property name.
In the above example we write four numbers (separated just by spaces... no commas, semicolons or any other punctuation), so Inform will understand that a_few_words is in fact a collection of four items, with values 25 130 240 1000. These collections will always become word arrays (it is actually impossible to make a byte array out of a property), but this is hardly a limitation, since almost always we'll want the property to hold 16/32-bit items. The question now is: how do we read the entries of these arrays?
Let's try an apparently reasonable statement:
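The statement in question is presumably a plain print of the property, shown here inside an assumed Main wrapper:

```inform6
Object strange_thing
  with a_few_words 25 130 240 1000;

[ Main;
    print strange_thing.a_few_words;
];
```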
This will print just the first entry of the property (in the example, number 25). This would be all right if the property held just one value, but since we have more, we might need to know: how many are there? and: how do we access each one of them?
The answer is simple, but the syntax is a bit ugly. The value strange_thing.#a_few_words tells us how many bytes there are in the property (its "length"). We now reach the place where all the previous ramblings about Z-machine's two-byte word size and Glulx's four-byte word size become important.
Since the information we get from the value is the total number of bytes taken by the whole property, in order to discover how many entries we've got, we need to divide this global quantity by the number of bytes that store a single entry.
Remember that in the Z-machine, each entry from a word array is stored in two bytes, while in Glulx it takes four bytes. So it seems that, depending on the VM you are planning to use, you should divide either by 2 or by 4 (this is true, and if you divide wrongly you'll get in trouble). Fortunately, there is a way around this, as we shall soon see.
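To make the arithmetic concrete, here is a sketch for the Z-machine only. If the description above holds, the four-entry property occupies 8 bytes, and dividing by 2 gives back the 4 entries.

```inform6
Object strange_thing
  with a_few_words 25 130 240 1000;

[ Main;
    print strange_thing.#a_few_words, " bytes, ";        ! length in bytes
    print strange_thing.#a_few_words / 2, " entries";    ! Z-machine: 2 bytes per entry
];
```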
Now, the value strange_thing.&a_few_words is a word array which collects the numbers 25 130 240 1000 (remember that the entry numbers begin with 0).
Suppose we want to print these numbers on the screen. On the days when we only had Z-machine, we needed to code a loop thus:
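A sketch of the Z-machine version, with the divisor 2 hard-coded and extra parentheses added for readability:

```inform6
Object strange_thing
  with a_few_words 25 130 240 1000;

[ Main i;
    for (i=0 : i < (strange_thing.#a_few_words) / 2 : i++)
        print (strange_thing.&a_few_words)-->i, " ";
];
```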
For Glulx, you should change the divisor from 2 to 4, because each entry now takes four bytes.
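Under that change, a Glulx-only sketch of the loop would read:

```inform6
Object strange_thing
  with a_few_words 25 130 240 1000;

[ Main i;
    for (i=0 : i < (strange_thing.#a_few_words) / 4 : i++)   ! 4 bytes per entry
        print (strange_thing.&a_few_words)-->i, " ";
];
```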
But what if you don't know what platform you'll be using? Well, you can use cautious code for the undecided. There is a bi-platform library and a bi-platform compiler for Inform so that you may compile your source code for either VM -- and that's what you'll be using if you want to compile for Glulx. This compiler predefines a needful constant named WORDSIZE: if you choose to compile for the Z-machine, its value is set to 2, but if you compile for Glulx it will be set to 4. You may then code:
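A sketch of the portable loop, assuming a compiler that predefines WORDSIZE:

```inform6
Object strange_thing
  with a_few_words 25 130 240 1000;

[ Main i;
    for (i=0 : i < (strange_thing.#a_few_words) / WORDSIZE : i++)
        print (strange_thing.&a_few_words)-->i, " ";
];
```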
and now the problem is almost out of the way.
Then again, you may not be sure yet if you'll want your game compiled with Inform's Z-machine compiler (the original stuff written by Graham Nelson, which is widely used and much beloved). The current version of Graham's compiler (6.21) does not define WORDSIZE, so you need to add the following lines at the top of your code, before any "Includes":
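The lines in question are probably close to the following sketch, which defines both constants only when the compiler has not already done so:

```inform6
#Ifndef WORDSIZE;
Constant TARGET_ZCODE;
Constant WORDSIZE 2;
#Endif;
```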
TARGET_ZCODE is another constant predefined in the bi-platform compiler (if you are beginning to feel fascinated by these concepts, feel free to consult Zarf's "The Game Author's Guide to Glulx Inform", http://www.eblong.com/zarf/glulx/inform-guide.txt). The only snag is that we get a compiler warning, since we define a constant -- TARGET_ZCODE -- that is not used. You can avoid it (logically) by forcing a harmless use of the constant. If you write:
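One way of writing this, reconstructed from the explanation that follows:

```inform6
#Ifndef WORDSIZE;
Constant TARGET_ZCODE 0;
Constant WORDSIZE 2 + TARGET_ZCODE;   ! uses TARGET_ZCODE without changing the value
#Endif;
```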
you make TARGET_ZCODE equal to zero and add it to WORDSIZE. The constant is in use (although altering nothing) so the compiler is happy. And, much more importantly, our code will be ready to accept either set of Inform libraries or compilers.
Back to properties. If the property were to hold encrypted strings or dictionary words, we would still need the appropriate modifiers (string) or (address). For instance:
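A hypothetical stone object for the examples that follow. Only the object itself is mentioned in the article; the dictionary words and the description text are invented.

```inform6
Object stone "stone"
  with name 'stone' 'rock' 'pebble',
       description "A smooth grey stone.";
```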
If we wish to list on the screen the stock of dictionary words that the player can use to refer to the stone, we would code:
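A sketch written as a small routine, so that it works for any object and not only the stone. The routine name ListNames is hypothetical and WORDSIZE is assumed to be defined; it would be called as ListNames(stone).

```inform6
[ ListNames obj i;
    for (i=0 : i < (obj.#name) / WORDSIZE : i++)
        print (address) (obj.&name)-->i, " ";
];
```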
If we only want to show the first of these words, we just need:
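In sketch form, assuming a stone object with hypothetical dictionary words:

```inform6
Object stone "stone"
  with name 'stone' 'rock' 'pebble';

[ Main;
    print (address) stone.name;
];
```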
And to show the description:
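Likewise, with a hypothetical description text:

```inform6
Object stone "stone"
  with description "A smooth grey stone.";

[ Main;
    print (string) stone.description;
];
```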
This is because there's only one element in the description property. There could be more than one (theoretically), in which case we would proceed thus:
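A sketch of that case as a routine with the hypothetical name ListDescriptions, again assuming WORDSIZE is defined and that every entry of the property is an encrypted string:

```inform6
[ ListDescriptions obj i;
    for (i=0 : i < (obj.#description) / WORDSIZE : i++) {
        print (string) (obj.&description)-->i;
        new_line;
    }
];
```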
I can hear you thinking "Why on earth am I going to need to list the dictionary words of an object or its highly improbable multiple descriptions?", and you are right, you'll probably never need to do so. However, you may some day define a property which holds various values and, if you want to access them, this is how it's done. Understanding how things work and the philosophy of the language will bear fruit some day.
That's it for now. Remember that it is good practice to shamelessly borrow other people's code and to try to modify it, to see if things work as we expected.
5. What do I do now?
"That's a lot of handsome theory, but I didn't come here for a lecture. I want to program my own games and I want to know the practical uses of arrays right now!"
Well, there are a couple of places you might want to check. Firstly, go back to the DM, make a search for "array" and start looking at the examples. There are quite a lot of them and you should be able to tell by now how the array bits fit in the puzzle. Also, section 8 of Marnie Parker's Inform Primer (http://members.aol.com/doepage/doefaq.htm) covers a few interesting working examples of array usage.