Friday, May 26, 2017

Retro Machine: TIS-100

If you have not yet gotten your hands on a TIS-100, I highly recommend it.  It is a bit of a vintage item, but in many ways I believe it was way ahead of its time (much like the Amiga, which came out a decade later).  The name TIS-100 also harkens back to my first computer, the TS-1000 (known as ZX81 in the UK) with its 2K RAM. In any case, TIS-100 is a fun little machine to use to solve coding problems.  I haven't seen many on eBay, but there is a decent emulator available from Zachtronics.

And here's a handy quick reference guide to get you started:

TIS-100 (Tessellated Intelligence System) Quick Reference Guide

Basic Execution Node Instruction Set
NOP              NO OPERATION
MOV <SRC> <DST>  MOVE <SRC> TO <DST>
SWP SWAP ACC AND BAK
SAV SAVE ACC TO BAK
ADD <SRC> ADD <SRC> TO ACC
SUB <SRC> SUBTRACT <SRC> FROM ACC
NEG NEGATE ACC (NEG 0 = 0)
JMP <LABEL> JUMP TO <LABEL>
JEZ <LABEL> JUMP TO <LABEL> IF ACC = 0
JNZ <LABEL> JUMP TO <LABEL> IF ACC != 0
JGZ <LABEL> JUMP TO <LABEL> IF ACC > 0
JLZ <LABEL> JUMP TO <LABEL> IF ACC < 0
JRO <SRC> JUMP TO RELATIVE OFFSET <SRC>
                 (A value of 0 causes an infinite loop. Both
                 positive and negative jumps are bounded by the
                 node's first and last instructions.)
HCF HALT & CATCH FIRE
                 (Undocumented; resets the TIS-100.)


<SRC>            Refers to ACC, UP, DOWN, LEFT, RIGHT, ANY, LAST,
                 NIL, literal (-999...999)
<DST> Refers to ACC, UP, DOWN, LEFT, RIGHT, ANY, LAST, 
                 NIL
<LABEL> Refers to location marked by alphanumeric 
                 label '<LABEL>:'


# Indicates a comment in source.
## Indicates program title to the debugger.
! Triggers a breakpoint when using the debugger.


Registers
All registers and ports can store an integer in the range of -999 to 999.

ACC      The primary storage and computational register in
                 each node.

BAK              BAK cannot be directly addressed (see SWP, SAV).

UP, DOWN,  Each output port holds its value until read,
 LEFT, RIGHT  thereby behaving like an additional register. 

ANY              Reads to or writes from the next available port 
                 (UP, RIGHT, DOWN, or LEFT) that is receiving or 
                 sending a value.  
LAST             Maps to the last port selected by ANY. LAST maps to
                 NIL if ANY has not been used.

NIL              As a source, NIL maps to 0. As a destination, it 
                 has no effect.

Nodes
Up to 12 nodes are available per program. 
There are two types of nodes available on the TIS-100, Basic Execution and Stack Memory. Note that a defective or damaged node is automatically disabled.

Basic Execution Node (T21)
- Each node can run a subprogram with up to 15 instructions.  Labels
  are not considered instructions.
- When the last instruction in a node is run, execution continues 
  with the first. Note that this wraparound does not cost any 
  cycles.
Stack Memory Node (T30)
- Can hold up to 15 values.
- Supports push/pop from any connected port.
- Pop from an empty stack node will stall.
- Push to a full stack node will stall.
- Note that push/pop stalls can be recovered if another port is used
  to remove/add a value from the stack.

Cycles
- Writes take 2 cycles. Other operations require only one cycle 
  assuming data is available.
- The first "step" in a run does not count as a cycle.
- A cycle is needed to write out the last output port.


List of Sample Programs Included*

TIS-100 SEGMENT MAP

00150 SELF-TEST DIAGNOSTIC
10981 SIGNAL AMPLIFIER
20176 DIFFERENTIAL CONVERTER
21340 SIGNAL COMPARATOR
22280 SIGNAL MULTIPLEXER
30647 SEQUENCE GENERATOR
31904 SEQUENCE COUNTER
32050 SIGNAL EDGE DETECTOR
33762 INTERRUPT HANDLER
40196 SIGNAL PATTERN DETECTOR
41427 SEQUENCE PEAK DETECTOR
42656 SEQUENCE REVERSER
43786 SIGNAL MULTIPLIER
50370 IMAGE TEST PATTERN 1
51781 IMAGE TEST PATTERN 2
52544 EXPOSURE MASK VIEWER
53897 HISTOGRAM VIEWER
60099 SIGNAL WINDOW FILTER
61212 SIGNAL DIVIDER
62711 SEQUENCE INDEXER
63534 SEQUENCE SORTER
70601 STORED IMAGE DECODER

TIS-NET DIRECTORY

NEXUS 00.526.6   SEQUENCE MERGER
NEXUS 01.874.8   INTERNER SERIES CALCULATOR
NEXUS 02.981.2   SEQUENCE RANGE LIMITER
NEXUS 03.176.9   SIGNAL ERROR CORRECTOR
NEXUS 04.340.5   SUBSEQUENCE EXTRACTOR
NEXUS 05.647.1   SIGNAL PRESCALER
NEXUS 06.786.0   SIGNAL AVERAGER
NEXUS 07.050.0   SUBMAXIMUM SELECTOR
NEXUS 08.633.9   DECIMAL DECOMPOSER
NEXUS 09.904.9   SEQUENCE MODE CALCULATOR
NEXUS 10.656.5   SEQUENCE NORMALIZER
NEXUS 11.711.2   IMAGE TEST PATTERN 3
NEXUS 12.534.4   IMAGE TEST PATTERN 4
NEXUS 13.370.9   SPATIAL PATH VIEWER
NEXUS 14.781.3   CHARACTER TERMINAL
NEXUS 15.897.9   BACK-REFERENCE REIFIER
NEXUS 16.212.8   DYNAMIC PATTERN DETECTOR
NEXUS 17.135.0   SEQUENCE GAP INTERPOLATOR
NEXUS 18.427.7   DECIMAL TO OCTAL CONVERTER
NEXUS 19.762.9   PROLONGED SEQUENCE SORTER
NEXUS 20.433.1   PRIME FACTOR CALCULATOR
NEXUS 21.601.6   SIGNAL EXPONENTIATOR
NEXUS 22.280.8   T20 NODE EMULATOR
NEXUS 23.727.9   T32 NODE EMULATOR
NEXUS 24.511.7   WAVE COLLAPSE SUPERVISOR

*My machine appears to have had memory damage, as most of the sample programs listed in the directories were either corrupted or erased. I've reprogrammed some of them, with varying degrees of efficiency. I have included some for reference:
Back-Reference Reifer (661/9/42)
Histogram Viewer (2173/9/59)
Exposure Mask Viewer (808/7/50)
Sequence Gap Interpolator (790/5/58)
Sequence Merger (475/6/34)
Signal Multiplier (638/4/31)
Prime Factor Calculator (10884/4/49)
Signal Exponentiator (4259/6/34)

Adventures in Useless Reverse Engineering


LucasArts Classic Adventures


I recently found my old copy of LucasArts Classic Adventures and used Boxer/DOSBox on the Mac to install and play some classic LucasArts adventures. For whatever reason, I was curious about how the old LucasArts installer worked. The below investigation was done on a Mac, and primarily uses Boxer (with DOSBox), MAC OS X terminal (for the unix-type commands cat, tail, cmp, and strings), and TextWrangler.

PART I

As already stated, for some bizarre reason, I was curious about how the old LucasArts installer worked. It seemed to use some .xxx file format which I’d never seen before.  Although all of these games could originally be run from disks, this installer only installed to a hard drive. If all of the archive (*.xxx) files were dumped into a directory on the hard drive, the installer would run just fine without asking the user to insert disks.  The installer will always install to \MANIAC, \ZAK, etc (i.e., based off the root directory of destination drive) regardless of where it was run.

The disks contain the installer and archive files for each game.  Here's the file listing for each disk:

LucasArts Classic Adventures (IBM) - Seven 3.5” Disks

CLASSIC 1:
INSTALL  EXE             24,762 21-09-1992 17:57
MANIAC_A XXX            401,060 26-08-1992 11:36
ZAK____A XXX            633,717 26-08-1992 11:33
LOOM___A XXX            394,088 08-09-1992 12:35
    4 file(s)         1,453,627 bytes total

CLASSIC 2:
INDY___B XXX            571,088 28-08-1992 12:00
LOOM___B XXX            874,101 08-09-1992 12:35
    2 file(s)         1,445,189 bytes total

CLASSIC 3:
INDY___C XXX          1,454,088 28-08-1992 12:00
    1 file(s)         1,454,088 bytes total

CLASSIC 4:
INDY___D XXX          1,454,088 28-08-1992 12:00
    1 file(s)         1,454,088 bytes total

CLASSIC 5:
INDY___E XXX             73,054 28-08-1992 12:00
MONKEY_E XXX          1,384,088 08-09-1992 11:41
    2 file(s)         1,457,142 bytes total

CLASSIC 6:
MONKEY_F XXX          1,454,088 08-09-1992 11:41
    1 file(s)         1,454,088 bytes total

CLASSIC 7:
MONKEY_G XXX            697,641 08-09-1992 11:41
    1 file(s)           697,641 bytes total

So the first question I had was, what are .xxx files?  As I mentioned, I’ve never heard it as a file extension.  Searching for info on it online didn’t return much.  A look at a hex dump of the .xxx files (using TextWrangler) gives a few hints:

MANIAC_A.XXX
00000000: 4C 46 47 21 9C 1E 06 00 6D 61 6E 69 61 63 5F 41  LFG!....maniac_A
00000010: 2E 78 78 78 00 00 01 00 C9 FA 09 00 46 49 4C 45  .xxx........FILE
00000020: 3C 04 00 00 30 30 2E 4C 46 4C 00 2E 18 2E 79 25  <...00.LFL....y%
00000030: 55 19 C4 07 00 00 02 00 01 00 00 00 00 06 FE F9  U...............

'LFG!’: LucasFilm Games, perhaps?  The name of the archive (maniac_a.xxx) is clearly there, as well as the word 'FILE', which must be file marker.  After a few unknown bytes it is followed by the name of one of the files in the archive, '00.LFL'.

Here’s another hex dump from another file:

MONKEY_E.XXX
00000000: 4C 46 47 21 90 1E 15 00 6D 6F 6E 6B 65 79 5F 43  LFG!....monkey_C
00000010: 2E 78 78 78 00 00 03 00 9D 27 45 00 46 49 4C 45  .xxx.....'E.FILE
00000020: F1 0B 00 00 30 30 30 2E 4C 46 4C 00 EE 35 79 25  ....000.LFL..5y%
00000030: 2B 21 A5 20 00 00 02 00 01 00 00 00 00 06 8A 0C  +!. ............

Well, that's interesting. It looks like the archive name is just for reference, since it doesn't match the actual name of the archive file. But again we do have the 'FILE' marker and a filename for one of the files in the archive ('000.LFL').

Searching for info on a 'LFG' or 'LFG!' archive type (online) again yields no results. So, assuming that the '!' is a part of this assumed 'LFG!' archive marker, we can see if the next four bytes before the archive name might be something 
obvious.  It looks like it could be a length in little endian order (ie, least significant byte first).

MANIAC_A.XXX: 
00000004: 9C 1E 06 00 = 0x00061E9C =   401,052

MONKEY_E.XXX: 
00000004: 90 1E 15 00 = 0x00151E90 = 1,384,080

Wait a minute! That is the length of the archive file, minus these 8 bytes used for the marker and the length! Also of note: the length we've decoded here is only the length of the particular archive file. (We know this since it matches the MONKEY_E.XXX file length, where the MONKEY archive spans multiple files.)

Okay, so we have a archive type marker, a length, and archive name. It is not clear whether the archive name is a fixed number of bytes or simply null terminated, since all the archive names use the full 8+3 characters and there is a null (0) as the next byte in each case.  

The next non-zero value is a 1 for Maniac Mansion, and 3 for Monkey Island. Hmmm...the Maniac Mansion files are in a single archive, while Monkey Island uses 3 disks (therefore 3 files).  Could this byte indicate the number of disks?  Looking further, it is consistent with all the five games. A zero follows this each time. It is not clear if the disk count is 2 bytes or if the zero has another meaning (or is just a delimiter), so we'll ignore that for now.

Next are another four bytes that look suspiciously like a length.  Let's check it out:

MANIAC_A.XXX: 00000018: C9 FA 09 00 = 0x0009FAC9 =   654,025
MONKEY_E.XXX: 00000018: 9D 27 45 00 = 0x0045279D = 4,532,125

Now let's take a look at the final install directories that were created when I ran the installer:

Contents of folder C:\MANIAC\
...
   57 file(s)           654,025 bytes total

Contents of folder C:\MONKEY\
...
   10 file(s)         4,532,125 bytes total
   
Look at that!  That number is clearly the total number of uncompressed bytes from the entire archive (spanning all archive files/disks).  This must be where the installer finds out how much space is required for the install.

Next is the 'FILE' delimiter, which we already saw.  I'm assuming this marks the beginning of info for each file in the archive.  And searching for those bytes, we do find that 'FILE' occurs 57 times in the MANIAC archive:

$ strings MANIAC_?.XXX | grep -c FILE
57

This works for the others as well (although multi-disk files need to be summed):

$ strings ZAK____?.XXX | grep -c FILE
62
$ strings LOOM___?.XXX | grep -c FILE
75
$ strings INDY___?.XXX | grep -c FILE
91
$ strings MONKEY_?.XXX | grep -c FILE
10

Okay, what next?  Following FILE is yet another number that looks like a 4-byte length:

MANIAC_A.XXX: 
0000001C: 3C 04 00 00 = 0x0000043C = 1,084

This should be for file 00.LFL.  The decompressed length of this file is:

00       LFL              1,988 

Hmm...this doesn't look right.  The other option is that this is the compressed length; that is, the length of the compressed block of data use to create the final decompressed file. We can find out if this is the case by finding out where the next file starts:

MANIAC_A.XXX
00000020: 3C 04 00 00 30 30 2E 4C 46 4C 00 2E 18 2E 79 25  <...00.LFL....y%

00000460: 46 49 4C 45 BC 33 00 00 30 31 2E 4C 46 4C 00 2E  FILE.3..01.LFL..

The next file marker starts at 0x460.  The length we are checking ends at 0x24.  Subtract and we have 0x43C = 1,084.  So this is the length of the data that makes up the file.

Checking another one:

MONKEY_E.XXX
00000020: F1 0B 00 00 30 30 30 2E 4C 46 4C 00 EE 35 79 25  ....000.LFL..5y%

00000C10: 5B 1F 04 FC 03 46 49 4C 45 BE 03 00 00 39 30 31  [....FILE....901

At 0x1C: F1 0B 00 00 = 0x00000BF1 = 3,057
0xC15 - 0x24 = 0xBF1 = 3,057
Confirmed! 

After this length is the file name, which appears to be null terminated.

It seems like there should be an uncompressed length. We saw earlier that for the 00.LFL file in MANIAC it should be 1,988, which is 0x7C4. And look:

00000030: 55 19 C4 07 00 00 02 00 01 00 00 00 00 06 FE F9  U...............

There are 7 unknown bytes in between. Hmmm. Well, let's look at another file.

For MONKEY, the first file's final length is:

000      LFL              8,357 

This is 0x20A5.  And here it is:

00000030: 2B 21 A5 20 00 00 02 00 01 00 00 00 00 06 8A 0C  +!. ............

Oddly, there are now only 6 bytes in between the end of the filename and this length. Could the filename field be a fixed length? From looking at multiple instances, it looks like this is the case.  There are at least 12 bytes allocated
for the filename (8+’.’+3), which is null terminated if less than 12 bytes.  It isn't clear whether it would have a null if all 12 bytes were used (as there is no instance of this) but it would make the implementation cleaner. So there are still one or two bytes unaccounted for; it is unknown whether these are padding or some real used values.

After the uncompressed length, there is this pattern for every file in every archive:
02 00 01 00 00 00 00 06

This must have something to do with the compression.  To get more info, we are going to have to look at the installer (install.exe) itself.

PART II

The first two bytes of install.exe indicate that it is indeed an MZ executable file.

install.exe
0000: 4D 5A 2B 00 2E 00 00 00 02 00 20 06 FF FF 7E 08  MZ+....... ...~.
0010: 80 00 00 00 0E 00 8B 05 1C 00 00 00 4C 5A 39 31  ............LZ91

Running 'strings' on install.exe is interesting as well. Here’s a portion:  

1) Maniac Mansion                                        
2) Zak McKracken                                         
3) Loom                                                  
4) Indiana Jones and the Last Crusade                    
5) The Secret of Monkey Island                           
maniac   maniac_a.xxx   1
//directory, filename, beginning disk#
zak      zak____a.xxx   1
loom     loom___a.xxx   1
indy     indy___a.xxx   2
monkey   monkey_a.xxx   5

It was nice of LucasArts to leave the comment in.  Perhaps the number of games (5) is hardcoded, but it looks like 
this info controls the install directory name, the install archive name, and the first installation disk number.

If you look back at the names of the archives they do not all start with A.  Indy starts with B. Monkey start with E.
So it appears that the ending letter is incremented based on the disk number (a=disk 1, b=disk 2, etc).

The output of ‘strings’ also reveals more strings earlier in the file, but they appear somewhat corrupted.  Could they be packed somehow?

Looking back at the first 32 bytes of the file, we see 'LZ91'.  LZ could refer to Lempel-Ziv; time to hit the web again.  This time, information is available:  Searching "LZ91 exe" on Google leads to the "LZEXE home page" (http://bellard.org/lzexe.html). Nice!

LZEXE.exe is a program from ~1989 that allows executables to be compressed. When launched, it will decompress itself into memory.  The program places either a ‘LZ09’ (for versions .90) or ‘LZ91’ (for version .91) tag at offset 1C of the compressed executable. (See http://www.shikadi.net/moddingwiki/LZEXE.) LucasArts must have used version .91!

There is also a utility to reverse the process; that is, expand the compressed exe file back to its uncompressed form.  This utility is UNLZEXE.EXE.  Running it on install.exe (in DOSBox) creates a new install.exe that is larger, and a install.olz which is the compressed version.    

C:\>UNLZEXE.EXE INSTALL.EXE

Running the expanded version of install.exe seems to work fine at first. But wait, but it looks different.  Attempting to install any of the games results in a rude message about poking yourself in the eye. Plus, it fails. What’s up with that, LucasArts?

Running strings on this new uncompressed version (“strings install.exe”) reveals not only that those corrupted looking strings seen earlier are corrected (compete with the poke yourself references) but also that the original clean strings seen (with the directory and archive info) are gone!  Huh?

It look like LucasArts sneakily tacked on some file data (namely, those original strings) on the end of the compressed exe file.  When decompressed by the utility, it does not know that there is data after the program data and so this is lost.  This can be verified by using the INFOEXE.EXE program distributed with the LZEXE.EXE program:

LucasArts’ compressed install.exe:
                                           Decimal            Hex

Length on disk (bytes)                       24762       000060BA
Length of code in the EXE (bytes)            23083       00005A2B

There appears to be an extra 1,679 bytes on the end of the file. The UNLZEXE.EXE utility ignores this extra data.  And if the install program does not detect this data, it apparently will tell you to poke yourself in the eye. Weird.

Just for further understanding, a few more things were tried. LDEXE.EXE (Version 0.91) was downloaded and run on the decompressed install.exe. (Note that the original LDEXE.EXE is in French.) Now compare the original file and the recompressed file:

C:\>LZEXE.EXE INSTALL.EXE

In terminal, the original installer and the recompressed decompressed version were compared:
$ cmp -b -l install.olz install.exe
cmp: EOF on install.exe

They match, but install.olz is a longer file since it has the extra data.  Note that running the recompressed version still doesn’t work. Let’s see if we can fix it.  First copy the extra data from the end of the original installer to a text file:

$ tail -c 1679 install.olz > install.txt

Running it again, and wallah! It works! No more poking in the eye! But why?  Running strings earlier revealed an “install.txt” string.  It didn’t seem like too far a leap to assume that during development install.txt could hold that data that would eventually be tacked on the end of the compressed file.  And it does!

To get the file to normal though, we can concatenate it:

$ cat install.txt >> install.exe
$ cmp -b -l install.olz install.exe

They match!  So install.exe was successfully decompressed, recompressed, and combined with the extra data to match the original file exactly.  When install.txt is deleted, it still works.

The text data at the end can be modified, and it will take effect whether in a separate text file or tacked back on to the executable.  For instance, the directories, file names, and starting disk can all be changed easily (not that you’d want to).

Okay, let’s look into the decompressed install.exe some more.  Here’s a section of output from ‘strings’ that stand out:

PKWARE Data Compression Library(tm)
Copyright 1990-92 PKWARE Inc.  All Rights Reserved.
Patent No. 5,051,745
Version 1.03

Ah ha!  Finally a clue as to how the actual data in the xxx archive is compressed!  A little searching on the net brings up this under the zip standard (with the note that it is seldom implemented):

10 - PKWARE Data Compression Library Imploding 

Even better, there is a description of the format here: http://pecunia.nerdcamp.net/projects/citybuilding/pkware

The header bytes (even though there are only 2) match for each file; and the specification looks straightforward to implement.  I think we can safely say this is the compression method used.  I'd say we now have a working understanding of the format.


Afterward

After implementing the spec in C, I was able to verify that the files are all using the PKWARE Implode method that was described in the link above.  The new implementation extracts all files and are a exact match with the original files.

Once I had written the decompressor (which I called "LFGExtract"), it seemed only natural to write a compression utility as well.  I had two goals:  First, create archives that meet or beat the compression ratios achieved in the LucasArts Classic Adventures archives (.xxx files), and second, have the original install.exe program be able to decompress the archives with no problem.

The initial implementation for the implode compressor was fairly straightforward and just followed the spec used to implement the decompressor. Basically, for each new encode attempt, start with a length of 2 and look through the dictionary to see if there is a match that can be used. If so, continue to look for increasing length until a match can no longer be found. At this point encode the offset and length into the compressed bitstream.  If no match can be found at all, just encode one literal byte. Move to the next byte that hasn't been encoded and repeat the process.  

All valid lengths may be used (2 through 518), as well as the minimum and maximum offsets (0 and 4095, mapping to the first and last elements in the dictionary).  Oddly, it appears that the archives on the Classic Adventures disks were compressed with limits on these values (for example, offset 0 is not used).  I'm not sure if there is a benefit to this.

One optimization was later made to the algorithm: before committing to using an offset/length entry, check if encoding the first byte as a literal, and then using a dictionary entry from the byte after that would produce better results (where better results are defined as a shorter bit length used for encoding of both the literal and the new offset/length pair).  This helped reduce the compressed size further, and results from this implementation beat whatever was used to produce the original achieved files for the classic games.  (First goal met!)   

Finally, archives created by this new implode implementation (I called it LFGMake) can be decompressed and extracted by the original install.exe utility. Success!

Next time, perhaps, we’ll look at some of the actual SCUMM game files. (I suppose that would be more exciting.)
 

Fun with LucasArts Classic Games: Make a Super Installer!

Make a super installer for all your classic LucasArts adventure games! Why? I have no idea!   LucasArts Classic Adventures  has five games. ...