Lars Skovlund, Dark Minister and Christoph Reichenbach
Version 1.0, 6. July 1999
This document describes the design of the Sierra PMachine (the virtual CPU used for executing SCI programs). It is a special CPU, in the sense that it is designed for object-oriented programs.
There are three kinds of memory in SCI: Variables, objects, and stack space. The stack space is used in a Last-In-First-Out manner, and is primarily used for temporary space in a routine, as well as passing data from one routine to another. Note that the stack space is used bottom-up by the original interpreter, instead of the more usual top-down. I don't know if this has any significance for us.
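The bottom-up, Last-In-First-Out stack usage described above can be sketched in C as follows. This is only an illustration: the names `PStack`, `ps_push` and `ps_pop`, the fixed depth, and the use of word indices instead of byte addresses are assumptions, and no overflow checking is done.

```c
#include <stdint.h>

#define PSTACK_WORDS 64            /* hypothetical stack depth */

typedef struct {
    uint16_t data[PSTACK_WORDS];
    int sp;                        /* index of the next free word; grows upward */
} PStack;

/* Push: store at sp, then move sp up (bottom-up usage). */
void ps_push(PStack *s, uint16_t v) { s->data[s->sp++] = v; }

/* Pop: move sp down, then read (last in, first out). */
uint16_t ps_pop(PStack *s) { return s->data[--s->sp]; }
```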
Scripts are loaded into the PMachine by creating a memory image of each script on the heap. For this reason, the script file format may seem a bit obscure at times. It is optimized for in-memory performance, not readability. It should be mentioned here that the interpreter performs a lot of fixups. In the script files, all addresses are specified as script-relative. These are converted to absolute offsets. The species and superClass fields of all objects are converted into pointers to the actual class etc.
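A rough C sketch of the address fixup step, turning script-relative addresses into absolute heap offsets. The function name `relocate` and the idea of passing an explicit list of positions to patch are assumptions for illustration; how the interpreter actually locates the addresses is not described here.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert script-relative addresses to absolute heap offsets in place.
 * heap:        the interpreter heap, viewed as 16-bit words
 * script_base: heap offset at which the script image was loaded
 * fixups:      word indices (into heap) of the values to convert
 *              (a hypothetical list, supplied by the caller) */
void relocate(uint16_t *heap, uint16_t script_base,
              const size_t *fixups, size_t count)
{
    for (size_t i = 0; i < count; i++)
        heap[fixups[i]] = (uint16_t)(heap[fixups[i]] + script_base);
}
```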
There are four types of variables. These are called global, local, temporary, and parameter. All four types are simple arrays of 16-bit words. A pointer
is kept for each type, pointing to the list that is currently active. In fact, only the global variable list is constant in memory. The other pointers
are changed frequently, as scripts are loaded/unloaded, routines called, etc. The variables are always referenced as an index into the variable list.
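The four variable lists and their "active" pointers might be modelled like this in C. The type and function names are made up for the sketch, and the numbering of the lists after global (local, temporary, parameter) is an assumption.

```c
#include <stdint.h>

/* Numbering after VAR_GLOBAL is an assumption of this sketch. */
enum { VAR_GLOBAL = 0, VAR_LOCAL, VAR_TEMP, VAR_PARAM, VAR_TYPES };

/* One pointer per variable type, each pointing at the active list. */
typedef struct {
    int16_t *vars[VAR_TYPES];
} VarLists;

/* Variables are always referenced as an index into a list. */
int16_t var_read(const VarLists *v, int type, int index)
{
    return v->vars[type][index];
}

void var_write(VarLists *v, int type, int index, int16_t value)
{
    v->vars[type][index] = value;
}
```

Switching to another script then amounts to repointing `vars[VAR_LOCAL]`, while `vars[VAR_GLOBAL]` stays put.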
I'll explain the four types below - the names in parentheses will be used occasionally in the rest of the text:
This variable type is called "local" because it belongs to a specific script. Each script may have its own set of local variables, defined
by script block type 10. As long as the code from a specific script is running, the local variables for that script are "active" (pointed to
by the mentioned pointer).
Global variables, like the local variables, reside in script space (in fact, they are the local variables of script 0!). But the pointer to them remains constant
for the whole duration of the program.
Temporary variables are allocated by specific subroutines in a script. They reside on the PMachine stack and are allocated by the link opcode. The temp variables
are automatically discarded when the subroutine returns.
Parameter variables also reside on the stack. They contain information passed from one routine to another. Any routine in SCI is capable of taking a variable
number of parameters, if need be. This is possible because a list size is pushed as the first thing before calling a routine. In addition to this, a
frame size is passed to the call* functions.
While two adjacent variables may be entirely unrelated, the contents of an object are always related to one task. The object, like the variable tables,
provides storage space. This storage space is called properties. Depending on the instructions used, a property can be referred to by index into the
object structure, or by property IDs (PIDs). For instance, the name property has the PID 17h, but the offset 6. The property IDs are assigned by the
SCI compiler, and it is the "compatible" way of accessing object data. Whereas the offset method is used only internally by an object to access
its own data, the PID method is used externally by objects to read/write the data fields of other objects. The PID method is also used to call methods
in an object, either by the object itself, by another object, or by the SCI interpreter. Yes, this really happens sometimes.
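The two ways of reaching a property can be illustrated with a small C sketch. The `Obj` layout below (a table of PIDs parallel to the property storage) is purely hypothetical and not the real SCI object format; it only shows that offset 6 and PID 17h can name the same word.

```c
#include <stdint.h>

typedef struct {
    int count;               /* number of properties */
    const uint16_t *pids;    /* property ID of each slot (hypothetical layout) */
    uint16_t *values;        /* property storage, one word per slot */
} Obj;

/* Internal access: a byte offset into the property storage. */
uint16_t *prop_by_offset(Obj *o, int byte_offset)
{
    return &o->values[byte_offset / 2];
}

/* External access: search for the property ID; returns 0 if absent. */
uint16_t *prop_by_pid(Obj *o, uint16_t pid)
{
    for (int i = 0; i < o->count; i++)
        if (o->pids[i] == pid)
            return &o->values[i];
    return 0;
}
```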
The PMachine can be said to have a number of registers, although none of them can be accessed explicitly by script code. They are used/changed implicitly by the script opcodes:
Acc - the accumulator. Used for result storage and input for a number of opcodes.
The PMachine, apart from the actual instruction pointer, keeps a record of which object is currently executing.
The PMachine CPU potentially has 128 instructions (however, a couple of these are invalid and generate an error). Some of these instructions have a flag which specifies whether the opcode has byte- or word-sized operands (I will refer to this as variably-sized parameters, as opposed to constant parameters). Other instructions have only one calling form. These instructions simply disregard the operand size flag. Ideally, however, all script instructions should be prepared to take variably-sized operands. Yet another group of instructions take both a constant parameter and a variably-sized parameter. The format of an opcode byte is as follows:
bit 7-1: opcode number
bit 0: operand size flag
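Splitting an opcode byte into its two fields takes one shift and one mask. The function names in this C sketch are invented for illustration. Seven bits of opcode number give the 128 potential instructions mentioned above; for example, 0x34 and 0x35 are both ldi and differ only in the operand size flag.

```c
#include <stdint.h>

/* Bits 7-1: opcode number. */
unsigned opcode_number(uint8_t op) { return op >> 1; }

/* Bit 0: operand size flag. */
unsigned operand_size_flag(uint8_t op) { return op & 1u; }
```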
Certain instructions (in particular, branching ones) take relative addresses as a parameter. The actual address is calculated based on the instruction after the branching instruction itself. In this example, the bnt instruction, if the branch is made, jumps over the ldi instruction.
eq?
bnt +2
ldi byte 2
push
Relative addresses are signed values.
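As a small C sketch of the address calculation (the function name is hypothetical): the signed displacement is added to the address of the instruction that follows the branch. In the bnt example, the byte-sized ldi occupies two bytes, so a displacement of +2 lands on push.

```c
#include <stdint.h>

/* pc_after: address of the instruction following the branch.
 * relpos:   signed displacement taken from the operand. */
uint16_t branch_target(uint16_t pc_after, int16_t relpos)
{
    return (uint16_t)(pc_after + relpos);
}
```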
The callb and calle instructions take a so-called dispatch index as a parameter. This index is used to look up an actual script address, using the so-called
dispatch table. The dispatch table is located in script block type 7 in the script file. It is a series of words - the first one, as in so many other
places in the script file, is the number of entries.
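A dispatch table lookup could look roughly like this in C. The function names are hypothetical, the little-endian byte order is an assumption, and the sketch expects a pointer to the table contents (the count word), not to any block header.

```c
#include <stdint.h>

/* Read a 16-bit little-endian word (byte order is an assumption here). */
uint16_t read_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* table points at the dispatch table contents: a count word followed by
 * one address word per entry. Returns 1 and stores the address on
 * success, 0 if the index is out of range. */
int dispatch_lookup(const uint8_t *table, unsigned index, uint16_t *addr)
{
    uint16_t count = read_word(table);
    if (index >= count)
        return 0;
    *addr = read_word(table + 2 + 2 * index);
    return 1;
}
```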
In every call instruction, a value is included which determines the size of the parameter list, as an offset into the stack. This value discounts the list size pushed by the SCI code. For instance, consider this example from real SCI code:
pushi 3 ; three parameters passed
pushi 4 ; the screen flag
pTos x ; push the x property
pTos y ; push the y property
callk OnControl, 6
Notice that, although the callk line specifies 6 bytes of parameters, the kernel routine has access to the list size (which is at offset 8)!
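The arithmetic behind this can be sketched in C. The function name and the word-indexed view of the stack are assumptions, and the &rest modifier is ignored here. With four words pushed (8 bytes) and a frame size of 6, the frame starts 6 + 2 bytes below the stack pointer, which is exactly where the list size sits.

```c
#include <stdint.h>

/* sp_bytes:  byte offset of the first free stack slot
 * framesize: the byte count given to the call opcode
 * Returns the word index of the list size pushed by the script. */
int frame_start(int sp_bytes, int framesize)
{
    return (sp_bytes - (framesize + 2)) / 2;
}
```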
These are internal errors in the interpreter. They are usually caused by buggy script code. The PErrors end up displaying an "Oops!" box in
the original interpreter (it is interesting to see how Sierra likes to believe that PErrors are caused by the user - judging by the message "You
did something we weren't expecting"!). In the original interpreter, specifying -d on the command line causes it to give more detailed information
about PErrors, as well as activating the internal debugger if one occurs.
The key to finding a specific class lies in the class table. This class table resides in VOCAB.996, and contains the numbers of scripts that carry classes.
If a script has more than one class definition, the script number is repeated as necessary. Notice how each script number is followed by a zero word?
When the interpreter loads a script, it checks to see if the script has classes. If it does, a pointer to the object structure is put in this empty space.
The instructions are described below. I have used Dark Minister's text on the subject as a starting point, but many things have changed; stuff explained more thoroughly, errors corrected, etc. The first 23 instructions (up to, but not including, bt) take no parameters.
These functions are used in the pseudocode explanations:
pop(): sp -= 2; return *sp;
op 0x00: bnot (1 byte)
acc ^= 0xffff;
op 0x02: add (1 byte)
acc += pop();
op 0x04: sub (1 byte)
acc = pop() - acc;
op 0x06: mul (1 byte)
acc *= pop();
op 0x08: div (1 byte)
acc = pop() / acc;
Division by zero is caught => acc = 0.
op 0x0a: mod (1 byte)
acc = pop() % acc;
Modulo by zero is caught => acc = 0.
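The zero check for div and mod might be written like this in C. The function names are made up, and treating the operands as signed 16-bit values is an assumption of the sketch. Note the operand order: the popped value is the left-hand operand.

```c
#include <stdint.h>

/* acc = pop() / acc, with division by zero yielding 0. */
int16_t op_div(int16_t popped, int16_t acc)
{
    return acc == 0 ? 0 : (int16_t)(popped / acc);
}

/* acc = pop() % acc, with modulo by zero yielding 0. */
int16_t op_mod(int16_t popped, int16_t acc)
{
    return acc == 0 ? 0 : (int16_t)(popped % acc);
}
```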
op 0x0c: shr (1 byte)
acc = pop() >> acc;
op 0x0e: shl (1 byte)
acc = pop() << acc;
op 0x10: xor (1 byte)
acc ^= pop();
op 0x12: and (1 byte)
acc &= pop();
op 0x14: or (1 byte)
acc |= pop();
op 0x16: neg (1 byte)
acc = -acc;
op 0x18: not (1 byte)
acc = !acc;
op 0x1a: eq? (1 byte)
prev = acc; acc = (acc == pop());
op 0x1c: ne? (1 byte)
prev = acc; acc = !(acc == pop());
op 0x1e: gt? (1 byte)
prev = acc; acc = (pop() > acc);
op 0x20: ge? (1 byte)
prev = acc; acc = (pop() >= acc);
op 0x22: lt? (1 byte)
prev = acc; acc = (pop() < acc);
op 0x24: le? (1 byte)
prev = acc; acc = (pop() <= acc);
op 0x26: ugt? (1 byte)
acc = (pop() > acc);
op 0x28: uge? (1 byte)
acc = (pop() >= acc);
op 0x2a: ult? (1 byte)
acc = (pop() < acc);
op 0x2c: ule? (1 byte)
acc = (pop() <= acc);
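The pseudocode for gt? and ugt? looks identical, so here is a C sketch of the difference. It assumes, going by the instruction names, that the u-prefixed forms compare the operands as unsigned 16-bit values while the plain forms compare them as signed. The function names are hypothetical.

```c
#include <stdint.h>

/* gt?: compare the popped value with acc as signed 16-bit numbers. */
int op_gt(uint16_t popped, uint16_t acc)
{
    return (int16_t)popped > (int16_t)acc;
}

/* ugt?: the same comparison on unsigned 16-bit numbers. */
int op_ugt(uint16_t popped, uint16_t acc)
{
    return popped > acc;
}
```

With 0xffff on the stack and 1 in the accumulator, the two give opposite answers, since 0xffff is -1 when read as signed.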
op 0x2e: bt W relpos (3 bytes)
if (acc) pc += relpos;
op 0x30: bnt W relpos (3 bytes)
if (!acc) pc += relpos;
op 0x32: jmp W relpos (3 bytes)
pc += relpos;
op 0x34: ldi W data (3 bytes)
acc = data;
Sign extension is done for 0x35 if required.
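Sign extension of a byte-sized operand, as needed for opcode 0x35, is a one-liner in C. The function name is hypothetical.

```c
#include <stdint.h>

/* Widen a byte operand to a 16-bit value, preserving its sign. */
int16_t sign_extend_byte(uint8_t operand)
{
    return (int16_t)(int8_t)operand;
}
```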
op 0x36: push (1 byte)
push(acc)
op 0x38: pushi W data (3 bytes)
push(data)
Sign extension for 0x39 is performed where required.
op 0x3a: toss (1 byte)
pop();
For confirmation: Yes, this simply tosses the TOS value away.
op 0x3c: dup (1 byte)
push(*TOS);
op 0x3e: link W size (3 bytes)
sp += (size * 2);
op 0x40: call W relpos, B framesize (4 bytes)
(See description below) sp -= (framesize + 2 + &rest_modifier); &rest_modifier = 0;
This calls a script subroutine at the relative position relpos, setting up the ParmVar pointer first. ParmVar points to sp-framesize (but see also the &rest operation). The number of parameters is stored at word offset -1 relative to ParmVar.
op 0x42: callk W kfunct, B kparams (4 bytes)
sp -= (kparams + 2 + &rest_modifier); &rest_modifier = 0; (call kernel function kfunct)
op 0x44: callb W dispindex, B framesize (4 bytes)
(See description below) sp -= (framesize + 2 + &rest_modifier); &rest_modifier = 0;
This operation starts a new execution loop at the beginning of script 0, public method dispindex (each script comes with a dispatcher list (type 7) that identifies public methods). Parameters are handled as in the call operation.
op 0x46: calle W script, W dispindex, B framesize (6 bytes)
(See description below) sp -= (framesize + 2 + &rest_modifier); &rest_modifier = 0;
This operation starts a new execution loop at the beginning of script script's public method dispindex. The dispatcher list (the script's type 7 object) is used to dereference the requested method. Parameters are handled as described for the call operation.
op 0x48: ret (1 byte)
op 0x4a: send B framesize (2 bytes)
Send looks up the supplied selector(s) in the object pointed to by the accumulator. If the selector is a variable selector, it is read (to the accumulator) if it was sent for with zero parameters. If a parameter was supplied, this selector is set to that parameter. Method selectors are called with the specified parameters.
The selector(s) and parameters are retrieved from the stack frame. Send first looks up the selector ID at the bottom of the frame, then retrieves the number of parameters, and, eventually, the parameters themselves. This algorithm is iterated until all of the stack frame has been "used up". Example:
; This is an example for usage of the SCI send operation
pushi x ; push the selector ID of x
push1 ; 1 parameter: x is supposed to be set
pushi 42 ; That's the value x will get set to
pushi moveTo ; In this example, moveTo is a method selector.
push2 ; It will get called with two parameters-
push ; The accumulator...
lofss 17 ; ...and PC-relative address 17.
pushi foo ; Let's assume that foo is another variable selector.
push0 ; This will read foo and return the value in acc.
send 12 ; This operation does three quite different things.
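The way send walks its stack frame can be sketched in C. This version only counts the selectors it finds and does not perform any lookup or call; the function name and the word-indexed frame are assumptions. The frame built in the example above is nine words long: three for x, four for moveTo, two for foo.

```c
#include <stdint.h>

/* Walk a send frame of `words` 16-bit words: selector, argument count,
 * then that many arguments, repeated until the frame is used up.
 * Returns the number of selectors found, or -1 if the frame is malformed. */
int count_send_calls(const uint16_t *frame, int words)
{
    int pos = 0, calls = 0;
    while (pos < words) {
        if (pos + 2 > words)
            return -1;              /* no room for selector + count */
        int argc = frame[pos + 1];
        pos += 2 + argc;            /* skip selector, count, arguments */
        if (pos > words)
            return -1;              /* arguments run past the frame */
        calls++;
    }
    return calls;
}
```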
op 0x4c
op 0x50: class W function (3 bytes)
op 0x52
op 0x54: self B stackframe (2 bytes)
op 0x56: super W class, B stackframe (4 bytes)
op 0x58: &rest W paramindex (3 bytes)
function a(y,z) and function b(x,y,z)
function b wants to call function a with its own y and z parameters. Easy job, using the normal lsp instruction. Now suppose that both function a and b are designed to take a variable number of parameters:
function a(y,z,...) and function b(x,y,z,...)
Since lsp does not support register indirection, we can't just push the variables in a loop (as we would in C). Instead this function is used. In this case, the instruction would be &rest 2, since we want the copying to start from y (inclusive), the second parameter.
Note that the values are copied to the stack immediately. The &rest_modifier is set to the number of variables pushed afterwards.
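A C sketch of the copying performed by &rest. The function name, the separate `argc` argument and the word-indexed stack are assumptions for illustration. `first` is 1-based, so &rest 2 starts at the second parameter. The return value is the number of variables pushed, which is what the text above says the &rest_modifier is set to.

```c
#include <stdint.h>

/* params: the caller's parameters, params[0] being the first one
 * argc:   how many parameters the caller received
 * first:  1-based index of the first parameter to copy
 * stack, sp: destination stack and its next free word index
 * Returns the number of variables pushed. */
int op_rest(const uint16_t *params, int argc, int first,
            uint16_t *stack, int *sp)
{
    int pushed = 0;
    for (int i = first; i <= argc; i++) {
        stack[(*sp)++] = params[i - 1];
        pushed++;
    }
    return pushed;
}
```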
op 0x5a: lea W type, W index ( bytes)
The variable type is a bit-field used as follows:
bit 0: unused
bits 1-2: the number of the variable list to use
0 - globalVar
bit 3: unused
bit 4: set if the accumulator is to be used as additional index
short *vars[4];
int acc;
/* Returns the address of the selected variable. */
short *lea(int vt, int vi)
{
    return &((vars[(vt >> 1) & 3])[vt & 0x10 ? vi+acc : vi]);
}
op 0x5c: selfID (1 byte)
acc = object
op 0x5e
op 0x60: pprev (1 byte)
push(prev)
op 0x62: pToa W offset (3 bytes)
op 0x64: aTop W offset (3 bytes)
op 0x66: pTos W offset (3 bytes)
op 0x68: sTop W offset (3 bytes)
op 0x6a: ipToa W offset (3 bytes)
op 0x6c: dpToa W offset (3 bytes)
op 0x6e: ipTos W offset (3 bytes)
op 0x70: dpTos W offset (3 bytes)
op 0x72: lofsa W offset (3 bytes)
acc = pc + offset
Adds a value to the post-operation pc and stores the result in the accumulator.
op 0x74: lofss W offset (3 bytes)
push(pc + offset)
Adds a value to the post-operation pc and pushes the result on the stack.
op 0x76: push0 (1 byte)
push(0)
op 0x78: push1 (1 byte)
push(1)
op 0x7a: push2 (1 byte)
push(2)
op 0x7c: pushSelf (1 byte)
push(object)
op 0x7e
op 0x80 - 0xfe: [ls+-][as][gltp]i? W index (3 bytes)
Used as with all other opcodes with variably-sized parameters:
0: 16 bit parameter
The type of variable to operate on:
0: Global
Whether to use the accumulator or the stack for operations:
0: Accumulator
Whether to use the accumulator as a modifier to the supplied index:
0: Don't use accumulator as an additional index
The type of execution to perform:
0: Load the variable to the accumulator or stack
Always 1 (identifier for these opcodes)
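Decoding one of these opcodes into its fields could be sketched in C as below. The bit positions are an assumption: they follow the order in which the fields are listed above (from bit 0 up to bit 7) and agree with the shifts used in the lea example, but they are not spelled out in this text. The struct and function names are hypothetical, as is the order of the variable types and operations after value 0, which is taken from the [ls+-] and [gltp] pattern in the opcode name.

```c
#include <stdint.h>

typedef struct {
    unsigned byte_operand;  /* bit 0:    0 = 16 bit parameter            */
    unsigned var_type;      /* bits 1-2: 0 = global (then l, t, p)       */
    unsigned use_stack;     /* bit 3:    0 = accumulator                 */
    unsigned acc_index;     /* bit 4:    0 = no additional acc index     */
    unsigned operation;     /* bits 5-6: 0 = load (then store, inc, dec) */
} VarOp;

/* Returns 1 and fills *out for opcodes 0x80-0xff, 0 otherwise. */
int decode_var_op(uint8_t op, VarOp *out)
{
    if (!(op & 0x80))
        return 0;               /* bit 7 must be set for these opcodes */
    out->byte_operand = op & 1u;
    out->var_type     = (op >> 1) & 3u;
    out->use_stack    = (op >> 3) & 1u;
    out->acc_index    = (op >> 4) & 1u;
    out->operation    = (op >> 5) & 3u;
    return 1;
}
```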
[1] FreeSCI calls this the "Program Counter" or PC, which is the more general term.