I have purchased an XGS for the purpose of developing a Forth for it. This page details that effort.
I spent the weekend thinking about the XGS Forth. I read three tutorials (Rem PAC, the plasma demo, and the planes demo) and came up with a few ideas about timing. It seems to me that we want to integrate the Forth execution engine with the NTSC driving code instead of making it interrupt driven, because managing an interrupt system alongside the address bus logic looks quite difficult.
At the heart of the integration idea is executing instructions during the HORIZ BLANKING period. During this period, before each instruction is fetched, there would be a test against the RTCC: is there enough free time to execute another instruction? If yes, fetch the next instruction and go.
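A minimal C model of that per-instruction check, assuming an 8-bit RTCC that advances one tick per cycle and a fixed worst-case cost for any primitive. `run_hblank`, `hblank_end`, and `WORST_CASE_TICKS` are hypothetical names, and the value 20 is only a placeholder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical upper bound on the ticks any single primitive can take. */
#define WORST_CASE_TICKS 20u

/* Simulates one HBLANK window: starting at RTCC value `rtcc`, keep
   executing instructions (whose costs are in cost[]) while at least
   WORST_CASE_TICKS remain before `hblank_end`. Returns how many ran. */
unsigned run_hblank(uint8_t rtcc, uint8_t hblank_end,
                    const uint8_t *cost, unsigned n)
{
    unsigned executed = 0;
    while (executed < n) {
        /* Ticks left before active video; 8-bit subtraction handles RTCC wrap. */
        uint8_t remaining = (uint8_t)(hblank_end - rtcc);
        if (remaining < WORST_CASE_TICKS)
            break;                          /* not enough time: wait for next HBLANK */
        assert(cost[executed] <= WORST_CASE_TICKS);
        rtcc = (uint8_t)(rtcc + cost[executed]);  /* "execute" the instruction */
        executed++;
    }
    return executed;
}
```

Testing against the worst case rather than the next instruction's exact cost keeps the check to one subtract and compare, at the price of sometimes leaving a few ticks of HBLANK unused.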
Note that during the HBLANK period the PC address can be locked into the address register. But it occurred to me that we could cache instructions from the current PC location's "page" (where a page is a 4-bit address space plus 1 bit of HI/LO-ness, so 32 bytes or 16 words). When the PC changes pages, we just copy all 32 bytes into some of the 256 bytes of CPU RAM. Then we can execute out of the cache without requiring the address register to hold the current PC value; the cache only needs reloading when the PC changes pages. The load would be quite quick (32 bytes x 3 cycles each, or about 96 cycles, plus some overhead), and we don't have to worry about the PC in the address register until the PC changes pages.
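The page-cache idea can be sketched in C as follows. This is a model, not SX code: `mem` stands for external SRAM, `lines` for the 32 bytes of CPU RAM set aside for the cache, and `icache_t`/`icache_fetch` are hypothetical names. It also assumes each instruction word is stored lo byte first, which the notes don't specify.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A page is 16 words = 32 bytes: bit 0 of a byte address selects lo/hi,
   bits 1-4 select the word within the page, the rest is the page number. */
#define PAGE_BYTES 32u

typedef struct {
    uint8_t  lines[PAGE_BYTES]; /* copy of the current page (lives in CPU RAM) */
    uint32_t page;              /* page number currently cached */
    int      valid;             /* 0 until the first load */
    unsigned reloads;           /* how often we went out to external memory */
} icache_t;

/* Fetch the 16-bit instruction at word address `pc`. The whole 32-byte
   page is copied in only when pc moves to a different page. */
uint16_t icache_fetch(icache_t *c, const uint8_t *mem, uint32_t pc)
{
    uint32_t page = pc >> 4;                 /* 16 words per page */
    if (!c->valid || page != c->page) {
        memcpy(c->lines, mem + page * PAGE_BYTES, PAGE_BYTES);
        c->page = page;
        c->valid = 1;
        c->reloads++;
    }
    unsigned idx = (unsigned)(pc & 0xF) * 2u; /* byte offset within the page */
    /* Assumed little-endian: lo byte at the lower address. */
    return (uint16_t)(c->lines[idx] | (c->lines[idx + 1] << 8));
}
```

Every fetch inside a page is served from the copy, so the address registers are free for other work until `pc` crosses a page boundary.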
This would be particularly efficient for the @, !, MOVE, and COMPARE words, which need the address registers for their own memory accesses. The other nice benefit is that when an HBLANK region ends and the active HORIZ DRAW region begins, we can simply load the address of video memory into the address registers and start reading video memory. When HBLANK begins again, we just continue executing tokens from the Icache.
I -think- the instruction format that makes the most sense is 2 bits + 6 bits + 2 + 6. The first 2 bits select CALL, JUMP, ?? [LIT?], or NORMAL. Then there are 6 bits of token. The second 2 bits indicate NEXT, RETURN, RETURN if TOS is 0 (RTZ; whether it should pop or not, I don't know), or ??.
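One way to pull the fields out of a 16-bit instruction in the 2+6+2+6 format, as a C sketch. The bit order (mode in bits 15-14, first token in 13-8, exit bits in 7-6, second token in 5-0) and the numeric code for each mode are assumptions for illustration; the notes only fix the field widths and the set of modes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical numeric codes; only the names come from the notes. */
enum op_mode   { OP_NORMAL = 0, OP_CALL = 1, OP_JUMP = 2, OP_LIT = 3 /* "??" */ };
enum exit_mode { EX_NEXT = 0, EX_RETURN = 1, EX_RTZ = 2, EX_RESERVED = 3 };

typedef struct {
    enum op_mode   op;    /* bits 15..14 */
    uint8_t        tok1;  /* bits 13..8  */
    enum exit_mode tail;  /* bits 7..6   */
    uint8_t        tok2;  /* bits 5..0   */
} insn_2626;

insn_2626 decode_2626(uint16_t w)
{
    insn_2626 d;
    d.op   = (enum op_mode)((w >> 14) & 0x3);
    d.tok1 = (uint8_t)((w >> 8) & 0x3F);
    d.tail = (enum exit_mode)((w >> 6) & 0x3);
    d.tok2 = (uint8_t)(w & 0x3F);
    return d;
}

uint16_t encode_2626(enum op_mode op, uint8_t tok1, enum exit_mode ex, uint8_t tok2)
{
    assert(tok1 < 64 && tok2 < 64);   /* tokens are 6 bits wide */
    return (uint16_t)((op << 14) | (tok1 << 8) | (ex << 6) | tok2);
}
```

For example, `encode_2626(OP_CALL, 0x2A, EX_RTZ, 0x15)` packs to 0x6A95.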
Alternatively, 1 + 5 + 5 + 5 might work out, too. This would allow for a maximum of 3 x 16 == 48 instructions per page. Much better density with minimal effort to extract the tokens.
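The 1+5+5+5 layout decodes with nothing more than shifts and masks. In this sketch the meaning of the single flag bit is left open, as in the notes, and the field order (flag in bit 15, first token in the highest 5 bits) is assumed; `insn_1555` and `decode_1555` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t flag;    /* bit 15: meaning not yet defined */
    uint8_t tok[3];  /* three 5-bit tokens, executed in order tok[0..2] */
} insn_1555;

insn_1555 decode_1555(uint16_t w)
{
    insn_1555 d;
    d.flag   = (uint8_t)(w >> 15);
    d.tok[0] = (uint8_t)((w >> 10) & 0x1F);
    d.tok[1] = (uint8_t)((w >> 5) & 0x1F);
    d.tok[2] = (uint8_t)(w & 0x1F);
    return d;
}

/* Tokens per 16-word page: 3 per word x 16 words. */
enum { TOKENS_PER_PAGE = 3 * 16 };
```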
Others have done well with 5 bits for instruction tokens in Forth; I've never figured out how to make it work.
However, for JUMP and CALL it makes sense to allow only 12 bits of address and to require that word definitions start on page boundaries.
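If the 12-bit JUMP/CALL field is read as a page number (my assumption; the notes don't say how the 12 bits are interpreted), page alignment turns it into a full address with a multiply by the page size. With 16 words (32 bytes) per page, 4096 pages x 32 bytes = 128 KB can be reached. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS_PER_PAGE 16u

/* Word address of the first instruction of the target definition,
   assuming every definition starts at word 0 of a page. */
uint32_t target_word_addr(uint16_t field12)
{
    assert(field12 < 4096u);              /* only 12 bits are available */
    return (uint32_t)field12 * WORDS_PER_PAGE;
}

/* Byte address in external memory (2 bytes per instruction word). */
uint32_t target_byte_addr(uint16_t field12)
{
    return target_word_addr(field12) * 2u;
}
```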
I think a design along these lines will give us the best execution speed.
We keep the TOS separately in the global registers for speed. We can access the global registers in 1 cycle using the "direct" addressing mode. Accessing both the lo and hi bytes of the TOS looks like this:
        mov W,TOS_lo
        mov W,TOS_hi
Accessing the current element at the top of the push down stack looks like:
        mov W,DSP
        mov FSR,W
        mov W,INDF      ; Fetch the lo 8 bits of the top element on the push down stack
        inc FSR
        mov W,INDF      ; Fetch the hi 8 bits of the top element on the push down stack
We use little-endian byte order for numbers on the stacks. The reason is that the SP points to a single byte of each element (remember: pre-decr, post-incr), and we want it to point to the LO byte so that after an ADD of the LO bytes we are ready to tackle the HI bytes.
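To make the byte layout concrete, here is a small C model rather than the SX code itself: `ram[]` stands in for CPU RAM, `dpush_tos` and `add` mirror the assembly routines of the same names in the example code, and `forth_t`, `lit`, and `tos` are hypothetical helpers for the sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t ram[256];        /* stands in for the 256 bytes of CPU RAM */
    uint8_t dsp;             /* points at the LO byte of the element under TOS */
    uint8_t tos_lo, tos_hi;  /* TOS kept outside the stack array */
} forth_t;

/* Mirrors dpush_tos: save TOS so a new value can be loaded into it. */
void dpush_tos(forth_t *f)
{
    f->dsp--;                        /* pre-decrement: HI slot */
    f->ram[f->dsp] = f->tos_hi;
    f->dsp--;                        /* LO slot, lower address */
    f->ram[f->dsp] = f->tos_lo;
}

/* LIT: push the old TOS, then load the new value into TOS. */
void lit(forth_t *f, uint16_t v)
{
    dpush_tos(f);
    f->tos_lo = (uint8_t)v;
    f->tos_hi = (uint8_t)(v >> 8);
}

/* Mirrors add: because dsp points at the LO byte, the LO bytes are added
   first and their carry is folded into the HI byte before the HI add. */
void add(forth_t *f)
{
    unsigned lo = f->tos_lo + f->ram[f->dsp];
    if (lo > 0xFF)
        f->tos_hi++;                 /* snc / inc TOS_hi */
    f->tos_lo = (uint8_t)lo;
    f->tos_hi = (uint8_t)(f->tos_hi + f->ram[(uint8_t)(f->dsp + 1)]);
    f->dsp += 2;                     /* post-increment: pop the element */
}

uint16_t tos(const forth_t *f) { return (uint16_t)(f->tos_lo | (f->tos_hi << 8)); }
```

Pushing 0x01FF and then 0x0001 leaves 0xFF at $FC and 0x01 at $FD (lo byte at the lower address); `add` then yields 0x0200 with the carry crossing from the lo byte into the hi byte.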
Allocation of Registers:
Global
$A == DSP
$B == RSP
$C == PC (lo)
$D == PC (hi)
$E == TOS (lo)
$F == TOS (hi)
-------------
$A0-$FF == Data stack (48 elements)
$70-$9F == Return stack (24 levels of nesting)
$50-$6F == I Cache
$20-$4F == Video text line buffer
$00-$1F == Forth interpreter global variables:
WRD - A pointer to the current forth word being executed
ICINDX - ICache Index (0..31)
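A quick C check of the arithmetic in the RAM list above. This is a sketch: the 2-bytes-per-entry figures for the stacks and the I cache are my reading of the 16-bit cells, and `region_t`, `regions`, `region_entries`, and `regions_tile_ram` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* RAM regions from the allocation list, as inclusive [first, last]. */
typedef struct { const char *name; unsigned first, last, bytes_per_entry; } region_t;

static const region_t regions[] = {
    { "Forth globals",    0x00, 0x1F, 1 },
    { "Video text line",  0x20, 0x4F, 1 },
    { "I cache",          0x50, 0x6F, 2 },  /* 16 instruction words */
    { "Return stack",     0x70, 0x9F, 2 },  /* 16-bit return addresses */
    { "Data stack",       0xA0, 0xFF, 2 },  /* 16-bit cells */
};

unsigned region_entries(const region_t *r)
{
    return (r->last - r->first + 1) / r->bytes_per_entry;
}

/* Returns 1 if the regions are contiguous and cover $00-$FF exactly once. */
int regions_tile_ram(void)
{
    unsigned next = 0x00;
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        if (regions[i].first != next)
            return 0;
        next = regions[i].last + 1;
    }
    return next == 0x100;
}
```

It confirms the list's own numbers: 48 data stack elements, 24 return stack levels, 16 cached instruction words, with no gaps or overlaps in the 256 bytes.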
Example code:
; dpush_tos
;
; The current Top Of Stack value is stored in Global Registers
; and needs to be securely saved onto the Data Stack. An example
; of why this is necessary is the LIT command. That command will
; replace the TOS with a new value. To do that, it must push the
; current TOS and then it can load the TOS registers with a new
; value.
dpush_tos:
        dec DSP                 ; pre-decr stack pointer
        mov W,DSP               ; W == DSP
        mov FSR,W               ; FSR == W
        mov W,TOS_hi            ; W == TOS_hi, etc.
        mov INDF,W              ; hi byte goes at the higher address
        dec FSR
        mov W,TOS_lo
        mov INDF,W              ; lo byte goes at the lower address
        dec DSP                 ; DSP now points at the lo byte
        rts

; add
;
; The current Top Of Stack value is added to the Next Of Stack,
; or the value currently in the stack array. The sum replaces
; the TOS and the stack element is popped.
;
        mov W,DSP
        mov FSR,W               ; FSR -> lo byte of the stack element
        mov W,TOS_lo
        add W,INDF              ; W = TOS_lo + stack lo, sets carry
        snc                     ; skip if no carry out of the lo byte
        inc TOS_hi              ; propagate the carry into the hi byte
        mov TOS_lo,W
        inc FSR                 ; FSR -> hi byte of the stack element
        mov W,TOS_hi
        add W,INDF
        mov TOS_hi,W
        inc DSP                 ; pop the element (2 bytes)
        inc DSP
        rts

; complement
;
; One's complement of the top of stack
        not TOS_lo
        not TOS_hi
        rts

; negate
;
; Two's complement of the top of stack
; Perform a NOT and then increment the value;
; skip the hi if the Status.Zbit is not set,
; that is if we didn't roll over the lo byte.
        not TOS_lo
        not TOS_hi
        inc TOS_lo
        snb STATUS.Zbit
        inc TOS_hi
        rts

; dup
;
; DUP is implemented by 'pushing' the current TOS.
        jmp dpush_tos

; swap
;
; Swap the current TOS and the value actually at the DSP.
; This algorithm uses the byte below the current top element
; on the stack as temporary storage.
        mov W,DSP
        mov FSR,W
        mov W,INDF              ; Pick up the lo on the stack
        dec FSR
        mov INDF,W              ; Store in temp spot
        inc FSR
        inc FSR
        mov W,INDF              ; Pick up the hi on the stack
        dec FSR
        mov INDF,W              ; Store in lo on the stack
        inc FSR
        mov W,TOS_hi            ; Pick up the TOS_hi
        mov INDF,W              ; Store in hi on the stack
        dec FSR
        mov W,INDF              ; Pick up the stack's hi (parked in the lo slot)
        mov TOS_hi,W            ; Store in the TOS_hi
        mov W,TOS_lo            ; Pick up the TOS_lo
        mov INDF,W              ; Store in lo on the stack
        dec FSR
        mov W,INDF              ; Pick up the stack's lo from the temp spot
        mov TOS_lo,W            ; Store in the TOS_lo
        rts                     ; SWAP complete

A shorter version (by 2 instructions):

        mov W,DSP
        mov FSR,W
        mov W,INDF              ; Pick up the lo on the stack
        dec FSR
        mov INDF,W              ; Store in temp spot
        inc FSR
        mov W,TOS_lo
        mov INDF,W              ; Store TOS_lo onto stack
        inc FSR
        mov W,INDF              ; Pick up hi off stack
        mov TOS_lo,W            ; Unused spot to save hi
        mov W,TOS_hi
        mov INDF,W              ; Store in hi on the stack
        mov W,TOS_lo            ; This is the hi from the stack
        mov TOS_hi,W            ; Hi is completely swapped
        dec FSR
        dec FSR
        mov W,INDF              ; Pick up the lo from the stack [temp]
        mov TOS_lo,W            ; Lo is completely swapped
        rts                     ; SWAP complete

; init_stacks
;
; This routine will initialize the data and return stacks.
; The data stack occupies $A0 - $FF. It's a pre-decrement/
; post-increment stack.
; The return stack occupies $70 - $9F; same on the pre-decr/
; post-incr.
; NOTE: The TOS held in global registers is -not- accounted for
; by the current stack pointer. Think of the stack pointer as
; covering the items on the stack, with an implied TOS on top
; that is used by the stack ops.
        clr DSP                 ; first push pre-decrements to $FF
        mov W,#$A0              ; first push pre-decrements to $9F
        mov RSP,W
        rts

; CALL_push
;
; Push the current Program Counter onto the return stack.
        dec RSP
        mov W,RSP
        mov FSR,W
        mov W,PC_hi
        mov INDF,W
        dec FSR
        mov W,PC_lo
        mov INDF,W
        dec RSP                 ; RSP now points at the lo byte, as in dpush_tos
        rts

; Fetch
;
; This routine will fetch a primitive instruction from the Icache.
; It will increment the Icache index. Upon entry, if the Icache
; index is 32 or greater (bit 5 set) then the cache will be reloaded.
fetch:
        _bank $0
        snb ICINDX.5            ; If bit 5 is clear, then skip
        jmp loadcache           ; Reload the cache from the current PC's page
        mov