XGS: XGameStation

XGameStation Programming

I have purchased an XGS for the purpose of developing a Forth for it. This page details that effort.

I've been thinking about the XGS Forth over the weekend. I read three tutorials: Rem PAC, plasma demo, and planes demo. I came up with a few ideas about timing. It seems to me that we want to integrate the Forth execution engine with the NTSC driving code, instead of just making it interrupt driven. I believe this because managing an interrupt system on top of the address bus logic looks quite difficult.

At the heart of the integration idea is the notion of executing instructions during the HORIZ BLANKING period. During this period, before each instruction is picked up there would be a test against the RTCC: is there enough free time to execute another instruction? If yes, then pick up the next instruction and go.
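The RTCC test described above can be modelled in a few lines of Python. This is only a sketch of the idea, not SX code: the function name, the cycle costs, and the notion of a single worst-case cost per instruction are all assumptions made for illustration.

```python
def hblank_loop(instruction_costs, rtcc_start, rtcc_deadline, worst_case):
    """Run instructions until the worst-case one might overrun HBLANK.

    All numbers are hypothetical cycle counts; the real budget depends
    on the NTSC driver timing, which is not specified here.
    """
    rtcc = rtcc_start
    executed = 0
    for cost in instruction_costs:
        # The test against the RTCC: is there enough free time left
        # to execute another (worst-case) instruction?
        if rtcc + worst_case > rtcc_deadline:
            break
        rtcc += cost          # the instruction runs and time advances
        executed += 1
    return executed, rtcc


# Four pending instructions, a 90-cycle window, worst case of 40 cycles.
print(hblank_loop([10, 20, 30, 40], 0, 90, 40))
```

Testing against the worst case rather than the actual cost of the next instruction keeps the check cheap, at the price of sometimes leaving a few cycles unused.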

Note that during the HBLANK period the PC address can be locked into the address register. But it occurred to me that we could cache instructions from the current PC "page" (where a page is a 4-bit address space with 1 bit of HI/LO-ness, so 32 bytes or 16 words). When the PC changes pages, we just suck all 32 bytes into some of the 256 bytes of CPU RAM and execute out of the cache, so the address register no longer has to hold the current PC value. The load should be quite quick (32 bytes x 3 cycles each, or so, plus some overhead), and we don't have to worry about the PC in the address register until the PC changes pages again.
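Here is a rough Python model of the page cache idea, assuming external memory is just a byte array and that a page is the 32-byte block containing the PC. The class and attribute names are made up for the sketch. With the estimate above, a reload costs about 32 x 3 = 96 cycles plus overhead.

```python
PAGE_SIZE = 32  # bytes per page (16 words), as described above


class ICache:
    """Hypothetical model: reload 32 bytes whenever the PC changes pages."""

    def __init__(self, memory):
        self.memory = memory              # stands in for external SRAM
        self.page = None                  # page currently held in the cache
        self.lines = bytearray(PAGE_SIZE)
        self.reloads = 0                  # how many page loads have happened

    def fetch(self, pc):
        page = pc // PAGE_SIZE
        if page != self.page:
            # PC moved to a new page: copy the whole page into CPU RAM.
            base = page * PAGE_SIZE
            self.lines[:] = self.memory[base:base + PAGE_SIZE]
            self.page = page
            self.reloads += 1
        # Execute out of the cache; no external address is needed here.
        return self.lines[pc % PAGE_SIZE]


memory = bytes(range(256))
cache = ICache(memory)
cache.fetch(0)
cache.fetch(31)      # same page, no reload
cache.fetch(32)      # new page, one more reload
print(cache.reloads)
```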

This would certainly be efficient around the @, !, MOVE, and COMPARE words. The other nice benefit is that when we end an HBLANK region and enter the active HORIZ DRAW region, we can just load the address of the video memory into the address registers and start reading video memory. Then when we enter HBLANK again, we just continue executing tokens from the Icache.

I -think- the instruction format that makes the most sense is 2 bits + 6 bits + 2 bits + 6 bits. The first 2 bits select CALL, JUMP, ?? [LIT?], or NORMAL. Then there are 6 bits of token. The second 2 bits indicate NEXT, RETURN, RETURN if TOS is 0 (RTZ; to pop or not to pop, I don't know), or ??.
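To make the 2 + 6 + 2 + 6 layout concrete, here is a small Python decoder. The bit positions (first field in the most significant bits) are an assumption, since the layout within the 16-bit word is not pinned down above, and no numeric codes are assigned to CALL, JUMP, NEXT, and so on.

```python
def decode_2_6_2_6(word):
    """Split a 16-bit word into (op, token_a, flow, token_b).

    Assumed layout, most significant bit first:
      2 bits op | 6 bits token | 2 bits flow | 6 bits token
    """
    op = (word >> 14) & 0x3        # CALL / JUMP / ?? / NORMAL selector
    token_a = (word >> 8) & 0x3F   # first 6-bit token
    flow = (word >> 6) & 0x3       # NEXT / RETURN / RTZ / ?? selector
    token_b = word & 0x3F          # second 6-bit token
    return op, token_a, flow, token_b


print(decode_2_6_2_6(0b11_000101_01_111111))
```

Each field comes out with one shift and one mask, so extraction stays cheap on an 8-bit CPU where the two bytes of the word are handled separately.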

Alternatively, 1 + 5 + 5 + 5 might work out, too. This would allow for a maximum of 3 x 16 == 48 instructions per page. Much better density with minimal effort to extract the tokens.
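The 1 + 5 + 5 + 5 alternative can be sketched the same way. Again the bit order is an assumption, and the meaning of the leading flag bit is left open because it is not defined above.

```python
WORDS_PER_PAGE = 16                      # one 32-byte page holds 16 words
TOKENS_PER_PAGE = 3 * WORDS_PER_PAGE     # three 5-bit tokens per word


def decode_1_5_5_5(word):
    """Split a 16-bit word into (flag, [token0, token1, token2]).

    Assumed layout, most significant bit first:
      1 bit flag | 5 bits token | 5 bits token | 5 bits token
    """
    flag = (word >> 15) & 0x1
    tokens = [(word >> shift) & 0x1F for shift in (10, 5, 0)]
    return flag, tokens


print(decode_1_5_5_5(0b1_00001_00010_00011), TOKENS_PER_PAGE)
```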

Others have done well with 5 bits for instruction tokens in Forth; I've never figured out how to make it work.

However, for JUMP and CALL it makes sense to allow only 12 bits of address and to require that word definitions start on page boundaries.
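One way to read this, and it is only an interpretation, is that the 12-bit field is a page number rather than a byte address. Under that assumption a JUMP or CALL target is the field times the 32-byte page size, which the following sketch works through. The function name is hypothetical.

```python
PAGE_SIZE = 32          # bytes per page
ADDRESS_BITS = 12       # width of the JUMP/CALL address field


def call_target(field):
    """Byte address of a page-aligned word, given a 12-bit page number."""
    if not 0 <= field < (1 << ADDRESS_BITS):
        raise ValueError("address field must fit in 12 bits")
    return field * PAGE_SIZE


# Total reach under this interpretation: 4096 pages of 32 bytes.
REACH_BYTES = (1 << ADDRESS_BITS) * PAGE_SIZE
print(call_target(1), call_target(0xFFF), REACH_BYTES)
```

The cost of this scheme is padding: a word that ends mid-page wastes the rest of that page.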

I think a design along these lines will give us the fastest execution.

We keep a separate TOS in the global registers for speed. We can access the global registers in 1 cycle using the "direct" addressing mode. Reading both the lo and hi bytes of the TOS element looks like this:

	mov	W,TOS_lo
	mov	W,TOS_hi

Accessing the current element at the top of the push down stack looks like:

	mov	W,DSP
	mov	FSR,W
	mov	W,INDF		; Fetch the lo 8 bits of the
				; top element on the push down stack
	inc	FSR
	mov	W,INDF		; Fetch the hi 8 bits of the
				; top element on the push down stack

We use little-endian byte order for numbers on the stacks. The stack pointer (remember: pre-decr, post-incr) points at the LO byte of the top element, so after an ADD of the LO bytes a single increment leaves us ready to tackle the HI bytes.
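The byte order argument is easier to see with a worked example. The Python below models a 16-bit add done one byte at a time, lo first, with the carry fed into the hi byte. It is a model of the arithmetic only; the function name and the `ram` byte array are stand-ins, not part of the design.

```python
def add16_le(ram, sp, tos_lo, tos_hi):
    """Add TOS to the 16-bit little-endian value at ram[sp], ram[sp + 1].

    Returns (new_tos_lo, new_tos_hi, new_sp). The stack pointer moves
    up by two bytes because the second operand is consumed.
    """
    lo_sum = tos_lo + ram[sp]            # sp points at the LO byte
    carry = lo_sum >> 8                  # 1 if the lo add overflowed
    hi_sum = tos_hi + ram[sp + 1] + carry
    return lo_sum & 0xFF, hi_sum & 0xFF, (sp + 2) & 0xFF


ram = bytearray(256)
ram[0xF0] = 0xFF     # lo byte of 0x00FF
ram[0xF1] = 0x00     # hi byte of 0x00FF
print(add16_le(ram, 0xF0, 0x01, 0x00))   # 0x0001 + 0x00FF
```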

Allocation of Registers:

Global
$A == DSP
$B == RSP
$C == PC (lo)
$D == PC (hi)
$E == TOS (lo)
$F == TOS (hi)
-------------
$A0-$FF == Data stack (48 elements)
$70-$9F == Return stack (24 levels of nesting)
$50-$6F == I Cache
$20-$4F == Video text line buffer
$00-$1F == Forth interpreter global variables:
           WRD - A pointer to the current forth word being executed
           ICINDX - ICache Index (0..31)
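As a sanity check on the table above, the region sizes can be computed from the address ranges. The sketch assumes each stack element is 2 bytes (a 16-bit cell); the dictionary keys are just labels for this example.

```python
# Byte ranges copied from the allocation table (inclusive bounds).
REGIONS = {
    "data_stack": (0xA0, 0xFF),
    "return_stack": (0x70, 0x9F),
    "icache": (0x50, 0x6F),
    "text_line_buffer": (0x20, 0x4F),
    "forth_globals": (0x00, 0x1F),
}
CELL_BYTES = 2  # assumed: one stack element is a 16-bit cell


def size(region):
    """Number of bytes in a region."""
    lo, hi = REGIONS[region]
    return hi - lo + 1


def cells(region):
    """Number of 16-bit cells that fit in a region."""
    return size(region) // CELL_BYTES


TOTAL = sum(size(name) for name in REGIONS)
print(cells("data_stack"), cells("return_stack"), size("icache"), TOTAL)
```

The ranges tile the full 256 bytes with no gaps, and the I Cache region is exactly one 32-byte page.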

Example code:


; dpush_tos
;
; The current Top Of Stack value is stored in Global Registers
; and needs to be securely saved onto the Data Stack.  An example
; of why this is necessary is the LIT command.  That command will
; replace the TOS with a new value.  To do that, it must push the
; current TOS and then it can load the TOS registers with a new
; value.
dpush_tos:
	dec	DSP		; pre-decr stack pointer
	mov	W,DSP		; W == DSP
	mov	FSR,W		; FSR == W
	mov	W,TOS_hi	; W == TOS_hi, etc.
	mov	INDF,W
	dec	FSR
	mov	W,TOS_lo
	mov	INDF,W
	dec	DSP
	rts

; add
;
; The current Top Of Stack value is added to the Next Of Stack,
; or the value currently in the stack array.
; 	
	mov	W,DSP
	mov	FSR,W
	mov	W,TOS_lo
	add	W,INDF
	snc
	inc	TOS_hi
	mov	TOS_lo,W
	inc	FSR
	mov	W,TOS_hi
	add	W,INDF
	mov	TOS_hi,W
	inc	DSP
	inc	DSP
	rts

; complement
;
; One's complement of the top of stack
	not	TOS_lo
	not	TOS_hi
	rts

; negate
;
; Two's complement of the top of stack
; Perform a NOT and then increment the value;
; skip the hi if the Status.Zbit is not set,
; that is if we didn't roll over the lo byte.
	not	TOS_lo
	not	TOS_hi
	inc	TOS_lo
	snb	STATUS.Zbit
	inc	TOS_hi
	rts
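For checking the negate logic by hand, here is the same sequence modelled in Python: complement both bytes, increment the lo byte, and bump the hi byte only when the lo byte rolled over to zero. The function name is made up for the sketch.

```python
def negate16(lo, hi):
    """Two's complement of a 16-bit value held as separate lo/hi bytes."""
    lo ^= 0xFF                 # not TOS_lo
    hi ^= 0xFF                 # not TOS_hi
    lo = (lo + 1) & 0xFF       # inc TOS_lo
    if lo == 0:                # Z set: the lo byte rolled over
        hi = (hi + 1) & 0xFF   # inc TOS_hi
    return lo, hi


print(negate16(0x01, 0x00))    # negating 1
print(negate16(0x00, 0x01))    # negating 256
```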

; dup
;
; DUP is implemented by 'pushing' the current TOS.
	jmp	dpush_tos

; swap
;
; Swap the current TOS and the value actually at the DSP.
; This algorithm uses the byte below the current top element
; on the stack as temporary storage.
	mov	W,DSP
	mov	FSR,W
	mov	W,INDF		; Pick up the lo on the stack
	dec	FSR
	mov	INDF,W		; Store in temp spot
	inc	FSR
	inc	FSR
	mov	W,INDF		; Pick up the hi on the stack
	dec	FSR
	mov	INDF,W		; Park it in the lo slot on the stack
	inc	FSR
	mov	W,TOS_hi	; Pick up the TOS_hi
	mov	INDF,W		; Store in hi on the stack
	dec	FSR
	mov	W,INDF		; Pick up the parked hi from the lo slot
	mov	TOS_hi,W	; Store in the TOS_hi
	mov	W,TOS_lo	; Pick up the TOS_lo (FSR is already at the lo slot)
	mov	INDF,W		; Store in lo on the stack
	dec	FSR
	mov	W,INDF		; Pick up the saved lo from the temp spot
	mov	TOS_lo,W	; Store in the TOS_lo
	rts			; SWAP complete


A shorter version (by 2 instructions):

	mov	W,DSP
	mov	FSR,W
	mov	W,INDF		; Pick up the lo on the stack
	dec	FSR
	mov	INDF,W		; Store in temp spot
	inc	FSR
	mov	W,TOS_lo
	mov	INDF,W		; Store TOS_lo onto stack
	inc	FSR
	mov	W,INDF		; Pick up hi off stack
	mov	TOS_lo,W	; Unused spot to save hi
	mov	W,TOS_hi
	mov	INDF,W		; Store in hi on the stack
	mov	W,TOS_lo	; This is the hi from the stack
	mov	TOS_hi,W	; Hi is completely swapped
	dec	FSR
	dec	FSR
	mov	W,INDF		; Pick up the lo from the stack [temp]
	mov	TOS_lo,W	; Lo is completely swapped
	rts			; SWAP complete

; init_stacks
;
; This routine will initialize the data and return stacks
; The data stack occupies $C0 - $FF.  It's a pre-decrement/
; post-increment stack.
; The return stack occupies $A0 - $BF; same on the pre-decr/
; post-incr.
; NOTE: The TOS held in global registers is -not- accounted for
; by the current stack pointer.  Think of the pointer as tracking
; the items in the stack array; the TOS is implied in stack ops.
	clr	DSP
	mov	W,#$C0		
	mov	RSP,W
	rts

; CALL_push
;
; Push the current Program Counter onto the return stack.
	dec	RSP
	mov	W,RSP
	mov	FSR,W
	mov	W,PC_hi
	mov	INDF,W
	dec	FSR
	mov	W,PC_lo
	mov	INDF,W
	dec	RSP		; Leave RSP pointing at the lo byte, as dpush_tos does
	rts

; Fetch
;
; This routine will fetch a primitive instruction from the Icache.
; It will increment the Icache index.  Upon entry, if the Icache
; index is 32 or greater (bit 5 set) then the cache will be reloaded.
fetch:
	_bank	$0
	snb	ICINDX.5	; If bit 5 is clear, then skip
	jmp	loadcache	; Reload the cache from the current PC's page
	mov