Action's guide to AGA fixing software!

Don't you hate it? You upgrade your Amiga to a fancy 1200 or 4000, and a lot of the old software you used to use no longer works! Well, there are a number of people out there fixing games, so here is a brief description of what is generally required to get games and demos working:

24 Bit Addresses
Blitter
CPU Caches
CPU Prefetch
Disk based protection
Dodgy Code
Faulty Memory Detection
Freezes on keypresses
Keyboard Interrupts
Movem instruction
Self Modifying Code
Stackframes are different on each processor
VBR (Vector Base Register)
Weird ROM Accesses

Please note that this section is called "AGA fixing", although most of the problems also occur on upgraded (accelerated) ECS configurations. "680x0 fixing" may be a more suitable name, but that title does not look as nice as the current one! So no complaints please!

Greetings must go to Tachyon, Legionary, N.O.M.A.D, Jean François Fabre, Bert Jahn, Galahad/Fairlight and numerous others for their help in this area. Thanks a lot guys!

24 Bit Addresses

On CPUs with a full 32-bit address bus, such as the 68020, 68030, 68040 and 68060, all 32 bits of an address are used, whereas the 68000 and the 68EC020 (not the full 68020) only take the lower 24 bits into consideration when accessing memory, for data or instruction fetch.

E.g. on a 68000, if you want to jump to $4000, you can code either of:

        jmp     $4000               ; the simplest
        jmp     $xx004000           ; xx = anything different from zero
                                    ; (the stupidest)

So jmp $ff004000 will jump to $4000 (the PC will be equal to $ff004000, but the instructions will be fetched from $4000 onwards) and it won't crash.

Conversely, as true 32-bit CPUs don't mask the most significant byte, the same instruction jmp $ff004000 executed on a 68040 will cause the CPU to fetch the instructions from $FF004000, which will most likely cause a superb crash.

"But the programmers never code jmp $ff004000 instead of jmp $4000", I hear you say. Yes, but they often use address tables (for fast switch/case) like this:

        move.l  d0,d1
        lsl.l   #2,d1
        lea     addresstable,a0
        move.l  (a0,d1.l),a1
        jsr     (a1)            ; Jump to fetched address
        move.w  #$95,d1
        moveq.l #0,d0

addresstable:
        dc.l    $00004000,$00004046,$0000502A,...

A very convenient technique, except if the programmer has the stupid idea of using the unused most significant byte of an address (for instance the first byte at addresstable) to store one byte of data, such as a counter. On a 32-bit CPU the JSR will then only be correct while that byte is 0; otherwise it will crash.

This kind of error is very hard to find, and once found it still has to be patched. It was the cause of the crashes in Xenon2 and Z-Out, for instance, but only when a special bonus was taken.

On Agony it was harder to detect: some enemies could not be killed on a 68060, even with all caches off. This was still a 24 bit problem.

Once found, you can modify the code as follows:

        move.l  d0,d1
        lsl.l   #2,d1
        lea     addresstable,a0
        move.l  (a0,d1.l),a1
        jsr     My24BitPatch    ; 6 bytes: replaces jsr (a1) and move.w #$95,d1
        moveq.l #0,d0

The JSR to an absolute address takes 6 bytes, against 2 for JSR (A1), so it also overwrites the 4-byte move.w #$95,d1 that followed. Copy the original code you overwrote into the patch routine, and if the overwritten instructions add up to more than 6 bytes, pad the rest with NOPs:

My24BitPatch:
        move.l  d0,-(a7)        ; save d0
        move.l  a1,-(a7)        ; save a1
        move.l  a1,d0
        and.l   #$00ffffff,d0   ; filter the MSB (only with data registers)
        move.l  d0,a1
        jsr     (a1)            ; jump
        move.l  (a7)+,a1        ; in case the game uses MSB of a1 (suckers!!)
        move.l  (a7)+,d0        ; restore d0
        move.w  #$95,d1         ; original game code that the jsr overwrote
        rts                     ; return

This example was adapted from the Xenon 2 patch by Jean François Fabre.

Blitter

This is a very frequent problem on the Amiga. You try a game, and the character flashes, or the game crashes after a few seconds, and the screen is corrupt.

In most cases, the game can be played, but the graphic bugs are annoying. One would think the AGA chipset is the cause but it's not. For proof, test the faulty game on an accelerated A2000 and the result is the same.

The problem is that programmers often don't wait for the previous blit to finish before starting another. This was no concern on an A500/68000, because the 68000 is so slow that the blit was always over by the time the next one was set up, but when you upgrade to an A1200/020 you find out who the good coders were.

Be aware that some blitter problems do not show up even on 68060 processors! This is due to the chipmem access speed on certain cards such as the 68060 Blizzard card, which is terribly slow compared to even a standard A1200/020 with caches on or a Blizzard 68030-IV card!

There is also a BLTPRI setting whose behaviour seems to change depending on whether SetPatch has been run or not.

To find them, you've got to search for write accesses to the $dff058 blitter register (BLTSIZE, the write that starts the blit). They can be found in various flavours in the code:

        move.w  d1,$dff058

or:

        lea      $dff000,a5
        ...
        move.w   #$56,($58,a5)

or even:

        lea      $dff048,a6
        ...
        move.w   #$56,($10,a6)

As you can see, it can be a real pain in the arse to find these instructions! Once you do find one, search for more identical instructions in the code. This is because programmers are generally lazy and if they get some code working, they will copy and paste it throughout the rest of the program.

To fix blitter problems, you've got to make the CPU wait before or after the blit is performed. It sounds natural that if you wait after, you'll lose CPU cycles since non-blitter related stuff has to wait. That can slow down the game a lot.

If you wait before the blit, non-blitter code will be able to execute in parallel with the DMA blit, and the CPU will wait only in the case the old blit is not over when you reach the new blit instruction. This synchronization is necessary between the CPU and Blitter.

To wait for blitter operation to complete:

       btst     #6,$dff002      ; dmaconr
wait:  btst     #6,$dff002      ; test twice to fix old bug
       bne.b    wait            ; loop while the busy bit (bit 14 of dmaconr) is set
       <make the blit>

If you have the graphics library open, it is safer to call the WaitBlit() function than code your own - this should be guaranteed to work for all processors. Unfortunately, most games hit the hardware directly so you usually have to insert the code above anyway :(
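When all you have found is the write to $dff058, the quickest patch is the slower "wait after" variant, because you don't need to locate the code that sets up the blit. The following is only a sketch under that assumption: the label BlitAndWait is made up, and it relies on the absolute-address form move.w d1,$dff058 being 6 bytes long, exactly the size of a jsr to an absolute address.

```asm
; In the game code, overwrite the 6-byte instruction
;       move.w  d1,$dff058
; with the 6-byte instruction
;       jsr     BlitAndWait

BlitAndWait:
        move.w  d1,$dff058      ; original instruction: write BLTSIZE, start the blit
        btst    #6,$dff002      ; first test of dmaconr (result ignored, old bug)
.busy:  btst    #6,$dff002      ; bit 6 of the high byte = bit 14 of dmaconr
        bne.b   .busy           ; loop while the blitter is still busy
        rts
```

Note that the condition codes on return come from the btst and not from the move.w, which only matters if the game tests the flags right after starting the blit. The other addressing flavours are shorter than 6 bytes, so there you have to steal a neighbouring instruction as well and copy it into the patch.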

CPU Caches

The CPU caches cause a lot of problems with older software. Self-modifying code and CPU delay loops can fail on a 68020 or anything above it. Turning off the caches helps the software to run correctly.

To make software run faster, turn on the caches. Remember the data cache is not enabled on an Amiga 4000/030/040 until the SetPatch command has run.

Programmers should use the CacheClearU() or CacheClearE() routines on KS36+ machines whenever code has been loaded or modified in memory.
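As a rough sketch, a patch that still has the OS alive (ExecBase valid at $4) could flush the caches like this after modifying code. The label FlushCaches is made up, the two equates are the values I know from the exec includes, and the version check uses V37 because that is what the autodocs give for CacheClearU(); treat all three as assumptions to check against your own includes.

```asm
_LVOCacheClearU equ     -636            ; exec.library vector offset (assumed)
LIB_VERSION     equ     20              ; offset of lib_Version in a library base

FlushCaches:
        movem.l d0-d1/a0-a1/a6,-(a7)    ; the OS call may trash d0-d1/a0-a1
        move.l  4.w,a6                  ; ExecBase
        cmp.w   #37,LIB_VERSION(a6)     ; old Kickstart without the function?
        blt.b   .skip                   ; then there is nothing we can call
        jsr     _LVOCacheClearU(a6)     ; flush all caches
.skip:  movem.l (a7)+,d0-d1/a0-a1/a6
        rts
```

Call it once after the last byte of code has been written or patched, not inside a loop, since flushing the caches is slow.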

CPU Prefetch

All processors of the 68K family have a prefetch feature. The processor assumes that the code will be executed without break of sequence, so it prefetches the instructions to avoid memory accesses, and this feature cannot be disabled, unlike the cache feature which can be controlled. So disabling the caches will not solve all the self-modifying code problems.

On a 68000, this code will work properly:

        move.b  #1,moveinst+3   ; the byte operand is the low byte of the extension word
        nop
moveinst:
        move.b  #0,D0
        ...

The value in D0 will be 1 (self-modifying code).

On a 68020 this still seems to work, at least the first time, and all the time if you disable the instruction cache.

On a 68060, this code will not work (D0 will be equal to 0!!) whether the caches are on or off. This can be harmless but can also lead to strange behaviour if the instruction dynamically modified is a branch!

In this case there's no other way than patching the code 'by hand', breaking the instruction flow between the modification and the modified instruction (e.g. with a TRAP or a BSR.B).

Hint: Coders insert NOPs in the code like the example above to be sure prefetch will be knocked out on a 68000. Search for NOPs in the code, and you'll be surprised to find interesting things like CPU dependent loops, prefetch and cracked software :)

Dodgy Code

Sometimes you have to wonder how people who write software ever managed to create a program which runs. Check out the following code from an old Oracle intro:

        move.l  #4,a0
        move.l  (a0),a0
        move.l  (a0),a6
        ...proceed to use a6 as GfxBase...

How often will GfxBase happen to be the 2nd library down the chain? Another classic example is from the game Final Blow, cracked by Crystal. Their intro is a standard AmigaDOS executable, yet the bootblock loads the file at $40000 and then does a jsr $40000. Slight problem: the first $20-odd bytes of the file are AmigaDOS hunk headers, not code. It's a miracle this code ever ran at all. Now I know why Jean François Fabre hates crackers so much :)

Disk based protection

Surprisingly enough, most games will not work on non-68000 computers due to shitty disk based protection and dodgy loaders! Often the game itself is reasonably well written and would have worked had the disk format been standard. For example, Rob Northen Copylock v1.0 fails on anything higher than a 68010, so many old (original) games will not work!

Remove the protection and you will often get the game working! Contact your local cracker!

Faulty Memory Detection

A problem sometimes encountered is faulty chip memory detection, caused by the "unexpected" fact that AGA Amigas have 2MB of chip memory. (Programmers should not assume anything about memory on a computer but often do!)

Some games try to find fast memory at $200000:

        move.l  #$aaaaaaaa,$200000    ; Poke in $200000
        cmp.l   #$aaaaaaaa,$200000    ; Re-read to check valid address
        bne     NoFastAt200000
        ...

This detection works provided the chipmem size does not exceed 1MB ($0 to $FFFFF). With an AGA Amiga like the A1200, chip memory can be found from $0 to $1FFFFF.

When you poke in $200000 on an A1200 without fastmem at $200000, there is a mirror effect and the write address is decoded as $0. So the re-read is OK, and the game trusts that there is at least 512K of memory from $200000 to $27FFFF. It actually stores its data from $0 to $7FFFF and crashes very efficiently, as program code and stack are in that zone 99% of the time.

The "safe" thing to do would have been:

       move.l   #0,$0
       move.l   #$aaaaaaaa,$200000    ; Poke in $200000
       cmp.l    #$aaaaaaaa,$200000    ; Re-read to check valid address
       bne      NoFastAt200000
       move.l   $0,d0                 ; cmp cannot compare memory with memory
       cmp.l    $200000,d0            ; Test the mirror effect
       beq      NoFastAt200000

This way the mirror (or modulo) effect is detected. This error was found and corrected by Jean François Fabre in Lotus Turbo Challenge 2 and 3, from Gremlin.

Freezes on keypresses

Some games work fine until you press a key. Then they freeze and you're forced to reboot. But the music is still playing, and some animation can continue!

That sounds strange and it is! Keyboard interrupts have a priority level of 2, while VBL interrupts (often used by tracker routines) have a level of 3, which explains why the music can continue playing. The keyboard interrupt was not properly acknowledged, so it is triggered again and again. The main program can't continue; only higher-level interrupts can run.

This is often caused by acknowledging the interrupt too early, long before the rte:

KbInt:  move.w  #8,$dff09c      ; Acknowledge interrupt
        ...
        ...
        rte

Moving the acknowledge instruction just before the rte can be enough. If this does not work, try replacing #8 by #$7FFF.

Interrupt 3 can behave the same way too. Use the same solution (try #$70 before trying #$7FFF).

These problems were noticed (and fixed) in Ninja Spirit, R-Type 2 and Z-Out!

Keyboard Interrupts

Keyboard routines are usually set up to run from the level 2 interrupt. When a keypress is detected, the level 2 interrupt is fired. Between detecting the key and acknowledging the keypress, there needs to be a time delay. The ways to achieve this are (ranked from best to worst):

CIA Timer
Raster positioning
A CPU Loop

The CIA timer is virtually never used in (older) games, and the raster positioning method is hardly ever found. Most games resort to a stupid loop like this between detecting and acknowledging the keypress:

        move.b  $bfec01,d0      ; Keypress
        not.b   d0
        ror.b   #1,d0           ; d0 now contains the raw key dump
        moveq   #36,d1
loop    dbf     d1,loop         ; Stupid loop

The problem with this is that faster CPUs make the loop almost non-existent, so the keypress is not acknowledged. The computer then locks up. The easiest fix is to steal 6 bytes from the code (here the 2-byte moveq and the 4-byte dbf, with dumb being the address of the moveq just before loop), and insert a jsr to some patch code which has a working delay:

        lea     .KBFix(pc),a0       ;Your keyboard fix routine
        move.w  #$4eb9,dumb         ;Insert JSR opcode (absolute long)
        move.l  a0,loop             ;...followed by the address of the fix

        ...

.KBFix  movem.l d0-d1/a0,-(sp)      ;Horizontal raster timing code
        lea     (_custom+vhposr),a0
        moveq   #3-1,d0             ;Wait because handshake min 75µs
.1      move.b  (a0),d1
.2      cmp.b   (a0),d1             ;One line is 63.5µs
        beq.b   .2
        dbf     d0,.1               ;Min=127µs Max=190.5µs
        movem.l (sp)+,d0-d1/a0
.RTS    rts                         ;Back to original code

Movem instruction

The following movem instruction yields different results when run on a 68000/010 and on a 68020/030/040/060:

        movem.x rl,-(an)

There is a difference if the register used in predecrement mode is also contained in the register list. For the 68020, 68030 and 68040 the value written to memory is the initial register value decremented by the size of the operation. The 68000 and 68010 write the initial register value (not decremented).

Because this type of construction is not very useful, no problems with existing software are known.
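To make the difference concrete, here is a small made-up example (the labels MovemTest and Buffer are mine) where the address register is part of its own register list:

```asm
MovemTest:
        lea     Buffer+8(pc),a0 ; a0 points just past the 8-byte buffer
        movem.l a0-a1,-(a0)     ; a1 is stored at Buffer+4, a0 at Buffer
        rts

; The long word now stored at Buffer (the saved a0) is:
;   68000/68010:  Buffer+8  (the initial value of a0)
;   68020 and up: Buffer+4  (the initial value minus 4, the operation size)
; On every processor a0 itself ends up pointing at Buffer.

Buffer: ds.l    2
```

A program that later reloads a0 from that saved value gets a different pointer depending on the processor, which is the only way this can bite you.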

Self Modifying Code

This is a major problem found in a surprisingly large number of games, including several reasonably new ones. Basically the programmers write code which modifies itself while it is running. As more efficient CPUs came out, they had caches on them which store the last few instructions executed. When the code modifies itself, the change is written to memory (or sits in the data cache), but the instruction cache can still hold the old copy of the instruction, so the CPU carries on executing the unmodified code. Therefore the game goes kaboom after a while!

Stackframes are different on each processor

The stackframes created by the processor on interrupts and exceptions differ between the members of the 68k family. On the 68000 a stackframe is 6 bytes, except on Bus and Address Error. It contains the saved SR at (a7), followed by the saved PC at (2,a7). On all other processors (68010+) the minimal stackframe is 8 bytes and additionally contains a word at (6,a7) holding the stackframe format and the vector offset. This four-word stackframe (format $0) is created for "Trap #xx" and interrupts on the 68010 to 68060. The stackframes for other exceptions are different on each processor.

The RTE instruction works differently on the 68000 and on the 68010+. On a 68000 it simply restores the SR and PC and continues program execution at the interrupted address. On the 68010+ it additionally reads the format word and frees the rest of the stackframe according to its format.

Some programs push an address (PC) and an SR and then execute an RTE instruction. This works on a 68000 only; on a 68010+ the RTE takes whatever word lies above the PC as the format word, with undefined results.

If a program contains this awful code, you have to emulate it. Sometimes it may be enough to replace the rte with an rtr, which pops a condition code word and a PC on every processor.

Another way is to make the stackframe independent of the type of processor by using a TRAP (the JSR equivalent of RTE), so that the processor builds the frame itself.
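A sketch of what the emulation can look like, assuming the game is running in supervisor mode (RTE is privileged). All labels and the SR value $2000 are made up for illustration, and the three routines are alternatives, not a sequence:

```asm
; What the game did (68000 only): three-word frame, then RTE
FakeReturn000:
        pea     Target(pc)      ; PC to continue at
        move.w  #$2000,-(a7)    ; SR to load (supervisor, no interrupts masked)
        rte

; Same thing for a 68010 or higher: RTE also wants the format word
FakeReturn010:
        clr.w   -(a7)           ; format $0 (four-word frame), vector offset 0
        pea     Target(pc)      ; PC to continue at
        move.w  #$2000,-(a7)    ; SR to load
        rte

; Processor independent, if only the condition codes need loading
FakeReturnRTR:
        pea     Target(pc)      ; PC to continue at
        clr.w   -(a7)           ; condition code word popped by RTR
        rtr

Target: rts
```

The format word must be pushed first, because it sits above the PC in the frame. The 68010+ variant leaves a stray word on the stack when run on a plain 68000, so a patch has to pick the right variant for the processor or use the RTR form.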

VBR (Vector Base Register)

All 680x0 machines have a 1K "exception table" starting at address $0 when the machine first boots. On a 68010 or higher it can subsequently be moved to another area of memory, by copying it with a simple loop and pointing the Vector Base Register (VBR) at the copy.

Assuming it's located at $0, then $0 - $ff is where the main action's at, on all 68k machines. The address range $100 - $3ff is reserved by Motorola for "user vectors". This address space is unused on the Amiga, but might be used on other machines. The point is that $0 - $ff are the important vectors on all 68k machines, and $100 - $3ff *might* also be used, or then again, might not be.
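If your patch needs to know where the table currently is, the usual trick is to read the VBR in supervisor mode. This is a sketch that assumes the OS is still alive (ExecBase valid at $4). The label GetVBR is made up, the equates are the values I know from the exec includes, and the movec is written as a dc.l so that it assembles even when the assembler is in 68000 mode.

```asm
_LVOSupervisor  equ     -30         ; exec.library vector offset (assumed)
AttnFlags       equ     296         ; offset of AttnFlags in ExecBase (assumed)
AFB_68010       equ     0           ; bit set on a 68010 or higher

; Returns the base of the exception table in d0 (0 on a plain 68000).
GetVBR: movem.l a5-a6,-(a7)
        moveq   #0,d0               ; 68000: the table is fixed at $0
        move.l  4.w,a6              ; ExecBase
        btst    #AFB_68010,AttnFlags+1(a6)
        beq.b   .done               ; no VBR on a 68000
        lea     .getvbr(pc),a5
        jsr     _LVOSupervisor(a6)  ; run .getvbr in supervisor mode
.done:  movem.l (a7)+,a5-a6
        rts

.getvbr:
        dc.l    $4e7a0801           ; movec vbr,d0
        rte
```

A vector is then found at the returned base plus its usual offset, e.g. the level 2 interrupt vector at $68 from the base instead of at the absolute address $68.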

Weird ROM Accesses

Some games rely on certain values in ROM to work properly. For example, the great game "Gods" works OK from OS1.3 to OS3.0, but fails on OS3.1 on the same computer! (It works OK with a softkicked OS3.0, but fails with an OS3.1 chip!)

The explanation: the game reads from $FFxxxx (I don't remember the exact address now) and I really don't know why. By luck, it worked until OS3.0 but the game seems not to like the values returned by OS3.1!

Some other games even call ROM addresses directly (Gravity Force) or poke into undocumented exec structures, copperlists...

A possible explanation of those accesses in non-DOS games is a protection against hardware freezers like Action Replay or Nordic Power.

Another example: There is a check in $F00000 for the value $1111 in the game Pinball Dreams, maybe to detect such a device but:

It does not detect an Action Replay III cartridge.
It will detect a Blizzard 1260 card and crash!

If you find routines looking like this, remove them, or it may cause problems to other users of your patches. For the moment, the solution is:

Look.
Get a good value (using a 1.3 kick, and/or another Amiga).
Remove.
Imitate.