Don't you hate it! You upgrade your Amiga to a fancy 1200 or 4000, and a lot
of the old software you used to use no longer works! Well there are a number
of people out there fixing games, so here is a brief description of what
is generally required to get games and demos working:
Greetings must go to Tachyon, Legionary, N.O.M.A.D, Jean François Fabre, Bert Jahn, Galahad/Fairlight and numerous others for their help in this area. Thanks a lot guys!
On CPUs with a full 32-bit address bus, such as the 68020, 68030, 68040 and 68060, an address is decoded on all 32 bits, whereas the 68000, 68010 and 68EC020 only take the lower 24 bits into account when accessing memory, for data or instruction fetch.
For example, on a 68000, if you want to jump to $4000, you can code either of:
jmp $4000 ; the simplest
jmp $xx004000 ; xx = anything different from zero
; (the stupidest)
So jmp $ff004000 will jump to $4000 on a 68000: the PC will be equal to $ff004000, but the instructions will be fetched from $4000 onwards and it won't crash.
Conversely, since true 32-bit CPUs don't mask the most significant byte, the same instruction jmp $ff004000 executed on a 68040 will make the CPU fetch its instructions from $FF004000, which will most likely cause a superb crash.
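The difference boils down to a mask on the address bus. This is a minimal Python model, not Amiga code; the function name `bus_address` is hypothetical, and it only shows which address actually reaches memory for a given 32-bit value.

```python
def bus_address(addr: int, address_bits: int) -> int:
    """Return the address that actually reaches memory for a 32-bit value.

    address_bits is 24 for the 68000/68010/68EC020 and 32 for CPUs
    with a full 32-bit address bus (68020, 68030, 68040, 68060).
    """
    return addr & ((1 << address_bits) - 1)


if __name__ == "__main__":
    target = 0xFF004000  # jmp $ff004000 from the example above
    print(f"68000: fetch from ${bus_address(target, 24):08X}")  # $00004000
    print(f"68040: fetch from ${bus_address(target, 32):08X}")  # $FF004000
```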
"But the programmers never code jmp $ff004000 instead of jmp $4000", I hear you say. Yes, but they often use address tables (for fast switch/case) like this:
move.l d0,d1
lsl.l #2,d1
lea addresstable,a0
move.l (a0,d1.l),a1
jsr (a1) ; Jump to fetched address
move.w #$95,d1
moveq.l #0,d0
addresstable:
dc.l $00004000,$00004046,$0000502A,...
A very convenient technique, except when the programmer has the stupid idea of using the unused most significant byte of an address (for instance at location addresstable) to store one byte of data, such as a counter. The JSR will then only work on a 32-bit CPU if that byte is 0; otherwise it will crash.
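The same idea applied to the address table: a sketch in Python (not 68k code) with hypothetical helper names, where the counter value 3 is arbitrary. It shows why the hidden counter is invisible on a 24-bit bus, fatal on a 32-bit one, and why clearing the top byte, as the patch further down does, restores the target.

```python
MASK_24 = 0x00FFFFFF  # same mask as the and.l #$00ffffff in the patch


def store_counter(entry: int, counter: int) -> int:
    """Hide a one-byte counter in the unused top byte of a table entry."""
    return ((counter & 0xFF) << 24) | (entry & MASK_24)


def jsr_target(entry: int, address_bits: int) -> int:
    """Address a JSR through this table entry really reaches."""
    return entry & ((1 << address_bits) - 1)


if __name__ == "__main__":
    table = [0x00004000, 0x00004046, 0x0000502A]  # entries from the example
    table[0] = store_counter(table[0], 3)          # the game bumps its counter
    print(f"24-bit bus: ${jsr_target(table[0], 24):08X}")  # $00004000, works
    print(f"32-bit bus: ${jsr_target(table[0], 32):08X}")  # $03004000, crash
    print(f"masked:     ${table[0] & MASK_24:08X}")        # $00004000 again
```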
This kind of error is very hard to find, and then you must patch it. It caused the crashes in Xenon 2 and Z-Out, for instance, but only when a special bonus was collected.
In Agony it was harder to detect, as some enemies could not be killed on a 68060, even with all caches off. This was still a 24-bit problem.
Once found, you can modify the code as follows:
move.l d0,d1
lsl.l #2,d1
lea addresstable,a0
move.l (a0,d1.l),a1
jsr My24BitPatch ; replaces jsr (a1) and move.w #$95,d1
moveq.l #0,d0
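A JSR to an absolute address takes 6 bytes, against 2 for JSR (A1), so it also overwrites the 4-byte move.w #$95,d1 that follows. Copy any original code you overwrote into the patch routine, and pad with NOPs when the overwritten instructions don't add up to exactly 6 bytes: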
My24BitPatch:
move.l a1,-(a7) ; save a1, MSB included
move.l d0,-(a7) ; save d0
move.l a1,d0
and.l #$00ffffff,d0 ; filter the MSB (only with data registers)
move.l d0,a1
move.l (a7)+,d0 ; restore d0 before the call, the routine may use it
jsr (a1) ; jump
move.l (a7)+,a1 ; in case the game uses MSB of a1 (suckers!!)
move.w #$95,d1 ; original game code overwritten by the jsr
rts ; return
This example was adapted from the Xenon 2 patch by Jean François Fabre.
Missing blitter waits are a very frequent problem on the Amiga. You try a game, and the character flickers, or the game crashes after a few seconds and the screen is corrupt.
In most cases the game can still be played, but the graphics bugs are annoying. One would think the AGA chipset is the cause, but it's not: test the faulty game on an accelerated A2000, which has no AGA, and the result is the same.
The problem is that programmers often don't wait for the previous blit to finish before starting another one. This was no concern on an A500/68000, because the 68000 was so slow that the previous blit was always over in time, but when you upgrade to an A1200/020 you find out who the good coders were.
Be aware that some blitter problems don't show up even on 68060 processors! This is due to the chip memory access speed on certain cards such as the 68060 Blizzard card, which is terribly slow compared to even a standard A1200/020 with caches on or a Blizzard 68030-IV card, so the CPU rarely gets ahead of the blitter.
The BLTPRI setting (the blitter priority bit in DMACON) also seems to change behaviour depending on whether SetPatch has been run or not.
To find these bugs, you've got to search for write accesses to the BLTSIZE blitter register ($dff058), since writing it starts the blit. They come in various flavours in the code:
move.w d1,$dff058
or:
lea $dff000,a5
...
move.w #$56,($58,a5)
or even:
lea $dff048,a6
...
move.w #$56,($10,a6)
As you can see, it can be a real pain in the arse to find these instructions! Once you do find one, search for more identical instructions in the code. This is because programmers are generally lazy and if they get some code working, they will copy and paste it throughout the rest of the program.
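Hunting for these writes can be partly automated on a disassembly listing. The following is a hypothetical Python helper, not an existing tool: it assumes one instruction per line in the simplified Motorola syntax used above, remembers the constant loaded into each address register by the last `lea`, and flags writes whose effective address is $dff058. It ignores any other instruction that might change the register in between, so treat its hits as leads to check by hand.

```python
import re

BLTSIZE = 0xDFF058  # writing this register starts a blit

# Simplified syntax: "lea $dff000,a5", "move.w d1,$dff058",
# "move.w #$56,($58,a5)". Sources containing commas are not handled.
LEA = re.compile(r"lea\s+\$([0-9a-f]+),(a[0-7])\b", re.I)
ABS = re.compile(r"move\.w\s+[^,]+,\$([0-9a-f]+)\s*$", re.I)
DISP = re.compile(r"move\.w\s+[^,]+,\(\$([0-9a-f]+),(a[0-7])\)", re.I)


def find_bltsize_writes(lines):
    """Return the indices of lines that write BLTSIZE."""
    bases = {}  # address register -> value loaded by the last lea
    hits = []
    for i, line in enumerate(lines):
        code = line.split(";", 1)[0].strip()  # drop comments
        if m := LEA.search(code):
            bases[m.group(2).lower()] = int(m.group(1), 16)
        elif m := ABS.search(code):
            if int(m.group(1), 16) == BLTSIZE:
                hits.append(i)
        elif m := DISP.search(code):
            reg = m.group(2).lower()
            if reg in bases and bases[reg] + int(m.group(1), 16) == BLTSIZE:
                hits.append(i)
    return hits


if __name__ == "__main__":
    listing = [
        "move.w d1,$dff058",
        "lea $dff000,a5",
        "move.w #$56,($58,a5)",
        "lea $dff048,a6",
        "move.w #$56,($10,a6) ; same register, different base",
    ]
    print(find_bltsize_writes(listing))  # [0, 2, 4]
```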
To fix blitter problems, you've got to make the CPU wait for the blitter, either before or after starting each blit. If you wait after, you lose CPU cycles, since non-blitter code has to wait for the blit too. That can slow down the game a lot.
If you wait before the blit, non-blitter code can execute in parallel with the blitter DMA, and the CPU only waits if the previous blit is still running when it reaches the next blit. Either way this synchronization between the CPU and the blitter is necessary, but waiting before is usually cheaper.
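The cost difference can be seen in a toy timing model. This Python sketch is an illustration under simplifying assumptions: arbitrary time units, and no modelling of the blitter stealing chip bus cycles from the CPU. Each step is some non-blitter CPU work followed by starting a blit.

```python
def total_time(steps, wait_before: bool) -> int:
    """Time until all CPU work is done and the last blit is over.

    steps is a list of (cpu_work, blit_length) pairs: the CPU does
    cpu_work units of non-blitter work, then starts a blit lasting
    blit_length units.
    """
    t = 0          # current CPU time
    blit_end = 0   # time at which the blit in flight finishes
    for cpu_work, blit_length in steps:
        t += cpu_work                  # the blitter keeps running meanwhile
        if wait_before:
            t = max(t, blit_end)       # only wait if the old blit is busy
            blit_end = t + blit_length
        else:
            blit_end = t + blit_length
            t = blit_end               # wait after: idle for the whole blit
    return max(t, blit_end)


if __name__ == "__main__":
    steps = [(5, 10)] * 4
    print(total_time(steps, wait_before=False))  # 60
    print(total_time(steps, wait_before=True))   # 45
```

Waiting before overlaps the CPU work with the previous blit, so each step only costs the longer of the two rather than their sum.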
To wait for a blitter operation to complete:
btst #6,$dff002 ; dmaconr: bit 6 of this byte is BBUSY (bit 14)
wait: btst #6,$dff002 ; test twice to fix old bug
bne.b wait ; BBUSY still set: wait until blitter DMA is over
<make the blit>
If you have the graphics library open, it is safer to call its WaitBlit() function than to code your own; it should be guaranteed to work on all processors. Unfortunately, most games hit the hardware directly, so you usually have to insert the code above anyway :(
The CPU caches cause a lot of problems with older software. Self-modifying code and CPU delay loops often fail on a 68020 and above. Turning off the caches helps the software to run correctly.
To make software run faster, turn on the caches. Remember that the data cache is not enabled on an Amiga 4000/030/040 until the SetPatch command has been run.
Programmers should use the exec.library CacheClearU() or CacheClearE() functions on KS36+ machines.
All processors of the 68K family have a prefetch feature. The processor assumes that the code will be executed sequentially, so it fetches instructions ahead of time to avoid waiting for memory. Unlike the caches, prefetch cannot be disabled, so disabling the caches will not solve all the self-modifying code problems.
On a 68000, this code will work properly:
move.b #1,moveinst+3 ; the immediate byte is the low byte of the extension word
nop
moveinst:
move.b #0,D0
...
The value in D0 will be 1 (self-modifying code).
On a 68020 this still seems to work, at least the first time, and all the time if you disable the instruction cache.
On a 68060, this code will not work (D0 will be equal to 0!!) whether the caches are on or off. This can be harmless but can also lead to strange behaviour if the instruction dynamically modified is a branch!
In this case there's no other way than patching the code 'by hand', by breaking the instruction flow (e.g. with a TRAP or a BSR.B).
Hint: Coders insert NOPs in the code like the example above to be sure prefetch will be knocked out on a 68000. Search for NOPs in the code, and you'll be surprised to find interesting things like CPU dependent loops, prefetch and cracked software :)
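The prefetch effect can be illustrated with a toy CPU in Python. This is a deliberately crude model, not a cycle-accurate one: prefetch depth is counted in whole instructions (real prefetch works in words, and the 68060 adds a deep pipeline), and the instruction names are made up. Depth 1 stands in for the 68000 case the NOP is meant to handle; a larger depth stands in for a deeper pipeline.

```python
def run_toy_cpu(program, prefetch_depth: int) -> int:
    """Run a toy program and return the final value of d0.

    Instructions are tuples:
      ("nop",)             - does nothing
      ("move", value)      - d0 = value
      ("poke", idx, value) - self-modifying write: set the immediate of
                             instruction idx in memory to value
    Before instruction i executes, the next prefetch_depth instructions
    are copied into a queue; pokes to them go to RAM, not to the queue.
    """
    memory = [list(ins) for ins in program]   # the code as it sits in RAM
    queue = {}                                 # index -> prefetched copy
    d0 = 0
    for i in range(len(memory)):
        ins = queue.pop(i, None) or list(memory[i])
        for j in range(i + 1, min(i + 1 + prefetch_depth, len(memory))):
            queue.setdefault(j, list(memory[j]))
        if ins[0] == "move":
            d0 = ins[1]
        elif ins[0] == "poke":
            memory[ins[1]][1] = ins[2]         # writes RAM only
    return d0


if __name__ == "__main__":
    prog = [("poke", 2, 1), ("nop",), ("move", 0)]  # the example above
    print(run_toy_cpu(prog, 1))  # 1: the NOP absorbed the prefetch
    print(run_toy_cpu(prog, 2))  # 0: the stale copy was executed
```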
Sometimes you have to wonder how people who write software ever managed to create a program which runs. Check out the following code from an old Oracle intro:
move.l #4,a0
move.l (a0),a0
move.l (a0),a6
...proceed to use a6 as GfxBase...
How often will GfxBase happen to be the 2nd library down the chain? Another classic example is from the game Final Blow, cracked by Crystal. Their intro is a standard AmigaDOS executable, and the bootblock loads it at $40000 and then does a jsr $40000. Slight problem: the first $20-odd bytes are the AmigaDOS hunk header, not code, so the jsr lands in the header. It's a miracle this code ever ran at all. Now I know why Jean François Fabre hates crackers so much :)
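For a one-hunk executable, the header holds HUNK_HEADER ($3F3), an empty resident-library list, the table size, the first and last hunk numbers, one hunk size, then HUNK_CODE ($3E9) and the code length: eight longwords, exactly $20 bytes before the first instruction. This Python sketch walks that header to find the real code start; `first_code_offset` is a hypothetical name, and it assumes the first hunk is a code hunk and ignores overlays.

```python
import struct

HUNK_HEADER = 0x3F3
HUNK_CODE = 0x3E9


def first_code_offset(data: bytes) -> int:
    """Return the file offset of the first code byte of an AmigaDOS executable."""
    pos = 0

    def read_long() -> int:
        nonlocal pos
        (value,) = struct.unpack_from(">I", data, pos)  # big-endian longword
        pos += 4
        return value

    if read_long() != HUNK_HEADER:
        raise ValueError("not an AmigaDOS executable")
    while (n := read_long()) != 0:   # resident library names, 0-terminated
        pos += 4 * n
    read_long()                      # hunk table size
    first, last = read_long(), read_long()
    pos += 4 * (last - first + 1)    # one size longword per hunk
    if read_long() & 0x3FFFFFFF != HUNK_CODE:   # top bits may be memory flags
        raise ValueError("first hunk is not HUNK_CODE")
    read_long()                      # code length in longwords
    return pos


if __name__ == "__main__":
    # Minimal one-hunk executable: header, HUNK_CODE (rts, nop), HUNK_END
    exe = struct.pack(">8I", HUNK_HEADER, 0, 1, 0, 0, 1, HUNK_CODE, 1)
    exe += b"\x4e\x75\x4e\x71" + struct.pack(">I", 0x3F2)
    print(f"code starts at ${first_code_offset(exe):X}")  # $20
```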
Surprisingly enough, most games will not work on non-68000 computers due to shitty disk-based protection and dodgy loaders! Often the game itself is reasonably well written and would have worked had the disk format been standard. For example, Rob Northen Copylock v1.0 fails on anything higher than a 68010, so many old (original) games will not work!
Remove the protection and you will often get the game working! Contact your local cracker!
A problem sometimes encountered is faulty memory detection, caused by the "unexpected" fact that AGA Amigas have 2MB of chip memory. (Programmers should not assume anything about the memory of a computer, but often do!)
Some games try to find fast memory at $200000:
move.l #$aaaaaaaa,$200000 ; Poke in $200000
cmp.l #$aaaaaaaa,$200000 ; Re-read to check valid address
bne NoFastAt200000
...
This detection works provided the chip memory does not exceed 1MB ($0 to $FFFFF). On an AGA Amiga like the A1200, chip memory runs from $0 to $1FFFFF.
When you poke $200000 on an A1200 without fast memory at $200000, there is a mirror effect and the write address is decoded as $0. So the re-read succeeds, and the game believes there is at least 512K of memory from $200000 to $27FFFF. It actually stores its data from $0 to $7FFFF and crashes very efficiently, as program code and stack are in that zone 99% of the time.
The "safe" thing to do would have been:
move.l #0,$0
move.l #$aaaaaaaa,$200000 ; Poke in $200000
cmp.l #$aaaaaaaa,$200000 ; Re-read to check valid address
bne NoFastAt200000
cmp.l #$aaaaaaaa,$0 ; Test the mirror effect: did the write land at $0?
beq NoFastAt200000
The mirror or modulo effect is detected. This error was found and corrected by Jean François Fabre in Lotus Turbo Challenge 2 and 3, from Gremlin.
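Both detection routines can be replayed against a toy address decoder. In this Python sketch the class and machine names are hypothetical, the A1200 decoder simply folds every address onto 2MB as the text describes, and unmapped addresses read back as 0 (real hardware returns whatever is left on the bus).

```python
class Memory:
    """Toy address decoder: decode(addr) gives a cell, or None if unmapped."""

    def __init__(self, decode):
        self.decode = decode
        self.cells = {}

    def write(self, addr: int, value: int) -> None:
        cell = self.decode(addr)
        if cell is not None:
            self.cells[cell] = value

    def read(self, addr: int) -> int:
        cell = self.decode(addr)
        return 0 if cell is None else self.cells.get(cell, 0)


def naive_has_fast(mem: Memory) -> bool:
    """The original test: poke $200000 and read it back."""
    mem.write(0x200000, 0xAAAAAAAA)
    return mem.read(0x200000) == 0xAAAAAAAA


def safe_has_fast(mem: Memory) -> bool:
    """The "safe" test: also check that the write did not land at $0."""
    mem.write(0x0, 0)
    mem.write(0x200000, 0xAAAAAAAA)
    if mem.read(0x200000) != 0xAAAAAAAA:
        return False
    return mem.read(0x0) != 0xAAAAAAAA  # same value at $0: mirror effect


def a500_1mb() -> Memory:      # 1MB chip, nothing at $200000
    return Memory(lambda a: a if a < 0x100000 else None)


def a1200_mirror() -> Memory:  # 2MB chip, $200000 decoded as $0
    return Memory(lambda a: a & 0x1FFFFF)


def a1200_fast() -> Memory:    # 2MB chip plus 512K fast at $200000
    return Memory(lambda a: a if a < 0x280000 else None)


if __name__ == "__main__":
    print(naive_has_fast(a1200_mirror()))  # True: false positive
    print(safe_has_fast(a1200_mirror()))   # False
    print(safe_has_fast(a1200_fast()))     # True
```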
Some games work fine until you press a key. Then they freeze and you're forced to reboot. But the music is still playing, and some animation can continue!
That sounds strange, and it is! Keyboard interrupts have a priority level of 2, while VBL interrupts (often used by tracker routines) have a level of 3, which explains why the music keeps playing. The keyboard interrupt was not acknowledged properly, so it fires again as soon as the handler returns, over and over. The main program can't continue; only higher-level interrupts can run.
This is often caused by acknowledging the interrupt too early, long before the rte:
KbInt: move.w #8,$dff09c ; Acknowledge interrupt
...
...
rte
Moving the acknowledge instruction to just before the rte can be enough. If this does not work, try replacing #8 (the PORTS bit) with #$7FFF, which clears all interrupt request bits.
Level 3 interrupts can behave the same way. Use the same solution (try #$70, the COPER, VERTB and BLIT bits, before trying #$7FFF).
These problems were noticed (and fixed) in Ninja Spirit, R-Type 2 and Z-Out!
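The values written to INTREQ ($dff09c) are bit masks, and decoding them shows what each suggested fix really clears. A small Python helper with a hypothetical name: bit 15 selects set or clear, bits 0-14 name the request bits, PORTS being the level 2 source and COPER, VERTB and BLIT the level 3 ones.

```python
# INTREQ bit names for bits 0-14; bit 15 is SET/CLR.
INTREQ_BITS = [
    "TBE", "DSKBLK", "SOFT", "PORTS", "COPER", "VERTB", "BLIT", "AUD0",
    "AUD1", "AUD2", "AUD3", "RBF", "DSKSYN", "EXTER", "INTEN",
]


def decode_intreq_write(value: int):
    """Describe a write to INTREQ: returns (action, list of bit names)."""
    action = "set" if value & 0x8000 else "clear"
    names = [name for bit, name in enumerate(INTREQ_BITS) if value & (1 << bit)]
    return action, names


if __name__ == "__main__":
    print(decode_intreq_write(0x0008))  # ('clear', ['PORTS'])
    print(decode_intreq_write(0x0070))  # ('clear', ['COPER', 'VERTB', 'BLIT'])
```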
Keyboard routines are usually set up to run from the level 2 interrupt.
When a keypress is detected, the level 2 interrupt is fired.
Between detecting the key and acknowledging the keypress, there needs
to be a time delay. The ways to achieve this are (ranked from best to
worst):
move.b $bfec01,d0 ; Keypress
not.b d0
ror.b #1,d0 ; d0 now contains the raw key code
dumb moveq #36,d1 ; these 6 bytes (moveq + dbf) get patched below
loop dbf d1,loop ; Stupid loop
The problem with this is that faster CPUs make the loop almost
non-existent, so the keypress is not acknowledged properly and the
computer then locks up. The easiest fix is to steal 6 bytes from the
code (here the 2-byte moveq and the 4-byte dbf) and insert a jsr to
some patch code which has a working delay:
lea .KBFix(pc),a0 ;Your keyboard fix routine
move.w #$4eb9,dumb ;Insert JSR abs.l opcode over the moveq
move.l a0,loop ;Patch routine address over the dbf
...
.KBFix movem.l d0-d1/a0,-(sp) ;Horizontal raster timing code
lea (_custom+vhposr),a0
moveq #3-1,d0 ;Wait because handshake min 75µs
.1 move.b (a0),d1
.2 cmp.b (a0),d1 ;One line is 63.5µs
beq.b .2
dbf d0,.1 ;Min=127µs Max=190.5µs
movem.l (sp)+,d0-d1/a0
.RTS rts ;Back to original code
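The Min/Max figures in the comments follow from simple arithmetic: the first change of the beam line can come almost immediately, so only lines - 1 full lines are guaranteed. A Python check with hypothetical function names, using the 63.5µs line length from the comment (an NTSC value; a PAL line is 64µs):

```python
LINE_US = 63.5  # raster line duration used in the .KBFix comments


def wait_bounds(lines: int, line_us: float = LINE_US):
    """Min/max delay of a loop that waits for `lines` beam line changes."""
    return (lines - 1) * line_us, lines * line_us


def lines_needed(min_us: float, line_us: float = LINE_US) -> int:
    """Smallest number of line changes guaranteeing at least min_us."""
    lines = 1
    while wait_bounds(lines, line_us)[0] < min_us:
        lines += 1
    return lines


if __name__ == "__main__":
    print(wait_bounds(3))    # (127.0, 190.5), as in the comment
    print(lines_needed(75))  # 3, hence moveq #3-1,d0
```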
The following movem command yields different results when run on a 68000/010 and a 68020/030/040/060:
movem.x rl,-(an)
There is a difference if the register used in predecrement mode is also
contained in the register list. For the 68020, 68030 and 68040 the value
written to memory is the initial register value decremented by the size of
the operation. The 68000 and 68010 write the initial register value
(not decremented).
Because this type of construction is not very useful, no problems with existing software are known.
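The rule can be written down as a small model. This Python sketch (hypothetical names, long-word size only, word-size truncation not modelled) follows the Motorola wording quoted above for the 68020/030/040; treating everything from 68020 up the same way is an assumption.

```python
REG_ORDER = [f"d{i}" for i in range(8)] + [f"a{i}" for i in range(8)]


def movem_predec(regs: dict, reglist, an: str, size: int, cpu: int):
    """Model movem.l reglist,-(an); return (memory, final value of an).

    regs maps "d0".."a7" to values. Memory ends up with the lowest-numbered
    register at the lowest address. cpu is e.g. 68000, 68010, 68020.
    """
    wanted = set(reglist)
    initial = regs[an]
    addr = initial
    memory = {}
    for reg in reversed(REG_ORDER):      # -(an) stores a7 first, d0 last
        if reg not in wanted:
            continue
        addr -= size
        value = regs[reg]
        if reg == an and cpu >= 68020:   # 020+: decremented value stored
            value = initial - size
        memory[addr] = value
    return memory, addr


if __name__ == "__main__":
    regs = {"d0": 1, "a0": 0x1000}
    for cpu in (68000, 68020):
        mem, _ = movem_predec(regs, ["d0", "a0"], "a0", 4, cpu)
        print(cpu, hex(mem[0xFFC]))  # 68000 0x1000, then 68020 0xffc
```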
Self-modifying code is a major problem found in a surprisingly large number of games, including several reasonably new ones. Basically, the programmers write code which modifies itself while it is running. As more efficient CPUs came out, they had instruction caches which keep copies of recently executed instructions. When the code modifies itself, the write goes to memory (or, on a 68040/68060 in copyback mode, may stay in the data cache), while the CPU keeps executing the stale copy in its instruction cache. Therefore the game goes kaboom after a while!
Stackframes are different on each processor
The stackframes created by the processor on interrupts and exceptions are different for the members of the 68k family. On the 68000 a stackframe is 6 bytes, except on Bus and Address Error: it contains the saved SR at (a7) and the saved PC at (2,a7). On all other processors (68010+) the minimal stackframe is 8 bytes and additionally contains, at (6,a7), a word holding the frame format (upper 4 bits) and the vector offset (lower 12 bits, i.e. the vector number times 4). This four-word stackframe, format $0, is created for "Trap #xx" and interrupts on the 68010-68060. The stackframes for other exceptions are different on each processor.
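Splitting the word at (6,a7) is plain bit arithmetic. A Python helper with a hypothetical name; TRAP #0 is vector 32 (offset $80) and the level 2 autovector is vector 26 (offset $68).

```python
def decode_format_word(word: int):
    """Split the 68010+ format/vector-offset word found at (6,a7).

    Returns (format, vector_offset, vector_number).
    """
    fmt = (word >> 12) & 0xF
    offset = word & 0x0FFF
    return fmt, offset, offset // 4


if __name__ == "__main__":
    print(decode_format_word(0x0080))  # (0, 128, 32): TRAP #0, format $0
```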
The RTE instruction works differently on the 68000 and on the 68010+. On a 68000 it simply restores SR and PC and continues program execution at the interrupted address. On the 68010+ it additionally reads the format word and removes the whole stackframe according to its format (an invalid format causes a format error exception).
Some programs push an address (PC) and an SR and then execute an RTE instruction. This only works on a 68000: on the 68010+ the RTE treats whatever word lies above the pushed PC as the format word, with unpredictable results.
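If a program contains this awful code, you have to emulate it. Sometimes it may be enough to replace the rte with an rtr, which pops the same 6 bytes but only restores the condition codes, not the whole SR.
Another way is to make the stackframe independent of the type of processor by using a TRAP instead: TRAP is to RTE what JSR is to RTS, so the CPU builds a frame in its own format and the matching RTE removes it correctly.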
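What goes wrong with a hand-built frame can be modelled in a few lines. This Python sketch uses hypothetical names and only covers the 68000 and 68010: only the four-word format $0 is modelled, so any other format nibble raises ValueError (on a real 68010 that is either the long bus-error frame, format $8, or a format error). The stack is given as 16-bit words, top of stack first.

```python
def rte(stack, cpu: int):
    """Model RTE; returns (sr, pc, words_popped). cpu is 68000 or 68010."""
    sr = stack[0]
    pc = (stack[1] << 16) | stack[2]
    if cpu == 68000:
        return sr, pc, 3           # 68000: just SR and PC, 6 bytes
    fmt = stack[3] >> 12           # 68010: word at (6,a7) is the format word
    if fmt != 0x0:
        raise ValueError(f"frame format ${fmt:X} not modelled")
    return sr, pc, 4               # format $0: one word more than was pushed


if __name__ == "__main__":
    # move.l #$4000,-(a7) / move.w #$2000,-(a7) / rte, above a game word $1234
    fake = [0x2000, 0x0000, 0x4000, 0x1234]
    print(rte(fake, 68000))        # (8192, 16384, 3): works
    try:
        rte(fake, 68010)
    except ValueError as err:
        print("68010:", err)       # 68010: frame format $1 not modelled
```

Even in the "lucky" case where the word above is $0000, the 68010 pops 8 bytes instead of 6, leaving the stack one word off.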
All 680x0 machines have a 1K "exception table" starting at address $0 when the machine first boots. This can subsequently be moved to another area on the 68010 and higher, using the Vector Base Register (VBR) and a simple copy loop.
Assuming it's located at $0, then $0 - $ff is where the main action's at, on all 68k machines. The address range $100 - $3ff is reserved by Motorola for "user vectors". This address space is unused on the Amiga, but might be used on other machines. The point is that $0 - $ff are the important vectors on all 68k machines, and $100 - $3ff *might* also be used, or then again, might not be.
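Once VBR has moved the table, each vector lives at VBR + 4 × vector number. A Python sketch with hypothetical helper names:

```python
def vector_address(vbr: int, vector: int) -> int:
    """Address of exception vector `vector` (VBR is always 0 on a 68000)."""
    if not 0 <= vector <= 255:
        raise ValueError("68k vectors are numbered 0-255")
    return vbr + 4 * vector


def is_user_vector(vector: int) -> bool:
    """Vectors 64-255 ($100-$3FF with VBR at 0) are Motorola's user vectors."""
    return 64 <= vector <= 255


if __name__ == "__main__":
    print(f"${vector_address(0, 26):03X}")       # $068: level 2 autovector
    print(f"${vector_address(0x8000, 26):04X}")  # $8068 after moving VBR
```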
Some games rely on certain values in ROM to work properly. For example, the great game "Gods" works OK from OS1.3 to OS3.0, but fails with OS3.1 on the same computer! It works OK with a softkicked OS3.0, but fails with an OS3.1 ROM chip!
The explanation: the game reads from $FFxxxx (I don't remember the exact address now) and I really don't know why. By luck, this worked up to OS3.0, but the game doesn't seem to like the values returned by OS3.1!
Some other games even call ROM addresses directly (Gravity Force) or poke into undocumented exec structures, copperlists...
A possible explanation of those accesses in non-DOS games is a protection against hardware freezers like Action Replay or Nordic Power.
Another example: there is a check at $F00000 for the value $1111 in the game Pinball Dreams, maybe to detect such a device, but: