ghidra-tlcs900h

Ghidra processor module for Toshiba TLCS-900/H.

Tested with version 10.3.2.

Status

Work in progress:

Instruction set should be fully disassembled, if you notice anything missing, please open an issue / pull request!
- Note: Disassemblers usually disagree on presentation / ambiguity:
  - Disabled interrupts: di vs. ei 0x07;
  - Mnemonic qualifiers for word/long sized operands: sll vs. sllw;
  - Flag shown for conditional instructions: jp NZ,XWA vs. jp NE,XWA;
  - Destination register shown when result storage space is larger than loaded values: mul A,(XWA) vs. mul WA,(XWA);
  - Accepting decoding of extra cases that aren't documented: ldir and ldirw can be decoded with 5 other combinations of bits 0..2;
Semantics are mostly done, and are now being thoroughly tested through pcode emulation.

Usage

GHIDRA_INSTALL_DIR=#FIXME
mkdir -p "$GHIDRA_INSTALL_DIR/Ghidra/Processors/TLCS900H"
cp -r . "$GHIDRA_INSTALL_DIR/Ghidra/Processors/TLCS900H"

Examples

See ghidra-neogeopocket-loader.

Verification

The following sections describe how to verify the processor module's correctness. These can be tested independently.

Disassembly

First, we need a dataset that guarantees "good enough" coverage. 2 approaches can be combined to generate it:

Approach #1: Extracting instructions from Neo Geo Pocket ROMs, using an existing disassembler (e.g. MAME's unidasm):

MAME_BUILD_DIR=#FIXME
for i in roms/*; do
    # We only care for (valid) instructions that are 4 or more bytes long,
    # since we can feasibly bruteforce smaller ones
    "$MAME_BUILD_DIR/unidasm" "$i" -arch tlcs900 \
        | sed 's/   */  /g' \
        | cut -d':' -f2- \
        | sed 's/^[ \t]*//g' \
        | grep '^ *.. .. .. .*  ' \
        | grep -v '  db$' \
        | sort -u \
        >> 4to7out_ngp
done

Approach #2: Generating candidates by bruteforcing instruction decoding. Up to 3 byte instructions are covered in a few minutes, everything else is generated using a subset of all possibilities. MAME's unidasm is particularly fit for this, since it will expand parsed bytes with dummy bytes, until an instruction with a given prefix can be decoded. We can take these cases into consideration when generating candidates that are 4 or more bytes longer:

./scripts/dis/1n2.py \
        | sed 's/   */  /g' \
        | cut -d':' -f2- \
        | grep -v '^ *.. ..  db' \
        | grep -v '^ *..  db' \
        | sort -u \
        > 1n2out
./scripts/dis/3.py \
        | sed 's/   */  /g' \
        | cut -d':' -f2- \
        | grep -v '^ *.. .. ..  db' \
        | sort -u \
        > 3out

cat 1n2out 3out | sort -u | grep '^ .. .. .. ..  ' > 4in
./scripts/dis/expand.py 4in 4 \
        | sed 's/   */  /g' \
        | cut -d':' -f2- \
        | grep -v '^ *.. .. .. ..  db' \
        | sort -u \
        > 4out
./scripts/dis/clean.sh 4out 4

cat 1n2out 3out 4out_clean | sort -u | grep '^ *.. .. .. .. ..  ' > 5in
./scripts/dis/expand.py 5in 5 \
        | sed 's/   */  /g' \
        | cut -d':' -f2- \
        | grep -v '^ *.. .. .. .. ..  db' \
        | sort -u \
        > 5out
./scripts/dis/clean.sh 5out 5

cat 1n2out 3out 4out_clean 5out_clean | sort -u | grep '^ *.. .. .. .. .. ..  ' > 6in
./scripts/dis/expand.py 6in 6 \
        | sed 's/   */  /g' \
        | cut -d':' -f2- \
        | grep -v '^ *.. .. .. .. .. ..  db' \
        | sort -u \
        > 6out
./scripts/dis/clean.sh 6out 6

cat 1n2out 3out 4out_clean 5out_clean 6out_clean | sort -u | grep '^ *.. .. .. .. .. .. ..  ' > 7in
./scripts/dis/expand.py 7in 7 \
        | sed 's/   */  /g' \
        | cut -d':' -f2- \
        | grep -v '^ *.. .. .. .. .. .. ..  db' \
        | sort -u \
        > 7out
./scripts/dis/clean.sh 7out 7

cat 1n2out 3out 4out_clean 5out_clean 6out_clean 7out_clean | sed 's/^[ \t]*//g' | sort -u > 1to7out

Now, we can reassemble this dataset, then attempt to disassemble it again with Ghidra:

# Combine both datasets into one, skipping invalid instructions
sort -u 1to7out 4to7out_ngp \
    | grep -v -e '(),' -e ',()' -e '(unknown)' -e 'r[0-9A-F][0-9A-F][WABCDEHL]' \
    > out

# Take instruction bytes and write binary
./scripts/dis/dis2bin.py out out.bin

# Disassemble with Ghidra.
# Terminates early if an instruction could not be decoded at a given offset,
# either because it's missing or incorrect in our SLEIGH specification.
# Results are saved in ~/export/1.
GHIDRA_INSTALL_DIR=#FIXME
GHIDRA_PROJECT_DIR=#FIXME
mkdir -p ~/export/
rm -f ~/export/1 && "$GHIDRA_INSTALL_DIR/support/analyzeHeadless" "$GHIDRA_PROJECT_DIR" tlcs900h_headless \
    -import out.bin \
    -noanalysis \
    -overwrite \
    -processor 'TLCS900H:LE:32:default' \
    -scriptPath ./scripts/dis/ \
    -postScript ana2dis.py \
    ~/export/1

Finally, we can manually compare results between Ghidra vs. MAME:

diff -uw \
    <(sed 's/   */  /g; s/_\([0-9]\)/\1/g; s/_P/-1/g; s/0x0*\([0-9a-f]\)/\1/g' ~/export/1) \
    <(sed 's/   */  /g; s/0x0*\([0-9a-f]\)/\1/g' out) \
    | less

Emulation

These tests compare trace logs generated by Ghidra vs. a modified version of ares (patch for commit c4a5fcea9), which includes additional logging and dumping features, toggled with environment variables:

ARES_START: When the given hexadecimal address is reached, dumps memory and registers to files, to be loaded in Ghidra before stepping through instructions;
ARES_END: Similar to ARES_START, but exits after dumping memory;
ARES_WATCH: Dereferences and prints the value of the given address whenever it is modified;

First, start ares:

env ARES_START=0x2000e4 ARES_END=0x2009ca ~/opt/ares/desktop-ui/out/ares --system ngpc /media/fn/TOSHIBA-EXT/FN-NUX/cputest.ngc

Make sure you include these hardware events in trace logs:

After target addresses have been reached, process and move all state files to the working directory defined in the Ghidra emulation script:

mv cputest-20230817-113623.log 0x00200046.cputest-20230817-113623.log
mkdir -p ~/code/wip/tlcs900h/tmp/
for i in *.log; do
    addr=$(echo "$i" | grep '^0x' | cut -d'.' -f1)
    test -n "$addr" || continue
    sed -i '/APU I\/O: read/d; /APU Interrupt/d;' "$i"
    grep -n 'CPU Interrupt' "$i" | sed 's/^\([0-9]*\):.*:.*(\(.*\))$/\1,\2/g' > ~/code/wip/tlcs900h/tmp/"$addr".int
    grep -n 'CPU I/O' "$i" | sed 's/^\([0-9]*\):.*:[ \t]*\(.*\)$/\1,\2/g' > ~/code/wip/tlcs900h/tmp/"$addr".io
    grep -n 'APU I/O' "$i" | sed 's/^\([0-9]*\):.*:[ \t]*\(.*\)$/\1,\2/g' > ~/code/wip/tlcs900h/tmp/"$addr".apu.io
    grep -n 'APU  ' "$i" | sed 's/^\([0-9]*\):[ \t]*\(.*\)$/\1,\2/g' > ~/code/wip/tlcs900h/tmp/"$addr".apu
done

mv /tmp/0x* ~/code/wip/tlcs900h/tmp/

Emulate with Ghidra:

GHIDRA_INSTALL_DIR=#FIXME
GHIDRA_PROJECT_DIR=#FIXME
mkdir -p ~/export/
rm -f ~/export/2 && "$GHIDRA_INSTALL_DIR/support/analyzeHeadless" "$GHIDRA_PROJECT_DIR" tlcs900h_headless \
    -import roms/CPU_Test_199x_Judge_PD.bin \
    -overwrite \
    -processor 'TLCS900H:LE:32:default' \
    -scriptPath ./scripts/emu/ \
    -postScript TLCS900Emu.java \
    ~/export/2

Finally, compare output trace log against the one from ares:

./scripts/emu/tracediff.py ./0x00200046.cputest.emu.log ./0x00200046.cputest-20230817-113623.log

Example output from CPU Test by Judge, where we start emulating at the program's entrypoint, and all registers matched every stepped instruction, passing all tests:

Architecture

It was a bit challenging to map some memory addressing modes, not that they go off the beaten path much, but the particular way Toshiba decided to encode them:

So far so good, now let's look at how they get included in an instruction:

Usually, variable length tokens like r32b_eam sit at the tail end of an instruction, but here, they happen to be in the middle, followed by 0x38 + an 8/16-bit immediate value imm8/imm16. SLEIGH expects variable tokens at the end of an instruction, which means that constructors had to be duplicated for each addressing mode with a distinct length. TODO: Could context vars avoid this duplication?

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.settings		.settings
data		data
scripts		scripts
.antProperties.xml		.antProperties.xml
.gitignore		.gitignore
Module.manifest		Module.manifest
README.md		README.md
ares.png		ares.png
build.gradle		build.gradle
extension.properties		extension.properties
ins.png		ins.png
mems.png		mems.png
tracediff.png		tracediff.png

nevesnunes/ghidra-tlcs900h

Folders and files

Latest commit

History

Repository files navigation

ghidra-tlcs900h

Status

Usage

Examples

Verification

Disassembly

Emulation

Architecture

About

Resources

Stars

Watchers

Forks

Languages