Why auto disassembly is tough

Re: Why auto disassembly is tough

Post by motorhead1991 » Tue May 14, 2019 8:50 am

jsa wrote:
Mon May 13, 2019 7:38 pm
Another 3.07 vs 3.08.

In the attached zip:
KRAF5ZV.bin has Ford WDS bank order.
LFQ1*.bin has come from some EEC with that bank order and is 256 KB.
Both are LFQ1 catchcode and KRAFZV strategy, just obtained in a different manner.

1st run SAD3.07 handles LFQ1 better than the WDS version.

1st run SAD3.08 for both

Code: Select all

Cannot find a Start bank [bank 8]
abandoned !!
On the same note, I've noticed that the bank zero command is redundant and often doesn't work for me. It usually crashes the disassembly process.
1990 Ford Ranger FLH2 conversion. Ford forged/dished pistons, Total Seal file-fit rings, Clevite rod and main bearings, Clevite cam bearings, IHI turbo, Siemens Deka 60lb/hr injectors, Ford slot MAF in custom 3" housing. Moates Quarterhorse with Binary Editor, using the PAAD6 database.

OpenEEC Telegram Chat:
Telegram

Post by tvrfan » Wed May 15, 2019 3:16 pm

Guys, got the bin, will have a look.

Confirming -

Neither 3.07 nor 3.08 works well for subroutines with variable arguments. A solution is in progress. For now it can be done manually, using the 'args' command to set the number of data items for each subr call (but it's a PITA).

The main difference between 07 and 08 was the rewrite of bank detection. 3.07 had several issues with file sizes and was broken by some bins which had leftover bytes at the bank edges. It also couldn't handle a bin with a repeated bank 8 (someone must have made this; it appears to have the same bank code with different tunes set).

For 08, I added some more stringent checks to make sure that SAD was truly at a bank start. The checks are -
1) the first jump must be to an address greater than 0x201f (0x205f for 8065) for bank 8 or a single bank. Otherwise (not bank 8) the jump must be a loopback jump.
2) the interrupt vectors must all be valid addresses (all 8 or all 48).
3) there must be at least one 'dummy' interrupt handler.
Checks 2 and 3 should also track across banks correctly, as some bins have just a bank swap call or a jump to the true handler subr.
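
As a rough illustration only (Python, not SAD's actual code), checks 1 and 2 could be sketched as below. The function name, its parameters, and the reading of a "valid" vector as "an address inside this bank's range" are all assumptions made for the example. Check 3 (the 'dummy' handler) and the cross-bank tracking are left out.

```python
def passes_bank_start_checks(jump_addr, jump_target, vectors,
                             bank_lo, bank_hi, is_bank8, is_8065=False):
    """Hypothetical sketch of checks 1 and 2 for a candidate bank start."""
    # Check 1: for bank 8 (or a single bank) the first jump must land
    # above the limit quoted in the post.
    limit = 0x205F if is_8065 else 0x201F
    if is_bank8:
        if jump_target <= limit:
            return False
    elif jump_target != jump_addr:
        # Not bank 8: expect a loopback jump (a jump to itself).
        return False

    # Check 2: every interrupt vector must be a valid address.
    # Assumption: "valid" means it points inside this bank's address range.
    if not vectors:
        return False
    return all(bank_lo <= v <= bank_hi for v in vectors)
```

The caller is expected to have already decoded the first jump and read the vector words from the candidate bank.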

It's quite possible there's a bug in this new system too, but for most bins I tested it works better, especially for those with crap data around the edges, and I couldn't get a crash.

Other than that, there were a few small bug fixes which didn't add up to all that much....

Bank 0 ?? the RAM set opcode probably doesn't work quite right, but the main BNK opcode should ...
TVR, Triumph (cars), kit cars, classics. Ex IT geek, development and databases.

https://github.com/tvrfan/EEC-IV-disassembler

Post by motorhead1991 » Wed May 15, 2019 3:53 pm

tvrfan wrote:
Wed May 15, 2019 3:16 pm
Bank 0 ?? the RAM set opcode probably doesn't work quite right, but the main BNK opcode should ...
It was this command:

Code: Select all

bank 0 0 dfff
It complained of a fill address out of bounds despite removing the "fill" commands from the dir file.

Post by tvrfan » Wed May 15, 2019 3:55 pm

A note of where I'm up to.
I gave up and went back to 3.08 for a do over !

This wasn't a complete waste of time, as I did improve quite a few things on the way and wrote some new stuff which I have kept. But the various main ideas didn't pan out. The heart of the issue is how to balance a static analysis (which builds a 'tree' structure of blocks from the 'if/then' jumps) with a correct 'recurse' structure for subroutines.

Previous SAD had a list of "scan these blocks with these values", so blocks were repeated apart from the values, and subroutines weren't called in order. It therefore had a whole raft of complex stuff to work out when a subroutine had arguments, and a 'hold' system for callers of each subroutine. This works for fixed args and A9L style variable args, but can't ever handle the CARD-like 'if/then' with a jump into the middle of a different subr. This was further complicated by code which jumps from one subroutine into another with arguments (e.g. signed & unsigned lookups in later bins), code which PUSHes a new return address for itself, and so on.

I tried taking the jumps as if it was all the same block of code, but I couldn't stop some bins looping around forever, and it was a mess. Plus the analysis will have to emulate some subroutines, and that causes loops as well. AARGHH!!!!
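
The "looping around forever" problem is common to any scanner which follows jumps. A minimal sketch of the usual cure, a worklist plus a 'seen' set, is below. It is Python for illustration only and is not how SAD is written; the toy `program` table and its (kind, target, fallthrough) layout are invented for the example.

```python
def reachable_addresses(program, entry):
    """Follow jumps from entry without looping forever.

    program is a hypothetical map: address -> (kind, target, fallthrough),
    where kind is one of "op", "cond", "jump", "call" or "ret".
    """
    seen = set()
    work = [entry]
    while work:
        addr = work.pop()
        if addr in seen or addr not in program:
            continue  # already scanned, or outside the known code
        seen.add(addr)
        kind, target, fallthrough = program[addr]
        if kind in ("cond", "jump", "call"):
            work.append(target)       # follow the branch
        if kind not in ("jump", "ret"):
            work.append(fallthrough)  # carry on with the next opcode
    return seen
```

Because each address is scanned at most once, a jump back into an earlier block (or into the middle of another subroutine) just ends that path instead of repeating it. This says nothing about subroutine arguments, which is the hard part described above.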

I finally had a good idea about how to handle the jumps, which uses one of those new tricks I wrote.
It actually comes from the question "how can you tell when a 'goto x' is a true fixed goto, or just an 'else'?"

I now have a test version which is a lot smaller and seems to work just fine (not for any subr arguments yet) without a whole layer of code, AND can handle multiple jumps between subroutines (22CA), and gets the signed/unsigned lookup names right. So it's smaller and simpler code, without any 'holds' required, which is ALWAYS good. That wasn't remotely where I was going, but hey.....

So now, I'm about to add an 'emulate' piece which will run bits of the actual code itself in a controlled & fake environment. Here's hoping......

Post by tvrfan » Wed May 15, 2019 3:56 pm

motorhead1991 wrote:
Wed May 15, 2019 3:53 pm

It was this command:

Code: Select all

bank 0 0 dfff
It complained of a fill address out of bounds despite removing the "fill" commands from the dir file.
Ahhh. OK, could be a screw up in the command parser then....
Thanks.

Post by jsa » Thu May 16, 2019 2:42 am

The post was about a cold turkey 1st run of SAD against a bin. Understood that a 2nd step of creating a dir with args set would improve the 3.08 results for the changes you describe.

Pondering bank 8, have any been sighted that:
* Don't start with FF FA
* Don't have the checksum at 0x200A

Is more required than calculating the checksum, then looking for FF FA xx xx xx xx xx xx xx xx CS UM ?

Great to hear of the promising progress.
Cheers

John

95 Escort RS Cosworth - GHAJ0 / ANTI on a COSY box code
Moates QH & BE
ForDiag

Post by tvrfan » Thu May 16, 2019 2:52 pm

jsa wrote:
Thu May 16, 2019 2:42 am
The post was about a cold turkey 1st run of SAD against a bin. Understood that a 2nd step of creating a dir with args set would improve the 3.08 results for the changes you describe.

Pondering bank 8, have any been sighted that:
* Don't start with FF FA
* Don't have the checksum at 0x200A

Is more required than calculating the checksum, then looking for FF FA xx xx xx xx xx xx xx xx CS UM ?

Great to hear of the promising progress.
YES. Some earlier bins have the FF FA *after* the jump, my very early AA box for example.

Code: Select all


2000: e7,1d,00            jump  2020             goto 2020;
...
2020: fa                  di                     disable ints;
2021: ff                  nop                    
2022: 01,16               clrw  R16              codeflags = 0;
2024: 11,02               clrb  R2               LSO_Port = 0;
2026: a3,01,50,20,30      ldw   R30,[R0+2050]    R30 = ROMStart;
202b: a3,01,5e,20,32      ldw   R32,[R0+205e]    R32 = ROMEnd;
Last two statements load start and end addresses for the checksum loop...............

CKSUM.
I am not convinced that all the bins use the same checksum addition. Again, the early UK ones seem to do a simple word addition from 0x2000 through to the end (0x3fff, 0x5fff or 0x7fff), but the A9L series appears to add separate bytes. I admit, though, that I haven't studied this much.
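
To make the two variants concrete, here is a minimal Python sketch. It assumes the bytes passed in run from 0x2000 to the end of the ROM, that words are little-endian, and that the total wraps at 16 bits. Those are assumptions for illustration, not confirmed behaviour of any particular bin, and how the total is compared with the stored value is not covered.

```python
def word_sum(image: bytes) -> int:
    """Add the data as little-endian 16-bit words, wrapping at 16 bits."""
    total = 0
    for i in range(0, len(image) - 1, 2):
        total = (total + (image[i] | (image[i + 1] << 8))) & 0xFFFF
    return total


def byte_sum(image: bytes) -> int:
    """Add the data one byte at a time, wrapping at 16 bits."""
    total = 0
    for b in image:
        total = (total + b) & 0xFFFF
    return total
```

The two totals differ for the same data, so a detector that assumes one style would reject bins that use the other.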

I went for the jump over the interrupt vectors plus verifying the vectors themselves, which seems more 'universal' as it's a defined CPU requirement.
SAD looks for that jump and its destination; it does NOT check for an FF or FA.

A multibank looks like this (can be short or long jump)

bank 8 - start of code (22CA)

Code: Select all


82000: ff                 nop                    
82001: fa                 di                     disable ints;
82002: e7,74,05           jump  82579            goto 82579;
and a bank which is not bank 8 looks like this -

Code: Select all

02000: ff                 nop                    
02001: fa                 di                     disable ints;
02002: 27,fe              sjmp  02002            goto 02002;
note the loopstop versus real jump, but all banks have valid interrupt vector/pointer lists....................
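
For anyone checking the listings by hand, the jump targets can be worked out as in this small Python sketch. The encodings are inferred from the listings above (e7 with a 16-bit little-endian offset, 20 to 27 with an 11-bit signed offset, both relative to the next opcode), and the function name is made up for the example. Addresses are the 16-bit in-bank values, without the bank digit.

```python
def decode_jump(code: bytes, addr: int):
    """Return the target of the jump at addr, or None if it is not a jump."""
    if not code:
        return None
    op = code[0]
    if op == 0xE7 and len(code) >= 3:
        # Long jump: 16-bit signed offset, low byte first.
        off = code[1] | (code[2] << 8)
        if off & 0x8000:
            off -= 0x10000
        return (addr + 3 + off) & 0xFFFF
    if 0x20 <= op <= 0x27 and len(code) >= 2:
        # Short jump: top 3 offset bits live in the opcode, 11 bits signed.
        off = ((op & 0x07) << 8) | code[1]
        if off & 0x400:
            off -= 0x800
        return (addr + 2 + off) & 0xFFFF
    return None
```

With the bytes from the listings, `e7,74,05` at 2002 gives 2579, while `27,fe` at 2002 gives 2002, i.e. the loopstop.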

Post by tvrfan » Thu May 23, 2019 12:31 am

While testing, I ran into a problem with vects (not the same as reported by jsa, but still....) and I discovered yet more CRAZY stuff.

First - I changed vector list detection in 3.08 as it wasn't working with later binaries. I swapped the 'signature' options for a 'detect' type: when SAD arrives at a PUSH/RET combination, it attempts to find a list address. With push [R30, 3456] it is obvious where the list starts (but some lists start at R30 = 2, not zero...).

I discovered this is BROKEN in 3.08 for the list style which does an ADD R30, 3456, PUSH [R30], RET. Worse, you can't override it with a vect command, because the user command itself gets overridden (which should NEVER happen !!).

And then I found this little piece of code in the 2DBD binary

Code: Select all

8496: a0,7c,0e            ldw   Re,R7c           HSO_Time = R7c;
8499: b1,0f,0d            ldb   Rd,f             HSO_Cmd = f;
849c: c9,a7,84            push  84a7             push(@Sub162);
849f: ad,00,30            ldzbw R30,0            wR30 = 0;
84a2: cb,31,66,84         push  [R30+8466]       push([R30+8466]);
84a6: f0                  ret                    return;
AAARRRGGGHHHHH !!!!! the PAIN !!!! will this crap never stop ? !!!!! :BigGrin:
why the hell make R30 zero and then use it as an OFFSET ??? WTF ?????

8466 gives the clue

Code: Select all

8466: 06,d0               word  d006
8468: 06,c0               word  c006
846a: 06,e0               word  e006
846c: 09,d0               word  d009
846e: 09,c0               word  c009
8470: 09,e0               word  e009
so these are the typical cal console entry addresses..... but it still makes me ask the question WHY ????

So I need to look at vector list detection now......the 'signature' isn't flexible enough, but the detect lets too much through........
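
To spell out what that push does, here is a tiny Python sketch using the six words from the 8466 listing. The helper name is invented for the example, and treating R30 as a byte offset into the word list is my reading of the `push [R30+8466]` opcode.

```python
def read_word_list(data: bytes, count: int):
    """Read count little-endian 16-bit words from the start of data."""
    return [data[i] | (data[i + 1] << 8) for i in range(0, 2 * count, 2)]


# Bytes as listed at 8466 .. 8471 in the 2DBD bin.
table = bytes.fromhex("06d006c006e009d009c009e0")
entries = read_word_list(table, 6)

# With R30 = 0 the indexed push picks the first entry of the list.
r30 = 0
pushed = entries[r30 // 2]
```

So with R30 forced to zero the code always pushes d006, and the other five entries are only reachable if something else sets the offset.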

Post by jsa » Thu May 23, 2019 1:30 am

Edit:
I remember picking the eyes out of this previously, found here
http://www.efidynotuning.com/forum/view ... ole#p33407

Another thought bubble from times past,
viewtopic.php?f=2&t=21159&p=129595&hili ... le#p129595
Have you bins that need something different to find the console routine?

Post by jsa » Thu May 23, 2019 7:59 am

Are the signatures for subr. just based on an opcode sequence ignoring the arguments?

Post by tvrfan » Thu May 23, 2019 3:03 pm

No, I designed the 'signature' routine after the unix regular expression (= RE), up to a point, but even that isn't flexible enough.
If you've never used this, the RE is embedded in a wide range of unix/linux edit and search applications, and although a more complex RE looks like gobbledegook, the spec makes for an incredibly powerful, flexible pattern matcher. (have a read on web !) Windows has only a pale shadow of this full functionality.

SAD has a pattern definition of opcode type + operands, where operands can be 'stored' as an index, and that index then used to specify that this value (or register) must occur again elsewhere, plus options for auto increments, indirect etc., ability to save jump addresses, bits, carry, plus sub-patterns (match a defined number of opcodes within a bigger pattern in any order), skippable and optional patterns (skip up to n opcodes up to a defined one) and a few other bits as well. Yes I thought that code wot I wrote was wonderful. And then.....
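
The 'stored index' idea works much like a back-reference in an RE. A much simplified Python sketch is below. It is not SAD's pattern format: the tuple layout, the "?n" slot names and the "*" wildcard are all invented for the example, and addressing modes, sub-patterns and skips are ignored.

```python
def match_pattern(instrs, pattern):
    """Match decoded opcodes against a pattern with reusable operand slots.

    instrs and pattern are lists of (mnemonic, operands) tuples.
    A pattern operand "?n" stores the operand on first use and must match
    the same value on every later use; "*" matches anything.
    Returns the slot dictionary on success, or None on failure.
    """
    if len(instrs) < len(pattern):
        return None
    slots = {}
    for (mnem, ops), (pmnem, pops) in zip(instrs, pattern):
        if mnem != pmnem or len(ops) != len(pops):
            return None
        for op, pop in zip(ops, pops):
            if pop == "*":
                continue
            if pop.startswith("?"):
                # First use stores the operand, later uses must agree.
                if slots.setdefault(pop, op) != op:
                    return None
            elif op != pop:
                return None
    return slots


# Hypothetical pattern for the "add Rx, start / push [Rx] / ret" list style.
LIST_PATTERN = [("add", ("?0", "?1")), ("push", ("?0",)), ("ret", ())]
```

Matching `add R30,3456 / push R30 / ret` against `LIST_PATTERN` returns the register in slot `?0` and the list start in slot `?1`, while a push of a different register fails the match.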

Even THAT isn't enough, I ended up with FOUR different vector patterns (for background task list) and it still wouldn't match later bins.
So it was getting out of hand.

[ hmmm... perhaps I might consider a sequence of characters just like an RE instead of a 'bit/mask/table' style like now....er....um...but probably still won't work, but easier to write new pattern matches....]

Compare that to function lookup (1D lookup), which has only 1 pattern with a few options and matches everything so far, and table lookup (2D), which has 2 patterns with a few options that also match everything so far. They DO work well.

Trouble is that there are so many different ways to actually code the same logic.
So I went to the style of checking when a PUSH is encountered, especially with subroutine arguments on the agenda, which goes -

1) PUSH found
2) is there a RET following close by ?
3) is it a) immediate, b) indexed or c) indirect ?
4) different action for each of the types in 3)
3a) if it's a valid address pushed, it's probably a pushed common return address for a task list
3b) indexed, almost certainly a list, as in [start + register] --- except for what I found above !!! ---
3c) indirect could be a list where you can find - add Rx, start addr - push [reg] .
-> but problem here, as quite a lot of later bins use push [Reg+offset] in their data structures (A9L does this too).
5) none of these ? OK it's just a plain old register push.

Plus for subroutine arguments -

6) is it a PUSH of a register that was previously POPped, which itself was a CALL return address ?
7) if 6) is true, then it's a candidate for a subroutine argument extract.
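
Written out as code, the decision steps above look roughly like this Python sketch. It is only my summary of the steps, not SAD's source: the names, the string values for the addressing mode and the boolean inputs are all invented for the example, and the hard part (working out those inputs from the surrounding opcodes) is not shown.

```python
from enum import Enum


class PushKind(Enum):
    COMMON_RETURN = "pushed common return address for a task list"
    INDEXED_LIST = "indexed list, [start + register]"
    INDIRECT_LIST = "possible list via add Rx,start / push [reg]"
    ARG_CANDIDATE = "candidate for subroutine argument extract"
    PLAIN = "plain register push"


def classify_push(mode, ret_follows, valid_address=False,
                  popped_return_reg=False):
    """Classify one PUSH following steps 1 to 7 (hypothetical inputs)."""
    if ret_follows:                                  # step 2
        if mode == "immediate" and valid_address:    # step 3a
            return PushKind.COMMON_RETURN
        if mode == "indexed":                        # step 3b
            return PushKind.INDEXED_LIST
        if mode == "indirect":                       # step 3c
            return PushKind.INDIRECT_LIST
    if mode == "register" and popped_return_reg:     # steps 6 and 7
        return PushKind.ARG_CANDIDATE
    return PushKind.PLAIN                            # step 5
```

The two exceptions noted above (R30 zeroed before an indexed push, and push [Reg+offset] used for ordinary data structures) would both be misclassified by something this simple, which is exactly the problem.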

NB - A9L (and others) actually have DATA embedded in some of their test list addresses, so logic has to allow for some skips....

see - simple.....................(why don't smilies work for me ?)

Post by tvrfan » Thu May 23, 2019 3:15 pm

'Console present'

Yes - quite a few bins have that style of check where 0xd00 (or similar addresses) appears to be a console status flag, and there are various other 'set' addresses around for plug-in or special function chips/peripherals. But as with the cal console (0xd000 or d006 or d009 or e000 or ....) there is a list of possibles. Some of these are in the Ford handbook, but I haven't bothered to cross ref them all. I might have to some day !

Post by jsa » Fri May 24, 2019 4:11 pm

I will have a read about RE.

For me, finding that short string with 0xD00 nails an address for working backwards to the subroutine list. When SAD misses that list, it makes for a quick find. 0xD00 seems unique, whereas 0xD000 and 0xE000 have other uses in some bins.

I agree those 'set' addresses need to be leveraged.