Why auto disassembly is tough

This is where the BIN Hackers and definition junkies discuss the inner workings of the EEC code and hardware. General tuning questions do not go here. Only technical/hardware-specific/code questions and discussions belong here.

Moderators: cgrey8, EDS50, 2Shaker, Jon 94GT

motorhead1991
Regular
Posts: 152
Joined: Tue Nov 21, 2017 2:32 am

Re: Why auto disassembly is tough

Post by motorhead1991 » Tue May 14, 2019 8:50 am

jsa wrote:
Mon May 13, 2019 7:38 pm
Another 3.07 vs 3.08.

In the attached zip;
KRAF5ZV.bin has ford WDS bank order.
LFQ1*.bin has come from some EEC with that bank order and is 256kb.
Both are LFQ1 catchcode and KRAFZV strategy, just obtained in a different manner.

1st run SAD3.07 handles LFQ1 better than the WDS version.

1st run SAD3.08 for both

Code: Select all

Cannot find a Start bank [bank 8]
abandoned !!
On the same note, I've noticed that the bank zero command is redundant and often doesn't work for me; it usually crashes the disassembly process.
1990 Ford Ranger FLH2 conversion. Ford forged/dished pistons, Total Seal file-fit rings, Clevite rod and main bearings, Clevite cam bearings, IHI turbo, Siemens Deka 60lb/hr injectors, Ford slot MAF in custom 3" housing. Moates Quarterhorse with Binary Editor, using the PAAD6 database.

OpenEEC Telegram Chat:
Telegram

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Wed May 15, 2019 3:16 pm

Guys, got the bin, will have a look.

Confirming -

Neither 3.07 nor 3.08 works well for subroutines with variable arguments. A solution to that is in progress. For now it can be done manually by using the 'args' command to set the number of data items for each subr call (but it's a PITA).

Main difference between 07 and 08 was the rewrite of bank detection. 3.07 had several issues with file sizes and was broken by some bins which had leftover bytes at the bank edges. It also couldn't handle a bin with a repeated bank 8 (someone must have made this - it appears to have the same bank code with different tunes set).

For 08, I added some more stringent checks to make sure that SAD was truly at a bank start. The checks are -
1) the first jump must go to an address greater than 0x201f (0x205f for 8065) for bank 8 or a single bank. For any other bank the jump must be a loopback jump.
2) the interrupt vectors must all be valid addresses (all 8 or all 48).
3) there must be at least one 'dummy' interrupt handler.
Checks 2 and 3 should also track across banks correctly. Some bins have just a bank swap call or jump to the true handler subr.
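
As a rough illustration of checks 1 and 2, here is a minimal Python sketch. It is not SAD's code: the function name, the `rom_lo`/`rom_hi` bounds used to decide what a "valid address" is, and the idea of passing in an already decoded jump are all assumptions made for the example. Check 3 (the 'dummy' handler) is left out.

```python
def plausible_bank_start(jump_addr, jump_target, vectors, bank8,
                         is_8065=False, rom_lo=0x2000, rom_hi=0xFFFF):
    """Apply checks 1 and 2 to an already decoded bank start.

    jump_addr/jump_target describe the first jump in the bank,
    vectors is the list of interrupt vector values read from the bank.
    rom_lo/rom_hi are assumed bounds for a 'valid' address.
    """
    limit = 0x205F if is_8065 else 0x201F
    if bank8:
        # Bank 8 (or single bank): a real jump past the vector area.
        jump_ok = jump_target > limit
    else:
        # Other banks: a loopback jump (jump to itself).
        jump_ok = jump_target == jump_addr
    vectors_ok = len(vectors) > 0 and all(rom_lo <= v <= rom_hi for v in vectors)
    return jump_ok and vectors_ok
```

For example, `plausible_bank_start(0x2002, 0x2579, [0x2100] * 8, bank8=True)` passes, while the same call with a jump target of 0x2010 fails check 1.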

It's quite possible there's a bug in this new system too, but for most bins I tested it works better, especially for those with crap data around the edges, and I couldn't get a crash.

Other than that, there were a few small bug fixes which didn't add up to all that much....

Bank 0 ?? the RAM set opcode probably doesn't work quite right, but the main BNK opcode should ...
Last edited by tvrfan on Wed May 15, 2019 3:57 pm, edited 1 time in total.
TVR, kit cars, classic cars. Ex IT geek, development and databases.
https://github.com/tvrfan/EEC-IV-disassembler

motorhead1991
Regular
Posts: 152
Joined: Tue Nov 21, 2017 2:32 am

Re: Why auto disassembly is tough

Post by motorhead1991 » Wed May 15, 2019 3:53 pm

tvrfan wrote:
Wed May 15, 2019 3:16 pm
Bank 0 ?? the RAM set opcode probably doesn't work quite right, but the main BNK opcode should ...
It was this command:

Code: Select all

bank 0 0 dfff
It complained of a fill address out of bounds despite removing the "fill" commands from the dir file.

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Wed May 15, 2019 3:55 pm

A note of where I'm up to.
I gave up and went back to 3.08 for a do over !

This wasn't a complete waste of time as I did improve quite a few things on the way, and wrote some new stuff which I have kept. But the various main ideas didn't pan out. The heart of the issue is how to balance a static analysis (which builds a 'tree' structure of blocks from the 'if/then' jumps) with a correct 'recurse' structure for subroutines.

Previous SAD had a list of "scan these blocks with these values", so blocks were repeated apart from the values, and subroutines weren't called in order. It therefore had a whole raft of complex stuff to work out when a subroutine had arguments, plus a 'hold' system for the callers of each subroutine. This works for fixed args and A9L-style variable args, but can't ever handle the CARD-like 'if/then' with a jump into the middle of a different subr.

This was further complicated by code which jumps from one subroutine into another with arguments (e.g. signed & unsigned lookups in later bins), code which PUSHes a new return address for itself, and so on. I tried taking the jumps as if it was all the same block of code, but I couldn't stop some bins looping around forever, and it was a mess. Plus analysis will have to emulate some subroutines, and that causes loops as well. AARGHH!!!!

I finally had a good idea about how to handle the jumps, which uses one of those new tricks I wrote.
It's actually from "how can you tell when a 'goto x' is a true fixed goto, or just an 'else'?"

I now have a test version which is a lot smaller and seems to work just fine (not for any subr arguments yet) without a whole layer of code, AND can handle multiple jumps between subroutines (22CA), and gets the signed/unsigned lookup names right. So it's smaller and simpler code, without any 'holds' required, which is ALWAYS good. That wasn't remotely where I was going, but hey.....

So now, I'm about to add an 'emulate' piece which will run bits of the actual code itself in a controlled & fake environment. Here's hoping......
Last edited by tvrfan on Wed May 15, 2019 3:59 pm, edited 1 time in total.

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Wed May 15, 2019 3:56 pm

motorhead1991 wrote:
Wed May 15, 2019 3:53 pm

It was this command:

Code: Select all

bank 0 0 dfff
It complained of a fill address out of bounds despite removing the "fill" commands from the dir file.
Ahhh. OK, could be a screw up in the command parser then....
Thanks.

jsa
Tuning Addict
Posts: 594
Joined: Sat Nov 23, 2013 7:28 pm
Location: 'straya

Re: Why auto disassembly is tough

Post by jsa » Thu May 16, 2019 2:42 am

The post was about a cold-turkey first run of SAD against a bin. Understood that a second step of creating a dir with args set would improve the 3.08 results for the changes you describe.

Pondering Bank8, have any been sighted that;
* Don't start with FF FA
* Don't have Checksum at 0x200A

Is more required than calculate checksum, then look for FF FA xx xx xx xx xx xx xx xx CS UM ?

Great to hear of the promising progress.
Cheers

John

95 Escort RS Cosworth - GHAJ0 / ANTI on a COSY box code
Moates QH & BE
ForDiag

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Thu May 16, 2019 2:52 pm

jsa wrote:
Thu May 16, 2019 2:42 am
Pondering Bank8, have any been sighted that;
* Don't start with FF FA
* Don't have Checksum at 0x200A

Is more required than calculate checksum, then look for FF FA xx xx xx xx xx xx xx xx CS UM ?

Great to hear of the promising progress.
YES. Some earlier bins have the FF FA *after* the jump, my very early AA box for example.

Code: Select all


2000: e7,1d,00            jump  2020             goto 2020;
...
2020: fa                  di                     disable ints;
2021: ff                  nop                    
2022: 01,16               clrw  R16              codeflags = 0;
2024: 11,02               clrb  R2               LSO_Port = 0;
2026: a3,01,50,20,30      ldw   R30,[R0+2050]    R30 = ROMStart;
202b: a3,01,5e,20,32      ldw   R32,[R0+205e]    R32 = ROMEnd;
Last two statements load start and end addresses for the checksum loop...............

CKSUM.
I am not convinced that all the bins use the same checksum addition. Again the early UK ones seem to do a simple word addition from 0x2000 through to end (0x3fff, 5fff or 7fff), but A9L series appears to add separate bytes. I admit though I haven't studied this much.
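
To make the two candidate schemes concrete, here is a minimal Python sketch of a word-wise and a byte-wise sum over a ROM image. The function names, the 16-bit wrap-around and the `base` parameter (address of the first byte in the buffer) are assumptions for illustration; what value the EEC actually compares the sum against is not covered here.

```python
def word_sum(rom, start, end, base=0x2000):
    """Sum little-endian 16-bit words from start to end (end = last byte),
    wrapping at 16 bits."""
    total = 0
    for addr in range(start, end, 2):
        lo = rom[addr - base]
        hi = rom[addr - base + 1]
        total = (total + (lo | (hi << 8))) & 0xFFFF
    return total


def byte_sum(rom, start, end, base=0x2000):
    """Sum single bytes from start to end inclusive, wrapping at 16 bits."""
    return sum(rom[start - base:end - base + 1]) & 0xFFFF
```

Running both over the same bin and comparing the results with the stored checksum word would be one way to tell which scheme a given binary uses.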

I went for checking the jump over the interrupt vectors and verifying the vectors themselves, which seems more 'universal' as it's a defined CPU requirement.
SAD looks for that jump and its destination, it does NOT check for an FF or FA.

A multibank looks like this (can be short or long jump)

bank 8 - start of code (22CA)

Code: Select all


82000: ff                 nop                    
82001: fa                 di                     disable ints;
82002: e7,74,05           jump  82579            goto 82579;
and not bank 8 looks like this -

Code: Select all

02000: ff                 nop                    
02001: fa                 di                     disable ints;
02002: 27,fe              sjmp  02002            goto 02002;
note the loopstop versus real jump, but all banks have valid interrupt vector/pointer lists....................
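
The jump targets in the three listings above can be reproduced with a short Python sketch. It assumes only what the listings show: `ff`/`fa` may precede the jump, `e7` is a long jump with a 16-bit little-endian offset, and `20`-`27` is a short jump with an 11-bit offset, both relative to the next instruction. The function names are made up for the example and this is not SAD's code.

```python
def first_jump(code, base=0x2000):
    """Return (jump_address, target) for the first jump in `code`, or None."""
    pc = 0
    # Skip leading nop (ff) / di (fa) bytes, as seen in the listings.
    while pc < len(code) and code[pc] in (0xFF, 0xFA):
        pc += 1
    if pc >= len(code):
        return None
    op = code[pc]
    if op == 0xE7 and pc + 2 < len(code):           # long jump
        off = code[pc + 1] | (code[pc + 2] << 8)
        if off & 0x8000:
            off -= 0x10000
        return base + pc, (base + pc + 3 + off) & 0xFFFF
    if 0x20 <= op <= 0x27 and pc + 1 < len(code):   # short jump
        off = ((op & 0x07) << 8) | code[pc + 1]
        if off & 0x400:
            off -= 0x800
        return base + pc, (base + pc + 2 + off) & 0xFFFF
    return None


def is_loopstop(jump):
    """True when the jump targets its own address."""
    return jump is not None and jump[0] == jump[1]
```

`first_jump(bytes.fromhex("fffa27fe"))` gives a jump at 0x2002 back to 0x2002, which `is_loopstop` flags, while the bank 8 bytes `ff fa e7 74 05` give a real jump to 0x2579.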

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Thu May 23, 2019 12:31 am

While testing, I ran into a problem with vects (not same as reported by jsa, but still....) and I discovered yet more CRAZY stuff.

First, I changed vector list detection in 3.08 as it wasn't working with later binaries, so I changed the 'signature' options for a 'detect' type: when SAD arrives at a PUSH/RET combination, it attempts to find a list address. With push [R30, 3456] it is obvious where the list starts (but some lists start at R30 = 2, not zero...).

I discovered this is BROKEN in 3.08 for the list style where it does an ADD R30, 3456, PUSH [R30] , RET, and worse, you can't override it with a vect command, due to the user command being overridden (which should NEVER happen !!).

and then, I found this little piece of code in 2DBD binary

Code: Select all

8496: a0,7c,0e            ldw   Re,R7c           HSO_Time = R7c;
8499: b1,0f,0d            ldb   Rd,f             HSO_Cmd = f;
849c: c9,a7,84            push  84a7             push(@Sub162);
849f: ad,00,30            ldzbw R30,0            wR30 = 0;
84a2: cb,31,66,84         push  [R30+8466]       push([R30+8466]);
84a6: f0                  ret                    return;
AAARRRGGGHHHHH !!!!! the PAIN !!!! will this crap never stop ? !!!!! :BigGrin:
why the hell make R30 zero and then use it as an OFFSET ??? WTF ?????

8466 gives the clue

Code: Select all

8466: 06,d0               word  d006
8468: 06,c0               word  c006
846a: 06,e0               word  e006
846c: 09,d0               word  d009
846e: 09,c0               word  c009
8470: 09,e0               word  e009
so these are the typical cal console entry addresses..... but it still makes me ask the question WHY ????
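
For readers following along, this small Python sketch shows what `push [R30+8466]` ends up pushing for a given R30, using only the six words from the listing above. The helper names are invented for the example.

```python
def read_word(rom, addr, base):
    """Little-endian 16-bit word at absolute address `addr`."""
    i = addr - base
    return rom[i] | (rom[i + 1] << 8)


# The six words listed at 8466 in the 2DBD listing.
LIST_BASE = 0x8466
LIST_BYTES = bytes.fromhex("06d006c006e009d009c009e0")


def pushed_address(r30):
    """Value pushed by `push [R30+8466]` for a given R30 (a byte offset)."""
    return read_word(LIST_BYTES, LIST_BASE + r30, LIST_BASE)
```

With R30 forced to zero, as in the code at 849f, `pushed_address(0)` is 0xd006, so only the first entry is ever selected by that particular sequence.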

So I need to look at vector list detection now......the 'signature' isn't flexible enough, but the detect lets too much through........

jsa
Tuning Addict
Posts: 594
Joined: Sat Nov 23, 2013 7:28 pm
Location: 'straya

Re: Why auto disassembly is tough

Post by jsa » Thu May 23, 2019 1:30 am

Edit;
I remember picking the eyes out of this previously, found here
http://www.efidynotuning.com/forum/view ... ole#p33407

Another thought bubble from times past,
viewtopic.php?f=2&t=21159&p=129595&hili ... le#p129595
Have you bins that need something different to find the console routine?
Cheers

John

jsa
Tuning Addict
Posts: 594
Joined: Sat Nov 23, 2013 7:28 pm
Location: 'straya

Re: Why auto disassembly is tough

Post by jsa » Thu May 23, 2019 7:59 am

Are the signatures for subr. just based on an opcode sequence ignoring the arguments?
Cheers

John

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Thu May 23, 2019 3:03 pm

No, I designed the 'signature' routine after the unix regular expression (= RE), up to a point, but even that isn't flexible enough.
If you've never used this, the RE is embedded in a wide range of unix/linux edit and search applications, and although a more complex RE looks like gobbledegook, the spec makes for an incredibly powerful, flexible pattern matcher. (have a read on web !) Windows has only a pale shadow of this full functionality.

SAD has a pattern definition of opcode type + operands, where operands can be 'stored' as an index, and that index then used to specify that this value (or register) must occur again elsewhere, plus options for auto increments, indirect etc., ability to save jump addresses, bits, carry, plus sub-patterns (match a defined number of opcodes within a bigger pattern in any order), skippable and optional patterns (skip up to n opcodes up to a defined one) and a few other bits as well. Yes I thought that code wot I wrote was wonderful. And then.....
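
The 'stored operand' idea can be sketched in a few lines of Python. This is only an illustration of the concept (an operand captured once must repeat later, like an RE back-reference), not SAD's 'bit/mask/table' implementation; the tuple format for instructions and the `$name` capture syntax are assumptions.

```python
def match(pattern, instrs):
    """Match `pattern` against the start of `instrs`.

    Each entry is (mnemonic, operands). A string operand in the pattern
    such as "$reg" is a capture: its first use stores the value and every
    later use must see the same value. Returns the captures, or None.
    """
    if len(instrs) < len(pattern):
        return None
    captures = {}
    for (p_op, p_args), (i_op, i_args) in zip(pattern, instrs):
        if p_op != i_op or len(p_args) != len(i_args):
            return None
        for want, got in zip(p_args, i_args):
            if isinstance(want, str):
                if captures.setdefault(want, got) != got:
                    return None
            elif want != got:
                return None
    return captures


# Hypothetical pattern for the "ADD Rx, list; PUSH [Rx]; RET" vector style.
VECT_PATTERN = [("add", ("$reg", "$list")), ("push", ("$reg",)), ("ret", ())]
```

Matching `VECT_PATTERN` against `add R30,3456; push [R30]; ret` yields the register and the list address, and fails if the PUSH uses a different register.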

Even THAT isn't enough, I ended up with FOUR different vector patterns (for background task list) and it still wouldn't match later bins.
So it was getting out of hand.

[ hmmm... perhaps I might consider a sequence of characters just like an RE instead of a 'bit/mask/table' style like now....er....um...but probably still won't work, but easier to write new pattern matches....]

Compare that to function lookup (1D lookup), which has only 1 pattern with a few options and matches everything so far, and table lookup (2D), which has 2 patterns with a few options that also match everything so far. They DO work well.

Trouble is that there are so many different ways to actually code the same logic.
So I went to the style of checking when a PUSH is encountered, especially with subroutine arguments on the agenda, which goes -

1) PUSH found
2) is there a RET following close by ?
3) is it a) immediate, b) indexed or c) indirect ?
4) different action for each of the types in 3)
3a) if it's a valid address pushed, it's probably a pushed common return address for a task list
3b) indexed, almost certainly a list, as in [start + register] --- except for what I found above !!! ---
3c) indirect could be a list where you can find - add Rx, start addr - push [reg] .
-> but problem here, as quite a lot of later bins use push [Reg+offset] in their data structures (A9L does this too).
5) none of these ? OK it's just a plain old register push.

Plus for subroutine arguments -

6) is it a PUSH of a register that was previously POPped, which itself was a CALL return address ?
7) if 6) is true, then it's a candidate for a subroutine argument extract.
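
The decision steps above could be sketched roughly as follows in Python. The instruction tuples `(mnemonic, mode, value)`, the result strings, the `max_gap` window for "RET close by" and the `is_valid_addr` callback are all assumptions for illustration; the embedded-data skips mentioned for A9L are not handled.

```python
def classify_push(instrs, i, is_valid_addr, popped_regs=(), max_gap=3):
    """Classify the instruction at index i following the PUSH checklist."""
    mnem, mode, value = instrs[i]
    if mnem != "push":
        return "not a push"
    # Steps 6-7: register that earlier received a popped return address.
    if mode == "register" and value in popped_regs:
        return "argument extract candidate"
    # Step 2: is there a RET within the next few instructions?
    ret_near = any(m == "ret" for m, _, _ in instrs[i + 1:i + 1 + max_gap])
    if ret_near:
        if mode == "immediate" and is_valid_addr(value):
            return "pushed return address"      # 3a
        if mode == "indexed":
            return "probable list"              # 3b
        if mode == "indirect":
            return "possible list"              # 3c
    return "plain push"                          # step 5
```

Fed the 2DBD sequence (`push 84a7`, `ldzbw R30,0`, `push [R30+8466]`, `ret`), the first PUSH comes out as a pushed return address and the second as a probable list, which is exactly the case that trips up the simple rule.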

NB - A9L (and others) actually have DATA embedded in some of their test list addresses, so logic has to allow for some skips....

see - simple.....................(why don't smilies work for me ?)
Last edited by tvrfan on Thu May 23, 2019 4:12 pm, edited 3 times in total.

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Thu May 23, 2019 3:15 pm

'Console present'

Yes - quite a few bins have that style of check where 0xd00 (or similar addresses) appears to be a console status flag, and there are various other 'set' addresses around for plug-in or special function chips/peripherals. But like the cal console (0xd000 or d006 or d009 or e000 or ....) there is a list of possibles. Some of these are in the Ford handbook, but I haven't bothered to cross-ref them all. I might have to some day!

jsa
Tuning Addict
Posts: 594
Joined: Sat Nov 23, 2013 7:28 pm
Location: 'straya

Re: Why auto disassembly is tough

Post by jsa » Fri May 24, 2019 4:11 pm

I will have a read about RE.

For me, finding that short string with 0xD00 nails an address for working backwards to the subroutine list. When SAD misses that list, it makes for a quick find. 0xD00 seems unique, whereas 0xD000 and 0xE000 have other uses in some bins.

I agree those 'set' addresses need to be leveraged.
Cheers

John

motorhead1991
Regular
Posts: 152
Joined: Tue Nov 21, 2017 2:32 am

Re: Why auto disassembly is tough

Post by motorhead1991 » Wed May 29, 2019 11:37 am

Taking a break from m0m2 since it's useable, so I'm focused on some SD tunes that I've cloned.

One in particular is 8SD, which doesn't seem to follow conventional wisdom with Rbases. Certain subroutines have their own offsets defined which makes for a hell of a time disassembling them. Does SAD have a workaround for this situation?

jsa
Tuning Addict
Posts: 594
Joined: Sat Nov 23, 2013 7:28 pm
Location: 'straya

Re: Why auto disassembly is tough

Post by jsa » Thu May 30, 2019 3:55 pm

motorhead1991 wrote:
Wed May 29, 2019 11:37 am
Certain subroutines have their own offsets defined which makes for a hell of a time disassembling them. Does SAD have a workaround for this situation?
Sad.pdf wrote:
rbase 76 4080
rbase 76 8740 2230 2350

Many binaries use defined registers as permanent fixed data 'base' pointers, and then use the index mode of instructions to get at the data. This command allows SAD to decode the index to produce a true absolute address, and add a symbol name, if there is one defined. SAD will normally detect the most commonly coded 'calibration pointers' (typically Rf0 – Rfe) and set these automatically.
Many binaries also set registers as pointers into RAM and KAM. SAD attempts to detect and confirm these too, but is not guaranteed to get them right, as they are only used as a temporary pointer.
From version 3.07 you can also specify an rbase register definition within an address range, say for a single subroutine. When no addresses are specified, the rbase command defaults to the whole binary. This is shown in the second example, where R76 would be redefined between addresses 2230 and 2350, and the default (i.e. first one) would apply everywhere else.
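
As a way to picture how a ranged rbase overrides the default, here is a minimal Python sketch that parses the two example commands and resolves an indexed operand. The parsing, the inclusive range test and the function names are assumptions for illustration and are not taken from SAD's source.

```python
def parse_rbase(line):
    """Parse 'rbase <reg> <value> [<start> <end>]' (all hex)."""
    parts = line.split()
    reg, value = int(parts[1], 16), int(parts[2], 16)
    rng = (int(parts[3], 16), int(parts[4], 16)) if len(parts) >= 5 else None
    return reg, value, rng


def resolve_rbase(rules, reg, pc):
    """Base value of `reg` at code address `pc`, or None if undefined.
    A rule with an address range wins over the whole-binary default."""
    default = None
    for r, value, rng in rules:
        if r != reg:
            continue
        if rng is None:
            default = value
        elif rng[0] <= pc <= rng[1]:
            return value
    return default


def absolute(rules, reg, offset, pc):
    """Absolute address for an indexed operand [reg + offset] at `pc`."""
    base = resolve_rbase(rules, reg, pc)
    return None if base is None else (base + offset) & 0xFFFF
```

With the two example commands loaded, an access through R76 at address 2300 resolves against 8740, while the same access at 3000 falls back to 4080.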
Cheers

John

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Thu May 30, 2019 8:47 pm

motorhead1991 wrote:
Wed May 29, 2019 11:37 am
Taking a break from m0m2 since it's useable, so I'm focused on some SD tunes that I've cloned.

One in particular is 8SD, which doesn't seem to follow conventional wisdom with Rbases. Certain subroutines have their own offsets defined which makes for a hell of a time disassembling them. Does SAD have a workaround for this situation?
It's on my list of possibles to do this automatically, but so far still trying to get a reliable way to do variable arguments. It's very frustrating, or I'm too dumb to find the answer... so for now, yes jsa's reply is the right one. Do it manually.

History - I was wondering a while ago whether to track 'data types'.. That is keep track of what each register holds from its last assignment. ie. a pointer, an immediate value (candidate for an rbase), a 'raw' sensor value (A/D voltage), a 'time value', and so on. At that time it was to see if there was a way to auto scale some of the ROM values, and perhaps tables and functions too.

Currently, I haven't bothered to identify the end of a subroutine either (the 'ret' instruction). This would be necessary to track a register as an address-limited base. I know that sounds totally nuts, but SAD works on 'blocks' of code where a static jump (or return) ends that block. The jump then makes/calls another block. So SAD doesn't need to know (at present). Those blocks aren't necessarily scanned/checked in their correct order, they just go in a list, and in places where the same code block is shared between subroutines (Yes, it's quite common !) SAD doesn't always rescan that block.
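
The 'blocks in a list' idea is a common worklist scan, and a toy Python version looks like this. It is a generic sketch, not SAD's code: the `program` dictionary (address to mnemonic, size, target) stands in for a real decoder, and `cjump` is a made-up name for any conditional jump.

```python
def scan_blocks(program, entry):
    """Split code into blocks, each ended by a static jump or a return.

    program maps address -> (mnemonic, size, target_or_None).
    Returns {block_start: [instruction addresses]}.
    """
    pending, blocks = [entry], {}
    while pending:
        start = pending.pop()
        if start in blocks or start not in program:
            continue                      # already scanned, or not code
        pc, body = start, []
        while pc in program:
            mnem, size, target = program[pc]
            body.append(pc)
            if mnem in ("jump", "ret"):   # static jump or return ends the block
                if target is not None:
                    pending.append(target)
                break
            if mnem in ("cjump", "call") and target is not None:
                pending.append(target)    # new block, scanning falls through
            pc += size
        blocks[start] = body
    return blocks
```

Note that nothing here records which subroutine a block belongs to or where a subroutine ends, and a shared block is scanned only once, which mirrors the limitation described above.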

With emulation for variable arguments this is changing, so I will look at this again...............

(edited to correct bad description)

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

PROTOTYPE SAD with variable arguments

Post by tvrfan » Tue Jun 04, 2019 5:54 pm

Jsa - John,

VARIABLE ARGS !!!!!

I think I may finally have cracked a method to get the arguments working, with a part-scan, part emulate approach, which is a 'merge and modify' of a couple of previous attempts. I have attached a CARD listing done with NO COMMANDS by current development SAD, so it's fully auto, and it looks quite good to me so far. Tables and funcs not resolved correctly yet, but the subr naming (funcs and tabs) seems right.

As you are much more familiar with CARD can you please scan it for me and tell me if you spot any code/argument errors ???
(No, not funcs/tabs quite yet, and just raw bytes in the args).

CARD was a tough nut as it calls subroutines in the middle of getting arguments, and jumps between subrs for different arg numbers, and such 'nasty' stuff.

I am very pleased that I have finally got what appears to be a viable approach !!

here is an A9L too, which looks good (to me), but haven't managed to crack its injection table automatically yet, and more readers are familiar with this bin.
Please tell me if you spot code errors.

Debug print of A9L shows no invalid opcodes (a flag for where SAD goes wrong...)
Debug print of CARD shows 2 which I think are incorrect pointers into data (to be double checked ...)

Thanks in advance,

tvrfan (Andy)
Attachments
A9L_lst.txt
(829.74 KiB) Downloaded 15 times
CARD_lst.txt
(879.65 KiB) Downloaded 13 times

jsa
Tuning Addict
Posts: 594
Joined: Sat Nov 23, 2013 7:28 pm
Location: 'straya

Re: Why auto disassembly is tough

Post by jsa » Tue Jun 04, 2019 6:54 pm

Shall do Andy.
Cheers

John

jsa
Tuning Addict
Posts: 594
Joined: Sat Nov 23, 2013 7:28 pm
Location: 'straya

Re: PROTOTYPE SAD with variable arguments

Post by jsa » Tue Jun 04, 2019 9:45 pm

Thanks for your ongoing efforts.
Straight up, without _dir or _cmt, 3.09beta looks a decent step forward from 3.07 or 3.08.

A quick compare of 1st run 3.09beta to 3.07 with _dir & _cmt

3.09beta

Code: Select all

263e: f8                  ??? 
expected

Code: Select all

263e: f8                  clc                    CY = 0;

3.09beta seems to have lost its way here, until 2f20

Code: Select all

2ee8: ef,6e,15            call  4459             Sub77(2c,f0,e,1,0,0);
2eeb: 2c,f0,0e,01,00,00   #args  

2ef1: ef                  ???   

2ef2: 65,15,0e,01         ad2w  R1,e15           R1 += e15;
I think it should be another call to 4459 taking another 6 bytes worth of args

Code: Select all

2ef1: ef,65,15            call  4459             

3.09beta, I think it should take 4 bytes worth of args here and for subsequent occurrences

Code: Select all

2f2b: ef,28,15            call  4456             Sub76();

Lines 38bf and 3904 require a more thorough review; 8 vs 10 bytes of args. Could be the dreaded trick where two solutions look equally valid at first glance! The 4 letter word, variable args, comes to mind.
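
The call targets in the listings above can be checked by hand with a short Python sketch: `ef` is followed by a 16-bit little-endian offset relative to the next instruction, and the inline argument bytes follow the call. The fixed byte counts in `ARG_BYTES` are just the 6 and 4 suggested in this post for 4459 and 4456; a single fixed count per subroutine is an assumption that variable-argument subroutines will break.

```python
# Inline argument byte counts suggested in this post (an assumption).
ARG_BYTES = {0x4459: 6, 0x4456: 4}


def call_target(addr, code):
    """Target of a long call (opcode ef) located at `addr`."""
    if code[0] != 0xEF:
        raise ValueError("not a long call")
    off = code[1] | (code[2] << 8)
    if off & 0x8000:
        off -= 0x10000
    return (addr + 3 + off) & 0xFFFF


def split_call(addr, code):
    """Return (target, inline argument bytes, address of next opcode)."""
    target = call_target(addr, code)
    n = ARG_BYTES.get(target, 0)
    return target, bytes(code[3:3 + n]), addr + 3 + n
```

For the bytes at 2ee8 this gives a call to 4459, six argument bytes and a next opcode at 2ef1, where `ef,65,15` decodes as another call to 4459.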


3.09beta

Code: Select all

39e4: ef,72,0a            call  4459             Sub77();
39e7: 54,f1,da,01         ad3b  R1,Rda,Rf1       R1 = Rda + Rf1;
At some point in history, I convinced myself of;

Code: Select all

39e4: ef,72,0a            call  4459             Sub4459(54,f1,da,01,3c,04);
39e7: 54,f1,da,01,3c,04   #args                                                    # 54,F1 xF1 has B7 set 1 neither byte is
                                                                                   #       processed by Sub4459->Sub449F
                                                                                   # DA,01
                                                                                   #       processed by Sub4459->Sub449F
                                                                                   # 
                                                                                   # 3C,04 used by Sub40F8->Sub4102
                                                                                   #       3C+17c=[1B8] 3c+Rfe=[DB90]
                                                                                   

tvrfan wrote:
Tue Jun 04, 2019 5:54 pm
I think I may finally have cracked a method to get the arguments working, with a part-scan, part emulate approach, which is a 'merge and modify' of a couple of previous attempts. I have attached a CARD listing done with NO COMMANDS by current development SAD, so it's fully auto, and it looks quite good to me so far. Tables and funcs not resolved correctly yet, but the subr naming (funcs and tabs) seems right.
The code that has been discovered looks pretty good to me generally. Will need to take some more time to checkout lookups. Some code has not been auto discovered, but that may improve with ARG resolution.

As you are much more familiar with CARD can you please scan it for me and tell me if you spot any code/argument errors ???
(No, not funcs/tabs quite yet, and just raw bytes in the args).
See above.

CARD was a tough nut as it calls subroutines in the middle of getting arguments, and jumps between subrs for different arg numbers, and such 'nasty' stuff.
Indeed, I may still have work to do on my interpretation of CARD args. :x

I am very pleased that I have finally got what appears to be a viable approach !!
Yes it does look promising.

here is an A9L too, which looks good (to me), but haven't managed to crack its injection table automatically yet, and more readers are familiar with this bin.
Please tell me if you spot code errors.
Out of time to look at A9L today.
Comparing a _lst from _dir & _cmt to 1st run, is pretty messy, so more would be obvious once 3.09 is run similarly.

Debug print of A9L shows no invalid opcodes (a flag for where SAD goes wrong...)
Debug print of CARD shows 2 which I think are incorrect pointers into data (to be double checked ...)
Yeah, !inv count is a good success score until multiple solutions get rid of them. I may have used that metric to rough out some code before. :roll:
Are you tracking arg counts for each subr?
Are you looking at ??? counts?
Have I noted the "2" above?
Cheers

John

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Wed Jun 05, 2019 12:12 am

Thanks John, that's fantastic - will check out your issues spotted. Possibly some new code trick I didn't see.

Often, one example of a faulty args decode, will apply to all those subr calls, so improvement can be quite large from one fix. I hope.

I see that some of the 4459 calls show 6 args correctly, but some have no args.. and 4456 (4 bytes) is the same..
Hmmm.....OK.

At the moment, the new code just produces lots of "args <start> <end> " commands.

The func lookups in CARD do an in-line address decode trick (A9L calls another subr to get these encoded pars), and SAD isn't handling those inline yet. It was done via a 'signature/fingerprint' which didn't always work. That isn't necessary with an emulation, but SAD needs to catch the address value AFTER the decode but BEFORE the true lookup starts....

Early days, but FAR better than my previous attempts. Tried on all my test bins, and SAD 309 doesn't crash on any of them [but doesn't mean they are right !]

Not working for multi banks yet (need to sort out the LDX [R20 + x] style of getting at the stack, along with the pushp 'push psw' instruction which they use to track which bank the call came from)

Sorting args FIRST, all the rest later !!

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Wed Jun 05, 2019 3:23 pm

John, I found the bug with the 4456 and 4459 args. I think that's all working right now.

Here is a new listing in case it's any use to you. Code should be right (I hope!), but data isn't !

I see there are still some blocks which look like code but are not decoded, along with some obvious all-data structs.
A9L has a few of both of these too.
In A9L, the code blocks are subroutines called from a data structure which is not decoded (e.g. the main injection table).
In CARD though, there seem to be too many for this, so that's something I will investigate more.
Could be several data structs I guess.

I have some code for the idea of a 'gap checker' for code blocks, which I might put in for these blocks...
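
A minimal sketch of what a 'gap checker' could look like, written in Python rather than in SAD's own code. It assumes the scan leaves behind a list of inclusive (start, end) address ranges that are already classified; `find_gaps` and the example addresses are hypothetical and are not taken from SAD.

```python
def find_gaps(known, bank_start, bank_end):
    """Return inclusive (start, end) ranges inside the bank that no known block covers."""
    gaps = []
    cursor = bank_start
    for start, end in sorted(known):
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= bank_end:
        gaps.append((cursor, bank_end))
    return gaps


# Illustrative ranges only: two decoded blocks with an unclassified hole between them.
blocks = [(0x4502, 0x450F), (0x459F, 0x45B7)]
for start, end in find_gaps(blocks, 0x4502, 0x45B7):
    print(f"gap {start:04x}-{end:04x}")
```

Each gap found this way would then be a candidate for a trial code scan or a data-structure check.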

Next - multibank args. This needs to capture the LDX and STX instructions which access the stack directly, and work out how to
correctly handle the 'pushp' entries (which push the psw, so that the caller's bank can be determined).

More Later ........
Attachments
CARD_lst.txt
(884.62 KiB) Downloaded 14 times
TVR, kit cars, classic cars. Ex IT geek, development and databases.
https://github.com/tvrfan/EEC-IV-disassembler

jsa
Tuning Addict
Posts: 594
Joined: Sat Nov 23, 2013 7:28 pm
Location: 'straya

Re: Why auto disassembly is tough

Post by jsa » Wed Jun 05, 2019 3:52 pm

Thanks.

I have mentioned previously how all the blocks of code are overlooked and how fixing one detail snowballed a pile of code. I will try to find it.

The start of many structures are found.
Cheers

John

95 Escort RS Cosworth - GHAJ0 / ANTI on a COSY box code
Moates QH & BE
ForDiag

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Wed Jun 05, 2019 4:16 pm

jsa wrote:
Wed Jun 05, 2019 3:52 pm
Thanks.
I have mentioned previously how all the blocks of code are overlooked and how fixing one detail snowballed a pile of code. I will try to find it.
The start of many structures are found.
Good that it finds the start points at least !!
I have been playing with the idea of some kind of 'data pattern' code (in my brain), which would 'score' an undefined lump of bytes to see if it has a regular structure. I think that's an idea worth some time and experiment... Tables are too hard and too irregular, but funcs and the A9L injection table (and therefore many others) would probably work. May be able to find code on the web as a basis; isn't this kind of stuff done for pattern analysis?

So a 'code gap' and a 'data pattern' analyser phase is a likely experiment soon.
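
One possible shape for such a 'data pattern' score, as a rough Python sketch. It tries several row sizes on an undefined lump of bytes and scores each by how often the first byte of a row does not increase from one row to the next. Both the "first column descends" test and the sample bytes are assumptions made for illustration, not something confirmed for any particular bin.

```python
def monotonic_score(data, row_size):
    """Fraction of adjacent rows whose first byte does not increase."""
    rows = [data[i:i + row_size]
            for i in range(0, len(data) - row_size + 1, row_size)]
    if len(rows) < 2:
        return 0.0
    hits = sum(1 for a, b in zip(rows, rows[1:]) if b[0] <= a[0])
    return hits / (len(rows) - 1)


def best_row_size(data, candidates=(2, 3, 4, 6, 8)):
    """Pick the candidate row size with the highest score (ties go to the smallest)."""
    return max(candidates, key=lambda n: (monotonic_score(data, n), -n))


# Made-up lump: five (x, y) byte pairs with x descending.
lump = bytes([0xFF, 0x10, 0xC0, 0x20, 0x80, 0x30, 0x40, 0x40, 0x00, 0x50])
print(best_row_size(lump))
```

A real scorer would need more than one test per row size (and word-sized columns), but the idea of ranking candidate layouts by a regularity score stays the same.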
TVR, kit cars, classic cars. Ex IT geek, development and databases.
https://github.com/tvrfan/EEC-IV-disassembler

jsa
Tuning Addict
Posts: 594
Joined: Sat Nov 23, 2013 7:28 pm
Location: 'straya

Re: Why auto disassembly is tough

Post by jsa » Wed Jun 05, 2019 7:55 pm

Code: Select all

4502: a1,10,45,34         ldw   R34,4510         R34 = 4510;        <<<<<<<<<<<<<<<<<<<<<<<<
4506: 26,8c               sjmp  4394             goto 4394;

   Sub79:
4508: 3d,e0,02            jb    B5,Re0,450d      if (B5_Re0 = 0)  {
450b: 29,1c               scall 4629             Sub84(); }
450d: c8,34               push  R34              push(R34);         <<<<<<<<<<<<<<<<<<<<<<<<
450f: f0                  ret                    return;

4510: 9b,70,99,00,df,07,28,87,52,05,02,e7,f8,37,df,07 ???           <<<<<<<<<<<<<<<<<<<<<<<<
4520: 28,7d,51,05,80,e6,f6,29,1a,65,05,04,00,fe,28,6f ???           Subr is missed
4530: 56,05,80,46,00,88,30,00,df,0e,a3,31,50,03,3a,89 ??? 
4540: 56,05,3a,d7,03,91,01,92,3c,df,07,28,f6,58,05,02 ??? 
4550: 00,fc,28,ef,65,05,04,00,fe,9b,f3,28,03,00,df,07 ??? 
4560: 28,59,61,05,e0,02,06,91,20,e0,36,df,12,c7,6c,91 ??? 
4570: 00,28,2c,64,05,40,de,02,ef,65,04,9c,f0,28,aa,3d ??? 
4580: df,02,20,10,c7,6c,91,00,28,15,63,05,20,de,fa,ef ??? 
4590: 4e,04,9d,f0,71,df,e0,28,90,91,10,d7,e7,8f,06 ??? 
Snowballs because

Code: Select all

4552: 28,ef               scall 4643             SUB4643(65,5,4,0,fe);
.
.
.
.
   SUB4643:
4643: 2f,8d               scall 45d2             Sub45d2();
4645: 28,08               scall 464f             Sub464F();
4647: 2f,be               scall 4607             Sub4607();
4649: 2f,a3               scall 45ee             Sub45ee();
464b: 28,02               scall 464f             Sub464F();
464d: 27,69               sjmp  45b8             goto 45b8 ;
Load a word, push it later and return is a common theme for overlooked code.
Maybe scan for code any time a load, push, return sequence is encountered.
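
For reference, the scall at 4552 inside the undecoded block reaches 4643 through a short relative offset. The Python sketch below shows that arithmetic. The 11-bit field layout (low 3 bits of the opcode plus the second byte) is inferred from the listing lines above, and `short_branch_target` is a hypothetical helper name.

```python
def short_branch_target(addr, opcode, operand):
    """Target of a 2-byte sjmp (20-27) or scall (28-2f) located at addr."""
    offset = ((opcode & 0x07) << 8) | operand  # 11-bit signed field
    if offset & 0x400:                         # sign bit set: negative offset
        offset -= 0x800
    return (addr + 2 + offset) & 0xFFFF        # relative to the next instruction


print(f"{short_branch_target(0x4552, 0x28, 0xEF):04x}")  # scall inside the missed block
print(f"{short_branch_target(0x4506, 0x26, 0x8C):04x}")  # sjmp after the ldw
```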
Last edited by jsa on Wed Jun 05, 2019 9:00 pm, edited 1 time in total.
Cheers

John

95 Escort RS Cosworth - GHAJ0 / ANTI on a COSY box code
Moates QH & BE
ForDiag

jsa
Tuning Addict
Posts: 594
Joined: Sat Nov 23, 2013 7:28 pm
Location: 'straya

Re: Why auto disassembly is tough

Post by jsa » Wed Jun 05, 2019 8:19 pm

slydog ad2w

Code: Select all

68a2: 65,00,54,8e         ad2w  R8e,5400         R8e += 5400;
68a6: ca,8e               push  [R8e]            push([R8e]); }
68a8: f0                  ret                    return; }
and so it goes

Code: Select all

5400: f4,54               vect  54f4             Sub54F4
5402: d7,55               vect  55d7             Sub55d7
Effectively a load with an offset, then push and return.
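
As a small Python sketch, the vector list at 5400 reads as little-endian 16-bit words. `read_vectors` is a hypothetical helper, and the entry count is passed in because the listing alone does not say how long the list is.

```python
def read_vectors(data, base, count):
    """Decode `count` little-endian 16-bit vectors from `data`, which starts at address `base`."""
    out = []
    for i in range(count):
        lo, hi = data[2 * i], data[2 * i + 1]
        out.append((base + 2 * i, (hi << 8) | lo))
    return out


# The four bytes shown at 5400 and 5402 in the listing above.
for addr, target in read_vectors(bytes([0xF4, 0x54, 0xD7, 0x55]), 0x5400, 2):
    print(f"{addr:04x}: vect {target:04x}")
```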
Cheers

John

95 Escort RS Cosworth - GHAJ0 / ANTI on a COSY box code
Moates QH & BE
ForDiag

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Wed Jun 05, 2019 10:18 pm

jsa wrote:
Wed Jun 05, 2019 7:55 pm
<code snipped>
Load a word, push it later and return is a common theme for overlooked code.
Maybe scan for code any time a load, push, return sequence is encountered.
Yes, A9L does that, and I've been stuck on how to handle it properly. The problem is that a register is loaded with an address in a code block
which has no link to the block which does the PUSH[reg], RET to actually call it. If it's a pointer to a vector list, then the list won't get detected....

I do look for PUSH[Rx + 4234] style anywhere, where 4234 (or 4236) would be the start of the vector list, but something like
LDW R30, 4234; ADD R30, R32; GOTO x; <change scan blocks> ... PUSH [R30]; RET; will sneak through at the moment. Hence the 'code gap detector' idea.

That second example SHOULD ALREADY WORK !! so it's a SAD bug for me to find.

Now that I've got the part emulation to work, I may be able to deal with the first case.

The current solution is to flag a block/subr as Emulate Required, finish the standard 'tree' scan, and then do a limited emulation run on that block (with fake stack build and some safety stuff). At the moment, only flagging block for subr arguments, but can see other things setting the flag too.

Thanks again for spotting that.

Andy.
TVR, kit cars, classic cars. Ex IT geek, development and databases.
https://github.com/tvrfan/EEC-IV-disassembler

jsa
Tuning Addict
Posts: 594
Joined: Sat Nov 23, 2013 7:28 pm
Location: 'straya

Re: Why auto disassembly is tough

Post by jsa » Thu Jun 06, 2019 2:15 am

Yes, you have mentioned code block separation in the previous exchanges on this code segment.

I see 2 options;
1. Follow the code wherever it goes until it gets back to push then return. Emulation spiral of death.
Or
2. Go fuzzy. Big picture view is that code branching off then coming back is the norm in these boxes. So take a proximity view of load to push and return on a line distance basis. Most of these loads seem to be immediate addressing. Check to see if an immediate loaded register is pushed nearby. So if those meet some thresholds, just run the scan command and accept valid code if it does not contradict other stuff. Other immediate loads seem to be constants for conversions which are not pushed.
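
A rough Python sketch of option 2, assuming the disassembler can hand over decoded instructions in address order. The `Ins` record, the `nearby_push_targets` name and the default threshold of 8 lines are all made up for illustration.

```python
from typing import NamedTuple, Optional


class Ins(NamedTuple):
    addr: int
    op: str                    # mnemonic, e.g. "ldw", "push", "ret"
    reg: Optional[int] = None  # register written (ldw) or pushed (push)
    imm: Optional[int] = None  # immediate value, for an immediate ldw only


def nearby_push_targets(listing, max_lines=8):
    """Return immediates loaded into a register that is pushed, then returned, within max_lines."""
    found = []
    for i, ins in enumerate(listing):
        if ins.op != "ldw" or ins.imm is None:
            continue
        window = listing[i + 1:i + 1 + max_lines]
        for j, later in enumerate(window):
            if later.op == "push" and later.reg == ins.reg:
                # Only a push followed directly by ret acts as a jump.
                if j + 1 < len(window) and window[j + 1].op == "ret":
                    found.append(ins.imm)
                break
    return found


# The 4502-450f lines from the CARD listing earlier in the thread.
card = [
    Ins(0x4502, "ldw", reg=0x34, imm=0x4510),
    Ins(0x4506, "sjmp"),
    Ins(0x4508, "jb"),
    Ins(0x450B, "scall"),
    Ins(0x450D, "push", reg=0x34),
    Ins(0x450F, "ret"),
]
print([f"{t:04x}" for t in nearby_push_targets(card)])
```

Each address returned would be queued for a normal code scan and kept only if the result does not contradict anything already decoded.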

Hopefully we don't revisit this in another 6 months!
Cheers

John

95 Escort RS Cosworth - GHAJ0 / ANTI on a COSY box code
Moates QH & BE
ForDiag

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Thu Jun 06, 2019 2:24 pm

Thanks John,

Methods
1) already thought about possible endless loop, so emulation has an 'opcodes executed' maximum, currently at 1000 (may change this)

2) Interesting idea. I admit I had not thought that way at all. I already have a 'called by' chain, which is necessary for subroutines (and arguments), and was holding to the idea that extending that to jumps might work, but the 'nearby' method is a good and simple idea.....
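
The 'opcodes executed' maximum from point 1 can be pictured with a short Python sketch. It is not SAD's emulator: `emulate_limited` and the `step` callback are hypothetical, and only the 1000 limit comes from the post.

```python
def emulate_limited(step, state, max_ops=1000):
    """Call step(state) until it returns False or max_ops opcodes have run.

    Returns (finished, executed); finished is False when the budget ran out.
    """
    executed = 0
    while executed < max_ops:
        executed += 1
        if not step(state):
            return True, executed
    return False, executed


def three_opcodes(state):
    """Toy step function: pretend the block returns after three opcodes."""
    state["n"] += 1
    return state["n"] < 3


print(emulate_limited(three_opcodes, {"n": 0}))  # block that terminates
print(emulate_limited(lambda s: True, {}))       # endless loop, cut off by the budget
```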

Andy.
TVR, kit cars, classic cars. Ex IT geek, development and databases.
https://github.com/tvrfan/EEC-IV-disassembler

jsa
Tuning Addict
Posts: 594
Joined: Sat Nov 23, 2013 7:28 pm
Location: 'straya

Re: Why auto disassembly is tough

Post by jsa » Fri Jun 07, 2019 1:43 am

If proximity proves to have some merit, I think exposing the proximity threshold setting to the end user would be a good thing.
Cheers

John

95 Escort RS Cosworth - GHAJ0 / ANTI on a COSY box code
Moates QH & BE
ForDiag

tvrfan
Tuning Addict
Posts: 413
Joined: Sat May 14, 2011 11:41 pm
Location: New Zealand

Re: Why auto disassembly is tough

Post by tvrfan » Tue Jun 11, 2019 7:07 pm

John,

Just discovered a bug in the way code blocks are queued for scanning, which may also explain some of the undecoded sections in CARD.
Effectively this bug means some block scan combos are lost entirely.
Found it when working on the multibank argument emulate/decode. So I'll fix that first and recheck ....

Info - Multibanks tend to use LDX Rx, [STACK+n] instead of POP, and STX Rx, [STACK+n] instead of PUSH (STACK = R20). This in itself isn't too bad, but they also do a PUSHP in the middle to save the bank which the call came from (and so where the arguments are). That makes it trickier to make sure a PUSHP entry doesn't get interpreted as a return address, because it gets loaded with a LDX and then used in LDX R11, ... for the arg bank select. It isn't always done in exactly the same way, and at least one of my test bins actually uses both POPs and LDX [STACK+n]. Haven't got to that one yet...
Here's an example from 22CA which gets 4 bytes from caller.

Code: Select all

   Sub257:
86e30: a3,20,02,3a        ldw   R3a,[R20+2]      R3a = [StackPtr+2];     # this matches a pushp (push psw)
86e34: a3,20,04,3e        ldw   R3e,[R20+4]      R3e = [StackPtr+4];     # this is subr return address
86e38: f2                 pushp                  push(PSW);             # this to restore psw after args
86e39: fa                 di                     disable ints;
86e3a: 18,02,3b           shrb  R3b,2            R3b >>= 2;
86e3d: b0,3b,11           ldb   R11,R3b          BANK_Select = R3b;      # set data bank (from bits 10-13 of psw word)
86e40: b2,3f,3a           ldb   R3a,[R3e++]      R3a = [R3e++];
86e43: b2,3f,3b           ldb   R3b,[R3e++]      R3b = [R3e++];
86e46: b2,3f,3c           ldb   R3c,[R3e++]      R3c = [R3e++];
86e49: b2,3f,3d           ldb   R3d,[R3e++]      R3d = [R3e++];            # get pars from 'new' bank
86e4c: b1,11,11           ldb   R11,11           BANK_Select = 11;        # reset data bank to 1 (and stack's bank to 1)
86e4f: f3                 popp                   PSW = pop();              # restore psw (and will enable ints)
86e50: c3,20,04,3e        stw   R3e,[R20+4]      [StackPtr+4] = R3e;     # update return address (+4)
86e54: ac,3b,3e           ldzbw R3e,R3b          wR3e = yR3b;
86e57: 71,1f,3b           an2b  R3b,1f           R3b &= 1f;
86e5a: 08,04,3e           shrw  R3e,4            R3e >>= 4;
86e5d: 71,0e,3e           an2b  R3e,e            R3e &= e;
86e60: 67,3f,f0,00,3a     ad2w  R3a,[R3e+f0]     R3a += [R3e+f0];      # do address decode
86e65: f0                 ret                    return;
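
The address decode at the end of Sub257 (86e54 to 86e60) can be followed in a few lines of Python. This is only a reading of the listing above: the first two argument bytes form a 16-bit word, its top three bits select one of the words held at f0, f2, ... fe, and the low 13 bits are added to that word. `decode_inline_address` is a hypothetical name and the base values in the example are made up.

```python
def decode_inline_address(arg0, arg1, base_regs):
    """Mirror the tail of Sub257: turn two argument bytes into an address.

    base_regs: the eight 16-bit words held at registers f0, f2, ... fe.
    """
    word = arg0 | (arg1 << 8)          # R3a/R3b seen as one 16-bit word
    index = ((arg1 >> 4) & 0x0E) >> 1  # shrw R3e,4 then an2b R3e,e: bits 13-15
    offset = word & 0x1FFF             # an2b R3b,1f keeps the low 13 bits
    return (offset + base_regs[index]) & 0xFFFF  # ad2w R3a,[R3e+f0]


# Made-up base words; only entry 1 (register f2) matters for this example.
bases = [0x0000, 0x8C00, 0, 0, 0, 0, 0, 0]
print(f"{decode_inline_address(0x23, 0x21, bases):04x}")
```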
TVR, kit cars, classic cars. Ex IT geek, development and databases.
https://github.com/tvrfan/EEC-IV-disassembler
