80x86 32-bit Disassembler and Assembler

Legal part
Introduction
Brief description of functions
Assemble
Checkcondition
Decodeaddress
Disasm
Disassembleback
Disassembleforward
Isfilling
Printfloat* functions

Download source



Legal part

This package includes the source code of a 32-bit Disassembler and a 32-bit single-line Assembler for 80x86-compatible processors. The source is a slightly stripped version of the code used in OllyDbg v1.04 and is well proven by its numerous users. (If you haven't heard of it before, OllyDbg is a 32-bit assembler-level debugger with powerful analyzing capabilities that make binary machine code understandable).

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License (http://www.fsf.org/copyleft/gpl.html) for more details.

You should have received a copy of the GNU General Public License (gpl.txt) along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.

All brand names and product names used in 80x86 Assembler and Disassembler, accompanying files or in this help file are trademarks, registered trademarks, or trade names of their respective holders.
 



Introduction

Disassembler understands all standard 80x86 commands, FPU, MMX, AMD's MMX extensions, Athlon/PIII MMX extensions and 3DNow! instructions. It does not decode SSE or SSE2 commands. Disassembler assumes 32-bit code and data segments but correctly decodes prefixed 16-bit commands. Several decoding modes let you select the amount of returned information (the more information you request, the slower the decoding): command length only, basic information useful for code analysis, or full decoding with dump and assembler form. Multiple options select the desired output format. Disassembler and Assembler support both MASM and Borland's IDEAL modes.

Assembler converts a single command from ASCII form to binary code. It can find several possible encodings of the same command, and can even create search patterns with undefined operands.

This package includes the following files:

  • disasm.h - common definitions
  • disasm.c - Disassembler
  • assembl.c - Assembler
  • asmserv.c - table of commands and service functions
  • main.c - demo program
Total source size exceeds 3800 lines of dense text (more than 190 K!). I have used Borland C and do not guarantee that it will work with any other compiler. Please set the default character type to unsigned! Please also place the following statements into the main file of your program, and do not #define MAINPROG in any other file:

    #define MAINPROG // Place all unique variables here
    #include "disasm.h"

(I use this trick to define shared global variables). Below is a small piece of code disassembled with OllyDbg 1.04 using different text settings:
 

004505B3  A1 DC464B00         MOV EAX,DS:[4B46DC]
004505B8  8B0498              MOV EAX,DS:[EAX+EBX*4]
004505BB  50                  PUSH EAX
004505BC  8D85 E0FBFFFF       LEA EAX,SS:[EBP-420]
004505C2  50                  PUSH EAX
004505C3  E8 141BFCFF         CALL 004120DC 
004505C8  83C4 08             ADD ESP,8
004505CB  43                  INC EBX
004505CC  3B1D D8464B00       CMP EBX,DS:[4B46D8]
004505D2  0F8C AFFEFFFF       JL 00450487 
004505D8  80BD E0FDFFFF 00    CMP BYTE PTR SS:[EBP-220],0
004505DF  75 14               JNZ SHORT 004505F5
004505E1  68 B39E4600         PUSH 469EB3 
004505E6  8D85 E0FDFFFF       LEA EAX,SS:[EBP-220]
004505EC  50                  PUSH EAX
004505ED  E8 521BFCFF         CALL 00412144 

 
004505B3  A1 DC464B00         mov     eax,[dword ds:4B46DC]
004505B8  8B0498              mov     eax,[dword ds:eax+ebx*4]
004505BB  50                  push    eax
004505BC  8D85 E0FBFFFF       lea     eax,[dword ss:ebp-420]
004505C2  50                  push    eax
004505C3  E8 141BFCFF         call    004120DC
004505C8  83C4 08             add     esp,8
004505CB  43                  inc     ebx
004505CC  3B1D D8464B00       cmp     ebx,[dword ds:4B46D8]
004505D2  0F8C AFFEFFFF       jl      00450487
004505D8  80BD E0FDFFFF 00    cmp     [byte ss:ebp-220],0
004505DF  75 14               jnz     short 004505F5
004505E1  68 B39E4600         push    469EB3
004505E6  8D85 E0FDFFFF       lea     eax,[dword ss:ebp-220]
004505EC  50                  push    eax
004505ED  E8 521BFCFF         call    00412144



Brief description of functions
  • int Assemble(char *cmd,ulong ip,t_asmmodel *model,int attempt,int constsize,char *errtext) - assembles text command to binary code;
  • int Checkcondition(int code,ulong flags) - checks whether flags meet the condition in the command;
  • int Decodeaddress(ulong addr,char *symb,int nsymb,char *comment) - user-supplied function that decodes addresses into symbolic names;
  • ulong Disasm(char *src,ulong srcsize,ulong srcip,t_disasm *disasm,int disasmmode) - determines length of the binary command or disassembles it to the text;
  • ulong Disassembleback(char *block,ulong base,ulong size,ulong ip,int n) - walks binary code backward;
  • ulong Disassembleforward(char *block,ulong base,ulong size,ulong ip,int n) - walks binary code forward;
  • int Isfilling(ulong addr,char *data,ulong size,ulong align) - determines whether command is equivalent to NOP;
  • int Print3dnow(char *s,char *f) - converts 3DNow! constant to text without triggering FPU exception for invalid operands;
  • int Printfloat10(char *s,long double ext) - converts 10-byte floating constant to text without causing exception;
  • int Printfloat4(char *s,float f) - converts 4-byte floating constant to text without causing exception;
  • int Printfloat8(char *s,double d) - converts 8-byte floating constant to text without causing exception.


Assemble

Function Assemble(), as expected, converts command from ASCII form to binary 32 bit code. It shares command table with Disasm(), so if some command can be disassembled, it can be assembled back too, with one exception: Assemble doesn't support 16 bit addresses. With some unimportant exceptions, 16 bit addresses cannot be used in Win32 programs.

Some commands have more than one encoding. Assemble() allows you to find them all. This is important, for example, if you want to find the shortest possible code or to find all possible occurrences of this command in the code. There are two parameters, constsize and attempt. First parameter selects size of immediate constant and address constant (8 or 32 bits), second is the occurrence of the command in the command table. To find all variants, call Assemble() with attempt=0,1,2... and for each attempt with constsize=0,1,2,3 as long as function reports success for at least one constsize. Generated codes may repeat. Please note that if command uses memory addresses, only one form will be generated in each case: [EAX*2] but not [EAX+EAX]; [EBX+EAX] but not [EAX+EBX]; [EAX] will not use SIB byte; no DS: prefix and so on.

Assemble also compiles imprecise commands that include the following generalized operands:

  • R8 - any 8-bit register (stands for AL, BL, CL, DL, AH, BH, CH, DH)
  • R16 - any 16 bit register (AX, BX, CX, DX, SP, BP, SI, DI)
  • R32 - any 32 bit register (EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI)
  • FPU - any FPU register (ST0..ST7)
  • MMX - any MMX register (MM0..MM7)
  • CRX - any control register (CR0..CR7)
  • DRX - any debug register (DR0..DR7)
  • CONST - any constant
This allows you to generate imprecise search patterns, where the mask contains zero bits at the positions occupied by imprecise operands in the binary code. For example, patterns generated for the command MOV R32,CONST will match both MOV EAX,1 and MOV ECX,12345678h.
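
The following is a minimal sketch of how a search routine could apply such a pattern. It is not taken from the package: the function name pattern_matches() and the two arrays are hypothetical, and the pattern is written by hand for the short form of MOV R32,CONST (opcode B8+r followed by a 32-bit immediate) rather than captured from Assemble(), which may well report other encodings too.

```c
#include <stddef.h>

/* Returns 1 if the first `length` bytes of `data` agree with `code` in every
   bit position where `mask` has a 1, and 0 otherwise. */
int pattern_matches(const unsigned char *code, const unsigned char *mask,
                    size_t length, const unsigned char *data)
{
    size_t i;
    for (i = 0; i < length; i++) {
        if (((data[i] ^ code[i]) & mask[i]) != 0)
            return 0;                  /* a significant bit differs */
    }
    return 1;
}

/* Hand-written (assumed) pattern for MOV R32,CONST: the upper 5 bits of the
   opcode are fixed, the register number and the constant are ignored. */
const unsigned char mov_r32_const_code[5] = { 0xB8, 0x00, 0x00, 0x00, 0x00 };
const unsigned char mov_r32_const_mask[5] = { 0xF8, 0x00, 0x00, 0x00, 0x00 };
```

With this pattern, both B8 01 00 00 00 (MOV EAX,1) and B9 78 56 34 12 (MOV ECX,12345678h) match, while 50 (PUSH EAX) does not.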

Function returns the number of bytes in the assembled code, or a non-positive number (zero or negative) in case of error or when the variant selected by the combination of attempt and constsize doesn't exist. In case of error, the absolute value of this number is the position of the error in the input command. If you generate executable code, imprecise commands are usually not allowed. To make sure that the command is precise, check that all significant bytes in mask contain 0xFF.

int Assemble(char *cmd,ulong ip,t_asmmodel *model,int attempt,int constsize,char *errtext);

Parameters:

  • cmd - pointer to zero terminated ASCII command;
  • ip - address of the first byte of generated binary command in memory;
  • model - pointer to the structure that receives machine code and mask, see detailed description below;
  • attempt - index of alternative encoding of the command. Call Assemble with attempt=0,1,2... to obtain all possible versions of the command. Stop this sequence when Assemble reports error for every constsize;
  • constsize - requested size of address constant and immediate data. Call Assemble with constsize=0,1,2,3 to obtain all possible encodings of the version selected by attempt;
  • errtext - pointer to text buffer of length at least TEXTLEN bytes that receives description of detected error.
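
The enumeration over attempt and constsize can be sketched as follows. This is only an illustration under stated assumptions: the values of MAXCMDSIZE and TEXTLEN are guesses (the real ones are in disasm.h), the structure is a local copy of t_asmmodel as documented in this section, and collect_encodings() and fake_assemble() are hypothetical names. The assembler is passed in as a function pointer so that the loop can be tried without the package; in a real program you would pass Assemble itself.

```c
#include <string.h>

typedef unsigned long ulong;

#define MAXCMDSIZE 16          /* assumed value, see disasm.h */
#define TEXTLEN    256         /* assumed value, see disasm.h */

typedef struct t_asmmodel {    /* local copy of the documented structure */
    char code[MAXCMDSIZE];
    char mask[MAXCMDSIZE];
    int length;
    int jmpsize;
    int jmpoffset;
    int jmppos;
} t_asmmodel;

/* Pointer to a function with the signature of Assemble() */
typedef int (*assemble_fn)(char *cmd, ulong ip, t_asmmodel *model,
                           int attempt, int constsize, char *errtext);

/* Stores up to maxmodels encodings of cmd and returns how many were found.
   Duplicates are not filtered out. */
int collect_encodings(assemble_fn assemble, char *cmd, ulong ip,
                      t_asmmodel *models, int maxmodels)
{
    char errtext[TEXTLEN];
    int attempt, constsize, count = 0;
    for (attempt = 0; count < maxmodels; attempt++) {
        int succeeded = 0;
        for (constsize = 0; constsize < 4 && count < maxmodels; constsize++) {
            t_asmmodel model;
            if (assemble(cmd, ip, &model, attempt, constsize, errtext) > 0) {
                succeeded = 1;
                models[count++] = model;
            }
        }
        if (!succeeded)
            break;             /* no constsize worked: no more variants */
    }
    return count;
}

/* Hypothetical stand-in for Assemble(), used only to exercise the loop.
   It pretends that attempt 0 exists for constsize 0 and 1 and that
   attempt 1 exists for constsize 2 only. */
int fake_assemble(char *cmd, ulong ip, t_asmmodel *model,
                  int attempt, int constsize, char *errtext)
{
    (void)cmd; (void)ip;
    memset(model, 0, sizeof(*model));
    if ((attempt == 0 && constsize <= 1) || (attempt == 1 && constsize == 2)) {
        model->length = 1 + constsize;
        return model->length;
    }
    strcpy(errtext, "no such variant");
    return 0;
}
```

Because generated codes may repeat, a caller that needs unique encodings would still have to compare the stored models.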
t_asmmodel: structure that receives assembled code.

typedef struct t_asmmodel {    // Model to search for assembler command
    char code[MAXCMDSIZE];     // Binary code
    char mask[MAXCMDSIZE];     // Mask for binary code (0: bit ignored)
    int length;                // Length of code, bytes (0: empty)
    int jmpsize;               // Offset size if relative jump
    int jmpoffset;             // Offset relative to IP
    int jmppos;                // Position of jump offset in command
} t_asmmodel;

Members:

  • code - binary code of the command. Only bits that have 1's in corresponding mask bits are significant;
  • mask - comparison mask. Search routine ignores all code bits where mask is set to 0;
  • length - length of code and mask, bytes. If length is 0, search model is empty or invalid;
  • jmpsize - if nonzero, command is a relative jump and jmpsize is a size of offset in bytes;
  • jmpoffset - if jmpsize is nonzero, jump offset relative to address of the following command, otherwise undefined;
  • jmppos - if jmpsize is nonzero, position of the first byte of the offset in code, otherwise undefined.
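
A check for precise commands might look like the sketch below. It assumes that "significant bytes" means the first length bytes of mask; the helper name mask_is_precise() is hypothetical. The cast to unsigned char keeps the comparison correct even if the default character type was left signed.

```c
/* Returns 1 if every one of the first `length` mask bytes is 0xFF, that is,
   if the model contains no imprecise operands. Returns 0 for an empty model. */
int mask_is_precise(const char *mask, int length)
{
    int i;
    if (mask == 0 || length <= 0)
        return 0;
    for (i = 0; i < length; i++) {
        if ((unsigned char)mask[i] != 0xFF)
            return 0;          /* at least one bit is ignored */
    }
    return 1;
}
```

For a filled t_asmmodel named model, the call would be mask_is_precise(model.mask, model.length).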


Checkcondition

Checks whether 80x86 flags meet condition code in the command. Returns 1 if condition is met and 0 if not.

int Checkcondition(int code,ulong flags);

Parameters:

  • code - byte of command that contains condition code;
  • flags - contents of register EFL.
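
For readers who want to see what such a check involves, here is an independent sketch based on the standard 80x86 condition encoding. It is not the package's code. It assumes that the condition occupies the 4 least significant bits of code (as in Jcc, SETcc and CMOVcc opcodes); the function name and the FLAG_x macros are hypothetical.

```c
typedef unsigned long ulong;

#define FLAG_C 0x0001UL        /* carry */
#define FLAG_P 0x0004UL        /* parity */
#define FLAG_Z 0x0040UL        /* zero */
#define FLAG_S 0x0080UL        /* sign */
#define FLAG_O 0x0800UL        /* overflow */

/* Returns 1 if flags satisfy the condition encoded in the low 4 bits of
   code, and 0 otherwise. */
int check_condition_sketch(int code, ulong flags)
{
    int cf = (flags & FLAG_C) != 0;
    int pf = (flags & FLAG_P) != 0;
    int zf = (flags & FLAG_Z) != 0;
    int sf = (flags & FLAG_S) != 0;
    int of = (flags & FLAG_O) != 0;
    int met;
    switch ((code >> 1) & 0x07) {
        case 0:  met = of;               break;   /* O  */
        case 1:  met = cf;               break;   /* B  */
        case 2:  met = zf;               break;   /* E  */
        case 3:  met = cf || zf;         break;   /* BE */
        case 4:  met = sf;               break;   /* S  */
        case 5:  met = pf;               break;   /* P  */
        case 6:  met = (sf != of);       break;   /* L  */
        default: met = zf || (sf != of); break;   /* LE */
    }
    /* Odd condition codes are the negations of the even ones */
    return (code & 1) ? !met : met;
}
```

For example, opcode byte 75 (JNZ) gives 1 when ZF is clear, and byte 8C of the two-byte JL gives 1 when SF differs from OF.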


Decodeaddress

Custom user-supplied function that converts constant (address) into symbolic name. Initially, source code includes dummy function that returns 0.

Decodeaddress() decodes memory address or constant to the ASCII string and optionally comments this address. Returns length of decoded string (not including terminal 0), or 0 on error or if symbolic name is not available.

int Decodeaddress(ulong addr,char *symb,int nsymb,char *comment);

Parameters:

  • addr - address to decode in address space of debugged program;
  • symb - pointer to buffer of length at least nsymb bytes where Decodeaddress() places decoded string;
  • nsymb - length, in characters, of buffer symb;
  • comment - pointer to string of length at least TEXTLEN bytes or NULL, receives comment associated with addr.
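
A replacement for the dummy function could look like the sketch below, which looks addresses up in a small fixed table. Everything in the table is invented for illustration: the two addresses are simply the CALL destinations from the listing in the Introduction, and the names and comment attached to them are hypothetical. The value of TEXTLEN is also an assumption; the real one is in disasm.h.

```c
#include <string.h>

typedef unsigned long ulong;

#define TEXTLEN 256            /* assumed value, see disasm.h */

/* Hypothetical symbol table; names and notes are made up */
static const struct {
    ulong addr;
    const char *name;
    const char *note;
} symtab[] = {
    { 0x004120DCUL, "Logmessage", "hypothetical logging routine" },
    { 0x00412144UL, "Appendtext", NULL },
};

int Decodeaddress(ulong addr, char *symb, int nsymb, char *comment)
{
    size_t i, len;
    if (symb == NULL || nsymb <= 0)
        return 0;
    for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
        if (symtab[i].addr != addr)
            continue;
        len = strlen(symtab[i].name);
        if (len >= (size_t)nsymb)
            len = (size_t)nsymb - 1;       /* truncate name to fit symb */
        memcpy(symb, symtab[i].name, len);
        symb[len] = '\0';
        if (comment != NULL) {
            comment[0] = '\0';
            if (symtab[i].note != NULL) {
                strncpy(comment, symtab[i].note, TEXTLEN - 1);
                comment[TEXTLEN - 1] = '\0';
            }
        }
        return (int)len;                   /* length without terminal 0 */
    }
    return 0;                              /* no symbolic name available */
}
```

A real implementation would more likely search a sorted table or the export and debug information of the analyzed module.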


Disasm

The most important (and complex) function in this package. Depending on the specified disasmmode, Disasm() performs one of the four functions:

  • DISASM_SIZE - quickly determines size of the command. Use this mode if you want to walk through the code. In this mode, treat all members of disasm as undefined;
  • DISASM_DATA - determines size and analyses operands. Use this mode for quick analysis, for example, if you need to calculate jump destination. Members of disasm marked with asterisk (*) are undefined;
  • DISASM_FILE - determines size, analyses operand and disassembles command, but doesn't attempt to convert addresses to symbols. Use this mode if there is no correspondence between addresses and symbols, for example, if you dump the contents of binary file;
  • DISASM_CODE - full disassembly.
Function returns size of disassembled command. Several global flags influence the behavior of this function; they are described later in this section. All symbolic constants are described in file disasm.h.

ulong Disasm(char *src,ulong srcsize,ulong srcip,t_disasm *disasm,int disasmmode);

Parameters:

  • src - pointer to binary code that must be disassembled;
  • srcsize - size of src. Length of 80x86 command is limited to MAXCMDSIZE bytes;
  • srcip - address of the command;
  • disasm - pointer to structure that receives results of disassembling, see detailed description below;
  • disasmmode - disassembly mode, one of DISASM_xxx (see above).
t_disasm:

typedef struct t_disasm {     // Results of disassembling
    ulong ip;                 // Instruction pointer
    char dump[TEXTLEN];       // (*) Hexadecimal dump of the command
    char result[TEXTLEN];     // (*) Disassembled command
    char comment[TEXTLEN];    // (*) Brief comment
    int cmdtype;              // One of C_xxx
    int memtype;              // Type of addressed variable in memory
    int nprefix;              // Number of prefixes
    int indexed;              // Address contains register(s)
    ulong jmpconst;           // Constant jump address
    ulong jmptable;           // Possible address of switch table
    ulong adrconst;           // Constant part of address
    ulong immconst;           // Immediate constant
    int zeroconst;            // Whether contains zero constant
    int fixupoffset;          // Possible offset of 32 bit fixups
    int fixupsize;            // Possible total size of fixups or 0
    int error;                // Error while disassembling command
    int warnings;             // Combination of DAW_xxx
} t_disasm;

Members:

  • ip - address of the disassembled command;
  • dump - ASCII string, formatted hexadecimal dump of the command;
  • result - ASCII string, disassembled command itself;
  • comment - ASCII string, brief comment that applies to the whole command;
  • cmdtype - type of the disassembled command, one of C_xxx, possibly ORed with C_RARE to indicate that the command is seldom used in ordinary Win32 applications. Commands of type C_MMX additionally contain the size of MMX data in the 3 least significant bits (0 means 8-byte operands). Non-MMX commands may have the C_EXPL bit set, which means that some memory operand has a size that does not conform to standard 80x86 rules;
  • memtype - type of memory operand, one of DEC_xxx, or DEC_UNKNOWN if operand is non-standard or command does not access memory;
  • nprefix - number of prefixes that this command contains;
  • indexed - if memory address contains index register, set to scale, otherwise 0;
  • jmpconst - address of jump destination if this address is a constant, and 0 otherwise;
  • jmptable - if indirect jump can be interpreted as switch, base address of switch table and 0 otherwise;
  • adrconst - constant part of memory address;
  • immconst - immediate constant or 0 if command contains no immediate constant. The only command that contains two immediate constants is ENTER. Disasm() ignores second constant which is anyway 0 in most cases;
  • zeroconst - nonzero if command contains immediate zero constant;
  • fixupoffset - possible start of 32 bit fixup within the command, or 0 if command can't contain fixups;
  • fixupsize - possible total size of fixups (0, 4 or 8). If command contains both immediate constant and immediate address, they are always adjacent on 80x86 processors;
  • error - Disasm() was unable to disassemble command (for example, command does not exist or crosses end of memory block), one of DAE_xxx;
  • warnings - command is suspicious or meaningless (for example, far jump or MOV EAX,EAX preceded with segment prefix), combination of DAW_xxx bits.
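
As a worked example of what a constant jump destination is, the sketch below recomputes the targets shown in the listing in the Introduction from the raw command bytes. It relies only on the general 80x86 rule that a relative offset is counted from the address of the following command; the function name relative_target() is hypothetical and the code is not part of the package.

```c
#include <stdint.h>

typedef unsigned long ulong;

/* Destination of a relative jump or call located at ip. cmdsize is the
   total length of the command, offs points to the offset bytes and
   offssize is 1 or 4. Arithmetic wraps around at 32 bits. */
ulong relative_target(ulong ip, ulong cmdsize,
                      const unsigned char *offs, int offssize)
{
    uint32_t rel;
    if (offssize == 1) {
        rel = offs[0];
        if (rel & 0x80u)
            rel |= 0xFFFFFF00u;            /* sign-extend 8-bit offset */
    } else {
        rel = (uint32_t)offs[0] |          /* little-endian 32-bit offset */
              ((uint32_t)offs[1] << 8) |
              ((uint32_t)offs[2] << 16) |
              ((uint32_t)offs[3] << 24);
    }
    return (ulong)(uint32_t)((uint32_t)ip + (uint32_t)cmdsize + rel);
}
```

For the command 75 14 at 004505DF this gives 004505DF + 2 + 14 = 004505F5, and for 0F8C AFFEFFFF at 004505D2 it gives 004505D8 - 151 = 00450487, in agreement with the listing.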
Global flags that influence text of disassembled command:
  • ideal - force IDEAL decoding mode
  • lowercase - force lowercase
  • tabarguments - insert tab between mnemonic and arguments
  • extraspace - insert extra space between arguments
  • putdefseg - show default segments
  • showmemsize - always show memory size
  • shownear - show NEAR modifiers
  • shortstringcmds - use short form of string commands
  • sizesens - mode of decoding of size-sensitive mnemonics (16/32 bits) like:
                0 - PUSHA/PUSHAD
                1 - PUSHAW/PUSHAD
                2 - PUSHAW/PUSHA
  • symbolic - show symbolic addresses, requires Decodeaddress()
Global flags that warn of potentially invalid commands:
  • farcalls - accept far calls, returns & addresses
  • decodevxd - decode VxD calls (Win95/98)
  • privileged - accept privileged commands
  • iocommand - accept I/O commands
  • badshift - accept shift out of range 1..31
  • extraprefix - accept superfluous prefixes
  • lockedbus - accept LOCK prefixes
  • stackalign - accept unaligned stack operations
  • iswindowsnt - when checking for dangerous commands, assume NT-based OS
If Disasm() encounters a potentially invalid command and the corresponding flag is 0, it sets the matching bit in disasm->warnings and places a warning message in disasm->comment.



Disassembleback

Calculates address of assembler instruction that is n instructions (at most 127) back from the instruction at the specified ip. Returns address of found instruction. In case of error, it may be less than n instructions apart.

80x86 commands have variable length. Disassembleback uses heuristic methods to separate commands and in some (astoundingly rare!) cases may return a wrong answer.

ulong Disassembleback(char *block,ulong base,ulong size,ulong ip,int n);

Parameters:

  • block - pointer to the copy of code;
  • base - address of first byte in the code block;
  • size - size of code block;
  • ip - address of current instruction;
  • n - number of instructions to walk back.


Disassembleforward

Calculates address of assembler instruction that is n instructions forward from instruction at specified address. Returns address of found instruction. In case of error, it may be less than n instructions apart.

ulong Disassembleforward(char *block,ulong base,ulong size,ulong ip,int n);

Parameters:

  • block - pointer to the copy of code;
  • base - address of first byte in the code block;
  • size - size of code block;
  • ip - address of current instruction;
  • n - number of instructions to walk forward.
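
Walking forward amounts to adding up command lengths, as in the sketch below. This is an illustration, not the package's implementation. The length decoder is passed as a hypothetical callback of type cmdsize_fn; in a real program it would wrap Disasm() called in DISASM_SIZE mode. The toy decoder toy_cmdsize() knows only the four opcodes it lists and exists solely to make the example self-contained.

```c
#include <stddef.h>

typedef unsigned long ulong;

/* Hypothetical callback: returns the length of the command at src, or 0 if
   it cannot be decoded within srcsize bytes. */
typedef ulong (*cmdsize_fn)(const char *src, ulong srcsize, ulong srcip);

/* Returns the address reached after walking up to n commands forward from
   ip. Stops early at the end of the block or at an undecodable command. */
ulong walk_forward(const char *block, ulong base, ulong size, ulong ip,
                   int n, cmdsize_fn cmdsize)
{
    ulong offset, len;
    int i;
    if (block == NULL || ip < base || ip - base > size)
        return ip;
    offset = ip - base;
    for (i = 0; i < n && offset < size; i++) {
        len = cmdsize(block + offset, size - offset, base + offset);
        if (len == 0 || len > size - offset)
            break;             /* undecodable or crosses end of block */
        offset += len;
    }
    return base + offset;
}

/* Toy decoder for illustration only: PUSH EAX, INC EBX, CALL rel32 and the
   register form of 83 /r ib (here ADD ESP,8). */
ulong toy_cmdsize(const char *src, ulong srcsize, ulong srcip)
{
    ulong len;
    (void)srcip;
    switch ((unsigned char)src[0]) {
        case 0x50: case 0x43: len = 1; break;
        case 0xE8:            len = 5; break;
        case 0x83:            len = 3; break;
        default:              len = 0; break;
    }
    return (len <= srcsize) ? len : 0;
}
```

Applied to the 10 bytes that start at 004505C2 in the listing from the Introduction, two steps forward lead to 004505C8 and three steps to 004505CB.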


Isfilling

Function determines whether pointed instruction is a no-action command (equivalent to NOP) used by different compilers to fill the gap between procedures or data blocks to a specified aligned border. Returns length of filling command in bytes or 0 if command is not a recognized filling.

int Isfilling(ulong addr,char *data,ulong size,ulong align);

Parameters:

  • addr - address of the first byte of analyzed command;
  • data - pointer to the binary command;
  • size - size of data;
  • align - assumed alignment of the next non-filling command (power of 2), or 0 if alignment is not required.
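
To give an idea of what a filling check looks at, here is a deliberately simplified sketch. It recognizes only NOP and the register-to-itself forms of 32-bit MOV and XCHG, which is an assumed subset and not the rule set used by Isfilling(). It also assumes that a command which already starts on an aligned border is never filling. The function name is hypothetical.

```c
#include <stddef.h>

typedef unsigned long ulong;

/* Returns length of a recognized filling command at data, or 0. */
int is_filling_sketch(ulong addr, const char *data, ulong size, ulong align)
{
    const unsigned char *p = (const unsigned char *)data;
    if (data == NULL || size == 0)
        return 0;
    /* Assumption: code on an aligned border is the start of real code */
    if (align > 1 && (addr & (align - 1)) == 0)
        return 0;
    if (p[0] == 0x90)
        return 1;                              /* NOP */
    if (size >= 2 &&
        (p[0] == 0x87 || p[0] == 0x89 || p[0] == 0x8B) &&  /* XCHG, MOV */
        (p[1] & 0xC0) == 0xC0 &&               /* both operands registers */
        ((p[1] >> 3) & 0x07) == (p[1] & 0x07)) /* same register twice */
        return 2;
    return 0;
}
```

With this sketch, 87 DB (XCHG EBX,EBX) at an unaligned address is reported as a 2-byte filling, whereas 8B C3 (MOV EAX,EBX) is not.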


Printfloat* functions

These functions decode a 4-, 8- or 10-byte floating point number or an 8-byte 3DNow! operand into text form and place it in string s. They correctly decode all cases of NANs or INFs without triggering floating point exceptions. If the operand is not a valid floating point number, the functions print a hexadecimal dump of the number. They return the length of the decoded string in bytes, not including the terminal 0.

int Print3dnow(char *s,char *f);
int Printfloat10(char *s,long double ext);
int Printfloat4(char *s,float f);
int Printfloat8(char *s,double d);
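
The idea of decoding without exceptions can be illustrated for the 4-byte case: classify the number by its bit pattern first, and hand it to the floating point formatter only when it is an ordinary value. The sketch below is not the package's code. It assumes that float is a 32-bit IEEE 754 number, takes the raw bits instead of a float argument, and uses an output format ("+INF", "-NAN" followed by a hexadecimal dump) that is my own choice; the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Writes text form of the 4-byte float with the given bit pattern to s and
   returns the length of the text, not including terminal 0. */
int print_float4_bits(char *s, uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x007FFFFFu;
    const char *sign = (bits & 0x80000000u) ? "-" : "+";
    float f;
    if (exponent == 0xFFu) {               /* INF or NAN: never load it */
        if (mantissa == 0)
            return sprintf(s, "%sINF %08lX", sign, (unsigned long)bits);
        return sprintf(s, "%sNAN %08lX", sign, (unsigned long)bits);
    }
    memcpy(&f, &bits, sizeof(f));          /* ordinary number, safe to use */
    return sprintf(s, "%g", (double)f);
}
```

The buffer s must be large enough for the longest output; 32 bytes are sufficient for this sketch.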
 
 

Copyleft (C) 2001 Oleh Yuschuk