Assembly, Disassembly and Emulation using Python

Learn how to use Keystone engine, Capstone engine and Unicorn engine to assemble, disassemble and emulate machine code (ARM, x86-64 and more) in Python
  · 7 min read · Updated nov 2020 · General Python Topics


Processors execute machine code, the assembled form of assembly, a low-level programming language that makes direct use of registers and memory. Inside a native executable, the code is stored in this assembled form as binary data; processor manuals specify how each instruction is encoded into bytes.

Disassembly is the reverse of assembly: bytes of data are parsed and translated into assembly instructions, which are more readable to users.

Different processor architectures can have different instruction sets, and a processor can only execute instructions in its own instruction set. To run code meant for a different architecture, we need an emulator, a program that translates code for the unsupported architecture into code that can run on the host system.

There are many scenarios where assembling, disassembling or emulating code for different architectures is useful: learning (many universities teach MIPS assembly), running and testing programs written for other devices such as routers (fuzzing, etc.), and reverse engineering.

In this tutorial, we will assemble, disassemble and emulate assembly code written for ARM using Keystone engine, Capstone engine and Unicorn engine. These frameworks offer convenient Python bindings to manipulate assembly code, support different architectures (x86, ARM, MIPS, SPARC and more), and have native support for major operating systems (including Linux, Windows and macOS).

First, let's install these three frameworks:

pip3 install keystone-engine capstone unicorn

For the demonstration in this tutorial, we will take a factorial function implemented in ARM assembly, assemble the code, and emulate it.

We will also disassemble an x86 function (to show how multiple architectures can easily be handled).

Assembling ARM

We start by importing what we need for the ARM assembly:

# We need to emulate ARM
from unicorn import Uc, UC_ARCH_ARM, UC_MODE_ARM, UcError
# for accessing the R0 and R1 registers
from unicorn.arm_const import UC_ARM_REG_R0, UC_ARM_REG_R1
# We need to assemble ARM code
from keystone import Ks, KS_ARCH_ARM, KS_MODE_ARM, KsError

Let's write our ARM assembly code, which calculates factorial(r0), where r0 is an input register:

ARM_CODE = """
// n is r0, we will pass it from python, ans is r1
mov r1, 1       	// ans = 1
loop:
cmp r0, 0       	// while n > 0:
mulgt r1, r1, r0	//   ans *= n
subgt r0, r0, 1 	//   n = n - 1
bgt loop        	// 
                	// answer is in r1
"""
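To check the logic before assembling it, the loop can be mirrored in plain Python. This is only a sketch and not part of the original program: the function name `factorial_like_arm` is hypothetical, and the 32-bit masking is an assumption that models the width of ARM registers (so results wrap around for inputs above 12).

```python
MASK32 = 0xFFFFFFFF  # ARM general-purpose registers are 32 bits wide


def factorial_like_arm(n):
    """Mirror the ARM loop: r0 is the counter, r1 accumulates the answer."""
    r0 = n & MASK32
    r1 = 1                           # mov r1, 1
    # cmp r0, 0 followed by a "gt" condition is a signed "greater than zero" test
    while 0 < r0 < 0x80000000:
        r1 = (r1 * r0) & MASK32      # mulgt r1, r1, r0
        r0 -= 1                      # subgt r0, r0, 1
    return r1                        # answer is in r1


print(factorial_like_arm(5))   # 120
print(factorial_like_arm(0))   # 1, the loop body never runs
```

Because the condition is "greater than" rather than "greater than or equal", an input of 0 leaves the answer at 1 without multiplying by zero.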

Let's assemble the above assembly code (convert it into machine code bytes):

print("Assembling the ARM code")
try:
    # initialize the keystone object with the ARM architecture
    ks = Ks(KS_ARCH_ARM, KS_MODE_ARM)
    # Assemble the ARM code
    ARM_BYTECODE, _ = ks.asm(ARM_CODE)
    # convert the list of integers into bytes
    ARM_BYTECODE = bytes(ARM_BYTECODE)
    print(f"Code successfully assembled (length = {len(ARM_BYTECODE)})")
    print("ARM bytecode:", ARM_BYTECODE)
except KsError as e:
    print("Keystone Error: %s" % e)
    exit(1)

Ks is a class: calling it creates an assembler object, here in ARM mode. Its asm() method assembles the code and returns a tuple holding the encoded bytes (as a list of integers) and the number of statements it assembled.

The bytecode can now be written to a memory region and executed by an ARM processor (or emulated, in our case):

# memory address where emulation starts
ADDRESS = 0x1000000

print("Emulating the ARM code")
try:
    # Initialize emulator in ARM mode
    mu = Uc(UC_ARCH_ARM, UC_MODE_ARM)
    # map 2MB memory for this emulation
    mu.mem_map(ADDRESS, 2 * 1024 * 1024)
    # write machine code to be emulated to memory
    mu.mem_write(ADDRESS, ARM_BYTECODE)
    # Set the r0 register in the code, let's calculate factorial(5)
    mu.reg_write(UC_ARM_REG_R0, 5)
    # emulate with no time limit and no instruction limit
    mu.emu_start(ADDRESS, ADDRESS + len(ARM_BYTECODE))
    print("Emulation done. Below is the result")
    # retrieve the result from the R1 register
    r1 = mu.reg_read(UC_ARM_REG_R1)
    print(">>  R1 = %u" % r1)
except UcError as e:
    print("Unicorn Error: %s" % e)

In the above code, we initialize the emulator in ARM mode, map 2MB of memory (2*1024*1024 bytes) at the specified address, write the assembled code into the mapped area, set the r0 register to 5, and start the emulation.

The emu_start() method takes an optional timeout argument (in microseconds) and an optional maximum number of instructions to emulate, which can be useful for sandboxing code or limiting emulation to a certain portion of the code.

After the emulation completes, we read the content of the r1 register, which should contain the result. Running the code outputs the following:

Assembling the ARM code
Code successfully assembled (length = 20)
ARM bytecode: b'\x01\x10\xa0\xe3\x00\x00P\xe3\x91\x00\x01\xc0\x01\x00@\xc2\xfb\xff\xff\xca'
Emulating the ARM code
Emulation done. Below is the result
>>  R1 = 120

We get the expected result: the factorial of 5 is 120.
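To see what the assembler produced, the 20 bytes can be split into 32-bit words using only the standard library. This is an illustrative sketch: the byte string is the encoding of the factorial loop, the `CONDITIONS` table covers only the two condition codes this program uses, and the branch arithmetic follows the ARM-mode rule that the program counter reads as the instruction address plus 8.

```python
import struct

ARM_BYTECODE = b"\x01\x10\xa0\xe3\x00\x00P\xe3\x91\x00\x01\xc0\x01\x00@\xc2\xfb\xff\xff\xca"

# ARM-mode instructions are fixed-width: one 32-bit little-endian word each
words = [w for (w,) in struct.iter_unpack("<I", ARM_BYTECODE)]

# The top four bits of every word hold the condition field
CONDITIONS = {0xE: "al (always)", 0xC: "gt (signed greater than)"}

for position, word in enumerate(words):
    print(f"+{position * 4:#04x}  {word:08x}  cond={CONDITIONS[word >> 28]}")

# Decode the branch (last word): a signed 24-bit word offset relative to PC + 8
imm24 = words[4] & 0xFFFFFF
displacement = (imm24 - (1 << 24) if imm24 & 0x800000 else imm24) * 4
branch_target = 4 * 4 + 8 + displacement
print(hex(branch_target))  # 0x4, the cmp instruction at the loop label
```

The first two instructions always execute, while the multiply, subtract and branch carry the `gt` condition, which is how the loop body is skipped once r0 reaches zero.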

Disassembling x86-64 code

Now what if we have x86-64 machine code and want to disassemble it? The following code does that:

# We need to emulate x86-64 code
from unicorn import Uc, UC_ARCH_X86, UC_MODE_64, UcError
# for accessing the RAX and RDI registers
from unicorn.x86_const import UC_X86_REG_RDI, UC_X86_REG_RAX
# We need to disassemble x86_64 code
from capstone import Cs, CS_ARCH_X86, CS_MODE_64, CsError

X86_MACHINE_CODE = b"\x48\x31\xc0\x48\xff\xc0\x48\x85\xff\x0f\x84\x0d\x00\x00\x00\x48\x99\x48\xf7\xe7\x48\xff\xcf\xe9\xea\xff\xff\xff"
# memory address where emulation starts
ADDRESS = 0x1000000
try:
    # Initialize the disassembler in x86-64 mode
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    # iterate over each instruction and print it
    for instruction in md.disasm(X86_MACHINE_CODE, 0x1000):
        print("0x%x:\t%s\t%s" % (instruction.address, instruction.mnemonic, instruction.op_str))
except CsError as e:
    print("Capstone Error: %s" % e)

We initialize a disassembler in x86-64 mode, disassemble the machine code provided, iterate over the instructions in the disassembly result, and for each of them, we print the instruction and the address where it occurs.

This yields the following output:

0x1000: xor     rax, rax
0x1003: inc     rax
0x1006: test    rdi, rdi
0x1009: je      0x101c
0x100f: cqo
0x1011: mul     rdi
0x1014: dec     rdi
0x1017: jmp     0x1006
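The two jump targets in this listing can be checked by hand. The sketch below, which uses only the standard library, reads the signed 32-bit displacement stored in the last four bytes of each jump and adds it to the address of the following instruction. The helper name `rel32_target` is hypothetical, and the offsets and lengths are taken from the listing.

```python
import struct

X86_MACHINE_CODE = bytes.fromhex(
    "4831c0" "48ffc0" "4885ff" "0f840d000000" "4899" "48f7e7" "48ffcf" "e9eaffffff"
)
BASE = 0x1000  # the base address passed to md.disasm()


def rel32_target(insn_offset, insn_length):
    """Target of a near jump whose last four bytes are a signed displacement."""
    end = insn_offset + insn_length
    (disp,) = struct.unpack_from("<i", X86_MACHINE_CODE, end - 4)
    # the displacement is relative to the address of the next instruction
    return BASE + end + disp


print(hex(rel32_target(0x9, 6)))   # je at 0x1009, 6 bytes long  -> 0x101c
print(hex(rel32_target(0x17, 5)))  # jmp at 0x1017, 5 bytes long -> 0x1006
```

The `je` target 0x101c is the first address past the 28 bytes of code, so the function falls off the end when rdi reaches zero, while the `jmp` goes back to the `test` instruction at 0x1006.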

Now let's try to emulate it with Unicorn engine:

try:
    # Initialize emulator in x86_64 mode
    mu = Uc(UC_ARCH_X86, UC_MODE_64)
    # map 2MB memory for this emulation
    mu.mem_map(ADDRESS, 2 * 1024 * 1024)
    # write machine code to be emulated to memory
    mu.mem_write(ADDRESS, X86_MACHINE_CODE)
    # Set the rdi register, the function's input, to 7
    mu.reg_write(UC_X86_REG_RDI, 7)
    # emulate with no time limit and no instruction limit
    mu.emu_start(ADDRESS, ADDRESS + len(X86_MACHINE_CODE))
    # now read and print the RAX register
    print("Emulation done. Below is the result")
    rax = mu.reg_read(UC_X86_REG_RAX)
    print(">>> RAX = %u" % rax)
except UcError as e:
    print("Unicorn Error: %s" % e)

Output:

Emulation done. Below is the result
>>> RAX = 5040

We passed in 7 and got 5040. Looking closer at the x86 assembly, we can see that it computes the factorial of the value in the rdi register (5040 is the factorial of 7).

Conclusion

The three frameworks manipulate assembly code in a uniform way: the code emulating x86-64 is very similar to the ARM version, and assembling and disassembling work the same way for any supported architecture.

One thing to keep in mind is that the Unicorn emulator emulates raw machine code. It does not emulate Windows API calls, and it does not parse or emulate file formats like PE and ELF.

In some scenarios, it's useful to emulate a whole operating system, a program in the form of a kernel driver, or a binary meant for a different operating system. The Qiling framework, built on top of Unicorn, addresses these limitations while also offering Python bindings. It also allows binary instrumentation (for example, faking system call return values, file descriptors and so on).

After testing the three Python frameworks, we conclude that manipulating assembly code with Python is very easy. The simplicity of Python, combined with the convenient and uniform interfaces offered by Keystone, Capstone and Unicorn, makes it easy, even for beginners, to assemble, disassemble and emulate assembly code for different architectures.

Learn also: How to Convert Python Files into Executables.

Happy Coding ♥
