45  RzIL

RzIL is the new intermediate language in Rizin, primarily intended for representing the semantics of machine code. It is designed as a clone of BAP’s Core Theory, with minor deviations where necessary; it is worth noting that in practice, RzIL is very similar to the SMT representation with bitvectors and bitvector-indexed arrays as well as effects.

More details related to the implementation can be found here.

45.1 IL statements

Instructions can are internally represented as IL statements; these statements are expressed as s-expressions (symbolic expression) which utilize LISP-like syntax as string format.

  • aoi generates a one-line LISP-like syntax (JSON format is available via aoj command).
  • aoip generates a prettified LISP-like syntax

Here an example using aoip (the prettified output) of two PowerPC instructions (stwu and mflr).

[0x10000488]> pd 2
|           0x10000488      stwu  r1, -0x10(r1)
|           0x1000048c      mflr  r0
[0x10000488]> aoi?
Usage: aoi[p]   # Print the RzIL of next N instructions
| aoi [<n_instructions>]  # Print the RzIL of next N instructions
| aoip [<n_instructions>] # Pretty print the RzIL of next N instructions
[0x10000488]> aoip 2
0x10000488
(seq
  (storew 0
    (+
      (var r1)
      (let v
        (bv 16 0xfff0)
        (ite
          (msb
            (var v))
          (cast 32
            (msb
              (var v))
            (var v))
          (cast 32
            false
            (var v)))))
    (cast 32
      false
      (var r1)))
  (set r1
    (+
      (var r1)
      (let v
        (bv 16 0xfff0)
        (ite
          (msb
            (var v))
          (cast 32
            (msb
              (var v))
            (var v))
          (cast 32
            false
            (var v)))))))
0x1000048c
(set r0
  (cast 32
    false
    (var lr)))

The plf command allows to generate an in-line representation of the entire function in s-expressions.

[0x100002bc]> pdf @ sym.example
/ sym.example();
|           ; var int32_t var_1ch @ stack - 0x1c
|           0x1000044c      stwu  r1, -0x10(r1)
|           0x10000450      mflr  r0
|           0x10000454      stw   r0, 0x14(r1)
|           0x10000458      lwz   r0, 0x14(r1)
|           0x1000045c      addi  r1, r1, 0x10
|           0x10000460      mtlr  r0
\           0x10000464      blr
[0x100002bc]> plf @ sym.example
0x1000044c (seq (storew 0 (+ (var r1) (let v (bv 16 0xfff0) (ite (msb (var v)) (cast 32 (msb (var v)) (var v)) (cast 32 false (var v))))) (cast 32 false (var r1))) (set r1 (+ (var r1) (let v (bv 16 0xfff0) (ite (msb (var v)) (cast 32 (msb (var v)) (var v)) (cast 32 false (var v)))))))
0x10000450 (set r0 (cast 32 false (var lr)))
0x10000454 (seq (storew 0 (+ (var r1) (let v (bv 16 0x14) (ite (msb (var v)) (cast 32 (msb (var v)) (var v)) (cast 32 false (var v))))) (cast 32 false (var r0))) empty)
0x10000458 (seq (set r0 (let ea (+ (var r1) (let v (bv 16 0x14) (ite (msb (var v)) (cast 32 (msb (var v)) (var v)) (cast 32 false (var v))))) (let loadw (loadw 0 32 (var ea)) (cast 32 false (var loadw))))) empty)
0x1000045c (seq (set a (var r1)) (set b (let v (bv 16 0x10) (ite (msb (var v)) (cast 32 (msb (var v)) (var v)) (cast 32 false (var v))))) empty (set r1 (+ (var a) (var b))) empty empty empty)
0x10000460 (set lr (cast 32 false (var r0)))
0x10000464 (seq (set CIA (bv 32 0x10000464)) empty empty (set NIA (& (bv 32 0xfffffffc) (var lr))) (jmp (var NIA)))
[0x100002bc]> 

The same output of aoi can be obtained via rz-asm like this:

$ rz-asm -de -a ppc 7c0802a6
mflr r0
$ rz-asm -Ie -a ppc 7c0802a6
(set r0 (cast 32 false (var lr)))

45.2 Emulation

Rizin enables instruction emulation by leveraging RzIL. The emulation can be used to record changes within the VM, like read and writes of registers and memory locations (e io.buffers=true is required for memory ops). The emulation is controlled via the aez commands.

[0x00000000]> aez?
Usage: aez<isv?>   # RzIL Emulation
| aezi                     # Initialize the RzIL Virtual Machine at the current offset
| aezs [<n_times>]         # Step N instructions within the RzIL Virtual Machine
| aezse[j] [<n_times>]     # Step N instructions within the RzIL VM and output VM changes (read &
                             write)
| aezsu <address>          # Step until PC equals given address
| aezsue <address>         # Step until PC equals given address and output VM changes (read & write)
| aezv[jqt] [<var_name> [<number>]] # Print or modify the current status of the RzIL Virtual Machine

Supported architectures can be inspected via the La command. If the architecture has an I, as in the example below, it supports RzIL.

_dAeI 32 64      ppc         BSD     Capstone PowerPC disassembler

Here is an example of emulation of a PowerPC binary printing a string via printf.

In this example, r9 contains the base address which is used to calculate the pointer to the string (stored in r3) used by ‘reloc.printf’.

[0x1000049c]> pd 3
|           0x1000049c      lis   r9, 0x1000
|           0x100004a0      addi  r3, r9, 0x640
|           0x100004a4      bl    reloc.printf

First we need to initialize the RzIL Virtual Machine at the current offset using aezi

[0x1000049c]> aezi?
Usage: aezi   # Initialize the RzIL Virtual Machine at the current offset
[0x1000049c]> aezi

Then we execute 2 instructions via aezs (quiet) or use aezse to see the actual changes within the RzIL VM.

[0x1000049c]> aezse?
Usage: aezse[j] [<n_times>]   # Step N instructions within the RzIL VM and output VM changes (read & write)
[0x1000049c]> aezse 2 # execute 2 instructions
pc_write(old: 0x1000049c, new: 0x100004a0)
var_write(name: r9, old: 0x0, new: 0x10000000)
pc_write(old: 0x100004a0, new: 0x100004a4)
var_write(name: r3, old: 0x0, new: 0x10000640)

It’s possible to see (or modify) the values of the registers in the RzIL VM via aezv.

[0x1000049c]> # We can also print the content of the RzIL VM via 'aezv'
[0x1000049c]> aezv?
Usage: aezv[jqt] [<var_name> [<number>]]   # Print or modify the current status of the RzIL Virtual Machine
[0x1000049c]> aezv r3
 r3: 0x10000640
[0x1000049c]> aezv r9
 r9: 0x10000000

Now that we know that the string is situated at 0x10000640, we can print it.

[0x1000049c]> # hexdump the content of address 0x10000640 with a buffer size of 0x20 bytes.
[0x1000049c]> px @ 0x10000640 @! 0x20
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x10000640  5369 6d70 6c65 2050 5043 2070 726f 6772  Simple PPC progr
0x10000650  616d 2e00 0000 0000 ffff ffff ffff ffff  am..............
[0x1000049c]>
[0x1000049c]> # Decode and print the utf-8 string at address 0x10000640
[0x1000049c]> ps @ 0x10000640
Simple PPC program.
[0x1000049c]>