This blog post is a follow-up to the announcement of Triton v0.8. It explains how we added support for ARMv7 and provides guidelines for adding new architectures.

As you may have read in our previous blog post, the release of Triton v0.8 came with a lot of features and improvements. Support for the ARMv7 architecture is amongst the main contributions of this new version.

This blog post provides some extra details about how we achieved it. Furthermore, we would like to describe the process and general guidelines for adding new architectures to Triton. Contrary to what one might think, the process is pretty straightforward in terms of integration (the core does not need many modifications). However, it requires some development effort, which ultimately depends on the complexity and quirks of the target architecture.

A quick introduction to the ARMv7 architecture

Let's start with a very brief overview of the architecture. ARMv7 is a RISC architecture with a load/store memory model (which means memory access is restricted to specific instructions). It has thirteen general-purpose 32-bit registers (R0 to R12) and three 32-bit registers with special uses: SP (Stack Pointer), LR (Link Register), and PC (Program Counter), which can also be referred to as R13, R14, and R15, respectively. In addition, there is a 32-bit Application Program Status Register (APSR), which holds the flags (N, Z, C and V).

One peculiar aspect of the architecture is that it has two main execution modes: ARM and Thumb (instructions are encoded for one or the other). Transitions between these two modes can occur at any time during execution (only through specific instructions, though). Instructions encoded for ARM mode have a fixed size of 4 bytes, whereas those encoded for Thumb can be 2 or 4 bytes long. Another interesting feature is that most instructions are conditional, that is, they execute (or not) based on the current values of the flags. Lastly, memory accesses are also flexible, as data can be accessed as either little-endian or big-endian (this applies to data only; instructions are always little-endian).
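To make conditional execution a bit more concrete, here is a minimal Python sketch of how a condition suffix could be checked against the N, Z, C and V flags. The helper `condition_passed` and the `CONDITIONS` table are illustrative names and not part of Triton; the predicates follow the standard ARM condition-code definitions.

```python
# Illustrative sketch only: `condition_passed` is a hypothetical helper,
# not a Triton API. Each predicate takes the four flags as booleans.
CONDITIONS = {
    "eq": lambda n, z, c, v: z,                # equal
    "ne": lambda n, z, c, v: not z,            # not equal
    "hs": lambda n, z, c, v: c,                # unsigned higher or same
    "lo": lambda n, z, c, v: not c,            # unsigned lower
    "mi": lambda n, z, c, v: n,                # negative
    "pl": lambda n, z, c, v: not n,            # positive or zero
    "vs": lambda n, z, c, v: v,                # overflow
    "vc": lambda n, z, c, v: not v,            # no overflow
    "hi": lambda n, z, c, v: c and not z,      # unsigned higher
    "ls": lambda n, z, c, v: (not c) or z,     # unsigned lower or same
    "ge": lambda n, z, c, v: n == v,           # signed greater or equal
    "lt": lambda n, z, c, v: n != v,           # signed less than
    "gt": lambda n, z, c, v: (not z) and n == v,
    "le": lambda n, z, c, v: z or n != v,
    "al": lambda n, z, c, v: True,             # always
}


def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with condition `cond` would execute."""
    predicate = CONDITIONS[cond.lower()]
    return bool(predicate(bool(n), bool(z), bool(c), bool(v)))
```

For example, `condition_passed("ne", n=0, z=0, c=0, v=0)` holds, so an ADDNE would execute with all flags cleared.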

The ubiquity of ARM processors is one of the main reasons for adding support for ARMv7 in Triton. ARMv7 is a widely popular architecture, particularly in embedded devices and mobile phones. We wanted to bring the advantages of Triton to this architecture (most tools are prepared to work on Intel x86/x86_64 only). The other reason is to show the flexibility and extensibility of Triton. ARMv7 poses some challenges in terms of implementation given its many features and peculiarities (some of them quite different from the rest of the supported architectures). Therefore, ARMv7 makes a great architecture to add to the list of supported ones.

Now without further ado, let's describe all the necessary steps to implement a new architecture in Triton.

Step 1: Describing registers specification and defining enums

The first step consists of describing the register specification of the new architecture. The description lives in a *.spec* file and is interpreted as C/C++ macro definitions. Each register is declared with the following syntax:

REG_SPEC(UPPER_NAME, LOWER_NAME, UPPER_BIT_POS, LOWER_BIT_POS, PARENT_REG, IS_MUTABLE)

UPPER_NAME and LOWER_NAME are the upper-case and lower-case names of the register (e.g., R1 and r1). UPPER_BIT_POS and LOWER_BIT_POS are the bit positions of the register within its bitvector. For ARMv7, these fields mainly define the size of the register: every ARMv7 register (flags aside) has a lower bit position of 0 and an upper bit position of 31. For other architectures such as x86, these fields vary (e.g., the ah register has an upper bit position of 15 and a lower bit position of 8). PARENT_REG names the register that contains it (a register that is not part of a larger one is its own parent). The IS_MUTABLE field defines whether the register is writable (e.g., XZR in AArch64 is immutable). Below is the spec file we wrote for ARMv7:

// Thirteen general-purpose 32-bit registers, R0 to R12
REG_SPEC(R0,   r0,   triton::bitsize::dword-1, 0, R0,  TT_MUTABLE_REG)  // r0
REG_SPEC(R1,   r1,   triton::bitsize::dword-1, 0, R1,  TT_MUTABLE_REG)  // r1
[...]
REG_SPEC(R12,  r12,  triton::bitsize::dword-1, 0, R12, TT_MUTABLE_REG)  // r12
REG_SPEC(SP,   sp,   triton::bitsize::dword-1, 0, SP,  TT_MUTABLE_REG)  // SP
REG_SPEC(R14,  r14,  triton::bitsize::dword-1, 0, R14, TT_MUTABLE_REG)  // LR (r14)
REG_SPEC(PC,   pc,   triton::bitsize::dword-1, 0, PC,  TT_MUTABLE_REG)  // PC
REG_SPEC(APSR, apsr, triton::bitsize::dword-1, 0, APSR, TT_MUTABLE_REG) // APSR

REG_SPEC_NO_CAPSTONE(C, c, 0, 0, C, TT_MUTABLE_REG) // C (Carry)
REG_SPEC_NO_CAPSTONE(N, n, 0, 0, N, TT_MUTABLE_REG) // N (Negative)
REG_SPEC_NO_CAPSTONE(V, v, 0, 0, V, TT_MUTABLE_REG) // V (Overflow)
REG_SPEC_NO_CAPSTONE(Z, z, 0, 0, Z, TT_MUTABLE_REG) // Z (Zero)

As you can see, some flags are defined with REG_SPEC_NO_CAPSTONE instead of REG_SPEC. The reason for this is the following. Capstone, the library Triton uses for disassembly, defines the APSR register, which holds all 4 flags, as a "single" register. However, we would like to be able to access each flag independently from one another. REG_SPEC_NO_CAPSTONE is used for this purpose: it defines a flag and states that it is not present in Capstone (the values of the APSR register and each flag are "synchronized").
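As a rough illustration of what the bit positions and the flag "synchronization" mean, the Python sketch below extracts a bit range from a parent value and reads or writes individual flags in an APSR value. The helper names are hypothetical, and this is not how Triton implements it internally; the only architectural facts assumed are the APSR bit positions (N=31, Z=30, C=29, V=28).

```python
# Illustrative sketch only: these helpers are hypothetical, not Triton APIs.

def extract_bits(value, upper, lower):
    """Return bits [upper:lower] (both inclusive) of `value`."""
    width = upper - lower + 1
    return (value >> lower) & ((1 << width) - 1)


# Architectural bit positions of the condition flags inside APSR.
APSR_FLAG_BITS = {"n": 31, "z": 30, "c": 29, "v": 28}


def flags_from_apsr(apsr):
    """Split a 32-bit APSR value into its four individual flags."""
    return {name: extract_bits(apsr, bit, bit)
            for name, bit in APSR_FLAG_BITS.items()}


def apsr_with_flag(apsr, name, bit_value):
    """Return `apsr` with one flag replaced, keeping all other bits."""
    bit = APSR_FLAG_BITS[name]
    cleared = apsr & ~(1 << bit) & 0xFFFFFFFF
    return cleared | ((bit_value & 1) << bit)
```

With these helpers, `extract_bits(0x1234, 15, 8)` mirrors the x86 ah example (bits 15 to 8 of its parent), and writing a flag through `apsr_with_flag` is immediately visible through `flags_from_apsr`.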

Once the register specification is done, we have to define enums for instructions and registers. As mentioned, Triton uses Capstone to disassemble opcodes. However, we define our own enums for things such as instruction mnemonics. Why don't we use Capstone enums? Our goal is to be as independent as possible of any external library. For example, if we move to another disassembler, we don't want to change the base code of our engines and semantics. To avoid this scenario, we have to convert every Capstone enum into a Triton enum. This is the role of the following functions, which are basically just switch cases:

  • Arm32Specifications::capstoneRegisterToTritonRegister

  • Arm32Specifications::capstoneInstructionToTritonInstruction

These functions are primarily used during the disassembly stage (next step).
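The idea behind such a translation layer can be sketched in a few lines of Python. Everything here is hypothetical: the `TritonReg` enum, the `DISASM_REG_*` constants and their numeric values are stand-ins chosen for illustration and do not match the real Capstone or Triton identifiers.

```python
# Illustrative sketch only: all names and numeric values are made up.
from enum import Enum, auto


class TritonReg(Enum):
    """Hypothetical stand-in for the engine's own register enum."""
    INVALID = 0
    R0 = auto()
    R1 = auto()
    SP = auto()
    PC = auto()


# Hypothetical identifiers as an external disassembler might expose them.
DISASM_REG_R0 = 100
DISASM_REG_R1 = 101
DISASM_REG_SP = 113
DISASM_REG_PC = 115

# The only place that knows about the external disassembler's identifiers.
_DISASM_TO_TRITON = {
    DISASM_REG_R0: TritonReg.R0,
    DISASM_REG_R1: TritonReg.R1,
    DISASM_REG_SP: TritonReg.SP,
    DISASM_REG_PC: TritonReg.PC,
}


def disasm_register_to_triton(reg_id):
    """Translate an external register id, falling back to INVALID."""
    return _DISASM_TO_TRITON.get(reg_id, TritonReg.INVALID)
```

Swapping the disassembler would then only require updating the mapping table, while the rest of the code keeps using the internal enum.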

Step 2: Creating a CPU interface

The second step consists in implementing what is called the CPU interface. Basically, all architectures in Triton share the same interface. It provides access to CPU registers, memory and also useful information such as which registers are the program counter and the stack pointer. One of the most important methods of this interface is disassembly which, as its name clearly states, disassembles instructions provided by the user. The workflow is the following: the user creates an instruction, sets the opcode and address, and calls the processing method (here is where all the magic happens). The code looks like this (using the Python bindings):

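from triton import ARCH, Instruction, TritonContext

ctx = TritonContext(ARCH.ARM32)

# Set memory, PC, etc...

# Start from the program counter configured above.
pc = ctx.getConcreteRegisterValue(ctx.registers.pc)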

while pc != stop_address:
    # Fetch next opcode.
    opcode = ctx.getConcreteMemoryAreaValue(pc, 4)

    # Create a Triton instruction.
    instruction = Instruction(pc, opcode)

    # Process the instruction (i.e., disassemble it and build its semantics).
    ctx.processing(instruction)

    # Update the program counter.
    pc = ctx.getConcreteRegisterValue(ctx.registers.pc)

In turn, ctx.processing(instruction) calls the aforementioned disassembly method. It uses Capstone to disassemble the instruction and then uses the information supplied to fill the rest of the fields of the instruction (basically, there is a translation from the Capstone representation of an instruction to the Triton one, as explained in the previous step).

For most architectures the job would be done by now. However, the ARMv7 architecture presents unique challenges (to be fair to ARM, every architecture does). The disassembly method has to take into account the current execution mode, which can be ARM or Thumb. Transitions between these two modes can occur anywhere in the code, although only through specific instructions, such as branch and exchange instructions or certain instructions [1] that have PC as their destination register. When a transition occurs, the PC register is updated with the address of the next instruction to execute, and the least significant bit of that address is 0 when the target instruction is in ARM mode and 1 when it is in Thumb mode. Therefore, dealing with transitions in Triton is simple: it only consists of checking the value written to the PC register (this is done in just one place) and setting a flag that records the current mode. The next instruction is then disassembled in ARM or Thumb mode depending on that flag.
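The mode-tracking logic described above can be sketched as follows. This is a simplified Python illustration with hypothetical names (`split_branch_target`, `ModeTracker`), not Triton's actual code, and it deliberately ignores details such as word alignment of ARM targets.

```python
# Illustrative sketch only: names are hypothetical, behaviour is simplified.

def split_branch_target(target):
    """Return (address, is_thumb) for a value written to PC.

    The least significant bit selects the mode (1 = Thumb, 0 = ARM) and is
    not part of the address of the next instruction.
    """
    is_thumb = bool(target & 1)
    address = target & 0xFFFFFFFE  # clear the mode bit
    return address, is_thumb


class ModeTracker:
    """Keeps the flag that tells the disassembler which mode to use."""

    def __init__(self):
        self.thumb = False  # assume execution starts in ARM mode

    def on_pc_write(self, value):
        """Update the mode flag and return the address to fetch next."""
        address, self.thumb = split_branch_target(value)
        return address
```

For instance, writing 0x8001 to PC would continue at 0x8000 in Thumb mode, whereas writing 0x8000 would continue at the same address in ARM mode.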

Besides specificities such as the one described above, the implementation of the CPU interface is quite simple and straightforward. Anyone trying to implement a new architecture ( ;) ) can use any of the available ones (x86, AArch64 and now ARMv7) as a reference.

Step 3: Describing the semantics

Each instruction modifies the state of the registers, memory and flags in a precise way; we call this its semantics. This step shows how to write the semantics of an instruction so that every time we emulate it in Triton, it does exactly what it is supposed to do (according to the ARMv7 manual).

Similarly to the previous step, there is a semantics interface which we have to implement when adding a new architecture to Triton. This interface is quite simple and has one method only, namely: buildSemantics. It is invoked by the processing method after the disassembly of the instruction has finished.

The method consists of a big switch statement that processes instructions according to their mnemonics (for example: ID_INS_ADD, ID_INS_MOV, which correspond to the ADD and MOV instructions). The handling of each instruction is done in a separate method. The structure of such a method is roughly the following:

void Arm32Semantics::<MNEMONIC>_s(triton::arch::Instruction& inst) {
  auto& dst  = inst.operands[0];
  auto& src1 = inst.operands[1];
  auto& src2 = inst.operands[2];

  /* Create symbolic operands */
  auto op1 = this->symbolicEngine->getOperandAst(inst, src1);
  auto op2 = this->symbolicEngine->getOperandAst(inst, src2);

  /* Create the semantics */
  auto node = <build semantics using the AstContext object>;

  /* Create symbolic expression */
  auto expr = this->symbolicEngine->createSymbolicExpression(inst, node, dst, "<MNEMONIC> operation");

  /* Get condition code node (in case it is a conditional instruction) */
  auto cond = node->getChildren()[0];

  /* Spread taint */
  this->spreadTaint(inst, ...);

  /* Update symbolic flags */
  if (inst.isUpdateFlag() == true) {
    /* Update the flags according to the result of the instruction. */
  }

  /* Update condition flag */
  if (cond->evaluate() == true) {
    /* In case it is a conditional execution instruction, make the
     * necessary adjustments (for instance, let Triton know the instruction
     * was in fact executed, switch execution modes, etc).
     */
  }

  /* Update the symbolic control flow */
  this->controlFlow_s(inst, ...);
}

Each instruction is different and has specific needs (and/or quirks); however, for most of them, the definition of their semantics looks similar to the example code above. In the case of ARMv7, we had to account for various aspects that made the implementation complex and, in some cases, even cumbersome. Firstly, as already mentioned, ARMv7 has two main execution modes: ARM and Thumb. Instructions are encoded for one or the other. They typically look the same but differ in some respects. Perhaps the most important difference is conditional execution. The vast majority of ARMv7 instructions are conditional, that is, they execute (or not) depending on the current values of the flags. In ARM mode, the condition is encoded within each instruction (and written as a suffix, for example: ADDNE r0, r1, r2), whereas in Thumb mode an extra instruction (IT [2]) is required to make an instruction conditional (for example: IT NE; ADD r0, r1, r2).
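To give an idea of what "building the semantics" amounts to for a simple case, here is a concrete (non-symbolic) Python sketch of a 32-bit addition that also computes the N, Z, C and V flags, plus a helper that keeps the old destination value when the condition does not hold. The function names are hypothetical; Triton expresses the same logic as AST nodes rather than Python integers.

```python
# Illustrative sketch only: concrete arithmetic, hypothetical function names.
MASK32 = 0xFFFFFFFF


def adds(op1, op2):
    """Return (result, flags) for a 32-bit ADD that updates the flags."""
    a = op1 & MASK32
    b = op2 & MASK32
    full = a + b                    # unbounded sum, may exceed 32 bits
    result = full & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    flags = {
        "n": sign_r,                                  # result is negative
        "z": int(result == 0),                        # result is zero
        "c": int(full > MASK32),                      # unsigned carry out
        "v": int(sign_a == sign_b and sign_r != sign_a),  # signed overflow
    }
    return result, flags


def conditional_write(cond_passed, new_value, old_value):
    """Model a conditional instruction: keep the old value if not executed."""
    return new_value if cond_passed else old_value
```

For example, adding 1 to 0x7FFFFFFF yields 0x80000000 with N and V set, and a conditional instruction whose condition fails leaves its destination unchanged.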

There are also more subtle differences which demand extra attention. For instance, some instructions look the same in both modes but their operands are reported slightly differently (at least, according to Capstone). This is the case of ASRS r0, r1, #2 (Arithmetic Shift Right; the S suffix states that the flags should be updated), where the immediate (i.e., the #2) is reported differently depending on whether the instruction is encoded for ARM or for Thumb (as Shift and as operands[2], respectively). Below you can see the differences:

$ cstool -d arm "\x41\x01\xb0\xe1" 1000
1000  41 01 b0 e1  asrs r0, r1, #2
    op_count: 2
        operands[0].type: REG = r0
        operands[0].access: WRITE
        operands[1].type: REG = r1
        operands[1].access: READ
            Shift: 1 = 2
    Update-flags: True
    Registers read: r1
    Registers modified: r0
    Groups: arm

$ cstool -d thumb "\x88\x10" 1000
1000  88 10  asrs   r0, r1, #2
    op_count: 3
        operands[0].type: REG = r0
        operands[0].access: WRITE
        operands[1].type: REG = r1
        operands[1].access: READ
        operands[2].type: IMM = 0x2
    Update-flags: True
    Registers read: r1
    Registers modified: r0
    Groups: thumb thumb1only
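The difference comes from the encodings themselves: in ARM mode, ASRS r0, r1, #2 is encoded as a MOVS whose source register carries an ASR shift, while the 16-bit Thumb encoding has a dedicated immediate field. The Python sketch below decodes the two opcodes shown above by hand and normalizes them into the same set of fields. The function names are hypothetical, and the decoders only cover these two encodings (the special meaning of a shift amount of 0 is ignored).

```python
# Illustrative sketch only: hand-written decoders for two encodings.
SHIFT_TYPES = ("lsl", "lsr", "asr", "ror")


def decode_arm_mov_shifted(word):
    """Decode an ARM MOV{S} Rd, Rm, <shift> #imm5 (register, immediate shift)."""
    is_data_processing_reg = (word >> 25) & 0b111 == 0
    is_mov = (word >> 21) & 0xF == 0b1101
    shift_by_immediate = (word >> 4) & 1 == 0
    if not (is_data_processing_reg and is_mov and shift_by_immediate):
        raise ValueError("not a MOV with an immediate-shifted register")
    return {
        "rd": (word >> 12) & 0xF,
        "rm": word & 0xF,
        "shift": SHIFT_TYPES[(word >> 5) & 0b11],
        "amount": (word >> 7) & 0x1F,
        "update_flags": bool((word >> 20) & 1),
    }


def decode_thumb_asr_imm(halfword):
    """Decode the 16-bit Thumb ASRS Rd, Rm, #imm5 encoding."""
    if (halfword >> 11) != 0b00010:
        raise ValueError("not a 16-bit ASR (immediate)")
    return {
        "rd": halfword & 0b111,
        "rm": (halfword >> 3) & 0b111,
        "shift": "asr",
        "amount": (halfword >> 6) & 0x1F,
        "update_flags": True,  # this encoding sets flags outside an IT block
    }


# The two opcodes from the cstool output (stored little-endian in memory).
arm_fields = decode_arm_mov_shifted(int.from_bytes(b"\x41\x01\xb0\xe1", "little"))
thumb_fields = decode_thumb_asr_imm(int.from_bytes(b"\x88\x10", "little"))
```

Both calls produce the same fields (destination r0, source r1, ASR by 2, flags updated), which is the kind of normalization the semantics code has to perform before handling both encodings uniformly.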

Switching between modes is another matter that required effort. It is possible to switch modes not only using explicit instructions such as BX (Branch and eXchange) but also through standard instructions, for instance, arithmetic and bitwise ones. In the latter case, the only requirement is that the PC register is used as the destination operand. This didn't pose a difficulty in itself but took a considerable amount of time when testing, given the many cases to consider.

Amongst the things that make the implementation of ARMv7 complex, we can emphasize the many variations a single instruction can have (in terms of the number and type of operands as well as the condition code).

The current state of the ARMv7 implementation is quite advanced. Nonetheless, there is still some more work to do. We have implemented the most frequent instructions and it is possible to emulate full binaries (as we'll discuss in the next section). Adding support for new instructions is relatively easy now, as the heavy part is already done and the testing infrastructure is in place. We'll be adding more instructions in future releases. We have not yet considered support for features such as SIMD, floating-point extensions or big-endian memory access (we'll consider them as the need arises, though).

Step 4: Testing the semantics

Implementing an instruction set can be tricky and requires a lot of attention. Reference manuals are not always as clear as one would like. Therefore, testing is crucial.

Testing involves processing instructions and comparing their outputs (that is, the values of registers and memory) against a well-known implementation of the architecture under test. For this purpose, Triton's tests rely on Unicorn for emulation (which is based on QEMU, a widely known, used and tested emulator).

The development process was the following. We started by implementing scripts to emulate an instruction using Unicorn and using Triton. Then, each time we implemented a new instruction we emulated it using both scripts and compared the results. In case there was a difference we investigated it and made the necessary fixes.
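The comparison step of this workflow can be sketched in Python as shown below. The sketch is emulator-agnostic: `run_reference` and `run_candidate` are hypothetical callables that would wrap Unicorn and Triton respectively, each taking an opcode and an initial state and returning the resulting state as a dictionary. None of these names come from the actual test scripts.

```python
# Illustrative sketch only: the runner callables are hypothetical wrappers.

def diff_states(reference, candidate):
    """Return {name: (reference_value, candidate_value)} for every mismatch."""
    names = sorted(set(reference) | set(candidate))
    return {
        name: (reference.get(name), candidate.get(name))
        for name in names
        if reference.get(name) != candidate.get(name)
    }


def check_instruction(opcode, initial_state, run_reference, run_candidate):
    """Run one opcode on both emulators from the same state and diff them."""
    # Each runner gets its own copy so that one cannot affect the other.
    reference = run_reference(opcode, dict(initial_state))
    candidate = run_candidate(opcode, dict(initial_state))
    return diff_states(reference, candidate)
```

An empty result means both implementations agree for that opcode and initial state; any entry points directly at the register (or memory cell) to investigate.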

Once the development was completed (that is, we had implemented all the instructions we originally planned), we added the aforementioned tests to Triton's CI infrastructure. We currently test many variations of the same instruction, with different condition codes and operands. We test instructions encoded for ARM and Thumb as well. Additionally, we test instructions that switch execution modes. As the number of instructions tested is large, tests are separated by instruction category (data, branch, load/store), encoding (ARM/Thumb) and mode switching (from ARM to Thumb and the other way around).

As an extra step, we also test the implementation emulating entire binaries. In this case, we have chosen a binary sample that computes the sha256 of a string (which proved to be really useful to find some missing details in previous tests). This sample was compiled for ARM and Thumb modes with different optimization flags (-O0, -O1, -O2, -O3, -Os, and -Oz), providing an extensive range of instructions and variations.

As part of its CI infrastructure, Triton collects coverage information from its tests (you can take a closer look at our Codecov page). This information helped guide our testing efforts during the development process. As already mentioned, ARMv7 instructions have many variations, and it was not always obvious which ones were missing from the tests.

Files organisation

Regarding the ARMv7 architecture and the files organisation, every step is handled by the following files:

Conclusion

Triton proved to be prepared for the addition of another architecture. ARMv7 posed some challenges, as described throughout this post. However, Triton handled them nicely (very few changes were needed in its core). The current implementation is quite advanced and we are going to add support for missing instructions in future releases.

This blog post, besides describing the experience of providing support for ARMv7, is meant to be used as a guideline for adding new architectures. As seen in the first two steps, adding basic support for disassembly is simple and straightforward. The heavy work resides in Step 3. However, the task can be tackled progressively, allowing you to implement only those instructions you need for your analysis (and to get immediate feedback on your implementation as well). If you want to bring the benefits of Triton to another architecture (or if you simply want to deepen your knowledge), now you know how to proceed!

Acknowledgments

  • Thanks to all our Quarkslab colleagues who proofread this article.

  • Thanks to Romain for providing testing samples.

References

[1] Check section "Changing between Thumb state and ARM state" of the reference manual (ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition).
[2] Currently, the IT instruction is not supported natively. However, it can be easily handled as shown in this example.
