Writing Beacon Object Files: Flexible, Stealthy, and Compatible
This post focuses on creating Cobalt Strike Beacon Object Files using the MinGW compiler on Linux. We will discuss several ideas and best practices that will increase the quality of your BOFs.
Flexibility
Compiling to Both Object Files and Executables
While writing a BOF is great, it’s always worth making the code compile to both BOF and EXE.
This provides a lot more options: we could run our capability outside Beacon by writing the EXE to disk and executing it. We could also convert the EXE into position-independent shellcode using donut and run it from memory.
Usually, calling a Windows API from a Beacon Object File looks as follows:
program.h
WINBASEAPI size_t __cdecl MSVCRT$strnlen(const char *s, size_t maxlen);
program.c
int length = MSVCRT$strnlen(someString, 256);
BeaconPrintf(CALLBACK_OUTPUT, "The variable length is %d.", length);
Makefile
BOFNAME := program
CC_x64 := x86_64-w64-mingw32-gcc

all:
	$(CC_x64) -c source/program.c -o compiled/$(BOFNAME).x64.o -masm=intel -Wall

However, we would like to build both a BOF and an EXE from the same source file. A practical way to do this is conditional compilation, as shown below. In this example, the Makefile defines the BOF macro (-DBOF) only for the object file build:
Makefile
BOFNAME := program
CC_x64 := x86_64-w64-mingw32-gcc

all:
	$(CC_x64) -c source/program.c -o compiled/$(BOFNAME).x64.o -masm=intel -Wall -DBOF
	$(CC_x64) source/program.c -o compiled/$(BOFNAME).x64.exe -masm=intel -Wall
program.h
#ifdef BOF
WINBASEAPI size_t __cdecl MSVCRT$strnlen(const char *s, size_t maxlen);
#define strnlen MSVCRT$strnlen
#endif

#ifdef BOF
#define PRINT(...) { \
    BeaconPrintf(CALLBACK_OUTPUT, __VA_ARGS__); \
}
#else
#define PRINT(...) { \
    fprintf(stdout, __VA_ARGS__); \
    fprintf(stdout, "\n"); \
}
#endif
program.c
int length = strnlen(someString, 256);
PRINT("The variable length is %d.", length);
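To make the EXE side of this pattern concrete, here is a minimal sketch of the fallback macro and a helper that uses it. The helper name describe_string is hypothetical and only for illustration. The sketch also wraps the macro body in do { ... } while (0) instead of bare braces, which is a common variation (not what the listing above does) that keeps the macro safe inside an if/else without braces.

```c
#include <stdio.h>
#include <string.h>

/* EXE-only fallback for PRINT. In a BOF build (-DBOF) the BeaconPrintf
 * variant from program.h would be used instead. */
#ifndef BOF
#define PRINT(...) do { \
    fprintf(stdout, __VA_ARGS__); \
    fprintf(stdout, "\n"); \
} while (0)
#endif

/* Hypothetical helper shared by both builds: prints and returns the
 * bounded length of a string. */
static size_t describe_string(const char *s)
{
    size_t length = strnlen(s, 256);
    PRINT("The variable length is %u.", (unsigned)length);
    return length;
}
```

Calling describe_string("beacon") from main (or from go in a BOF build) prints the message through whichever PRINT variant was compiled in.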
Finally, in our program.c file, we would define the “go” (BOF's entry point) and “main” functions:
program.c
#ifdef BOF
void go(char* args, int length)
{
    // BOF code
}
#else
int main(int argc, char* argv[])
{
    // EXE code
}
#endif
Stealth
Syswhispers2 Integration
syswhispers2 is an awesome implementation of direct syscalls. However, if we take a look under the hood, we can see that it uses a global variable to achieve its objective. Unfortunately, global variables do not work very well with Beacon. This is because Beacon Object Files don't have a .bss section, which is where global variables are typically stored.
A useful trick, originally suggested by Twitter user @the_bit_diddler, is to move the global variables to the .data section using a compiler directive, as shown below:
syscalls.c (before)
SW2_SYSCALL_LIST SW2_SyscallList;
syscalls.c (after)
SW2_SYSCALL_LIST SW2_SyscallList __attribute__ ((section(".data")));
This small change will allow the use of the syswhispers2 logic in a BOF.
In addition to the global variables change, there are other minor changes that need to be made so that the syswhispers2 code can compile with MinGW. For example, the API hash format needs to be changed from 0ABCD1234h to 0xABCD1234. The tool InlineWhispers should take care of the rest.
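The hash format change is mechanical: drop the trailing h and emit a 0x prefix. As a rough illustration, the conversion could be sketched as below. The function masm_hex_to_c is a hypothetical helper written for this example and is not part of syswhispers2 or InlineWhispers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts a MASM-style 32-bit hex literal such as
 * "0ABCD1234h" into the C-style form "0xABCD1234".
 * Returns 0 on success and -1 on malformed input or a too-small buffer. */
static int masm_hex_to_c(const char *masm, char *out, size_t out_size)
{
    size_t len = strlen(masm);
    char digits[32];
    char *end = NULL;
    unsigned long long value;
    int written;

    /* A MASM hex literal must end in 'h' or 'H'. */
    if (len < 2 || (masm[len - 1] != 'h' && masm[len - 1] != 'H'))
        return -1;
    if (len - 1 >= sizeof(digits))
        return -1;

    /* Copy everything except the suffix and parse it as base 16. */
    memcpy(digits, masm, len - 1);
    digits[len - 1] = '\0';
    value = strtoull(digits, &end, 16);
    if (end == digits || *end != '\0' || value > 0xFFFFFFFFull)
        return -1;

    written = snprintf(out, out_size, "0x%08lX", (unsigned long)value);
    return (written > 0 && (size_t)written < out_size) ? 0 : -1;
}
```

In practice a search-and-replace over the generated assembly achieves the same thing; the sketch only spells out what the transformation is.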
Hiding the Use of syscalls
Using direct syscalls is a powerful technique to avoid userland hooks. Ironically, using them could get us caught.
There are at least two ways of detecting direct syscalls: dynamic and static.
The dynamic method is simply detecting that a syscall was called from a module that is not ntdll.dll. The static method is to find a syscall instruction by inspecting the program's code and memory. How can we avoid both these detections? The answer is to call our syscalls from ntdll.dll.
First, we must locate where ntdll.dll is loaded. Luckily, syswhispers2 already has the code to do just that. Then, we can parse its headers and locate the code section.
Once we know the base address and size of the code section of ntdll.dll, all we need to do is search for the opcodes of the instructions syscall; ret. In x64, the bytes we are looking for are: {0x0f, 0x05, 0xc3}.
While it is true that EDRs and other tools hook (overwrite) syscalls in ntdll.dll, they certainly do not hook all existing syscalls, so we are almost certain to find at least one occurrence of these three bytes. We might even find them by chance at a misaligned offset.
Once we find the syscall; ret bytes, we can save the address in a global variable (stored in the .data section). That way, we only need to find it once.
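The search itself is an ordinary byte-pattern scan. The sketch below shows the idea on a plain buffer, using a hypothetical helper named find_bytes. It uses memcmp, which always compares exactly the requested number of bytes, whereas a string comparison function would stop early at a zero byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first occurrence of
 * needle inside haystack, or NULL if it does not occur. */
static const unsigned char *find_bytes(const unsigned char *haystack,
                                       size_t haystack_len,
                                       const unsigned char *needle,
                                       size_t needle_len)
{
    size_t i;

    if (needle_len == 0 || haystack_len < needle_len)
        return NULL;

    /* Check every offset, aligned or not, where the needle still fits. */
    for (i = 0; i + needle_len <= haystack_len; i++) {
        if (memcmp(haystack + i, needle, needle_len) == 0)
            return haystack + i;
    }
    return NULL;
}
```

The loop bound i + needle_len <= haystack_len plays the same role as the EndOfCode calculation in the listing that follows: it stops the scan before a comparison could read past the end of the region.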
Everything we have just described can be seen in the following code:
syscalls.c
#ifdef _WIN64
#define PEB_OFFSET 0x60
#define READ_MEMLOC __readgsqword
#else
#define PEB_OFFSET 0x30
#define READ_MEMLOC __readfsdword
#endif

PVOID SyscallAddress __attribute__ ((section(".data"))) = NULL;

__attribute__((naked)) void SyscallNotFound(void)
{
    __asm__(" SyscallNotFound: \n\
        mov eax, 0xC0000225 \n\
        ret \n\
    ");
}

PVOID GetSyscallAddress(void)
{
#ifdef _WIN64
    BYTE syscall_code[] = { 0x0f, 0x05, 0xc3 };
#else
    BYTE syscall_code[] = { 0x0f, 0x34, 0xc3 };
#endif

    // Return early if the SyscallAddress is already defined
    if (SyscallAddress)
    {
        // make sure the instructions have not been replaced
        if (!strncmp((PVOID)syscall_code, SyscallAddress, sizeof(syscall_code)))
            return SyscallAddress;
    }

    // set the fallback as the default
    SyscallAddress = (PVOID) SyscallNotFound;

    // find the address of NTDLL
    PSW2_PEB Peb = (PSW2_PEB)READ_MEMLOC(PEB_OFFSET);
    PSW2_PEB_LDR_DATA Ldr = Peb->Ldr;
    PIMAGE_EXPORT_DIRECTORY ExportDirectory = NULL;
    PVOID DllBase = NULL;
    PVOID BaseOfCode = NULL;
    ULONG32 SizeOfCode = 0;

    // Get the DllBase address of NTDLL.dll. NTDLL is not guaranteed to be the second
    // in the list, so it's safer to loop through the full list and find it.
    PSW2_LDR_DATA_TABLE_ENTRY LdrEntry;
    for (LdrEntry = (PSW2_LDR_DATA_TABLE_ENTRY)Ldr->Reserved2[1]; LdrEntry->DllBase != NULL; LdrEntry = (PSW2_LDR_DATA_TABLE_ENTRY)LdrEntry->Reserved1[0])
    {
        DllBase = LdrEntry->DllBase;
        PIMAGE_DOS_HEADER DosHeader = (PIMAGE_DOS_HEADER)DllBase;
        PIMAGE_NT_HEADERS NtHeaders = SW2_RVA2VA(PIMAGE_NT_HEADERS, DllBase, DosHeader->e_lfanew);
        PIMAGE_DATA_DIRECTORY DataDirectory = (PIMAGE_DATA_DIRECTORY)NtHeaders->OptionalHeader.DataDirectory;
        DWORD VirtualAddress = DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
        if (VirtualAddress == 0) continue;

        ExportDirectory = SW2_RVA2VA(PIMAGE_EXPORT_DIRECTORY, DllBase, VirtualAddress);

        // If this is NTDLL.dll, exit loop.
        PCHAR DllName = SW2_RVA2VA(PCHAR, DllBase, ExportDirectory->Name);
        if ((*(ULONG*)DllName | 0x20202020) != 0x6c64746e) continue;
        if ((*(ULONG*)(DllName + 4) | 0x20202020) == 0x6c642e6c)
        {
            BaseOfCode = SW2_RVA2VA(PVOID, DllBase, NtHeaders->OptionalHeader.BaseOfCode);
            SizeOfCode = NtHeaders->OptionalHeader.SizeOfCode;
            break;
        }
    }

    if (!BaseOfCode || !SizeOfCode)
        return SyscallAddress;

    // try to find a 'syscall' instruction inside of NTDLL's code section
    PVOID CurrentAddress = BaseOfCode;
    PVOID EndOfCode = SW2_RVA2VA(PVOID, BaseOfCode, SizeOfCode - sizeof(syscall_code) + 1);
    while ((ULONG_PTR)CurrentAddress <= (ULONG_PTR)EndOfCode)
    {
        if (!strncmp((PVOID)syscall_code, CurrentAddress, sizeof(syscall_code)))
        {
            // found 'syscall' instruction in ntdll
            SyscallAddress = CurrentAddress;
            return SyscallAddress;
        }
        // increase the current address by one
        CurrentAddress = SW2_RVA2VA(PVOID, CurrentAddress, 1);
    }

    // syscall entry not found, using fallback
    return SyscallAddress;
}
syscalls.h
EXTERN_C PVOID GetSyscallAddress(void);
In the extremely unlikely scenario in which we do not find ANY occurrence of these three bytes in the code section of ntdll.dll, we can instead use our own function: SyscallNotFound. This simply returns STATUS_NOT_FOUND. We could implement a syscall; ret, but keep in mind that we want to avoid having the syscall instruction in our code in order to evade static analysis.
Once we have the memory address of interest, all we need to do is to modify the assembly of our syscall functions to jump to this memory address:
push rcx                   ; save volatile registers
push rdx
push r8
push r9
sub rsp, 0x28              ; allocate some space on the stack
call GetSyscallAddress     ; call the C function and get the address of the 'syscall' instruction in ntdll.dll
add rsp, 0x28
push rax                   ; save the address in the stack
sub rsp, 0x28              ; allocate some space on the stack
mov ecx, 0x0123ABCD        ; set the syscall hash as the parameter
call SW2_GetSyscallNumber  ; get the id of the syscall using syswhispers2
add rsp, 0x28
pop r11                    ; store the address of the 'syscall' instruction on r11
pop r9                     ; restore the volatile registers
pop r8
pop rdx
pop rcx
mov r10, rcx
jmp r11                    ; jump to ntdll.dll and call the syscall from there
And voilà, we use direct syscalls from a valid module (ntdll.dll) without having a syscall instruction in our code 😊.
Stripping the Debug Symbols
While this step is not critical, stripping your binaries is cheap enough that it is worth the extra step. Stripped binaries are not only a lot harder to analyze, they are also smaller.
All we need to do is modify the Makefile to look as follows:
BOFNAME := program
CC_x64 := x86_64-w64-mingw32-gcc
STRIP_x64 := x86_64-w64-mingw32-strip

all:
	$(CC_x64) -c program.c -o compiled/$(BOFNAME).x64.o -masm=intel -Wall -DBOF
	$(STRIP_x64) --strip-unneeded compiled/$(BOFNAME).x64.o
	$(CC_x64) program.c -o compiled/$(BOFNAME).x64.exe -masm=intel -Wall
	$(STRIP_x64) --strip-all compiled/$(BOFNAME).x64.exe

Once the debugging symbols are stripped, compiling unchanged code with the same toolchain and flags produces essentially the same object file and executable, regardless of who compiled it.
Is that a bad thing? Potentially, but only if fingerprinting is a concern. The code could be slightly modified and recompiled. For example, the seed of syswhispers2 could be changed. If code is run from a Beacon or in memory in the form of shellcode, fingerprinting should not be worrisome, as static analysis in those cases is not possible.
Compatibility
Supporting x86 might seem hard and pointless, but we shouldn’t limit ourselves and put every 32-bit machine out of our reach. Supporting x86 is a fun challenge and pays off in the end.
Code Logic
We’ll begin by introducing some conditional compilation clauses based on the architecture:
#if _WIN64
// x64 version of some logic
#else
// x86 version of some logic
#endif
If we want to add some code that is exclusive to x64:
#if _WIN64
// some code only for x64
#endif
If we want to add some code that is exclusive to x86:
#ifndef _WIN64
// some code only for x86
#endif
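As a small worked example of the same pattern, the sketch below selects a constant per architecture and checks it against the real pointer size. The macro and function names are hypothetical. The extra __SIZEOF_POINTER__ test is an assumption added so that the sketch also picks the 64-bit branch when built with a non-Windows GCC or Clang; a MinGW-only codebase would just test _WIN64 as shown above.

```c
#include <stddef.h>

/* _WIN64 is what the post relies on under MinGW. __SIZEOF_POINTER__ is a
 * GCC/Clang predefined macro, used here only so the sketch behaves the
 * same on other 64-bit hosts (an assumption for illustration). */
#if defined(_WIN64) || (defined(__SIZEOF_POINTER__) && __SIZEOF_POINTER__ == 8)
#define TARGET_BITS 64
#define EXPECTED_POINTER_BYTES 8
#else
#define TARGET_BITS 32
#define EXPECTED_POINTER_BYTES 4
#endif

/* Hypothetical sanity check: returns 1 when the branch selected by the
 * preprocessor matches the pointer size the compiler actually uses. */
static int pointer_width_matches(void)
{
    return sizeof(void *) == EXPECTED_POINTER_BYTES;
}
```

A check like this is cheap to keep in a debug build and catches a Makefile target that passes the wrong compiler for a given output name.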
X86 syscall Support
To support syscalls in x86, we will have to deal with a few difficulties that are very manageable.
Function Names Within x86 Assembly
The main issue we can encounter is that calling the C functions SW2_GetSyscallNumber and GetSyscallAddress from x86 inline assembly results in these linker errors:
/usr/lib/gcc/i686-w64-mingw32/11.2.0/../../../../i686-w64-mingw32/bin/ld: /tmp/ccbjuGDN.o:program.c:(.text+0x68): undefined reference to `GetSyscallAddress'
/usr/lib/gcc/i686-w64-mingw32/11.2.0/../../../../i686-w64-mingw32/bin/ld: /tmp/ccbjuGDN.o:program.c:(.text+0x73): undefined reference to `SW2_GetSyscallNumber'
The GCC documentation explains that, on x86, the assembly-level names of C functions (and variables) are prefixed with an underscore. So, in this case, GetSyscallAddress becomes _GetSyscallAddress and SW2_GetSyscallNumber becomes _SW2_GetSyscallNumber.
Instead of calling them with the underscore, we can just adapt their definition to specify their name in assembly, like this:
syscalls.h
EXTERN_C DWORD SW2_GetSyscallNumber(DWORD FunctionHash) asm ("SW2_GetSyscallNumber");
EXTERN_C PVOID GetSyscallAddress(void) asm ("GetSyscallAddress");
We also need to do the same with the definitions for all the syscalls in syscalls.h. For example, here’s how we can modify NtOpenProcess:
syscalls.h (before)
EXTERN_C NTSTATUS NtOpenProcess(
    OUT PHANDLE ProcessHandle,
    IN ACCESS_MASK DesiredAccess,
    IN POBJECT_ATTRIBUTES ObjectAttributes,
    IN PCLIENT_ID ClientId OPTIONAL);
syscalls.h (after)
EXTERN_C NTSTATUS NtOpenProcess(
    OUT PHANDLE ProcessHandle,
    IN ACCESS_MASK DesiredAccess,
    IN POBJECT_ATTRIBUTES ObjectAttributes,
    IN PCLIENT_ID ClientId OPTIONAL) asm ("NtOpenProcess");
Once this is done, the weird x86 naming system should work fine.
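The asm label on a declaration is a GCC extension that fixes the symbol name the compiler emits, independent of the platform's default decoration. Here is a minimal sketch using a hypothetical function, increment, and a hypothetical symbol name, increment_plain:

```c
/* The asm label must sit on a declaration that precedes the definition.
 * From here on, the compiler emits and references the symbol
 * "increment_plain" instead of the default name for this target. */
int increment(int value) __asm__("increment_plain");

int increment(int value)
{
    return value + 1;
}
```

C callers still write increment(41); only the name in the object file changes. Running nm on the compiled object should list increment_plain, whereas a 32-bit MinGW build without the label would list _increment.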
Syscalls With Conflicting Types
There are some syscalls that fail to compile in x86, and produce an error message like:
error: conflicting types for ‘NtClose’;
While there are surely others, these syscalls are confirmed to have this issue:
- NtClose
- NtQueryInformationProcess
- NtCreateFile
- NtQuerySystemInformation
- NtQueryObject
It appears that in x86, MinGW already has a definition of these functions somewhere. To fix this, we just need to rename the troubling syscalls by prepending an underscore to their name in the x86 version.
program.h
#ifndef _WIN64
#define NtClose _NtClose
#define NtQueryInformationProcess _NtQueryInformationProcess
#define NtCreateFile _NtCreateFile
#define NtQuerySystemInformation _NtQuerySystemInformation
#define NtQueryObject _NtQueryObject
#endif
In program.c, we can call these functions normally, without prepending the underscore to their name.
X86 Assembly Code
For the assembly code, we’ll need to update syscalls-asm.h to look as follows:
syscalls-asm.h
#pragma once
#include
#if _WIN64
// all the x64 syscalls definitions
#else
// all the x86 syscalls definitions
#endif
Finally, the x86 assembly will look like this:
call GetSyscallAddress     ; call the C function and get the address of the 'sysenter; ret' instructions in ntdll.dll
push eax                   ; save the address in the stack
push 0x0123ABCD            ; set the syscall hash as the parameter
call SW2_GetSyscallNumber  ; get the id of the syscall using syswhispers2
add esp, 4
pop ebx                    ; store the address of the 'sysenter; ret' instructions on ebx
mov edx, esp
sub edx, 4                 ; save the (future) address of the stack in edx
call ebx                   ; call the 'sysenter' instruction
ret
After all these changes, we have x86 syscall support.
WoW64 Support?
WoW64 stands for Windows on Windows64, the subsystem that allows 32-bit programs to run on 64-bit Windows machines.
In WoW64 processes, syscalls are not called via a syscall or sysenter instruction. Instead, a jump to fs:[0xc0] is performed. Understanding the way this works requires a long explanation, but for the purpose of this article, all we need to know is that it translates syscalls from 32 to 64-bit so that the kernel can understand them.
One quick way of “supporting” syscalls on WoW64 processes is to perform the same jump from our code. However, there are a few drawbacks when doing this. First, this is by no means a direct syscall, so EDRs can hook these calls. Additionally, in some syscalls that use pointers, we will not be able to reference addresses above the 32-bit range.
Truly supporting direct syscalls for WoW64 processes would require us to transition via a far jmp instruction into 64-bit code, translate the parameters to their 64-bit counterparts, adjust the calling convention, set the stack alignment and more. These actions alone could make up an entire post.
That being said, jumping to fs:[0xc0] is an easy trick and at least we would have some support for WoW64, which might be useful for some scenarios.
To detect if our program is running as a WoW64 process, we’ll define a function called IsWoW64:
syscalls-asm.h
#if _WIN64
#define IsWoW64 IsWoW64
__asm__("IsWoW64: \n\
    mov rax, 0 \n\
    ret \n\
");
#else
#define IsWoW64 IsWoW64
__asm__("IsWoW64: \n\
    mov eax, fs:[0xc0] \n\
    test eax, eax \n\
    jne wow64 \n\
    mov eax, 0 \n\
    ret \n\
    wow64: \n\
    mov eax, 1 \n\
    ret \n\
");
#endif
syscalls.h
EXTERN_C BOOL IsWoW64(void) asm ("IsWoW64");
program.c
if(IsWoW64())
{
    PRINT("This is a 32-bit process running on a 64-bit machine!\n");
}
If detection is a concern when running under a WoW64 context, just call IsWoW64() and bail out if it returns true.
The same check can also be done in the .cna script in Cobalt Strike:
program.cna
$barch = barch($1);
$is64 = binfo($1, "is64");
if($barch eq "x86" && $is64 == 1)
{
    berror($1, "This program does not support WoW64");
    return;
}
We’ll also need to make a small change to the function GetSyscallAddress in order to set the syscall address to fs:[0xc0] if the process is WoW64:
PVOID GetSyscallAddress(void)
{
#ifdef _WIN64
    BYTE syscall_code[] = { 0x0f, 0x05, 0xc3 };
#else
    BYTE syscall_code[] = { 0x0f, 0x34, 0xc3 };
#endif

#ifndef _WIN64
    if (IsWoW64())
    {
        // if we are a WoW64 process, jump to WOW32Reserved
        SyscallAddress = (PVOID)READ_MEMLOC(0xc0);
        return SyscallAddress;
    }
#endif

    // Return early if the SyscallAddress is already defined
    if (SyscallAddress)
    {
        // make sure the instructions have not been replaced
        if (!strncmp((PVOID)syscall_code, SyscallAddress, sizeof(syscall_code)))
            return SyscallAddress;
    }

    // set the fallback as the default
    SyscallAddress = (PVOID)DoSysenter;
    …
Finally, we’ll update our Makefile to compile for both 64 and 32-bit.
Makefile
BOFNAME := program
CC_x64 := x86_64-w64-mingw32-gcc
CC_x86 := i686-w64-mingw32-gcc
STRIP_x64 := x86_64-w64-mingw32-strip
STRIP_x86 := i686-w64-mingw32-strip

all:
	$(CC_x64) -c program.c -o compiled/$(BOFNAME).x64.o -masm=intel -Wall -DBOF
	$(STRIP_x64) --strip-unneeded compiled/$(BOFNAME).x64.o
	$(CC_x86) -c program.c -o compiled/$(BOFNAME).x86.o -masm=intel -Wall -DBOF
	$(STRIP_x86) --strip-unneeded compiled/$(BOFNAME).x86.o
	$(CC_x64) program.c -o compiled/$(BOFNAME).x64.exe -masm=intel -Wall
	$(STRIP_x64) --strip-all compiled/$(BOFNAME).x64.exe
	$(CC_x86) program.c -o compiled/$(BOFNAME).x86.exe -masm=intel -Wall
	$(STRIP_x86) --strip-all compiled/$(BOFNAME).x86.exe

clean:
	rm compiled/$(BOFNAME).*.*
Conclusion
To summarize, this post explored several technical solutions to achieve the following objectives:
- Create executables as well as BOF using the same codebase
- Use syscalls from ntdll.dll instead of using them directly from an unknown module
- Strip executables to make them smaller and harder to analyze
- Run on both 64-bit and 32-bit
- Have partial support for syscalls in WoW64
If you want to see an example of all this working together, check out nanodump.