A touch on CVE-2020-0796

Introduction

In this post, we touch on the CVE-2020-0796 vulnerability, which was initially publicly disclosed by Microsoft.

This post also attempts to illustrate basic windbg usage for dynamic vulnerability analysis. The IDA Pro disassembler is also used in some parts.

Feel free to contact us for any errors, comments or additions.

CVE-2020-0796

As discussed in the security update, a “remote code execution vulnerability exists in the way that the Microsoft Server Message Block 3.1.1 (SMBv3) protocol handles certain requests. An attacker who successfully exploited the vulnerability could gain the ability to execute code on the target server or client”.

The vulnerability can be triggered by sending a malicious packet to an SMBv3 server or an SMBv3-enabled client. Moreover, it can be triggered without prior authentication: simple network access to an unpatched, exposed SMBv3 server is enough. The same goes for the client: if a vulnerable client connects to a malicious SMBv3 server, the vulnerability can be triggered on the client side.

The vulnerability affects the default installations of the following Windows versions:

Windows 10 1903 for x86, x64 as well as ARM64-based systems
Windows 10 1909 for x86, x64 as well as ARM64-based systems
Windows Server, version 1903 (Server Core installation)
Windows Server, version 1909 (Server Core installation)

Based on the above, the vulnerability severity is rated as Critical.

Unintended disclosure

The security update itself did not provide any further details; however, as a workaround, Microsoft recommends disabling SMBv3 compression. Compression is one of the “new” features introduced in the SMB 3.1.1 dialect, and Windows versions prior to 1903 aren’t vulnerable, since this feature was introduced in a feature update for Windows 10 1903 and later versions.

Windows kernel debugging setup

To investigate the vulnerability, we need to set up a lab that enables us to debug the Windows kernel. We need two virtual machines if using a Linux or macOS host, or just one if we use Windows as the host.

In this guide, we assume that two VMs are being used, as we are using a Linux host for the investigation. We also use VMware Workstation for virtualization.

vms

VM1: Windows 10 that will run the windbg debugger. We call this machine “debugger”. In our setup, we used Windows 10 Enterprise 1809 as the debugger.

VM2: Windows 10 that will have its kernel debugged. We will call this machine “debuggee”. In our setup, we used Windows 10 Enterprise 1909 OS Build 18363.592 as the debuggee

debugger

Install the following:

Visual Studio 2019 (the Community edition will do)
The latest Windows 10 SDK, included with VS 2019
Windows Driver Kit (DDK), also included in VS 2019 as “Debugging tools for Windows”

Set up Microsoft symbols: we believe that the most effective way to configure the Microsoft symbols on your machine is to use an environment variable. Create a new system environment variable named _NT_SYMBOL_PATH and set its value to srv*c:\symbols*http://msdl.microsoft.com/download/symbols

Edit the VM .vmx file in order to remove any references to serial0 which, by default, is used for printing.

Inside the VM .vmx file, add the following lines:

serial0.present = "TRUE"
serial0.fileType = "pipe"
serial0.yieldOnMsrRead = "TRUE"
serial0.fileName = "/tmp/kerneldbg"
serial0.pipe.endPoint = "server"
serial0.tryNoRxLoss = "TRUE"

If using macOS, change serial0.fileName to "/private/tmp/com1".

If you are using a Windows host and want to debug over the network, which is much faster, add a new network adapter to the debuggee and set it to host-only.

debuggee generic

Run cmd.exe as Administrator and execute sc config wuauserv start= disabled followed by sc stop wuauserv in order to disable and stop the Windows Update service.

To make sure that the Windows Update service doesn’t start on its own, open gpedit.msc, go to Computer Configuration\Administrative Templates\Windows Components\Windows Update -> Configure Automatic Updates and set it to Disabled.

debuggee over serial port

Edit the VM .vmx file in order to remove any references to serial0 which, by default, is used for printing.

Inside the .vmx file, add the following lines:

serial0.present = "TRUE"
serial0.fileType = "pipe"
serial0.yieldOnMsrRead = "TRUE"
serial0.fileName = "/tmp/kerneldbg"
serial0.pipe.endPoint = "client"
serial0.tryNoRxLoss = "TRUE"

If using macOS, change serial0.fileName to "/private/tmp/com1".

Open cmd.exe as Administrator and run: bcdedit /debug on and bcdedit /dbgsettings serial debugport:1 baudrate:115200.

debuggee over network

If you are using a Windows host and want to debug the machine via network:

Open Device Manager => Network Adapters and select the second adapter you added => General tab.

Write down the Location. It is something like PCI Slot 256 (PCI bus 27, device 0, function 0).

You can also get these parameters using PowerShell: Get-NetAdapterHardwareInfo -InterfaceDescription *Intel* | select Name, InterfaceDescription, DeviceType, Busnumber, Devicenumber, Functionnumber | FL

Open cmd.exe as Administrator and run:

bcdedit /debug on
bcdedit /dbgsettings net hostip:192.168.50.1 port:50000 key:my.secure.key.here
bcdedit /set "{dbgsettings}" busparams 27.0.0

The busparams value has the format BusNumber.DeviceNumber.FunctionNumber.

As you may notice, hostip is the IP address of the VMware host on the host-only interface (the debugger machine). You could potentially use the IP address of another virtual machine that you chose as the debugger instead; we haven’t tested this setup, though.

Then reboot the debuggee: shutdown -r -t 0

verify the setup works

To verify that the setup works, start the debugger first, run windbg and then select File > Kernel Debug... (or Ctrl + K). Go to the tab COM. Make sure baud rate is set to 115200 and port is set to com1. Pipe and reconnect should NOT be checked.

If you are debugging over the network using a Windows host, select File > Kernel Debug... (or Ctrl + K) and, in the NET tab, set Port and Key to the values you configured in the previous step.

Then, start the debuggee.

During the debuggee boot process, windbg running on the debugger should show output similar to this:

Once the debuggee has booted, you should see output similar to this:

windbg usage

Once the debuggee has booted, type g to let it continue running, and press Ctrl + Break to break into it. On laptops without a Break/Pause key, try Ctrl + Fn + B or Alt + Del to break.

In windbg, you can also set up and save your workspace the way you like, so that every time you debug something, the window arrangement and setup will be the one that you saved.

To do so, arrange your windows the way you want them and then select File -> Save Workspace and then File -> Save Workspace As.

Once your workspace is set up and the debuggee is connected successfully, verify that the symbols are downloaded and work as expected:

1: kd> !sym noisy

1: kd> .reload -f

This process might take a few minutes, depending on your Internet connection speed. Once done, you can verify that the symbols are downloaded and work as expected by running the lm command to List Loaded Modules.

Specifically, the lm command will list all currently loaded modules, as well as drivers. The output should be similar to this:

Run vertarget to display the debuggee version.

All windbg commands can be viewed by running the .hh meta-command.

Vulnerability details

The vulnerability actually occurs in the driver code that implements the SMB service, srv2.sys, located in C:\Windows\System32\drivers\. More specifically, the vulnerability is an integer overflow that occurs when the client (or the server) sends a malicious SMBv3 packet.

Microsoft has published “Open Specifications Documentation” regarding the internals of its implementation of the SMB protocol. SMB 3.1.1 is not a separate protocol: Microsoft treats it as a dialect of the SMB 2 protocol.

We are interested in the compression that is supported for SMBv3.1.1 dialect of SMB 2.

There are currently three dialect families of the SMB 2 Protocol:

Dialect Family	Dialect Revisions	Revision Code
SMB 2.0.2	SMB 2.0.2 dialect revision	0x0202
SMB 2.1	SMB 2.1 dialect revision	0x0210
SMB 3.x	SMB 3.0 dialect revision	0x0300
SMB 3.x	SMB 3.0.2 dialect revision	0x0302
SMB 3.x	SMB 3.1.1 dialect revision	0x0311

Specifically, the SMB 3.1.1 dialect introduces the following enhancements:

Supporting the negotiation of encryption and integrity algorithms.
Enhanced protection of negotiation and session establishment.
Reconnecting with a specified dialect.
Supporting the compression of messages between client and server.

The Microsoft [MS-SMB2] specification provides a wealth of information regarding the implementation of the SMB 3.1.1 dialect. The sections that we will mostly use are 2.2.42 (SMB2 COMPRESSION_TRANSFORM_HEADER) and 2.2.3.1.3 (SMB2_COMPRESSION_CAPABILITIES).

PoC used

The PoC exploit that we will use to reproduce and analyze the vulnerability uses the latest impacket library version and is listed here.

windbg

We attach the kernel debugger and run

.reload -f to force a reload of the symbols

If this takes a long time, break and run:

!sym noisy to view the progress

and then run .reload -f again, followed by g (or F5).

We then run the PoC.

The debugger breaks and we run !analyze -v

When the bug check occurred, this was the state of the stack (partial):

...
ffff9e85`fbdffb90 fffff800`296ebb37 : nt!KiPageFault+0x360
ffff9e85`fbdffd20 fffff800`29130326 : nt!RtlDecompressBufferLZNT1+0x57
ffff9e85`fbdffdb0 fffff800`2dc7e58d : nt!RtlDecompressBufferEx2+0x66
ffff9e85`fbdffe00 fffff800`2dd27f41 : srvnet!SmbCompressionDecompress+0xdd
ffff9e85`fbdffe70 fffff800`2dd2699e : srv2!Srv2DecompressData+0xe1
ffff9e85`fbdffed0 fffff800`2dd69a9f : srv2!Srv2DecompressMessageAsync+0x1e
ffff9e85`fbdfff00 fffff800`291c4dde : srv2!RfspThreadPoolNodeWorkerProcessWorkItems+0x13f
ffff9e85`fbdfff80 fffff800`291c4d9c : nt!KxSwitchKernelStackCallout+0x2e
ffff9e85`fc78a970 fffff800`2906a16e : nt!KiSwitchKernelStackContinue
...

From the above we can see that function srv2!Srv2DecompressMessageAsync calls srv2!Srv2DecompressData, which calls srvnet!SmbCompressionDecompress, which calls nt!RtlDecompressBufferEx2, which calls nt!RtlDecompressBufferLZNT1 and then a Page Fault occurs.

IDA Pro

We load the srv2.sys driver in IDA Pro. IDA Pro also reads the _NT_SYMBOL_PATH environment variable. Our goal here is to try and identify any functions that implement compression, as the Microsoft security update indicates.

On the left, the “Functions window” lists every function that IDA has recognized in the database. For each function, we can see its address, and an R indicates that the function returns to its caller. These values are detected by IDA and can be wrong. If we know more about a function, we can edit it and change many of its details, such as whether it returns, whether it uses the RBP/EBP register to reference local variables and function arguments, and more.

In this window we press Ctrl + F to search for compress and see that there are actually 5 functions whose names relate to compression:

Smb2GetHonorCompressionAlgOrder     .text 00000001C0001660 00000007                   R . . . . . .
Srv2DecompressMessageAsync          .text 00000001C0016980 000000B8 00000028 00000010 R . . . . . .
Srv2DecompressData                  .text 00000001C0017E60 00000152 00000058 00000020 R . . . . . .
Smb2ValidateCompressionCapabilities PAGE  00000001C005600C 000000E2 00000038 00000018 R . . . . . .
Smb2SelectCompressionAlgorithm      PAGE  00000001C0056350 00000064 00000028 00000010 R . . . . . .

We can quickly identify that Smb2GetHonorCompressionAlgOrder has no information that might assist our search.

Srv2DecompressMessageAsync is a small function that, among other things, first calls Srv2DecompressData.

We will check Srv2DecompressData first and, if need be, get back to Srv2DecompressMessageAsync.

After going through the function disassembly we deduce the following decompilation of part of the function Srv2DecompressData:

int Srv2DecompressData(ptrSmbPacket) //rcx
{
	// at the start this function saves RBX, RBP, RSI in the "shadow space"
	// then saves RDI, R14 and R15 in the stack, and then allocates 64 bytes
	// of stack space for local variables
	// sets a "wall" of 0x00 bytes just above the saved return address and 
	// right below the shadow space
	PULONG pToSmbPacket = ptrSmbPacket; // rdi
	PULONG puPtr = *(ptrSmbPacket.ptrSmbPacketData);  // mov rax, [rcx+0F0h]
											// ptrSmbPacketData at offset 0xF0
	PVOID ptrToBuffer;

	if (*(puPtr + 0x24) < 0x10) // cmp     dword ptr [rax+24h], 10h // this is the size of SMB2 COMPRESSION_TRANSFORM_HEADER
		return 0xC000090B;

	puPtr = *(puPtr + 0x18); 		// mov rax, [rax+18h]

	REGXMM0 regXMM0 = *(puPtr);		// movups  xmm0, xmmword ptr [rax]. XMM0 now has our mal header
	puPtr = *(ptrSmbPacket + 0x50); 	// mov rax, [rcx+50h]
	DWORD stackLoc1 = REGXMM0; 			// save REGXMM0 to RSP + 0x30. RSP + 0x30 now has our mal hdr

	ptrSmbPacket = *(puPtr + 0x1F0);	// mov rcx, [rax+1F0h]
	regXMM0 = regXMM0 >> 8;				// psrldq  xmm0, 8

	DWORD stackLoc2 = *(ptrSmbPacket + 0x8C); 	// mov ebp, [rcx+8Ch]. Is this the abs offset from packet
												// 	to the header of the data sent? 

	ptrSmbPacket = regXMM0; 			// ptrSmbPacket now points to 
										// 	_COMPRESSION_TRANSFORM_HEADER
										//  this is our malicious header

	puPtr = (WORD)ptrSmbPacket;		// movzx eax, cx

	// so far we need to know what is at each offset, such as:
	// 		puPtr + 0x18
	//		ptrSmbPacket + 0x50
	//		*(ptrSmbPacket + 0x50) + 0x1F0

	if(stackLoc2 == puPtr) // now our header is in the stack
	{
		puPtr = *(stackLoc1);		// rax points to our header again
		puPtr = puPtr >> 32;		// shr rax, 20h
		ptrSmbPacket = ptrSmbPacket >> 32;  // shr rcx, 20h

		ptrToBuffer = SrvNetAllocateBuffer(puPtr + ptrSmbPacket, 0, ...); 
		if (ptrToBuffer == NULL)
			return 0xC000009A;

		// to do loc_1C0017EF7
		__imp_SmbCompressionDecompress();
	}

	return 0xC00000BB;
}

Things to note:

1. Windows x64 by default uses the fastcall calling convention, where the first four integer or pointer parameters are passed via registers in the following order: rcx, rdx, r8 and r9 (left to right). Parameters 5 and above are passed via the stack (right to left).

2. In this calling convention, rax, rcx, rdx and r8 to r11 are volatile, while rbx, rbp, rdi, rsi and r12 to r15 are non-volatile. A function must save non-volatile registers before modifying them, while volatile registers can be used freely.

3. The caller reserves 32 bytes of “shadow space” on the stack, right above the return address, for the callee’s use. The callee can spill its register parameters there or, as Srv2DecompressData does in its prologue, save non-volatile registers.

4. The code listing above is an approximation. It has errors. However, it attempts to show how the actual binary handles the data that we are interested in.

5. The integer overflow occurs at the addition performed to the first parameter passed to the SrvNetAllocateBuffer function call.

The function does some preparation and calls the srvnet module export function SrvNetAllocateBuffer in order to allocate memory for the buffer to be decompressed. Then, depending on the results, it calls the SmbCompressionDecompress function to decompress the buffer. The SmbCompressionDecompress is also exposed by the srvnet module at C:\Windows\System32\drivers\srvnet.sys.

integer overflows

Before proceeding with the dynamic analysis, we need to briefly mention integer overflows.

On both 32-bit and 64-bit systems, the int data type is 32 bits wide. This means that an int can range from -2^(32-1) to 2^(32-1) - 1, or from -2147483648 to 2147483647, with the MSB (Most Significant Bit) used as the sign. The unsigned int data type is also 32 bits wide, so an unsigned int can range from 0 to 2^32 - 1, or from 0 to 4294967295.

Let’s assume that we have the largest unsigned integer, 0xffffffff (4294967295). If we add 2 to this number, the result needs more than 32 bits to be represented, so it is truncated and wraps around past 0. Therefore, the result of 0xffffffff + 2 is 1.
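To make the wraparound concrete, here is a minimal C sketch (the helper names add_u32 and add_u32_wraps are ours, purely for illustration). It relies on the fact that unsigned arithmetic in C is defined modulo 2^N, which mirrors what a 32-bit add does in hardware:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned arithmetic in C is defined modulo 2^N, so a 32-bit unsigned
 * addition silently drops the carry out of bit 31, just like the 32-bit
 * x86 "add" instruction does. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Returns 1 when a + b does not fit in 32 bits, i.e. when the sum wrapped.
 * A wrapped sum is always smaller than either operand. */
int add_u32_wraps(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b) < a;
}
```

For instance, add_u32(0xffffffff, 2) returns 1, and add_u32_wraps flags exactly that case.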

These vulnerabilities can have devastating effects, especially when they manifest in the context of memory allocations.

windbg

Having discussed the integer overflows, we will continue with the dynamic analysis. We attach the debugger, load the symbols and using the x command we will list all the symbols of the srv2 driver:

0: kd> x srv2!*

To reduce the output, we will run the following:

0: kd> x srv2!Srv2*compress*

We then set a breakpoint to Srv2DecompressData:

0: kd> bp srv2!Srv2DecompressData

and verify it with bl:

0: kd> bl

We also unassemble the function using the uf command:

0: kd> uf srv2!Srv2DecompressData

The Srv2DecompressData function takes one argument, a pointer to the SMB packet. Since this is the fastcall calling convention, the rcx register holds this pointer.

The astute reader will notice that there are some differences between the windbg and IDA Pro disassembly listings. For example, this instruction in IDA Pro

.text:00000001C0017E9B movups xmmword ptr [rsp+58h+Size], xmm0

is shown like this in windbg

movups xmmword ptr [rsp+30h],xmm0

Or this instruction in IDA Pro

.text:00000001C0017EC8 mov rax, qword ptr [rsp+58h+Size]

is shown like this in the debugger:

mov rax,qword ptr [rsp+30h]

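The difference is only in presentation: IDA refers to stack slots through named stack variables expressed relative to the frame, so [rsp+58h+Size] (which implies Size = -28h) is the same location as windbg’s [rsp+30h]. windbg shows the raw rsp-relative offsets, which are the ones we will use while debugging.

We run the exploit again, check that the breakpoint is hit and then inspect the registers:

r

Then we unassemble 0x10 instructions starting at rip (windbg numbers default to hexadecimal, so L10 means 16):

u rip L10

As already discussed, the first few instructions store the non-volatile registers and set up the stack: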

0: kd> u rip L10
srv2!Srv2DecompressData:
fffff804`60d77e60 mov     rax,rsp
fffff804`60d77e63 mov     qword ptr [rax+10h],rbx
fffff804`60d77e67 mov     qword ptr [rax+18h],rbp
fffff804`60d77e6b mov     qword ptr [rax+20h],rsi
fffff804`60d77e6f push    rdi
fffff804`60d77e70 push    r14
fffff804`60d77e72 push    r15
fffff804`60d77e74 sub     rsp,40h
fffff804`60d77e78 and     dword ptr [rax+8],0

We unassemble the whole function:

uf .

fffff804`60d77e7c mov     rdi,rcx
fffff804`60d77e7f mov     rax,qword ptr [rcx+0F0h]
fffff804`60d77e86 cmp     dword ptr [rax+24h],10h
fffff804`60d77e8a jb      srv2!Srv2DecompressData+0x134 (fffff804`60d77f94)
...
srv2!Srv2DecompressData+0x134:
fffff804`60d77f94 mov     eax,0C000090Bh
fffff804`60d77f99 mov     rbx,qword ptr [rsp+68h]
fffff804`60d77f9e mov     rbp,qword ptr [rsp+70h]
fffff804`60d77fa3 mov     rsi,qword ptr [rsp+78h]
fffff804`60d77fa8 add     rsp,40h
fffff804`60d77fac pop     r15
fffff804`60d77fae pop     r14
fffff804`60d77fb0 pop     rdi
fffff804`60d77fb1 ret

This part of the disassembly checks that the size value at [rax+24h] is at least 0x10 bytes, the size of the SMB2 COMPRESSION_TRANSFORM_HEADER as per the specification (jb is an unsigned “below” comparison). If it is smaller, the function restores the previously saved register values and exits with error code 0xC000090B.

The next part of the disassembly prepares the normal branch: it copies the 16-byte header onto the stack at rsp+30h and compares its CompressionAlgorithm field (COMPRESSION_TRANSFORM_HEADER.CompressionAlgorithm) with the value loaded into ebp from [rcx+8Ch]:

srv2!Srv2DecompressData+0x30:
fffff804`60d77e90 mov     rax,qword ptr [rax+18h]
fffff804`60d77e94 movups  xmm0,xmmword ptr [rax]
fffff804`60d77e97 mov     rax,qword ptr [rcx+50h]
fffff804`60d77e9b movups  xmmword ptr [rsp+30h],xmm0
fffff804`60d77ea0 mov     rcx,qword ptr [rax+1F0h]
fffff804`60d77ea7 psrldq  xmm0,8
fffff804`60d77eac mov     ebp,dword ptr [rcx+8Ch]
fffff804`60d77eb2 movq    rcx,xmm0
fffff804`60d77eb7 movzx   eax,cx
fffff804`60d77eba cmp     ebp,eax
fffff804`60d77ebc je      srv2!Srv2DecompressData+0x68 (fffff804`60d77ec8)

If the COMPRESSION_TRANSFORM_HEADER.CompressionAlgorithm doesn’t match the actual compression algorithm, the function returns with error code 0xC00000BB.
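The field extraction above can be mirrored in user-mode C. The sketch below is only an approximation: xmm_value and the helper names are hypothetical, and it models the two 64-bit halves of xmm0 as plain integers. It shows how psrldq/movq, movzx and shr carve the CompressionAlgorithm, OriginalCompressedSegmentSize and Offset/Length fields out of the 16 header bytes, and where the comparison with ebp fits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of xmm0 after "movups xmm0, xmmword ptr [rax]":
 * the 16 header bytes seen as two little-endian 64-bit halves. */
typedef struct {
    uint64_t lo;   /* bytes 0..7, also saved at [rsp+30h]              */
    uint64_t hi;   /* bytes 8..15, rcx after psrldq xmm0,8 / movq rcx  */
} xmm_value;

/* movzx eax,cx: low 16 bits of the upper half (CompressionAlgorithm) */
uint32_t header_algorithm(xmm_value x)
{
    return (uint16_t)x.hi;
}

/* shr rax,20h applied to the lower half (OriginalCompressedSegmentSize) */
uint32_t header_original_size(xmm_value x)
{
    return (uint32_t)(x.lo >> 32);
}

/* shr rcx,20h applied to the upper half (Offset/Length) */
uint32_t header_offset(xmm_value x)
{
    return (uint32_t)(x.hi >> 32);
}

/* cmp ebp,eax / je: continue only if the header's algorithm equals ebp;
 * otherwise Srv2DecompressData returns 0xC00000BB. */
int algorithm_matches(xmm_value x, uint32_t ebp)
{
    return header_algorithm(x) == ebp;
}
```

Feeding it the values from the register trace in this post (lo = 0x00000400424d53fc, hi = 0xffffffffffff0001, ebp = 1) yields an algorithm of 1, a size of 0x400 and an offset of 0xffffffff, so the je is taken.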

If the two values match, execution continues at the following branch, which calculates the sum OriginalCompressedSegmentSize + Offset/Length and calls SrvNetAllocateBuffer with this sum as its first argument (in ecx):

srv2!Srv2DecompressData+0x68:
fffff804`60d77ec8 mov     rax,qword ptr [rsp+30h]
fffff804`60d77ecd xor     edx,edx
fffff804`60d77ecf shr     rax,20h
fffff804`60d77ed3 shr     rcx,20h
fffff804`60d77ed7 add     ecx,eax
fffff804`60d77ed9 mov     r10,qword ptr [srv2!_imp_SrvNetAllocateBuffer (fffff804`60da1928)]
fffff804`60d77ee0 call    srvnet!SrvNetAllocateBuffer (fffff804`60bf6730)
fffff804`60d77ee5 mov     rbx,rax
fffff804`60d77ee8 test    rax,rax
fffff804`60d77eeb jne     srv2!Srv2DecompressData+0x97 (fffff804`60d77ef7)  Branch

At this point, rax holds the first 8 bytes of our header (ProtocolId and OriginalCompressedSegmentSize) and rcx holds the last 8 bytes (CompressionAlgorithm, Flags and Offset/Length). Both are shifted right by 0x20 (32) bits with shr, leaving OriginalCompressedSegmentSize in eax and Offset/Length in ecx. shr is a logical shift, so these values are treated as unsigned integers, and the addition performed by add ecx,eax is an unsigned 32-bit one. If the sum of ecx and eax is larger than 0xffffffff, the value wraps around zero to a much smaller one. The values stored in these registers are completely under the attacker’s control, as indicated by the open specification.

To showcase the above, trace until the xor edx,edx and inspect the registers:

1: kd> r
rax=00000400424d53fc rbx=ffffae8f61978010 rcx=ffffffffffff0001
rdx=ffffae8f61978020 rsi=ffffffffffffffff rdi=ffffae8f61978010
rip=fffff80460d77ecd rsp=ffffe00ca72dbe70 rbp=0000000000000001
 r8=0000000000000000  r9=fffff8045be00000 r10=ffffae8f616617d0
r11=ffffe00ca72dbcd8 r12=0000000000000000 r13=ffffae8f61661880
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040246
srv2!Srv2DecompressData+0x6d:
fffff804`60d77ecd xor     edx,edx

rax holds 00000400424d53fc and rcx holds ffffffffffff0001.

Tracing 3 instructions and checking the registers again shows:

1: kd> t 3
srv2!Srv2DecompressData+0x6f:
fffff804`60d77ecf shr     rax,20h
srv2!Srv2DecompressData+0x73:
fffff804`60d77ed3 shr     rcx,20h
srv2!Srv2DecompressData+0x77:
fffff804`60d77ed7 add     ecx,eax
1: kd> r
rax=0000000000000400 rbx=ffffae8f61978010 rcx=00000000ffffffff
rdx=0000000000000000 rsi=ffffffffffffffff rdi=ffffae8f61978010
rip=fffff80460d77ed7 rsp=ffffe00ca72dbe70 rbp=0000000000000001
 r8=0000000000000000  r9=fffff8045be00000 r10=ffffae8f616617d0
r11=ffffe00ca72dbcd8 r12=0000000000000000 r13=ffffae8f61661880
r14=0000000000000000 r15=0000000000000000
iopl=0         ov up ei pl nz na po cy
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040a07
srv2!Srv2DecompressData+0x77:
fffff804`60d77ed7 add     ecx,eax

After the shifts, rax holds 0x400 and rcx holds 0xffffffff.

The addition performed by the next instruction overflows: 0xffffffff + 0x400 = 0x1000003ff, which is truncated to 32 bits, leaving 0x3ff (one less than 0x400) in rcx:

1: kd> t
srv2!Srv2DecompressData+0x79:
fffff804`60d77ed9 mov     r10,qword ptr [srv2!_imp_SrvNetAllocateBuffer (fffff804`60da1928)]
1: kd> r
rax=0000000000000400 rbx=ffffae8f61978010 rcx=00000000000003ff
rdx=0000000000000000 rsi=ffffffffffffffff rdi=ffffae8f61978010
rip=fffff80460d77ed9 rsp=ffffe00ca72dbe70 rbp=0000000000000001
 r8=0000000000000000  r9=fffff8045be00000 r10=ffffae8f616617d0
r11=ffffe00ca72dbcd8 r12=0000000000000000 r13=ffffae8f61661880
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na po cy
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040207

As expected rcx has become 0x3ff. This is the value that will be passed to SrvNetAllocateBuffer.
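The arithmetic behind this trace can be reproduced in a few lines of C. In the sketch below, vulnerable_alloc_size mirrors the shr/shr/add sequence, while checked_alloc_size is a hypothetical hardened variant that rejects sums that would wrap; it is our own illustration, not Microsoft’s actual fix:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors "shr rax,20h / shr rcx,20h / add ecx,eax": both 32-bit fields are
 * added in a 32-bit register, so a carry out of bit 31 is silently lost. */
uint32_t vulnerable_alloc_size(uint64_t lo, uint64_t hi)
{
    uint32_t original_size = (uint32_t)(lo >> 32);   /* eax */
    uint32_t offset        = (uint32_t)(hi >> 32);   /* ecx */
    return original_size + offset;                   /* add ecx,eax */
}

/* Hypothetical hardened variant (our illustration, not the actual patch):
 * refuse the request if the 32-bit sum would wrap.
 * Returns 0 and stores the size on success, -1 on overflow. */
int checked_alloc_size(uint64_t lo, uint64_t hi, uint32_t *out)
{
    uint32_t original_size = (uint32_t)(lo >> 32);
    uint32_t offset        = (uint32_t)(hi >> 32);

    assert(out != NULL);
    if (offset > UINT32_MAX - original_size)
        return -1;
    *out = original_size + offset;
    return 0;
}
```

With the traced values, vulnerable_alloc_size returns 0x3ff, while checked_alloc_size reports the overflow instead of producing a tiny allocation size.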

If we set a breakpoint right after the call returns, we can see that no crash occurs at this point: the system has simply allocated much less memory than required. As we can imagine, the crash will most likely occur when the system attempts to write data to the buffer that has been allocated.

bp srv2!Srv2DecompressData

u . L25

Break right after the call srvnet!SrvNetAllocateBuffer instruction, at mov rbx,rax (where the return value of SrvNetAllocateBuffer is copied into rbx):

bp srv2!Srv2DecompressData+0x85

As expected, the function returns.

The next jump will be taken because the function has returned normally and rax is not 0.

The next disassembly chunk prepares the registers for the SmbCompressionDecompress function that will handle the decompression of the buffer received from the SMB packet:

srv2!Srv2DecompressData+0x97:
fffff804`3fda7ef7 488b97f0000000  mov     rdx,qword ptr [rdi+0F0h]
fffff804`3fda7efe 8bcd            mov     ecx,ebp                   // 1st arg: 0x0001, the CompressionAlgorithm
fffff804`3fda7f00 4c8b4818        mov     r9,qword ptr [rax+18h]    // rax is the buffer returned by SrvNetAllocateBuffer
fffff804`3fda7f04 8b74243c        mov     esi,dword ptr [rsp+3Ch]   // esi gets 0xffffffff, the Offset/Length we sent in the header
fffff804`3fda7f08 448b742434      mov     r14d,dword ptr [rsp+34h]  // r14d gets 0x400, the OriginalCompressedSegmentSize
fffff804`3fda7f0d 4c03ce          add     r9,rsi                    // 4th arg: Offset/Length (rsi) is added to r9
fffff804`3fda7f10 448b4224        mov     r8d,dword ptr [rdx+24h]   // size field that was compared against 0x10 earlier
fffff804`3fda7f14 488b4218        mov     rax,qword ptr [rdx+18h]   // rax = pointer to our SMB header
fffff804`3fda7f18 442bc6          sub     r8d,esi
fffff804`3fda7f1b 488d5610        lea     rdx,[rsi+10h]
fffff804`3fda7f1f 4183e810        sub     r8d,10h                   // 3rd arg: size - Offset/Length - 0x10
fffff804`3fda7f23 4803d0          add     rdx,rax                   // 2nd arg: header + Offset/Length + 0x10
fffff804`3fda7f26 488d442460      lea     rax,[rsp+60h]
fffff804`3fda7f2b 4889442428      mov     qword ptr [rsp+28h],rax   // 6th arg
fffff804`3fda7f30 4489742420      mov     dword ptr [rsp+20h],r14d  // 5th arg
fffff804`3fda7f35 4c8b1574990200  mov     r10,qword ptr [srv2!_imp_SmbCompressionDecompress (fffff804`3fdd18b0)]
fffff804`3fda7f3c e86f65e7ff      call    srvnet!SmbCompressionDecompress (fffff804`3fc1e4b0)

The qword at rdi + 0xF0 is a pointer to a structure whose field at offset 0x18 holds a pointer to the SMB header that we sent.

To make this clearer: we add 0xF0 to rdi and dereference the result, which gives us a new address (a double pointer). We then add 0x18 to this new address and dereference it again, which gives us a pointer to the SMB header sent by the exploit:

0: kd> dq rdi L2
ffffbb8b`17e3e010  00000005`00000001 000006c0`000000dc
0: kd> dq rdi+0xF0 L2
ffffbb8b`17e3e100  ffffbb8b`1bf7d150 ffffbb8b`1bf7d150
0: kd> dq ffffbb8b`1bf7d150 + 0x18 L2
ffffbb8b`1bf7d168  ffffbb8b`1bf7c050 00000410`00001100
0: kd> dq ffffbb8b`1bf7c050 L4
ffffbb8b`1bf7c050  00000400`424d53fc ffffffff`ffff0001
ffffbb8b`1bf7c060  41414141`41414141 41414141`41414141

Or, we can calculate this directly with:

0: kd> dq poi(poi(rdi+0xF0)+0x18) L4
ffffbb8b`1bf7c050  00000400`424d53fc ffffffff`ffff0001
ffffbb8b`1bf7c060  41414141`41414141 41414141`41414141

Therefore, rdx first receives the pointer stored at rdi + 0xF0. Seven instructions later, the value at rdx + 0x18, the pointer to our SMB header, is loaded into rax. rdx is then set to rsi + 0x10 (an address calculation, not a dereference) and rax is added to it, so rdx (the second parameter) points Offset/Length + 0x10 bytes past the start of our header, i.e. at the compressed data that follows the 0x10-byte COMPRESSION_TRANSFORM_HEADER. Register rcx (the first parameter) gets the value 0x1, which is the CompressionAlgorithm. Register r9 (the fourth parameter) is loaded from offset 0x18 of the buffer returned by SrvNetAllocateBuffer and then has Offset/Length (rsi) added to it; this is presumably where the decompressed data will be written. r8d (the third parameter) is loaded from rdx + 0x24, the same size field that was compared against 0x10 at the start of the function, and then Offset/Length and 0x10 (the size of the COMPRESSION_TRANSFORM_HEADER) are subtracted from it.

Finally, two more parameters are passed via the stack, as per the fastcall calling convention: r14d (0x400, the OriginalCompressedSegmentSize, used as the uncompressed size) is stored at rsp + 0x20, and rax, loaded with the address rsp + 0x60 (the local that was zeroed in the prologue by and dword ptr [rax+8],0), is stored at rsp + 0x28.
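The argument setup above boils down to a handful of additions and subtractions, some in 64-bit and some in 32-bit registers. The following C sketch models them; the struct, its field names and build_args are hypothetical labels of ours, and alloc_data stands for whatever qword the buffer returned by SrvNetAllocateBuffer holds at offset 0x18, whose exact meaning we have not verified:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the argument setup at Srv2DecompressData+0x97.
 * Struct and field names are our own labels, not Microsoft's. */
typedef struct {
    uint32_t algorithm;      /* ecx       <- ebp                            */
    uint64_t src;            /* rdx       <- header + offset + 0x10         */
    uint32_t src_len;        /* r8d       <- size - offset - 0x10 (32-bit)  */
    uint64_t dst;            /* r9        <- [alloc+18h] + offset (64-bit)  */
    uint32_t original_size;  /* [rsp+20h] <- r14d                           */
} decompress_args;

decompress_args build_args(uint32_t algorithm, uint64_t header,
                           uint32_t size_field, uint64_t alloc_data,
                           uint32_t original_size, uint32_t offset)
{
    decompress_args a;
    uint64_t rsi = offset;   /* mov esi,dword ptr [rsp+3Ch] zero-extends rsi */

    a.algorithm     = algorithm;
    a.src           = header + rsi + 0x10;          /* lea rdx,[rsi+10h]; add rdx,rax */
    a.src_len       = size_field - offset - 0x10u;  /* sub r8d,esi; sub r8d,10h */
    a.dst           = alloc_data + rsi;             /* add r9,rsi */
    a.original_size = original_size;                /* r14d */
    return a;
}
```

Plugging in the values from the dq output above (header at ffffbb8b`1bf7c050, 0x410 in the size field at offset 0x24, Offset/Length 0xffffffff) gives, in this model, a compressed-data length of 0x401 and a source pointer 0x10000000f bytes past the header, which shows that the attacker-controlled Offset/Length also feeds the pointer and length arguments.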

In the same manner, we set a breakpoint at the call srvnet!SmbCompressionDecompress instruction:

bp fffff804`3fda7f3c

We unassemble srvnet!SmbCompressionDecompress and set a breakpoint right after this call instruction (in order to know whether the crash occurs inside the call):

uf srvnet!SmbCompressionDecompress

bp fffff802`70787f41 (the test eax,eax instruction)

We run gu to continue until srvnet!SmbCompressionDecompress returns.

However, our last breakpoint is never reached, as the crash occurs somewhere inside the srvnet!SmbCompressionDecompress function.

As the open specification says, this is the structure of the SMB2 COMPRESSION_TRANSFORM_HEADER:

typedef struct SMB2COMPRESSION_TRANSFORM_HEADER
{
	UCHAR ProtocolId[4];				// MUST be 0x424D53FC (\xfc\x53\x4d\x42 == \xfc'SMB')
	UINT OriginalCompressedSegmentSize; // 4 bytes
	WORD CompressionAlgorithm; 			// 2 bytes
	WORD Flags; 						// 2 bytes
	DWORD OffsetOrLength; 				// 4 bytes ("Offset/Length" in the specification)
} SMB2COMPRESSION_TRANSFORM_HEADER;
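As a byte-level illustration of this layout, here is a small C parser. It is a sketch under our own assumptions: the type and function names are hypothetical, the ProtocolId check is something we added (the part of Srv2DecompressData shown here does not perform it), and, like the vulnerable code, it does not validate OriginalCompressedSegmentSize or Offset/Length:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMB2_COMPRESSION_PROTOCOL_ID 0x424D53FCu  /* bytes fc 'S' 'M' 'B' */
#define SMB2_COMPRESSION_HEADER_SIZE 16u

typedef struct {
    uint32_t protocol_id;
    uint32_t original_compressed_segment_size;
    uint16_t compression_algorithm;
    uint16_t flags;
    uint32_t offset_or_length;
} compression_transform_header;

/* SMB2 fields are little-endian on the wire. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Returns 0 on success, -1 if the buffer is shorter than the header or the
 * ProtocolId is wrong. The sizes are deliberately NOT validated. */
int parse_compression_header(const uint8_t *buf, size_t len,
                             compression_transform_header *h)
{
    assert(buf != NULL && h != NULL);
    if (len < SMB2_COMPRESSION_HEADER_SIZE)   /* cf. cmp [rax+24h],10h / jb */
        return -1;
    h->protocol_id = read_le32(buf);
    if (h->protocol_id != SMB2_COMPRESSION_PROTOCOL_ID)
        return -1;
    h->original_compressed_segment_size = read_le32(buf + 4);
    h->compression_algorithm            = read_le16(buf + 8);
    h->flags                            = read_le16(buf + 10);
    h->offset_or_length                 = read_le32(buf + 12);
    return 0;
}
```

Parsing the 16 bytes behind the dq dump shown earlier (fc 53 4d 42 00 04 00 00 01 00 ff ff ff ff ff ff) yields OriginalCompressedSegmentSize 0x400, CompressionAlgorithm 1 (LZNT1), Flags 0xffff and Offset/Length 0xffffffff.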

Going through the srvnet!SmbCompressionDecompress:

We now know that the crash occurs somewhere in srvnet!SmbCompressionDecompress, or in a function called by it, therefore we can simply break on it:

bp srvnet!SmbCompressionDecompress

uf srvnet!SmbCompressionDecompress

Going through srvnet!SmbCompressionDecompress, instead of tracing, we set a breakpoint at test ecx,ecx (ecx is 1, the CompressionAlgorithm of our SMB header), right before the jne srvnet!SmbCompressionDecompress+0x39 instruction. The jump should be taken.

bp fffff805`7096e4db

Our breakpoint is hit and we keep tracing (t) until we reach about srvnet!SmbCompressionDecompress+0x59, where a call to nt!RtlGetCompressionWorkSpaceSize is made.

nt!RtlGetCompressionWorkSpaceSize is a very small function, exported by the kernel (ntoskrnl.exe) and is documented in MSDN.

Its parameters are:

1. USHORT CompressionFormatAndEngine: ecx, which got its value (2, i.e. COMPRESSION_FORMAT_LZNT1) from bx.

2. PULONG CompressBufferWorkSpaceSize: rdx, a pointer to a stack variable (lea rdx,[rsp+70h]) that will receive the number of bytes required for the workspace used to compress a buffer.

3. PULONG CompressFragmentWorkSpaceSize: r8, a pointer to another stack variable (lea r8,[rsp+40h]) that will receive the number of bytes required to decompress a compressed buffer to a fragment.

Stepping through that function, we see that guard_dispatch_icall is reached and called. This function is part of the kernel-mode CFG / CFI (Control Flow Guard / Control Flow Integrity) implementation on Windows. As this is a big subject on its own and out of the scope of this blog post, we won’t go into detail, besides noting that it is an exploit mitigation that protects indirect calls (such as call rax): it validates that the target address of such a call is indeed the start of a legitimate function. This way, redirecting an indirect call into the middle or the end of a function (for example, to a gadget that kicks off a ROP chain) is significantly hindered.
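The idea behind this check can be illustrated with a simplified, hypothetical C model: an indirect call is only allowed if its target is a registered function entry point. Windows’ actual implementation is different (guard_dispatch_icall consults a bitmap of valid call targets), and cfg_call_target_allowed is our own name; the sketch only conveys the concept:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int compare_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Conceptual CFG-style check: returns 1 if target is a registered function
 * entry point, 0 otherwise. valid_entries must be sorted in ascending order.
 * (Real CFG uses a bitmap, not a sorted list.) */
int cfg_call_target_allowed(const uint64_t *valid_entries, size_t count,
                            uint64_t target)
{
    if (count == 0)
        return 0;
    assert(valid_entries != NULL);
    return bsearch(&target, valid_entries, count, sizeof *valid_entries,
                   compare_u64) != NULL;
}
```

For example, with nt!RtlCompressWorkSpaceSizeLZNT1’s entry point (fffff802`7323de00) registered, a call to the entry point is allowed, while a call to fffff802`7323de09, in the middle of the function, is rejected.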

Breaking at nt!RtlGetCompressionWorkSpaceSize+0x35, which is the call nt!guard_dispatch_icall instruction, we check the value of the rax register, since execution will eventually jump to it. Before following the call, we set a breakpoint at nt!RtlGetCompressionWorkSpaceSize+0x3a, where execution will return. Now we unassemble the function pointed to by rax and we see:

1: kd> uf rax
nt!RtlCompressWorkSpaceSizeLZNT1:
fffff802`7323de00 6685c9          test    cx,cx
fffff802`7323de03 0f85effa0000    jne     nt!RtlCompressWorkSpaceSizeLZNT1+0xfaf8 (fffff802`7324d8f8)  Branch

nt!RtlCompressWorkSpaceSizeLZNT1+0x9:
fffff802`7323de09 c70220000100    mov     dword ptr [rdx],10020h

nt!RtlCompressWorkSpaceSizeLZNT1+0xf:
fffff802`7323de0f 41c70000100000  mov     dword ptr [r8],1000h
fffff802`7323de16 33c0            xor     eax,eax
fffff802`7323de18 c3              ret

nt!RtlCompressWorkSpaceSizeLZNT1+0xfaf8:
fffff802`7324d8f8 b800010000      mov     eax,100h
fffff802`7324d8fd 663bc8          cmp     cx,ax
fffff802`7324d900 750b            jne     nt!RtlCompressWorkSpaceSizeLZNT1+0xfb0d (fffff802`7324d90d)  Branch

nt!RtlCompressWorkSpaceSizeLZNT1+0xfb02:
fffff802`7324d902 c70220000000    mov     dword ptr [rdx],20h
fffff802`7324d908 e90205ffff      jmp     nt!RtlCompressWorkSpaceSizeLZNT1+0xf (fffff802`7323de0f)  Branch

nt!RtlCompressWorkSpaceSizeLZNT1+0xfb0d:
fffff802`7324d90d b8bb0000c0      mov     eax,0C00000BBh
fffff802`7324d912 c3              ret

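As expected, this function handles the LZNT1 compression format. This matches the compression algorithm value 0x1 (LZNT1) that we set in our SMB header (see MS-SMB2 section 2.2.3.1.3, SMB2_COMPRESSION_CAPABILITIES). At the Rtl level, LZNT1 is identified by the value 0x2 (COMPRESSION_FORMAT_LZNT1), which is what we saw in ecx.

This is an undocumented function that resides inside ntoskrnl.exe, as can be seen in the following screenshot.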

After continuing, we land in the epilogue of nt!RtlGetCompressionWorkSpaceSize (our breakpoint at +0x3a). It then returns to srvnet!SmbCompressionDecompress at offset 0x72, where test eax,eax checks the result of the call.

We check the registers again:

r

rax is 0 (STATUS_SUCCESS), so the js (jump if the sign flag is set, i.e. on a negative NTSTATUS error code) is not taken.

Next, we see a mov edx,dword ptr [rsp+70h] instruction.

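We check what rsp points to with dps rsp, or limit the output to 5 lines with dps rsp L5.

Next, we check the value at rsp+0x70 with dps rsp+0x70 L1. This is the ULONG that received CompressBufferWorkSpaceSize (0x10020, written by RtlCompressWorkSpaceSizeLZNT1 through rdx), and it is the value that ends up in edx.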

Next, mov ecx, 200h is executed.

We see that a call to nt!ExAllocatePoolWithTag is made. This follows the x64 calling convention: the first four integer arguments are passed, left to right, in rcx, rdx, r8 and r9, and arguments 5 and up are passed on the stack. MSDN mentions that this function takes 3 arguments:

1. PoolType, which is the type of pool memory to allocate: this is the 0x200 stored in ecx, which corresponds to NonPagedPoolNx.

2. NumberOfBytes, which is the number of bytes to allocate: this is the value stored in edx, the lower 4 bytes of rdx. The dword ptr in mov edx,dword ptr [rsp+70h] means that 4 bytes are loaded, and writing a 32-bit register zero-extends the value into the full 64-bit register.

3. Tag, which is the pool tag to use for the allocated memory: this is the 0x2532534C value stored in r8.

The function returns a pointer to the allocated pool block (or NULL if the allocation fails), which will be in rax.

We expect this function to return normally, as it just allocates 0x10020 bytes of kernel pool memory. We set a breakpoint at the instruction right after the call and then type gu.

After the call, the result is stored in rdi, so rdi points to that kernel pool memory. Next, the code checks whether rax is zero. If rax is not zero (meaning that the allocation succeeded), the code goes to srvnet!SmbCompressionDecompress+0xa0. Otherwise, an error code is returned (srvnet!SmbCompressionDecompress+0xfe); it is first stored in rbx and then copied to rax. After some cleanup, the function returns. The error code in that case is 0xC000009A (STATUS_INSUFFICIENT_RESOURCES).

The code at srvnet!SmbCompressionDecompress+0xa0 prepares the call to RtlDecompressBufferEx2, a function documented on MSDN and exported by ntoskrnl.exe, as can be seen in the following screenshot.

srvnet!SmbCompressionDecompress+0xa0:
fffff802`7796e550 488bb42498000000 mov     rsi,qword ptr [rsp+98h]
fffff802`7796e558 4d8bcf          mov     r9,r15
fffff802`7796e55b 48897c2438      mov     qword ptr [rsp+38h],rdi
fffff802`7796e560 498bd6          mov     rdx,r14
fffff802`7796e563 4889742430      mov     qword ptr [rsp+30h],rsi
fffff802`7796e568 0fb7cb          movzx   ecx,bx
fffff802`7796e56b c744242800100000 mov     dword ptr [rsp+28h],1000h
fffff802`7796e573 896c2420        mov     dword ptr [rsp+20h],ebp
fffff802`7796e577 8bac2490000000  mov     ebp,dword ptr [rsp+90h]
fffff802`7796e57e 448bc5          mov     r8d,ebp
fffff802`7796e581 4c8b1518400100  mov     r10,qword ptr [srvnet!_imp_RtlDecompressBufferEx2 (fffff802`779825a0)]
fffff802`7796e588 e8237d46fb      call    nt!RtlDecompressBufferEx2 (fffff802`72dd62b0)
fffff802`7796e58d 8bd8            mov     ebx,eax

As described in MSDN, this function is a multi-core decompression function. Our test VM has 2 cores allocated. We have not tested with a single core to check whether RtlDecompressBufferEx would be called instead. The function takes 8 arguments. Under the x64 calling convention, the first 4 are passed in rcx, rdx, r8 and r9, and the remaining 4 are stored on the stack starting at [rsp+20h]:

1. CompressionFormat: a bitmask that specifies the compression format => rcx becomes 0x2 (from the movzx ecx,bx instruction), which is COMPRESSION_FORMAT_LZNT1.

2. UncompressedBuffer: a pointer to the uncompressed buffer, where the function will store the uncompressed data => rdx gets 0xffffc690eaf3104e from r14.

3. UncompressedBufferSize: the size of the uncompressed buffer in bytes => r8 gets its value from ebp (r8d is the lower dword of the 64-bit r8 register). ebp takes the value 0x400 from the mov ebp,dword ptr [rsp+90h] instruction, and this value is passed in r8:

dps rsp+0x90
ffff8087`b1dafe90  00000000`00000400

4. CompressedBuffer: a pointer to the buffer that contains the data to decompress => r9 gets 0xffffc690eb11d05e from r15.

5. CompressedBufferSize: the size of the compressed buffer in bytes. It is passed on the stack at [rsp+20h] by mov dword ptr [rsp+20h],ebp, which runs before ebp is reloaded.

6. UncompressedChunkSize: the size of each chunk (should be 512, 1024, 2048 or 4096 bytes). It is passed on the stack at [rsp+28h]. The value is 0x1000 (4096 bytes), from the mov dword ptr [rsp+28h],1000h instruction.

7. FinalUncompressedSize: a pointer to a variable that receives the size, in bytes, of the decompressed data. It is passed on the stack at [rsp+30h], from rsi.

8. WorkSpace: a pointer to the work space buffer. It is passed on the stack at [rsp+38h], from rdi (the pool allocated by ExAllocatePoolWithTag).

Tracing right before the function call and checking the registers and stack:

1: kd> r r8, r9, rcx, rdx
r8=0000000000000400 r9=ffffc690eb11d05e rcx=0000000000000002 rdx=ffffc690eaf3104e
1: kd> dps rsp
ffff8087`b1dafe00  00000000`00000002
ffff8087`b1dafe08  00000000`00000402
ffff8087`b1dafe10  00000000`ffffffff
ffff8087`b1dafe18  00000000`00000000
ffff8087`b1dafe20  00000000`00000402 <== CompressedBufferSize
ffff8087`b1dafe28  fffff802`00001000 <== UncompressedChunkSize
ffff8087`b1dafe30  ffff8087`b1dafed0 <== FinalUncompressedSize
ffff8087`b1dafe38  ffffc68f`eab97000 <== WorkSpace (AllocatePoolWithTag)
ffff8087`b1dafe40  ffff8087`00001000
ffff8087`b1dafe48  00000000`00000018
ffff8087`b1dafe50  00000000`00000000
ffff8087`b1dafe58  00000000`00000400
ffff8087`b1dafe60  ffffc68f`eb31d820
ffff8087`b1dafe68  fffff802`77417f41 srv2!Srv2DecompressData+0xe1

Tracing inside RtlDecompressBufferEx2, we see that the function first saves rbx, a non-volatile register, and reserves stack space. It then ANDs the CompressionFormat with 0xFF. The result is compared with 0x2, and if it is below, the function returns 0xC000000D (STATUS_INVALID_PARAMETER). The same value is then compared with 0x4, and if it is above, the function returns 0xC000025F (STATUS_UNSUPPORTED_COMPRESSION).
Otherwise, r9 holds the CompressedBufferSize, eax holds the CompressionFormat and edx holds the uncompressed size (0x400). rcx temporarily holds the address of nt!RtlDecompressBufferProcs, an array of function pointers. rax is loaded from rcx + 2*8, the entry for format 2, which points to nt!RtlDecompressBufferLZNT1. rcx then takes the WorkSpace (ffffc68f`eab97000), which is written to stack offset 0x30 (rsp+0x30). In the same way, the pointer to FinalUncompressedSize (ffff8087`b1dafed0) is stored at stack offset 0x28, and the UncompressedChunkSize (0x1000) at stack offset 0x20. Finally, rbx is copied into rcx, and RtlDecompressBufferLZNT1 is called through the CFG mechanism described above.

The stack will look like this:

1: kd> dps rsp
ffff8087`b1dafdb0  00000000`00000200
ffff8087`b1dafdb8  00000000`00000400
ffff8087`b1dafdc0  00000000`00000000
ffff8087`b1dafdc8  fffff802`72dd62b2 nt!RtlDecompressBufferEx2+0x2
ffff8087`b1dafdd0  00000000`00001000 <== UncompressedChunkSize (rsp+0x20)
ffff8087`b1dafdd8  ffff8087`b1dafed0 <== FinalUncompressedSize
ffff8087`b1dafde0  ffffc68f`eab97000 <== WorkSpace
ffff8087`b1dafde8  00000000`00000018
ffff8087`b1dafdf0  00000000`00000002
ffff8087`b1dafdf8  fffff802`7796e58d srvnet!SmbCompressionDecompress+0xdd
ffff8087`b1dafe00  00000000`00000002
ffff8087`b1dafe08  00000000`00000402

Following the CFG-checked call, we end up in nt!RtlDecompressBufferLZNT1 with this stack (memory addresses have changed due to a reboot):

0: kd> dps rsp
fffff10f`fae12da8  fffff804`7214b316 nt!RtlDecompressBufferEx2+0x66
fffff10f`fae12db0  00000000`00000200
fffff10f`fae12db8  00000000`00000400
fffff10f`fae12dc0  00000000`00000000
fffff10f`fae12dc8  fffff804`7214b2b2 nt!RtlDecompressBufferEx2+0x2
fffff10f`fae12dd0  00000000`00001000 <== UncompressedChunkSize (now rsp+0x28, shifted by the pushed return address)
fffff10f`fae12dd8  fffff10f`fae12ed0 <== FinalUncompressedSize
fffff10f`fae12de0  ffff8889`37a5a000 <== WorkSpace
fffff10f`fae12de8  00000000`00000018
fffff10f`fae12df0  00000000`00000002

This function uses an rbp-based stack frame and references its arguments at offsets relative to rbp. Following execution through it, we see that the system crashes inside RtlDecompressBufferLZNT1 when it dereferences rsi to load a DWORD into ebx.

By triggering this vulnerability, we tried to showcase the flow of our buffer through the various functions of srv2.sys, srvnet.sys and the Windows kernel. In theory, this vulnerability could be exploited to gain remote code execution. However, several significant Windows exploit mitigations would need to be bypassed, such as KASLR, CFG and SMAP.

Different PoCs can be used to showcase the vulnerability, and each manifests it in a slightly different manner, which leads to different function calls. In any case, we hope you enjoyed this as much as we did!

Useful breakpoints:

bp srv2!Srv2DecompressData+0x79
bp srv2!Srv2DecompressData+0x97
bp srvnet!SmbCompressionDecompress+0x6d
bp nt!RtlGetCompressionWorkSpaceSize+0x3a
bp srvnet!SmbCompressionDecompress+0xd1 
bp nt!RtlDecompressBufferEx2
bp nt!RtlDecompressBufferEx2+0x61
bp nt!RtlDecompressBufferLZNT1
bp nt!RtlDecompressBufferLZNT1+0x2b
bp nt!RtlDecompressBufferLZNT1+0x49