Introduction
In this post, we look at the CVE-2020-0796 vulnerability, which was initially publicly disclosed by Microsoft.
This post also attempts to illustrate basic windbg usage for dynamic vulnerability analysis. The IDA disassembler is also used in some parts.
Feel free to contact us for any errors, comments or additions.
CVE-2020-0796
As discussed in the security update, a “remote code execution vulnerability exists in the way that the Microsoft Server Message Block 3.1.1 (SMBv3) protocol handles certain requests. An attacker who successfully exploited the vulnerability could gain the ability to execute code on the target server or client”.
The vulnerability can be triggered by sending a malicious packet to an SMBv3 server or an SMBv3-enabled client. In addition, it can be triggered without prior authentication: simple network access to an unpatched, exposed SMBv3 server is enough. The same goes for the client: if the client connects to a malicious SMBv3 server, the vulnerability can be triggered on the client side.
The vulnerability affects the default installations of the following Windows versions:
- Windows 10 1903 for x86, x64 as well as ARM64-based systems
- Windows 10 1909 for x86, x64 as well as ARM64-based systems
- Windows Server 1903 (Server Core installation)
- Windows Server 1909 (Server Core installation)
Based on the above, the vulnerability severity is rated as Critical.
Unintended disclosure
The security update itself provided no further details; however, as a workaround, Microsoft recommends disabling SMBv3 compression. Compression is one of the new features introduced in the SMB 3.1.1 dialect, and Windows versions prior to 1903 aren’t vulnerable, since compression support is part of a feature update introduced in Windows 10 1903.
Windows kernel debugging setup
To investigate the vulnerability, we need to set up a lab that enables us to debug the Windows kernel. We are going to need two virtual machines if using a Linux or macOS host, or just one if we use Windows as the host.
In this guide we assume that two VMs are being used, as we are using a Linux host for the investigation. We also use VMware Workstation for virtualization.
VMs
VM1: Windows 10 that will run the windbg debugger. We call this machine “debugger”. In our setup, we used Windows 10 Enterprise 1809 as the debugger.
VM2: Windows 10 that will have its kernel debugged. We call this machine “debuggee”. In our setup, we used Windows 10 Enterprise 1909 OS Build 18363.592 as the debuggee.
debugger
Install the following:
- Visual Studio 2019 (the Community edition will do)
- The latest Windows 10 SDK, which can be installed together with VS 2019
- Windows Driver Kit (WDK, formerly DDK), which also provides the “Debugging Tools for Windows”
Set up Microsoft symbols: we believe that the most effective way to configure the Microsoft symbols on your machine is to use an environment variable. Create a new system environment variable named _NT_SYMBOL_PATH and set its value to srv*c:\symbols*http://msdl.microsoft.com/download/symbols
Edit the VM .vmx file in order to remove any references to serial0, which, by default, is used for printing. Inside the VM .vmx file, add the following lines:
serial0.present = "TRUE"
serial0.fileType = "pipe"
serial0.yieldOnMsrRead = "TRUE"
serial0.fileName = "/tmp/kerneldbg"
serial0.pipe.endPoint = "server"
serial0.tryNoRxLoss = "TRUE"
If using macOS, change serial0.fileName to "/private/tmp/com1".
If you use a Windows host and want to debug over the network, which is much faster, add a new network adapter to the debuggee VM and set it to host-only.
debuggee generic
Run cmd.exe as Administrator and execute the following commands in order to disable and stop the Windows Update service:
sc config wuauserv start= disabled
sc stop wuauserv
To make sure that the Windows Update service doesn’t start on its own, open gpedit.msc, go to Computer Configuration\Administrative Templates\Windows Components\Windows Update -> Configure Automatic Updates and set it to Disabled.
debuggee over serial port
Edit the VM .vmx file in order to remove any references to serial0, which, by default, is used for printing. Inside the .vmx file, add the following lines:
serial0.present = "TRUE"
serial0.fileType = "pipe"
serial0.yieldOnMsrRead = "TRUE"
serial0.fileName = "/tmp/kerneldbg"
serial0.pipe.endPoint = "client"
serial0.tryNoRxLoss = "TRUE"
If using macOS, change serial0.fileName to "/private/tmp/com1".
Open cmd.exe as Administrator and run:
bcdedit /debug on
bcdedit /dbgsettings serial debugport:1 baudrate:115200
debuggee over network
If you are using a Windows host and want to debug the machine over the network, open Device Manager => Network Adapters, select the second adapter you added and go to the General tab. Write down the Location; it is something like PCI Slot 256 (PCI bus 27, device 0, function 0).
You can also get these parameters using PowerShell as follows:
Get-NetAdapterHardwareInfo -InterfaceDescription *Intel* | select Name, InterfaceDescription, DeviceType, Busnumber, Devicenumber, Functionnumber | FL
Open cmd.exe as Administrator and run:
bcdedit /debug on
bcdedit /dbgsettings net hostip:192.168.50.1 port:50000 key:my.secure.key.here
bcdedit /set "{dbgsettings}" busparams 27.0.0
The busparams value has the format BusNumber.DeviceNumber.FunctionNumber.
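The conversion from the Location text to busparams is mechanical, so it can also be scripted. The following is a minimal Python sketch, assuming the English Location format shown above (localized Windows versions word it differently); busparams_from_location is a hypothetical helper, not part of any Windows tool:

```python
import re

# Hypothetical helper: turns the English Device Manager "Location" text
# into the Bus.Device.Function triple expected by bcdedit busparams.
_LOCATION_RE = re.compile(r"PCI bus (\d+), device (\d+), function (\d+)")


def busparams_from_location(location: str) -> str:
    match = _LOCATION_RE.search(location)
    if match is None:
        raise ValueError(f"unrecognised Location string: {location!r}")
    bus, device, function = (int(part) for part in match.groups())
    return f"{bus}.{device}.{function}"


print(busparams_from_location("PCI Slot 256 (PCI bus 27, device 0, function 0)"))  # 27.0.0
```

The printed value is what goes after busparams in the bcdedit /set command above.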
Note that hostip is the IP address of the VMware host on the host-only interface (the debugger machine). Here, you could potentially use the IP address of another virtual machine that you chose to be the debugger. We haven’t tested this setup, though.
Finally, reboot the debuggee: shutdown -r -t 0
verify the setup works
To verify that the setup works, start the debugger first, run windbg and then select File > Kernel Debug... (or Ctrl + K). Go to the COM tab. Make sure that Baud rate is set to 115200 and Port is set to com1. Pipe and Reconnect should NOT be checked.
If you are debugging over the network using a Windows host, select File > Kernel Debug... (or Ctrl + K) and, in the NET tab, set Port and Key to the values you entered in the configuration at the previous step.
Then, start the debuggee.
During the debuggee boot process, windbg running on the debugger should display output similar to this:
Once the debuggee has booted, you should see output similar to this:
windbg usage
Once the debuggee has booted, enter g to let it continue running, and press Ctrl + Break to break into the debugger. On laptops that don’t have a Break/Pause key, try Ctrl + Fn + B or Alt + Del to break.
In windbg you can also set up and save your workspace the way you like it, so that every time you debug something, the window arrangement and setup are the ones you saved. To do so, arrange your windows the way you want them and then select File -> Save Workspace, or File -> Save Workspace As to save it under a name.
Once you have set up your workspace and the debuggee is connected successfully, run the following to verify that the symbols are downloaded and work as expected:
1: kd> !sym noisy
1: kd> .reload -f
This process might take a few minutes, depending on your Internet connection speed. Once it is done, you can verify that the symbols have been downloaded and work as expected by running the lm command (List Loaded Modules), which lists all currently loaded modules, including drivers. The output should be similar to this:
Run vertarget to display the debuggee version.
All windbg commands are documented in the help file, which can be opened with the .hh meta-command.
Vulnerability details
The vulnerability actually occurs in the driver code that implements the SMB service, srv2.sys, located in C:\Windows\System32\drivers\. More specifically, the vulnerability is an integer overflow that is triggered when the server (or the client) processes a malicious SMBv3 packet.
Microsoft has published “Open Specifications Documentation” describing the internals of its implementation of the SMB protocol. As already discussed, the SMB2 protocol implements a “dialect” called SMB 3.1.1. Microsoft doesn’t treat SMB 3.1.1 as a protocol of its own; it is a dialect of SMB2.
We are interested in the compression that is supported by the SMB 3.1.1 dialect of SMB2.
There are currently three dialect families of the SMB 2 Protocol:
Dialect Family | Dialect Revisions | Revision Code |
---|---|---|
SMB 2.0.2 | SMB 2.0.2 dialect revision | 0x0202 |
SMB 2.1 | SMB 2.1 dialect revision | 0x0210 |
SMB 3.x | SMB 3.0 dialect revision | 0x0300 |
SMB 3.x | SMB 3.0.2 dialect revision | 0x0302 |
SMB 3.x | SMB 3.1.1 dialect revision | 0x0311 |
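On the wire, a client advertises the dialects it supports in its SMB2 NEGOTIATE request as an array of 16-bit little-endian revision codes. As a small illustration (a hypothetical helper, not code from impacket or the PoC), the codes from the table pack as follows:

```python
import struct

# Dialect revision codes, copied from the table above.
SMB2_DIALECTS = {
    "SMB 2.0.2": 0x0202,
    "SMB 2.1": 0x0210,
    "SMB 3.0": 0x0300,
    "SMB 3.0.2": 0x0302,
    "SMB 3.1.1": 0x0311,
}


def pack_dialects(names):
    """Pack the revision codes as consecutive little-endian 16-bit integers."""
    codes = [SMB2_DIALECTS[name] for name in names]
    return struct.pack(f"<{len(codes)}H", *codes)


print(pack_dialects(["SMB 3.0.2", "SMB 3.1.1"]).hex())  # 02031103
```

Note how 0x0311 appears as the bytes 11 03 in a packet capture.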
Specifically, the SMB 3.1.1 dialect introduces the following enhancements:
- Supporting the negotiation of encryption and integrity algorithms.
- Enhanced protection of negotiation and session establishment.
- Reconnecting with a specified dialect.
- Supporting the compression of messages between client and server.
Microsoft’s MS-SMB2 open specification provides a wealth of information regarding the implementation of the SMB 3.1.1 dialect. The sections that we will mostly use are 2.2.42 (SMB2 COMPRESSION_TRANSFORM_HEADER) and 2.2.3.1.3 (SMB2_COMPRESSION_CAPABILITIES).
PoC used
The PoC exploit that we will use to reproduce and analyze the vulnerability uses the latest impacket library version and is listed here.
windbg
We attach the kernel debugger and run .reload -f to force a reload of the symbols.
If this takes a long time, break and run !sym noisy to view the progress, then run .reload -f again and continue with g (or F5).
We then run the PoC.
The debugger breaks on the resulting bug check and we run !analyze -v.
When the bug check occurred, this was the status of the stack (partial):
...
ffff9e85`fbdffb90 fffff800`296ebb37 : nt!KiPageFault+0x360
ffff9e85`fbdffd20 fffff800`29130326 : nt!RtlDecompressBufferLZNT1+0x57
ffff9e85`fbdffdb0 fffff800`2dc7e58d : nt!RtlDecompressBufferEx2+0x66
ffff9e85`fbdffe00 fffff800`2dd27f41 : srvnet!SmbCompressionDecompress+0xdd
ffff9e85`fbdffe70 fffff800`2dd2699e : srv2!Srv2DecompressData+0xe1
ffff9e85`fbdffed0 fffff800`2dd69a9f : srv2!Srv2DecompressMessageAsync+0x1e
ffff9e85`fbdfff00 fffff800`291c4dde : srv2!RfspThreadPoolNodeWorkerProcessWorkItems+0x13f
ffff9e85`fbdfff80 fffff800`291c4d9c : nt!KxSwitchKernelStackCallout+0x2e
ffff9e85`fc78a970 fffff800`2906a16e : nt!KiSwitchKernelStackContinue
...
From the above we can see that srv2!Srv2DecompressMessageAsync calls srv2!Srv2DecompressData, which calls srvnet!SmbCompressionDecompress, which calls nt!RtlDecompressBufferEx2, which calls nt!RtlDecompressBufferLZNT1, where a page fault occurs.
IDA Pro
We load the driver in IDA Pro. IDA Pro also reads the _NT_SYMBOL_PATH environment variable. Our goal here is to identify any functions that implement compression, since that is the feature the Microsoft security update points at.
On the left, the “Functions window” lists every function that IDA has recognized in the database. For each function we can see, among other things, its address and whether it returns to the caller (indicated by R). These attributes are detected by IDA, but they can be wrong. If we know more about a function, we can edit it and change many of its details, such as whether it returns, whether it uses the RBP/EBP register to reference local variables and function arguments, and more.
In this window we press Ctrl + F, search for compress and see that there are actually 5 functions whose names relate to compression:
Smb2GetHonorCompressionAlgOrder .text 00000001C0001660 00000007 R . . . . . .
Srv2DecompressMessageAsync .text 00000001C0016980 000000B8 00000028 00000010 R . . . . . .
Srv2DecompressData .text 00000001C0017E60 00000152 00000058 00000020 R . . . . . .
Smb2ValidateCompressionCapabilities PAGE 00000001C005600C 000000E2 00000038 00000018 R . . . . . .
Smb2SelectCompressionAlgorithm PAGE 00000001C0056350 00000064 00000028 00000010 R . . . . . .
We can quickly see that Smb2GetHonorCompressionAlgOrder contains nothing that might assist our search.
Srv2DecompressMessageAsync is a small function that first calls Srv2DecompressData, among other things.
We will check Srv2DecompressData and, if need be, get back to Srv2DecompressMessageAsync.
After going through the function disassembly, we deduce the following decompilation of part of the function Srv2DecompressData:
int Srv2DecompressData(ptrSmbPacket) //rcx
{
// at the start this function saves RBX, RBP, RSI in the "shadow space"
// then saves RDI, R14 and R15 in the stack, and then allocates 64 bytes
// of stack space for local variables
// sets a "wall" of 0x00 bytes just above the saved return address and
// right below the shadow space
PULONG pToSmbPacket = ptrSmbPacket; // rdi
PULONG puPtr = *(ptrSmbPacket.ptrSmbPacketData); // mov rax, [rcx+0F0h]
// ptrSmbPacketData at offset 0xF0
PVOID ptrToBuffer;
if (*(puPtr + 0x24) < 0x10) // cmp dword ptr [rax+24h], 10h // this is the size of SMB2 COMPRESSION_TRANSFORM_HEADER
return 0xC000090B;
puPtr = *(puPtr + 0x18); // mov rax, [rax+18h]
REGXMM0 regXMM0 = *(puPtr); // movups xmm0, xmmword ptr [rax]. XMM0 now has our mal header
puPtr = *(ptrSmbPacket + 0x50); // mov rax, [rcx+50h]
REGXMM0 stackLoc1 = regXMM0; // movups [rsp+30h], xmm0. RSP + 0x30 now has our malicious header (16 bytes)
ptrSmbPacket = *(puPtr + 0x1F0); // mov rcx, [rax+1F0h]
regXMM0 = regXMM0 >> 8; // psrldq xmm0, 8
DWORD stackLoc2 = *(ptrSmbPacket + 0x8C); // mov ebp, [rcx+8Ch]. Is this the abs offset from packet
// to the header of the data sent?
ptrSmbPacket = regXMM0; // ptrSmbPacket now points to
// _COMPRESSION_TRANSFORM_HEADER
// this is our malicious header
puPtr = (WORD)ptrSmbPacket; // movzx eax, cx
// so far we need to know what is at each offset, such as:
// puPtr + 0x18
// ptrSmbPacket + 0x50
// *(ptrSmbPacket + 0x50) + 0x1F0
if(stackLoc2 == puPtr) // now our header is in the stack
{
puPtr = *(stackLoc1); // rax points to our header again
puPtr = puPtr >> 32; // shr rax, 20h
ptrSmbPacket = ptrSmbPacket >> 32; // shr rcx, 20h
ptrToBuffer = SrvNetAllocateBuffer(puPtr + ptrSmbPacket, 0, ...);
if (ptrToBuffer == NULL)
return 0xC000009A;
// to do loc_1C0017EF7
__imp_SmbCompressionDecompress();
}
return 0xC00000BB;
}
Things to note:
1. Windows x64 code uses the Microsoft x64 calling convention (often loosely called fastcall): the first four integer or pointer parameters are passed in the registers rcx, rdx, r8 and r9 (left to right), and parameters 5 and above are passed on the stack (right to left).
2. In this calling convention, registers rax, rcx, rdx and r8 to r11 are volatile, while rbx, rbp, rdi, rsi and r12 to r15 are non-volatile. A function must save non-volatile registers before modifying them and restore them before returning, while volatile registers can be used freely.
3. The caller allocates 32 bytes of stack space (the “shadow store”, or home space) in which the callee can save the four register parameters.
4. The code listing above is an approximation. It has errors. However, it attempts to show how the actual binary handles the data that we are interested in.
5. The integer overflow occurs in the addition that computes the first parameter passed to the SrvNetAllocateBuffer function call.
The function does some preparation and calls SrvNetAllocateBuffer, exported by the srvnet module, in order to allocate memory for the decompressed buffer. Then, depending on the result, it calls the SmbCompressionDecompress function to decompress the buffer. SmbCompressionDecompress is also exported by the srvnet module, located at C:\Windows\System32\drivers\srvnet.sys.
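To make the control flow easier to follow, here is a very rough Python model of the part of Srv2DecompressData we care about. It is a sketch under assumptions: the received length (the field at [rax+24h]) is modelled as len(packet), the value loaded from [rcx+8Ch] is modelled as an expected_algorithm parameter, and the header values (0x400, 0xFFFF, 0xFFFFFFFF) are the ones observed later in the debugger. The function and constant names are hypothetical:

```python
import struct

# Status codes taken from the disassembly; the constant names are our own labels.
STATUS_SUCCESS = 0x00000000
ERR_HEADER_TOO_SHORT = 0xC000090B    # fewer than 0x10 bytes received
ERR_ALGORITHM_MISMATCH = 0xC00000BB  # CompressionAlgorithm does not match
HEADER_SIZE = 0x10


def model_srv2_decompress_data(packet: bytes, expected_algorithm: int):
    """Very rough model of Srv2DecompressData up to the SrvNetAllocateBuffer call.

    Returns (status, allocation_size); allocation_size is None on the error paths.
    """
    if len(packet) < HEADER_SIZE:                        # cmp dword ptr [rax+24h],10h / jb
        return ERR_HEADER_TOO_SHORT, None
    _protocol_id, original_size, algorithm, _flags, offset = struct.unpack_from("<IIHHI", packet)
    if algorithm != expected_algorithm:                  # movzx eax,cx / cmp ebp,eax
        return ERR_ALGORITHM_MISMATCH, None
    # add ecx,eax is a 32-bit addition, so the carry out of bit 31 is lost.
    allocation_size = (original_size + offset) & 0xFFFFFFFF
    return STATUS_SUCCESS, allocation_size


malicious = struct.pack("<IIHHI", 0x424D53FC, 0x400, 0x0001, 0xFFFF, 0xFFFFFFFF)
status, size = model_srv2_decompress_data(malicious, expected_algorithm=0x0001)
print(hex(status), hex(size))  # 0x0 0x3ff
```

A well-formed header with a small Offset/Length passes the same checks and simply gets a larger buffer; only the wrap-around makes the allocation smaller than the data.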
integer overflows
Before proceeding with the dynamic analysis, we need to briefly mention integer overflows.
On both 32-bit and 64-bit Windows systems, the C int data type is 32 bits wide. This means that an int can range from -2^(32-1) to 2^(32-1) - 1, or from -2147483648 to 2147483647. The MSB (Most Significant Bit) is used as the sign bit. The unsigned int data type is also 32 bits wide but has no sign bit, so an unsigned int can range from 0 to 2^32 - 1, or from 0 to 4294967295.
Let’s assume that we have the largest unsigned integer value, 0xffffffff (4294967295). If we add 2 to this number, the result needs more than 32 bits to be represented, so it is truncated to 32 bits, i.e. it wraps around past 0. Therefore the result of 0xffffffff + 2 is 1.
Integer overflows can have devastating effects, especially when they occur in the computation of a memory allocation size: the allocated buffer ends up smaller than the data that is later written into it.
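Python integers never overflow, so to experiment with this behaviour we have to emulate 32-bit arithmetic by masking. A minimal sketch; to_u32 and to_s32 are our own helper names. Note that in C, signed overflow is formally undefined behaviour; to_s32 only shows what two's-complement hardware produces:

```python
INT_MIN, INT_MAX = -(2 ** 31), 2 ** 31 - 1   # 32-bit signed int range
UINT_MAX = 2 ** 32 - 1                       # 32-bit unsigned int maximum


def to_u32(value: int) -> int:
    """Keep only the low 32 bits, like a C unsigned int or a 32-bit register."""
    return value & 0xFFFFFFFF


def to_s32(value: int) -> int:
    """Reinterpret the low 32 bits as a two's-complement signed int."""
    value = to_u32(value)
    return value - (1 << 32) if value & 0x80000000 else value


print(hex(to_u32(UINT_MAX + 2)))  # 0x1
print(to_s32(INT_MAX + 1))        # -2147483648
```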
windbg
Having discussed integer overflows, we continue with the dynamic analysis. We attach the debugger, load the symbols and use the x command to list all the symbols of the srv2 driver:
0: kd> x srv2!*
To reduce the output, we will run the following:
0: kd> x srv2!Srv2*compress*
We then set a breakpoint on Srv2DecompressData:
0: kd> bp srv2!Srv2DecompressData
and verify it with bl.
We also unassemble the function using the uf command:
0: kd> uf srv2!Srv2DecompressData
The Srv2DecompressData function takes one argument, a pointer to the SMB packet. Under the x64 calling convention, the rcx register holds this pointer.
The astute reader will notice some differences between the disassembly listings of windbg and IDA Pro. For example, this instruction in IDA Pro
.text:00000001C0017E9B movups xmmword ptr [rsp+58h+Size], xmm0
is shown like this in windbg
movups xmmword ptr [rsp+30h],xmm0
Or this instruction in IDA Pro
.text:00000001C0017EC8 mov rax, qword ptr [rsp+58h+Size]
is shown like this in the debugger:
mov rax,qword ptr [rsp+30h]
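The difference is most likely only notational: IDA expresses stack offsets in terms of its named stack-frame variables (Size here), so rsp+58h+Size is the same location as rsp+30h if Size is -28h, while windbg shows the raw offset. Either way, we use the offsets that windbg shows at runtime.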
We run the exploit again, check that the breakpoint is hit and then display the registers:
r
Then we unassemble starting at rip (L10 means 0x10 instructions, since windbg’s default number radix is 16):
u rip L10
As already discussed, the first few instructions save the non-volatile registers and set up the stack:
0: kd> u rip L10
srv2!Srv2DecompressData:
fffff804`60d77e60 mov rax,rsp
fffff804`60d77e63 mov qword ptr [rax+10h],rbx
fffff804`60d77e67 mov qword ptr [rax+18h],rbp
fffff804`60d77e6b mov qword ptr [rax+20h],rsi
fffff804`60d77e6f push rdi
fffff804`60d77e70 push r14
fffff804`60d77e72 push r15
fffff804`60d77e74 sub rsp,40h
fffff804`60d77e78 and dword ptr [rax+8],0
We unassemble the whole function:
uf .
fffff804`60d77e7c mov rdi,rcx
fffff804`60d77e7f mov rax,qword ptr [rcx+0F0h]
fffff804`60d77e86 cmp dword ptr [rax+24h],10h
fffff804`60d77e8a jb srv2!Srv2DecompressData+0x134 (fffff804`60d77f94)
...
srv2!Srv2DecompressData+0x134:
fffff804`60d77f94 mov eax,0C000090Bh
fffff804`60d77f99 mov rbx,qword ptr [rsp+68h]
fffff804`60d77f9e mov rbp,qword ptr [rsp+70h]
fffff804`60d77fa3 mov rsi,qword ptr [rsp+78h]
fffff804`60d77fa8 add rsp,40h
fffff804`60d77fac pop r15
fffff804`60d77fae pop r14
fffff804`60d77fb0 pop rdi
fffff804`60d77fb1 ret
This part of the disassembly checks that at least 0x10 bytes, the size of the SMB2 COMPRESSION_TRANSFORM_HEADER as per the specification, have been received (cmp dword ptr [rax+24h],10h followed by jb). If fewer bytes were received, the function restores the previously saved register values and exits with error code 0xC000090B.
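The next disassembly part prepares the normal path and compares the COMPRESSION_TRANSFORM_HEADER.CompressionAlgorithm field (the low 16 bits of the header’s second qword, extracted with psrldq, movq and movzx) with the expected compression algorithm, which is loaded into ebp from [rcx+8Ch]: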
srv2!Srv2DecompressData+0x30:
fffff804`60d77e90 mov rax,qword ptr [rax+18h]
fffff804`60d77e94 movups xmm0,xmmword ptr [rax]
fffff804`60d77e97 mov rax,qword ptr [rcx+50h]
fffff804`60d77e9b movups xmmword ptr [rsp+30h],xmm0
fffff804`60d77ea0 mov rcx,qword ptr [rax+1F0h]
fffff804`60d77ea7 psrldq xmm0,8
fffff804`60d77eac mov ebp,dword ptr [rcx+8Ch]
fffff804`60d77eb2 movq rcx,xmm0
fffff804`60d77eb7 movzx eax,cx
fffff804`60d77eba cmp ebp,eax
fffff804`60d77ebc je srv2!Srv2DecompressData+0x68 (fffff804`60d77ec8)
If COMPRESSION_TRANSFORM_HEADER.CompressionAlgorithm doesn’t match the expected compression algorithm, the function returns with error code 0xC00000BB (STATUS_NOT_SUPPORTED).
Otherwise, the following branch calculates the sum of OriginalCompressedSegmentSize + Offset/Length and calls SrvNetAllocateBuffer with this sum as an argument:
srv2!Srv2DecompressData+0x68:
fffff804`60d77ec8 mov rax,qword ptr [rsp+30h]
fffff804`60d77ecd xor edx,edx
fffff804`60d77ecf shr rax,20h
fffff804`60d77ed3 shr rcx,20h
fffff804`60d77ed7 add ecx,eax
fffff804`60d77ed9 mov r10,qword ptr [srv2!_imp_SrvNetAllocateBuffer (fffff804`60da1928)]
fffff804`60d77ee0 call srvnet!SrvNetAllocateBuffer (fffff804`60bf6730)
fffff804`60d77ee5 mov rbx,rax
fffff804`60d77ee8 test rax,rax
fffff804`60d77eeb jne srv2!Srv2DecompressData+0x97 (fffff804`60d77ef7) Branch
We can see that rax and rcx are shifted right by 0x20 (32) bits with the shr instruction, which extracts the upper 32 bits of each qword: OriginalCompressedSegmentSize from rax and Offset/Length from rcx. shr is a logical shift, which is consistent with these values being treated as unsigned integers. add ecx,eax is a 32-bit addition: if the sum of ecx and eax is larger than 0xffffffff, the carry is discarded and the value wraps around zero to a much smaller one. Both values come straight from the header we send, so they are completely under the attacker’s control, as the open specification shows.
To showcase the above, trace until the xor edx,edx
and inspect the registers:
1: kd> r
rax=00000400424d53fc rbx=ffffae8f61978010 rcx=ffffffffffff0001
rdx=ffffae8f61978020 rsi=ffffffffffffffff rdi=ffffae8f61978010
rip=fffff80460d77ecd rsp=ffffe00ca72dbe70 rbp=0000000000000001
r8=0000000000000000 r9=fffff8045be00000 r10=ffffae8f616617d0
r11=ffffe00ca72dbcd8 r12=0000000000000000 r13=ffffae8f61661880
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00040246
srv2!Srv2DecompressData+0x6d:
fffff804`60d77ecd xor edx,edx
rax holds 00000400424d53fc and rcx holds ffffffffffff0001, i.e. the two qwords of our COMPRESSION_TRANSFORM_HEADER.
Tracing three instructions and checking the registers again shows:
1: kd> t 3
srv2!Srv2DecompressData+0x6f:
fffff804`60d77ecf shr rax,20h
srv2!Srv2DecompressData+0x73:
fffff804`60d77ed3 shr rcx,20h
srv2!Srv2DecompressData+0x77:
fffff804`60d77ed7 add ecx,eax
1: kd> r
rax=0000000000000400 rbx=ffffae8f61978010 rcx=00000000ffffffff
rdx=0000000000000000 rsi=ffffffffffffffff rdi=ffffae8f61978010
rip=fffff80460d77ed7 rsp=ffffe00ca72dbe70 rbp=0000000000000001
r8=0000000000000000 r9=fffff8045be00000 r10=ffffae8f616617d0
r11=ffffe00ca72dbcd8 r12=0000000000000000 r13=ffffae8f61661880
r14=0000000000000000 r15=0000000000000000
iopl=0 ov up ei pl nz na po cy
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00040a07
srv2!Srv2DecompressData+0x77:
fffff804`60d77ed7 add ecx,eax
We can see that after the shift rax holds 0x400, and rcx now holds 0xffffffff.
The addition performed by the next instruction overflows ecx, leaving rcx at one less than 0x400:
1: kd> t
srv2!Srv2DecompressData+0x79:
fffff804`60d77ed9 mov r10,qword ptr [srv2!_imp_SrvNetAllocateBuffer (fffff804`60da1928)]
1: kd> r
rax=0000000000000400 rbx=ffffae8f61978010 rcx=00000000000003ff
rdx=0000000000000000 rsi=ffffffffffffffff rdi=ffffae8f61978010
rip=fffff80460d77ed9 rsp=ffffe00ca72dbe70 rbp=0000000000000001
r8=0000000000000000 r9=fffff8045be00000 r10=ffffae8f616617d0
r11=ffffe00ca72dbcd8 r12=0000000000000000 r13=ffffae8f61661880
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na po cy
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00040207
As expected, rcx has become 0x3ff. This is the value that will be passed to SrvNetAllocateBuffer.
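The same three instructions can be replayed outside the debugger using the register values from the r output above. This is a sketch of the instruction semantics only (64-bit logical shifts, then a 32-bit add whose result is zero-extended into rcx, as any write to a 32-bit register is on x64); replay_size_calculation is a hypothetical name:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def replay_size_calculation(rax: int, rcx: int) -> int:
    """Replay shr rax,20h / shr rcx,20h / add ecx,eax and return the resulting rcx."""
    rax = (rax & MASK64) >> 0x20   # OriginalCompressedSegmentSize (upper dword of qword 1)
    rcx = (rcx & MASK64) >> 0x20   # Offset/Length (upper dword of qword 2)
    # add ecx,eax keeps only the low 32 bits; writing ecx zero-extends into rcx.
    return (rcx + rax) & MASK32


print(hex(replay_size_calculation(0x00000400424D53FC, 0xFFFFFFFFFFFF0001)))  # 0x3ff
```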
Continuing, if we set a breakpoint right after the function returns, we see that no crash occurs at that point; the system has simply allocated much less memory than required. As we can imagine, the crash will most likely occur when the system attempts to write data into the buffer that has been allocated.
bp srv2!Srv2DecompressData
u . L25
Then we break right after the call srvnet!SrvNetAllocateBuffer instruction, at mov rbx,rax (where the return value of SrvNetAllocateBuffer is copied into rbx), which is at srv2!Srv2DecompressData+0x85:
bp srv2!Srv2DecompressData+0x85
As expected, the function returns.
The next jump (jne) will be taken, because the function has returned normally and rax is not 0.
The next disassembly chunk prepares the registers for the SmbCompressionDecompress function, which will handle the decompression of the buffer received in the SMB packet:
srv2!Srv2DecompressData+0x97:
fffff804`3fda7ef7 488b97f0000000 mov rdx,qword ptr [rdi+0F0h]
fffff804`3fda7efe 8bcd mov ecx,ebp // 0x0001 is the CompressionAlgorithm (rcx)
fffff804`3fda7f00 4c8b4818 mov r9,qword ptr [rax+18h] // r9 has 0x1000
fffff804`3fda7f04 8b74243c mov esi,dword ptr [rsp+3Ch] // esi will get 0xffffffff (the offset/length we sent in the header)
fffff804`3fda7f08 448b742434 mov r14d,dword ptr [rsp+34h] // r14d will get 0x400, which is the UncompressedSize
fffff804`3fda7f0d 4c03ce add r9,rsi // offset (esi) is added at 0x1000, which is the OriginalCompressedSegmentSize
fffff804`3fda7f10 448b4224 mov r8d,dword ptr [rdx+24h]
fffff804`3fda7f14 488b4218 mov rax,qword ptr [rdx+18h]
fffff804`3fda7f18 442bc6 sub r8d,esi
fffff804`3fda7f1b 488d5610 lea rdx,[rsi+10h]
fffff804`3fda7f1f 4183e810 sub r8d,10h
fffff804`3fda7f23 4803d0 add rdx,rax
fffff804`3fda7f26 488d442460 lea rax,[rsp+60h]
fffff804`3fda7f2b 4889442428 mov qword ptr [rsp+28h],rax
fffff804`3fda7f30 4489742420 mov dword ptr [rsp+20h],r14d
fffff804`3fda7f35 4c8b1574990200 mov r10,qword ptr [srv2!_imp_SmbCompressionDecompress (fffff804`3fdd18b0)]
fffff804`3fda7f3c e86f65e7ff call srvnet!SmbCompressionDecompress (fffff804`3fc1e4b0)
rdi points to a structure. The qword at rdi + 0xF0 is a pointer to a second structure, and the qword at offset 0x18 of that second structure is a pointer to the SMB header we sent with the exploit (a double dereference):
0: kd> dq rdi L2
ffffbb8b`17e3e010 00000005`00000001 000006c0`000000dc
0: kd> dq rdi+0xF0 L2
ffffbb8b`17e3e100 ffffbb8b`1bf7d150 ffffbb8b`1bf7d150
0: kd> dq ffffbb8b`1bf7d150 + 0x18 L2
ffffbb8b`1bf7d168 ffffbb8b`1bf7c050 00000410`00001100
0: kd> dq ffffbb8b`1bf7c050 L4
ffffbb8b`1bf7c050 00000400`424d53fc ffffffff`ffff0001
ffffbb8b`1bf7c060 41414141`41414141 41414141`41414141
Or, we can calculate this directly with:
0: kd> dq poi(poi(rdi+0xF0)+0x18) L4
ffffbb8b`1bf7c050 00000400`424d53fc ffffffff`ffff0001
ffffbb8b`1bf7c060 41414141`41414141 41414141`41414141
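The poi() chain is just a double dereference. As an illustration only, the sketch below replays it in Python against a tiny dictionary standing in for kernel memory, filled with the values copied from the dq output above; MEMORY and poi are our own stand-ins, not a debugger API:

```python
# Values copied from the dq output above (address -> qword stored there).
MEMORY = {
    0xFFFFBB8B17E3E100: 0xFFFFBB8B1BF7D150,  # dq rdi+0xF0
    0xFFFFBB8B1BF7D168: 0xFFFFBB8B1BF7C050,  # dq ffffbb8b`1bf7d150 + 0x18
}


def poi(address: int) -> int:
    """Stand-in for windbg's poi(): return the pointer-sized value at `address`."""
    return MEMORY[address]


rdi = 0xFFFFBB8B17E3E010
header_address = poi(poi(rdi + 0xF0) + 0x18)
print(hex(header_address))  # 0xffffbb8b1bf7c050
```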
Therefore rdx holds the pointer stored at rdi + 0xF0. Seven instructions below, the qword at rdx + 0x18 is loaded into rax, so at that point rax contains a pointer to our SMB header. rdx is then set to rsi + 0x10 with lea (no dereference) and rax is added to it, so rdx ends up pointing Offset/Length + 0x10 bytes past the start of our header: this is the compressed buffer that contains our data. Register rcx gets the value 0x1, which is the CompressionAlgorithm. Register r9 (the fourth function parameter) gets the pointer stored at offset 0x18 of the buffer returned by SrvNetAllocateBuffer, plus our Offset/Length value (add r9,rsi). r8d is loaded from [rdx+24h], the same field that was compared against 0x10 at the start of the function, minus our Offset/Length, minus 0x10, the size of the SMB COMPRESSION_TRANSFORM_HEADER (the compression header we sent).
Finally, two more parameters are passed via the stack, as per the calling convention: r14d, holding our OriginalCompressedSegmentSize value 0x400 (used as the UncompressedSize), is stored at rsp + 0x20, and rax, a pointer to the dword at rsp + 0x60 that was zeroed at the start of the function (and dword ptr [rax+8],0), is stored at rsp + 0x28.
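The register and stack slots used above follow mechanically from the argument position. A minimal Python sketch of that mapping for integer and pointer arguments (floating-point arguments, which use xmm0 to xmm3, are ignored), as seen right before the call instruction executes; once the call pushes the return address, the callee finds the stack arguments 8 bytes further from its rsp. arg_location is a hypothetical helper:

```python
# Integer/pointer argument slots of the Windows x64 calling convention.
REGISTER_ARGS = ("rcx", "rdx", "r8", "r9")
SHADOW_SPACE = 0x20  # 32-byte home space reserved by the caller for the four register args


def arg_location(index: int) -> str:
    """Location of 1-based argument `index` at the moment the call executes."""
    if index < 1:
        raise ValueError("argument numbers are 1-based")
    if index <= len(REGISTER_ARGS):
        return REGISTER_ARGS[index - 1]
    # Stack arguments start right above the shadow space, 8 bytes apiece.
    return f"[rsp+{SHADOW_SPACE + 8 * (index - 5):#x}]"


for n in range(1, 7):
    print(n, arg_location(n))
# 1 rcx
# 2 rdx
# 3 r8
# 4 r9
# 5 [rsp+0x20]
# 6 [rsp+0x28]
```

For the call to SmbCompressionDecompress this matches what we observed: four values in registers, then r14d at [rsp+0x20] and the pointer at [rsp+0x28].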
In the same manner, we set a breakpoint at the call srvnet!SmbCompressionDecompress instruction (srv2!Srv2DecompressData+0xdc):
bp fffff804`3fda7f3c
We unassemble srvnet!SmbCompressionDecompress and, in order to know where exactly the crash occurs, set a breakpoint right after this call instruction, at test eax,eax (srv2!Srv2DecompressData+0xe1; the address below comes from a different boot session):
uf srvnet!SmbCompressionDecompress
bp fffff802`70787f41
We run gu to go up until srvnet!SmbCompressionDecompress returns. However, our last breakpoint is never reached, as the crash occurs somewhere inside srvnet!SmbCompressionDecompress.
According to the open specification, this is the structure of the SMB2 COMPRESSION_TRANSFORM_HEADER:
typedef struct SMB2COMPRESSION_TRANSFORM_HEADER
{
UCHAR ProtocolId[4]; // MUST be 0x424D53FC (\xfc\x53\x4d\x42 == \xfc'SMB')
UINT OriginalCompressedSegmentSize; // 4 bytes
WORD CompressionAlgorithm; // 2 bytes
WORD Flags; // 2 bytes
DWORD OffsetOrLength; // 4 bytes (16 bytes in total for the header)
} SMB2COMPRESSION_TRANSFORM_HEADER;
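To connect this layout with the memory dumps above, the sketch below packs a header with Python’s struct module ("<IIHHI": little-endian, no padding) and reads it back as two qwords, the way dq displays it. The field values are the ones visible in the dq dump (OriginalCompressedSegmentSize 0x400, CompressionAlgorithm 0x0001, Flags 0xFFFF, Offset/Length 0xFFFFFFFF), not values taken from the PoC source; build_transform_header is a hypothetical helper:

```python
import struct

# ProtocolId, OriginalCompressedSegmentSize, CompressionAlgorithm, Flags, Offset/Length
HEADER_FORMAT = "<IIHHI"  # little-endian, no padding: 4 + 4 + 2 + 2 + 4 = 16 bytes
PROTOCOL_ID = 0x424D53FC  # b"\xfcSMB" on the wire


def build_transform_header(original_size: int, algorithm: int, flags: int, offset: int) -> bytes:
    return struct.pack(HEADER_FORMAT, PROTOCOL_ID, original_size, algorithm, flags, offset)


header = build_transform_header(0x400, 0x0001, 0xFFFF, 0xFFFFFFFF)
first_qword, second_qword = struct.unpack("<QQ", header)
print(f"{first_qword:016x} {second_qword:016x}")  # 00000400424d53fc ffffffffffff0001
```

The two printed qwords are exactly the values in rax and rcx before the shr instructions.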
Going through srvnet!SmbCompressionDecompress
We now know that the crash occurs somewhere in srvnet!SmbCompressionDecompress, or in a function called from it, so we can simply break on it:
bp srvnet!SmbCompressionDecompress
uf srvnet!SmbCompressionDecompress
Going through srvnet!SmbCompressionDecompress, instead of tracing, we set a breakpoint at the test ecx,ecx instruction right before jne srvnet!SmbCompressionDecompress+0x39 (ecx is 1, the CompressionAlgorithm of our SMB header). The jump should be taken.
bp fffff805`7096e4db
Our breakpoint is hit and we keep tracing (t) until we reach approximately srvnet!SmbCompressionDecompress+0x59, where a call to nt!RtlGetCompressionWorkSpaceSize is made.
nt!RtlGetCompressionWorkSpaceSize is a very small function, exported by the kernel (ntoskrnl.exe) and documented in MSDN.
Its parameters are:
- USHORT CompressionFormatAndEngine: ecx got its value (2, i.e. COMPRESSION_FORMAT_LZNT1) from bx.
- PULONG CompressBufferWorkSpaceSize: rdx points to a ULONG on the stack (lea rdx,[rsp+70h]) that will receive the number of bytes required to compress a buffer.
- PULONG CompressFragmentWorkSpaceSize: r8 points to a ULONG on the stack (lea r8,[rsp+40h]) that will receive the number of bytes required to decompress a buffer.
Following that function, we see that guard_dispatch_icall is reached and called. This function is part of the kernel-mode CFG/CFI (Control Flow Guard/Control Flow Integrity) implementation on Windows. As this is a big subject of its own and out of the scope of this blog post, we will only mention that it is an exploit mitigation that controls indirect calls (such as call rax). It essentially validates that the target address of such a call is indeed the start of a valid function. This makes it significantly harder to redirect a corrupted function pointer into the middle, or the end, of a function, for example to a ROP gadget, although CFG on its own does not protect return addresses.
Breaking at nt!RtlGetCompressionWorkSpaceSize+0x35, which is the call nt!guard_dispatch_icall instruction, we check the value of the rax register: this is where execution will eventually jump. Before following the call, we set a breakpoint at nt!RtlGetCompressionWorkSpaceSize+0x3a, where execution will return. Now we unassemble the function pointed to by rax and see:
1: kd> uf rax
nt!RtlCompressWorkSpaceSizeLZNT1:
fffff802`7323de00 6685c9 test cx,cx
fffff802`7323de03 0f85effa0000 jne nt!RtlCompressWorkSpaceSizeLZNT1+0xfaf8 (fffff802`7324d8f8) Branch
nt!RtlCompressWorkSpaceSizeLZNT1+0x9:
fffff802`7323de09 c70220000100 mov dword ptr [rdx],10020h
nt!RtlCompressWorkSpaceSizeLZNT1+0xf:
fffff802`7323de0f 41c70000100000 mov dword ptr [r8],1000h
fffff802`7323de16 33c0 xor eax,eax
fffff802`7323de18 c3 ret
nt!RtlCompressWorkSpaceSizeLZNT1+0xfaf8:
fffff802`7324d8f8 b800010000 mov eax,100h
fffff802`7324d8fd 663bc8 cmp cx,ax
fffff802`7324d900 750b jne nt!RtlCompressWorkSpaceSizeLZNT1+0xfb0d (fffff802`7324d90d) Branch
nt!RtlCompressWorkSpaceSizeLZNT1+0xfb02:
fffff802`7324d902 c70220000000 mov dword ptr [rdx],20h
fffff802`7324d908 e90205ffff jmp nt!RtlCompressWorkSpaceSizeLZNT1+0xf (fffff802`7323de0f) Branch
nt!RtlCompressWorkSpaceSizeLZNT1+0xfb0d:
fffff802`7324d90d b8bb0000c0 mov eax,0C00000BBh
fffff802`7324d912 c3 ret
As expected, this is indeed a function that matches LZNT1 compression, the algorithm (0x1) that we set in our SMB header (see section 2.2.3.1.3, SMB2_COMPRESSION_CAPABILITIES).
This is an undocumented function that resides inside ntoskrnl.exe, as can be seen in the following screenshot.
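To make the branches of the nt!RtlCompressWorkSpaceSizeLZNT1 listing easier to follow, here it is transcribed into Python. This is only our reading of the disassembly; we deliberately treat the 16-bit value in cx as an opaque input, and compress_workspace_size_lznt1 is a hypothetical name:

```python
STATUS_SUCCESS = 0x00000000
STATUS_NOT_SUPPORTED = 0xC00000BB


def compress_workspace_size_lznt1(cx: int):
    """Python transcription of the nt!RtlCompressWorkSpaceSizeLZNT1 listing.

    Returns (status, compress_buffer_ws_size, compress_fragment_ws_size);
    the sizes are None when the function bails out without writing them.
    """
    cx &= 0xFFFF                         # only the 16-bit cx register is tested
    if cx == 0:                          # test cx,cx / jne not taken
        buffer_ws = 0x10020              # mov dword ptr [rdx],10020h
    elif cx == 0x100:                    # mov eax,100h / cmp cx,ax
        buffer_ws = 0x20                 # mov dword ptr [rdx],20h
    else:
        return STATUS_NOT_SUPPORTED, None, None  # mov eax,0C00000BBh
    fragment_ws = 0x1000                 # mov dword ptr [r8],1000h
    return STATUS_SUCCESS, buffer_ws, fragment_ws
```

With cx equal to 0, the function reports 0x10020 bytes, which matches the allocation size passed to ExAllocatePoolWithTag further below.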
After continuing, we land at the cleanup code of nt!RtlGetCompressionWorkSpaceSize. We then return to srvnet!SmbCompressionDecompress at offset 0x72, where test eax,eax checks the result of this call; the jump will not be taken.
We check the registers again with the r command and see that rax is 0, therefore the js will not be taken.
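The test eax,eax / js pair is the compiled form of an NTSTATUS success check: error codes have bit 31 set, so they are negative when read as signed 32-bit integers. A small Python sketch of the two views (the helper names are ours; in C the check is the NT_SUCCESS macro):

```python
def js_taken(eax):
    """test eax,eax copies bit 31 of eax into SF; js jumps when SF is set."""
    return bool(eax & 0x80000000)


def nt_success(status):
    """Python equivalent of NT_SUCCESS(): true when the NTSTATUS, read as a
    signed 32-bit integer, is non-negative (success or informational)."""
    return not js_taken(status & 0xFFFFFFFF)


for status in (0x00000000, 0xC00000BB, 0xC000009A):
    print(f"{status:#010x} success={nt_success(status)} js_taken={js_taken(status)}")
```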
Next, we see a mov edx,dword ptr [rsp+70h] instruction. We check where rsp points with dps rsp, or limit the output to 5 lines with dps rsp L5.
Next, we check the value at rsp+0x70 with dps rsp+0x70 L1. Its lower dword is the value that should end up in edx.
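Because dps prints pointer-sized values while mov edx,dword ptr [rsp+70h] reads only 4 bytes, only the low dword of the displayed qword ends up in edx. The hypothetical helpers below parse a dps line (WinDbg separates the two 32-bit halves with a backtick) and extract that low dword. The sample line is the UncompressedChunkSize slot from the stack dump later in this post, whose upper half is unrelated stack data.

```python
import struct


def parse_dps_line(line):
    """Split a WinDbg dps output line into (address, qword value).

    Any trailing symbol, such as srv2!Srv2DecompressData+0xe1, is ignored.
    """
    address, value = line.split()[:2]
    return int(address.replace("`", ""), 16), int(value.replace("`", ""), 16)


def low_dword(qword):
    """What a dword-sized read of this slot sees: the first 4 bytes in
    memory, i.e. the low 32 bits of the little-endian qword."""
    raw = struct.pack("<Q", qword)
    return struct.unpack_from("<I", raw, 0)[0]


addr, value = parse_dps_line("ffff8087`b1dafe28 fffff802`00001000")
print(hex(addr), hex(low_dword(value)))
```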
Next, mov ecx,200h is executed.
We see that a call to nt!ExAllocatePoolWithTag is made. This follows the x64 calling convention: the first four integer arguments are passed, left to right, in rcx, rdx, r8 and r9, and arguments five and up are passed on the stack, laid out as if pushed right to left. MSDN mentions that this function takes 3 arguments:
1. PoolType, which is the type of pool memory to allocate: this is the 0x200 (NonPagedPoolNx) stored in ecx.
2. NumberOfBytes, which is the number of bytes to allocate: this is stored in edx (the lower 4 bytes of rdx, as per the dword ptr operand size; writing edx also zero-extends into the full rdx).
3. Tag, which is the pool tag to use for the allocated memory: this is the 0x2532534C value stored in r8.
The function returns a PVOID pointer to the allocated pool block (or NULL if the allocation fails), so rax will hold the pointer to this pool memory.
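Pool tags are easier to recognize as characters. Assuming the usual convention that a tag is displayed in its in-memory (little-endian) byte order, the r8 value decodes as shown by this small sketch (helper names are ours):

```python
import struct

NON_PAGED_POOL_NX = 0x200  # POOL_TYPE value passed in ecx (NonPagedPoolNx)


def pool_tag_to_str(tag):
    """Return the four tag characters in memory order (little-endian)."""
    return struct.pack("<I", tag).decode("ascii")


def pool_tag_from_str(text):
    """Inverse: the 32-bit value whose in-memory bytes spell `text`."""
    return int.from_bytes(text.encode("ascii"), "little")


print(pool_tag_to_str(0x2532534C))
```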
We expect this function to return normally, as it just allocates 0x10020 bytes of kernel pool memory. We set a breakpoint at the instruction right after the call and then type gu (go up).
After the call, we see that the result is stored in rdi, so rdi points to that kernel pool memory. Following that, there is a check whether rax is zero. If rax is not zero (meaning that the function returned a valid pointer and no error occurred), the code continues at srvnet!SmbCompressionDecompress+0xa0; otherwise, most likely, an error code is returned (srvnet!SmbCompressionDecompress+0xfe), initially stored in rbx and then copied into rax. Then, after some cleanup, the function returns. The error code in this case will be 0xC000009A (STATUS_INSUFFICIENT_RESOURCES).
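Putting the last few steps together, the path through srvnet!SmbCompressionDecompress traced so far can be summarized as the following hypothetical Python sketch. The three callables stand in for RtlGetCompressionWorkSpaceSize, ExAllocatePoolWithTag and the RtlDecompressBufferEx2 call prepared at +0xa0; cleanup such as freeing the workspace is deliberately not modeled.

```python
STATUS_INSUFFICIENT_RESOURCES = 0xC000009A
NON_PAGED_POOL_NX = 0x200
POOL_TAG = 0x2532534C


def smb_compression_decompress_sketch(get_workspace_size, allocate_pool, decompress):
    """Hypothetical summary of the control flow observed in the debugger."""
    status, workspace_size = get_workspace_size()
    if status & 0x80000000:                 # test eax,eax / js: error status
        return status
    workspace = allocate_pool(NON_PAGED_POOL_NX, workspace_size, POOL_TAG)
    if workspace is None:                   # ExAllocatePoolWithTag returned NULL
        return STATUS_INSUFFICIENT_RESOURCES
    return decompress(workspace)            # continues at +0xa0


result = smb_compression_decompress_sketch(
    lambda: (0, 0x10020),
    lambda pool_type, size, tag: bytearray(size),
    lambda workspace: 0,
)
print(hex(result))
```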
The assembly listing at srvnet!SmbCompressionDecompress+0xa0 prepares the call to RtlDecompressBufferEx2, which is a function documented in MSDN and exported by ntoskrnl.exe, as can be seen in the following screenshot.
srvnet!SmbCompressionDecompress+0xa0:
fffff802`7796e550 488bb42498000000 mov rsi,qword ptr [rsp+98h]
fffff802`7796e558 4d8bcf mov r9,r15
fffff802`7796e55b 48897c2438 mov qword ptr [rsp+38h],rdi
fffff802`7796e560 498bd6 mov rdx,r14
fffff802`7796e563 4889742430 mov qword ptr [rsp+30h],rsi
fffff802`7796e568 0fb7cb movzx ecx,bx
fffff802`7796e56b c744242800100000 mov dword ptr [rsp+28h],1000h
fffff802`7796e573 896c2420 mov dword ptr [rsp+20h],ebp
fffff802`7796e577 8bac2490000000 mov ebp,dword ptr [rsp+90h]
fffff802`7796e57e 448bc5 mov r8d,ebp
fffff802`7796e581 4c8b1518400100 mov r10,qword ptr [srvnet!_imp_RtlDecompressBufferEx2 (fffff802`779825a0)]
fffff802`7796e588 e8237d46fb call nt!RtlDecompressBufferEx2 (fffff802`72dd62b0)
fffff802`7796e58d 8bd8 mov ebx,eax
As described in MSDN, this function is a multi-core decompression function. We have allocated 2 cores to our testing VM and have not tested with 1 core to check whether RtlDecompressBufferEx would be called instead. The function takes 8 arguments; under the x64 calling convention only the first 4 are passed in registers (rcx, rdx, r8 and r9, left to right) and the rest are passed on the stack, therefore:
1. CompressionFormat is a bitmask that specifies the compression format => rcx becomes 0x2 (from the movzx ecx,bx instruction), which is COMPRESSION_FORMAT_LZNT1.
2. UncompressedBuffer is a pointer to the buffer that receives the uncompressed data => rdx gets 0xffffc690eaf3104e from r14.
3. UncompressedBufferSize is the size of the uncompressed buffer in bytes => r8 gets its value from ebp. Note that r8d references the lower dword part of the 64-bit r8 register. ebp takes the value 0x400 from the mov ebp,dword ptr [rsp+90h] instruction, and this value is passed to r8:
dps rsp+0x90
ffff8087`b1dafe90 00000000`00000400
4. CompressedBuffer is a pointer to the buffer that contains the data to decompress => r9 gets 0xffffc690eb11d05e from r15.
5. CompressedBufferSize is the size of the compressed buffer in bytes. This is stored on the stack at rsp+0x20 by mov dword ptr [rsp+20h],ebp, before ebp is reloaded with the uncompressed size.
6. UncompressedChunkSize is the size of each chunk within the compressed buffer (should be 512, 1024, 2048 or 4096). Also stored on the stack. This is 0x1000 (that is, 4096 bytes), from the mov dword ptr [rsp+28h],1000h instruction.
7. FinalUncompressedSize is a pointer to a variable that receives the size in bytes of the decompressed data. Stored on the stack at rsp+0x30, from rsi.
8. WorkSpace is a pointer to the workspace buffer. Also stored on the stack, at rsp+0x38, from rdi (the pool memory allocated by ExAllocatePoolWithTag).
Tracing right before the function call and checking the registers and stack:
1: kd> r r8, r9, rcx, rdx
r8=0000000000000400 r9=ffffc690eb11d05e rcx=0000000000000002 rdx=ffffc690eaf3104e
1: kd> dps rsp
ffff8087`b1dafe00 00000000`00000002
ffff8087`b1dafe08 00000000`00000402
ffff8087`b1dafe10 00000000`ffffffff
ffff8087`b1dafe18 00000000`00000000
ffff8087`b1dafe20 00000000`00000402 <== CompressedBufferSize
ffff8087`b1dafe28 fffff802`00001000 <== UncompressedChunkSize
ffff8087`b1dafe30 ffff8087`b1dafed0 <== FinalUncompressedSize
ffff8087`b1dafe38 ffffc68f`eab97000 <== WorkSpace (ExAllocatePoolWithTag)
ffff8087`b1dafe40 ffff8087`00001000
ffff8087`b1dafe48 00000000`00000018
ffff8087`b1dafe50 00000000`00000000
ffff8087`b1dafe58 00000000`00000400
ffff8087`b1dafe60 ffffc68f`eb31d820
ffff8087`b1dafe68 fffff802`77417f41 srv2!Srv2DecompressData+0xe1
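The slot offsets follow from the x64 calling convention: the caller reserves 32 bytes of shadow (home) space for the four register arguments, occupying rsp through rsp+0x1f, so argument five lands at rsp+0x20 and every later argument 8 bytes higher. The minimal sketch below (arg_location is our own helper) reproduces the annotations in the dump. Also note that the UncompressedChunkSize slot reads fffff802`00001000 because mov dword ptr [rsp+28h],1000h wrote only the lower 4 bytes; since that parameter is a ULONG, only the 32-bit part is meaningful.

```python
INT_ARG_REGS = ("rcx", "rdx", "r8", "r9")
SHADOW_SPACE = 0x20  # home space the caller reserves for the four register args


def arg_location(index):
    """Location of integer/pointer argument `index` (1-based) at the call instruction."""
    if index < 1:
        raise ValueError("arguments are numbered from 1")
    if index <= len(INT_ARG_REGS):
        return INT_ARG_REGS[index - 1]
    # Argument 5 sits just above the shadow space, each later one 8 bytes higher.
    return f"[rsp+{SHADOW_SPACE + 8 * (index - 5):#x}]"


RTL_DECOMPRESS_BUFFER_EX2_PARAMS = (
    "CompressionFormat", "UncompressedBuffer", "UncompressedBufferSize",
    "CompressedBuffer", "CompressedBufferSize", "UncompressedChunkSize",
    "FinalUncompressedSize", "WorkSpace",
)

for i, name in enumerate(RTL_DECOMPRESS_BUFFER_EX2_PARAMS, start=1):
    print(f"{name:<24} {arg_location(i)}")
```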
Tracing inside RtlDecompressBufferEx2, we see that after saving rbx, which is a non-volatile register, and allocating its stack space, the function ANDs the CompressionFormat with 0xFF. The result is then compared with 0x2 and, if it is below, the function returns with code 0xC000000D (STATUS_INVALID_PARAMETER). The same ANDed value is then compared with 0x4 and, if it is above, the function returns with code 0xC000025F (STATUS_UNSUPPORTED_COMPRESSION).
Otherwise, r9 stores the CompressedBufferSize, eax stores the CompressionFormat, edx stores the uncompressed size (0x400), and rcx temporarily stores the address of nt!RtlDecompressBufferProcs. rax is then loaded from rcx + 2*8, that is, entry 2 of this array of function pointers (matching format 0x2), which points to nt!RtlDecompressBufferLZNT1. rcx then takes the WorkSpace (ffffc68f`eab97000) and writes it to another stack offset (rsp+0x30). In the same way, the ffff8087`b1dafed0 value, which is the pointer to FinalUncompressedSize, is stored at stack offset 0x28, and the value 0x00001000, which is the UncompressedChunkSize, is stored at stack offset 0x20. Then rbx is copied into rcx, and RtlDecompressBufferLZNT1 is called through the CFG-checked indirect call mechanism described above.
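The validation and table lookup described above can be sketched as follows. Only entry 2 (nt!RtlDecompressBufferLZNT1) was actually observed; the table contents for the other formats are hypothetical placeholders, and the code models the logic we traced, not the real function.

```python
STATUS_SUCCESS = 0x00000000
STATUS_INVALID_PARAMETER = 0xC000000D
STATUS_UNSUPPORTED_COMPRESSION = 0xC000025F
POINTER_SIZE = 8

# Stand-in for nt!RtlDecompressBufferProcs; entries other than 2 are placeholders.
DECOMPRESS_PROCS = [None, None, "nt!RtlDecompressBufferLZNT1",
                    "<format 3 handler>", "<format 4 handler>"]


def select_decompress_proc(compression_format):
    """Return (status, byte offset into the table, handler name)."""
    fmt = compression_format & 0xFF          # keep the format, drop the engine byte
    if fmt < 2:
        return STATUS_INVALID_PARAMETER, None, None
    if fmt > 4:
        return STATUS_UNSUPPORTED_COMPRESSION, None, None
    return STATUS_SUCCESS, fmt * POINTER_SIZE, DECOMPRESS_PROCS[fmt]


print(select_decompress_proc(0x2))
```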
The stack will look like this:
1: kd> dps rsp
ffff8087`b1dafdb0 00000000`00000200
ffff8087`b1dafdb8 00000000`00000400
ffff8087`b1dafdc0 00000000`00000000
ffff8087`b1dafdc8 fffff802`72dd62b2 nt!RtlDecompressBufferEx2+0x2
ffff8087`b1dafdd0 00000000`00001000 <== UncompressedChunkSize (rsp+0x20)
ffff8087`b1dafdd8 ffff8087`b1dafed0 <== FinalUncompressedSize
ffff8087`b1dafde0 ffffc68f`eab97000 <== WorkSpace
ffff8087`b1dafde8 00000000`00000018
ffff8087`b1dafdf0 00000000`00000002
ffff8087`b1dafdf8 fffff802`7796e58d srvnet!SmbCompressionDecompress+0xdd
ffff8087`b1dafe00 00000000`00000002
ffff8087`b1dafe08 00000000`00000402
Following the CFG dispatch, we end up at nt!RtlDecompressBufferLZNT1 with this stack (memory addresses have changed due to a reboot):
0: kd> dps rsp
fffff10f`fae12da8 fffff804`7214b316 nt!RtlDecompressBufferEx2+0x66
fffff10f`fae12db0 00000000`00000200
fffff10f`fae12db8 00000000`00000400
fffff10f`fae12dc0 00000000`00000000
fffff10f`fae12dc8 fffff804`7214b2b2 nt!RtlDecompressBufferEx2+0x2
fffff10f`fae12dd0 00000000`00001000 <== UncompressedChunkSize (rsp+0x20)
fffff10f`fae12dd8 fffff10f`fae12ed0 <== FinalUncompressedSize
fffff10f`fae12de0 ffff8889`37a5a000 <== WorkSpace
fffff10f`fae12de8 00000000`00000018
fffff10f`fae12df0 00000000`00000002
In this function, an rbp-based stack frame is used, and the function arguments are referenced through rbp-relative offsets. Following this function, we see that the system crashes inside RtlDecompressBufferLZNT1 when it attempts to dereference rsi and write the DWORD-sized result to ebx.
By triggering this vulnerability, we tried to showcase the flow of our buffer through various functions of srv2.sys, srvnet.sys and the Windows kernel. In theory, this vulnerability can be exploited in an attempt to gain remote code execution; however, several significant Windows exploit mitigations, such as KASLR, CFG and SMAP, would need to be bypassed.
There are different PoCs that can be used to showcase the vulnerability, each manifesting it in a slightly different manner and ending up in different function calls. In any case, we hope you enjoyed this as much as we did!
Useful breakpoints:
bp srv2!Srv2DecompressData+0x79
bp srv2!Srv2DecompressData+0x97
bp srvnet!SmbCompressionDecompress+0x6d
bp nt!RtlGetCompressionWorkSpaceSize+0x3a
bp srvnet!SmbCompressionDecompress+0xd1
bp nt!RtlDecompressBufferEx2
bp nt!RtlDecompressBufferEx2+0x61
bp nt!RtlDecompressBufferLZNT1
bp nt!RtlDecompressBufferLZNT1+0x2b
bp nt!RtlDecompressBufferLZNT1+0x49
Resources
https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/CVE-2020-0796
https://docs.microsoft.com/en-us/cpp/build/x64-software-conventions?view=vs-2019
https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
https://www.amazon.com/Windows-Internals-Part-architecture-management/dp/0735684189
https://www.amazon.com/Art-Software-Security-Assessment-Vulnerabilities/dp/0321444426
https://gist.github.com/asolino/45095268f0893bcf08bca3ae68a755b2