In this article we present the details of using the sysenter instruction to call from user mode into kernel mode. Older versions of the Windows operating system used the “int 0x2e” interrupt instead, while newer systems use sysenter. When “int 0x2e” is used, the processor goes through the 0x2e interrupt descriptor in the Interrupt Descriptor Table (IDT), and the system call number is passed in the eax register. The sysenter instruction, on the other hand, transitions from user mode to kernel mode faster than “int 0x2e”. To do this, it relies on the Model Specific Registers (MSRs) listed below. MSRs are x86 control registers used for debugging, program execution tracing, performance monitoring and toggling certain CPU features [1].
We can read and write MSRs with the rdmsr and wrmsr instructions, which are privileged and can only be executed in kernel mode. When the sysenter instruction is executed, the following MSRs are used [2]:
- Target code segment: reads this from IA32_SYSENTER_CS.
- Target instruction: reads this from IA32_SYSENTER_EIP.
- Stack segment: computed by adding 8 to the value in IA32_SYSENTER_CS.
- Stack pointer: reads this from IA32_SYSENTER_ESP.
All the MSRs of the Intel IA-32 architecture are documented in [3]; the relevant ones are shown in the picture below. The picture shows which bits are used for which purpose, so we know what values to store in them when overwriting them.
The columns in the picture above are as follows:
- 1: Hexadecimal representation of the register address.
- 2: Decimal representation of the register address.
- 3: Architectural MSR name and bit fields.
- 4: MSR/Bit Description.
- 5: Introduced as Architectural MSR.
When sysenter is executed, the CS register is loaded with the value of IA32_SYSENTER_CS, ESP with IA32_SYSENTER_ESP and EIP with IA32_SYSENTER_EIP. The SS register is loaded with the value of IA32_SYSENTER_CS+8. Execution then continues at CS:EIP, the kernel's system call entry point. Because the processor only has to copy values from these MSRs into ESP, EIP, SS and CS, without reading an interrupt descriptor from the IDT as “int 0x2e” does, the switch from user mode to kernel mode is very fast.
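The register loads described above can be modeled in plain C. The sketch below is a simplified user-mode model, not kernel code: the MSR numbers 0x174, 0x175 and 0x176 are the architectural IA32_SYSENTER_CS, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP addresses from the Intel manuals, while the SysenterMsrs and SysenterTarget types and the sysenter_target function are hypothetical names used only for illustration. It ignores descriptor caching and the EFLAGS changes the real instruction makes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MSR numbers used by SYSENTER. */
#define IA32_SYSENTER_CS  0x174u
#define IA32_SYSENTER_ESP 0x175u
#define IA32_SYSENTER_EIP 0x176u

/* Values the OS has programmed into the three MSRs (hypothetical type). */
typedef struct {
    uint64_t sysenter_cs;   /* value of IA32_SYSENTER_CS  */
    uint64_t sysenter_esp;  /* value of IA32_SYSENTER_ESP */
    uint64_t sysenter_eip;  /* value of IA32_SYSENTER_EIP */
} SysenterMsrs;

/* The subset of CPU state that SYSENTER loads (hypothetical type). */
typedef struct {
    uint16_t cs;
    uint16_t ss;
    uint32_t eip;
    uint32_t esp;
} SysenterTarget;

/* Simplified model of the register loads performed by SYSENTER. */
SysenterTarget sysenter_target(const SysenterMsrs *msrs)
{
    SysenterTarget t;

    assert(msrs != NULL);
    /* a null selector in IA32_SYSENTER_CS makes SYSENTER raise #GP(0) */
    assert((msrs->sysenter_cs & 0xFFFCu) != 0);

    t.cs  = (uint16_t)(msrs->sysenter_cs & 0xFFFCu); /* RPL bits forced to 0 */
    t.ss  = (uint16_t)(t.cs + 8u);                   /* SS = CS + 8 */
    t.eip = (uint32_t)msrs->sysenter_eip;            /* kernel entry point */
    t.esp = (uint32_t)msrs->sysenter_esp;            /* kernel stack pointer */
    return t;
}
```

With CS = 0x8, the kernel code selector Windows uses, the model yields SS = 0x10, and EIP is simply whatever address is stored in MSR 0x176, which is exactly the value we are going to overwrite.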
Setting up the Environment
Let’s now hook IA32_SYSENTER_EIP, whose value is stored in MSR 0x176. First, we boot the Windows operating system in debugging mode by running the following commands in cmd.exe with Administrator privileges. They configure Windows to start in debugging mode, so we can debug it over a serial port.
```
bcdedit /set debug on
bcdedit /set debugtype serial
bcdedit /set debugport 1
bcdedit /set baudrate 115200
bcdedit /set {bootmgr} displaybootmenu yes
bcdedit /timeout 10
```
To debug the Windows operating system, we start a second Windows virtual machine with WinDbg installed, open File > Kernel Debugging and accept the defaults, as shown below. If you used different bcdedit settings than the ones above, adjust the settings in the Kernel Debugging dialog accordingly.
After we press OK, WinDbg listens for incoming connections on the serial port. Because the Windows system in the other virtual machine is set up to connect to the same serial port, we can debug it from WinDbg. Moreover, we can follow the execution of the whole operating system, not just user-mode code. When debugging with IDA Pro, OllyDbg or Immunity Debugger as user-mode debuggers, we can’t see the kernel-mode instructions at virtual addresses 0x80000000-0xFFFFFFFF being executed; we jump right over them. Here, we’ve instructed Windows to connect to the serial port where WinDbg is listening, so we can debug both user-mode and kernel-mode instructions with ease, including privileged instructions like rdmsr and wrmsr, which a user-mode debugger cannot follow.
At this point, Windows is running in debug mode and we can start and stop it at will through WinDbg. Let’s first pause Windows by clicking Debug > Break in WinDbg, as shown below. This stops the debugged operating system and gives us a chance to execute WinDbg commands.
Once we break into the system, we can enter WinDbg commands at the “kd>” prompt, as seen in the picture below.
Getting and Setting the MSR Register Values
When hooking sysenter through MSR 0x176, we first need to save the old value of IA32_SYSENTER_EIP. We can read a model-specific register with the rdmsr instruction, which loads the 64-bit MSR specified in the ECX register into EDX:EAX: the high-order 32 bits go into EDX and the low-order 32 bits into EAX. rdmsr must be executed in privileged mode and ECX must hold the address of an existing MSR; otherwise a general protection exception is raised [4]. This can be done with the GetMSRAddress function presented below, which takes the number of the MSR we want to read and returns its value.
```c
/* Reads the 64-bit value of the MSR whose number is passed in reg. */
MSR GetMSRAddress(UINT32 reg)
{
    MSR msraddr;
    UINT32 lowvalue;
    UINT32 highvalue;

    /* read the MSR: rdmsr loads MSR[ecx] into edx:eax */
    __asm {
        push eax;
        push ecx;
        push edx;
        mov ecx, reg;
        rdmsr;
        mov lowvalue, eax;
        mov highvalue, edx;
        pop edx;
        pop ecx;
        pop eax;
    }

    msraddr.value_low = lowvalue;
    msraddr.value_high = highvalue;
    /* print the low 32 bits; %x cannot format the whole structure */
    DbgPrint("Address of MSR entry %x is: %x.\r\n", reg, msraddr.value_low);

    /* store old MSR address in global variables, so we can use it later */
    oldMSRAddressL = msraddr.value_low;
    oldMSRAddressH = msraddr.value_high;

    return msraddr;
}
```
In GetMSRAddress we first declare two local variables, lowvalue and highvalue, which hold the lower and higher 32 bits of the MSR value. The inline assembly block saves eax, ecx and edx on the stack, so we can overwrite them without unwanted side effects. Next, we store the number of the MSR we want to read in ecx and execute rdmsr, which reads that MSR into edx:eax. We then copy edx and eax into the local variables, so the rest of the function can use them after the assembly block is done. Finally, we assign the 64-bit value to the MSR object and also store it in the oldMSRAddressL and oldMSRAddressH global variables for later use. The function returns the MSR object filled with the value read from the register.
After reading the value from the MSR, we need to overwrite it with the address of our function, which is done with the wrmsr instruction. wrmsr writes the contents of EDX:EAX into the 64-bit MSR specified in ECX: the high-order 32 bits are taken from EDX and the low-order 32 bits from EAX. Like rdmsr, wrmsr must be executed in privileged mode with an existing MSR address in ECX; otherwise a general protection exception is raised [5]. The following function does this:
```c
/* Writes the 64-bit value in *msr into the MSR whose number is passed in reg. */
void SetMSRAddress(UINT32 reg, PMSR msr)
{
    UINT32 lowvalue;
    UINT32 highvalue;

    lowvalue = msr->value_low;
    highvalue = msr->value_high;

    /* write the MSR: wrmsr stores edx:eax into MSR[ecx] */
    __asm {
        push eax;
        push ecx;
        push edx;
        mov ecx, reg;
        mov eax, lowvalue;
        mov edx, highvalue;
        wrmsr;
        pop edx;
        pop ecx;
        pop eax;
    }

    DbgPrint("Address of MSR entry %x is hooked: %x.\r\n", reg, msr->value_low);
}
```
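Both functions move a 64-bit value through the EDX:EAX register pair. The following user-mode sketch shows that split in plain C; msr_join and msr_split are hypothetical helper names, not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rdmsr: EDX holds bits 63..32 and EAX holds bits 31..0 of the MSR. */
uint64_t msr_join(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | (uint64_t)eax;
}

/* wrmsr expects the same split: high half in EDX, low half in EAX. */
void msr_split(uint64_t value, uint32_t *edx, uint32_t *eax)
{
    assert(edx != NULL && eax != NULL);
    *edx = (uint32_t)(value >> 32);
    *eax = (uint32_t)(value & 0xFFFFFFFFu);
}
```

GetMSRAddress corresponds to msr_join, except that it keeps the two halves separately in the MSR structure, and SetMSRAddress corresponds to msr_split. On 32-bit Windows the high half of IA32_SYSENTER_EIP is 0, so only the low half carries the address.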
To view the values of IA32_SYSENTER_CS, IA32_SYSENTER_EIP and IA32_SYSENTER_ESP in WinDbg, we can use its rdmsr command. The picture below shows their values; IA32_SYSENTER_EIP holds the address 0x82682300.
Let’s take a look at the first few instructions at that address, which belong to the KiFastCallEntry routine. The picture below shows the instructions that are executed first when a system call is made.
The routine first loads 0x30 into the FS register and 0x23 into the DS and ES registers; these segment registers are then used by KiFastCallEntry. On 32-bit Windows, selector 0x30 in FS refers to the segment that holds the processor control region (KPCR). The segments can be printed with the “dg 0 f0” command, as seen below, where the 0x0008 selector specifies the kernel-mode code segment with base address 0x00000000 and limit 0xffffffff.
To look up selector 0x0023, we must first clear its two lowest bits, which hold the Requested Privilege Level (RPL), 3 in this case. This gives us 0x0020, which specifies the user-mode data segment with base address 0x00000000 and limit 0xffffffff. These instructions make sure the right segments are used for memory reads and writes while the system call is handled.
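The selector arithmetic above can be checked with a small C sketch. It assumes the standard x86 selector layout (bits 15..3 are the descriptor index, bit 2 the table indicator, bits 1..0 the RPL); decode_selector and gdt_offset are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t index;  /* bits 15..3: descriptor index */
    uint8_t  ti;     /* bit 2: table indicator, 0 = GDT, 1 = LDT */
    uint8_t  rpl;    /* bits 1..0: requested privilege level */
} SelectorFields;

SelectorFields decode_selector(uint16_t selector)
{
    SelectorFields f;
    f.index = (uint16_t)(selector >> 3);
    f.ti    = (uint8_t)((selector >> 2) & 1u);
    f.rpl   = (uint8_t)(selector & 3u);
    return f;
}

/* Byte offset of the descriptor in the GDT: the selector with TI and RPL
   cleared, since every descriptor is 8 bytes. Only valid for GDT selectors. */
uint16_t gdt_offset(uint16_t selector)
{
    SelectorFields f = decode_selector(selector);
    assert(f.ti == 0);
    return (uint16_t)(f.index * 8u);
}
```

For 0x23 this gives index 4, TI 0 and RPL 3, so the descriptor sits at offset 0x20, while the kernel selectors 0x08 and 0x30 already have RPL 0 and map to themselves.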
Hooking the MSR
Let’s now present the code we used to implement the hookmsr program. First, we introduce a new MSR data type. The MSR structure below has two UINT32 members, value_low and value_high, which together hold the 64-bit content of an MSR: value_low holds the lower 32 bits and value_high the higher 32 bits.
```c
#pragma pack(1)
typedef struct _MSR {
    UINT32 value_low;   /* bits 31..0 (EAX) */
    UINT32 value_high;  /* bits 63..32 (EDX) */
} MSR, *PMSR;
#pragma pack()
```
We defined MSR as the structure _MSR and PMSR as a pointer to it. The #pragma pack directive takes a parameter, here 1, which aligns the members of the structure on a 1-byte boundary, so the compiler inserts no padding bytes between them; #pragma pack() then restores the default packing. Because both members are 4-byte UINT32 values, the structure is 8 bytes long with or without the directive; the directive simply guarantees that the layout matches the 64-bit MSR exactly.
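To confirm the claim about padding, the following sketch checks the structure's layout at compile time with a C11 compiler (GCC, Clang and MSVC all accept #pragma pack). UINT32 is defined from <stdint.h> here as an assumption, since the Windows headers aren't included, and MSR_DEFAULT_PACKING is a hypothetical comparison type.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>

typedef uint32_t UINT32;  /* assumption: stand-in for the Windows typedef */

#pragma pack(1)
typedef struct _MSR {
    UINT32 value_low;   /* bits 31..0 (EAX) */
    UINT32 value_high;  /* bits 63..32 (EDX) */
} MSR, *PMSR;
#pragma pack()

/* The same members without the packing directive, for comparison. */
typedef struct {
    UINT32 value_low;
    UINT32 value_high;
} MSR_DEFAULT_PACKING;

static_assert(sizeof(MSR) == 8, "MSR holds exactly 64 bits");
static_assert(offsetof(MSR, value_high) == 4, "high half follows the low half");
static_assert(sizeof(MSR_DEFAULT_PACKING) == sizeof(MSR),
              "two UINT32 members need no padding either way");
```

If any of these checks failed, the file would not compile, so a successful build shows that the directive changes nothing for this particular structure.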
Next, we define two global variables, oldMSRAddressL and oldMSRAddressH, which hold the original value of the overwritten MSR. We have to store the old address so the hook routine can jump to it: that is the routine that would have been called had we not overwritten the MSR. If we didn’t jump to it at the end of the hook routine, no system call would ever reach the kernel, which would have severe side effects, such as the system no longer responding.
```c
/* Global variables for storing the old MSR value. */
UINT32 oldMSRAddressL = 0;
UINT32 oldMSRAddressH = 0;
```
The function that actually overwrites the pointer stored in the MSR is HookMSR, shown below. It accepts two parameters: reg, the number of the MSR we want to hook, and hookaddr, the address of the hook routine that will be called whenever sysenter is executed.
```c
void HookMSR(UINT32 reg, UINT32 hookaddr)
{
    MSR msraddr;

    /* check if the MSR was already hooked */
    msraddr = GetMSRAddress(reg);
    if(msraddr.value_low == hookaddr) {
        DbgPrint("The MSR register %x already hooked.\r\n", reg);
    }
    else {
        DbgPrint("Hooking MSR register %x: %x --> %x.\r\n", reg, msraddr.value_low, hookaddr);
        /* replace the low 32 bits; the high half is written back unchanged */
        msraddr.value_low = hookaddr;
        SetMSRAddress(reg, &msraddr);
    }
}
```
The msraddr object receives the value of the MSR selected by reg through GetMSRAddress. If that value already equals hookaddr, the MSR has already been hooked; otherwise, the hook is installed by calling SetMSRAddress.
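The decision HookMSR makes can be modeled without touching real hardware. In this hypothetical sketch, a plain uint64_t stands in for the MSR and model_hook_msr mirrors the compare-then-replace logic; like HookMSR, it compares and replaces only the low 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { HOOK_ALREADY_INSTALLED, HOOK_INSTALLED } HookResult;

/* Hypothetical user-mode model of HookMSR operating on a simulated MSR. */
HookResult model_hook_msr(uint64_t *msr_value, uint32_t hookaddr)
{
    uint32_t low, high;

    assert(msr_value != NULL);
    low  = (uint32_t)(*msr_value & 0xFFFFFFFFu);  /* value_low  (EAX) */
    high = (uint32_t)(*msr_value >> 32);          /* value_high (EDX) */

    /* same check as HookMSR: compare only the low 32 bits */
    if (low == hookaddr)
        return HOOK_ALREADY_INSTALLED;

    /* replace the low half and write the high half back unchanged */
    *msr_value = ((uint64_t)high << 32) | hookaddr;
    return HOOK_INSTALLED;
}
```

Calling it a second time with the same address reports that the hook is already in place, and calling it with the saved original address restores the MSR, which is how the driver later unhooks it.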
HookMSR is called from the DriverEntry routine, which is invoked when the driver is loaded into the kernel. The complete DriverEntry source is shown below, but we won’t describe it in detail; for the details, see this article I wrote previously. The important addition is the last line, where HookMSR is actually called. Notice that we pass 0x176 as the first parameter, which identifies the IA32_SYSENTER_EIP MSR. The second parameter is a pointer to the hook routine that will be called whenever sysenter is executed.
```c
NTSTATUS DriverEntry(PDRIVER_OBJECT pDriverObject, PUNICODE_STRING pRegistryPath)
{
    NTSTATUS NtStatus = STATUS_SUCCESS;
    unsigned int uiIndex = 0;
    PDEVICE_OBJECT pDeviceObject = NULL;
    UNICODE_STRING usDriverName, usDosDeviceName;

    DbgPrint("DriverEntry Called\r\n");
    RtlInitUnicodeString(&usDriverName, L"\\Device\\MyDriver");
    RtlInitUnicodeString(&usDosDeviceName, L"\\DosDevices\\MyDriver");

    NtStatus = IoCreateDevice(pDriverObject, 0, &usDriverName, FILE_DEVICE_UNKNOWN,
                              FILE_DEVICE_SECURE_OPEN, FALSE, &pDeviceObject);
    if(NtStatus == STATUS_SUCCESS) {
        /* MajorFunction: is a list of function pointers for entry points into the driver. */
        for(uiIndex = 0; uiIndex < IRP_MJ_MAXIMUM_FUNCTION; uiIndex++)
            pDriverObject->MajorFunction[uiIndex] = MyDriver_UnSupportedFunction;

        /* DriverUnload is required to be able to dynamically unload the driver. */
        pDriverObject->DriverUnload = MyDriver_Unload;
        pDeviceObject->Flags |= 0;
        pDeviceObject->Flags &= (~DO_DEVICE_INITIALIZING);

        /* Create a symbolic link to the device: \DosDevices\MyDriver -> \Device\MyDriver */
        IoCreateSymbolicLink(&usDosDeviceName, &usDriverName);

        /* hook IA32_SYSENTER_EIP (MSR 0x176) */
        HookMSR(0x176, (UINT32)HookRoutine);
    }

    return NtStatus;
}
```
We should also present the MyDriver_Unload function, which is called when the driver is unloaded from the kernel. It has to call IoDeleteSymbolicLink and IoDeleteDevice, but in our case the most important call is again HookMSR. You may be wondering why that function is called when the driver is unloaded. The answer is simple: we have to clean up after ourselves, and the easiest way is to call the existing function with the old MSR value, which points to the original routine. Restoring the MSR when the driver is unloaded is imperative. Otherwise the system would crash, because sysenter would still jump to HookRoutine, whose code is no longer loaded, so the pointer stored in MSR 0x176 would point to invalid memory. Restoring the old value lets the system keep executing sysenter system calls normally.
```c
VOID MyDriver_Unload(PDRIVER_OBJECT DriverObject)
{
    /* local variables */
    UNICODE_STRING usDosDeviceName;

    /* restore the original MSR value (unhook) */
    if(oldMSRAddressL != 0 || oldMSRAddressH != 0) {
        HookMSR(0x176, oldMSRAddressL);
    }

    /* delete the driver */
    DbgPrint("MyDriver_Unload Called\r\n");
    RtlInitUnicodeString(&usDosDeviceName, L"\\DosDevices\\MyDriver");
    IoDeleteSymbolicLink(&usDosDeviceName);
    IoDeleteDevice(DriverObject->DeviceObject);
}
```
Finally, we present HookRoutine, the routine that MSR 0x176 will point to. Its code is shown below, and we can immediately notice that the function is declared naked. This means the compiler won’t add any instructions for creating and destroying the stack frame, like “push ebp”, “mov ebp, esp”, etc. The function consists of a single __asm {} block. It first saves the eax, ecx, edx, ebx, esp, ebp, esi and edi registers on the stack (the pushad instruction), as well as the EFLAGS register (the pushfd instruction). Then it executes the five instructions from the beginning of KiFastCallEntry that we identified earlier, which set up the segment registers. Next, it pushes eax, which holds the system call number, as the argument of DebugPrint and calls DebugPrint, which prints a message in WinDbg using DbgPrint. At the end, it restores EFLAGS (popfd) and the edi, esi, ebp, ebx, edx, ecx and eax registers (popad, which discards the saved esp value), and jumps to the original address saved in oldMSRAddressL.
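```c
__declspec(naked) void HookRoutine()
{
    __asm {
        pushad;
        pushfd;

        /* same segment setup as the start of KiFastCallEntry */
        mov ecx, 0x23;
        push 0x30;
        pop fs;
        mov ds, cx;
        mov es, cx;

        push eax;            /* eax holds the system call number */
        call DebugPrint;     /* assumes DebugPrint pops its own argument (__stdcall, e.g. /Gz);
                                with __cdecl, "add esp, 4" is needed here before popfd */

        popfd;
        popad;
        jmp oldMSRAddressL;  /* continue at the original KiFastCallEntry */
    }
}
```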
We mentioned that HookRoutine calls the DebugPrint function, whose code is presented below. The function simply prints a message saying that we are inside the hook routine, together with the dispatch (system call) number.
```c
void DebugPrint(UINT32 d)
{
    DbgPrint("[*] Inside Hook Routine - dispatch %u called\r\n", d);
    return;
}
```
Loading the Driver
In the previous section we presented the code for hooking MSR 0x176; here we’ll see the driver in action. First, we compile the driver and copy mydriver.sys to the debugged Windows machine. We also need two tools that are invaluable when testing Windows kernel drivers: DebugView and OSR Driver Loader. OSR Driver Loader loads the driver into the kernel, at which point its DriverEntry function is executed. It is shown in the picture below, where we’ve selected the driver (mydriver.sys) in the ‘Driver Path’ field. After selecting the driver, we click the ‘Register Service’ button and then the ‘Start Service’ button. Starting the service loads mydriver.sys into the kernel and calls DriverEntry, which hooks MSR 0x176, so our HookRoutine is called whenever a sysenter instruction is executed on the processor whose MSR was hooked.
To see that in action, we also need to start the DebugView program to view the messages printed by the DbgPrint function. Remember that both the OSR Driver Loader as well as the DebugView program need to be started with Administrator privileges.
After the driver has been loaded into the kernel, the following is printed in WinDbg, which clearly shows that DriverEntry was called. The value of MSR 0x176 was 0x8267e300 and has been replaced with 0x9caa3100, the address of our HookRoutine function.
While MSR 0x176 is hooked, HookRoutine is called every time a sysenter instruction is executed. Because HookRoutine calls DebugPrint, a new line is printed in DebugView for every sysenter. Since sysenter is the primary mechanism for calling from user mode into kernel mode on modern Windows systems, loading the driver produces the result below. Notice the scroll bar on the right: our HookRoutine generated 1023 entries within seconds. The hook made the system completely unresponsive; we can’t do anything in the debugged Windows system, because it is busy executing the DbgPrint calls in DebugPrint.
To overcome this, we should filter the messages, so DbgPrint is executed only for the system calls we’re interested in. To display only every 1000th system call, we can throttle DbgPrint with the code below. We define another global variable, numActions, which is decremented every time sysenter is called. When it reaches 0, a debug message is printed and numActions is reset to 1000.
```c
UINT16 numActions = 1000;

void DebugPrint(UINT32 d)
{
    if(numActions == 0) {
        DbgPrint("[*] Inside Hook Routine - dispatch %u called.\r\n", d);
        numActions = 1000;
    }
    else {
        numActions--;
    }
    return;
}
```
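Counting logic like this is easy to get off by one. The sketch below models DebugPrint in user mode, with the counter passed by pointer instead of living in the global numActions; throttled_should_print and count_prints are hypothetical names. Tracing it shows that, starting from 1000, the first message appears on the 1001st call and then every 1001 calls, so strictly speaking the driver prints on every 1001st system call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when the kernel version would call DbgPrint, 0 otherwise. */
int throttled_should_print(uint16_t *numActions)
{
    assert(numActions != NULL);
    if (*numActions == 0) {
        *numActions = 1000;   /* reset, as DebugPrint does */
        return 1;
    }
    (*numActions)--;
    return 0;
}

/* Number of messages printed for `calls` sysenter calls, starting at 1000. */
unsigned count_prints(unsigned calls)
{
    uint16_t counter = 1000;
    unsigned printed = 0;

    for (unsigned i = 0; i < calls; i++)
        printed += (unsigned)throttled_should_print(&counter);
    return printed;
}
```

If an exact 1-in-1000 rate matters, resetting the counter to 999 instead of 1000 would achieve it.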
Once we reload the driver, DbgPrint is executed only once in a while, roughly every second. The result can be seen in the picture below. This time, the Windows operating system remains perfectly usable and we can interact with it normally.
Let’s also inspect how the MSR register was overwritten by executing the rdmsr command.
```
kd> rdmsr 176
msr[176] = 00000000`988cf110
```
MSR 0x176 now contains 0x988cf110, the address of our HookRoutine function. Below we can see the instructions of that function, displayed with the u command. Notice that they are exactly the instructions we wrote in the __asm{} block; because the function is naked, the compiler didn’t add any stack frame handling instructions.
If we stop the service in OSR Driver Loader, the original value of MSR 0x176 is restored, as can be seen below. We can start and stop the driver any number of times, since it handles hooking and unhooking of the MSR gracefully.
```
kd> rdmsr 176
msr[176] = 00000000`8264b300
```
Handling Multiple Processors
At this point, we also need to talk about multiprocessor systems. When multiple processors are present, every processor has its own set of MSRs, which Windows programs with pointers to the same routines. That way, a system call made with sysenter performs the same action whether it runs on processor 1 or processor 2. Therefore, when hooking an MSR, we need to do so on every processor, so our routine is called no matter which processor executes sysenter. We can do that with one of the following methods:
- Infinite Loop: we can keep launching the hooking thread in an infinite loop; sooner or later it will run on every processor, because the scheduler assigns threads to whichever processor is currently free.
- KeSetAffinityThread: this function sets the affinity mask of a thread, which lets us force a thread to run on a specific processor. In this article we’ll use this method, which is far better than the previous one, but a little bit more complex.
Hooking the MSRs of all processors is quite easy: we spawn a thread that calls HookMSR and wait for it to run on each processor. For that we need a couple of functions, described below.
The first function is the InitializeObjectAttributes, which initializes the OBJECT_ATTRIBUTES structure passed into the function as its first parameter. The syntax of the InitializeObjectAttributes function can be seen below.
The function accepts the following parameters [7]:
- InitializedAttributes: specifies the OBJECT_ATTRIBUTES structure, which will be initialized.
- ObjectName: a pointer to the Unicode string that contains the name of the object for which a handle is to be opened.
- Attributes: specifies the flags to be used, which can be the following:
  - OBJ_INHERIT: the handle can be inherited by child processes.
  - OBJ_PERMANENT: the object is not deleted when all open handles to it are closed.
  - OBJ_EXCLUSIVE: only a single handle can be open for this object.
  - OBJ_CASE_INSENSITIVE: names are compared case-insensitively when ObjectName is compared with other object names.
  - OBJ_OPENIF: if the object already exists, a routine that creates objects opens the existing object instead of failing.
  - OBJ_KERNEL_HANDLE: the handle can only be accessed in kernel mode.
  - OBJ_FORCE_ACCESS_CHECK: all access checks are enforced, even if the handle is opened in kernel mode.
- RootDirectory: a handle to the root object directory for the path name specified in the ObjectName.
- SecurityDescriptor: specifies a security descriptor, which will be used with the object. If NULL, then a default security descriptor is used.
InitializeObjectAttributes initializes the OBJECT_ATTRIBUTES structure, which holds the properties of an object. In our kernel driver, the following call is used:
```c
HANDLE thread;
OBJECT_ATTRIBUTES attrs;
PKTHREAD pkthread;
LARGE_INTEGER timeout;

InitializeObjectAttributes(&attrs, NULL, 0, NULL, NULL);
```
We pass attrs to InitializeObjectAttributes with a NULL ObjectName, NULL RootDirectory and NULL SecurityDescriptor. Because 0 is passed as the Attributes parameter, none of the flags listed above are set for this object.
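InitializeObjectAttributes is a macro that simply fills in the fields of the structure. The sketch below models it in user mode with a simplified stand-in type; OBJECT_ATTRIBUTES_MODEL, HANDLE_MODEL and init_object_attributes_model are hypothetical, and only the field names follow the documented OBJECT_ATTRIBUTES structure.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the WDK types (assumption, not the real headers). */
typedef void *HANDLE_MODEL;

typedef struct {
    unsigned long Length;
    HANDLE_MODEL  RootDirectory;
    const void   *ObjectName;              /* PUNICODE_STRING in the WDK */
    unsigned long Attributes;              /* OBJ_* flags */
    void         *SecurityDescriptor;
    void         *SecurityQualityOfService;
} OBJECT_ATTRIBUTES_MODEL;

/* Models InitializeObjectAttributes(p, n, a, r, s): it only assigns fields. */
void init_object_attributes_model(OBJECT_ATTRIBUTES_MODEL *p, const void *n,
                                  unsigned long a, HANDLE_MODEL r, void *s)
{
    assert(p != NULL);
    p->Length = (unsigned long)sizeof(*p);
    p->RootDirectory = r;
    p->Attributes = a;
    p->ObjectName = n;
    p->SecurityDescriptor = s;
    p->SecurityQualityOfService = NULL;
}
```

With all arguments NULL or 0, as in our driver, the result is an unnamed object description whose only meaningful field is Length.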
The next function is PsCreateSystemThread, which creates a system thread that executes in kernel mode and returns a handle to it. The syntax of the function can be seen below [8].
The PsCreateSystemThread takes the following parameters as input [8]:
- ThreadHandle: a pointer to a variable that receives the thread handle. Once we no longer need the handle, we must close it with the ZwClose function.
- DesiredAccess: the access rights requested for the thread. Normally we would look at [9] for the available access masks, but since we’re creating a thread, we can also use the thread-specific access masks listed in [10], shown below. I won’t describe each of them, because we’ll use THREAD_ALL_ACCESS, which grants all possible access rights to the thread object.
  - SYNCHRONIZE
  - THREAD_ALL_ACCESS
  - THREAD_DIRECT_IMPERSONATION
  - THREAD_GET_CONTEXT
  - THREAD_IMPERSONATE
  - THREAD_QUERY_INFORMATION
  - THREAD_QUERY_LIMITED_INFORMATION
  - THREAD_SET_CONTEXT
  - THREAD_SET_INFORMATION
  - THREAD_SET_LIMITED_INFORMATION
  - THREAD_SET_THREAD_TOKEN
  - THREAD_SUSPEND_RESUME
  - THREAD_TERMINATE
- ObjectAttributes: points to the previously created structure specifying object’s attributes.
- ProcessHandle: an open handle to the process in whose address space the thread will run. For a driver-created thread this should be NULL, in which case the thread is created in the initial system process.
- ClientId: a pointer to a structure that receives the client identifier of the new thread. Drivers should specify NULL here.
- StartRoutine: the entry point, which will be executed upon starting the thread.
- StartContext: an argument passed to the thread when the thread is started.
In our code we’re calling the PsCreateSystemThread as presented below.
```c
PsCreateSystemThread(&thread, THREAD_ALL_ACCESS, &attrs, NULL, NULL,
                     (PKSTART_ROUTINE)AllCPUsInfiniteLoop, (PVOID)hookaddr);
```
The next function we have to look at is ObReferenceObjectByHandle, which validates access to an object through its handle and, if access is granted, returns a pointer to the object’s body and increments the object’s reference count. On success it returns STATUS_SUCCESS; otherwise it may return one of the following error codes: STATUS_OBJECT_TYPE_MISMATCH, STATUS_ACCESS_DENIED or STATUS_INVALID_HANDLE [11].
The parameters passed to the ObReferenceObjectByHandle function are explained below [11]:
- Handle: specifies an open handle to the object.
- DesiredAccess: specifies the requested access to the object; we use THREAD_ALL_ACCESS, since we want complete access to the object.
- ObjectType: a pointer to the object type, which can be NULL, in which case the operating system does not verify that the supplied object type matches the object specified by the Handle argument.
- AccessMode: specifies the access mode used for the access check, which must be either UserMode or KernelMode.
- Object: a pointer to a variable that receives a pointer to the object’s body. The type of that pointer depends on the ObjectType parameter; in our case the thread object is stored in a PKTHREAD variable.
- HandleInformation: since we’re programming a kernel driver, we set this to NULL.
In our code we call ObReferenceObjectByHandle as shown below, passing it the open handle thread. The call stores the pointer to the object’s body in the pkthread variable.
```c
ObReferenceObjectByHandle(thread, THREAD_ALL_ACCESS, NULL, KernelMode, (PVOID *)&pkthread, NULL);
```
The next function is KeWaitForSingleObject, which puts the current thread into a wait state until the given dispatcher object is set to the signaled state (a thread object is signaled when the thread terminates) or until the optional timeout expires. The syntax of the function can be seen below [12].
The parameters passed to the KeWaitForSingleObject function are the following [12]:
- Object: a pointer to an initialized dispatcher object, which can be an event, mutex, semaphore, thread or timer.
- WaitReason: specifies a reason for the wait. Since we’re programming a kernel driver, we should set this to Executive.
- WaitMode: specifies whether the caller waits in UserMode or KernelMode. Since we’re calling from the kernel driver, we should specify KernelMode.
- Alertable: specifies whether the wait is alertable (TRUE) or not (FALSE). Since the wait is not alertable in our case, we should use FALSE.
- Timeout: a pointer to the timeout value, expressed in 100-nanosecond units. A negative value specifies an interval relative to the current time, and a positive value an absolute time. If the value is 0, the routine returns without waiting, while passing NULL waits indefinitely until the dispatcher object is signaled.
In our kernel driver, we call KeWaitForSingleObject as follows, passing it the pkthread pointer obtained earlier.
```c
KeWaitForSingleObject(pkthread, Executive, KernelMode, FALSE, &timeout);
```
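The timeout variable passed above has to hold a valid value before the call. As a sketch, assuming we want a relative timeout given in milliseconds, the value is converted to 100-nanosecond units and negated; set_relative_timeout is a hypothetical helper, and the int64_t it writes stands in for LARGE_INTEGER.QuadPart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes a relative timeout of `milliseconds` into *quad_part
   (stand-in for LARGE_INTEGER.QuadPart). Units are 100 ns;
   a negative value means "relative to now". */
void set_relative_timeout(int64_t *quad_part, uint32_t milliseconds)
{
    assert(quad_part != NULL);
    /* 1 ms = 1,000,000 ns = 10,000 units of 100 ns */
    *quad_part = -(int64_t)milliseconds * 10000;
}
```

In the driver, the equivalent is assigning the computed value to timeout.QuadPart before calling KeWaitForSingleObject; a 500 ms wait, for example, becomes -5,000,000.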
The next function is KeQueryActiveProcessors, which returns a bitmask of the currently active processors; its syntax is presented below [13]. The function doesn’t accept any parameters and returns a KAFFINITY value representing the set of currently active processors. KAFFINITY is a typedef for ULONG_PTR, which is 32 bits on 32-bit Windows and 64 bits on 64-bit Windows; each bit is set to 1 if the corresponding processor is active.
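Reading such a mask comes down to simple bit operations. In the following user-mode sketch, AFFINITY_MASK is an assumed stand-in for KAFFINITY (both are pointer-sized unsigned integers), and processor_is_active and active_processor_count are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for KAFFINITY (ULONG_PTR): 32 or 64 bits depending on the platform. */
typedef uintptr_t AFFINITY_MASK;

/* Returns 1 if processor number `cpu` has its bit set in the mask. */
int processor_is_active(AFFINITY_MASK mask, unsigned cpu)
{
    assert(cpu < sizeof(AFFINITY_MASK) * 8);  /* keep the shift in range */
    return (int)((mask >> cpu) & 1u);
}

/* Counts the active processors, i.e. the set bits in the mask. */
unsigned active_processor_count(AFFINITY_MASK mask)
{
    unsigned count = 0;

    while (mask != 0) {
        mask &= mask - 1;  /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

On a two-processor machine the mask would be 0x3, so active_processor_count reports 2.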
The KeGetCurrentThread routine identifies the current thread and returns a pointer to its thread object (PKTHREAD). In our code we also define KeSetAffinityThread as a pointer type for a function with the stdcall calling convention. KeSetAffinityThread takes two parameters: the thread, of type PKTHREAD, and the affinity, of type KAFFINITY.
```c
typedef NTSTATUS (__stdcall *KeSetAffinityThread)(
    PKTHREAD thread,
    KAFFINITY affinity
);
```
In the AllCPUs function, we obtain KeSetAffinityThread as follows. We call MmGetSystemRoutineAddress, which returns a pointer to the routine named by the str variable, set to “KeSetAffinityThread” in our case. If the name can be resolved, a pointer to that routine is returned; otherwise NULL is returned. Drivers typically use this function to check whether a routine is available on a specific version of Windows; it works only for routines exported by the kernel or HAL, not for driver-defined routines [14].
```c
KeSetAffinityThread KeSetAffinityThreadObj;
UNICODE_STRING str;

RtlInitUnicodeString(&str, L"KeSetAffinityThread");
KeSetAffinityThreadObj = (KeSetAffinityThread)MmGetSystemRoutineAddress(&str);
```
We’ve just used KeSetAffinityThread, an undocumented function exported by the kernel, which we resolve dynamically at run time instead of linking against it directly. In user mode, dynamically linking against a DLL function means first calling LoadLibrary to load the DLL into the virtual address space and then calling GetProcAddress, which looks the function up in the DLL’s export table. In kernel mode, MmGetSystemRoutineAddress does this for us, so we get the address of KeSetAffinityThread practically for free.
If we follow the AllCPUs routine in WinDbg, it translates to the following assembly instructions. The highlighted instruction is the call to the MmGetSystemRoutineAddress function: the str variable describing “KeSetAffinityThread” is located at the [ebp-1c] stack address, while the pointer returned by the call is stored in KeSetAffinityThreadObj at the [ebp-14] stack address.
Later, the code calls the address stored at [ebp-14], that is, KeSetAffinityThreadObj, which holds a pointer to the KeSetAffinityThread kernel routine. The picture below shows where KeSetAffinityThread is called.
If we place an “int 3” instruction before that call in our C++ code, the debugger will break at that point when the driver is loaded into the kernel. This is shown in the picture below.
If we run the u command at the address of the “int 3” instruction, we can see that we’re at the same location we found earlier. Don’t be bothered by the different virtual address: we’ve reloaded the driver, so it was loaded at a different location in memory.
Let’s now step through the next four instructions, which set up the parameters before “KeSetAffinityThreadObj(thread, curCPU)” is called. In the picture below, the first register pushed is edx, which holds curCPU: with stdcall, arguments are pushed from right to left, so the second argument goes onto the stack first. The other pushed value is thread, which is a pointer to the thread object (PKTHREAD) rather than a handle.
Just before executing the call instruction, let’s take a look at what’s located at the [ebp-14h] stack address by using the dd command.
```
kd> dd ebp-0x14 l1
9cb69d3c  8263aa48
```
Notice that the value 0x8263aa48 is stored at that location, which means the KeSetAffinityThread function is located at address 0x8263aa48. Let’s now disassemble that function with the u command.
We’ve now got our hands on the KeSetAffinityThread function, which sets the affinity of a thread so that the thread runs only on the chosen processors. We’ll look at the C++ code that uses it next.
After that we enter a for loop that counts from 0 to 31, selecting each processor in turn by ANDing the local variable processors with a single-bit mask. Note that on 32-bit Windows the processors variable holds 32 bits, where each bit is either 1 (processor present) or 0 (processor not present), so the number of bits set in processors equals the number of active logical processors in the system. By shifting the number 1 left by 0, 1, 2, …, 31 bits, we effectively select each processor in the system. (On 64-bit Windows, KAFFINITY has 64 bits, so the loop would need 64 iterations.)
The code that goes over all processors can be seen below. The loop body contains several DbgPrint calls, which are there for debugging purposes to let us know what’s going on, as well as the KeSetAffinityThreadObj call that pins the current thread to a specific processor before HookMSR is called.
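```c
for (i = 0; i < 32; i++) {
    /* Isolate bit i; non-zero only if logical processor i is active.
       Shift a KAFFINITY rather than an int so that bit 31 is well defined. */
    curCPU = processors & ((KAFFINITY)1 << i);
    if (curCPU != 0) {
        DbgPrint("Logical processor 0x%x hooked\n", curCPU);
        /* Pin this thread to processor i so HookMSR writes that CPU's MSR. */
        KeSetAffinityThreadObj(thread, curCPU);
        DbgPrint("Thread Argument: reg : %x.\r\n", thread_arg->reg);
        DbgPrint("Thread Argument: hook address : %x.\r\n", thread_arg->hookaddr);
        HookMSR(thread_arg->reg, thread_arg->hookaddr);
    }
}
```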
Once the for loop has finished, we reset the affinity by calling KeSetAffinityThreadObj again with the full processors mask. This lets the rest of the thread run on any active processor, as chosen by the scheduler. Finally, we terminate the currently running system thread by calling the PsTerminateSystemThread routine.
```c
KeSetAffinityThreadObj(thread, processors);
PsTerminateSystemThread(STATUS_SUCCESS);
```
If we compile and reload the driver, the following is printed to the WinDbg output. Notice that the DriverEntry routine was called and the thread executed on logical processor 1. Then the MSR entry 0x176 (IA32_SYSENTER_EIP) was hooked: its previous value 0x8265e300 was overwritten with 0x96ef1110. Once hooking was complete, the “Inside Hook Routine” message started being printed on every 1000th call.
Conclusion
In this article we’ve seen how one can hook MSR entries to gain control over the execution of system calls. At the beginning of the article we looked at hooking the MSR entry on a single-processor system, where we didn’t have to take multiple processors into account. Later on, we wrote the code to hook the same MSR entry on all processors, ensuring that the same HookRoutine is called no matter which processor handles the system call.
Keep in mind that detecting an MSR hook is trivial, because the pointers stored in these MSRs should point into the ntoskrnl.exe module. After hooking, the MSR pointer points into MyDriver instead, which lies outside ntoskrnl.exe. To detect whether the MSR pointers have been hooked, we therefore only need a function that reads the MSR pointers on every processor and checks whether they actually point into the ntoskrnl.exe module. We can visit every processor with the same KeSetAffinityThread routine we used for hooking in the first place.
Remember that once code runs in kernel mode, it has access to all kernel structures, which means that a driver loaded into the kernel can do a lot of damage, especially on 32-bit Windows 7 and older versions, where drivers need not be signed in order to be loaded into the kernel (64-bit editions have enforced kernel-mode driver signing since Windows Vista). On Windows 8 and later, drivers are required to be signed by a known certificate authority in order to be loaded into the kernel, which means that attackers first need to obtain a valid certificate to sign the driver with. But that is just another layer of defense and can be circumvented: all we need to do is sign the driver with a valid, trusted certificate. How to get our hands on such a certificate is another story, but in the end we can also order a certificate from a certificate authority ourselves.
References
[1] Model-specific register, https://en.wikipedia.org/wiki/Model-specific_register.
[2] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1, http://www.intel.com/Assets/ja_JP/PDF/manual/253668.pdf.
[3] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C: System Programming Guide, Part 3, http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf.
[4] RDMSR, http://faydoc.tripod.com/cpu/rdmsr.htm.
[5] WRMSR, http://faydoc.tripod.com/cpu/wrmsr.htm.
[6] Syscall hooking via MSRs, http://www.blizzhackers.cc/viewtopic.php?t=392361.
[7] InitializeObjectAttributes macro, http://msdn.microsoft.com/en-us/library/windows/hardware/ff547804(v=vs.85).aspx.
[8] PsCreateSystemThread routine, http://msdn.microsoft.com/en-us/library/windows/hardware/ff559932(v=vs.85).aspx.
[10] Thread Security and Access Rights, http://msdn.microsoft.com/en-us/library/windows/desktop/ms686769(v=vs.85).aspx.
[11] ObReferenceObjectByHandle routine, http://msdn.microsoft.com/en-us/library/windows/hardware/ff558679(v=vs.85).aspx.
[12] KeWaitForSingleObject routine, http://msdn.microsoft.com/en-us/library/windows/hardware/ff553350(v=vs.85).aspx.
[13] KeQueryActiveProcessors routine, http://msdn.microsoft.com/en-us/library/windows/hardware/ff553001(v=vs.85).aspx.
[14] MmGetSystemRoutineAddress routine, http://msdn.microsoft.com/en-us/library/windows/hardware/ff554563(v=vs.85).aspx.