Shadow Stack to fight buffer overflows

Shadow stack (let's call it that for now) is a new mitigation present in the latest x64 Windows 10 Insider builds. It prevents stack-based buffer overflows from being exploited by overwriting the return address. The function prologue saves the return address to a shadow stack. Before ret, the return address on the stack is compared with the copy saved on the shadow stack.

It is implemented using the fs register as the delta to the shadow stack. The fs base should hold the delta between the stack (rsp) and the shadow stack, so fs:[rsp] addresses the shadow copy of the slot at [rsp]. Currently, that value is 0, which means fs:[rsp] and [rsp] refer to the same address.

Just as a reminder, on x64 (long mode) fs and gs are not resolved through their index in the GDT. Their bases are determined by MSRs:

    gs = IA32_GS_BASE and IA32_KERNEL_GS_BASE (exchanged by swapgs)
    fs = IA32_FS_BASE

The good thing about this mitigation is that even if you can leak the stack cookie and reuse it during a buffer overflow, you can neither leak the address of the shadow stack nor modify it.

Example:

    .text:00007FFA8DA7CFA0     mov     rax, [rsp+0]      <--- get return address
    .text:00007FFA8DA7CFA4     mov     fs:[rsp], rax     <--- save it to shadow stack
    .text:00007FFA8DA7CFA9     push    rbx
    .text:00007FFA8DA7CFAA     sub     rsp, 20h
    .text:00007FFA8DA7CFAE     mov     ebx, edx
    .text:00007FFA8DA7CFB0     cmp     edx, 1
    .text:00007FFA8DA7CFB3     jz      short loc_7FFA8DA7CFD0
    .text:00007FFA8DA7CFB5
    .text:00007FFA8DA7CFB5 loc_7FFA8DA7CFB5:
    .text:00007FFA8DA7CFB5     mov     edx, ebx
    .text:00007FFA8DA7CFB7     add     rsp, 20h
    .text:00007FFA8DA7CFBB     pop     rbx
    .text:00007FFA8DA7CFBC     mov     r11, fs:[rsp]     <--- get saved return address
    .text:00007FFA8DA7CFC1     cmp     r11, [rsp+0]      <--- compare with the one on stack
    .text:00007FFA8DA7CFC5     jnz     j_RtlFailFast

This new mitigation is applied to the image by ntoskrnl, and IMAGE_LOAD_CONFIG_DIRECTORY has been extended to contain new data. Some of the new fields are related to relocations. For others I'm not sure yet. Once symbols for the latest Insider build are available, maybe this "random" data will make more sense.

Of course, this new mitigation can also reduce the number of usable gadgets in programs that don't have CFG enabled.
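To make the check concrete, here is a toy Python model of the prologue and epilogue above. It is a simulation, not Windows code. The address space is a dict of mapped qwords, and the fs base is a plain integer. The names (Machine, run_frame, outcome, FailFast, AccessViolation) and the non-zero delta used in demo() are hypothetical. Reading an unmapped address raises AccessViolation. A mismatch raises FailFast, which stands in for the jump to j_RtlFailFast.

```python
class AccessViolation(Exception):
    """Touching an unmapped address (models ACCESS_VIOLATION)."""


class FailFast(Exception):
    """Shadow copy differs from stack copy (models the jnz j_RtlFailFast path)."""


MASK64 = (1 << 64) - 1


class Machine:
    """Toy address space: only the qwords present in `mem` are mapped."""

    def __init__(self, fs_base):
        self.fs_base = fs_base   # delta between rsp and the shadow stack
        self.mem = {}            # address -> 64-bit value

    def map_qwords(self, start, count):
        for i in range(count):
            self.mem.setdefault(start + 8 * i, 0)

    def read(self, addr):
        if addr not in self.mem:
            raise AccessViolation(hex(addr))
        return self.mem[addr]

    def write(self, addr, value):
        if addr not in self.mem:
            raise AccessViolation(hex(addr))
        self.mem[addr] = value & MASK64

    def fs(self, rsp):
        # fs:[rsp] resolves to the linear address fs_base + rsp
        return (self.fs_base + rsp) & MASK64

    def prologue(self, rsp):
        rax = self.read(rsp)                # mov rax, [rsp+0]
        self.write(self.fs(rsp), rax)       # mov fs:[rsp], rax

    def epilogue(self, rsp):
        r11 = self.read(self.fs(rsp))       # mov r11, fs:[rsp]
        if r11 != self.read(rsp):           # cmp r11, [rsp+0]
            raise FailFast(hex(r11))        # jnz j_RtlFailFast
        return r11                          # retn would return here


def outcome(action):
    """Run `action` and report how the simulated frame ended."""
    try:
        action()
        return "ret"
    except FailFast:
        return "RtlFailFast"
    except AccessViolation:
        return "ACCESS_VIOLATION"


def run_frame(m, rsp, ret_addr, overwrite=None):
    """call -> prologue -> optional overflow of [rsp] -> epilogue."""
    def body():
        m.write(rsp, ret_addr)              # return address pushed by call
        m.prologue(rsp)
        if overwrite is not None:
            m.write(rsp, overwrite)         # overflow clobbers the return address
        m.epilogue(rsp)
    return outcome(body)


def demo():
    delta = 0x1000_0000                     # hypothetical non-zero fs base
    stack_lo, heap = 0x7FF0_0000, 0x2000_0000
    rsp, ret_addr = stack_lo + 0x80, 0x7FFA_8DA7_D000
    evil = 0x4141_4141_4141_4141

    m = Machine(delta)
    m.map_qwords(stack_lo, 32)              # regular stack
    m.map_qwords(stack_lo + delta, 32)      # shadow stack at stack + delta
    m.map_qwords(heap, 4)                   # attacker buffer used as a pivoted stack

    zero = Machine(0)                       # delta 0, as observed in current builds
    zero.map_qwords(stack_lo, 32)

    return {
        "clean": run_frame(m, rsp, ret_addr),
        "overflow": run_frame(m, rsp, ret_addr, overwrite=evil),
        "overflow, delta 0": run_frame(zero, rsp, ret_addr, overwrite=evil),
        "pivot": outcome(lambda: m.epilogue(heap)),
    }


if __name__ == "__main__":
    print(demo())
```

With a non-zero delta, the overwritten return address ends in RtlFailFast. With a delta of 0, the model returns normally even after the overwrite, because the prologue and the epilogue both read the slot the overflow just wrote.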
Maybe not intended, but another benefit is that it will crash the program if the stack is pivoted. The access to fs:[rsp] will lead to an ACCESS_VIOLATION. In the best case, it will read from some unrelated location that happens to be mapped, so the comparison fails and RtlFailFast is called. Example (not a nice gadget, but a good example of how it prevents gadget execution):

    .text:00007FFA8DA80BBD     pop     rdi
    .text:00007FFA8DA80BBE     pop     rbx
    .text:00007FFA8DA80BBF     mov     r11, fs:[rsp]
    .text:00007FFA8DA80BC4     cmp     r11, [rsp+0]
    .text:00007FFA8DA80BC8     jnz     j_RtlFailFast
    .text:00007FFA8DA80BCE     retn

The prolog/epilog of an image looks like this on disk and in memory:

    +----------------------------------------+-----------------------------------+
    | on disk                                | in memory                         |
    +----------------------------------------+-----------------------------------+
    | xchg ax, ax                            | mov rax, [rsp+0]                  |
    | nop dword ptr [rax+00000000h]          | mov fs:[rsp], rax                 |
    +----------------------------------------+-----------------------------------+
    | jmp sub_18000D1E0                      | mov r11, fs:[rsp]                 |
    | nop                                    | cmp r11, [rsp+0]                  |
    | nop                                    | jnz j_RtlFailFast                 |
    | nop                                    |                                   |
    | nop                                    |                                   |
    | nop                                    |                                   |
    | nop                                    |                                   |
    | nop                                    |                                   |
    | nop                                    |                                   |
    | nop                                    |                                   |
    | nop                                    |                                   |
    | jmp sub_18000D1E0                      |                                   |
    +----------------------------------------+-----------------------------------+

I implemented something similar a long time ago using DynamoRIO for 32-bit x86. Because of the performance cost, that code didn't make much sense in real-life scenarios.
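The two columns of the table have the same length, so the placeholders can be overwritten in place.

- The prolog placeholder (66 90 / 0F 1F 80 00 00 00 00) and the in-memory prolog (48 8B 04 24 / 64 48 89 04 24) are both 9 bytes.
- jmp rel32 plus ten nops is 15 bytes. That is exactly the size of mov r11, fs:[rsp] / cmp r11, [rsp+0] / jnz rel32.

The instruction addresses in the listings above are consistent with these sizes. Below is a hypothetical Python patcher built on that observation. It assumes each nop is the 1-byte 0x90 and that patch sites are already known as RVAs. It does not decode the real patch-site data that ntoskrnl uses, and patch_image and mem_epilogue are illustrative names.

```python
import struct

# On-disk placeholders (left column of the table)
DISK_PROLOGUE = bytes.fromhex("6690" "0f1f8000000000")   # xchg ax, ax / nop dword ptr [rax+0]
NOP10 = b"\x90" * 10                                      # assumption: ten 1-byte nops after jmp rel32

# In-memory sequences (right column of the table)
MEM_PROLOGUE = bytes.fromhex("488b0424" "6448890424")     # mov rax, [rsp+0] / mov fs:[rsp], rax
EPILOGUE_CHECK = bytes.fromhex("644c8b1c24" "4c3b1c24")   # mov r11, fs:[rsp] / cmp r11, [rsp+0]


def mem_epilogue(site, failfast):
    """Check sequence for an epilogue at RVA `site`, ending in jnz rel32 to `failfast`."""
    jnz_end = site + len(EPILOGUE_CHECK) + 6              # rel32 is relative to the next instruction
    return EPILOGUE_CHECK + b"\x0f\x85" + struct.pack("<i", failfast - jnz_end)


def patch_image(image, prologues, epilogues, failfast):
    """Overwrite placeholders in a mapped image (bytearray indexed by RVA), in place."""
    for site in prologues:
        if image[site:site + 9] != DISK_PROLOGUE:
            raise ValueError(f"no prologue placeholder at {site:#x}")
        image[site:site + 9] = MEM_PROLOGUE
    for site in epilogues:
        # jmp rel32 (E9 + 4 bytes) followed by ten nops = 15 bytes
        if image[site] != 0xE9 or image[site + 5:site + 15] != NOP10:
            raise ValueError(f"no epilogue placeholder at {site:#x}")
        image[site:site + 15] = mem_epilogue(site, failfast)
```

Each replacement is the same length as its placeholder, so no surrounding code has to move.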
Quick description of some fields in the extended IMAGE_LOAD_CONFIG_DIRECTORY:

    .rdata:00000000001C2190 _load_config_used dd 0E8h                             ; Size
    .rdata:00000000001C2194                   dd 0                                ; Time stamp
    .rdata:00000000001C2198                   dw 2 dup(0)                         ; Version: 0.0
    .rdata:00000000001C219C                   dd 0                                ; GlobalFlagsClear
    .rdata:00000000001C21A0                   dd 0                                ; GlobalFlagsSet
    .rdata:00000000001C21A4                   dd 0                                ; CriticalSectionDefaultTimeout
    .rdata:00000000001C21A8                   dq 0                                ; DeCommitFreeBlockThreshold
    .rdata:00000000001C21B0                   dq 0                                ; DeCommitTotalFreeThreshold
    .rdata:00000000001C21B8                   dq 0                                ; LockPrefixTable
    .rdata:00000000001C21C0                   dq 0                                ; MaximumAllocationSize
    .rdata:00000000001C21C8                   dq 0                                ; VirtualMemoryThreshold
    .rdata:00000000001C21D0                   dq 0                                ; ProcessAffinityMask
    .rdata:00000000001C21D8                   dd 0                                ; ProcessHeapFlags
    .rdata:00000000001C21DC                   dw 0                                ; CSDVersion
    .rdata:00000000001C21DE                   dw 0                                ; Reserved1
    .rdata:00000000001C21E0                   dq 0                                ; EditList
    .rdata:00000000001C21E8                   dq offset __security_cookie         ; SecurityCookie
    .rdata:00000000001C21F0                   dq 0                                ; SEHandlerTable
    .rdata:00000000001C21F8                   dq 0                                ; SEHandlerCount
    .rdata:00000000001C2200                   dq offset __guard_check_icall_fptr  ; GuardCFCheckFunctionPointer
    .rdata:00000000001C2208                   dq offset off_1C5498                ; Reserved2
    .rdata:00000000001C2210                   dq offset __guard_fids_table        ; GuardCFFunctionTable
    .rdata:00000000001C2218                   dq 9A1h                             ; GuardCFFunctionCount
    .rdata:00000000001C2220                   dd 10033500h                        ; GuardFlags
    .rdata:00000000001C2224                   dd 0
    .rdata:00000000001C2228                   dd 0
    .rdata:00000000001C222C                   dd 0
    .rdata:00000000001C2230                   dd 0
    .rdata:00000000001C2234                   dd 0
    .rdata:00000000001C2238                   dd 0
    .rdata:00000000001C223C                   dd 0
    .rdata:00000000001C2240                   dd 0
    .rdata:00000000001C2244                   dd 0
    .rdata:00000000001C2248                   dd 0
    .rdata:00000000001C224C                   dd 0
    .rdata:00000000001C2250                   dq 0                                <---- ntoskrnl writes the delta between the relocated image and the ImageBase from the PE header here
    .rdata:00000000001C2258                   dq 0
    .rdata:00000000001C2260                   dq offset sub_14C6D0                <---- RtlFailFast
    .rdata:00000000001C2268                   dq offset off_1C54A0                <---- pointer to a variable holding the RtlFailFast address
    .rdata:00000000001C2270                   dd 0A78h                            <---- offset to patch data (described below)
    .rdata:00000000001C2274                   dd 6
    .rdata:00000000001C2278                   dd 0
    .rdata:00000000001C227C                   dd 0

The dd 0A78h field is the offset to the data that describes where the patch bytes go. It is an offset from the start of the relocations in the image. This data comes right after the relocations, and from what I've seen, the offset matches the size of the relocations in the PE header.

That's it for now...

deroko of ARTeam
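The following is a small Python sketch of reading the new load-config fields, assuming the layout shown in the listing. Offsets are relative to the start of _load_config_used: 0x1C2250 - 0x1C2190 = 0xC0. The sample Size of 0E8h ends right after the dd 6 field at 0x1C2274. The field names are hypothetical labels for the comments in the listing, not official names. reloc_rva and reloc_size stand for the base relocation directory entry from the PE header, which you would read yourself.

```python
import struct
from typing import NamedTuple


class ExtendedTail(NamedTuple):
    """New fields at offsets 0xC0..0xE7; names are hypothetical, taken from the listing notes."""
    reloc_delta: int         # +0xC0 dq: ntoskrnl writes the relocation delta here
    unknown_c8: int          # +0xC8 dq: 0 in the sample
    failfast: int            # +0xD0 dq: address of RtlFailFast (sub_14C6D0)
    failfast_ptr: int        # +0xD8 dq: address of off_1C54A0
    patch_data_offset: int   # +0xE0 dd: patch data offset, relative to the start of relocs
    unknown_e4: int          # +0xE4 dd: 6 in the sample


TAIL_OFFSET = 0xC0                     # 0x1C2250 - 0x1C2190 in the listing
TAIL = struct.Struct("<QQQQII")        # 40 bytes: 0xC0..0xE8


def parse_tail(load_config):
    """Decode the new fields from the raw bytes of _load_config_used."""
    (size,) = struct.unpack_from("<I", load_config, 0)
    if size < TAIL_OFFSET + TAIL.size:
        raise ValueError(f"load config too small: {size:#x}")
    return ExtendedTail(*TAIL.unpack_from(load_config, TAIL_OFFSET))


def patch_data_rva(tail, reloc_rva):
    """RVA of the patch data, given the base relocation directory RVA."""
    return reloc_rva + tail.patch_data_offset


def follows_relocs(tail, reloc_size):
    """The observation in the text: the offset equals the relocation directory size."""
    return tail.patch_data_offset == reloc_size
```

The Size check protects against older load-config layouts that end before offset 0xE8.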