Owning the Image Object File Format, the Compiler Toolchain, and the Operating System
As the Windows kernel continues to pursue in its quest for ever-stronger security features and exploit mitigations, the existence of fixed addresses in memory continues to undermine the advances in this area, as attackers can use data corruption vulnerabilities and combine these with stack and instruction pointer control in order to bypass SMEP, DEP, and countless other architectural defense-in-depth techniques. In some cases, entire mitigations (such as CFG) are undone due to their reliance on a single, well-known static address. In the latest builds of Windows 10 Redstone 1, aka “Anniversary Update”, the kernel takes a much stronger toward Kernel Address Space Layout Randomization (KASLR), employing an arsenal of tools that can only be available to an operating system developer that also happens to own the world’s most commercially successful compiler, and the world’s most pervasive executable object image format.The Page Table Entry Array
One of the most unique aspects of the Windows kernel is the reliance on a fixed kernel address to represent the virtual base address of an array of page table entries that describes the entire virtual address space, and the usage of a self-referencing entry which acts as a pivot describing the page directory for the space itself (and, on x64 systems, describing the page directory table itself, and the page map level 4 itself). This elegant solutions allows instant O(1) translation of any virtual address to its corresponding PTE, and with the correct shifts and base addresses, a conversion into the corresponding PDE (and PPE/PXE on x64 systems). For example, the function MmGetPhysicalAddress only needs to work as follows:

- Using virtual-mapped tables based on the EPROCESS structure (as Linux and OS X do) causes significant performance impact, as pointer chasing the different tables now causes cache misses and page translations. This becomes even worse when thinking about multi-processor systems, and the cache waste that this causes (where the TLB may end up getting filled with the various global (locked) pages corresponding to the page tables of various processes, instead of only the current process).
- Changing the address of the PTE array has a number of compatibility concerns, as PTE_BASE is actually documented in ntddk.h, part of the Windows Driver Kit. Additionally, once the new address is discovered, attackers can simply adjust their exploits to use the appropriate static address based on the version of the operating system.
- Randomizing the address of the PTE array means that Windows memory manager functions can no longer use a static constant or preprocessor definition for the address, but must instead access a global address which contains it. Forcing every processor to dereference a single global address every single time a virtual memory operation (allocation, protection, page walk, fault, etc…) is performed is a significantly negative performance hit, especially on multi-socket, NUMA systems.
- Dealing with the global variable problem above by creating cache-aligned copies of the address in a per-processor structure causes a waste of precious kernel storage space (for example, assuming a 64-byte cache line and 640 processors, 40KB of physical memory are used to replicate the variable most efficiently). However, on NUMA systems, one would also want the page containing this data to be local to the node, so we might imagine an overhead of 4KB per socket. In practice, this wouldn’t be quite as bad, as Windows already has a per-NUMA-node-allocated, per-processor, cache-aligned list of critical kernel variables: the Kernel Processor Region Control Block (KPRCB).
Dynamic Relocation Generation
When the Portable Executable (PE) file format was created, its designers realized an important issue: if compiled code made absolute references to data or functions, these hardcoded pointer values might become invalid if the operating system loaded the executable binary at a different base address than its preferred address. Originally a corner case, the advent of user-mode ASLR made this a common occurrence and new reality. In order to deal with such rebasing operations, the PE format includes the definition of a special data directory entry called the Relocation Table Directory (IMAGE_DIRECTORY_ENTRY_BASERELOC). In turn, this directory includes a number of tables, each of which is an array of entries. Each entry ultimately describes the offset of a piece of code that is accessing an absolute virtual address, and the required adjustment that is needed to fixup the address. On a modern x64 binary, the only possible fixup is an absolute delta (increment or decrement), but more exotic architectures such as MIPS and ARM had different adjustments based on how absolute addresses were encoded on such processors). These relocations work great to adjust hardcoded virtual addresses that correspond to code or data within the image itself – but if there is a hard-coded access to 0xC0000000, an address which the compiler has no understanding of, and which is not part of the image, it can’t possibly emit relocations for it – this is a meaningless data dereference. But what if it could? In such an implementation, all accesses to a particular magic hardcoded address could be described as such to the compiler, which could then work with the linker to generate a similar relocation table – but instead of describing addresses within the image, it would describe addresses outside of the image, which, if known and understood by the PE parser, could be adjusted to the new location of the hard-coded external data address. By doing so, compiled code would continue to access what appears to be a single literal value, and no global variable would ever be needed, cancelling out any disadvantages associated with the randomization of this address. Indeed, the new build of the Microsoft C Compiler, which is expected to ship with Visual Studio 15 (now in preview), address a special annotation that can be associated with constant values that correspond to external virtual addresses. Upon usage of such a constant, the compiler will ensure that accesses are done in a way that does not “break up” the address, but rather causes its absolute value to be expressed in code (i.e.: “mov rax, 0xC0000000”). Then, the linker collects the RVAs of such locations and builds structures of type IMAGE_DYNAMIC_RELOCATION_ENTRY, as shown below:

