A quick memory allocator primerWriting memory allocators is a challenging task. An allocator must be performant, limit fragmentation to a bare minimum, scale up to multi-thread applications and handle efficiently both small and large allocations and frequent/unfrequent alloc/free cycles. Looking in depth at allocator designs is beyond the scope of this blog entry, so we'll focus here on only the features that are relevant from an exploitation (and defense) perspective.
Modern operating systems deal with memory in page-sized chunks (ranging from 8K up to 16G on M7). Imagine an application that needs to store a 10 characters string: handing out a 8K page is certainly doable, but is hardly an efficient way to satisfy the request. Memory allocators solve the problem by sitting between the OS physical page allocator and the consumer (being it the kernel itself or an application) and efficiently manage arbitrary sized allocations, dividing pages into smaller chunks when small buffers are needed.
Allocators are composed of three main entities:
- live buffers: a chunk of memory that has been assigned to the application and is guaranteed to hold at least the amount of bytes requested.
- free buffers: chunks of memory that the allocator can use to satisfy an application request. Depending on the allocation design, these are either fixed-size buffers or buffers that can be sliced in smaller portions.
- metadata: all the necessary information that the allocator must maintain in order to efficiently work. Again, depending on the allocator design, this information might be stored within the buffer itself (e.g. Oracle Solaris libc malloc stores most of the data along with the buffer) or separately (e.g. Oracle Solaris umem/kmem SLAB allocator keeps the majority of the metadata into dedicated structures placed outside the objects)
The period that goes from when a buffer is handed out to an application, up until is freed is generally referred to as the buffer lifetime. During this period, the application has full control of the buffer contents. After this period, the allocator regains control and the application is expected to not interfere.
Of course, bugs happen. Bugs can affect both the application working set of buffers or the allocator free set and metadata. If we exclude allocator intrinsic design errors (which, for long existing allocators, due to the amount of exercise they get, are basically zero), bugs always generate from the application mishandling of a buffer reference, so they always happen during the lifetime of a buffer and originate from a live buffer. It's no surprise that live buffer behavior is what both attackers and defenders start from.
Exploitation vectors and techniquesAs we said, bugs originate from the application mishandling of allocated buffers:
- mishandling of buffer size: the classic case of buffer overflow. The application writes past the buffer boundaries into adjacent memory. Because buffers are intermixed with other live buffers, free buffers and, potentially, metadata, each one of those entities becomes a potential target and attackers will go for the most reliable one.
- mishandling of buffer references: a buffer is relinquished back to the allocator, but the attacker still holds a reference to it. Traditionally, these attacks are known as use after free (UAF), although, since this is an industry that loves taxonomies, it's not uncommon to see them further qualified as use after realloc (the buffer is reallocated, but the attacker is capable of unexpectedly modifying it through the stale reference) and double free (the same reference is passed twice to the free path). Sometimes an attacker is also capable of constructing a fake object and passing it to a free call, for example if the application erroneously calls free of a buffer allocated onto the stack. The degree of exploitability of these vulnerabilities (if we exclude the use after realloc case, which is application-specific) varies depending on what the allocator does during the free path and how many consistency/hardening checks are present.
- entrypoint checks: add consistency check at the defined free and alloc entrypoints. As an example, an allocator could mark into the buffer associated metadata (or poison the buffer itself) that the buffer has been freed. It would then be able to check this information whenever the free path is entered and a double free could be easily detected. Many of the early days techniques to exploit heap overflows (~2000, w00w00 , PHRACK57 MaXX's and anonymous' articles) relied on modifying metadata that would then be leveraged during the free path. Over time, some allocators have added checks to detect some of those techniques.
- design mitigations: attackers crave for control of the heap layout: in what sequence are buffer allocated, where are they placed, how can a buffer containing sensitive data be conveniently allocated in a specific location. Allocators can introduce statistical mitigations to hamper some of the techniques to achieve this level of control. As an example, free object selection can be randomized (although it ends up being pretty ineffective against a variety of heap spraying techniques and/or if the attacker has quite some control on the allocation pattern), free patterns can be modified (Microsoft IE Memory Protector) or sensitive objects can be allocated from a different heap space (dedicated SLAB caches, Windows IE Isolated Heap, Chrome PartitionAlloc, etc). The bottom line goal of these (and other) design approaches is to either reduce the amount of predictability of the allocator or increase the amount of precise control that the attacker needs to have in order to create the successful heap layout conditions to exploit the bug.
The practical result is that slab overflows are today probably the most reliable type of vulnerability at kernel level and use after free are a close second in kernel land, while extensively targeted in user land, with only the browsers being significantly more hardened than other components. Extensive work is going on towards automating and abstracting the development of exploits for such bugs (as recently presented by argp at Zeronights), which makes the design of efficient defenses even more compelling.
ADI to the rescueEnter the Oracle SPARC M7 processor and ADI, Application Data Integrity, that were both unveiled at HotChips and Oracle OpenWorld 2014. At its core, ADI provides memory tagging. Whenever ADI is enabled on a page entry, dedicated non-privileged load/store instructions provide the ability to assign a 4-bit version to each 64-byte cacheline that is part of the page. This version is maintained by the hardware throughout the entire non-persistent memory hierarchy (basically, all the way down to DRAM and back).
The same version can then be mirrored onto the (previously) unused 4 topmost bits of each virtual address. Once this is done, each time a pointer is used to access a memory range, if ADI is enabled (both at the page and per-thread level), the tag stored in the pointer is checked by the hardware against the tag stored in the cache line. If the two match, all is peachy. If they don't, an exception is raised.
Since the check is done in hardware, the main burden is at buffer creation, rather than at each access, which means that ADI can be introduced in a memory allocator and its benefit extended to any application consuming it without the need of extra instrumentation or special instructions into the application itself. This is a significant difference from other hardware-based memory corruption detection options, like Intel MPX, and minimizes the performance impact of ADI while maximizing coverage. More importantly, this means we finally have a reliable way to detect live object mishandling: the hardware does it for us.
[ADI versioning at work. Picture taken from Oracle SPARC M7 presentation]
4 bits allow for a handful of possible values. There are two intuitive ways in which an ADI-aware allocator can invariantly detect a linear overflow from a buffer/object to the adjacent one:
- introduce a redzone with a special tag
- tag differently any two adjacent buffers
Tagging differently two adjacent objects, instead, has the advantage of reducing memory wastage. In fact, the only induced wastage is the one from forcing the alignment to a 64-byte boundary. It also requires, though, to be able to uniquely pick a correct tag value at allocation time. Object-based allocators are a particularly good fit for this design because they already take some of the penalty for wasted memory (and larger caches are usually already 64-bit aligned) and their design (fixed size caches divided in a constant number of fixed size objects) allows to uniquely identify objects based on their address. This provides the ability to alternate between two different values (or range of values, e.g. odd/even tags, smaller/bigger than a median) based on the ibject position. For other allocators, the ability to properly tag the buffer depends on whether there is enough metadata to learn about the previous and next object tag. If there is, then this can still be implemented, if there isn't, one might decide to employ a statistical defense by randomizing the tag (note that the same point applies also to object-based allocators when we look at large caches, where effectively only a single object is present per cache).
A third interesting property of tagging is that it can be used to uniquely identify classes of objects, for example free objects. As we discussed previously, metadata and free objects are never the affector, but only the affectee of an attack, so one tag each suffices. The good side effect of devoting a tag each is that the allocator now has a fairly performant way to identify them and issues like double-frees can be easily detected. In the same way, it's also automatically guaranteed that a live object will never be able to overflow into metdata or free objects, even if a statistical defense (e.g. tag randomization) is employed.
Use-after-realloc and arbitrary writesADI does one thing and does it great: provides support to implement an invariant to detect linear overflows. Surely, this doesn't come without some constraints (64-byte granularity, 64-byte alignment, page-level granularity to enable it, 4-bit versioning range) and might be a more or less good fit (performance and design-wise) for an existing allocator, but this doesn't detract from its security potential. Heartbleed is just one example of a linear out-of-bound access and SLAB/heap overflow fixes have been in the commit logs of all major operating systems for years now. Invariantly detecting them is a significant win.
Use-after-realloc and arbitrary writes, instead, can't be invariantly stopped by ADI, although ADI can help in mitigating them. As we discussed, use-after-realloc rely on the ability, by the attacker, to hold a reference to a free-and-then-realloced object and then use this reference to modify some potentially sensitive content. ADI can introduce some statistical noise in this exploitation path, by looping/randomizing through different values for the same buffer/object. Note that this doesn't affect the invariant portion of, for example, alternate tagging in object-based allocators; it simply takes further advantage of the versioning space. Of course, if the attacker is in the position of performing a bruteforce attack, this mitigation would not hold much ground, but in certain scenarios, bruteforcing might be a limiting factor (kernel level exploitation) or leave some detectable noise.
Arbitrary writes, instead, depend on the ability of the attacker to forge an address and are not strictly related to allocator ranges only. Since the focus here is the allocator, the most interesting variant is when the attacker has the ability to write to an arbitrary offset from the current buffer. If metadata and free objects are specially tagged, they are unreachable, but other live objects with the same tag might be reached. Just as in the use-afte-realloc case, adding some randomization to the sequence of tags can help, with the very same limitations. In both cases, infoleaks would precisely guide the attacker, but this is basically a given for pretty much any statistical defense.