Barely grasping the small picture: Hardware-based buffer overflow defenses compared: SSM/ADI vs MPX

One of the most common questions when discussing SPARC M7 SSM/ADI (Silicon Secured Memory/Application Data Integrity, from here on referred to simply as ADI) is "how does it compare to XYZ?", where XYZ is some other architecture security feature. In this blog entry we'll see how ADI stacks up against another buffer overflow protection feature, Intel Memory Protection eXtensions (MPX).

Before starting, a somewhat obvious, but necessary, observation: these features do not provide security "by themselves". Instead, they provide building blocks on top of which protections can be developed. In other words, they are meant to, and need to, be (extensively) leveraged by software implementations to be in any way effective.

ADI

I've already covered ADI extensively in another blog entry, so I won't delve too much into the details here and will just provide a quick recap. ADI implements a form of memory tagging and checking: individual cache lines can be assigned a color (a 4-bit numeric value ranging from 0 to 15) and at load/store time that color is checked against the color saved in the 4 topmost unused bits of the target virtual address. If the colors match, the operation goes through; if they don't, an exception is raised and the instruction is ignored. 0 and 15 are universal matches: a cache line colored with either of those will not fault regardless of the color used in the virtual address. ADI only works on the data path; it has no effect on the instruction fetch and execute process.
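To make the check concrete, here is a minimal software model of it in C. This is only a sketch under stated assumptions: the function names (adi_set_color, adi_get_color, adi_access_ok) and the bit position are made up for illustration and are not part of the SPARC ISA or of any Oracle API. The real check happens in hardware on every load/store.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the color occupies the 4 topmost bits of a 64-bit VA. */
#define ADI_SHIFT 60u

/* Store a 4-bit color in the topmost bits of a (model) virtual address. */
static uint64_t adi_set_color(uint64_t va, unsigned color)
{
    return (va & ~(0xFULL << ADI_SHIFT)) |
           ((uint64_t)(color & 0xFu) << ADI_SHIFT);
}

/* Extract the color carried by a (model) virtual address. */
static unsigned adi_get_color(uint64_t va)
{
    return (unsigned)(va >> ADI_SHIFT) & 0xFu;
}

/* True if a load/store through `va` to a cache line colored `line_color`
 * would go through; false if it would raise an exception. */
static bool adi_access_ok(uint64_t va, unsigned line_color)
{
    if (line_color == 0 || line_color == 15)
        return true;                      /* universal matches */
    return adi_get_color(va) == line_color;
}
```

In this model a pointer colored 5 can touch a line colored 5, 0 or 15, and faults on anything else.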

Memory tagging is not a revolutionary concept: it has been floating around academia for a while. What really makes ADI stand out, though, is that for the first time tagging is both granular enough (64 bytes -- a cache line) and integrated seamlessly enough in the architecture to be used at large for general purpose scenarios (e.g. a memory allocator). Oh, and it's fairly performant, too. In this respect, SPARC is the first "mainstream" architecture to achieve that.

From a security perspective, ADI does one thing and does it great: detect linear overflows. A linear overflow happens when a program reads or writes past the boundaries of a buffer. If we can guarantee that two adjacent buffers never share the same color, we can guarantee that a read or write starting from one will never successfully hit the other. If this doesn't seem like much, look at it as a way to put a definitive end to heap overflow exploitation.
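The adjacency argument can be sketched with a toy heap. The layout below (two 128-byte buffers, colors 3 and 7) and the helper first_fault are assumptions made up for this example: the function walks forward byte by byte, as a linear overflow would, and reports where the first color mismatch occurs.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u   /* tagging granularity: one cache line */
#define NUM_LINES 4u

/* Toy heap: buffer A owns lines 0-1 (color 3), buffer B owns lines 2-3 (color 7). */
static const uint8_t line_color[NUM_LINES] = { 3, 3, 7, 7 };

/* Walk `len` bytes linearly from byte offset `start` using a pointer colored
 * `ptr_color`. Returns the offset (relative to `start`) of the first access
 * that would fault, or `len` if every access matches. */
static size_t first_fault(size_t start, size_t len, uint8_t ptr_color)
{
    for (size_t i = 0; i < len; i++) {
        size_t line = (start + i) / LINE_SIZE;
        if (line >= NUM_LINES)
            return i;                 /* outside the toy heap: treat as a fault */
        if (line_color[line] != ptr_color)
            return i;                 /* color mismatch: the hardware would trap */
    }
    return len;
}
```

With these colors, a 200-byte copy starting at the beginning of buffer A is stopped at byte 128, exactly where buffer B begins, while any access that stays within the first 128 bytes goes through.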

ADI also does reasonably well against non-malicious stray pointers: by randomizing the assigned colors at runtime, there is a fair chance that a stray pointer will not have a matching color. For malicious (read: attacker-forged) stray pointers, the music changes: ADI isn't particularly suited to protect against arbitrary read/write patterns. While it does add a deterrent (the attacker still needs to leak or guess the target color), the range is super small, just fourteen values once the two universal matches are excluded, and easy to brute force/guess.
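To put a rough number on "easy to guess", here is a back-of-the-envelope calculation. It rests on a simplifying assumption of mine: each attempt is an independent, uniform guess over the fourteen usable colors, and a failed attempt does not stop the attacker from trying again. The function name guess_success is made up for the example.

```c
/* Probability that at least one of `tries` independent, uniform guesses
 * over `colors` possible values hits the right color. */
static double guess_success(unsigned colors, unsigned tries)
{
    double miss = 1.0;                /* probability that every guess so far missed */
    for (unsigned i = 0; i < tries; i++)
        miss *= (double)(colors - 1) / (double)colors;
    return 1.0 - miss;
}
```

Under that assumption a single blind guess succeeds about 7% of the time, and fourteen attempts bring the odds to roughly 65%. Whether retries are actually possible depends on how the target reacts to the exception.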

When used for system level defenses, e.g. inside memory allocators, ADI works well with lots of software, but it still breaks two common assumptions: (1) that math can be performed directly on pointers and (2) that the ability to access one byte in a page implies the ability to access any other byte in that page in the same way. (1) can be solved by normalizing pointers (stripping the color bits) before any math operation, while (2) can be solved by either loading the version (the color stored for the target cache line) and adjusting the pointer accordingly or, only for reading, by using a non-faulting load. Non-faulting loads are particularly interesting because they provide an efficient way to implement trusted paths (portions of the program that have a legitimate reason to bypass ADI checking when accessing a given memory location). As an example of this, see this recent fix for Python to make it work well with an ADI enhanced memory allocator.
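Here is a small sketch of why pointer math needs normalization. Addresses are modeled as plain 64-bit integers with the color in the top 4 bits; the helper names (strip_color, with_color, colored_diff) are hypothetical, and the model assumes user addresses whose top 4 bits are otherwise zero.

```c
#include <stdint.h>

#define COLOR_SHIFT 60u
#define COLOR_MASK  (0xFULL << COLOR_SHIFT)

/* Normalize: drop the color bits so only the address remains. */
static uint64_t strip_color(uint64_t va)
{
    return va & ~COLOR_MASK;
}

/* Re-apply a color, e.g. after loading the version of the target line. */
static uint64_t with_color(uint64_t va, uint64_t color)
{
    return strip_color(va) | ((color & 0xFULL) << COLOR_SHIFT);
}

/* Byte distance between two colored addresses. Subtracting the raw values
 * would fold the difference between the two colors into the result. */
static int64_t colored_diff(uint64_t a, uint64_t b)
{
    return (int64_t)(strip_color(a) - strip_color(b));
}
```

For two addresses 64 bytes apart but colored 5 and 9, colored_diff returns 64, while the raw subtraction yields a huge bogus value because the color bits take part in it.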

MPX

Another approach to detecting buffer overflows is to add instrumentation where the buffer is accessed, to verify whether the read/write operation stays within the boundaries. This approach has two drawbacks that make it less suitable for production scenarios: (1) it adds some (non-marginal) performance penalty, since extra instructions need to be executed and (2) it cannot easily be switched on/off when deploying a system or when relief is needed after hitting a false positive (if binary recompilation is used, then a problem arises if the package manager or some other entity relies on the hash of the binary). Intel's proposed solution to the above two problems is the Memory Protection eXtensions technology.

MPX introduces a handful of new registers and instructions that operate on them. The new registers BND[0..3] are 128 bits long, with 64 bits used to store the lower bound and 64 bits used to store the upper bound of a buffer. Three new instructions check a pointer against said bounds: BNDCL (Bound Check Lower Bound), BNDCU (Bound Check Upper Bound) and BNDCN (Bound Check Upper Bound not in 1s Complement). For example, bndcl (%rax), %bnd0 compares the contents of RAX against the lower bound set in BND0. If the check fails, a #BR (Bound Range Exceeded) exception is raised. BNDC* instructions are very fast, to reduce the performance penalty.
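The semantics of the checks can be modeled in a few lines of C. This is a software sketch, not real MPX code: the type bnd_t and the functions bndmk, bndcl and bndcu are named after the instructions purely for readability, and they return a boolean where the hardware would raise #BR. The model assumes, as I understand the Intel documentation, that the upper bound is kept in 1's complement form and that BNDCU undoes the complement before comparing. BNDCN is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of one 128-bit bound register: two 64-bit halves. */
typedef struct {
    uint64_t lb;   /* lower bound */
    uint64_t ub;   /* upper bound, stored in 1's complement in this model */
} bnd_t;

/* BNDMK-like: build bounds for the buffer [base, base + size - 1]. */
static bnd_t bndmk(uint64_t base, uint64_t size)
{
    bnd_t b = { base, ~(base + size - 1) };
    return b;
}

/* BNDCL-like: true if addr is not below the lower bound. */
static bool bndcl(bnd_t b, uint64_t addr)
{
    return addr >= b.lb;
}

/* BNDCU-like: true if addr is not above the upper bound. */
static bool bndcu(bnd_t b, uint64_t addr)
{
    return addr <= ~b.ub;
}
```

An instrumented access to a 16-byte buffer at 0x1000 would pass both checks for addresses 0x1000 through 0x100f and fail one of them for anything outside that range.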

Of course, 4 bound registers aren't enough for every buffer used in a program, so MPX supports a number of ways to move the necessary upper/lower bound values in and out of them: BNDMK (Bound Make) stores a pair of addresses into one of the BNDx registers, BNDMOV (Bound Move) loads a pair from, or saves it to, a location in memory, and BNDLDX/BNDSTX manage the Bound Table, which stores information about a pointer and its bounds. Bound Tables are arranged in a two-level directory in memory and the root address is stored in BNDCFGU (user land, CPL=3) or BNDCFGS (kernel land, CPL=0). BND*X and BNDMOV instructions simplify bounds management, but inevitably introduce a larger performance hit.
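As a rough illustration of what the Bound Table does, here is a toy version in C. It is deliberately simplified and rests on assumptions of mine: a single flat, direct-mapped array instead of the real two-level layout, bounds kept as plain values, and made-up names (bt_store, bt_load). It also models one behavior as I understand it from the Intel documentation: a lookup hands back the stored bounds only if the pointer value still matches the one recorded at store time, and falls back to permissive bounds otherwise.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BT_ENTRIES 64u

typedef struct { uint64_t lb, ub; } bounds_t;

typedef struct {
    uint64_t lb, ub;      /* bounds of the buffer the pointer refers to */
    uint64_t ptr_value;   /* value the pointer had when bounds were stored */
    bool     used;
} bt_entry_t;

static bt_entry_t bound_table[BT_ENTRIES];   /* zero-initialized: all unused */

/* Index by the address where the pointer itself lives (toy hash). */
static size_t bt_index(uint64_t ptr_location)
{
    return (size_t)((ptr_location >> 3) % BT_ENTRIES);
}

/* BNDSTX-like: remember the bounds associated with a stored pointer. */
static void bt_store(uint64_t ptr_location, uint64_t ptr_value, bounds_t b)
{
    bt_entry_t *e = &bound_table[bt_index(ptr_location)];
    e->lb = b.lb;
    e->ub = b.ub;
    e->ptr_value = ptr_value;
    e->used = true;
}

/* BNDLDX-like: fetch the bounds for a pointer loaded from ptr_location.
 * A missing or stale entry yields bounds that allow everything. */
static bounds_t bt_load(uint64_t ptr_location, uint64_t ptr_value)
{
    const bt_entry_t *e = &bound_table[bt_index(ptr_location)];
    bounds_t all = { 0, UINT64_MAX };
    if (!e->used || e->ptr_value != ptr_value)
        return all;
    bounds_t b = { e->lb, e->ub };
    return b;
}
```

Every pointer spilled to memory costs an extra table entry and an extra lookup on reload, which is where the larger performance hit and memory footprint come from.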

MPX relies heavily on the compiler/instrumentation to be effective: while the programmer can add manual checks, it's the compiler that needs to identify the places where a check is necessary and introduce the proper instruction sequences there. The smarter this logic is, the better the performance is going to be. A quick analysis of MPX performance (and more) is available on the AddressSanitizer wiki.

MPX is fully backward compatible, as its instructions use prefixes that are treated as NOPs on older architectures. This makes it possible to build one single binary and distribute it around. The same also happens when MPX is disabled, which allows admins to toggle the protection on/off on a per-binary basis. MPX interoperates well with existing code, allowing instrumented and non-instrumented components to be mixed in the same process (with some caveats). The idea there is to allow MPX to be introduced gradually in large applications, starting with the sensitive modules.

MPX vs ADI

Both MPX and ADI aim at detecting buffer overflows, although they look at the buffer from two different angles. Intel MPX provides instructions so that an operation about to be performed can check whether it would over/underflow the buffer boundaries, while ADI lets the operation go ahead and detects, through color violations, whether it does something wrong as it happens. From this perspective, ADI scales better for long lived buffers that might be accessed in various places within an application, as you only have to tag the buffer once at creation time and the hardware will do the rest. The chance of missing one access point, due to a particularly odd construct, is a non-issue with ADI. This is a particularly nice property when looking at building an invariant on top of it.

Similarly, the above also means that ADI is simpler to retrofit into legacy applications. Where MPX would require a recompilation/translation, with ADI one can design system level defenses (e.g. at the allocator level, which is the only entity responsible for the tagging) that can be enabled on legacy applications. Of course, if an application uses its own memory allocator or if one is looking at protecting other data regions, code changes (or compiler support, e.g. to track the .data segment or automatic variables) are necessary, just like with MPX.

ADI imposes more constraints on the programmer. Minimum granularity is 64 bytes and buffers need to be aligned to - or contained within - the cache line boundaries. MPX has much better precision, as it can detect overflows as small as 1 byte in just about any scenario (e.g. buffer overflow across structure members). More generally, any type of erroneous pointer access can be detected, provided that the proper bounds are set. This makes MPX simpler to apply to an arbitrary part of an application, regardless of what kind of data it is operating on.
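The structure member case is worth a tiny example. The struct and the two helpers below are hypothetical, and the sketch assumes the object is 64-byte aligned and colored as a whole. It compares, for an overflow from one member into the next, what a per-cache-line color can see with what a bounds check narrowed to the member can see.

```c
#include <stdbool.h>
#include <stddef.h>

#define LINE_SIZE 64u

/* Hypothetical object: both members fit in the same 64-byte cache line. */
struct account {
    char name[16];
    int  is_admin;
};

/* ADI-style view: an access straying from offset `from` to offset `to`
 * is only caught if it lands in a different (differently colored) line. */
static bool adi_would_catch(size_t from, size_t to)
{
    return from / LINE_SIZE != to / LINE_SIZE;
}

/* MPX-style view: with bounds narrowed to the member [lo, hi], any
 * offset outside that range fails the check. */
static bool mpx_would_catch(size_t off, size_t lo, size_t hi)
{
    return off < lo || off > hi;
}
```

A write that runs off the end of name and into is_admin stays inside one cache line, so the color check stays silent, while bounds set to bytes 0 through 15 flag the very first byte past the member.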

Both technologies are meant to mix well within an application (ADI use is for the most part implicit, as demonstrated by the ability to retrofit into legacy applications) and do not require a special binary to run on older architectures. When disabled, MPX still imposes some performance penalty, both directly (very minimal, the NOPs take space and need to be executed) and indirectly (side effect of disabling certain optimizations that collide with the instrumentation engine). With ADI, at least for the cases where no compiler support is needed, there is no impact when it is disabled.

Both technologies are meant to be fast, but interesting side effects show up with large applications. MPX memory consumption (for Bound Tables or BNDMOV backup storage) grows significantly if the application manipulates a large number of pointers and, similarly, performance drops due to the large amount of swapping necessary to set the BND[0..3] registers. Also, while it's true that MPX can be gradually introduced into an application (by limiting the places where it is used), it is still pretty much black and white, as all the introduced instructions are there in the .text segment. In other words, it's hard to create different defenses based on MPX and only selectively enable some of them on a target application (unless, of course, this is done at build time).

On the contrary, ADI makes the above much simpler, since only the producer of the memory region is involved in the tagging. This also makes it generally better performing for a system level defense, although certain applications (especially those that make substantial use of small buffers) might experience slowdowns, due to the larger memory footprint and the smaller number of objects fitting into a single cache line. A tradeoff is possible there, by packing more objects under the same color, at the cost of losing the ability to detect a small overflow between them.
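The footprint side of that tradeoff is simple arithmetic, sketched below. The function names are made up, and the model is simplified: every group of objects sharing a color is rounded up to whole 64-byte lines, and a partially filled last group is charged as if it were full.

```c
#include <stddef.h>

#define LINE_SIZE 64u

/* Round n up to the next multiple of a. */
static size_t round_up(size_t n, size_t a)
{
    return (n + a - 1) / a * a;
}

/* Bytes needed for `count` objects of `size` bytes when every `per_color`
 * consecutive objects share one color (and thus one or more whole lines). */
static size_t footprint(size_t size, size_t count, size_t per_color)
{
    size_t groups = (count + per_color - 1) / per_color;
    return groups * round_up(size * per_color, LINE_SIZE);
}
```

With a thousand 16-byte objects, one color per object costs 64000 bytes, while packing four objects under each color brings that down to 16000 bytes, at the price of not noticing an overflow between objects that share a line.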

Summing up (and feel free to look for bias here and call me out on it), ADI is a superior solution for system level defenses in production scenarios, while MPX is a much stronger debugging tool, showing all its Pointer Checker heritage. Don't get me wrong: both can be used in both scenarios. I can certainly envision adding MPX protections to binaries (and a security extension to control it), and Oracle Studio Discover does a great job of leveraging ADI to find bugs. Still, the tradeoffs in MPX (mandatory recompilation, significant compiler support, higher performance impact, higher precision) tilt the balance towards the debugging scenario, while the ADI tradeoffs (more significant offload to hardware, possibility to design system level defenses that can be applied to legacy binaries, smaller performance impact, lower precision geared towards the more common cases) make it a better tool for production environments.


Saturday, December 31, 2016
