[tor-dev] TBB Memory Allocator choice fingerprint implications

Wed Aug 21 13:40:31 UTC 2019

On Mon, Aug 19, 2019 at 04:09:36PM +0000, Tom Ritter wrote:
> Okay I'm going to try and clear up a lot of misconceptions and stuff
> here.  I don't own Firefox's memory allocator but I have worked in it,
> recently, and am one of the people who are working on hardening it.

This makes it clear why you're spreading misinformation. You're going
out of your way to make false and misleading claims about mozjemalloc
and hardened_malloc, particularly your bogus comparisons between them.

Bolting on a few weak implementations of hardening features to an
allocator inherently very friendly to memory corruption exploitation
does not make it anything close to being a hardened allocator, sorry.

> Firefox's memory allocator is not jemalloc. It's probably better
> referred to as mozjemalloc. We forked jemalloc and have been improving
> it (at least from our perspective.) Any analysis of or comparison to
> jemalloc is - at this point - outdated and should be redone from
> scratch against mozjemalloc on mozilla-central.

It's not particularly different and the comparison isn't outdated. I've
made some substantial contributions to jemalloc upstream and wrote the
integration into Rust. I'm deeply familiar with the design choices and
implementation of jemalloc. Those design choices lead to having a memory
allocator that's extremely friendly to exploitation. A lot of this is
covered in the hardened_malloc documentation, without going out of the
way to directly compare it to specific allocators. If the documentation
is not currently clear enough, I could provide a comparison to jemalloc
as an example of comparing a very hardened allocator implementation to
one that's the direct opposite. The jemalloc design makes exploitation
significantly easier overall than the traditional dlmalloc style design.

> LD_PRELOAD='/path/to/libhardened_malloc.so' /path/to/program will do
> nothing or approximately nothing. mozjemalloc uses mmap and low level
> allocation tools to create chunks of memory to be used by its internal
> memory allocator. To successfully replace Firefox memory allocator you
> should either use LD_PRELOAD _with_ a --disable-jemalloc build OR
> Firefox's replace_malloc functionality:
> https://searchfox.org/mozilla-central/source/memory/build/replace_malloc.h

LD_PRELOAD is not how hardened_malloc is supposed to be used outside of
testing it anyway. It's meant to be integrated in libc, in which case
--disable-jemalloc would be enough, although it can also be integrated
into a specific program. That doesn't sidestep the importance of doing
other hardening in libc and the rest of the system though.

> Fingerprinting: It is most likely possible to be creative enough to
> fingerprint what memory allocator is used. If we were to choose from
> different allocators at runtime, I don't think that fingerprinting is
> the worst thing open to us - it seems likely that any attacker who
> does such a attack could also fingerprinting your CPU speed, RAM, and
> your ASLR base addresses which depending on OS might not change until
> reboot.

They can obtain a lot more than just information about the hardware. A
lot of the hardware and OS information that fingerprinting mitigations
try to hide is leaked via performance measurements. Those measurements
can also leak a lot of data from within the browser.

> The only reason I can think of to choose between allocators at runtime
> is to introduce randomness into the allocation strategy. An attacker
> relying on a blind overwrite may not be able to position their
> overwrite reliably AND it has the cause the process to crash otherwise
> they can just try again.

The hardened_malloc design provides far more than randomization, and the
fine-grained randomization within size class regions is one of the least
important and lowest priority features.

https://github.com/GrapheneOS/hardened_malloc/blob/master/README.md

> Allocators can introduce randomness themselves, you don't need to
> choose between allocators to do that.

This is not something that can simply be bolted onto an existing
allocator design with a good approach. It needs to be more heavily
integrated into the design, and the same applies to an even greater
extent to more important security features than weak fine-grained
randomization.

> In virtually all browser exploits we have seen recently the attacker
> creates exploitation primitives that allow partial memory read/write
> and then full memory read/write. Randomness introduced is bypassed and
> ineffective. I've seen a general trend away from randomness for this
> purpose. The exception is when the attacker is heavily constrained -
> like exploiting over IPC or in a network protocol. Not when the
> attacker has a full Javascript execution environment available to
> them.
>
> When exploiting a memory corruption vulnerability, you can target the
> application's memory (meaning, target a DOM object or an ArrayBuffer)
> or you can target the memory allocator's metadata. While allocator
> metadata corruption was popular in the past, I haven't seen it used
> recently.

The importance of out-of-line metadata is far beyond simply preventing
exploitation through the allocator metadata. It's crucial for a hardened
allocator to have a reliable source of information about allocations
without trusting data read from freed allocations or next to the memory
allocations.

> Okay all that out of the way, let's talk about allocators.
> 
> I skimmed https://github.com/GrapheneOS/hardened_malloc and it looks
> like it has:
>  - out of line metadata

It has fully out-of-line metadata in a reserved region with the entirety
of the mutable allocator state. This is not simply for the sake of
protecting the metadata.

>  - double free protection

The hardened_malloc approach provides deterministic, immediate detection
of every invalid free for both slab and large allocations. It detects
much more than double frees and doesn't rely on a probabilistic
approach or inline / semi-inline metadata to provide it.

The quarantine implementations for slab allocations, slabs and large
allocations also massively help with this. For slab allocations it has
to go out of the way to preserve the deterministic, immediate double
free detection since a naive quarantine implementation would interfere.
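
To make the deterministic check concrete, here is a minimal sketch in C
of invalid free detection driven purely by out-of-line metadata. It is
an illustration under stated assumptions, not the hardened_malloc code:
the names (struct slab, alloc_slot, free_slot), the 64-byte slot size
and the single 64-bit bitmap are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_SIZE 64u  /* hypothetical size class */
#define SLOT_COUNT 64u /* slots per slab, one bitmap bit each */

/* Hypothetical slab: the bitmap lives apart from the slot memory, so
 * free_slot() never trusts bytes stored in or next to an allocation. */
struct slab {
    unsigned char *base; /* start of the slot memory */
    uint64_t used;       /* bit i set => slot i is currently allocated */
};

/* Hands out the first free slot, or NULL when the slab is full. */
void *alloc_slot(struct slab *s) {
    assert(s->base != NULL);
    for (unsigned i = 0; i < SLOT_COUNT; i++) {
        uint64_t bit = UINT64_C(1) << i;
        if (!(s->used & bit)) {
            s->used |= bit;
            return s->base + (size_t)i * SLOT_SIZE;
        }
    }
    return NULL;
}

/* Returns true for a valid free and false when an invalid free is
 * detected; a real allocator would abort instead of returning false. */
bool free_slot(struct slab *s, void *p) {
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)s->base;
    if (addr < base || addr >= base + SLOT_SIZE * SLOT_COUNT) {
        return false; /* pointer is not inside this slab */
    }
    size_t offset = (size_t)(addr - base);
    if (offset % SLOT_SIZE != 0) {
        return false; /* interior / misaligned pointer */
    }
    uint64_t bit = UINT64_C(1) << (offset / SLOT_SIZE);
    if (!(s->used & bit)) {
        return false; /* double free, or slot was never allocated */
    }
    s->used &= ~bit;
    return true;
}
```

The point of the sketch is that every decision in free_slot() comes
from the bitmap and the slab geometry, so the result does not depend on
what an attacker may have written into the freed memory.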

>  - guard regions of some type

It has much more than 'guard regions of some type'. It's designed around
being able to implement features like this in a very meaningful and
impactful way with low performance cost.

It reserves a slab allocation region, partitions that into arenas and
partitions those into dedicated regions for each size class with large
high entropy random guard regions around them.

Inside these size class regions, it skips slab locations to leave behind
guard slabs, providing a sparse heap. These are currently inserted at a
deterministic interval, but the plan is to randomize that to mitigate
heap sprays within the size class regions.

Freed slabs beyond the small amount of cached slabs are also replaced
with fresh guard regions, which is helpful. There's a quarantine for
these too, to keep them memory protected once they've become just like
the never-allocated slab locations.

Large allocations (> 128k) have randomly sized guards around them using
a random range based on a ratio to the allocation size. When a large
allocation is freed, it's quarantined by replacing it with a guard
region held onto as long as possible based on the large allocation
quarantine size.
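
As a rough illustration of sizing a guard from a random range tied to
the allocation size, here is a minimal C sketch. The page size, the
divisor used as the ratio, and the function name guard_size are
assumptions made for the example; the random value is passed in by the
caller so the logic can be checked deterministically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumed page size */
#define GUARD_DIVISOR 2u  /* hypothetical ratio: guard <= size / 2 */

/* Picks a page-aligned guard size between one page and
 * (allocation pages / GUARD_DIVISOR) pages. `random` should come from
 * a CSPRNG; the plain modulo used here has a slight bias that a real
 * implementation would avoid with a uniform range function. */
size_t guard_size(size_t alloc_size, uint64_t random) {
    assert(alloc_size > 0);
    size_t pages = (alloc_size + PAGE_BYTES - 1) / PAGE_BYTES;
    size_t max_guard_pages = pages / GUARD_DIVISOR;
    if (max_guard_pages == 0) {
        max_guard_pages = 1; /* always leave at least one guard page */
    }
    return (size_t)(random % max_guard_pages + 1) * PAGE_BYTES;
}
```

With these assumed constants, a 256k allocation gets a guard of between
1 and 32 pages on each side, so the distance to neighbouring mappings
scales with the allocation and is not predictable.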

>  - zero-filling

It also provides write-after-free detection via checking the zero
filling at allocation time, and the slab allocation quarantine is
designed to work well with this.
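
A minimal C sketch of that idea, with hypothetical helper names
(wipe_on_free, slot_is_untouched) that are not taken from
hardened_malloc: memory is zeroed when it is freed, and the zero fill
is verified again when the slot is about to be handed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* On free: wipe the slot so it holds only zero bytes while unused. */
void wipe_on_free(unsigned char *slot, size_t size) {
    assert(slot != NULL);
    memset(slot, 0, size);
}

/* On allocation: any nonzero byte means something wrote to the slot
 * after it was freed. A real allocator would abort at this point. */
bool slot_is_untouched(const unsigned char *slot, size_t size) {
    assert(slot != NULL);
    for (size_t i = 0; i < size; i++) {
        if (slot[i] != 0) {
            return false;
        }
    }
    return true;
}
```

The longer a freed slot sits in quarantine before reuse, the more
stray writes this check has a chance to catch, which is why the two
features work well together.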

>  - MPK support

This isn't one of the major features.

>  - randomization

This isn't simply a basic yes or no point about an allocator.

The slot randomization within slabs is the least impactful / important
form of this in the implementation, aside from the planned randomization
for choosing slabs within size class regions.

The quarantine randomization and high entropy bases for partitions are
much more important. The quarantine randomization and the FIFO portion
of it are also going to mix very well with the planned approach to using
memory tagging for deterministic use-after-free detection, while still
having deterministic sequential overflow detection and randomized tags
as a probabilistic general purpose memory corruption mitigation.
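
One way to picture a quarantine with a randomized portion feeding a
FIFO portion is the following C sketch. The structure, the slot counts
and the name quarantine_push are assumptions for illustration only;
the random value is supplied by the caller so the behaviour can be
traced by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RANDOM_SLOTS 4u /* hypothetical size of the random portion */
#define FIFO_SLOTS 4u   /* hypothetical size of the FIFO portion */

struct quarantine {
    void *random[RANDOM_SLOTS]; /* freed pointers, replaced at random */
    void *fifo[FIFO_SLOTS];     /* ring buffer, oldest entry leaves first */
    size_t fifo_next;           /* index of the oldest FIFO entry */
};

/* Places a freed pointer into a randomly chosen slot. Whatever that
 * slot held moves into the FIFO, and the entry the FIFO evicts is
 * returned as the only pointer that may now be reused. Returns NULL
 * when nothing leaves the quarantine. */
void *quarantine_push(struct quarantine *q, void *freed, uint64_t rnd) {
    assert(freed != NULL);
    size_t i = (size_t)(rnd % RANDOM_SLOTS);
    void *evicted = q->random[i];
    q->random[i] = freed;
    if (evicted == NULL) {
        return NULL; /* random portion still filling up */
    }
    void *oldest = q->fifo[q->fifo_next];
    q->fifo[q->fifo_next] = evicted;
    q->fifo_next = (q->fifo_next + 1) % FIFO_SLOTS;
    return oldest;
}
```

In this sketch the FIFO guarantees a minimum delay before reuse once a
pointer has left the random portion, while the random portion makes the
total delay unpredictable.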

>  - support for arenas

This is a completely dishonest and ridiculous misrepresentation of the
design and security properties laid out in that README. People should
read it for themselves and they'll see that your attempt at spinning
misinformation about it is a complete joke.

> mozjemalloc:
>  - arenas (we call them partitions)

Unlike hardened_malloc, they're mixed together and address space is
reused between them rather than having strong isolation. The approach in
hardened_malloc also partitions each size class. Calling the mozjemalloc
arenas partitions as if it's an implementation of a security feature is
a joke.

>  - randomization (support for, not enabled by default due to limited
> utility, but improvements coming)
>  - double free protection
>  - zero-filling
> In Progress:
>  - we're actively working on guard regions

As covered above, you're being misleading with each of these points, by
portraying these things as something black and white that the allocator
either has or doesn't have when that couldn't be further from reality.

In particular, talking about randomization and guard regions as if this
is a matter of having them or not having them is ridiculous. There is
more to invalid free detection than double free detection and how well
it works has a lot of variation. It can be deterministic detection like
the hardened_malloc implementation, or probabilistic detection that an
attacker could much more easily bypass. The reuse of freed allocations
also matters a lot, since once it's handed out again, a free based on
a past generation allocation won't be considered invalid, despite it
being wrong and dangerous. This is why the design of memory allocation
reuse and quarantines matters so much. The documentation on thread
caching in the hardened_malloc README elaborates on why that's not
compatible with a hardened allocator due to interfering with doing
anything like this properly.

> Future Work:
>  - out of line metadata

There's a huge variation in what this means. The hardened_malloc
metadata is fully out-of-line in a dedicated region, with that address
space never mixed / reused with anything else. The same applies to all
the size class regions within arenas.

>  - MPK

For what exactly?

> harden_malloc definitely has more bells and whistles than mozjemalloc.

The hardened_malloc implementation is far simpler and more minimal. It
has a core design focused heavily on having quantifiable, best in class
security. The security is built into the design from the ground up and
cannot simply be bolted onto it or another allocator. I had to write
it in the first place because I was unable to turn OpenBSD malloc into
something comparable by simply continuing to bolt on additional security
features to it which was the original approach. It should also be noted
that OpenBSD malloc is a hardened allocator and shares many of the same
security properties from the core design. The main distinction in the
core approach is that hardened_malloc is heavily designed around taking
advantage of the large address space on modern machines.

> But the benefit gained by slapping in an LD_PRELOAD and calling it a
> day is small to zero. Probably negative because you'll not utilize
> partitions by default. You'd need a particurally constrained
> vulnerability to actually prevent exploitation - it's more likely
> you'll just cost the attacker another 2-8 hours of work.

The claim that mozjemalloc has partitioning and hardened_malloc does not
couldn't be further from the truth. The opposite is true. What you claim
to be partitioning is not a proper implementation, and hardened_malloc
does have a proper implementation not only for arenas, but for dedicated
size class regions within those. I find it ridiculous how you attempt to
attack the project with these lies to promote your own work, which is
hardly comparable at all. It's not the same thing. Bolting on a few
security features to an allocator design that's exploitation friendly
from the ground up doesn't make it a hardened allocator. That's even
more true when the implementations of those features are unnecessarily
weak.

> Out of line metadata is on-the-surface-attractive but... that tends to
> only help when you have a off-by-one/four write and you corrupt
> metadata state because it's the only thing you *can* do. With out of
> line metadata, you can just corrupt a real object and effect a
> different type of corruption. I'm pretty skeptical of the benefit at
> this point, although I could be convinced. We don't see metadata
> corruption attacks anymore - but I'm not sure if it's because we find
> better exploit primitives or better vulnerabilities.

Out-of-line metadata is not simply about preventing attacks on the
metadata itself. It provides much more than that. I'm not sure why you
think your ignorant opinions on these topics matter, when you so clearly
don't know what you're talking about at all.

> In particular, if you wanted to pursue hardened_malloc you would need
> to use replace_malloc and wire up the partitions correctly.
> Randomization will almost certainly not help (and will hurt
> performance)*.

The hardened_malloc design is focused on reliable, deterministic memory
corruption mitigations. Randomization is used where possible, and in a
way that has a low impact on performance. The high entropy base
randomization for each size class region within arenas has no
significant impact on performance. The randomization for large
allocation (> 128k) guard regions and that quarantine has no substantial
impact on performance. The impact from slot randomization and the slab
allocation quarantine is measurable but not high, and the slab
quarantine has no substantial impact.

> MPK sounds nice but you have to use it correctly (which
> requires application code changes), you have to ensure there are no
> MPK gadgets, and oh wait no one can use it because it's only available
> in Linux on server CPUs. =(

Using MPK is not one of the major features of hardened_malloc and the
usage doesn't rely on not having MPK gadgets. This is explicitly
documented in the README. It's the only optional security feature that's
not enabled by default. It should be pointed out that most of the
security offered by hardened_malloc is not a feature that can be turned
on or off because it's the design itself that's hardened, not the fact
that it has some optional security features bolted onto it.

> * One place randomization will help is on the other side of an IPC
> boundary. e.g. in the parent process. I'm trying to get that enabled
> for mozjemalloc in H2 2019.
> 
> In conclusion, while it's possible hardened_malloc could provide some
> small security increase over mozjemalloc, the gap is much smaller than
> it was when I advocated for allocator improvements 5 years ago, the
> effort is definitely non-trivial, and the gap is closing.

No, you're just making false attacks and misleading comparisons / spin
to promote your own work, which is trash. You're being incredibly
dishonest and unethical. You didn't even bother to inform yourself about
hardened_malloc by actually reading through the documentation. Instead,
you just jump to conclusions and present yourself as an expert on topics
you are clearly incredibly ignorant about. You really don't know what
you're talking about, and your post on this mailing list is offensive.
Your post as a whole is nonsense, and your conclusion is bogus. I
recommend actually trying to read the documentation and learning a bit
about memory allocators, memory corruption and exploit mitigations
before lecturing people and attacking other projects.

Expect to hear more about this in the future, because I'll be contacting
Mozilla about the fact that you're spreading dishonest attacks and
misinformation about my work, which is a continuation of past harm
inflicted on me by Mozilla employees. You're digging up a past conflict
that was supposed to be put aside, so nice work with that. It's nice to
have another example to point to about damages inflicted by Mozilla. I
expect this to be made right just like the past attacks by Mozilla. I
want a retraction and an apology, but either way this is being included
in documentation on damages.

> If people had the cycles to invest in something like this, I would
> actually advocate for helping us test and benchmark Fuzzyfox, and see
> if we can get the browser into a usable state with Fuzzyfox so we
> could enable it in Tor Browser.

Maybe if you didn't spend your time trying to inflict harm on other
projects and individuals, people would be more interested in helping. I
highly recommend that no one contributes any time to Mozilla, which has
a history of taking advantage of contributors, lying to them and abusing
them. Mozilla is built on abusing volunteer labour, and your post here
is just a continuation of that. It's a thoroughly unethical company with
disgusting, unethical people like yourself at the forefront of it. I
have no idea why anyone would want to help scumbags like you.