You Can't Always Hash Pointers in C

May 30, 2016

Occasionally I’ve needed to key a hash table with C pointers. I don’tcare about the contents of the object itself — especially if it mightchange — just its pointer identity. For example, suppose I’m usingnull-terminated strings as keys and I know these strings will alwaysbe interned in a common table. These strings can be compared directlyby their pointer values (str_a == str_b) rather than, more slowly,by their contents (strcmp(str_a, str_b) == 0). The intern tableensures that these expressions both have the same result.

As a key in a hash table, or other efficient map/dictionary datastructure, I’ll need to turn pointers into numerical values. However,C pointers aren’t integers. Following certain rules it’s permittedto cast pointers to integers and back, but doing so will reduce theprogram’s portability. The most important consideration is that theinteger form isn’t guaranteed to have any meaningful or stablevalue. In other words, even in a conforming implementation, the samepointer might cast to two different integer values. This would breakany algorithm that isn’t keenly aware of the implementation details.

To show why this is, I’m going to be citing the relevant parts of theC99 standard (ISO/IEC 9899:1999). The draft for C99 is freelyavailable (and what I use myself since I’m a cheapass). My purpose isnot to discourage you from casting pointers to integers and usingthe result. The vast majority of the time this works fine and as youwould expect. I just think it’s an interesting part of the language,and C/C++ programmers should be aware of potential the trade-offs.

Integer to pointer casts

What does the standard have to say about casting pointers to integers?§6.3.2.3¶5:

An integer may be converted to any pointer type. Except aspreviously specified, the result is implementation-defined, mightnot be correctly aligned, might not point to an entity of thereferenced type, and might be a trap representation.

It also includes a footnote:

The mapping functions for converting a pointer to an integer or aninteger to a pointer are intended to be consistent with theaddressing structure of the execution environment.

Casting an integer to a pointer depends entirely on theimplementation. This is intended for things like memory mappedhardware. The programmer may need to access memory as a specificphysical address, which would be encoded in the source as an integerconstant and cast to a pointer of the appropriate type.

intread_sensor_voltage(void){    return *(int *)0x1ffc;}

It may also be used by a loader and dynamic linker to compute thevirtual address of various functions and variables, then cast to apointer before use.

Both cases are already dependent on implementation defined behavior,so there’s nothing lost in relying on these casts.

An integer constant expression of 0 is a special case. It casts to aNULL pointer in all implementations (§6.3.2.3¶3). However, a NULLpointer doesn’t necessarily point to address zero, nor is itnecessarily a zero bit pattern (i.e. beware memset and calloc onmemory with pointers). It’s just guaranteed never to compare equallywith a valid object, and it is undefined behavior to dereference.

Pointer to integer casts

What about the other way around? §6.3.2.3¶6:

Any pointer type may be converted to an integer type. Except aspreviously specified, the result is implementation-defined. If theresult cannot be represented in the integer type, the behavior isundefined. The result need not be in the range of values of anyinteger type.

Like before, it’s implementation defined. However, the negatives are alittle stronger: the cast itself may be undefined behavior. Ispeculate this is tied to integer overflow. The last part makespointer to integer casts optional for an implementation. This is oneway that the hash table above would be less portable.

When the cast is always possible, an implementation can provide aninteger type wide enough to hold any pointer value. §7.18.1.4¶1:

The following type designates a signed integer type with theproperty that any valid pointer to void can be converted to thistype, then converted back to pointer to void, and the result willcompare equal to the original pointer:

intptr_t

The following type designates an unsigned integer type with theproperty that any valid pointer to void can be converted to thistype, then converted back to pointer to void, and the result willcompare equal to the original pointer:

uintptr_t

These types are optional.

The take-away is that the integer has no meaningful value. The onlyguarantee is that the integer can be cast back into a void pointerthat will compare equally. It would be perfectly legal for animplementation to pass these assertions (and still sometimes fail).

voidexample(void *ptr_a, void *ptr_b){    if (ptr_a == ptr_b) {        uintptr_t int_a = (uintptr_t)ptr_a;        uintptr_t int_b = (uintptr_t)ptr_b;        assert(int_a != int_b);        assert((void *)int_a == (void *)int_b);    }}

Since the bits don’t have any particular meaning, arithmeticoperations involving them will also have no meaning. When a pointermight map to two different integers, the hash values might not matchup, breaking hash tables that rely on them. Even with uintptr_tprovided, casting pointers to integers isn’t useful without alsorelying on implementation defined properties of the result.

Reasons for this pointer insanity

What purpose could such strange pointer-to-integer casts serve?

A security-conscious implementation may choose to annotate pointerswith additional information by setting unused bits. It might be forbaggy bounds checks or, someday, in an undefined behaviorsanitizer. Before dereferencing annotated pointers, themetadata bits would be checked for validity, and cleared/set beforeuse as an address. Or it may map the same object at multiple virtualaddresses) to avoid setting/clearing the metadata bits,providing interoperability with code unaware of the annotations. Whenpointers are compared, these bits would be ignored.

When these annotated pointers are cast to integers, the metadata bitswill be present, but a program using the integer wouldn’t know theirmeaning without tying itself closely to that implementation.Completely unused bits may even be filled with random garbage whencast. It’s allowed.

You may have been thinking before about using a union or char * tobypass the cast and access the raw pointer bytes, but you’d run intothe same problems on the same implementations.

Conforming programs

The standard makes a distinction between strictly conformingprograms (§4¶5) and conforming programs (§4¶7). A strictlyconforming program must not produce output depending on implementationdefined behavior nor exceed minimum implementation limits. Very fewprograms fit in this category, including any program using uintptr_tsince it’s optional. Here are more examples of code that isn’tstrictly conforming:

    printf("%zu", sizeof(int)); // §6.5.3.4    printf("%d", -1 >> 1);      // §6.5¶4    printf("%d", MAX_INT);      // §5.2.4.2.1

On the other hand, a conforming program is allowed to depend onimplementation defined behavior. Relying on meaningful, stable valuesfor pointers cast to uintptr_t/intptr_t is conforming even if yourprogram may exhibit bugs on some implementations.

Load Comments

本站仅提供存储服务，所有内容均由用户发布，如发现有害或侵权内容，请点击举报。