Why Can't You Do Bitwise Operations on Pointers in C, and Is There a Way Around This

You can't do bitwise operations on pointers because the standard says you can't. I suppose the standard says so because bitwise pointer operations would almost universally result in undefined or (at best) implementation-defined behavior. So there would be nothing you could do that is both useful and portable, unlike simpler operations like addition.

But you can get around it with casting:

#include <stdint.h>

void *ptr1;
// Find page start
void *ptr2 = (void *) ((uintptr_t) ptr1 & ~(uintptr_t) 0xfff);

As for C++, just use reinterpret_cast instead of the C-style casts.

Why bitwise-or doesn't result in a constant expression, but addition does

When you say

sometype *p = f(x);

where p is a global variable (or one with static duration) and where f(x) is not an actual function call but rather, some sequence of compile-time operations involving the address of another symbol x which won't be known until link time, the compiler obviously can't compute the initial value immediately. It actually emits an assembly language directive which causes the assembler to construct a relocation record which causes the linker to evaluate f(x) once the final location of the symbol x is known.

So f(x) (whatever sequence of operations it actually is) has to be, in effect, a function that the linker knows how to evaluate (and that there's a relocation record for, and if necessary an assembly language directive for). And while conventional linkers are good at performing addition and subtraction (because they do it all the time), they don't necessarily know how to perform other kinds of arithmetic.

So in consequence of all this, there are some additional rules on what kinds of arithmetic you can do while constructing pointer constants.

I'm in a hurry this morning and don't have time to dig through the Standard, but I'm pretty sure there's a sentence in there somewhere stating that among other restrictions on constant expressions, when you're initializing a pointer, you're limited to an address plus or minus an integer constant expression (since that's all the C Standard is willing to assume the linker is going to know how to do).

Your question has the additional complication that you're not actually initializing a pointer variable, but rather, an integer. In that case you get, in effect, the worst of both worlds: you're either not allowed to do it at all, or if the compiler lets you, the initializer on the right (since it involves an address/pointer), is limited to the kinds of arithmetic you can do while constructing pointer constants, as described above. You don't get to do the arbitrary arithmetic you'd be able to get away with (perhaps with confounding casts) in an integer expression at run time.

Why do we use bitwise operators & and ~ instead of mathematical operators % when implementing aligned malloc?

I agree with you that the classic code has problems, but not exactly those mentioned:

  • alignment must indeed be a power of 2, which is also a constraint for the POSIX function posix_memalign and the C11 function aligned_alloc. In fact, posix_memalign requires alignment to be a power of 2 that is a multiple of sizeof(void *), and C11 specifies that the size argument of aligned_alloc should be a multiple of alignment.

  • alignment is defined with type size_t, but this bears no connection to the data type of pointer p1. As a matter of fact, size_t and void * might have different sizes, as was the case in 16-bit MS-DOS/Windows memory models with far data pointers, such as the large model.

    Hence the code p2 = (void **)(((size_t)(p1) + offset) & ~(alignment - 1)); is not strictly conforming. To fix this problem, one would use uintptr_t defined in <stdint.h>, an unsigned integer type that can hold any valid void * converted to it and convert back to the same pointer. The mask is cast as well so that it is computed at the full width of uintptr_t:

      p2 = (void **)(void *)(((uintptr_t)(p1) + offset) & ~(uintptr_t)(alignment - 1));

  • there is another problem in the posted code: if alignment is smaller than sizeof(void *), p2 might be misaligned for writing void * value p1. Extra code is needed to make sure alignment is at least as large as sizeof(void *). In real systems, this is not a problem because malloc() must return pointers that are properly aligned for all basic types, including void *.

The reason bitwise & and ~ operators are preferred is one of efficiency: for x an unsigned integer and alignment a power of 2, x & (alignment - 1) is equivalent to x % alignment. A bitwise mask is much faster to compute on most CPUs than a division, and because the compiler cannot assume that alignment is a power of 2, it would compile your code using the much slower integer division instruction.

Furthermore, your computation is incorrect: if p1 is misaligned, offset (computed as (size_t)(p1) % alignment) can be as large as alignment - 1, so p2 can be as close to p1 as 1 byte, so p2[-1] = p1; would write before the beginning of the allocated space.

Here is a modified version:

#include <stdint.h>
#include <stdlib.h>

void *aligned_malloc(size_t size, size_t alignment) {
    // alignment must be a power of 2
    //assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    void *p1;     // allocated block
    void **p2;    // aligned block
    size_t slack; // amount of extra memory to allocate to ensure proper alignment
                  // and space to save the original pointer returned by malloc.
    // compute max(alignment - 1, sizeof(void *) - 1) without testing:
    size_t alignment_mask = (alignment - 1) | (sizeof(void *) - 1);
    slack = alignment_mask + sizeof(void *);
    if ((p1 = malloc(size + slack)) == NULL)
        return NULL;
    // cast the mask so that it is inverted at the full width of uintptr_t
    p2 = (void **)(void *)(((uintptr_t)p1 + slack) & ~(uintptr_t)alignment_mask);
    p2[-1] = p1;
    return p2;
}

Why does the returned value of bit operations change every time?

From your code, I assume you want to shift 0x24 into the higher (actually lower) byte. Try p = p + 1 and see if you get the desired result.

Instead of reading through the pointer as a short, if you do (int)*p << 8, you get 2400 every time.

Or you can do something crazy like declaring another variable right after the pointer, so that when the short read through the shifted pointer runs past sample, it picks up part of that variable instead of garbage:

#include <stdio.h>

int main(void) {
    short sample = 0x2456;

    char *p = (char *)&sample;
    int zero = 0;
    p = p + 1;
    printf("%x\n", *((short *)p) << 8);
    return 0;
}

You can even print 0x1224 (half of each) like this:

#include <stdio.h>

int main(void) {
    short sample = 0x2456;

    char *p = (char *)&sample;
    int zero = 0x12;
    p = p + 1;
    printf("%x\n", *((short *)p)); // will print 1224
    return 0;
}

Side note: I assumed both variables live in a single stack frame and that memory is little-endian. Reading a short through p this way runs past the end of sample and is undefined behavior, so results may vary with compilers and target systems.

Bit Shifts on a C Pointer?

If your compiler supports it, C99's <stdint.h> header provides the intptr_t and uintptr_t types that should be large enough to hold a pointer on your system, but are integers, so you can do bit manipulation. It can't really get much more portable than that, if that's what you're looking for.

Why can't we use * on non-pointers?

C is a strongly typed language, which means that the operations allowed on an object (and the interpretation of those operations) are a function of the object's type. That's literally what it means for an object to have a type: the type determines the operations you can do with the object.

Unary * (the pointer indirection operator) is defined for pointer types, and it's not defined for integer types.

If you want to treat an integer's value as if it were a pointer, you can use an explicit cast, as in the *((int *)y) = 3; example you mentioned in your question.

There are two reasons the unary * operator is not defined for integers:

  1. Taking an integer and pretending it's a pointer is generally a bad idea, not something to be encouraged. If you really want to do it, the extra cost imposed on you -- namely that you have to use that pointer cast -- is appropriate.

  2. The bare expression *y doesn't contain enough information to know how big the pointed-to object might be. If you write *y = 3 and it were legal, how would the compiler know to assign an int, a short, or a char?

Point 2 is key. It's important to remember that C does not have one "pointer" type. Every pointer type incorporates a specification of the type of object which the pointer will point to. That's no accident, it's fundamental, and there's no way around it.

So you can't implicitly treat an integer as if it were a pointer, and even if you do it explicitly -- that is, with a cast, as in *((int *)y) = 3, you may still be on shaky ground, especially if integers and pointers don't have the same size on your machine.

These days, this is all generally such a bad idea that the compilers are slowly dropping their old "the programmer must know what he's doing" attitude, and getting somewhat hissy with warnings. For example, int y = p will generally get you a warning about a pointer-to-int assignment, and even with the explicit cast, *((int *)y) = 3 might get you a warning about "cast to pointer from integer of different size".

Why do we use (bytes) instead of (bits) in pointer arithmetic and array addresses?

It doesn't have to be this way. But it's the way all "byte addressable" machines work, and those are by far the most popular type today.

The basic idea — and it is a basic idea, there's nothing secret or fancy or obscure about it — is just that you represent the computer's memory as a giant array of bytes. Each byte has an address, from 0 up to however many bytes (or kilobytes, or gigabytes) of memory you have in your computer. Each byte is individually addressable, so the addresses increase by 1 from one byte to the next.

However, many objects you might want to store in memory are bigger than one byte. For example, a 32-bit integer occupies four bytes.

This picture may help, taken straight from your program and the result you showed:

address          contents
0x7fff2c70e670   1
0x7fff2c70e671   0
0x7fff2c70e672   0
0x7fff2c70e673   0
0x7fff2c70e674   3
0x7fff2c70e675   0
0x7fff2c70e676   0
0x7fff2c70e677   0
0x7fff2c70e678   5
...              ...

c - cannot take address of bit-field

Bit-field members are (typically) smaller than the granularity allowed by pointers, which is the granularity of chars (by definition of char, which by the way is mandated to be at least 8 bits long). So a regular pointer doesn't cut it.

Also, it wouldn't be clear what the type of a pointer to a bit-field member would be, since to store or retrieve such a member the compiler must know exactly where it is located in the bit-field (and no "regular" pointer type can carry such information).

Finally, it's hardly a requested feature (bit-fields aren't seen often in the first place). Bit-fields are used to store information compactly or to build a packed representation of flags (e.g. to write to hardware ports), so it's rare that you need a pointer to a single field of them. If it's needed, you can always resort to a regular struct and convert to a bit-field at the last moment.

For all these reasons, the standard says that bit-field members aren't addressable, period. It could be possible to overcome these obstacles (e.g. by defining special pointer types that store all the information needed to access a bit-field member), but it would be yet another overcomplicated dark corner of the language that nobody uses.

How to set and clear different bits with a single line of code (C)

It's not possible in a single instruction. This is because there are 3 possible operations you need to do on the different bits:

  • Set them (bit 3)
  • Clear them (bit 4)
  • Leave them alone (all the other bits)

How can you select from one of three possibilities with a bitmask made up of binary digits?

Of course, you can do it in one line, e.g.:

data = (data | (1 << 3)) & ~(1 << 4);

