Is There a List of Errors That Will Show Up as 'Segfaults' When They Are Not Really Related to Memory Access?

Is there a list of errors that will show up as `segfaults` when they are not really related to memory access?

Read through the instruction set reference and see where #GP is listed for a non-memory issue. Incomplete list: CLI, CLTS, HLT, IN, INT (with an invalid vector), INVD, INVLPG, IRET (under some circumstances), LDMXCSR (setting reserved bits), LGDT, LIDT, LLDT, LMSW, LTR, MONITOR (with ECX != 0), MOV (to CRx or DRx), MWAIT (with invalid ECX), OUT, RDMSR, RDPMC, SWAPGS, SYSEXIT, SYSRET, WBINVD, WRMSR, XGETBV (invalid ECX), XRSTOR, XSETBV

How do programs know if a memory access is allowed?

The short answer is that the addresses used by your program (and used by the process running your program) are not the "real" memory addresses. Rather, there is a layer of abstraction between you and the physical memory addresses that is provided by virtual memory and paging.

Also, your program is split up into "segments" with different purposes, which generally live on different pages. For example, local variables allocated for a single function call live on the stack, whereas memory obtained through 'malloc' resides on the heap.

Segmentation fault on Memory Access

When you create your memory map, you give it a size equal to your page size, i.e. 4 KB:

fpga_ptr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, page_addr); //mmap the device into memory

This translates to 0x1000 bytes; however, you attempt to access data 0x100000 bytes after the beginning of your map, which raises the segmentation fault. To fix this issue, map a larger region of memory. For example:

fpga_ptr = mmap(NULL, 0x1000 * page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, page_addr);

This will allow you to access the next 0x1000000 bytes after the beginning of your memory map.

Can't identify memory access error in code, keeps giving segmentation faults

There are several issues with your program. Let's begin with the global variable top. This is causing problems because on the one hand you have a stack struct responsible for maintaining a stack, and that has its own top. But then you have this global which you're not even using anywhere. It's almost like you added it to get around compiler errors that you didn't understand ;)

So let's ditch that, and fix your stack functions. I'm rearranging the parameters of the push function so that the stack is the first argument. This is a bit more conventional.

typedef struct stack {
    char s[MAX];
    int top;
} stack;

int isFull(const stack *stk)
{
    return stk->top == FULL;
}

int isEmpty(const stack *stk)
{
    return stk->top == EMPTY;
}

void reset(stack *stk)
{
    stk->top = EMPTY;
}

void push(stack *stk, char c)
{
    if (isFull(stk))
        return;
    stk->s[++stk->top] = c;
}

char pop(stack *stk)
{
    if (isEmpty(stk))
        return '\0';
    return stk->s[stk->top--];
}

For the pop function, I arbitrarily return a NUL character if the stack is empty, because something must be returned. But really, you should never call this function if the stack is empty.

Let's look at your display functions now. The first thing I notice is that these are really convoluted. There is no need for that complexity. Look here:

void print(stack *stk)
{
    for (int i = 0; i <= stk->top; i++)
    {
        printf("%c\n", stk->s[i]);
    }
    printf("\n");
}

void reverse(stack *stk)
{
    for (int i = stk->top; i >= 0; i--)
    {
        printf("%c", stk->s[i]);
    }
    printf("\n");
}

char peek(stack *stk)
{
    if (isEmpty(stk))
    {
        printf("Stack empty!\n");
        return '\0';
    }
    return stk->s[stk->top];
}

And so all that remains is a little tidy-up of your main function, adjusting the parameter order for push.

int main()
{
    const char *str = "i am otto am i";
    printf("original is: %s\n", str);

    stack stack_of_char;
    reset(&stack_of_char);
    for (int i = 0; str[i]; i++)
    {
        push(&stack_of_char, str[i]);
    }

    print(&stack_of_char);
    reverse(&stack_of_char);
}

Note also that you shouldn't really be walking over your stack with those functions. The typical way you would use a stack to reverse something is to push values onto it and then pop them off. So, you can print the string in reverse like this:

// Pop characters from stack to print in reverse
while (!isEmpty(&stack_of_char))
{
    char c = pop(&stack_of_char);
    putc(c, stdout);
}
putc('\n', stdout);

Why does a segmentation fault not occur?

Since the pointer arr is not initialized, it probably holds whatever value that memory location had when it was previously used.

In your case, the code that used that memory address previously probably used that memory address for storing a pointer, i.e. for storing another memory address that points to a valid object. Even if the lifetime of that object has expired in the meantime, the operating system will probably not be able to detect this, because the memory page was probably not returned to the operating system. Therefore, as far as the operating system is concerned, that memory page is still readable (and possibly also writable) by the program. This probably explains why dereferencing the uninitialized value of arr does not produce a segmentation fault.

The expression arr[1000] will attempt to dereference an address that is 4000 bytes past the uninitialized value of arr (assuming sizeof(float)==4). A typical size of a memory page is 4096 bytes. Therefore, assuming that the uninitialized value of arr is a memory address that points near the start of a memory page of 4096 bytes, then adding 4000 to that address will not change the memory address sufficiently to make the address point to a different memory page. However, if the uninitialized value of arr is a memory address that points somewhere in the middle of a memory page, then adding 4000 to that address will make it point to a different memory page (assuming a memory page size of 4096 bytes). This probably explains why your operating system treats the two addresses differently, so that one memory access causes a segmentation fault and the other memory access does not fail.

However, this is all speculation (which is made clear by my frequent use of the word "probably"). There could be another reason why your code does not cause a segmentation fault. In any case, when your program invokes undefined behavior (which it does by dereferencing an uninitialized pointer), you cannot rely on any specific behavior. On some platforms, it may cause a segmentation fault, while on other platforms, the program may work perfectly. Even changing the compiler settings (such as the optimization level) may be enough to change the behavior of the program.



I am wondering if this is due to the "smart" design of the compiler that alleviates the crash.

The "smart" thing to do in such a case would be to report some kind of error (i.e. to crash), and not to attempt to alleviate the crash. This is because crashing makes the bug easier to find.

The reason why your program is not crashing immediately is that neither your compiler nor your operating system are detecting the error.

If you want such errors to be detected more reliably, then you may want to consider using a feature offered by some compilers that tries to detect such bugs. For example, both gcc and clang support AddressSanitizer. On those two compilers, all you have to do is compile with the -fsanitize=address command-line option. However, this will cause the compiler to add additional checks, which will significantly decrease performance (by a factor of about two) and increase memory usage. Therefore, this should only be done for debugging purposes.

Why is a segmentation fault not recoverable?

When exactly does segmentation fault happen (=when is SIGSEGV sent)?

When you attempt to access memory you don't have access to, such as by accessing an array out of bounds or dereferencing an invalid pointer. The signal SIGSEGV is standardized, but different OSes may implement it differently. "Segmentation fault" is mainly a term used on *nix systems; Windows calls it an "access violation".

Why is the process in undefined behavior state after that point?

Because one or several of the variables in the program didn't behave as expected. Let's say you have some array that is supposed to store a number of values, but you didn't allocate enough room for all of them. Only the values you allocated room for get written correctly; the rest, written out of bounds of the array, can hold any values. How exactly is the OS to know how critical those out-of-bounds values are for your application to function? It knows nothing of their purpose.

Furthermore, writing outside allowed memory can often corrupt other, unrelated variables, which is obviously dangerous and can cause any random behavior. Such bugs are often hard to track down. Stack overflows, for example, are segmentation faults prone to overwriting adjacent variables, unless the error was caught by protection mechanisms.

If we look at the behavior of "bare metal" microcontroller systems without any OS and no virtual memory features, just raw physical memory: they will silently do exactly as told, for example overwriting unrelated variables and carrying on. Which in turn could cause disastrous behavior if the application is mission-critical.

Why is it not recoverable?

Because the OS doesn’t know what your program is supposed to be doing.

Though in the "bare metal" scenario above, the system might be smart enough to place itself in a safe mode and keep going. Critical applications such as automotive and med-tech aren’t allowed to just stop or reset, as that in itself might be dangerous. They will rather try to "limp home" with limited functionality.

Why does this solution avoid that unrecoverable state? Does it even?

That solution just ignores the error and keeps going. It doesn't fix the problem that caused it. It's a very dirty patch, and setjmp/longjmp in general are very dangerous functions that should be avoided for any purpose.

We have to realize that a segmentation fault is a symptom of a bug, not the cause.

Definitive List of Common Reasons for Segmentation Faults

WARNING!

The following are potential reasons for a segmentation fault. It is virtually impossible to list all reasons. The purpose of this list is to help diagnose an existing segfault.

The relationship between segmentation faults and undefined behavior cannot be stressed enough! All of the below situations that can create a segmentation fault are technically undefined behavior. That means that they can do anything, not just segfault -- as someone once said on USENET, "it is legal for the compiler to make demons fly out of your nose." Don't count on a segfault happening whenever you have undefined behavior. You should learn which undefined behaviors exist in C and/or C++, and avoid writing code that has them!

More information on Undefined Behavior:

  • What is the simplest standard conform way to produce a Segfault in C?
  • Undefined, unspecified and implementation-defined behavior
  • How undefined is undefined behavior?

What Is a Segfault?

In short, a segmentation fault is caused when the code attempts to access memory that it doesn't have permission to access. Every program is given a piece of memory (RAM) to work with, and for security reasons, it is only allowed to access memory in that chunk.

For a more thorough technical explanation about what a segmentation fault is, see What is a segmentation fault?.

Here are the most common reasons for a segmentation fault error. Again, these should be used in diagnosing an existing segfault. To learn how to avoid them, learn your language's undefined behaviors.

This list is also no replacement for doing your own debugging work. (See that section at the bottom of the answer.) These are things you can look for, but your debugging tools are the only reliable way to zero in on the problem.


Accessing a NULL or uninitialized pointer

If you have a pointer that is NULL (ptr = 0) or that is completely uninitialized (it isn't set to anything at all yet), attempting to access or modify memory through that pointer has undefined behavior.

int* ptr = 0;
*ptr += 5;

Since a failed malloc returns a null pointer (plain new, by contrast, throws std::bad_alloc rather than returning NULL), you should always check that your pointer is not NULL before working with it.

Note also that even reading values (without dereferencing) of uninitialized pointers (and variables in general) is undefined behavior.

Sometimes this access of an undefined pointer can be quite subtle, such as in trying to interpret such a pointer as a string in a C print statement.

char id[32];
char* ptr;                 // uninitialized pointer
sprintf(id, "%s", ptr);    // undefined behavior: reads through ptr

See also:

  • How to detect if variable uninitialized/catch segfault in C
  • Concatenation of string and int results in seg fault C

Accessing a dangling pointer

If you use malloc or new to allocate memory, and then later free or delete that memory through the pointer, that pointer is now considered a dangling pointer. Dereferencing it (as well as simply reading its value, granted you didn't assign some new value to it such as NULL) is undefined behavior, and can result in a segmentation fault.

Something* ptr = new Something(123, 456);
delete ptr;
std::cout << ptr->foo << std::endl;

See also:

  • What is a dangling pointer?
  • Why my dangling pointer doesn't cause a segmentation fault?

Stack overflow

[No, not the site you're on now, but what it was named for.] Oversimplified, the "stack" is like that spike you stick your order papers on in some diners. This problem can occur when you put too many orders on that spike, so to speak. In the computer, every variable that is not dynamically allocated, along with the bookkeeping for every function call that has yet to return, goes on the stack.

One cause of this might be deep or infinite recursion, such as when a function calls itself with no way to stop. Because that stack has overflowed, the order papers start "falling off" and taking up other space not meant for them. Thus, we can get a segmentation fault. Another cause might be the attempt to initialize a very large array: it's only a single order, but one that is already large enough by itself.

int stupidFunction(int n)
{
    return stupidFunction(n);
}

Another cause of a stack overflow would be having too many (non-dynamically allocated) variables at once.

int stupidArray[600851475143];

One case of a stack overflow in the wild came from a simple omission of a return statement in a conditional intended to prevent infinite recursion in a function. The moral of that story: always ensure your error checks work!

See also:

  • Segmentation Fault While Creating Large Arrays in C
  • Seg Fault when initializing array

Wild pointers

Creating a pointer to some random location in memory is like playing Russian roulette with your code - you could easily miss and create a pointer to a location you don't have access rights to.

int n = 123;
int* ptr = (&n + 0xDEADBEEF); //This is just stupid, people.

As a general rule, don't create pointers to literal memory locations. Even if they work one time, the next time they might not. You can't predict where your program's memory will be at any given execution.

See also:

  • What is the meaning of "wild pointer" in C?

Attempting to read past the end of an array

An array is a contiguous region of memory, where each successive element is located at the next address in memory. However, most arrays don't have an innate sense of how large they are, or what the last element is. Thus, it is easy to blow past the end of the array and never know it, especially if you're using pointer arithmetic.

If you read past the end of the array, you may wind up going into memory that is uninitialized or belongs to something else. This is technically undefined behavior. A segfault is just one of those many potential undefined behaviors. [Frankly, if you get a segfault here, you're lucky. Others are harder to diagnose.]

// like most UB, this code is a total crapshoot.
int arr[3] {5, 151, 478};
int i = 0;
while (arr[i] != 16)
{
    std::cout << arr[i] << std::endl;
    i++;
}

Or the frequently seen one using for with <= instead of < (reads 1 byte too much):

char arr[10];
for (int i = 0; i <= 10; i++)
{
    std::cout << arr[i] << std::endl;
}

Or even an unlucky typo which compiles fine (seen here) and allocates only 1 element initialized with dim instead of dim elements.

int* my_array = new int(dim);

Additionally, it should be noted that you are not even allowed to create (let alone dereference) a pointer which points outside the array; you can create such a pointer only if it points to an element within the array, or one past the end. Otherwise, you are triggering undefined behaviour.

See also:

  • I have segfaults!

Forgetting a NUL terminator on a C string.

C strings are, themselves, arrays with some additional behaviors. They must be null-terminated, meaning they have a '\0' at the end, to be reliably used as strings. This is done automatically in some cases, and not in others.

If this is forgotten, some functions that handle C strings never know when to stop, and you can get the same problems as with reading past the end of an array.

char str[3] = {'f', 'o', 'o'};
int i = 0;
while (str[i] != '\0')
{
    std::cout << str[i] << std::endl;
    i++;
}

With C strings, it really is hit-and-miss whether a missing \0 causes visible problems. You should assume it will, and avoid the undefined behavior: better to write char str[4] = {'f', 'o', 'o', '\0'};


Attempting to modify a string literal

If you assign a string literal to a char*, it cannot be modified. For example...

char* foo = "Hello, world!";
foo[7] = 'W';

...triggers undefined behavior, and a segmentation fault is one possible outcome.

See also:

  • Why is this string reversal C code causing a segmentation fault?

Mismatching Allocation and Deallocation methods

You must use malloc and free together, new and delete together, and new[] and delete[] together. If you mix 'em up, you can get segfaults and other weird behavior.

See also:

  • Behaviour of malloc with delete in C++
  • Segmentation fault (core dumped) when I delete pointer

Errors in the toolchain.

A bug in the machine code backend of a compiler is quite capable of turning valid code into an executable that segfaults. A bug in the linker can definitely do this too.

Particularly scary in that this is not UB invoked by your own code.

That said, you should always assume the problem is you until proven otherwise.


Other Causes

The possible causes of Segmentation Faults are about as numerous as the number of undefined behaviors, and there are far too many for even the standard documentation to list.

A few less common causes to check:

  • UD2 generated on some platforms due to other UB
  • c++ STL map::operator[] done on an entry being deleted

DEBUGGING

Firstly, read through the code carefully. Most errors are caused simply by typos or mistakes. Make sure to check all the potential causes of the segmentation fault. If this fails, you may need to use dedicated debugging tools to find out the underlying issues.

Debugging tools are instrumental in diagnosing the causes of a segfault. Compile your program with the debugging flag (-g), and then run it with your debugger to find where the segfault is likely occurring.

Recent compilers support building with -fsanitize=address, which typically results in programs that run about 2x slower but can detect address errors more accurately. However, other errors (such as reading from uninitialized memory or leaking non-memory resources, such as file descriptors) are not supported by this method, and it is impossible to use many debugging tools and ASan at the same time.

Some Memory Debuggers

  • GDB | Mac, Linux
  • valgrind (memcheck) | Linux
  • Dr. Memory | Windows

Additionally it is recommended to use static analysis tools to detect undefined behaviour - but again, they are a tool merely to help you find undefined behaviour, and they don't guarantee to find all occurrences of undefined behaviour.

If you are really unlucky however, using a debugger (or, more rarely, just recompiling with debug information) may influence the program's code and memory sufficiently that the segfault no longer occurs, a phenomenon known as a heisenbug.

In such cases, what you may want to do is to obtain a core dump, and get a backtrace using your debugger.

  • How to generate a core dump in Linux on a segmentation fault?
  • How do I analyse a program's core dump file with GDB when it has command-line parameters?

