Segmentation Fault: int 80h
Whenever I use int 21h or int 80h, I get a segmentation fault.
The int instruction is a special variant of the call instruction: it calls a function in the operating system. Naturally, this means that int behaves differently on different operating systems:
int 21h
Interrupt 21h was used by MS-DOS and 16-bit Windows (Windows 3.x), so this instruction can be used only in MS-DOS and 16-bit Windows programs.
It is not supported in 32-bit (or 64-bit) Windows programs, and Linux does not support it either.
int 80h
This interrupt is supported in 32-bit Linux programs. 64-bit Linux kernels can usually run 32-bit Linux programs (but you have to make sure that the program you build really is a 32-bit program and not a 64-bit one).
Other interrupts (such as int 10h)
... are supported neither by Linux nor by recent Windows versions. (They were supported in 16-bit Windows.)
As for your statement that with int 80h "it returns a segmentation fault":
Under Linux you can run the strace command to see what happens during the int 80h system call.
I did this with your program and got the following output:
$ strace ./x.x
execve("./x.x", ["./x.x"], [/* 54 vars */]) = 0
strace: [ Process PID=3789 runs in 32 bit mode. ]
write(1, "", 0) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
You can see that int 80h itself does not cause a fault; the system call is executed correctly.
However, the edx register contains 0, so int 80h writes the first 0 bytes (that is, nothing) of your "Hello World". You have to add the instruction mov edx, 13 (the number of bytes to write) before the int 80h instruction.
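The strace line write(1, "", 0) = 0 shows the kernel's write call receiving a count of 0. A minimal C sketch, assuming Linux/POSIX, makes the same point through the libc write() wrapper, whose arguments (fd, buffer, count) correspond to ebx, ecx and edx for int 80h with eax = 4. The helper names write_count and write_whole are hypothetical, and the message "Hello, World!\n" is only an example string.

```c
#include <string.h>
#include <unistd.h>

/* Writes exactly `count` bytes of msg to stdout, like int 80h with
   eax = 4 (write), ebx = 1 (stdout), ecx = msg, edx = count. */
static ssize_t write_count(const char *msg, size_t count) {
    return write(STDOUT_FILENO, msg, count);
}

/* Computes the length instead of relying on whatever edx happens to hold. */
static ssize_t write_whole(const char *msg) {
    return write_count(msg, strlen(msg));
}
```

With a count of 0 nothing is printed and the call returns 0, exactly as in the strace output; passing the real length prints the whole string.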
The segmentation fault happens later!
As a beginner in assembly language, you should first understand what assembly code is: each instruction is stored as a few bytes in RAM.
For example, the instruction mov eax, 4 is encoded as the bytes 184, 4, 0, 0, 0, and the instruction int 80h as the bytes 205, 128.
Your assembly program ends after the int 80h instruction. However, RAM does not end after the bytes 205, 128; it contains whatever data happens to follow them.
Maybe the bytes following them are 160, 0, 0, 0, 0, which encode mov al, [0]. Executing that would cause a segmentation fault.
You have to add instructions after the int 80h instruction that terminate your program, for example the exit system call (mov eax, 1 / mov ebx, 0 / int 80h). Otherwise the CPU will interpret the bytes in RAM following the int 80h instruction as instructions and execute them...
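To make the "bytes in RAM" picture concrete, here is a tiny C sketch of what the CPU does: it decodes one instruction, advances, and decodes the next, with no notion of where "your program" ends. It understands only the three opcodes mentioned above (184 = 0xB8, mov eax, imm32; 205 = 0xCD, int imm8; 160 = 0xA0, mov al, [moffs32] with a 32-bit address size). The names decode_one and decode_stream are hypothetical, and a real x86 decoder handles far more.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reads a little-endian 32-bit immediate, as x86 stores it. */
static uint32_t read_imm32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes the instruction at code[*pos] into out and advances *pos.
   Returns 0 on success, -1 at the end of the buffer or on an unknown opcode. */
static int decode_one(const uint8_t *code, size_t len, size_t *pos,
                      char *out, size_t outsz) {
    size_t i = *pos;
    if (i >= len) return -1;
    switch (code[i]) {
    case 0xB8: /* mov eax, imm32: 5 bytes */
        if (len - i < 5) return -1;
        snprintf(out, outsz, "mov eax, %u", (unsigned)read_imm32(code + i + 1));
        *pos = i + 5;
        return 0;
    case 0xCD: /* int imm8: 2 bytes */
        if (len - i < 2) return -1;
        snprintf(out, outsz, "int 0x%x", (unsigned)code[i + 1]);
        *pos = i + 2;
        return 0;
    case 0xA0: /* mov al, [moffs32]: 5 bytes with a 32-bit address size */
        if (len - i < 5) return -1;
        snprintf(out, outsz, "mov al, [0x%x]", (unsigned)read_imm32(code + i + 1));
        *pos = i + 5;
        return 0;
    default:
        return -1;
    }
}

/* Keeps decoding until it runs out of known bytes, one line per instruction.
   Returns the number of instructions decoded. */
static int decode_stream(const uint8_t *code, size_t len, char *out, size_t outsz) {
    size_t pos = 0;
    int count = 0;
    char line[32];
    out[0] = '\0';
    while (decode_one(code, len, &pos, line, sizeof line) == 0) {
        size_t used = strlen(out);
        snprintf(out + used, outsz - used, "%s\n", line);
        count++;
    }
    return count;
}
```

Fed the bytes 184, 4, 0, 0, 0, 205, 128, 160, 0, 0, 0, 0, the decoder does not stop after int 80h: the third instruction it produces is mov al, [0x0], the read from address 0 that triggers the segmentation fault.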
Segmentation fault when calling x86 Assembly function from C program
- Your function has no epilogue. You need to restore %ebp, pop the stack back to where it was, and then ret. If that is really missing from your code, it explains your segfault: the CPU will go on executing whatever garbage happens to follow the end of your code in memory.
- You clobber (i.e. overwrite) the %ebx register, which is supposed to be callee-saved. (You mention following the x86 calling conventions, but you seem to have missed that detail.) That would cause your next segfault, once you have fixed the first one. If you use %ebx, you need to save and restore it, e.g. with push %ebx after your prologue and pop %ebx before your epilogue. In this case, though, it is better to rewrite your code so that it does not use %ebx at all; see below.
- movzbl loads an 8-bit value from memory and zero-extends it into a 32-bit register. Here the parameters are int, so they are already 32 bits wide, and a plain movl is correct. As it stands, your function gives incorrect results for any argument that is negative or larger than 255.
- You are using more registers than necessary. You could move the first operand for the addition directly into %eax rather than putting it into %ebx and adding it to zero. And on x86 it is not necessary to get both operands into registers before adding: arithmetic instructions have a mem, reg form in which one operand is loaded directly from memory. With this approach we need no register other than %eax itself, and in particular we no longer have to worry about %ebx.
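The movzbl problem is easy to reproduce in C. The sketch below models the two loads rather than reproducing your code: (uint8_t)arg keeps only the lowest byte of the int, which on little-endian x86 is the single byte that movzbl 8(%ebp) reads. The function names are hypothetical.

```c
#include <stdint.h>

/* Like movzbl 8(%ebp), %eax: read one byte of the argument and zero-extend it.
   On little-endian x86 that byte is the least significant one (arg mod 256). */
static uint32_t load_like_movzbl(int32_t arg) {
    return (uint32_t)(uint8_t)arg;
}

/* Like movl 8(%ebp), %eax: read all four bytes of the int argument. */
static int32_t load_like_movl(int32_t arg) {
    return arg;
}

static int32_t addition_movzbl(int32_t a, int32_t b) {
    return (int32_t)(load_like_movzbl(a) + load_like_movzbl(b));
}

static int32_t addition_movl(int32_t a, int32_t b) {
    return load_like_movl(a) + load_like_movl(b);
}
```

Small non-negative arguments add correctly either way, but addition_movzbl(300, 1) gives 45 and addition_movzbl(-1, 1) gives 256, while the movl version returns 301 and 0.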
I would write:
.text
# Here, we define a function addition
.global addition
addition:
# Prologue:
push %ebp
movl %esp, %ebp
# load first argument
movl 8(%ebp), %eax
# add second argument
addl 12(%ebp), %eax
# epilogue
movl %ebp, %esp # redundant since we haven't touched esp, but will be needed in more complex functions
pop %ebp
ret
In fact, you don't need a stack frame for this function at all, though I understand if you want to include it for educational value. But if you omit it, the function can be reduced to
.text
.global addition
addition:
movl 4(%esp), %eax
addl 8(%esp), %eax
ret
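To see why the frameless version uses 4(%esp) and 8(%esp) while the framed one uses 8(%ebp) and 12(%ebp), here is a toy C model of a 32-bit cdecl stack. The Cpu struct, the 0xC0DE return address and all function names are hypothetical stand-ins; only the offsets mirror the assembly above.

```c
#include <stdint.h>

/* Toy 32-bit stack: memory addressed by byte offset, growing downwards. */
enum { STACK_WORDS = 16 };
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp, ebp;
} Cpu;

static void push32(Cpu *c, uint32_t v) { c->esp -= 4; c->mem[c->esp / 4] = v; }
static uint32_t pop32(Cpu *c) { uint32_t v = c->mem[c->esp / 4]; c->esp += 4; return v; }
static uint32_t load32(const Cpu *c, uint32_t addr) { return c->mem[addr / 4]; }

/* Caller side of addition(a, b): arguments pushed right to left,
   then call pushes the return address. */
static void call_addition_setup(Cpu *c, uint32_t a, uint32_t b) {
    push32(c, b);
    push32(c, a);
    push32(c, 0xC0DE);                        /* stand-in return address */
}

/* Framed version: arguments at 8(%ebp) and 12(%ebp). */
static uint32_t addition_with_frame(Cpu *c) {
    push32(c, c->ebp);                        /* push %ebp */
    c->ebp = c->esp;                          /* movl %esp, %ebp */
    uint32_t eax = load32(c, c->ebp + 8);     /* movl 8(%ebp), %eax */
    eax += load32(c, c->ebp + 12);            /* addl 12(%ebp), %eax */
    c->esp = c->ebp;                          /* movl %ebp, %esp */
    c->ebp = pop32(c);                        /* pop %ebp */
    return eax;                               /* ret would pop the return address */
}

/* Frameless version: arguments at 4(%esp) and 8(%esp). */
static uint32_t addition_no_frame(const Cpu *c) {
    return load32(c, c->esp + 4) + load32(c, c->esp + 8);
}
```

The push %ebp moves everything 4 bytes further away from the stack pointer, which is exactly the difference between the two sets of offsets.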
I am getting segmentation fault - assembly
Two bugs:
; moving length
mov ecx, eax
; moving syscall num and out desc
mov eax, 4
mov ebx, [stdout]
; syscall
int 0x80
According to the Linux system call conventions, the write system call needs the buffer pointer in ecx and the length in edx. You have the length in ecx, and the buffer pointer is nowhere at all. Make it:
mov edx, eax
mov ecx, dword [ebp+8]
mov eax, 4
mov ebx, [stdout]
int 0x80
Next, look at:
cmp [ebx+eax], byte 0x00
inc eax
jne loop1
The inc instruction sets the zero flag according to its result. So your jne does not branch on the result of the cmp, but on whether eax was incremented to zero (i.e. wrapped around). As a result, your loop iterates far too many times.
The jne needs to come immediately after the cmp, with no other flag-modifying instructions in between. There are several ways to rewrite it. One would be:
mov eax, -1
loop1:
inc eax
cmp byte [ebx+eax], 0x00
jne loop1
Note that this eliminates the need for the extra dec eax at the end.
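For reference, here is the corrected loop transcribed into C. It is a sketch: string_length is a hypothetical name, and the parameter is called ebx only to mirror the register holding the string pointer.

```c
#include <stddef.h>

/* Mirrors: mov eax, -1 / loop1: inc eax / cmp byte [ebx+eax], 0 / jne loop1.
   The comparison is the last step before the branch, so the branch tests the byte. */
static ptrdiff_t string_length(const char *ebx) {
    ptrdiff_t eax = -1;          /* mov eax, -1 */
    do {
        eax++;                   /* inc eax */
    } while (ebx[eax] != 0);     /* cmp byte [ebx+eax], 0x00 / jne loop1 */
    return eax;                  /* already the length: no dec eax needed */
}
```

Starting at -1 means the first increment lands on index 0, so when the loop exits eax already equals the index of the terminating zero byte, which is the string length.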
After fixing these, the program works for me.
What causes this program to segmentation fault?
For a start, items should be popped in the reverse of the order in which they were pushed if you want them back in their original registers. What you have here (with irrelevant lines removed):
push ebp
push ebx
pop ebp
pop ebx
will not restore ebp to its previous value, and this is very likely to cause problems when moving up through the stack frames.
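A toy stack in C shows the effect: pushing ebp and then ebx and popping in the same order swaps the two values, whereas popping in reverse order restores them. Stack, push, pop and the register variables are hypothetical stand-ins for the real hardware stack.

```c
#include <stdint.h>

/* Toy stack of 32-bit words; sp is a word index that grows downwards. */
typedef struct {
    uint32_t mem[8];
    unsigned sp;
} Stack;

static void push(Stack *s, uint32_t value) { s->mem[--s->sp] = value; }
static uint32_t pop(Stack *s) { return s->mem[s->sp++]; }

/* Pops in the SAME order as the pushes, as in the question's code. */
static void save_restore_wrong(Stack *s, uint32_t *ebp, uint32_t *ebx) {
    push(s, *ebp);
    push(s, *ebx);
    *ebp = pop(s);   /* receives the old ebx */
    *ebx = pop(s);   /* receives the old ebp */
}

/* Pops in REVERSE order, so each register gets its own value back. */
static void save_restore_right(Stack *s, uint32_t *ebp, uint32_t *ebx) {
    push(s, *ebp);
    push(s, *ebx);
    *ebx = pop(s);
    *ebp = pop(s);
}
```

In both versions the stack pointer ends where it started; only the reverse-order version also puts the right values back in the right registers.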
Additionally, you may be better off following the normal practice of restoring esp from ebp rather than blindly adding eight. That would be something like this at the end:
; add esp, 8 ; not the normal way.
mov esp, ebp
pop ebp
Doing it this way removes the need for you to manually calculate how many bytes you need to take off the stack, a calculation that you actually got wrong since you didn't take into account everything you pushed.
And finally, before you do that, you have to make sure esp is in the right place so that the pop ebp will work. That means popping everything you pushed (other than ebp itself). Since you pushed (after ebp) ebx, 0x00000000, 0x64636261, ebx and format_str, you should make sure all of those are off the stack before attempting to pop ebp.
Taking all that into account gives you something like:
main:
push ebp
mov ebp, esp
push ebx ; (+1)
push 0x00 ; (+2)
push 0x64636261 ; (+3)
mov ebx, ebp
sub ebx, esp
push ebx ; (+4)
push format_str ; (+5)
call printf
add esp, 16 ; (-5, -4, -3, -2)
pop ebx ; (-1)
mov esp, ebp
pop ebp
ret
Each of those (+N) comments represents a 32-bit value that has been pushed onto the stack, and the (-N) comments indicate which instructions reverse those pushes. The add esp, 16 reverses four of them (at four bytes apiece), and is done that way because we don't care what happens to those items. That leaves the final pop to recover the original ebx (which we do care about).
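The bookkeeping in the (+N)/(-N) comments can be checked by replaying the stack traffic of main on a toy model. This is a sketch under stated assumptions: the Cpu struct, replay_main and the 0x1000 address used for format_str are hypothetical, and the call to printf is not modelled (a real call pushes and then pops its own return address, so esp is unchanged once it returns).

```c
#include <stdint.h>

/* Toy 32-bit stack: 16 words addressed by byte offset, growing downwards. */
typedef struct {
    uint32_t mem[16];
    uint32_t esp, ebp, ebx;
} Cpu;

static void push(Cpu *c, uint32_t v) { c->esp -= 4; c->mem[c->esp / 4] = v; }
static uint32_t pop(Cpu *c) { uint32_t v = c->mem[c->esp / 4]; c->esp += 4; return v; }

/* Replays the corrected main; returns the value pushed at (+4). */
static uint32_t replay_main(Cpu *c) {
    push(c, c->ebp);               /* push ebp */
    c->ebp = c->esp;               /* mov ebp, esp */
    push(c, c->ebx);               /* (+1) push ebx */
    push(c, 0x00);                 /* (+2) push 0x00 */
    push(c, 0x64636261);           /* (+3) bytes 'a','b','c','d' */
    c->ebx = c->ebp - c->esp;      /* mov ebx, ebp / sub ebx, esp */
    uint32_t pushed_value = c->ebx;
    push(c, c->ebx);               /* (+4) push ebx */
    push(c, 0x1000);               /* (+5) push format_str (hypothetical address) */
    /* call printf: not modelled */
    c->esp += 16;                  /* add esp, 16: (-5, -4, -3, -2) */
    c->ebx = pop(c);               /* (-1) pop ebx */
    c->esp = c->ebp;               /* mov esp, ebp (already equal here) */
    c->ebp = pop(c);               /* pop ebp */
    return pushed_value;
}
```

After the replay, esp, ebp and ebx all hold their values from before main's prologue, which is what the ret needs; the value pushed at (+4) is 12, the three words pushed after ebp.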
That final reload of esp
is, I think, unnecessary in this case since it's been restored to the correct value by previous steps. Whether you leave it in for prolog/epilog consistency is up to you.
Segmentation fault in my Assembly implementation
To explain the comments in more detail, let's start with the x86 calling convention and your code.
x86 Calling Convention
In 32-bit x86, arguments are passed on the stack, so your function call is written the 32-bit way. For example, if you build your code for 32-bit x86 like this:
[SECTION .data]
msg: db "Hello C",0
[SECTION .bss]
[SECTION .text]
extern puts
global main
main:
push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10
mov dword [esp], msg ; NASM syntax: no PTR keyword
call puts
mov esp, ebp
pop ebp
ret
it should work fine.
x86-64 Calling Convention
There are two main differences:
- addresses are 8 bytes wide, of course;
- the first six integer or pointer arguments are passed in registers (rdi, rsi, rdx, rcx, r8, r9), and only the rest go on the stack.
So first, you should change push dword msg to mov rdi, msg, and you should not clean up the stack after the call (because you did not push anything onto it).
After the change:
[SECTION .data]
msg: db "Hello C",0
[SECTION .bss]
[SECTION .text]
extern puts
global main
main:
push rbp
mov rbp, rsp
and rsp, 0xfffffffffffffff0
mov rdi, msg
call puts
mov rsp, rbp
pop rbp
ret
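EDIT: According to the System V ABI, the stack must be 16-byte aligned at a call instruction. The push rbp happens to restore that alignment (the call into main left rsp 8 bytes off), but that is a side effect, not its purpose. To align the stack deliberately, save the stack pointer in ebp/rbp and then align it explicitly, in both the x86 and the x86-64 versions.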
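The alignment arithmetic can be written out in C. Assuming the System V x86-64 rule that rsp is a multiple of 16 just before a call (so it is 8 past a multiple of 16 on function entry, once the return address has been pushed), the sketch shows that and rsp, 0xfffffffffffffff0 (that is, -16) rounds down to a multiple of 16, and that a single 8-byte push also happens to restore alignment. align_down_16 and aligned_for_call are hypothetical names, and the sample address is made up.

```c
#include <stdint.h>

/* and rsp, -16: clear the low 4 bits, rounding down to a multiple of 16.
   Rounding down is safe because the stack grows towards lower addresses. */
static uint64_t align_down_16(uint64_t rsp) {
    return rsp & ~(uint64_t)15;
}

/* The ABI requirement checked just before a call instruction. */
static int aligned_for_call(uint64_t rsp) {
    return rsp % 16 == 0;
}
```

With a hypothetical entry value of 0x7ffc0008, the stack is misaligned; after push rbp (rsp - 8) it is aligned by coincidence, while align_down_16 produces an aligned value regardless of how many bytes were pushed before it.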