How to interpret %gs:0x14?
First, let's deal with the terms. It seems you're using "protected mode" in the general sense, as opposed to real mode. But, at least in Intel manuals, this term applies only to 32-bit mode. For 64-bit mode they use the poor marketing term "IA-32e mode", which is horrible compared to AMD's "long mode", but both still hide the fact that 64-bit mode is also a protected one.
This difference is important because %gs is handled differently in 32-bit and 64-bit protected mode. In 32-bit mode it's just another segment register. The thread-switching code loads it with a segment whose base is the current thread's area in the same virtual address space, so, unlike {CS,DS,ES,SS}, its base isn't zero in flat mode. In 64-bit mode, the base is simply an address kept in a processor MSR, which the scheduler also changes to the current thread's TLS address. (Details differ between Linux/*BSD/Windows/etc. as to which of %fs and %gs is used for what role.) But, as a common result, when you see an access like %gs:0x14 you should realize that
- the GS base address is obtained (using, as explained above, the generic segment mechanism for 32 bits and special MSR-based handling for a 64-bit environment)
- 0x14 is added to this address
and that's all you need to know unless you develop a kernel or some other deeply system-level thing such as Wine.
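The two steps above boil down to one addition. Here is a minimal sketch in Python; the GS base used below is a made-up value for illustration only (the real one is chosen by the OS and differs per thread), and linear_address is a hypothetical helper name:

```python
def linear_address(segment_base, offset, bits=32):
    """Add the offset to the segment base, wrapping at the address width."""
    return (segment_base + offset) & ((1 << bits) - 1)

# Hypothetical GS base for one thread; not taken from any real process.
gs_base = 0xB7FD86C0

print(hex(linear_address(gs_base, 0x14)))
```

The wrap-around mask only matters for bases near the top of the address space; for ordinary values the result is just base plus 0x14.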
What is %gs in Assembly
GS is a segment register; its use in Linux can be read up on here (it's basically used for per-thread data).
mov %gs:0x14,%eax
xor %gs:0x14,%eax
This code is used to validate that the stack hasn't exploded or been corrupted, using a canary value stored at GS+0x14, see this.
gcc -fstack-protector-strong is on by default in many modern distros; you can use gcc -fno-stack-protector to not add those checks. (On x86, thread-local storage is cheap so GCC keeps the randomized canary value there, making it somewhat harder to leak.)
How to use a logical address with an FS or GS base in gdb?
how can i read the memory at "%gs:0x14" in gdb
You can't: there is no way for GDB to know how the segment to which %gs refers has been set up.
or translate this logical address to a linear address that i could use in x command
Again, you can't do this in general. However, you appear to be on 32-bit x86 Linux, and there you can do that: %gs is set up to point to the thread descriptor via the set_thread_area system call.
You can do catch syscall set_thread_area in GDB, and examine the parameters (each thread will have one such call). The code to actually do that is here. Once you know how %gs has been set up, just add 0x14 to the base_addr, and you are done.
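As a rough sketch of that last step in Python: assuming you have dumped the first bytes of the struct user_desc argument from GDB, the base_addr field is the second 32-bit word (after entry_number). The byte values below are invented for illustration, and tls_slot_address is a hypothetical helper name:

```python
import struct

def tls_slot_address(user_desc_bytes, offset=0x14):
    """Pull base_addr out of a raw struct user_desc and add the offset."""
    # struct user_desc starts with three 32-bit unsigned fields:
    # entry_number, base_addr, limit (flag bits follow and are ignored here).
    entry_number, base_addr, limit = struct.unpack_from("<III", user_desc_bytes)
    return (base_addr + offset) & 0xFFFFFFFF

# Hypothetical dump of the syscall argument; not from a real session.
raw = struct.pack("<IIII", 6, 0xB7FD86C0, 0xFFFFF, 0x51)

print(hex(tls_slot_address(raw)))
```

The resulting number is what you would then hand to the x command.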
Understand the assembly code generated by a simple C program
The reason for the "strange" addresses such as main+0, main+1, main+3, main+6 and so on is that each instruction takes up a variable number of bytes. For example:
main+0: push %ebp
is a one-byte instruction so the next instruction is at main+1. On the other hand,
main+3: and $0xfffffff0,%esp
is a three-byte instruction so the next instruction after that is at main+6.
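To make the arithmetic concrete, here is a small Python sketch that rebuilds the main+N offsets from instruction lengths. The lengths are read off the address differences in the main disassembly shown further down in this answer; instruction_offsets is just a hypothetical helper name:

```python
from itertools import accumulate

def instruction_offsets(lengths):
    """Each instruction starts where the previous one ended."""
    return [0] + list(accumulate(lengths))[:-1]

# Byte lengths of the ten instructions in main, taken from the listing:
# push, mov, and, sub, movl, movl, movl, call, leave, ret
main_lengths = [1, 2, 3, 3, 8, 8, 7, 5, 1, 1]

for offset, length in zip(instruction_offsets(main_lengths), main_lengths):
    print(f"main+{offset}: {length} byte(s)")
```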
And, since you ask in the comments why movl seems to take a variable number of bytes, the explanation for that is as follows.
Instruction length depends not only on the opcode (such as movl) but also on the addressing modes for the operands (the things the opcode is operating on). I haven't checked specifically for your code but I suspect the
movl $0x1,(%esp)
instruction is probably shorter because there's no offset involved - it just uses esp as the address. Whereas something like:
movl $0x2,0x4(%esp)
requires everything that movl $0x1,(%esp) does, plus an extra byte for the offset 0x4.
In fact, here's a debug session showing what I mean:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
c:\pax> debug
-a
0B52:0100 mov word ptr [di],7
0B52:0104 mov word ptr [di+2],8
0B52:0109 mov word ptr [di+0],7
0B52:010E
-u100,10d
0B52:0100 C7050700 MOV WORD PTR [DI],0007
0B52:0104 C745020800 MOV WORD PTR [DI+02],0008
0B52:0109 C745000700 MOV WORD PTR [DI+00],0007
-q
c:\pax> _
You can see that the second instruction, with an offset, is actually different to the first one without it. It's one byte longer (5 bytes instead of 4, to hold the offset) and has a different encoding, c745 instead of c705.
You can also see that the first and third instructions do the same thing but are encoded in two different ways: the third carries an explicit zero offset.
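The length difference comes from the second byte (the ModRM byte), whose top two bits say how much displacement follows. A minimal Python sketch for the 16-bit addressing used in that debug session, covering only this one opcode form (C7 with a 16-bit immediate); the function names are my own:

```python
def disp_size_16bit(modrm):
    """Displacement bytes implied by a ModRM byte under 16-bit addressing."""
    mod = modrm >> 6
    rm = modrm & 0b111
    if mod == 0b00:
        return 2 if rm == 0b110 else 0   # rm=110 is the bare [disp16] form
    if mod == 0b01:
        return 1                         # 8-bit displacement
    if mod == 0b10:
        return 2                         # 16-bit displacement
    return 0                             # mod=11: register operand, no memory

def mov_imm16_length(modrm):
    """Opcode byte + ModRM byte + displacement + 16-bit immediate."""
    return 1 + 1 + disp_size_16bit(modrm) + 2

# The three encodings from the debug session above.
for encoding in ("C7050700", "C745020800", "C745000700"):
    raw = bytes.fromhex(encoding)
    print(encoding, "->", mov_imm16_length(raw[1]), "bytes")
```

32-bit code such as the movl instructions in your listing follows the same idea, with an extra SIB byte when esp is the base register.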
The and $0xfffffff0,%esp instruction is a way to force esp to be on a specific boundary. This is used to ensure proper alignment of variables. Many memory accesses on modern processors will be more efficient if they follow the alignment rules (such as a 4-byte value having to be aligned to a 4-byte boundary). Some modern processors will even raise a fault if you don't follow these rules.
After this instruction, you're guaranteed that esp is both less than or equal to its previous value and aligned to a 16-byte boundary.
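Both guarantees follow from clearing the low four bits. A quick Python sketch, using made-up esp values:

```python
def align_down_16(esp):
    """Same effect as: and $0xfffffff0,%esp"""
    return esp & 0xFFFFFFF0

# Hypothetical stack pointer values, for illustration only.
for esp in (0xBFFFF3EC, 0xBFFFF3F0, 0xBFFFF3FF):
    aligned = align_down_16(esp)
    print(hex(esp), "->", hex(aligned))
```

Because the stack grows downwards, rounding down can only reserve a little extra space; it never hands back space that is already in use.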
The gs: prefix simply means to use the gs segment register to access memory rather than the default.
The instruction mov %eax,-0xc(%ebp) means to take the contents of the ebp register, subtract 12 (0xc) and then put the value of eax into that memory location.
Re the explanation of the code: your function, function, is basically one big no-op. The assembly generated is limited to stack frame setup and teardown, along with some stack frame corruption checking which uses the aforementioned %gs:0x14 memory location.
It loads the value from that location (think of it as something like 0xdeadbeef, though the real value is randomized) into the stack frame, does its job, then checks the stack to ensure it hasn't been corrupted.
Its job, in this case, is nothing. So all you see is the function administration stuff.
Stack set-up occurs between function+0 and function+12. Everything after that is setting up the return code in eax and tearing down the stack frame, including the corruption check.
Similarly, main consists of stack frame set-up, pushing the parameters for function, calling function, tearing down the stack frame and exiting.
Comments have been inserted into the code below:
0x08048428 <main+0>: push %ebp ; save previous value.
0x08048429 <main+1>: mov %esp,%ebp ; create new stack frame.
0x0804842b <main+3>: and $0xfffffff0,%esp ; align to boundary.
0x0804842e <main+6>: sub $0x10,%esp ; make space on stack.
0x08048431 <main+9>: movl $0x3,0x8(%esp) ; push values for function.
0x08048439 <main+17>: movl $0x2,0x4(%esp)
0x08048441 <main+25>: movl $0x1,(%esp)
0x08048448 <main+32>: call 0x8048404 <function> ; and call it.
0x0804844d <main+37>: leave ; tear down frame.
0x0804844e <main+38>: ret ; and exit.
0x08048404 <func+0>: push %ebp ; save previous value.
0x08048405 <func+1>: mov %esp,%ebp ; create new stack frame.
0x08048407 <func+3>: sub $0x28,%esp ; make space on stack.
0x0804840a <func+6>: mov %gs:0x14,%eax ; get sentinel value.
0x08048410 <func+12>: mov %eax,-0xc(%ebp) ; put on stack.
0x08048413 <func+15>: xor %eax,%eax ; set return code 0.
0x08048415 <func+17>: mov -0xc(%ebp),%eax ; get sentinel from stack.
0x08048418 <func+20>: xor %gs:0x14,%eax ; compare with actual.
0x0804841f <func+27>: je <func+34> ; jump if okay.
0x08048421 <func+29>: call <__stack_chk_fail> ; otherwise corrupted stack.
0x08048426 <func+34>: leave ; tear down frame.
0x08048427 <func+35>: ret ; and exit.
I think the reason for the %gs:0x14 may be evident from above but, just in case, I'll elaborate here.
It uses this value (a sentinel) to put in the current stack frame so that, should something in the function do something silly like write 1024 bytes to a 20-byte array created on the stack or, in your case:
char buffer1[5];
strcpy (buffer1, "Hello there, my name is Pax.");
then the sentinel will be overwritten and the check at the end of the function will detect that, calling the failure function to let you know, and then probably aborting so as to avoid any other problems.
If it placed 0xdeadbeef onto the stack and this was changed to something else, then an xor with 0xdeadbeef would produce a non-zero value, which is detected in the code with the je instruction.
The relevant bit is paraphrased here:
mov %gs:0x14,%eax ; get sentinel value.
mov %eax,-0xc(%ebp) ; put on stack.
;; Weave your function
;; magic here.
mov -0xc(%ebp),%eax ; get sentinel back from stack.
xor %gs:0x14,%eax ; compare with original value.
je stack_ok ; zero/equal means no corruption.
call stack_bad ; otherwise corrupted stack.
stack_ok: leave ; tear down frame.
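If it helps, here is the same logic modelled in Python. This is only a toy: the frame is a 12-byte bytearray with the 5-byte buffer at the bottom and the sentinel in a slot I placed at offset 8, which is an assumption and not the layout gcc actually chose for your code. The name guarded_copy is hypothetical too:

```python
import secrets
import struct

BUF_SIZE = 5     # char buffer1[5] from the example above
CANARY_AT = 8    # assumed sentinel slot, just above the padded buffer

def guarded_copy(data, canary=None):
    """Return True if the sentinel survived the copy, False otherwise."""
    if canary is None:
        canary = secrets.randbits(32)                 # stands in for %gs:0x14
    frame = bytearray(CANARY_AT + 4)
    struct.pack_into("<I", frame, CANARY_AT, canary)  # prologue: put on stack
    # Unchecked copy like strcpy, clipped to the frame to keep the toy bounded.
    payload = (data + b"\0")[:len(frame)]
    frame[:len(payload)] = payload
    (stored,) = struct.unpack_from("<I", frame, CANARY_AT)
    return (stored ^ canary) == 0                     # epilogue: xor, then je

print(guarded_copy(b"Hi"))
print(guarded_copy(b"Hello there, my name is Pax."))
```

A short string leaves the sentinel alone, while the long one runs over it, so the xor comes out non-zero and the real code would take the call to the failure function.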
Interpreting eFlags in DDD
The eflags register is made up of single bits, each being a flag.
When displaying the flags, they can be combined into a larger numeric entity (like 0x293 in your example), or each can have a symbol of its own (like in "[CF AF SF IF]", with the carry flag CF, adjust flag AF, sign flag SF and interrupt flag IF).
The Intel 64 and IA-32 Architectures Software Developer's Manual Vol. 1 describes the flags in detail in section 3.4.3.
The most important (for application developers) are:
bit | sym | name
------------------
0 | CF | carry
1 | -- | (always 1)
2 | PF | parity
3 | -- | (always 0)
4 | AF | adjust
5 | -- | (always 0)
6 | ZF | zero
7 | SF | sign
8 | TF | trap
9 | IF | interrupt
10 | DF | direction
11 | OF | overflow
Combining those in your example (CF AF SF IF), together with bit 1, which is always set, gives the binary value 1010010011, where the rightmost digit is the carry flag, and the leftmost the interrupt flag. Converted to hexadecimal it gives exactly 0x293.
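The table translates directly into a small decoder. A sketch in Python, covering only the twelve low bits listed above (the function names are my own):

```python
# Bit positions from the table above; reserved bits are left out.
FLAGS = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
         8: "TF", 9: "IF", 10: "DF", 11: "OF"}

def decode_eflags(value):
    """List the symbols of all flags that are set in value."""
    return [name for bit, name in FLAGS.items() if value & (1 << bit)]

def encode_eflags(names):
    """Build the numeric value from flag symbols; bit 1 is always set."""
    bits = {name: bit for bit, name in FLAGS.items()}
    value = 1 << 1
    for name in names:
        value |= 1 << bits[name]
    return value

print(decode_eflags(0x293))
print(hex(encode_eflags(["CF", "AF", "SF", "IF"])))
```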
Position of GCC stack canaries
The canary is always below the frame pointer, with every version of gcc I've tried. You can see that confirmed in the gdb disassembly immediately below the IDA disassembly in the blog post you linked, which has mov rax, QWORD PTR [rbp-0x8].
I think this is just an artifact of IDA's disassembler. Instead of displaying the numerical offset for rbp-relative addresses, it assigns a name to each stack slot, and displays the name instead; basically assuming that every rbp-relative access is to a local variable or argument. And it looks like it always displays that name with a + regardless of whether the offset is positive or negative. Note that buf and fd also get a + sign even though they are local variables which are clearly below the frame pointer.
In this example, it has named the canary var_8 as if it were a local variable. So I suppose to translate this properly, you have to think of var_8 as having the value -8.