Does Gdb Temporarily Give Pages Write Permission

Does gdb temporarily give pages write permission?

No.

On Linux/x86, ptrace() (which is what GDB uses to read and write the memory of the inferior, i.e. the process being debugged) allows reads and writes to pages that the inferior itself cannot read or write, which leads to exactly the confusion you've described.

This could be considered a bug in the kernel.

It should be noted that the kernel has to allow ptrace to write to the normally non-writable .text section so that the debugger can plant breakpoints, which is done by overwriting the original instruction with the breakpoint/trap instruction (int3 on x86) via a PTRACE_POKETEXT request.

The kernel doesn't have to do the same for PTRACE_POKEDATA, but man ptrace says:

PTRACE_POKETEXT, PTRACE_POKEDATA
   Copies the word data to location addr in the child's memory.
   As above, the two requests are currently equivalent.

I believe it's that equivalence that causes the current behavior.

gdb command to get the virtual memory page's permissions for a given address

You can find this info in the /proc/<pid_of_your_app>/maps file. Please check Understanding Linux /proc/id/maps for more info.

If you often need to look up addresses in the maps file, you can write a small script to do this...
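Such a lookup script could look roughly like the sketch below. It is only an illustration: the function names find_mapping and permissions_for are made up here, and the sample maps text uses invented addresses and paths so that the parsing can be tried without a live process.

```python
# Sample text in the format of /proc/<pid>/maps. The addresses and the
# path are invented for illustration only.
SAMPLE_MAPS = """\
00400000-00401000 r-xp 00000000 08:01 1234 /path/to/app
00600000-00601000 r--p 00000000 08:01 1234 /path/to/app
00601000-00602000 rw-p 00001000 08:01 1234 /path/to/app
7ffc0000-7ffe1000 rw-p 00000000 00:00 0 [stack]
"""


def find_mapping(maps_text, addr):
    """Return (start, end, perms, name) of the mapping containing addr, or None."""
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= addr < end:  # the end address is exclusive
            name = fields[5].strip() if len(fields) > 5 else ""
            return start, end, fields[1], name
    return None


def permissions_for(pid, addr):
    """Hypothetical helper: same lookup against a live process (Linux only)."""
    with open("/proc/%d/maps" % pid) as f:
        return find_mapping(f.read(), addr)


if __name__ == "__main__":
    print(find_mapping(SAMPLE_MAPS, 0x40057d))
```

The perms field (for example r-xp) is the page permission string for that range. Inside GDB, the pid of the process being debugged is listed by info inferiors.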

Anti-debugging: gdb does not write 0xcc byte for breakpoints. Any idea why?

The second part is easily explained (as Flortify correctly stated):
GDB shows the original memory contents, not the breakpoint "bytes". In its default mode it even removes the breakpoints when the program stops and re-inserts them before continuing. Users typically want to see their code, not the strange modified instructions used for breakpoints.
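To make the idea concrete, here is a toy model in Python. It is not GDB's implementation: ToyDebugger and its methods are invented for this illustration, and a bytearray stands in for the memory of the debugged process. It only shows the principle of saving the original byte, writing 0xcc, and hiding the 0xcc again when memory is displayed.

```python
INT3 = 0xCC  # opcode of the x86 breakpoint instruction


class ToyDebugger:
    """Toy model of software breakpoints over a bytearray of 'inferior' memory."""

    def __init__(self, memory, base):
        self.memory = memory  # bytearray standing in for process memory
        self.base = base      # address of memory[0]
        self.shadow = {}      # breakpoint address -> original byte

    def insert_breakpoint(self, addr):
        if addr not in self.shadow:
            off = addr - self.base
            self.shadow[addr] = self.memory[off]  # remember the original byte
            self.memory[off] = INT3               # plant the trap

    def remove_breakpoint(self, addr):
        # Restore the original byte and forget the breakpoint.
        self.memory[addr - self.base] = self.shadow.pop(addr)

    def read_memory(self, addr, length):
        """What the user is shown: the original bytes, never the 0xcc."""
        off = addr - self.base
        data = bytearray(self.memory[off:off + length])
        for bp_addr, original in self.shadow.items():
            if addr <= bp_addr < addr + length:
                data[bp_addr - addr] = original
        return bytes(data)


# First 16 bytes of main from the dump in this answer, loaded at 0x40057d.
code = bytearray.fromhex("55 48 89 e5 48 83 ec 10 48 c7 45 f8 7d 05 40 00")
dbg = ToyDebugger(code, 0x40057d)
dbg.insert_breakpoint(0x400585)

print(code.hex())                           # what the program itself would read
print(dbg.read_memory(0x40057d, 16).hex())  # what the debugger displays
```

The first line printed contains cc at offset 8, while the second shows the original 48 there, which mirrors why the debugger's view and the program's own view of its code can differ.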

With your C code, you missed the breakpoint by a few bytes. GDB sets the breakpoint after the function prologue, because the prologue is typically not what GDB users want to see. So, if you set a breakpoint on foo, the actual breakpoint will typically be located a few bytes after its entry point (how many depends on the prologue itself, which is function-dependent, as it may or may not have to save the stack pointer, frame pointer and so on). But it is easy to check. I used this code:

#include <stdio.h>
int main()
{
    int i,j;
    unsigned char *p = (unsigned char*)main;

    for (j=0; j<4; j++) {
        printf("%p: ",p);
        for (i=0; i<16; i++)
            printf("%.2x ", *p++);
        printf("\n");
    }
    return 0;
}

If we run this program by itself it prints:

0x40057d: 55 48 89 e5 48 83 ec 10 48 c7 45 f8 7d 05 40 00
0x40058d: c7 45 f4 00 00 00 00 eb 5a 48 8b 45 f8 48 89 c6
0x40059d: bf 84 06 40 00 b8 00 00 00 00 e8 b4 fe ff ff c7
0x4005ad: 45 f0 00 00 00 00 eb 27 48 8b 45 f8 48 8d 50 01

Now we run it in gdb (output re-formatted for SO).

(gdb) break main
Breakpoint 1 at 0x400585: file ../bp.c, line 6.
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000000000400585 in main at ../bp.c:6
(gdb) disas/r main,+32
Dump of assembler code from 0x40057d to 0x40059d:
  0x000000000040057d (main+0):  55                        push %rbp
  0x000000000040057e (main+1):  48 89 e5                  mov %rsp,%rbp
  0x0000000000400581 (main+4):  48 83 ec 10               sub $0x10,%rsp
  0x0000000000400585 (main+8):  48 c7 45 f8 7d 05 40 00   movq $0x40057d,-0x8(%rbp)
  0x000000000040058d (main+16): c7 45 f4 00 00 00 00      movl $0x0,-0xc(%rbp)
  0x0000000000400594 (main+23): eb 5a                     jmp 0x4005f0 
  0x0000000000400596 (main+25): 48 8b 45 f8               mov -0x8(%rbp),%rax
  0x000000000040059a (main+29): 48 89 c6                  mov %rax,%rsi
End of assembler dump.

With this we have verified that the program prints the correct bytes. It also shows that the breakpoint has been inserted at 0x400585 (that is, after the function prologue), not at the first instruction of the function.
If we now run the program under gdb (with run) and then "continue" after the breakpoint is hit, we get this output:

(gdb) cont
Continuing.
0x40057d: 55 48 89 e5 48 83 ec 10 cc c7 45 f8 7d 05 40 00 
0x40058d: c7 45 f4 00 00 00 00 eb 5a 48 8b 45 f8 48 89 c6 
0x40059d: bf 84 06 40 00 b8 00 00 00 00 e8 b4 fe ff ff c7 
0x4005ad: 45 f0 00 00 00 00 eb 27 48 8b 45 f8 48 8d 50 01

This now shows 0xcc being printed at address 0x400585, that is, at offset 8 from the start of main (its ninth byte), which is exactly where the breakpoint was set.
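The changed byte can also be located mechanically by comparing the two dumps. The following is a small sketch; the helper names parse_dump and changed_bytes are made up here, and the dump text is copied from the two runs shown in this answer.

```python
def parse_dump(text):
    """Parse lines like '0x40057d: 55 48 ...' into a dict {address: byte}."""
    mem = {}
    for line in text.strip().splitlines():
        addr_s, _, rest = line.partition(":")
        addr = int(addr_s, 16)
        for i, tok in enumerate(rest.split()):
            mem[addr + i] = int(tok, 16)
    return mem


def changed_bytes(before, after):
    """List (address, old, new) for every byte that differs between two dumps."""
    return [(a, before[a], after[a])
            for a in sorted(before)
            if a in after and before[a] != after[a]]


STANDALONE = """
0x40057d: 55 48 89 e5 48 83 ec 10 48 c7 45 f8 7d 05 40 00
0x40058d: c7 45 f4 00 00 00 00 eb 5a 48 8b 45 f8 48 89 c6
0x40059d: bf 84 06 40 00 b8 00 00 00 00 e8 b4 fe ff ff c7
0x4005ad: 45 f0 00 00 00 00 eb 27 48 8b 45 f8 48 8d 50 01
"""

UNDER_GDB = """
0x40057d: 55 48 89 e5 48 83 ec 10 cc c7 45 f8 7d 05 40 00
0x40058d: c7 45 f4 00 00 00 00 eb 5a 48 8b 45 f8 48 89 c6
0x40059d: bf 84 06 40 00 b8 00 00 00 00 e8 b4 fe ff ff c7
0x4005ad: 45 f0 00 00 00 00 eb 27 48 8b 45 f8 48 8d 50 01
"""

MAIN = 0x40057d  # address of main in this particular build

diff = changed_bytes(parse_dump(STANDALONE), parse_dump(UNDER_GDB))
for addr, old, new in diff:
    # prints: 0x400585 (main+8): 48 -> cc
    print("%#x (main+%d): %02x -> %02x" % (addr, addr - MAIN, old, new))
```

The single differing byte sits at the breakpoint address reported by info break.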

How to print every executed line in GDB automatically until a given breakpoint is reached?

Well, this wasn't easy - but I think I somewhat got it :) I went through a bunch of failed attempts (posted here); relevant code is below.

Basically, the problem in a "next/step until breakpoint" is how to determine whether you're "on" a breakpoint or not, if the debugger is stopped (at a step). Note also I use GDB 7.2-1ubuntu11 (current for Ubuntu 11.04). So, it went like this:

- I first found out about Convenience Variables, and thought: given that program counters and such are available, there must be some GDB convenience variable that gives the "breakpoint" status and can be used directly in a GDB script. After looking through the GDB reference Index for a while, however, I simply could not find any such variable (my attempts are in nub.gdb)
- Lacking such a "breakpoint status" internal variable, the only thing left to do is to capture the ('stdout') command line output of GDB (in response to commands) as a string, and parse it (looking for "Breakpoint")
- Then, I found out about the Python API to GDB, and the gdb.execute("CMDSTR", to_string=True) command, which is seemingly exactly what is needed to capture the output: "By default, any output produced by command is sent to gdb's standard output. If the to_string parameter is True, then output will be collected by gdb.execute and returned as a string"!
- So, first I tried to make a script (pygdb-nub.py,gdbwrap) that would utilize gdb.execute in the recommended manner; failed here - because of this:
  - Bug 627506 – python: gdb.execute([...], to_string=True) partly prints to stdout/stderr
  - Bug 10808 – Allow GDB/Python API to capture and store GDB output
- Then, I thought I'd use a python script to subprocess.Popen the GDB program, while replacing its stdin and stdout; and then proceed controlling GDB from there (pygdb-sub.py) - that failed too... (apparently, because I didn't redirect stdin/out right)
- Then, I thought I'd use python scripts to be called from GDB (via source) which would internally fork into a pty whenever gdb.execute should be called, so as to capture its output (pygdb-fork.gdb,pygdb-fork.py)... This almost worked - as there are strings returned; however GDB notices something ain't right: "[tcsetpgrp failed in terminal_inferior: Operation not permitted]", and the subsequent return strings don't seem to change.

And finally, the approach that worked is: temporarily redirecting the GDB output from a gdb.execute to a logfile in RAM (Linux: /dev/shm); and then reading it back, parsing it and printing it from python - python also handles a simple while loop that steps until a breakpoint is reached.

The irony is - most of these bugs, which led to this solution via redirecting to the logfile, have actually been recently fixed in SVN; meaning the fixes will propagate to the distros in the near future, and one will be able to use gdb.execute("CMDSTR", to_string=True) directly :/ Yet, as I cannot risk building GDB from source right now (and possibly bumping into new incompatibilities), this is good enough for me also :)

Here are the relevant files (partially also in pygdb-fork.gdb,pygdb-fork.py):

pygdb-logg.gdb is:

# gdb script: pygdb-logg.gdb
# easier interface for pygdb-logg.py stuff
# from within gdb: (gdb) source -v pygdb-logg.gdb
# from cmdline: gdb -x pygdb-logg.gdb -se test.exe

# first, "include" the python file:
source -v pygdb-logg.py

# define shorthand for nextUntilBreakpoint():
define nub
  python nextUntilBreakpoint()
end

# set up breakpoints for test.exe:
b main
b doFunction

# go to main breakpoint
run

pygdb-logg.py is:

# gdb will 'recognize' this as python
#  upon 'source pygdb-logg.py'
# however, from gdb, functions still have
#  to be called like:
#  (gdb) python print(logExecCapture("bt"))

import gdb

def logExecCapture(instr):
  # /dev/shm - save file in RAM
  ltxname="/dev/shm/c.log"

  # redirect GDB's output for this one command into the logfile
  gdb.execute("set logging file "+ltxname)
  gdb.execute("set logging redirect on")
  gdb.execute("set logging overwrite on")
  gdb.execute("set logging on")
  try:
    gdb.execute(instr)
  finally:
    # always switch logging off again, even if the command failed
    gdb.execute("set logging off")

  # read the entire logfile back
  with open(ltxname, 'r') as f:
    replyContents = f.read()
  return replyContents

# next until breakpoint
def nextUntilBreakpoint():
  isInBreakpoint = -1
  # as long as we don't find "Breakpoint" in report:
  while isInBreakpoint == -1:
    REP=logExecCapture("n")
    isInBreakpoint = REP.find("Breakpoint")
    # a single parenthesized argument prints the same under Python 2 and 3
    print("LOOP::  %d\n%s" % (isInBreakpoint, REP))

Basically, pygdb-logg.gdb loads the pygdb-logg.py python script, sets up the alias nub for nextUntilBreakpoint, and initializes the session - everything else is handled by the python script. And here is a sample session, with respect to the test source in the OP:

$ gdb -x pygdb-logg.gdb -se test.exe
...
Reading symbols from /path/to/test.exe...done.
Breakpoint 1 at 0x80483ec: file test.c, line 14.
Breakpoint 2 at 0x80483c7: file test.c, line 7.

Breakpoint 1, main () at test.c:14
14    count = 1;
(gdb) nub
LOOP::  -1
15    count += 2;

LOOP::  -1
16    count = 0;

LOOP::  -1
19      doFunction();

LOOP::  1

Breakpoint 2, doFunction () at test.c:7
7     count += 2;

(gdb) nub
LOOP::  -1
9     count--;

LOOP::  -1
10  }

LOOP::  -1
main () at test.c:20
20      printf("%d\n", count);

1
LOOP::  -1
21    }

LOOP::  -1
19      doFunction();

LOOP::  1

Breakpoint 2, doFunction () at test.c:7
7     count += 2;

(gdb)

... just as I wanted it :P Just don't know how reliable it is (and whether it will be possible to use in avr-gdb, which is what I need this for :) EDIT: version of avr-gdb in Ubuntu 11.04 is currently 6.4, which doesn't recognize the python command :()
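For a GDB build where gdb.execute(..., to_string=True) captures output properly, the logfile detour should not be needed. Below is a minimal sketch under that assumption. The names hit_breakpoint, next_until_breakpoint and nub_direct are made up for this example, max_steps is an added safety limit, and, unlike the find("Breakpoint") test above, the check here only matches lines that start with "Breakpoint ". Whether the stop report is actually captured in the returned string may depend on the GDB version, so treat this as untested against any particular release.

```python
def hit_breakpoint(output):
    """True if a stop report such as 'Breakpoint 2, doFunction () ...' is present."""
    return any(line.startswith("Breakpoint ") for line in output.splitlines())


def next_until_breakpoint(execute, max_steps=10000):
    """Issue 'next' repeatedly until the captured output reports a breakpoint.

    execute: a callable that takes a GDB command string and returns the
    command's output as a string. Returns the list of captured outputs.
    """
    outputs = []
    for _ in range(max_steps):  # upper bound so a runaway loop cannot hang the session
        out = execute("next")
        outputs.append(out)
        if hit_breakpoint(out):
            break
    return outputs


def nub_direct():
    """Meant to be called from inside GDB: (gdb) python nub_direct()"""
    import gdb  # only importable when running inside GDB
    for out in next_until_breakpoint(lambda cmd: gdb.execute(cmd, to_string=True)):
        print(out)
```

Passing the executor in as a parameter keeps the stepping logic independent of GDB, so it can be tried with a fake executor that replays canned replies before being pointed at a real session.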

Well, hope this helps someone,

Cheers!


How does ptrace POKETEXT work when modifying program text?

Looking at the kernel sources, x86 uses the generic (as opposed to arch-specific) ptrace request functions.

The actual changes are done by mm/memory.c:__access_remote_vm(), which uses mm/gup.c:get_user_pages_remote() to obtain the kernel mapping for the target page, followed by kmap(page), copy_to_user_page(), set_page_dirty_lock(), kunmap(page), and put_page(page).

The simple description of what is actually done is that the target process memory containing the code is accessed (modified) through the kernel mapping, the virtual memory "window" or "barrier" between the target process and the kernel, and not through the mappings visible to user-space processes.

Based on the above, we can answer the stated questions:

Does PTRACE_POKETEXT bypass read only page permissions of the traced process?

Yes. The kernel does not use the page protection mechanisms visible to userspace processes for this; it uses its own internal mappings.

Or does it need to change permission temporarily to be writable?

No, it does not.

Note that except for the changed data in the userspace memory (and possibly whether the pages are backed by an executable file or not), and for any kernel or hardware bugs there might be, when and how the kernel uses its own mappings is invisible and undetectable to userspace processes.

Assembly x86 Programming Debugging (GDB): How to print out data through advancing indexing

I think 8($esp) tries to call 8 as a function, with $esp as the argument. (But absolute address 8 is not in a valid executable page.) Remember that GDB expressions use GDB's C-like syntax, not AT&T assembly addressing-mode syntax.

x  $esp + 8

The x command examines memory at that address. Use help x to see options for formatting and how many elements to display. p would print the address, unless you cast it to a pointer and dereference it like p *(int*)(8 + $esp)
