Can GDB Change the Assembly Code of a Running Program

Can GDB change the assembly code of a running program?

You can write bytes to memory directly, but GDB doesn't have an assembler built in by default. You can, however, do something like set *(unsigned char*)0x80FFDDEE = 0x90 to change the byte at that address to a NOP, for example. You could also use NASM to assemble shellcode and use perl or python to inject it into the program :)

You might also like this little .gdbinit file to make debugging a lot easier: https://gist.github.com/985474

GDB : changing the assembly code of a running program

0x74168048ea8 is longer than a word. You should try setting bytes one by one, e.g.

  set *(char*)0x08048e3a = 0x74
  set *(char*)0x08048e3b = 0x16

etc
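
Typing one set command per byte gets tedious for longer patches. As a rough sketch (the helper name gdb_byte_writes is made up for illustration and is not part of GDB), a few lines of Python can generate the commands from the bytes in the order they should appear in memory. It uses unsigned char rather than char so that values above 0x7f are not a concern:

```python
def gdb_byte_writes(start_addr, data):
    """Return one GDB 'set' command per byte, in memory order.

    Hypothetical helper for illustration; it only builds command strings.
    """
    return [
        f"set *(unsigned char*)0x{start_addr + i:08x} = 0x{b:02x}"
        for i, b in enumerate(data)
    ]


# The two bytes from the example above, starting at 0x08048e3a.
for cmd in gdb_byte_writes(0x08048E3A, bytes([0x74, 0x16])):
    print(cmd)
```

The printed lines can be pasted into a GDB session or saved to a file and loaded with GDB's source command.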

Live editing code with gdb

Is what I am trying to do possible?

Yes, you can change .text of a binary.

Note that this change will only affect the current execution; when you run the program again, your change will "evaporate" (if you want to patch the binary permanently, that's possible as well, but the procedure is different).

If so, am I doing something wrong?

Likely. You didn't tell us what you are trying to change the instruction to.

If so, what am I doing wrong and how can I fix it?

Using (gdb) disas/r will show you actual raw instruction bytes, and will likely make it easier to see what you did wrong. When I use it, I see this:

   0x080483ed <+9>: c7 44 24 1c d0 84 04 08 movl   $0x80484d0,0x1c(%esp)

That is, the address (which you apparently wanted to overwrite) for the instruction above [1] does not begin at &instruction+1, it begins at &instruction+4. Also, you shouldn't reverse the bytes when you ask GDB to write a word (I am guessing you wanted the new address to be 0x0804785b and not 0x5b870408):

(gdb) set *(0x080483ed+4)=0x01020304
(gdb) disas
Dump of assembler code for function main:
   0x080483e4 <+0>: push   %ebp
   0x080483e5 <+1>: mov    %esp,%ebp
   0x080483e7 <+3>: and    $0xfffffff0,%esp
   0x080483ea <+6>: sub    $0x20,%esp
=> 0x080483ed <+9>: movl   $0x1020304,0x1c(%esp)
   0x080483f5 <+17>:    mov    0x1c(%esp),%eax
   0x080483f9 <+21>:    mov    %eax,(%esp)
   0x080483fc <+24>:    call   0x8048318 <puts@plt>
   0x08048401 <+29>:    mov    $0x0,%eax
   0x08048406 <+34>:    leave  
   0x08048407 <+35>:    ret

[1] It is very likely that your instruction:

0x080487e0 <+17>: movl   $0x8048640,0x20(%esp)

has the same encoding as my instruction:

0x080483ed  <+9>: movl   $0x80484d0,0x1c(%esp)

as they are the "same", and have the same 8-byte length, but as FrankH pointed out, there might exist a different encoding of the same instruction. In any case, disas/r will show you all you need to know.
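
To see why the offset is +4 and why the bytes should not be reversed by hand, the raw bytes from disas/r can be decoded outside GDB. This is only a sketch, assuming the c7 44 24 1c encoding shown above (opcode, ModRM, SIB, 8-bit displacement, then a 4-byte little-endian immediate); a different encoding of the same instruction would need a different offset:

```python
import struct

# Raw bytes of "movl $0x80484d0,0x1c(%esp)" as printed by disas/r above.
raw = bytes.fromhex("c7 44 24 1c d0 84 04 08")

# Assumed layout: opcode (1) + ModRM (1) + SIB (1) + disp8 (1) = 4 bytes.
IMM_OFFSET = 4

# "<I" means little-endian unsigned 32-bit, which is how x86 stores it.
(imm,) = struct.unpack_from("<I", raw, IMM_OFFSET)
print(hex(imm))

# Writing the new value as one 32-bit word handles the byte order for us,
# just as GDB does when asked to store a word.
patched = bytearray(raw)
struct.pack_into("<I", patched, IMM_OFFSET, 0x01020304)
print(patched.hex())
```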

Can I edit lines of code using GDB, and is it also possible to save to the actual source file and header file in the same debug session? (Linux)

Most C implementations are compiled. The source code is analyzed and translated to processor instructions. This translation would be difficult to do on a piecewise basis. That is, given some small change in the source code, it would be practically impossible to update the executable file to represent those changes. As part of the translation, the compiler transforms and intertwines statements, assigns processor registers to be used for computing parts of expressions, designates places in memory to hold data, and more. When source code is changed slightly, this may result in a new compilation happening to use a different register in one place or needing more or less memory in a particular function, which results in data moving back or forth. Merging these changes into the running program would require figuring out all the differences, moving things in memory, rearranging what is in what processor register, and so on. For practical purposes, these changes are impossible.

GDB does not support this.

(Apple’s developer tools may have some feature like this. I saw it demonstrated for the Swift programming language but have not used it.)

Use gdb to Modify Binary

but the corresponding file is not changed.

It's hard to say what address you are actually modifying, and so whether your change should actually modify the binary or not.

In the past, I've found that after modifying the binary, I need to immediately quit. If I do anything other than quit (e.g. run), then GDB would discard my change, but if I quit, then the change would "take".

Example:

$ cat t.c
int main()
{
  return 42;
}

$ gcc t.c && ./a.out; echo $?
42

$ gdb --write -q  ./a.out
(gdb) disas/r main
Dump of assembler code for function main:
   0x00000000004004b4 <+0>:     55      push   %rbp
   0x00000000004004b5 <+1>:     48 89 e5        mov    %rsp,%rbp
   0x00000000004004b8 <+4>:     b8 2a 00 00 00  mov    $0x2a,%eax
   0x00000000004004bd <+9>:     5d      pop    %rbp
   0x00000000004004be <+10>:    c3      retq   
End of assembler dump.
(gdb) set {unsigned char}0x00000000004004b9 = 22
(gdb) disas/r main
Dump of assembler code for function main:
   0x00000000004004b4 <+0>:     55      push   %rbp
   0x00000000004004b5 <+1>:     48 89 e5        mov    %rsp,%rbp
   0x00000000004004b8 <+4>:     b8 16 00 00 00  mov    $0x16,%eax  <<< ---changed
   0x00000000004004bd <+9>:     5d      pop    %rbp
   0x00000000004004be <+10>:    c3      retq   
End of assembler dump.
(gdb) q

$ ./a.out; echo $?
22    <<<--- Just as desired
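
The same one-byte patch can also be applied to the file without GDB. The sketch below is an illustration only: patch_byte is a hypothetical helper, and it takes a file offset, which is not the same as the virtual address GDB prints (the offset has to be worked out from the ELF headers, for example with readelf). It refuses to write unless the byte currently at that offset is the one you expect. The demo runs on a scratch file holding the bytes of main from the listing above rather than on a real executable:

```python
import os
import tempfile


def patch_byte(path, file_offset, expected_old, new):
    """Overwrite one byte in a file after checking its current value."""
    with open(path, "r+b") as f:
        f.seek(file_offset)
        old = f.read(1)
        if old != bytes([expected_old]):
            raise ValueError(
                f"unexpected byte {old.hex()!r} at offset {file_offset:#x}"
            )
        f.seek(file_offset)
        f.write(bytes([new]))


# Scratch file containing only the machine code of main shown above.
main_bytes = bytes.fromhex("55 48 89 e5 b8 2a 00 00 00 5d c3")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(main_bytes)
    demo_path = tmp.name

# Offset 5 in this scratch file is the immediate of "mov $0x2a,%eax".
patch_byte(demo_path, 5, 0x2A, 22)

with open(demo_path, "rb") as f:
    patched = f.read()
os.remove(demo_path)
print(patched.hex())
```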

Show current assembly instruction in GDB

You can switch to assembly layout in GDB:

(gdb) layout asm

See the GDB documentation on TUI layouts for more information. The current assembly instruction will be shown in the assembler window.

   ┌───────────────────────────────────────────────────────────────────────────┐
   │0x7ffff740d756 <__libc_start_main+214>  mov    0x39670b(%rip),%rax        #│
   │0x7ffff740d75d <__libc_start_main+221>  mov    0x8(%rsp),%rsi              │
   │0x7ffff740d762 <__libc_start_main+226>  mov    0x14(%rsp),%edi             │
   │0x7ffff740d766 <__libc_start_main+230>  mov    (%rax),%rdx                 │
   │0x7ffff740d769 <__libc_start_main+233>  callq  *0x18(%rsp)                 │
  >│0x7ffff740d76d <__libc_start_main+237>  mov    %eax,%edi                   │
   │0x7ffff740d76f <__libc_start_main+239>  callq  0x7ffff7427970 <exit>       │
   │0x7ffff740d774 <__libc_start_main+244>  xor    %edx,%edx                   │
   │0x7ffff740d776 <__libc_start_main+246>  jmpq   0x7ffff740d6b9 <__libc_start│
   │0x7ffff740d77b <__libc_start_main+251>  mov    0x39ca2e(%rip),%rax        #│
   │0x7ffff740d782 <__libc_start_main+258>  ror    $0x11,%rax                  │
   │0x7ffff740d786 <__libc_start_main+262>  xor    %fs:0x30,%rax               │
   │0x7ffff740d78f <__libc_start_main+271>  callq  *%rax                       │
   └───────────────────────────────────────────────────────────────────────────┘
multi-thre process 3718 In: __libc_start_main     Line: ??   PC: 0x7ffff740d76d
#3  0x00007ffff7466eb5 in _IO_do_write () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007ffff74671ff in _IO_file_overflow ()
   from /lib/x86_64-linux-gnu/libc.so.6
#5  0x0000000000408756 in ?? ()
#6  0x0000000000403980 in ?? ()
#7  0x00007ffff740d76d in __libc_start_main ()
   from /lib/x86_64-linux-gnu/libc.so.6
(gdb)

How to call assembly in gdb?

Prior to GCC 5 ⁽¹⁾, I don't know of a way to run arbitrary machine code unless you actually enter the machine code into memory and then run it.

If you want to run code that's already in memory, you can just set the instruction pointer to the start, a breakpoint at the end, then go. Then, after the breakpoint, change the instruction pointer back to its original value.

But I can't actually see the use case for this. That doesn't mean there isn't one, just that anything you can do by running code, you can also achieve by directly modifying the registers, flags, memory and so forth.

For example, the command:

info registers

will dump the current values of the registers while:

set $eax = 42

will change the eax register to 42.

You can also change memory in this way:

set *((char*)0xb7ffeca0) = 4

This writes a single byte to memory location 0xb7ffeca0 and you can also use that same method to store wider data types.

⁽¹⁾ GCC 5 allows you to compile and execute arbitrary code with the compile code command, as documented in the GDB manual.

Setting breakpoints in GDB on a program build with YASM -g dwarf2 changes program behaviour and segfaults or SIGILL

YASM is making bad DWARF2 debug info. It's old and unmaintained. Use NASM instead.

With NASM 2.15.05, nasm -felf64 -g didn't work for me either: GDB 12.1 says there is no line 30 when I tried b 30. But still generally use NASM. I didn't try NASM's DWARF debug-info format; IIRC I've had problems with it in the past, like messing up objdump disassembly, so it's probably not great.

Don't rely on debug info from NASM or YASM. Use layout asm and set breakpoints on numeric addresses, or at the current position that you single-step to. layout reg / layout n is a good way to get a registers + disassembly view. You can copy/paste addresses from there or from disas to do stuff like b *0x40101b. Start the program with starti so GDB stops before executing the first user-space instruction; from there you can single-step by instruction with si. See the bottom of https://stackoverflow.com/tags/x86/info for asm debugging tips.

(Update: the NASM bug with debug info is described in "GDB does not load source lines from NASM". It will hopefully get fixed in a future version of NASM.)

Assembly language maps 1:1 to machine code, so it's actually helpful to look at the canonical disassembly when debugging; it may help you spot somewhere you wrote the wrong thing by accident.

When I build with YASM 1.3.0 and try single-stepping this with layout reg (so register + source view), the debug info doesn't seem to match well, since I get two steps on the same source line sometimes (other than the repe where that's expected; I mean with RIP incrementing).

I built with yasm -felf64 -gdwarf2 using YASM 1.3.0, ld 2.38, GDB 12.1, on Linux 5.18 (Arch Linux) on bare metal (Skylake CPU). Using b 30 to set a breakpoint there doesn't ever hit the breakpoint; it runs without crashing for me.

Debug info maps source lines to memory addresses, so with bad debug info, setting a breakpoint by line number in GDB can modify a byte of machine code other than the first byte of an instruction (replacing it with 0xCC, the INT3 software breakpoint).

This would lead to occasional illegal instructions, or more commonly to valid but different instructions (e.g. changing a byte of an absolute address), perhaps of shorter length leading to later bytes getting decoded as opcodes if a ModRM byte got modified. (Linux delivers SIGSEGV when user-space tries to run a privileged instruction, so various problems would all raise the same signal, even if the CPU exception was #GP rather than #PF). Also, overwriting a ModRM byte with 0xCC would change what the register operands are, so later instructions could use a bad register value.

0xCC as a ModRM byte is a register (not memory) operand with AH and CL or ESP and ECX. For example with the first 4 opcodes (add of different order and size), from putting db 0, 0xcc and so on into a .asm to make this example:

  401000:       00 cc                   add    ah,cl
  401002:       01 cc                   add    esp,ecx
  401004:       02 cc                   add    cl,ah
  401006:       03 cc                   add    ecx,esp
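
The register pairs in that listing follow directly from the bit fields of the ModRM byte. As a small sketch (split_modrm and the two lookup tables are names made up for this example, and the 8-bit names assume there is no REX prefix), the byte can be split like this:

```python
# Register names indexed by their 3-bit encoding (no REX prefix assumed).
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11    # top 2 bits: 3 means register-direct
    reg = (byte >> 3) & 0b111   # middle 3 bits
    rm = byte & 0b111           # low 3 bits
    return mod, reg, rm


mod, reg, rm = split_modrm(0xCC)
print(mod, reg, rm)
print("8-bit operands: ", REG8[rm], REG8[reg])
print("32-bit operands:", REG32[rm], REG32[reg])
```

Which of the two fields is the destination depends on the opcode, which is why opcodes 00 and 02 show the same registers in opposite order.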

Imagine what would happen if any of your mov or sub instructions had their operands replaced with esp,ecx for example! (And if it happens to inc, it could actually change the instruction. Some of your mov-immediate instructions may have modrm bytes, too, since unlike NASM, YASM doesn't optimize mov rcx,2 to mov ecx,2; it uses mov r/m64, sign_extended_imm32.)

Or of course messing up the jmp rel8 would jump to the wrong place. (But CC is a negative 8-bit integer so it would jump backwards).
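
To put a number on that, here is a short sketch of how a rel8 displacement of 0xCC is interpreted. The instruction address 0x401020 is made up for the example, and rel8_target is a hypothetical helper that assumes a 2-byte short jump (opcode plus displacement):

```python
def rel8_target(insn_addr, disp_byte, insn_len=2):
    """Target of a short jump: next instruction address plus signed disp8."""
    disp = int.from_bytes(bytes([disp_byte]), "little", signed=True)
    return insn_addr + insn_len + disp


# 0xCC read as a signed 8-bit integer.
disp = int.from_bytes(b"\xcc", "little", signed=True)
print(disp)

# A short jump at the (made-up) address 0x401020 would land before itself.
print(hex(rel8_target(0x401020, 0xCC)))
```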

Using GDB to try to examine the situation (e.g. to disassemble the machine code that faulted) may not work, because GDB puts back the original machine code bytes when commands like x /i or disas read memory to disassemble it.

You might still see RIP pointing at a privileged instruction, or a bad register value, if an earlier 0xCC byte threw decoding out of sync. But you wouldn't be able to see how execution of earlier instructions could have led to this point, because you wouldn't be seeing the earlier instructions that the CPU actually executed.

I can reproduce this, confirming stray 0xCC bytes

I was able to reproduce it by setting breakpoints on lines 29 and 31 as well as 30. When it segfaulted, RIP was 0x40103f, just past the end of the int 0x80.

Watching the disassembly view and single-stepping by instruction with si (stepi), execution went right through the repe cmpsb in one step, and through the int 0x80, faulting on a 00 00 add [rax],al after it.

mov rdi,0x402004 loaded 0x4020cc, the wrong address but still inside the same page. So the strings differed on the first byte, explaining why repe cmpsb finished after a single iteration.

mov eax,0x1 loaded RAX with 0xcc. In the 32-bit int 0x80 ABI (which you normally don't want to use in 64-bit code, BTW) that's __NR_setregid32 (check asm/unistd_32.h). So int 0x80 returns with RAX=-1, -EPERM (asm-generic/errno-base.h).

In both these cases, a 0xcc byte was the 2nd byte of the instruction, the first byte of the immediate. x86 is little-endian, so that messed up the low byte of the value loaded.
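
Both wrong values can be reproduced with a little byte arithmetic. This sketch models only the 4-byte little-endian immediate, not the whole instruction encoding, and clobber_first_imm_byte is a hypothetical helper written for this illustration:

```python
import struct


def clobber_first_imm_byte(imm32, stray=0xCC):
    """Replace the first (lowest-address) byte of a little-endian imm32."""
    raw = bytearray(struct.pack("<I", imm32))
    raw[0] = stray  # in little-endian order, byte 0 is the low byte
    return struct.unpack("<I", raw)[0]


# The string address from the example above.
print(hex(clobber_first_imm_byte(0x402004)))

# The system call number from the example above.
print(hex(clobber_first_imm_byte(0x1)))
```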
