How to Disassemble a System Call

How to disassemble a system call?

Well, you could do something like this. Say I wanted to get an assembly dump of "dup":

Write this:

#include <stdio.h>
#include <unistd.h>   /* declares dup() */
int main() {
        return dup(0);
}

Compile it:

gcc  -o systest -g3 -O0 systest.c

Dump it:

objdump -d systest

Looking in "main" I see:

  400478:       55                      push   %rbp
  400479:       48 89 e5                mov    %rsp,%rbp
  40047c:       bf 00 00 00 00          mov    $0x0,%edi
  400481:       b8 00 00 00 00          mov    $0x0,%eax
  400486:       e8 1d ff ff ff          callq  4003a8 <dup@plt>
  40048b:       c9                      leaveq
  40048c:       c3                      retq
  40048d:       90                      nop
  40048e:       90                      nop
  40048f:       90                      nop

So looking at "dup@plt" I see:

00000000004003a8 <dup@plt>:
  4003a8:       ff 25 7a 04 20 00       jmpq   *2098298(%rip)        # 600828 <_GLOBAL_OFFSET_TABLE_+0x20>
  4003ae:       68 01 00 00 00          pushq  $0x1
  4003b3:       e9 d0 ff ff ff          jmpq   400388 <_init+0x18>

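So it's making an indirect jump through an entry in the "global offset table". That table does not hold syscall vectors: it holds the addresses of dynamically linked library functions, filled in by the dynamic linker at run time. Here the entry ends up pointing at the dup wrapper in the C library, and that wrapper is what executes the actual system call instruction. Like the other post said, see the standard library sources (for the wrapper) or the kernel source (for the implementation) for details on that.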

How to disassemble a system call

I am not sure your question is very meaningful.

Please read more about system calls, kernels, operating systems, Linux, and the Linux kernel.

Essentially, a system call is (from the application's point of view) an atomic operation implemented by a single machine instruction (int 0x80, syscall, etc.), with a few bookkeeping instructions before it (e.g. loading the system call number and arguments into registers) and after it (e.g. setting errno). When it executes, control passes into the kernel, which runs with a (sort of) different address space and in a different protection ring. The syscalls(2) man page lists the Linux system calls.

The real code doing the system call is inside the kernel. You can get the Linux kernel source code from kernel.org.

See also the Linux Assembly HOWTO and asm.sourceforge.net.

To find out which system calls a given application or process makes, use strace.

How to get disassembly after syscall in Windows

I'm not sure a kernel debugging session is actually needed. If you only need the disassembly and are not experienced with kernel debugging, you can take a simpler route.

Mark Russinovich's Process Explorer is an easy-to-use tool that can help you get a clue about what is happening. For example, it can display a thread's call stack, including the kernel part of the stack, with human-readable names:

ntoskrnl.exe!KeSynchronizeExecution+0x3f26
ntoskrnl.exe!KeWaitForMultipleObjects+0x109c
ntoskrnl.exe!KeWaitForMultipleObjects+0xb3f
ntoskrnl.exe!KeWaitForMutexObject+0x377
ntoskrnl.exe!KeUnstackDetachProcess+0x2230
ntoskrnl.exe!ObDereferenceObjectDeferDelete+0x28a
ntoskrnl.exe!KeWaitForMultipleObjects+0x1283
ntoskrnl.exe!KeWaitForMultipleObjects+0xb3f
ntoskrnl.exe!KeDelayExecutionThread+0x106
ntoskrnl.exe!CcUnpinData+0xfe
ntoskrnl.exe!setjmpex+0x3aa3
ntdll.dll!NtDelayExecution+0x14
test.exe!main+0x1f
test.exe!__scrt_common_main_seh+0x11d
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

The next step could be to disassemble ntoskrnl.exe with a disassembler such as x64dbg or IDA Pro. That gives you the requested disassembly, and you can start investigating from the KeDelayExecutionThread() entry.

As for literature, I can recommend Mark Russinovich's book, Windows Internals.

Good luck!

How to disassemble a memory range with GDB?

Do you only want to disassemble your actual main? If so try this:

(gdb) info line main 
(gdb) disas STARTADDRESS ENDADDRESS

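Like so (this old GDB 6.8 session accepts the two addresses separated by a space; newer GDB releases expect a comma, as in disas 0x401050,0x401075):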

USER@MACHINE /cygdrive/c/prog/dsa
$ gcc-3.exe -g main.c

USER@MACHINE /cygdrive/c/prog/dsa
$ gdb a.exe
GNU gdb 6.8.0.20080328-cvs (cygwin-special)
...
(gdb) info line main
Line 3 of "main.c" starts at address 0x401050 <main> and ends at 0x401075 <main+
(gdb) disas 0x401050 0x401075
Dump of assembler code from 0x401050 to 0x401075:
0x00401050 <main+0>:    push   %ebp
0x00401051 <main+1>:    mov    %esp,%ebp
0x00401053 <main+3>:    sub    $0x18,%esp
0x00401056 <main+6>:    and    $0xfffffff0,%esp
0x00401059 <main+9>:    mov    $0x0,%eax
0x0040105e <main+14>:   add    $0xf,%eax
0x00401061 <main+17>:   add    $0xf,%eax
0x00401064 <main+20>:   shr    $0x4,%eax
0x00401067 <main+23>:   shl    $0x4,%eax
0x0040106a <main+26>:   mov    %eax,-0xc(%ebp)
0x0040106d <main+29>:   mov    -0xc(%ebp),%eax
0x00401070 <main+32>:   call   0x4010c4 <_alloca>
End of assembler dump.

I don't see your system interrupt call, however. (It's been a while since I last tried to make a system call in assembly. INT 21h, as far as I recall, though that is the DOS interface; on 32-bit x86 Linux the instruction is int 0x80.)

How to disassemble one single function using objdump?

I would suggest using gdb as the simplest approach. You can even do it as a one-liner, like:

gdb -batch -ex 'file /bin/ls' -ex 'disassemble main'

How to disassemble, modify and then reassemble a Linux executable?

I don't think there is any reliable way to do this. Machine code formats are very complicated, more complicated than assembly files. It isn't really possible to take a compiled binary (say, in ELF format) and produce a source assembly program which will assemble to the same (or similar-enough) binary. To gain an understanding of the differences, compare the output of GCC compiling direct to assembler (gcc -S) versus the output of objdump on the executable (objdump -D).

There are two major complications I can think of. Firstly, the machine code itself is not a 1-to-1 correspondence with assembly code, because of things like pointer offsets.

For example, consider the C code to Hello world:

#include <stdio.h>

int main()
{
    printf("Hello, world!\n");
    return 0;
}

This compiles to the x86 assembly code:

.LC0:
    .string "hello"
    .text
<snip>
    movl    $.LC0, %eax
    movl    %eax, (%esp)
    call    printf

Where .LC0 is a named constant, and printf is a symbol in a shared library's symbol table. Compare to the output of objdump:

80483cd:       b8 b0 84 04 08          mov    $0x80484b0,%eax
80483d2:       89 04 24                mov    %eax,(%esp)
80483d5:       e8 1a ff ff ff          call   80482f4 <printf@plt>

Firstly, the constant .LC0 is now just some random offset in memory somewhere -- it would be difficult to create an assembly source file which contains this constant in the correct place, since the assembler and linker are free to choose locations for these constants.

Secondly, I'm not entirely sure about this (and it depends on things like position-independent code), but I believe the address of printf is not encoded in that call instruction at all. The call goes to a stub in the procedure linkage table (printf@plt), and the dynamic linker fills in the real address of printf in a lookup table (the global offset table) at run time. Therefore, the disassembled code doesn't quite correspond to the source assembly code.

In summary, source assembly has symbols while compiled machine code has addresses which are difficult to reverse.

The second major complication is that an assembly source file can't contain all of the information that was present in the original ELF file headers, like which libraries to dynamically link against, and other metadata placed there by the original compiler. It would be difficult to reconstruct this.

Like I said, it's possible that a special tool can manipulate all of this information, but it is unlikely that one can simply produce assembly code which can be reassembled back to the executable.

If you are interested in modifying just a small section of the executable, I recommend a much more subtle approach than recompiling the whole application. Use objdump to get the assembly code for the function(s) you are interested in. Convert it to "source assembly syntax" by hand (and here, I wish there was a tool that actually produced disassembly in the same syntax as the input), and modify it as you wish. When you are done, recompile just those function(s) and use objdump to figure out the machine code for your modified program. Then, use a hex editor to manually paste the new machine code over the top of the corresponding part of the original program, taking care that your new code is precisely the same number of bytes as the old code (or all the offsets would be wrong). If the new code is shorter, you can pad it out using NOP instructions. If it is longer, you may be in trouble, and might have to create new functions and call them instead.

Uncertain about some instructions in disassembly of data section

To avoid getting a meaningless disassembly of the .data section, don't use the objdump option

-D, --disassemble-all Display assembler contents of all sections

but rather

-d, --disassemble Display assembler contents of executable sections

and in addition

-s, --full-contents Display the full contents of all sections requested

to get a dump of the data:


Contents of section .data:
 80490a8 48656c6c 6f20776f 726c6421 0a00      Hello world!..