How to Open a File in Assembler and Modify It

How to open a file in assembler and modify it?

This is 32-bit x86 Linux (x86 is not the only assembly language, and Linux is not the only Unix!). Comments on your code are interleaved below.

section .data

textoutput db 'Hello world!', 10
lentext equ $ - textoutput
filetoopen db 'hi.txt'

The filename string needs a terminating 0 byte, because the kernel reads the path up to the first 0 byte: filetoopen db 'hi.txt', 0
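For comparison, here is a minimal C sketch (the helper name path_length is only illustrative) showing that a C string literal carries its 0 terminator automatically. That terminator is exactly what the NASM db line has to spell out by hand:

```c
#include <assert.h>
#include <string.h>

/* A C string literal gets its terminating 0 byte for free:
   "hi.txt" occupies 7 bytes (6 characters plus '\0').
   NASM's  db 'hi.txt'  emits only the 6 characters. */
static const char filetoopen[] = "hi.txt";

/* Illustrative helper: the number of bytes before the first 0 byte,
   which is how far the kernel scans when it reads the path. */
static size_t path_length(const char *path) {
    assert(path != NULL);
    return strlen(path);
}
```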

section .text
global _start

_start:

mov eax, 5            ;open
mov ebx, filetoopen
mov ecx, 2            ;read and write mode

2 is the O_RDWR flag for the open syscall. If you want the file to be created if it doesn't already exist, you also need the O_CREAT flag, and when you specify O_CREAT you must pass a third argument (in edx) with the permission mode for the new file. If you poke around in the C headers, you'll find that O_CREAT is defined as 0100. Beware of the leading zero: this is an octal constant! You can write octal constants in NASM using the o suffix.

So you need something like mov ecx, 0102o to get the right flags and mov edx, 0666o to set the permissions (the kernel then clears any bits that are set in the process's umask).
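A quick C sketch for checking those octal values against the system headers. It assumes Linux on an architecture that uses the generic flag values (such as x86); a few architectures define O_CREAT differently. The enum names are hypothetical:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>

/* The same constants as the NASM 0102o / 0666o, written in C octal
   notation (a leading 0 instead of an o suffix). */
enum {
    ASM_OPEN_FLAGS = 0102, /* O_CREAT (0100) | O_RDWR (02) on Linux x86 */
    ASM_OPEN_MODE  = 0666  /* rw-rw-rw-, later reduced by the umask */
};

/* Compare the hand-written numbers with the values from the headers. */
static void check_open_constants(void) {
    assert((O_CREAT | O_RDWR) == ASM_OPEN_FLAGS);
    assert((S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH)
           == ASM_OPEN_MODE);
}
```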

int 80h

The result of a syscall is returned in eax. Here, this will be the file descriptor (if the open succeeded) or a small negative number, which is a negative errno code (e.g. -1 for EPERM, or -2 for ENOENT if the file does not exist). Note that the convention for returning error codes from a raw syscall is not quite the same as the C syscall wrappers (which generally return -1 and set errno in the case of an error)...
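To make the difference concrete, here is a minimal C sketch of what a libc wrapper does with the raw value. The function name raw_to_libc is hypothetical, and the sketch assumes the Linux convention that error returns fall in the range -4095 to -1:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: convert a raw Linux syscall result (what you
   find in eax after int 80h) to the C wrapper convention of
   "return -1 and set errno". Values in -4095..-1 are negated errno codes;
   anything else is a successful result, such as a file descriptor. */
static long raw_to_libc(long raw) {
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;   /* e.g. raw -2 becomes errno = ENOENT */
        return -1;
    }
    return raw;              /* success: pass the value through unchanged */
}
```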

mov eax, 4
mov ebx, filetoopen   ;I'm not sure what do i have to put here, what is the "file descriptor"?

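...so here you need to mov ebx, eax first (to save the open result before eax is overwritten), then mov eax, 4. The file descriptor is simply the small non-negative integer that open returned; the kernel uses it to identify the open file in later read, write and close calls. (You might want to check that the result is non-negative first, and handle a failure to open in some way if it isn't.)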

mov ecx, textoutput
mov edx, lentext

An int 80h is missing here; without it the write syscall is never made.

mov eax, 1
mov ebx, 0
int 80h              ; finish without errors
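For reference, a rough C equivalent of the corrected program shows where the descriptor comes from and where it goes. This is a sketch: write_greeting is a hypothetical helper name and error handling is kept minimal.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper mirroring the corrected assembly: open (creating the
   file if needed), check the descriptor, write, then close.
   Returns 0 on success, -1 on failure (errno is set by the failing call). */
static int write_greeting(const char *path) {
    static const char textoutput[] = "Hello world!\n";
    const size_t lentext = strlen(textoutput);  /* like lentext equ $ - textoutput */

    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0666); /* eax = 5, ecx = 0102o, edx = 0666o */
    if (fd < 0)
        return -1;                               /* open failed: fd holds no file */

    /* The descriptor returned by open is what belongs in ebx for write. */
    ssize_t n = write(fd, textoutput, lentext);
    int close_rc = close(fd);
    return (n == (ssize_t)lentext && close_rc == 0) ? 0 : -1;
}
```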

Assembly Modify file content

Under DOS, you can use INT 21h service AH=42h (move file pointer) for this. After you have read a byte from the file, the file pointer is advanced past it. To replace the byte you just read with something else, first move the file pointer one byte backwards (so that it points to the byte you want to replace). This can be done with the following code:

Code to move the file pointer one byte backwards from its current position:

    mov al, 1        ; relative to current file position
    mov ah, 42h      ; service for seeking file pointer
    mov bx, handle
    mov cx, -1       ; high word of the signed 32-bit offset in CX:DX
    mov dx, -1       ; low word; CX:DX = FFFFFFFFh = -1, i.e. one byte backwards
    int 21h
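The arithmetic behind CX:DX = -1 can be checked with a small C sketch (split_seek_offset is a hypothetical helper): it splits a signed 32-bit offset into the high word that goes in CX and the low word that goes in DX.

```c
#include <assert.h>
#include <stdint.h>

/* INT 21h AH=42h takes a signed 32-bit offset split across two 16-bit
   registers: CX = high word, DX = low word. Converting to uint32_t is
   well defined in C (it wraps modulo 2^32), so -1 becomes 0xFFFFFFFF. */
static void split_seek_offset(int32_t offset, uint16_t *cx, uint16_t *dx) {
    assert(cx && dx);
    uint32_t bits = (uint32_t)offset;
    *cx = (uint16_t)(bits >> 16);      /* high word */
    *dx = (uint16_t)(bits & 0xFFFFu);  /* low word */
}
```

For example, an offset of -1 gives CX = FFFFh and DX = FFFFh, and -2 gives CX = FFFFh and DX = FFFEh.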

After executing the INT 21h AH=42h code above, you can overwrite the byte with the new one using the following code:

Code to write from the current position of file pointer:

    mov ah, 40h          ; service for writing to a file
    mov bx, handle    
    mov cx, 1            ; number of bytes to write
    mov dx, offset char  ; buffer that holds the new character to be written
    int 21h
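On a POSIX system the same read, step back, overwrite sequence uses lseek with SEEK_CUR instead of INT 21h. This is a swapped-in analogue rather than DOS code, and replace_byte_at is a hypothetical helper:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read the byte at `pos`, step the file pointer back
   one byte, and overwrite that byte with `replacement`. Stores the old byte
   in *old if old is non-NULL. Returns 0 on success, -1 on error or EOF. */
static int replace_byte_at(const char *path, off_t pos, char replacement, char *old) {
    assert(path);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    char c = 0;
    int ok = lseek(fd, pos, SEEK_SET) != (off_t)-1
          && read(fd, &c, 1) == 1                   /* pointer is now pos + 1 */
          && lseek(fd, -1, SEEK_CUR) != (off_t)-1   /* back to pos, like CX:DX = -1 */
          && write(fd, &replacement, 1) == 1;       /* like AH=40h with CX = 1 */
    if (close(fd) != 0)
        ok = 0;
    if (ok && old)
        *old = c;
    return ok ? 0 : -1;
}
```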


How can I read and modify a file's bytes using NASM assembly, with C++ for opening/closing the file?

Load all the data of the file into memory and pass a pointer to that memory to the assembly routine instead. When done, simply write the (now modified) data back to the file.
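A minimal C sketch of that approach, assuming the assembly routine takes a pointer and a length. Here modify_bytes is a hypothetical stand-in written in C; in the real program it would be declared extern and implemented in NASM, and the case flipping it does is only for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the NASM routine (hypothetical): flips ASCII letter case. */
static void modify_bytes(unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
            buf[i] = (unsigned char)(c ^ 0x20);
    }
}

/* Read the whole file, let the routine modify the buffer in place,
   then write the modified data back. Returns 0 on success, -1 on error. */
static int process_file(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return -1; }
    rewind(f);

    unsigned char *buf = malloc(size ? (size_t)size : 1);
    if (!buf) { fclose(f); return -1; }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(buf); return -1; }

    modify_bytes(buf, got);               /* pass a pointer, not a FILE * */

    f = fopen(path, "wb");                /* truncate, then write back */
    if (!f) { free(buf); return -1; }
    size_t put = fwrite(buf, 1, got, f);
    int rc = (fclose(f) == 0 && put == got) ? 0 : -1;
    free(buf);
    return rc;
}
```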

Assembler can't open file

In addition to the things already mentioned in Michael Petch's and Peter Cordes' comments:

You should not use a fixed address (9Eh) but a label for the file name.

If you modify the code, a hard-coded address is no longer correct. Using a label fixes this, because the assembler recalculates the label's address whenever the code changes.

And 9Eh cannot be the correct address because a .com file starts at address 100h, so all addresses inside the .com file must be at least 100h.

It is not clear what is located at address 9Eh. It lies in the part of the PSP reserved for the command line (the command-line text starts at 81h), so the command line only reaches it if the arguments are about 29 bytes or longer. However, the data stored at 9Eh is obviously not a file name!

So it is clear that you'll get a "file not found" error because the dx register contains 9Eh but there is no valid file name at address 9Eh.

How do I open a .s assembly file in Linux?

The .s files are plain-text assembler source files, so you can open them in whatever editor you used to create the .c files in the first place.

In other words, mere mortals will opt for Notepad++ or Emacs, but the true intelligentsia will use Vim :-)

How to disassemble, modify and then reassemble a Linux executable?

I don't think there is any reliable way to do this. Machine code formats are very complicated, more complicated than assembly files. It isn't really possible to take a compiled binary (say, in ELF format) and produce a source assembly program which will compile to the same (or similar-enough) binary. To understand the differences, compare the output of GCC compiling directly to assembly (gcc -S) with the output of objdump on the executable (objdump -D).

There are two major complications I can think of. Firstly, machine code does not correspond 1-to-1 with assembly code, because of things like pointer offsets.

For example, consider the C code for Hello world:

#include <stdio.h>

int main()
{
    printf("Hello, world!\n");
    return 0;
}

This compiles to the following 32-bit x86 assembly code (abridged):

.LC0:
    .string "hello"
    .text
<snip>
    movl    $.LC0, %eax
    movl    %eax, (%esp)
    call    printf

Here .LC0 is a local label for the string constant, and printf is a symbol resolved through a shared library's symbol table. Compare this to the output of objdump:

80483cd:       b8 b0 84 04 08          mov    $0x80484b0,%eax
80483d2:       89 04 24                mov    %eax,(%esp)
80483d5:       e8 1a ff ff ff          call   80482f4 <printf@plt>

Firstly, the constant .LC0 is now just a fixed address (0x80484b0). It would be difficult to create an assembly source file that places this constant at exactly that address, since the assembler and linker are free to choose where such constants go.

Secondly, the call does not go to printf directly. It targets printf@plt, a stub in the procedure linkage table (PLT); the stub jumps through an entry in the global offset table (GOT), which the dynamic linker fills in with printf's real address at runtime (the details depend on things like position-independent code and lazy binding). Therefore, the disassembled code doesn't quite correspond to the source assembly code.

In summary, source assembly has symbols while compiled machine code has addresses which are difficult to reverse.

The second major complication is that an assembly source file can't contain all of the information that was present in the original ELF file headers, like which libraries to dynamically link against, and other metadata placed there by the original compiler. It would be difficult to reconstruct this.

Like I said, it's possible that a special tool can manipulate all of this information, but it is unlikely that one can simply produce assembly code which can be reassembled back to the executable.

If you are interested in modifying just a small section of the executable, I recommend a much more subtle approach than recompiling the whole application. Use objdump to get the assembly code for the function(s) you are interested in. Convert it to "source assembly syntax" by hand (and here, I wish there was a tool that actually produced disassembly in the same syntax as the input), and modify it as you wish. When you are done, reassemble just those function(s) and use objdump to figure out the machine code for your modified version. Then, use a hex editor to manually paste the new machine code over the top of the corresponding part of the original program, taking care that your new code is precisely the same number of bytes as the old code (otherwise all the following offsets would be wrong). If the new code is shorter, you can pad it out using NOP instructions. If it is longer, you may be in trouble, and might have to create new functions and call them instead.
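The final hex-editor step can also be scripted. The sketch below uses a hypothetical helper, patch_code, and assumes you already know the file offset of the code; that is not the same as the virtual address objdump prints, and converting one to the other requires the ELF section or program headers.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: overwrite old_len bytes at file offset `offset`
   with new_len bytes of new machine code, padding the rest with 0x90
   (the one-byte x86 NOP) so that nothing after the patch moves.
   Returns 0 on success, -1 on error or if the new code does not fit. */
static int patch_code(const char *path, off_t offset,
                      const unsigned char *new_code, size_t new_len,
                      size_t old_len) {
    unsigned char buf[256];

    assert(path != NULL);
    if (new_len > old_len || old_len > sizeof buf)
        return -1;                                    /* longer code does not fit */
    if (new_len > 0)
        memcpy(buf, new_code, new_len);
    memset(buf + new_len, 0x90, old_len - new_len);   /* NOP padding */

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    int ok = lseek(fd, offset, SEEK_SET) != (off_t)-1
          && write(fd, buf, old_len) == (ssize_t)old_len; /* same size as before */
    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, replacing a 5-byte instruction with the 2-byte xor eax, eax (31 C0) writes 31 C0 90 90 90 at that offset.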
