GNU assembler .data section value corrupted after syscall

You're looking at 4 bytes starting at result, which includes input as the 2nd or 3rd byte. (That's why the value goes up by a multiple of 256 or 65536, leaving the low byte = 1 if you print (char)result). This would be more obvious if you use p /x to print as hex.

GDB's default behaviour for print result, when there was no debug info, was to assume int. Now, because of user errors like this, gdb 8.1 (on Arch Linux) instead says: 'result' has unknown type; cast it to its declared type.
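Following the error message's advice, you can cast the symbol to control how many bytes you inspect; a minimal sketch (p /x prints as hex):

(gdb) p /x (char)result
(gdb) p /x (int)result

The first shows just the low byte; the second shows the same 4 bytes GDB used to assume.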

GAS + ld unexpectedly (to me anyway) merge the BSS and data segments into one page, so your variables are adjacent even though you put them in different sections that you'd expect to be treated differently. (BSS being backed by anonymous zeroed pages, data being backed by a private read-write mapping of the file).

After building with gcc -nostdlib -no-pie test.S, I get:

(gdb) p &result
$1 = (<data variable, no debug info> *) 0x600126
(gdb) p &input
$2 = (<data variable, no debug info> *) 0x600128 <input>

Unlike using .section .bss and reserving space manually, .lcomm is free to pad if it wants, presumably for alignment; maybe here so the BSS starts on an 8-byte boundary. When I built with clang, I got input in the byte after result (at different addresses).
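A minimal sketch of the manual alternative, assuming you actually want the two byte-sized variables packed together:

.section .bss
result: .skip 1    # exactly 1 byte reserved, no implicit padding
input:  .skip 1    # adjacent to result by construction

# versus letting .lcomm choose the layout:
# .lcomm result, 1
# .lcomm input, 1    # may be aligned/padded away from result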


I investigated by adding a large array with .lcomm arr, 888332. Once I realized it wasn't storing literal zeros for the BSS in the executable, I used readelf -a a.out to check further:

(related: What's the difference of section and segment in ELF file format)

...
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000126 0x0000000000000126  R E    0x200000
  LOAD           0x0000000000000126 0x0000000000600126 0x0000000000600126
                 0x0000000000000001 0x00000000000d8e1a  RW     0x200000
  NOTE           0x00000000000000e8 0x00000000004000e8 0x00000000004000e8
                 0x0000000000000024 0x0000000000000024  R      0x4

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .text
   01     .data .bss
   02     .note.gnu.build-id
...

So yes, the .data and .bss sections ended up in the same ELF segment.

I think what's going on here is that the ELF metadata says to map MemSiz = 0xd8e1a bytes (of zeroed pages) starting at virtual address 0x600126, and to LOAD FileSiz = 1 byte from offset 0x126 in the file to that same virtual address.

So instead of just an mmap, the ELF program loader has to copy data from the file into an otherwise-zeroed page that's backing the BSS and .data sections.

A larger .data section would probably be required before the linker decides to use separate segments.
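You could test that by growing .data and re-running readelf -l; a hypothetical experiment (the label and size are arbitrary):

.data
big: .space 8192, 0x55    # 8 KiB of initialized, non-zero data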

How to reference the value of data section in macro?

You can't read memory contents at assemble time, with or without macros. You can't assemble bytes into some section and then have another directive copy those bytes somewhere else.

What you can do is define a new symbol to have the same address as another symbol, which is obviously more efficient for read-only data (.section .rodata). You'd only want to actually duplicate the ASCII bytes as initializers for mutable .data, or in some rare case if you needed unique addresses for two things even though the data happened to be the same. (Perhaps as LUT entries for a struct { int i; char s[4];} table[16]; or something.)

.section .rodata
str1:
str2: # you can have 2 labels that happen to have the same address
.asciz "Hello"

# at any point later
.set str3, str1 # an alias, just like if you'd put str3: before/after str2:

Hopefully this example also makes it clearer that labels merely happen to be followed by data; they aren't like C variables, where the name is the one and only way to refer to that location.


If you want the same string data twice without having it appear again literally in your file, the string itself needs to be a macro. And you need to reference that macro name, not some label that happens to be followed by some pseudo-instructions that emit some data there.

.macro foo
.asciz "the text"
.endm

.data
str1: foo
str2: foo

You could also have the macro take a symbol name as a parameter and define those labels itself, and even derive a .equ with another name from it.
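A sketch of that idea (\() is GAS's token-pasting separator; the name parameter and the _len suffix are just illustrative):

.macro defstr name
\name: .asciz "the text"
.equ \name\()_len, . - \name - 1    # string length, excluding the terminating NUL
.endm

.data
defstr str1
defstr str2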


Or with the C preprocessor (name the file foo.S and build with gcc -c instead of plain as). I'm not sure if GAS's own macro language allows macros like this that expand to a single token, not a whole line. I find it cumbersome compared to NASM.

#define foo  "the text"

.data
str1: .asciz foo
str2: .asciz foo

Other assemblers work the same way, for example NASM: Assemble-time read the value of a data variable
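For comparison, NASM's preprocessor handles this in one line; a hypothetical equivalent of the example above:

%define foo "the text"

section .data
str1: db foo, 0
str2: db foo, 0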

GNU assembler: Accessing a corrupted shared library error

I tried it as below, and it works:

as -32 -o hello.o hello.s
ld -melf_i386 -L/lib -lc -o hello hello.o

BTW, on my machine it complained about a missing /usr/lib/libc.so.1; after I made a symbolic link from /usr/lib/libc.so.1 to /lib/ld-linux.so.2, it worked.
To create a 32-bit ELF on 64-bit Linux, you need glibc.i686 or glibc.i386 installed.

Why are there empty address spaces between data sections in memory (x86 / nasm)?

Data alignment:

It is typical to think of memory as a flat array of bytes:

Data:    | 0x50 | 0x69 | 0x70 | 0x43 | 0x68 | 0x69 | 0x70 | 0x73 |
Address: |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
ASCII:   |  P   |  i   |  p   |  C   |  h   |  i   |  p   |  s   |

However, the CPU itself does not read or write memory one byte at a time. Efficiency is the name of the game, so a CPU reads data from memory a fixed number of bytes at a time. The size in which the processor accesses memory is known as its memory access granularity (MAG).

Memory access granularity varies across architectures. As a general rule, MAG is equal to the native word size of the processor in question; e.g. IA-32 has a 4-byte granularity.

If the CPU were to read only one byte at a time, it would need to access memory 8 times to read the entirety of the above array. Compare this with a 4-byte granularity: the CPU would need to access memory only twice (1 = bytes 0-3, 2 = bytes 4-7).

Where does memory alignment come into play:

Well, let’s assume a 4-byte MAG. As we have seen, in order to read the string “PipChips” from memory, the CPU needs to access memory twice. Now suppose the data were laid out in memory slightly differently, as follows:

Data:    | 0x6B | 0x50 | 0x69 | 0x70 | 0x43 | 0x68 | 0x69 | 0x70 | 0x73 |
Address: |  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |  8   |
ASCII:   |  k   |  P   |  i   |  p   |  C   |  h   |  i   |  p   |  s   |

In this example, to access the same data, the CPU would need to access memory a total of 3 times: 1 = bytes 0-3, 2 = bytes 4-7, and a third access for “s” at memory address 8. Furthermore, the processor has to perform additional work to shift out the unwanted bytes that were read along with the wanted ones, because the data is stored at an unaligned address.

This is where memory alignment comes into play. The CPU has a MAG, the main purpose of which is to increase machine efficiency. Aligning data in memory to match the machine's memory access boundaries therefore creates more efficient code.

This is an (overly) simplistic explanation of memory alignment, but it answers the questions:

1) What is causing this empty address space, as it is not included in my source code?

The ‘empty address space’ is generated by the alignment requirements of the data section. NASM defaults are assumed if you do not specify values for the section properties; please see the manual.
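You can also set a section's alignment explicitly; a sketch (align=4 is, if I recall correctly, already NASM's default for .data in ELF output):

section .data align=4
msg: db "PipChips"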

2) What are the specific reasons for this empty address space?

The overriding reason for aligning memory data is for software efficiency and robustness. As discussed, the processor will access memory at the granularity of its word size.

3) How is the size of the space calculated?

The assembler pads out the section so that the data immediately following it is automatically aligned to the specified memory access boundary. In the original question, section .data would have ended at address 0x…60012a in the absence of the necessary padding, with section .bss starting at address 0x…60012b. There, the data would not have been properly aligned with the memory access boundary defined by the CPU's access granularity. Consequently, NASM, in its wisdom, adds one NUL byte of padding, rounding the memory address up to the next address divisible by 4 and thereby properly aligning the data.

The subtleties of memory access are many; for a more in-depth explanation, please see the wiki and the numerous online articles on the subject; and for the masochistic among you, there are always the manuals!

Generally, data alignment is handled automatically by the compiler/assembler, although programmer control is an option and in some cases desirable.


Solving the original problem:

We are still left with the question of how to concatenate our two strings for output. We now know that relying on two strings being adjacent across sections is not ideal, to say the least: generally, we will not know where these sections are placed relative to each other at runtime.

It is preferable, therefore, to concatenate the strings in a region of memory before making the syscall, rather than relying on the system call to provide the concatenation based on assumptions about where the strings ought to be in memory.

We have several options:

  1. Make two sys_write calls in succession, in order to print both strings and give the illusion in the output that they are one: Although straightforward, this makes little sense, as system calls are expensive.

  2. Directly read the user input into place: This seems the logical and most efficient thing to do, at least at first glance, as we can write the string without moving any data around, and with only one syscall. However, we face the problem of inadvertently overwriting data, as we have not reserved the space in memory. Also, it seems ‘wrong’ to read user input into the initialized .data section; initialized data is data that has a value before the program begins!

  3. Moving ‘EncodedName’ in memory, so that it is contiguous with ‘OutputMsg’: This seems clean and simple. However, in reality it is not really any different from option 2, and suffers the same drawbacks.

  4. The solution: Create a memory buffer and concatenate the strings into this memory buffer, prior to the sys_write system call.

    SECTION .bss

    EncodedName:    resb ENCODELEN
    ENCODELEN:      equ 1024

    CompleteOutput: resb COMPLETELEN
    COMPLETELEN:    equ 2048

User input will be read to ‘EncodedName’. We then concatenate ‘OutputMsg’ and ‘EncodedName’ at ‘CompleteOutput’, ready for writing to stdout:

    ; Read user input from stdin:
    mov rax, 0              ; sys_read
    mov rdi, 0              ; stdin
    mov rsi, EncodedName    ; Memory offset in which to read input data
    mov rdx, ENCODELEN      ; Length of memory buffer
    syscall                 ; Kernel call

    mov r8, rax             ; Save the number of bytes read from stdin

    ; Move string 'OutputMsg' to memory address 'CompleteOutput':
    mov rdi, CompleteOutput ; Destination memory address
    mov rsi, OutputMsg      ; Offset of 'string' to move to destination
    mov rcx, OUTPUTLEN      ; Length of string being moved
    rep movsb               ; Move string, one byte per iteration

    ; Concatenate 'OutputMsg' with 'EncodedName' in memory:
    mov rdi, CompleteOutput ; Destination memory address
    add rdi, OUTPUTLEN      ; Add length of string already moved, so we append rather than overwrite
    mov rsi, EncodedName    ; Offset memory address of string being moved
    mov rcx, r8             ; String length, saved in r8 during sys_read
    rep movsb               ; Move string into place

    ; Write string to stdout:
    mov rdx, OUTPUTLEN      ; Length of 'OutputMsg'
    add rdx, r8             ; Add length of 'EncodedName'

    mov rax, 1              ; sys_write
    mov rdi, 1              ; stdout
    mov rsi, CompleteOutput ; Memory offset of string
    syscall                 ; Make system call
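The snippet references OutputMsg and OUTPUTLEN without showing them; a minimal sketch of the assumed definitions (the message text is illustrative), plus a clean exit for after the final syscall:

    SECTION .data
    OutputMsg:  db "Output: "
    OUTPUTLEN:  equ $-OutputMsg

    ; ... after writing CompleteOutput:
    mov rax, 60             ; sys_exit
    xor rdi, rdi            ; Exit status 0
    syscall                 ; Terminate the process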

*Credit due to the comments in the original question, for pointing me in the right direction.

Why use .data instead of reserving space in .bss and initializing at runtime, for variables in assembly/C?

.bss is where you put zero-initialized static data, like C int x; (at global scope). That's the same as int x = 0; for static / global (static storage class).¹

.data is where you put non-zero-initialized static data, like int x = 2;. If you put that in BSS, you'd need a runtime static "constructor" to initialize the BSS location, like what a C++ compiler would do for static const int prog_starttime = __rdtsc();. (Even though it's const, the initializer isn't a compile-time constant, so it can't go in .rodata.)


.bss with a runtime initializer would make sense for big arrays that are mostly zero or filled with the same value (memset / rep stosd), but in practice writing char buf[1024000] = {1}; will put 1MB of almost all zeros into .data, with current compilers.

Otherwise it is not more efficient: a mov dword [myvar], imm32 instruction is at least 8 bytes long, costing about twice as many bytes in your executable as statically initializing those 4 bytes in .data would. Also, the initializer has to be executed.
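A sketch of that trade-off in GAS syntax (x, y, and init_y are illustrative names):

# Initialized at load time: 4 bytes of .data and no code at all.
.data
x: .long 2

# Zeroed at load time, initialized by code: 4 bytes of .bss plus
# an instruction that takes space in .text and must also execute.
.section .bss
y: .skip 4

.text
init_y:
    movl $2, y(%rip)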


By contrast, section .rodata (or .rdata on Windows) is where compilers put string literals, FP constants, and static const int x = 123; (Actually, x would normally get inlined as an immediate everywhere it's used in the compilation unit, letting the compiler optimize away any static storage. But if you took its address and passed &x to a function, the compiler would need it to exist in memory somewhere, and that would be in .rodata)


Footnote 1: Inside a function, int x; would be on the stack if the compiler didn't optimize it away or into registers, when compiling for a normal register machine with a stack like x86.



I could ask this question in the context of C programming as well

In C, an optimizing compiler will treat int x; x=5; pretty much identically to int x=5; inside a function. No static storage is involved. Looking at actual compiler output is often instructive: see How to remove "noise" from GCC/clang assembly output?.

Outside a function, at global scope, you can't write statements like x=5;. You could do that at the top of main, but then you'd just be tricking the compiler into making worse code.

Inside a function with static int x = 5;, the initialization happens once, at compile time. If you did static int x; x=5; instead, the static storage would be re-assigned every time the function was entered, and you might as well not have used static, unless you have other reasons for needing static storage class (e.g. returning a pointer to x that's still valid after the function returns).

Memory corruption in assembly?

You have an off-by-one error in your code.

The problem is on this line:

    jna Scan        ; Loop back if ecx is <= number of chars in buffer

which means that you will go round the loop 17 times rather than 16. This is hinted at by ruslik's comment (the original TextStr string is 16 dots followed by a space, so why does the space get replaced?).

The reason it breaks the line numbering is that mov byte [HexStr+edx+2],al in the marked section overflows HexStr on the 17th iteration, and writes into the Digits table. It breaks the hex dump as well (look at the first broken line: the a of demonstrating has been dumped as 60, not 61).

Try:

    jb  Scan        ; Loop back if ecx is < number of chars in buffer

instead.

Assembly x86 (32-bit), call to NR_creat (8) Corrupts Filename Storage

First, you are mixing up a pointer and a buffer:

fnbuf   resb buflen

reserves "buflen" bytes (32), which you might want to use as a buffer, but

mov     [fnbuf], ebx    ; save the filename in fnbuf (FIXME)

stores the address (a pointer) contained in ebx into the first four bytes of fnbuf. It does not copy the filename itself, or anything else, into the buffer: just the pointer to the filename. Dumping your .bss memory area gives this output afterwards (note that fd_out is the first address of your .bss area):

(gdb) x/32 0x80491ec
0x80491ec <fd_out>: 0x00 0x00 0x00 0xac 0xd2 0xff 0xff 0x00
                                   ^^^^^^^^^^^^^^^^^^^
                                   Pointer retrieved from EBX

But, the real issue is where you store the file descriptor into fd_out:

mov     [fd_out], eax   ; save file descriptor in fd_out

This writes four (!!) bytes from eax to the memory starting at fd_out. Dumping the same memory afterwards results in:

                                   Destroyed!
                                      v
(gdb) x/32 0x80491ec
0x80491ec <fd_out>: 0x03 0x00 0x00 0x00 0xd2 0xff 0xff 0x00
                    ^^^^^^^^^^^^^^^^^^^
                    Four bytes written by mov

As you see, this mov destroys the first byte of your pointer (its low byte, since x86 is little-endian): it is set to 0x00, which results in the modified value you observed.
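If the goal is to get the filename bytes themselves into fnbuf, they have to be copied; a minimal sketch, assuming ebx still holds the pointer, esi/edi/ecx are free to clobber, and the direction flag is clear:

mov esi, ebx        ; Source: the filename ebx points to
mov edi, fnbuf      ; Destination: the reserved buffer
mov ecx, buflen     ; Copy at most buflen bytes
rep movsb           ; Copy byte by byte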

Linux default behavior of executable .data section changed between 5.4 and 5.9?

This is only a guess: I think the culprit is the READ_IMPLIES_EXEC personality that was being set automatically in the absence of a PT_GNU_STACK segment.

In the 5.4 kernel source we can find this piece of code:

SET_PERSONALITY2(loc->elf_ex, &arch_state);
if (elf_read_implies_exec(loc->elf_ex, executable_stack))
        current->personality |= READ_IMPLIES_EXEC;

That's the only thing that can transform an RW section into an RWX one. Any other use of PROT_EXEC didn't seem to be changed, or relevant to this question, as far as I could tell.

The executable_stack is set here:

for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++)
        switch (elf_ppnt->p_type) {
        case PT_GNU_STACK:
                if (elf_ppnt->p_flags & PF_X)
                        executable_stack = EXSTACK_ENABLE_X;
                else
                        executable_stack = EXSTACK_DISABLE_X;
                break;

But if the PT_GNU_STACK segment is not present, that variable retains its default value:

int executable_stack = EXSTACK_DEFAULT;

Now, this workflow is identical in both 5.4 and the latest kernel source; what changed is the definition of elf_read_implies_exec:

Linux 5.4:

/*
 * An executable for which elf_read_implies_exec() returns TRUE will
 * have the READ_IMPLIES_EXEC personality flag set automatically.
 */
#define elf_read_implies_exec(ex, executable_stack) \
        (executable_stack != EXSTACK_DISABLE_X)

Latest Linux:

/*
 * An executable for which elf_read_implies_exec() returns TRUE will
 * have the READ_IMPLIES_EXEC personality flag set automatically.
 *
 * The decision process for determining the results are:
 *
 *                 CPU: | lacks NX*  | has NX, ia32     | has NX, x86_64 |
 * ELF:                 |            |                  |                |
 * ---------------------|------------|------------------|----------------|
 * missing PT_GNU_STACK | exec-all   | exec-all         | exec-none      |
 * PT_GNU_STACK == RWX  | exec-stack | exec-stack       | exec-stack     |
 * PT_GNU_STACK == RW   | exec-none  | exec-none        | exec-none      |
 *
 *  exec-all  : all PROT_READ user mappings are executable, except when
 *              backed by files on a noexec-filesystem.
 *  exec-none : only PROT_EXEC user mappings are executable.
 *  exec-stack: only the stack and PROT_EXEC user mappings are executable.
 *
 *  *this column has no architectural effect: NX markings are ignored by
 *   hardware, but may have behavioral effects when "wants X" collides with
 *   "cannot be X" constraints in memory permission flags, as in
 *   https://lkml.kernel.org/r/20190418055759.GA3155@mellanox.com
 *
 */
#define elf_read_implies_exec(ex, executable_stack) \
        (mmap_is_ia32() && executable_stack == EXSTACK_DEFAULT)

Note how in the 5.4 version elf_read_implies_exec returned true if the stack was not explicitly marked as non-executable (via the PT_GNU_STACK segment).

In the latest source, the check is more defensive: elf_read_implies_exec is true only for a 32-bit executable, and only when no PT_GNU_STACK segment was found in the ELF binary at all.

I assembled your program, linked it, and found no PT_GNU_STACK segment, so this may be the reason.

If this is indeed the issue, and if I followed the code correctly, then if you mark the stack as non-executable in the binary, its data section should no longer be mapped executable, not even on Linux 5.4.
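With GAS, the conventional way to get that non-executable-stack marking (an RW PT_GNU_STACK segment) is an empty note section in each source file, or linking with ld -z noexecstack:

.section .note.GNU-stack,"",@progbits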


