How to Avoid Stdin Input That Does Not Fit in Buffer Be Sent to the Shell in Linux 64-Bit Intel (X86-64) Assembly

How to avoid stdin input that does not fit in buffer be sent to the shell in Linux 64-bit Intel (x86-64) assembly

It is not a buffer overflow as others have stated. I wrote a tutorial on reading from the terminal in Linux which also shows how to deal with this issue. It uses the 32-bit int 0x80 interface, but you can easily change it to fit your needs.

http://www.dreamincode.net/forums/topic/286248-nasm-linux-terminal-inputoutput-wint-80h/

Read STDIN using syscall READ in Linux: unconsumed input is sent to bash

Yep, it's normal behavior. Anything you don't consume is available for the next process. You know how when you're doing something slow you can type ahead, and when the slow thing finishes the shell will run what you typed? Same thing here.

There's no one-size-fits-all solution. It's really about user expectation. How much input did they expect your program to consume? That's how much you should read.

  • Does your program act like a single line prompt like read? Then you should read a full line of input up through the next \n character. The easiest way to do that without over-reading is to read 1 character at a time. If you do bulk reads you might consume part of the next line by mistake.

  • Does your program act like a filter like cat or sed or grep? Then you should read until you reach EOF.

  • Does your program not read from stdin at all like echo or gcc? Then you should leave stdin alone and consume nothing, leaving the input for the next program.

Consuming exactly 4 bytes is unusual but could be reasonable behavior for, say, an interactive program that prompts for a 4-digit PIN and doesn't require the user to press Enter.
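The "read 1 character at a time" advice for a single-line prompt can be sketched in C as below. This is only an illustration, not code from any of the answers: the helper name read_line_bytewise and its buffer handling are assumptions. It issues one read(2) per byte so nothing after the '\n' is consumed, and it keeps consuming an over-long line so the excess does not reach the next reader.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one line from fd, one byte per read(2) call,
 * so that nothing after the '\n' is consumed. At most cap bytes are stored
 * in buf; bytes that do not fit are still consumed and then dropped.
 * Returns the number of bytes stored, or -1 on a read error. */
ssize_t read_line_bytewise(int fd, char *buf, size_t cap)
{
    size_t stored = 0;

    for (;;) {
        char c;
        ssize_t n = read(fd, &c, 1);

        if (n < 0)
            return -1;              /* read error */
        if (n == 0)
            break;                  /* end of file before a newline */
        if (stored < cap)
            buf[stored++] = c;      /* keep what fits */
        if (c == '\n')
            break;                  /* whole line consumed, stop here */
    }
    return (ssize_t)stored;
}
```

One system call per byte is slow, but for a prompt that a person answers by hand the cost is usually negligible.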

sys_read will spill characters when buffer overflows

After reading the input, you need to flush (drain) whatever is left in stdin so that the excess is not passed to the next read of input. It's not a buffer overflow, though.

I have asked the same question, but for x86-64 Linux, so it's not exactly duplicate:
How to avoid stdin input that does not fit in buffer be sent to the shell in Linux 64-bit Intel (x86-64) assembly

Anyway, following GunnerInc's excellent tutorial (for x86 Linux) should solve your problem:
http://www.dreamincode.net/forums/topic/286248-nasm-linux-terminal-inputoutput-wint-80h/

Clear input buffer Assembly x86 (NASM)

The simple way to clear stdin is to check if the 2nd char in choice is the '\n' (0xa). If it isn't, then characters remain in stdin unread. You already know how to read from stdin, so in that case, just read stdin until the '\n' is read [1], e.g.

        mov rax, 0      ;Getting input and saving it on var choice
        mov rdi, 0
        mov rsi, choice
        mov rdx, 2
        syscall

        cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
        jne end

        cmp byte [choice + 1], 0xa ; is 2nd char '\n' (if yes done, jump start)
        je _start

    empty: ; chars remain in stdin unread
        mov rax, 0 ; read 1-char from stdin into choice
        mov rdi, 0
        mov rsi, choice
        mov rdx, 1
        syscall

        cmp byte [choice], 0xa ; check if char '\n'?
        jne empty ; if not, repeat

        jmp _start

Beyond that, you should determine your prompt lengths when you declare them, e.g.

    prompt db "Type your text here. ", 0h
    plen equ $-prompt
    retry db "Try again (y/n)? ", 0h
    rlen equ $-retry

That way you do not have to hardcode lengths in case you change your prompts, e.g.

    _start:

        mov rax, 1 ;Just asking the user to enter input
        mov rdi, 1
        mov rsi, prompt
        mov rdx, plen
        syscall

        mov rax, 0 ;Getting input and saving it on var text
        mov rdi, 0
        mov rsi, text
        mov rdx, 50
        syscall

        mov r8, rax

        mov rax, 1 ;Printing the user input
        mov rdi, 1
        mov rsi, text
        mov rdx, r8
        syscall

        mov rax, 1 ;Asking if user wants to try again
        mov rdi, 1
        mov rsi, retry
        mov rdx, rlen
        syscall

If you put it altogether, you can do:

    section .data
    prompt db "Type your text here. ", 0h
    plen equ $-prompt
    retry db "Try again (y/n)? ", 0h
    rlen equ $-retry

    section .bss
    text resb 50
    choice resb 2

    section .text
    global _start

    _start:

        mov rax, 1 ;Just asking the user to enter input
        mov rdi, 1
        mov rsi, prompt
        mov rdx, plen
        syscall

        mov rax, 0 ;Getting input and saving it on var text
        mov rdi, 0
        mov rsi, text
        mov rdx, 50
        syscall

        mov r8, rax

        mov rax, 1 ;Printing the user input
        mov rdi, 1
        mov rsi, text
        mov rdx, r8
        syscall

        mov rax, 1 ;Asking if user wants to try again
        mov rdi, 1
        mov rsi, retry
        mov rdx, rlen
        syscall

        mov rax, 0 ;Getting input and saving it on var choice
        mov rdi, 0
        mov rsi, choice
        mov rdx, 2
        syscall

        cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
        jne end

        cmp byte [choice + 1], 0xa ; is 2nd char '\n' (if yes done, jump start)
        je _start

    empty: ; chars remain in stdin unread
        mov rax, 0 ; read 1-char from stdin into choice
        mov rdi, 0
        mov rsi, choice
        mov rdx, 1
        syscall

        cmp byte [choice], 0xa ; check if char '\n'?
        jne empty ; if not, repeat

        jmp _start

    end:
        mov rax, 60
        mov rdi, 0
        syscall

Example Use/Output

$ ./bin/emptystdin
Type your text here. abc
abc
Try again (y/n)? y
Type your text here. def
def
Try again (y/n)? yes please!
Type your text here. geh
geh
Try again (y/n)? yyyyyyyyyyyyyeesssssssss!!!!
Type your text here. ijk
ijk
Try again (y/n)? n

Now even a cat stepping on the keyboard at your (y/n)? prompt won't cause problems. There are probably more elegant and more efficient ways to handle this than repeated one-byte reads with syscall, but this will handle the issue.


Additional Considerations And Error-Checks

As mentioned above, the simplistic approach of reading and checking one character at a time isn't very efficient, though it is conceptually the easiest extension without making other changes. @PeterCordes makes a number of good points in the comments below about more efficient approaches and, more importantly, about error conditions that can arise and should be guarded against as well.

For starters, when you are looking for information on an individual system call, Anatomy of a system call, part 1 provides a bit of background on how they are used. Supplement it with the Linux manual page (for read, man 2 read) for details on the parameter types, the return type and the return values.

The original solution above does not address what happens if the user generates a manual end-of-file by pressing Ctrl+d, or if a read error actually occurs. It only addressed the question asked, about user input and emptying stdin. With any user input, before you use the value, you must validate that the input succeeded by checking the return (not just for the yes/no input, but for all inputs). For purposes here, you can consider zero input (manual end-of-file) or a negative return (read error) as a failed input.

To check whether you have at least one valid character of input, you can simply check the return (read returns the number of bytes read, which sys_read places in rax after the syscall). A zero or negative value indicates that no input was received. A check could be:

        cmp rax, 0              ; check for 0 bytes read or error
        jle error

You can write a short diagnostic to the user and then handle the error as wanted. This example simply exits after outputting a diagnostic, e.g.

    readerr db 0xa, "eof or read error", 0xa, 0x0
    rderrsz equ $-readerr
    ...
        ; your call to read here
        cmp rax, 0 ; check for 0 bytes read or error
        jle error
    ...
    error:
        mov rax, 1 ; output the readerr string and jmp to end
        mov rdi, 1
        mov rsi, readerr
        mov rdx, rderrsz
        syscall

        jmp end

Now moving on to a more efficient manner of emptying stdin. The biggest hindrance indicated in the original answer was the repeated system calls to sys_read to read one character at a time, reusing your 2-byte choice buffer. The obvious solution is to make choice bigger, or just use stack space to read more characters each time (you can look at the comments for a couple of approaches). Here, for example, we will increase choice to 128 bytes, which in the case of the anticipated "y\n" input will only make use of two of those bytes, but in the case of an excessively long input will read 128 bytes at a time until the '\n' is found. For setup you have:

    choicesz equ 128
    ...
    section .bss
    text resb 50
    choice resb 128

Now after you ask for (y/n)? your read would be:

        mov rax, 0              ; Getting input and saving it on var choice
        mov rdi, 0
        mov rsi, choice
        mov rdx, choicesz
        syscall

        cmp rax, 0 ; check for 0 bytes read (eof) or error
        jle error

        cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
        jne end ; not a 'y', doesn't matter what's in stdin, end

Now there are two conditions to check. First, compare the number of characters read with your buffer size choicesz: if the number read is less than choicesz, no characters remain unread in stdin. Second, if the return equals the buffer size, you may or may not have characters remaining in stdin. You need to check the last character in the buffer to see if it is the '\n', which indicates whether you have read all the input. If the last character is anything other than the '\n', characters remain unread (unless the user just happened to generate a manual end-of-file at the 128th character). You can check as:

    empty:
        cmp eax, choicesz ; compare chars read and buffer size
        jb _start ; buffer not full - nothing remains in stdin

        cmp byte [choice + choicesz - 1], 0xa ; if full - check if last byte \n, done
        je _start

        mov rax, 0 ; fill choice again from stdin and repeat checks
        mov rdi, 0
        mov rsi, choice
        mov rdx, choicesz
        syscall

        cmp rax, 0 ; check for 0 bytes read or error
        jle error

        jmp empty

(note: as noted above, there is a further case, not covered here, where the user enters valid input but then generates a manual end-of-file instead of pressing Enter after the 128th character (or a multiple of 128). There you can't just look for a '\n' because it doesn't exist, and if there are no more characters and you call sys_read again, it will block waiting on input. Conceivably you will need to use a non-blocking read and putback of a single character to break that ambiguity; that is left to you.)
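For readers who find the control flow easier to follow in C, the two checks can be sketched roughly as follows. The function name drain_rest_of_line and its parameters are made up for illustration, and the sketch assumes the same thing the assembly does: input comes from a terminal in its default line-buffered mode, so a read that returns fewer bytes than the buffer holds has reached the end of the line.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper mirroring the `empty:` loop. `got` is the return
 * value of the read() that last filled buf, whose capacity is cap.
 * Keeps reading while the buffer comes back full without a trailing '\n'.
 * Returns 0 once the line is drained, -1 on end-of-file or read error. */
int drain_rest_of_line(int fd, char *buf, size_t cap, ssize_t got)
{
    for (;;) {
        if (got <= 0)
            return -1;              /* 0 bytes (eof) or error */
        if ((size_t)got < cap)
            return 0;               /* buffer not full: nothing remains */
        if (buf[cap - 1] == '\n')
            return 0;               /* full, but the last byte is '\n' */
        got = read(fd, buf, cap);   /* refill and repeat the checks */
    }
}
```

Like the assembly version, this does not resolve the ambiguous case of an end-of-file generated exactly at a multiple of the buffer size.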

A complete example with the improvements would be:

    section .data
    prompt db "Type your text here. ", 0x0
    plen equ $-prompt
    retry db "Try again (y/n)? ", 0x0
    rlen equ $-retry
    textsz equ 50
    choicesz equ 128
    readerr db 0xa, "eof or read error", 0xa, 0x0
    rderrsz equ $-readerr

    section .bss
    text resb 50
    choice resb 128

    section .text
    global _start

    _start:

        mov rax, 1 ; Just asking the user to enter input
        mov rdi, 1
        mov rsi, prompt
        mov rdx, plen
        syscall

        mov rax, 0 ; Getting input and saving it on var text
        mov rdi, 0
        mov rsi, text
        mov rdx, textsz
        syscall

        cmp rax, 0 ; check for 0 bytes read or error
        jle error

        mov r8, rax

        mov rax, 1 ; Printing the user input
        mov rdi, 1
        mov rsi, text
        mov rdx, r8
        syscall

        mov rax, 1 ; Asking if user wants to try again
        mov rdi, 1
        mov rsi, retry
        mov rdx, rlen
        syscall

        mov rax, 0 ; Getting input and saving it on var choice
        mov rdi, 0
        mov rsi, choice
        mov rdx, choicesz
        syscall

        cmp rax, 0 ; check for 0 bytes read (eof) or error
        jle error

        cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
        jne end ; not a 'y', doesn't matter what's in stdin, end

    empty:
        cmp eax, choicesz ; compare chars read and buffer size
        jb _start ; buffer not full - nothing remains in stdin

        cmp byte [choice + choicesz - 1], 0xa ; if full - check if last byte \n, done
        je _start

        mov rax, 0 ; fill choice again from stdin and repeat checks
        mov rdi, 0
        mov rsi, choice
        mov rdx, choicesz
        syscall

        cmp rax, 0 ; check for 0 bytes read or error
        jle error

        jmp empty

    end:
        mov rax, 60
        mov rdi, 0
        syscall

    error:
        mov rax, 1 ; output the readerr string and jmp to end
        mov rdi, 1
        mov rsi, readerr
        mov rdx, rderrsz
        syscall

        jmp end

There are surely more efficient ways to optimize this, but for purposes of discussing "How do I empty stdin?", this second approach, where the larger buffer alleviates the repeated calls to sys_read to read one character at a time, is a good step forward. "How do I completely optimize the check?" is a whole separate question.

Let me know if you have further questions.

Footnotes:

1. In this circumstance, where the user is typing input, the user generates a '\n' by pressing Enter, allowing you to check for the '\n' as the final character when emptying stdin. The user can also generate a manual end-of-file by pressing Ctrl+d, so the '\n' isn't guaranteed. There are still many other ways stdin can be filled, such as redirecting a file as input, where there should be an ending '\n' to be POSIX compliant, but there too it isn't guaranteed.

Compact shellcode to print a 0-terminated string pointed-to by a register, given puts or printf at known absolute addresses?

Since I already spilled the beans and "spoiled" the answer to the online challenge in comments, I might as well write it up. 2 key tricks:

  • Create 0x7ffff7e3c5a0 (&puts) in a register with lea reg, [reg + disp32], using the known value of RDI which is within the +-2^31 range of a disp32. (Or use RBP as a starting point, but not RSP: that would need a SIB byte in the addressing mode).

    This is a generalization of the code-golf trick of using lea edi, [rax+1] to create small constants from other small constants (especially 0) in 3 bytes, with code that runs less slowly than push imm8 / pop reg.

    The disp32 is large enough to not have any zero bytes; you have a couple registers to choose from in case one had been too close.

  • Copy a 64-bit register in 2 bytes with push reg / pop reg, instead of 3-byte mov rdi, rdx (REX + opcode + modrm). No savings if either push needs a REX prefix (for R8..R15), and actually costs bytes if both are "non-legacy" registers.

See other answers on Tips for golfing in x86/x64 machine code on codegolf.SE for more.

    bits 64
        lea rsi, [rdi - 0x166f30]
        ;; add rbp, imm32 ; alternative, but that would mess up a call-preserved register so we might crash on return.
        push rdx
        pop rdi ; copy RDX to first arg, x86-64 SysV calling convention
        jmp rsi ; tailcall puts

This is exactly 11 bytes, and I don't see a way for it to be smaller. add r64, imm32 is also 7 bytes, same as LEA. (Or 6 bytes if the register is RAX, but even the xchg rax, rdi short form would cost 2 bytes to get it there, and the RAX value is still the fgets return value, which is the small mmap buffer address.)

The puts function pointer doesn't fit in 32 bits, so we need a REX prefix on any instruction that puts it into a register. Otherwise we could just mov reg, imm32 (5 bytes) with the absolute address, not deriving it from another register.

$ nasm -fbin -o exploit.bin -l /dev/stdout exploit.asm
1 bits 64
2 00000000 488DB7D090E9FF lea rsi, [rdi - 0x166f30]
3 ;; add rbp, imm32 ; we can avoid messing up any call-preserved registers
4 00000007 52 push rdx
5 00000008 5F pop rdi ; copy to first arg
6 00000009 FFE6 jmp rsi ; tailcall
$ ll exploit.bin
-rw-r--r-- 1 peter peter 11 Apr 24 04:09 exploit.bin
$ ./a.out < exploit.bin # would work if the addresses in my build matched yours

My build of your incomplete .c uses different addresses on my machine, but it does reach this code (at address 0x10000, mmap_min_addr, which mmap picks after the amusing choice of 0x1337 as a hint address, which isn't even page aligned but doesn't result in EINVAL on current Linux).

Since we only tailcall puts with correct stack alignment and don't modify any call-preserved registers, this should successfully return to main.


Note that 0 bytes (ASCII NUL, not NULL) would actually work in shellcode for this test program, if not for the requirement that forbids it.

The input is read using fgets (apparently to simulate a gets() overflow).
fgets actually can read a 0 aka '\0'; the only critical character is 0xa aka '\n' newline. See Is it possible to read null characters correctly using fgets or gets_s?

Often buffer overflows exploit a strcpy or something else that stops on a 0 byte, but fgets only stops on EOF or newline. (Or the buffer size, a feature gets is missing, hence its deprecation and removal from even the ISO C standard library! It's literally impossible to use safely unless you control the input data). So yes, it's totally normal to forbid zero bytes.


BTW, your int 0x80 attempt is not viable: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - you can't use the 32-bit ABI to pass 64-bit pointers to write, and the string you want to output is not in the low 32 bits of virtual address space.

Of course, with the 64-bit syscall ABI, you're fine if you can hardcode the length.

        push rdx
        pop rsi
        shr eax, 16 ; fun 3-byte way to turn 0x10000 into 1 (__NR_write, 64-bit), instead of just push 1 / pop
        mov edi, eax ; STDOUT_FD = __NR_write
        lea edx, [rax + 13 - 1] ; 3 bytes. RDX = 13 = string length
        ; or mov dl, 0xff ; 2 bytes leaving garbage in rest of RDX
        syscall

But this is 12 bytes, as well as hard-coding the length of the string (which was supposed to be part of the secret?).

mov dl, 0xff could make sure the length was at least 255, and actually much more in this case, if you don't mind getting reams of garbage after the string you want, until write hits an unmapped page and returns early. That would save a byte, making this 11.

(Fun fact, Linux write does not return an error when it's successfully written some bytes; instead it returns how many it did write. If you try again with buf + write_len, you would get a -EFAULT return value for passing a bad pointer to write.)
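That short-count behaviour is why callers that need every byte written usually loop. Below is a rough C sketch of such a loop; the name write_all is an assumption for illustration and is not part of any answer here.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call write(2) until all len bytes are out.
 * A short count is not an error; if something is wrong with the rest of
 * the buffer, the following call is the one that fails.
 * Returns 0 on success, -1 on error (errno is left as set by write). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0)
            return -1;          /* e.g. EFAULT for a bad pointer */
        buf += n;               /* skip what was already written */
        len -= (size_t)n;
    }
    return 0;
}
```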

Basic input with x64 assembly code

In your first code section you have to set rax to 0, the system call number for SYS_READ (as the other answer mentions only briefly).

So check a Linux x64 system call table for the appropriate parameters and try

    _start:
        mov rax, 0 ; set SYS_READ as SYS_CALL value
        sub rsp, 8 ; allocate 8-byte space on the stack as read buffer
        mov rdi, 0 ; set rdi to 0 to indicate a STDIN file descriptor
        lea rsi, [rsp] ; set const char *buf to the 8-byte space on stack
        mov rdx, 1 ; set size_t count to 1 for one char
        syscall

linux syscall uname for x86

I wrote code for Linux x86. Take a look here (it may be useful):

https://github.com/OlegInfoSecurity/uname_x86

The error occurred when I printed the info. I changed the code that outputs the info, and the program now works.

How to read input from STDIN in x86_64 assembly?

First of all: there are no variables in assembly. There are just labels for some kind of data. The data is, by design, untyped - at least in real assemblers, not HLA (e.g. MASM).

Reading from the standard input is achieved by using the system call read. I assume you've already read the post you mentioned and you know how to call system calls in x64 Linux. Assuming that you're using NASM (or something that resembles its syntax), and that you want to store the input from stdin at the address buffer, where you have reserved BUFSIZE bytes of memory, executing the system call would look like this:

        xor eax, eax      ; rax <- 0 (syscall number for 'read')
        xor edi, edi      ; edi <- 0 (stdin file descriptor)
        mov rsi, buffer   ; rsi <- address of the buffer (or: lea rsi, [rel buffer])
        mov edx, BUFSIZE  ; rdx <- size of the buffer
        syscall           ; execute read(0, buffer, BUFSIZE)

Upon returning, rax will contain the result of the syscall. If you want to know more about how it works, please consult man 2 read. Note that the syscall for read on mac is 0x2000003 instead of 0, so that first line would instead be mov rax, 0x2000003.

Parsing an integer in assembly language is not that simple, though. Since read only gives you plain binary data that appears on the standard input, you need to convert the integer value yourself. Keep in mind that what you type on the keyboard is sent to the application as ASCII codes (or any other encoding you might be using - I'm assuming ASCII here). Therefore, you need to convert the data from an ASCII-encoded decimal to binary.

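A function in C for converting such a string of ASCII digits to a normal unsigned int could look something like this:

    /* Convert len ASCII decimal digits at str to an unsigned int,
       working from the last digit to the first. No validation is done. */
    unsigned int parse_ascii_decimal(const char *str, unsigned int len)
    {
        unsigned int ret = 0, mul = 1;
        int i = (int)len - 1;
        while (i >= 0)
        {
            ret += (str[i] & 0xf) * mul;   /* low 4 bits of '0'..'9' are 0..9 */
            mul *= 10;
            --i;
        }
        return ret;
    }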

Converting this to assembly (and extending to support signed numbers) is left as an exercise for the reader. :) (Or see NASM Assembly convert input to integer? - a simpler algorithm only has 1 multiply per iteration, with total = total*10 + digit. And you can check for the first non-digit character as you iterate instead of doing strlen separately, if the length isn't already known.)
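The simpler total = total*10 + digit variant mentioned in the parenthetical might look like this in C. It is only a sketch: the name parse_decimal_prefix and the optional consumed out-parameter are assumptions for illustration, and overflow simply wraps around as unsigned arithmetic does.

```c
#include <stddef.h>

/* Hypothetical helper: parse the leading decimal digits of str (at most
 * len bytes) with one multiply per digit, stopping at the first non-digit.
 * If consumed is not NULL it receives the number of digits parsed.
 * No overflow check: the result wraps modulo UINT_MAX + 1. */
unsigned int parse_decimal_prefix(const char *str, size_t len, size_t *consumed)
{
    unsigned int total = 0;
    size_t i = 0;

    while (i < len && str[i] >= '0' && str[i] <= '9') {
        total = total * 10 + (unsigned int)(str[i] - '0');
        i++;
    }
    if (consumed)
        *consumed = i;
    return total;
}
```

Because the loop stops at the first non-digit, the trailing '\n' that read leaves in the buffer needs no special handling.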


Last but not least - the write syscall requires you to always pass a pointer to a buffer with the data that's supposed to be written to a given file descriptor. Therefore, if you want to output a newline, there is no other way but to create a buffer containing the newline sequence.
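In C terms the same constraint looks like this: even a single newline has to live in memory so that its address can be passed. A minimal sketch, with write_newline as a made-up helper name:

```c
#include <unistd.h>

/* Hypothetical helper: write(2) takes a pointer, so the newline must be
 * stored somewhere first. Here it is a one-byte static object.
 * Returns 0 on success, -1 if the byte could not be written. */
int write_newline(int fd)
{
    static const char nl = '\n';

    return write(fd, &nl, 1) == 1 ? 0 : -1;
}
```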

I want my Assembly Code to takes user input and outputs it along with other text but the output isn't correct

The length of Bob!!!!!!!!!!!!!!!!!! is the length of Welcome to the club, .
This is no coincidence.
Following the write(2) system call, rax contains the number of bytes successfully written.
(This might be less than the requested number of bytes, as the manual page describes.)

As David C. Rankin commented, you need to pay attention to the return value of read(2).
On success, read(2) returns the number of bytes read in rax.
However, you are overwriting this value with the return value of the intervening write(2) system call.
Save the number of bytes read somewhere (e.g. push/pop) and restore it before you use it, and you're good.

PS:
You could save one write(2) system call by rearranging the buffer to follow directly after greet_1.
Then you could write(2) rax + greet1_len bytes at once.
But one problem at a time.
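The same fix expressed in C, as a rough sketch: the byte count from read is kept in its own variable, so the return value of the intervening write cannot clobber it. The function name greet_user and the 64-byte name buffer are assumptions for illustration.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: read a name from fd `in`, then write the greeting
 * followed by exactly the bytes that were read to fd `out`.
 * Returns the number of name bytes read, or -1 on eof or error. */
ssize_t greet_user(int in, int out)
{
    static const char greeting[] = "Welcome to the club, ";
    char name[64];

    ssize_t n = read(in, name, sizeof name);    /* keep this count */
    if (n <= 0)
        return -1;

    /* write returns its own count; n is a separate variable and survives */
    if (write(out, greeting, sizeof greeting - 1) < 0)
        return -1;
    if (write(out, name, (size_t)n) < 0)
        return -1;
    return n;
}
```

In the assembly version the separate variable corresponds to pushing rax after the read and popping it into rdx before the second write.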


