How to avoid stdin input that does not fit in buffer be sent to the shell in Linux 64-bit Intel (x86-64) assembly
It is not a buffer overflow as others have stated. I wrote a tutorial on reading from the terminal in Linux which also shows how to deal with this issue. It uses 32-bit Int 0x80, but you can easily change it to fit your needs.
http://www.dreamincode.net/forums/topic/286248-nasm-linux-terminal-inputoutput-wint-80h/
Read STDIN using syscall READ in Linux: unconsumed input is sent to bash
Yep, it's normal behavior. Anything you don't consume is available for the next process. You know how when you're doing something slow you can type ahead, and when the slow thing finishes the shell will run what you typed? Same thing here.
There's no one-size-fits-all solution. It's really about user expectation. How much input did they expect your program to consume? That's how much you should read.
Does your program act like a single-line prompt like read? Then you should read a full line of input up through the next \n character. The easiest way to do that without over-reading is to read 1 character at a time. If you do bulk reads you might consume part of the next line by mistake.
Does your program act like a filter like cat or sed or grep? Then you should read until you reach EOF.
Does your program not read from stdin at all like echo or gcc? Then you should leave stdin alone and consume nothing, leaving the input for the next program.
Consuming exactly 4 bytes is unusual but could be reasonable behavior for, say, an interactive program that prompts for a 4-digit PIN and doesn't require the user to press Enter.
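For the single-line prompt case, a minimal C sketch of the one-character-at-a-time approach might look like the following. The helper name read_line and its signature are hypothetical, not something from the answer, and it goes through the libc read(2) wrapper rather than a raw syscall. It assumes that bytes which do not fit in the caller's buffer should simply be discarded.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one line from fd, one byte per read(2) call,
 * so nothing past the '\n' is consumed. At most cap bytes are stored in
 * buf (no terminator is added), but input keeps being consumed until
 * '\n' or EOF. Returns the number of bytes stored, or -1 on a read error. */
ssize_t read_line(int fd, char *buf, size_t cap)
{
    assert(buf != NULL);
    size_t stored = 0;
    for (;;) {
        char c;
        ssize_t n = read(fd, &c, 1);
        if (n < 0)
            return -1;          /* read error */
        if (n == 0)
            break;              /* EOF before a newline arrived */
        if (stored < cap)
            buf[stored++] = c;  /* keep what fits, discard the rest */
        if (c == '\n')
            break;              /* end of line: leave the next line alone */
    }
    return (ssize_t)stored;
}
```

Calling read_line(0, buf, sizeof buf) on stdin leaves everything after the first newline for the next reader, at the cost of one system call per byte.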
sys_read will spill characters when buffer overflows
After reading the input, you need to flush (drain) whatever is left in the input so that the excess is not passed on to the next read. It's not a buffer overflow, though.
I have asked the same question, but for x86-64 Linux, so it's not an exact duplicate:
How to avoid stdin input that does not fit in buffer be sent to the shell in Linux 64-bit Intel (x86-64) assembly
Anyway, following GunnerInc's excellent tutorial (for x86 Linux) should solve your problem:
http://www.dreamincode.net/forums/topic/286248-nasm-linux-terminal-inputoutput-wint-80h/
Clear input buffer Assembly x86 (NASM)
The simple way to clear stdin is to check if the 2nd char in choice is the '\n' (0xa). If it isn't, then characters remain in stdin unread. You already know how to read from stdin, so in that case, just read stdin until the '\n' is read [1], e.g.
mov rax, 0 ;Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, 2
syscall
cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
jne end
cmp byte [choice + 1], 0xa ; is 2nd char '\n' (if yes done, jump start)
je _start
empty: ; chars remain in stdin unread
mov rax, 0 ; read 1-char from stdin into choice
mov rdi, 0
mov rsi, choice
mov rdx, 1
syscall
cmp byte [choice], 0xa ; check if char '\n'?
jne empty ; if not, repeat
jmp _start
Beyond that, you should determine your prompt lengths when you declare them, e.g.
prompt db "Type your text here. ", 0h
plen equ $-prompt
retry db "Try again (y/n)? ", 0h
rlen equ $-retry
That way you do not have to hardcode lengths in case you change your prompts, e.g.
_start:
mov rax, 1 ;Just asking the user to enter input
mov rdi, 1
mov rsi, prompt
mov rdx, plen
syscall
mov rax, 0 ;Getting input and saving it on var text
mov rdi, 0
mov rsi, text
mov rdx, 50
syscall
mov r8, rax
mov rax, 1 ;Printing the user input
mov rdi, 1
mov rsi, text
mov rdx, r8
syscall
mov rax, 1 ;Asking if user wants to try again
mov rdi, 1
mov rsi, retry
mov rdx, rlen
syscall
If you put it all together, you can do:
prompt db "Type your text here. ", 0h
plen equ $-prompt
retry db "Try again (y/n)? ", 0h
rlen equ $-retry
section .bss
text resb 50
choice resb 2
section .text
global _start
_start:
mov rax, 1 ;Just asking the user to enter input
mov rdi, 1
mov rsi, prompt
mov rdx, plen
syscall
mov rax, 0 ;Getting input and saving it on var text
mov rdi, 0
mov rsi, text
mov rdx, 50
syscall
mov r8, rax
mov rax, 1 ;Printing the user input
mov rdi, 1
mov rsi, text
mov rdx, r8
syscall
mov rax, 1 ;Asking if user wants to try again
mov rdi, 1
mov rsi, retry
mov rdx, rlen
syscall
mov rax, 0 ;Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, 2
syscall
cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
jne end
cmp byte [choice + 1], 0xa ; is 2nd char '\n' (if yes done, jump start)
je _start
empty: ; chars remain in stdin unread
mov rax, 0 ; read 1-char from stdin into choice
mov rdi, 0
mov rsi, choice
mov rdx, 1
syscall
cmp byte [choice], 0xa ; check if char '\n'?
jne empty ; if not, repeat
jmp _start
end:
mov rax, 60
mov rdi, 0
syscall
Example Use/Output
$ ./bin/emptystdin
Type your text here. abc
abc
Try again (y/n)? y
Type your text here. def
def
Try again (y/n)? yes please!
Type your text here. geh
geh
Try again (y/n)? yyyyyyyyyyyyyeesssssssss!!!!
Type your text here. ijk
ijk
Try again (y/n)? n
Now even a cat stepping on the keyboard at your (y/n)? prompt won't cause problems. There are probably more elegant ways to handle this that would be more efficient than repetitive reads with syscall, but this will handle the issue.
Additional Considerations And Error-Checks
As mentioned above, the simplistic reading and checking of a character-at-a-time isn't a very efficient approach, though it is conceptually the easiest extension without making other changes. @PeterCordes makes a number of good points in the comments below related to approaches that are more efficient and more importantly about error conditions that can arise that should be protected against as well.
For starters, when you are looking for information on individual system calls, Anatomy of a system call, part 1 gives some background on how they are used. Supplement it with the Linux manual page (man 2 read for read) for details on the parameter types, the return type and the return values.
The original solution above does not address what happens if the user generates a manual end-of-file by pressing Ctrl+d or if a read error actually occurs. It simply addressed the user-input and emptying-stdin question asked. With any user input, before you use the value, you must validate that the input succeeded by checking the return (not just for the yes/no input, but for all inputs). For purposes here, you can consider zero input (manual end-of-file) or a negative return (read error) as a failed input.
To check whether you have at least one valid character of input, you can simply check the return (read returns the number of characters read, sys_read placing that value in rax after the syscall). A zero or negative value indicates no input was received. A check could be:
cmp rax, 0 ; check for 0 bytes read or error
jle error
You can write a short diagnostic to the user and then handle the error as wanted. This example simply exits after outputting a diagnostic, e.g.
readerr db 0xa, "eof or read error", 0xa, 0x0
rderrsz equ $-readerr
...
; your call to read here
cmp rax, 0 ; check for 0 bytes read or error
jle error
...
error:
mov rax, 1 ; output the readerr string and jmp to end
mov rdi, 1
mov rsi, readerr
mov rdx, rderrsz
syscall
jmp end
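The same return-value check can be sketched in C. This is only an illustration: read_checked is a hypothetical wrapper name, and the retry on EINTR is an addition that the assembly above does not perform. Note that the libc read wrapper reports errors as -1 with errno set, whereas the raw syscall returns a negative errno value in rax.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: like read(2), but retries when a signal interrupts
 * the call before any data arrives. Returns >0 for bytes read, 0 at
 * end-of-file, and -1 on any other error (errno is set by read). */
ssize_t read_checked(int fd, void *buf, size_t cap)
{
    assert(buf != NULL);
    ssize_t n;
    do {
        n = read(fd, buf, cap);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

A caller can then treat any result <= 0 as a failed input, which corresponds to the jle error branch above.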
Now moving on to a more efficient manner for emptying stdin. The biggest hindrance indicated in the original answer was the repeated system calls to sys_read to read one character at a time, reusing your 2-byte choice buffer. The obvious solution is to make choice bigger, or just use stack space to read more characters each time (you can look at the comments for a couple of approaches). Here, for example, we will increase choice to 128 bytes, which in the case of the anticipated "y\n" input will only make use of two of those bytes, but in the case of an excessively long input will read 128 bytes at a time until the '\n' is found. For setup you have:
choicesz equ 128
...
section .bss
text resb 50
choice resb 128
Now after you ask for (y/n)? your read would be:
mov rax, 0 ; Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, choicesz
syscall
cmp rax, 0 ; check for 0 bytes read (eof) or error
jle error
cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
jne end ; not a 'y', doesn't matter what's in stdin, end
Now there are two conditions to check. First, compare the number of characters read with your buffer size choicesz: if the number of characters read is less than choicesz, no characters remain unread in stdin. Second, if the return equals the buffer size, you may or may not have characters remaining in stdin. You need to check the last character in the buffer to see if it is the '\n' to determine whether you have read all the input. If the last character is other than the '\n', characters remain unread (unless the user just happened to generate a manual end-of-file at the 128th character). You can check as:
empty:
cmp eax, choicesz ; compare chars read and buffer size
jb _start ; buffer not full - nothing remains in stdin
cmp byte [choice + choicesz - 1], 0xa ; if full - check if last byte \n, done
je _start
mov rax, 0 ; fill choice again from stdin and repeat checks
mov rdi, 0
mov rsi, choice
mov rdx, choicesz
syscall
cmp rax, 0 ; check for 0 bytes read or error
jle error
jmp empty
(note: as noted above, there is a further case not covered here: the user enters valid input, but then generates a manual end-of-file instead of just pressing Enter after the 128th character (or a multiple of 128). There you can't just look for a '\n' because it doesn't exist, and if there are no more characters and you call sys_read again, it will block waiting on input. Conceivably you will need to use a non-blocking read and putback of a single character to break that ambiguity -- that is left to you)
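As a rough C rendering of the empty loop, the sketch below keeps refilling the buffer while it comes back full and does not end in '\n'. The names drain_line and CHOICESZ are hypothetical stand-ins for the empty label and choicesz, and like the assembly it treats both end-of-file and a read error as failure. It also shares the limitation described in the note: it can block if the input stops exactly on a buffer boundary without a newline.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

enum { CHOICESZ = 128 };   /* mirrors choicesz in the assembly */

/* Hypothetical helper: `got` is the count returned by the read that just
 * filled buf (which must hold CHOICESZ bytes). While the buffer came back
 * full and its last byte is not '\n', more of the line is still pending,
 * so read again. Returns 0 once the line is consumed, -1 on EOF or error. */
int drain_line(int fd, char *buf, ssize_t got)
{
    assert(buf != NULL);
    while (got == CHOICESZ && buf[CHOICESZ - 1] != '\n') {
        got = read(fd, buf, CHOICESZ);
        if (got <= 0)
            return -1;     /* 0 bytes (EOF) or a read error */
    }
    return 0;
}
```

On a terminal in canonical mode each read returns at most one line, so the loop stops at the newline. On a pipe or file, a bulk read can run past the newline and consume part of the following input.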
A complete example with the improvements would be:
prompt db "Type your text here. ", 0x0
plen equ $-prompt
retry db "Try again (y/n)? ", 0x0
rlen equ $-retry
textsz equ 50
choicesz equ 128
readerr db 0xa, "eof or read error", 0xa, 0x0
rderrsz equ $-readerr
section .bss
text resb 50
choice resb 128
section .text
global _start
_start:
mov rax, 1 ; Just asking the user to enter input
mov rdi, 1
mov rsi, prompt
mov rdx, plen
syscall
mov rax, 0 ; Getting input and saving it on var text
mov rdi, 0
mov rsi, text
mov rdx, textsz
syscall
cmp rax, 0 ; check for 0 bytes read or error
jle error
mov r8, rax
mov rax, 1 ; Printing the user input
mov rdi, 1
mov rsi, text
mov rdx, r8
syscall
mov rax, 1 ; Asking if user wants to try again
mov rdi, 1
mov rsi, retry
mov rdx, rlen
syscall
mov rax, 0 ; Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, choicesz
syscall
cmp rax, 0 ; check for 0 bytes read (eof) or error
jle error
cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
jne end ; not a 'y', doesn't matter what's in stdin, end
empty:
cmp eax, choicesz ; compare chars read and buffer size
jb _start ; buffer not full - nothing remains in stdin
cmp byte [choice + choicesz - 1], 0xa ; if full - check if last byte \n, done
je _start
mov rax, 0 ; fill choice again from stdin and repeat checks
mov rdi, 0
mov rsi, choice
mov rdx, choicesz
syscall
cmp rax, 0 ; check for 0 bytes read or error
jle error
jmp empty
end:
mov rax, 60
mov rdi, 0
syscall
error:
mov rax, 1 ; output the readerr string and jmp to end
mov rdi, 1
mov rsi, readerr
mov rdx, rderrsz
syscall
jmp end
There are surely more efficient ways to optimize this, but for purposes of discussion of "How do I empty stdin?", this second approach, which uses a larger buffer to alleviate the repetitive calls to sys_read that read one character at a time, is a good step forward. "How do I completely optimize the check?" is a whole separate question.
Let me know if you have further questions.
Footnotes:
1. In this circumstance where the user is typing input, the user generates a '\n' by pressing Enter, allowing you to check for the '\n' as the final character in emptying stdin. The user can also generate a manual end-of-file by pressing Ctrl+d, so the '\n' isn't guaranteed. There are still many other ways stdin can be filled, such as redirecting a file as input, where there should be an ending '\n' to be POSIX compliant, but there too that isn't a guarantee.
Compact shellcode to print a 0-terminated string pointed-to by a register, given puts or printf at known absolute addresses?
Since I already spilled the beans and "spoiled" the answer to the online challenge in comments, I might as well write it up. 2 key tricks:
1. Create 0x7ffff7e3c5a0 (&puts) in a register with lea reg, [reg + disp32], using the known value of RDI which is within the +-2^31 range of a disp32. (Or use RBP as a starting point, but not RSP: that would need a SIB byte in the addressing mode).
This is a generalization of the code-golf trick of lea edi, [rax+1] to create small constants from other small constants (especially 0) in 3 bytes, with code that runs less slowly than push imm8 / pop reg.
The disp32 is large enough to not have any zero bytes; you have a couple registers to choose from in case one had been too close.
2. Copy a 64-bit register in 2 bytes with push reg / pop reg, instead of 3-byte mov rdi, rdx (REX + opcode + modrm). No savings if either push needs a REX prefix (for R8..R15), and actually costs bytes if both are "non-legacy" registers.
See other answers on Tips for golfing in x86/x64 machine code on codegolf.SE for more.
bits 64
lea rsi, [rdi - 0x166f30]
;; add rbp, imm32 ; alternative, but that would mess up a call-preserved register so we might crash on return.
push rdx
pop rdi ; copy RDX to first arg, x86-64 SysV calling convention
jmp rsi ; tailcall puts
This is exactly 11 bytes, and I don't see a way for it to be smaller. add r64, imm32 is also 7 bytes, same as LEA. (Or 6 bytes if the register is RAX, but even the xchg rax, rdi short form would cost 2 bytes to get it there, and the RAX value is still the fgets return value, which is the small mmap buffer address.)
The puts function pointer doesn't fit in 32 bits, so we need a REX prefix on any instruction that puts it into a register. Otherwise we could just mov reg, imm32 (5 bytes) with the absolute address, not deriving it from another register.
$ nasm -fbin -o exploit.bin -l /dev/stdout exploit.asm
1 bits 64
2 00000000 488DB7D090E9FF lea rsi, [rdi - 0x166f30]
3 ;; add rbp, imm32 ; we can avoid messing up any call-preserved registers
4 00000007 52 push rdx
5 00000008 5F pop rdi ; copy to first arg
6 00000009 FFE6 jmp rsi ; tailcall
$ ll exploit.bin
-rw-r--r-- 1 peter peter 11 Apr 24 04:09 exploit.bin
$ ./a.out < exploit.bin # would work if the addresses in my build matched yours
My build of your incomplete .c uses different addresses on my machine, but it does reach this code (at address 0x10000, mmap_min_addr, which mmap picks after the amusing choice of 0x1337 as a hint address, which isn't even page aligned but doesn't result in EINVAL on current Linux.)
Since we only tailcall puts with correct stack alignment and don't modify any call-preserved registers, this should successfully return to main.
Note that 0 bytes (ASCII NUL, not NULL) would actually work in shellcode for this test program, if not for the requirement that forbids it.
The input is read using fgets (apparently to simulate a gets() overflow). fgets actually can read a 0 aka '\0'; the only critical character is 0xa aka '\n' newline. See Is it possible to read null characters correctly using fgets or gets_s?
Often buffer overflows exploit a strcpy or something else that stops on a 0 byte, but fgets only stops on EOF or newline. (Or the buffer size, a feature gets is missing, hence its deprecation and removal from even the ISO C standard library! It's literally impossible to use safely unless you control the input data). So yes, it's totally normal to forbid zero bytes.
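To illustrate the point about fgets and 0 bytes, here is a small C sketch. The helper fgets_from_bytes is hypothetical and exists only for the demonstration. It assumes a POSIX system that provides fmemopen, and uses it to present a byte array containing an embedded 0 as a stream.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical demo helper: present `len` raw bytes as a read-only stream
 * (fmemopen, POSIX) and run a single fgets over it into out.
 * Returns 0 if fgets stored something, -1 otherwise. */
int fgets_from_bytes(const void *data, size_t len, char *out, int cap)
{
    assert(data != NULL && out != NULL && cap > 0);
    FILE *fp = fmemopen((void *)data, len, "r"); /* "r": buffer is only read */
    if (fp == NULL)
        return -1;
    int ok = (fgets(out, cap, fp) != NULL) ? 0 : -1;
    fclose(fp);
    return ok;
}
```

Feeding it the bytes 'a', 'b', 0, 'c', 'd', '\n' stores all six bytes, followed by the terminator that fgets appends. strlen on the result only sees the first two, which is why the embedded 0 is easy to miss.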
BTW, your int 0x80 attempt is not viable: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - you can't use the 32-bit ABI to pass 64-bit pointers to write, and the string you want to output is not in the low 32 bits of virtual address space.
Of course, with the 64-bit syscall ABI, you're fine if you can hardcode the length.
push rdx
pop rsi
shr eax, 16 ; fun 3-byte way to turn 0x10000 into 1 (__NR_write, 64-bit), instead of just push 1 / pop
mov edi, eax ; STDOUT_FD = __NR_write
lea edx, [rax + 13 - 1] ; 3 bytes. RDX = 13 = string length
; or mov dl, 0xff ; 2 bytes leaving garbage in rest of RDX
syscall
But this is 12 bytes, as well as hard-coding the length of the string (which was supposed to be part of the secret?).
mov dl, 0xff could make sure the length was at least 255, and actually much more in this case, if you don't mind getting reams of garbage after the string you want, until write hits an unmapped page and returns early. That would save a byte, making this 11.
(Fun fact, Linux write does not return an error when it's successfully written some bytes; instead it returns how many it did write. If you try again with buf + write_len, you would get a -EFAULT return value for passing a bad pointer to write.)
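The partial-write behaviour described above is the reason ordinary (non-golfed) code usually wraps write in a retry loop. A minimal C sketch follows. The name write_all is hypothetical, and the EINTR retry is an assumption about what a caller would want rather than anything stated in the answer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write(2) until all len bytes are out.
 * A short write just advances the pointer; the failure (e.g. EFAULT for
 * a bad pointer) only shows up on the following call.
 * Returns len on success, -1 on error (errno set by write). */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        p += n;                 /* resume at buf + bytes already written */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```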
Basic input with x64 assembly code
In your first code section you have to set the system call number to 0 for SYS_READ (as briefly mentioned in the other answer).
So check a Linux x64 system call list for the appropriate parameters and try
_start:
mov rax, 0 ; set SYS_READ as SYS_CALL value
sub rsp, 8 ; allocate 8-byte space on the stack as read buffer
mov rdi, 0 ; set rdi to 0 to indicate a STDIN file descriptor
lea rsi, [rsp] ; set const char *buf to the 8-byte space on stack
mov rdx, 1 ; set size_t count to 1 for one char
syscall
linux syscall uname for x86
I wrote code for Linux x86. Have a look at it here (it may be useful):
https://github.com/OlegInfoSecurity/uname_x86
The error occurred when I printed the info. I changed the output code and the program now works.
How to read input from STDIN in x86_64 assembly?
First of all : there are no variables in assembly. There are just labels for some kind of data. The data is, by design, untyped - at least in real assemblers, not HLA (e.g. MASM).
Reading from the standard input is achieved by using the system call read. I assume you've already read the post you mentioned and you know how to call system calls in x64 Linux. Assuming that you're using NASM (or something that resembles its syntax), and that you want to store the input from stdin at the address buffer, where you have reserved BUFSIZE bytes of memory, executing the system call would look like this:
xor eax, eax ; rax <- 0 (syscall number for 'read')
xor edi, edi ; edi <- 0 (stdin file descriptor)
mov rsi, buffer ; rsi <- address of the buffer (or: lea rsi, [rel buffer])
mov edx, BUFSIZE ; rdx <- size of the buffer
syscall ; execute read(0, buffer, BUFSIZE)
Upon returning, rax will contain the result of the syscall. If you want to know more about how it works, please consult man 2 read. Note that the syscall number for read on macOS is 0x2000003 instead of 0, so that first line would instead be mov rax, 0x2000003.
Parsing an integer in assembly language is not that simple, though. Since read
only gives you plain binary data that appears on the standard input, you need to convert the integer value yourself. Keep in mind that what you type on the keyboard is sent to the application as ASCII codes (or any other encoding you might be using - I'm assuming ASCII here). Therefore, you need to convert the data from an ASCII-encoded decimal to binary.
A function in C for converting such a structure to a normal unsigned int could look something like this:
unsigned int parse_ascii_decimal(const char *str, unsigned int len)
{
    unsigned int ret = 0, mul = 1;
    int i = (int)len - 1;             /* start at the least significant digit */
    while (i >= 0)
    {
        ret += (str[i] & 0xf) * mul;  /* '0'..'9' -> 0..9 */
        mul *= 10;
        --i;
    }
    return ret;
}
Converting this to assembly (and extending to support signed numbers) is left as an exercise for the reader. :) (Or see NASM Assembly convert input to integer? - a simpler algorithm only has 1 multiply per iteration, with total = total*10 + digit. And you can check for the first non-digit character as you iterate instead of doing strlen separately, if the length isn't already known.)
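The total = total*10 + digit variant mentioned above could be sketched in C as follows. The function name parse_decimal and its out-parameter are hypothetical choices for this illustration. Overflow is not detected: the unsigned arithmetic simply wraps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: parse leading ASCII decimal digits from s (at most
 * len bytes), stopping at the first non-digit such as the trailing '\n'.
 * Stores the value in *out and returns the number of digits consumed.
 * No overflow check: the unsigned result wraps on very long inputs. */
size_t parse_decimal(const char *s, size_t len, unsigned int *out)
{
    assert(s != NULL && out != NULL);
    unsigned int total = 0;
    size_t i = 0;
    while (i < len && s[i] >= '0' && s[i] <= '9') {
        total = total * 10 + (unsigned int)(s[i] - '0'); /* one multiply per digit */
        i++;
    }
    *out = total;
    return i;
}
```

Passing the buffer and byte count returned by read works directly, since the loop stops at the newline on its own.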
Last but not least - the write
syscall requires you to always pass a pointer to a buffer with the data that's supposed to be written to a given file descriptor. Therefore, if you want to output a newline, there is no other way but to create a buffer containing the newline sequence.
I want my Assembly Code to takes user input and outputs it along with other text but the output isn't correct
The length of "Bob!!!!!!!!!!!!!!!!!!" is the length of "Welcome to the club, ".
This is no coincidence.
Following the write(2) system call, rax contains the number of successfully written bytes.
(This might be less than the desired number of bytes, as the manual page describes.)
As David C. Rankin commented, you will need to mind the return value of read(2).
On success, read(2) returns the number of bytes read in rax.
However, you are overwriting this value with the return value of the intervening write(2) system call.
Save the number of successfully read bytes somewhere before that write and restore it afterwards (e.g. push/pop) and you're good.
PS:
You could save one write(2) system call by rearranging the buffer to follow after greet_1.
Then you could write(2) rax + greet1_len bytes at once.
But one problem at a time.
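The single-write idea from the PS can be sketched in C. Everything named here (greet, GREETING, NAME_CAP) is hypothetical, and the greeting text is assumed from the output quoted above. The input is read directly behind the greeting, so one write covers both and the read count is kept in its own variable instead of being overwritten.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define GREETING "Welcome to the club, "   /* assumed text of greet_1 */
enum { NAME_CAP = 64 };                    /* assumed input buffer size */

/* Hypothetical helper: read a name from in_fd directly behind the greeting
 * and emit greeting + name with a single write to out_fd.
 * Returns the number of bytes written, or -1 on EOF or error. */
ssize_t greet(int in_fd, int out_fd)
{
    assert(in_fd >= 0 && out_fd >= 0);
    char msg[sizeof GREETING - 1 + NAME_CAP];
    const size_t glen = sizeof GREETING - 1;        /* length without the NUL */
    memcpy(msg, GREETING, glen);
    ssize_t n = read(in_fd, msg + glen, NAME_CAP);  /* keep the count in n */
    if (n <= 0)
        return -1;
    return write(out_fd, msg, glen + (size_t)n);    /* greeting + name at once */
}
```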