Prompting for User Input in Assembly Ci20 Seg Fault

Prompting for user input in assembly ci20 seg fault

edit: for some reason I completely ignored this code was tested on ci20 machine.

So is this linux? Then you can't use MARS syscalls, you have to find linux syscalls instead. It is then probably segfaulting on the very first syscall instruction, as the arguments are invalid for Linux.

To display "prompt" you use syscall with arguments set as v0 = 4, a0 = prompt ... to display "message" you set arguments for syscall as v0 = 1, a0 = message.

If this is in MARS, then v0=1 is "print integer", so a0 should be integer, not address of "message" string. .. you probably want to call syscall twice, with v0=4 and v0=1 (argument a0 being "message" and users integer for particular call).

Anyway, none of this should segfault. The segfault happens probably at the end, where your code ends with addiu $sp, $sp, +4, not returning to the ra, or calling syscall "exit" function (from the saving of ra at the start of your code it looks like you want rather to return than exit, but it's up to you). So the execution continues over some random instructions (uninitialized memory content).

Anyway 2, you should figure out how to load this code in debugger and step over it instruction by instruction, then you will be capable to say where exactly it segfaults, and what was the content of registers before segfaulting instruction. If your code segfaults and you don't even know where, it shows lack of effort on your side.

(disclaimer: I never did MIPS assembly, so I'm mostly guessing how it works and may have overlooked something)

edit about syscall, maybe this hint will help too?

syscall isn't some magic instruction doing all that nifty stuff on the CPU. It just jumps to some handler routine.

That handler code is set up by the OS. Most of the MIPS assembly listings on SO are targetted at MARS or SPIM, which have completely different handler than Linux.

So you should study linux ABI for MIPS, and how syscall is used there. And then find linux system calls table, you will probably find ton of x86 docs, so you have to convert that into v0/a0/... ABI.

You can still follow MARS examples, but any OS interaction has to be adjusted, and don't expect to find alternative for everything. For example outputting the number is not available in linux. You have to convert the number value into ASCII string by yourself (for single digit numbers adding '0' is enough, for numbers above 9 you have to calculate digit for each power of 10 and convert it into ASCII character and store it into some buffer), and output then the string with sys_write/etc. (or link with some libc and call sprintf-like function from C library).

Program received signal SIGSEGV, Segmentation fault. in assembly when I call printf

I think you are missing the caller clean up that you need to do with cdecl calling convention. Try adding add $4, %esp after call printf.

segmentation fault(core dumped) error while using inline assembly

The key to understanding inline asm is to understand that each asm statement has two parts:

The text of the actual assembler stuff, in which the compiler will make textual substitutions, but does not understand.
This is the AssemblerTemplate in the documentation (everything up to the first : in the __asm__()).
A description of what the assembler stuff does, in terms that the compiler does understand.
This the : OutputOperands : InputOperands : Clobbers in the documentation.
This must tell the compiler how the assembler fits in with all the code which the compiler is generating around it. The code generation is busy allocating registers to hold values, deciding what order to do things in, moving things out of loops, eliminating unused fragments of code, discarding values it no longer needs, and so on.
The actual assembler is a black box which takes the inputs described here, produces the outputs described and as a side effect may 'clobber' some registers and/or memory. This must be a complete description of what the assembler does... otherwise the compiler-generated asm around your template will clash with it and rely on false assumptions.
Armed with this information the compiler can decide what registers the assembler can use, and you should let it do that.

So, your fragment:

 asm volatile(
            "movq %0 , %%rax\n"
            "rol %%rax\n"
            "rol %%rax\n"
            :"=r"(X)
            :"r"(X)
         );

has a few "issues":

you may have chosen %rax for the result on the basis that the asm() is like a function, and may be expected to return a result in %rax -- but that isn't so.
you went ahead and used %rax, which the compiler may (well) already have allocated to something else... so you are, effectively, 'clobbering' %rax but you have failed to tell the compiler about it !
you specified =r(X) (OutputOperand) which tells the compiler to expect an output in some register and that output will be the new value of the variable X. The %0 in the AssemblerTemplate will be replaced by the register selected for the output. Sadly, your assembly treats %0 as the input :-( And the output is, in fact, in %rax -- which, as above, the compiler is unaware of.
you also specified r(X) (InputOperand) which tells the compiler to arrange for the current value of the variable X to be placed in some register for the assembler to use. This would be %1 in the AssemblerTemplate. Sadly, your assembly does not use this input.
Even though the output and input operands both refer to X, the compiler may not make %0 the same register as %1. (This allows it to use the asm block as a non-destructive operation that leaves the original value of an input unmodified. If this isn't how your template works, don't write it that way.
generally you don't need the volatile when all the inputs and outputs are properly described by constraints. One of the fine things the compiler will do is discard an asm() if (all of) the output(s) are not used... volatile tells the compiler not to do that (and tells it a number of other things... see the manual).

Apart from that, things were wonderful. The following is safe, and avoids a mov instruction:

 asm("rol %0\n"
     "rol %0\n"   : "+r"(X));

where "+r"(X) says that there is one combined input and output register required, taking the old value of X and returning a new one.

Now, if you don't want to replace X, then assuming Y is to be the result, you could:

 asm("mov %1, %0\n"
     "rol %0\n"
     "rol %0\n"   : "=r"(Y) : "r"(X));

But it's better to leave it up to the compiler to decide whether it needs to mov or whether it can just let an input be destroyed.

There are a couple of rules about InputOperands which are worth mentioning:

The assembler must not overwrite any of the InputOperands -- the compiler is tracking which values it has in which registers, and is expecting InputOperands to be preserved.
The compiler expects all InputOperands to be read before any OutputOperand is written. This is important when the compiler knows that a given InputOperand is not used again after the asm(), and it can therefore allocate the InputOperand's register to an OutputOperand. There is a thing called earlyclobber (=&r(foo)) to deal with this little wrinkle.

In the above, if you don't in fact use X again the compiler could allocate %0 and %1 to the same register! But the (redundant) mov will still be assembled -- remembering that the compiler really doesn't understand the AssemblerTemplate. So, you are in general better off shuffling values around in the C, not the asm(). See https://gcc.gnu.org/wiki/DontUseInlineAsm and Best practices for circular shift (rotate) operations in C++

So here are four variations on a theme, and the code generated (gcc -O2):

// (1) uses both X and Y in the printf() -- does mov %1, %0 in asm()
void Never_Inline footle(void)               Dump of assembler code for function footle:
{                                              mov    $0x492782,%edi   # address of format string
  unsigned long  X, Y ;                        xor    %eax,%eax
                                               mov    $0x63,%esi       # X = 99
  X = 99 ;                                     rol    %rsi            # 1st asm
  __asm__("\t rol  %0\n"                       rol    %rsi
          "\t rol  %0\n" : "+r"(X)             mov    %rsi,%rdx       # 2nd asm, compiler using it as a copy-and-rotate
      ) ;                                      rol    %rdx
                                               rol    %rdx
  __asm__("\t mov  %1, %0\n"                   jmpq   0x4010a0 <printf@plt>  # tailcall printf
          "\t rol  %0\n"
          "\t rol  %0\n" : "=r"(Y) : "r"(X)
      ) ;

  printf("%lx %lx\n", X, Y) ;
}

// (2) uses both X and Y in the printf() -- does Y = X in 'C'
void Never_Inline footle(void)               Dump of assembler code for function footle:
{                                              mov    $0x492782,%edi
  unsigned long  X, Y ;                        xor    %eax,%eax
                                               mov    $0x63,%esi
  X = 99 ;                                     rol    %rsi       # 1st asm
  __asm__("\t rol  %0\n"                       rol    %rsi
          "\t rol  %0\n" : "+r"(X)             mov    %rsi,%rdx  # compiler-generated mov
      ) ;                                      rol    %rdx       # 2nd asm
                                               rol    %rdx
  Y = X ;                                      jmpq   0x4010a0 <printf@plt>
  __asm__("\t rol  %0\n"
          "\t rol  %0\n" : "+r"(Y)
      ) ;

  printf("%lx %lx\n", X, Y) ;
}

// (3) uses only Y in the printf() -- does mov %1, %0 in asm()
void Never_Inline footle(void)               Dump of assembler code for function footle:
{                                              mov    $0x492782,%edi
  unsigned long  X, Y ;                        xor    %eax,%eax
                                               mov    $0x63,%esi
  X = 99 ;                                     rol    %rsi
  __asm__("\t rol  %0\n"                       rol    %rsi
          "\t rol  %0\n" : "+r"(X)             mov    %rsi,%rsi   # redundant instruction because of mov in the asm template
      ) ;                                      rol    %rsi
                                               rol    %rsi
  __asm__("\t mov  %1, %0\n"                   jmpq   0x4010a0 <printf@plt>
          "\t rol  %0\n"
          "\t rol  %0\n" : "=r"(Y) : "r"(X)
      ) ;

  printf("%lx\n", Y) ;
}

// (4) uses only Y in the printf() -- does Y = X in 'C'
void Never_Inline footle(void)               Dump of assembler code for function footle:
{                                              mov    $0x492782,%edi
  unsigned long  X, Y ;                        xor    %eax,%eax
                                               mov    $0x63,%esi
  X = 99 ;                                     rol    %rsi
  __asm__("\t rol  %0\n"                       rol    %rsi
          "\t rol  %0\n" : "+r"(X)             rol    %rsi    # no wasted mov, compiler picked %0=%1=%rsi
      ) ;                                      rol    %rsi
                                               jmpq   0x4010a0 <printf@plt>
  Y = X ;
  __asm__("\t rol  %0\n"
          "\t rol  %0\n" : "+r"(Y)
      ) ;

  printf("%lx\n", Y) ;
}

which, hopefully, demonstrates the compiler busily allocating values to registers, tracking which values it needs to hold on to, minimizing register/register moves, and generally being clever.

So the trick is to work with the compiler, understanding that the :OutputOperands:InputOperands:Clobbers is where you are describing what the assembler is doing.

Why do I get a segmentation fault when writing to a char *s initialized with a string literal, but not char s[] ?

See the C FAQ, Question 1.32

Q: What is the difference between these initializations?

char a[] = "string literal";
char *p = "string literal";
My program crashes if I try to assign a new value to p[i].

A: A string literal (the formal term
for a double-quoted string in C
source) can be used in two slightly
different ways:

As the initializer for an array of char, as in the declaration of char a[] , it specifies the initial values
of the characters in that array (and,
if necessary, its size).

Anywhere else, it turns into an unnamed, static array of characters,
and this unnamed array may be stored
in read-only memory, and which
therefore cannot necessarily be
modified. In an expression context,
the array is converted at once to a
pointer, as usual (see section 6), so
the second declaration initializes p
to point to the unnamed array's first
element.

Some compilers have a switch
controlling whether string literals
are writable or not (for compiling old
code), and some may have options to
cause string literals to be formally
treated as arrays of const char (for
better error catching).

Help with Assembly. Segmentation fault when compiling samples on Mac OS X

You seem to be using 32-bit assembly code here. One big difference among 32-bit Mac OS X and 32-bit Windows or Linux is that Mac requires the stack to be 16-byte aligned whenever you CALL a function. In other words, at the point in your code where you have a CALL instruction, it is required that ESP = #######0h.

The following may be interesting reads:

http://blogs.embarcadero.com/eboling/2009/05/20/5607

www.agner.org/optimize/calling_conventions.pdf

Assembly segmentation fault during retq

I think one of your return paths doesn't pop rbp. Just leave out the

pushq   %rbp
movq    %rsp, %rbp

pop     %rbp

altogether. gcc's default is -fomit-frame-pointer.

Or fix your non-return-zero path to also pop rbp.

Actually, you're screwed because your function appears to be designed to put stuff on the stack an never take it off. If you want to invent your own ABI where space below the stack pointer can be used to return arrays, that's interesting, but you'll have to keep track of how big they are so you can adjust rsp back to pointing at the return address before ret.

I recommend against loading the return address into a register and replacing a later ret with jmp *%rdx or something. That would throw off the call/return address prediction logic in modern CPUs, and cause a stall the same as a branch mispredict. (See http://agner.org/optimize/). CPUs hate mismatched call/ret. I can't find a specific page to link about that right now.

See https://stackoverflow.com/tags/x86/info for other useful resources, including ABI documentation on how functions normally take args.

You could copy the return address down to below the array you just pushed, and then run ret, to return with %rsp modified. But unless you need to call a long function from multiple call sites, it's probably better to just inline it in one or two call-sites.

If it's too big to inline at too many call sites, your best bet, instead of using call, and copying the return address down to the new location, would be to emulate call and ret. Caller does

    put args in some registers
    lea   .ret_location(%rip), %rbx
    jmp   my_weird_helper_function
.ret_location:  # in NASM/YASM, labels starting with . are local labels, and don't show up in the object file.
         # GNU assembler might only treat symbols starting with .L that way.
    ...

my_weird_helper_function:
    use args, potentially modifying the stack
    jmp *%rbx   # return

You need a really good reason to use something like this. And you'll have to justify / explain it with a lot of comments, because it's not what readers will be expecting. First of all, what are you going to do with this array that you pushed onto the stack? Are you going to find its length by subtracting rsp and rbp or something?

Interestingly, even though push has to modify rsp as well as do a store, it has one per clock throughput on all recent CPUs. Intel CPUs have a stack engine to make stack ops not have to wait for rsp to be computed in the out-of-order engine when it's only been changed by push/pop/call/ret. (mixing push/pop with mov 4(%rsp), %rax or whatever results in extra uops being inserted to sync the OOO-engine's rsp with the stack-engine's offset.) Intel/AMD CPUs can only do one store per clock anyway, but Intel SnB and later can pop twice per clock.

So push/pop is actually not a terrible way to implement a stack data structure, esp. on Intel.

Also, your code is structured weirdly. main() is split across r8_digits_to_stack. That's fine, but you're not taking advantage of falling through from one block into the other block ever, so it just costs you an extra jmp in main for no benefit and a huge readability downside.

Let's pretend your loop is part of main, since I already talked about how it's super-weird to have a function return with %rsp modified.

Your loop could be simpler, too. Structure things with a jcc back to the top, when possible.

There's a small benefit to avoiding the upper 16 registers: 32bit insns with the classic registers doesn't need a REX prefix byte. So lets pretend we just have our starting value in %rax.

digits_to_stack:
# put each bit of %rax into its own 8 byte element on the stack for maximum space-inefficiency

    movq   %rax, %rdx  # save a copy

    xor    %ecx, %ecx  # setcc is only available for byte operands, so zero %rcx

    # need a test at the top after transforming while() into do{}while
    test   %rax, %rax  # fewer insn bytes to test for zero this way
    jz  .Lend

    # Another option can be to jmp to the test at the end of the loop, to begin the first iteration there.

.align 16
.Lpush_loop:
    shr   $1, %rax   # shift the low bit into CF, set ZF based on the result
    setc  %cl       # set %cl to 0 or 1, based on the carry flag
    # movzbl %cl, %ecx  # zero-extend
    pushq %rcx
      #.Lfirst_iter_entry
      # test %rax, %rax   # not needed, flags still set from shr
    jnz  .Lpush_loop
.Lend:

This version still kinda sucks, because on Intel P6 / SnB CPU families, using a wider register after writing a smaller part leads to a slowdown. (stall on pre-SnB, or extra uop on SnB and later). Others, including AMD and Silvermont, don't track partial-registers separately, so writing to %cl has a dependency on the previous value of %rcx. (writing to a 32bit reg zeros the upper 32, which avoids the partial-reg dependency problem.) A movzx to zero-extend from byte to long would do what Sandybridge does implicitly, and give a speedup on older CPUs.

This won't quite run in a single cycle per iteration on Intel, but might on AMD. mov/and $1 is not bad, but and affects the flags, making it harder to loop just based on shr setting the flags.

Note that your old version sarq %rax shifts in sign bits, not necessarily zeros, so with negative input your old version would be an inf loop (and segfault when you ran out of stack space (push would try to write to an unmapped page)).

FizzBuzz in assembly - segmentation fault

This is causing the segmentation fault:

...
    je  print_fizz      ;print fizz if they equal
...
    je  print_buzz      ;print buzz if they equal
    jmp print_ax        ;print contents of ax if not
...

print_ax:
    ret

print_fizz:
    ret

print_buzz:
    ret
...

Since you jump to the functions, the ret gets no return address and will return anywhere. Change it to a call/ret-pair:

...
;   je  print_fizz      ;print fizz if they equal
    jne .1              ;skip if not equal
    call print_fizz
    .1:
...

;   je  print_buzz      ;print buzz if they equal
    jne .2              ;skip if not equal
    call print_buzz
    .2:

;   jmp print_ax        ;print contents of ax if not
    call print_ax
...

This will cause an infinite loop:

mov ax,0
mov [counter],ax

main_loop:

    cmp ax,100          ;from 0 to 100
    je  exit
    ...
    mov ah,0            ;here will be a remainder
    div bl              ;divide
    ...
    mov ah,0            ;do I have to do it every time?
    div bl              ;divide
    ...
    inc ax              ;increment ax
    jmp main_loop       ;jump to label

AX changes its values and is unfit to hold the loop-counter. I suggest:

...
main_loop:

;   cmp ax,100          ;from 0 to 100
    cmp byte [counter], 100
...
;   inc ax              ;increment ax
    inc byte [counter]
    jmp main_loop       ;jump to label
...

Prompting for User Input in Assembly Ci20 Seg Fault