How Does Ltrace() Display Rand()

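When ltrace has no prototype for a function, as appears to be the case for rand() here, it simply shows the contents of the first few argument-passing registers, following the x86-64 ABI calling convention.

For other functions, ltrace knows their API (i.e. their signature), so it can show the arguments more cleverly.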

See ltrace(1) and the PROTOTYPE LIBRARY DISCOVERY section.

ltrace does not show sin() in the output

This is because the argument to your sin call is a constant, so gcc evaluates the call at compile time and optimizes it out (even when compiling with -O0 and without -lm). This is the result of running disass main in gdb:

   0x0000000000400580 <+0>:     push   %rbp
   0x0000000000400581 <+1>:     mov    %rsp,%rbp
   0x0000000000400584 <+4>:     sub    $0x10,%rsp
   0x0000000000400588 <+8>:     mov    0xee(%rip),%eax        # 0x40067c
   0x000000000040058e <+14>:    mov    %eax,-0x4(%rbp)
   0x0000000000400591 <+17>:    mov    $0x400660,%edi
   0x0000000000400596 <+22>:    callq  0x400450 <puts@plt>
   0x000000000040059b <+27>:    mov    0xdf(%rip),%eax        # 0x400680
   0x00000000004005a1 <+33>:    mov    %eax,-0x4(%rbp)
   0x00000000004005a4 <+36>:    movss  -0x4(%rbp),%xmm0
   0x00000000004005a9 <+41>:    cvtps2pd %xmm0,%xmm0
   0x00000000004005ac <+44>:    mov    $0x40066e,%edi
   0x00000000004005b1 <+49>:    mov    $0x1,%eax
   0x00000000004005b6 <+54>:    callq  0x400460 <printf@plt>
   0x00000000004005bb <+59>:    mov    $0x0,%eax
   0x00000000004005c0 <+64>:    leaveq 
   0x00000000004005c1 <+65>:    retq

There is no call to sin here.

Changing your code to read:

#include<stdio.h>
#include<math.h>

int main()
{
    float x, y;
    scanf("%f", &x);
    y=sin(x);
    printf("sin(%f)=%f\n", x, y);
    return 0;
}

means you now need -lm when compiling:

$ gcc -Wall -Wextra -O0 -g 1.c -lm

and now you'll see this disassembled output:

   ...
   0x00000000004006c9 <+25>:    callq  0x4005b0 <__isoc99_scanf@plt>
   0x00000000004006ce <+30>:    movss  -0x8(%rbp),%xmm0
   0x00000000004006d3 <+35>:    unpcklps %xmm0,%xmm0
   0x00000000004006d6 <+38>:    cvtps2pd %xmm0,%xmm0
   0x00000000004006d9 <+41>:    callq  0x4005a0 <sin@plt>
   ...

and the call in ltrace:

__libc_start_main(0x4006b0, 1, 0x7fffd25ecff8, 0x400720 <unfinished ...>
__isoc99_scanf(0x4007b0, 0x7fffd25ecf08, 0x7fffd25ed008, 0x400720) = 1
sin(0x7fffd25ec920, 0x7fa1a6388a20, 1, 16)                         = 0x7fa1a643b780
printf("sin(%f)=%f\n", 3.000000, 0.141120sin(3.000000)             =0.141120
)                                                                  = 23
+++ exited (status 0) +++

No output when running ltrace

This may have to do with binaries being linked with -z now. I created a quick test program (I'm using Ubuntu 16.04):

#include <unistd.h>

int main() {
  write(0, "hello\n", 6);
  return 0;
}

If I compile it with gcc -O2 test.c -o test then ltrace works:

$ ltrace ./test 
__libc_start_main(0x400430, 1, 0x7ffc12326528, 0x400550 <unfinished ...>
write(0, "hello\n", 6hello
)                                                              = 6
+++ exited (status 0) +++

However, when I compile with gcc -O2 test.c -Wl,-z,relro -Wl,-z,now -o test2, it doesn't:

$ ltrace ./test2 
hello
+++ exited (status 0) +++

You can check how a binary was linked using scanelf from the pax-utils package on Ubuntu:

$ scanelf -a test*
 TYPE    PAX   PERM ENDIAN STK/REL/PTL TEXTREL RPATH BIND FILE 
ET_EXEC PeMRxS 0775 LE RW- R-- RW-    -      -   LAZY test 
ET_EXEC PeMRxS 0775 LE RW- R-- RW-    -      -   NOW test2

Note the LAZY (ltrace works) versus NOW (ltrace doesn't).
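If scanelf is not installed, the same information can usually be read from the dynamic section printed by readelf -d. The following is a minimal sketch, not a definitive check: the helper names is_bind_now and binding_mode are made up for illustration, and it assumes readelf marks immediate binding with a BIND_NOW tag, a FLAGS entry containing BIND_NOW, or a FLAGS_1 entry containing NOW.

```python
import subprocess


def is_bind_now(dynamic_section_text):
    """Return True if `readelf -d` output suggests immediate (non-lazy) binding.

    Assumption: immediate binding shows up as a (BIND_NOW) tag, as BIND_NOW
    inside the (FLAGS) entry, or as NOW inside the (FLAGS_1) entry.
    """
    for line in dynamic_section_text.splitlines():
        fields = line.split()
        if "(BIND_NOW)" in fields:
            return True
        if "(FLAGS)" in fields and "BIND_NOW" in fields:
            return True
        if "(FLAGS_1)" in fields and "NOW" in fields:
            return True
    return False


def binding_mode(path):
    """Hypothetical wrapper: run readelf on a binary and classify it."""
    result = subprocess.run(
        ["readelf", "-d", path],
        capture_output=True, text=True, check=True,
    )
    return "NOW" if is_bind_now(result.stdout) else "LAZY"
```

With the two binaries above, binding_mode("test") would be expected to report LAZY and binding_mode("test2") to report NOW, matching the BIND column of scanelf.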

There is a little bit more discussion (but no resolution) here:

https://bugzilla.redhat.com/show_bug.cgi?id=1333481

gdb prints long values watching a variable set with rand()

Unoptimized gcc assembly can be strange:

        jmp     .L2
.L3:
        call    rand
        movl    %eax, %edx
        movslq  %edx, %rax
        imulq   $1717986919, %rax, %rax
        shrq    $32, %rax
        sarl    $2, %eax
        movl    %edx, %ecx
        sarl    $31, %ecx
        subl    %ecx, %eax
        movl    %eax, -4(%rbp)
        movl    -4(%rbp), %ecx
        movl    %ecx, %eax
        sall    $2, %eax
        addl    %ecx, %eax
        addl    %eax, %eax
        subl    %eax, %edx
        movl    %edx, -4(%rbp)
        addl    $1, -8(%rbp)
.L2:
        cmpl    $9, -8(%rbp)
        jle     .L3

And it seems you are watching -4(%rbp). First, movl %eax, -4(%rbp) stores a "big number" there (the quotient rand() / 10); then movl -4(%rbp), %ecx reads it back; finally, movl %edx, -4(%rbp) stores the result of % 10. So you are seeing a value from the middle of the calculation. I.e. one loop iteration corresponds to:

New value = 32015002
0x00005555555551f8 in main () at demo.c:12
12          var = rand() % 10;

Hardware access (read/write) watchpoint 2: var

Value = 32015002
0x00005555555551fb in main () at demo.c:12
12          var = rand() % 10;

Hardware access (read/write) watchpoint 2: var

Old value = 32015002
New value = 7
main () at demo.c:10
10      for (int i = 0; i < 5; i++)

Hardware access (read/write) watchpoint 2: var
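To make the two stores concrete, here is a small Python sketch that follows the instruction sequence above step by step. It is only an illustration: the function name is made up, it assumes a non-negative 32-bit input (which is what rand() returns), and the sample value 320150027 is inferred from the gdb output (32015002 * 10 + 7) rather than taken from the original session.

```python
MAGIC = 1717986919  # 0x66666667, i.e. ceil(2**34 / 10)


def rand_mod_10_steps(r):
    """Mirror the unoptimized gcc code for `var = r % 10`.

    Returns the two values written to -4(%rbp): the intermediate quotient
    and the final remainder. Assumes 0 <= r < 2**31.
    """
    if not 0 <= r < 2**31:
        raise ValueError("sketch only covers non-negative 32-bit values")

    rax = r * MAGIC            # imulq $1717986919, %rax, %rax
    eax = rax >> 32            # shrq  $32, %rax
    eax >>= 2                  # sarl  $2, %eax
    ecx = 0                    # sarl  $31, %ecx (sign of r, 0 when r >= 0)
    eax -= ecx                 # subl  %ecx, %eax  -> eax = r / 10
    first_store = eax          # movl  %eax, -4(%rbp)  (the "big number")

    ecx = first_store          # movl  -4(%rbp), %ecx
    eax = ecx << 2             # sall  $2, %eax    -> 4 * q
    eax += ecx                 # addl  %ecx, %eax  -> 5 * q
    eax += eax                 # addl  %eax, %eax  -> 10 * q
    second_store = r - eax     # subl  %eax, %edx; movl %edx, -4(%rbp)
    return first_store, second_store


print(rand_mod_10_steps(320150027))
```

The first element of the returned pair is what the watchpoint reports as the intermediate "New value", and the second is the final value of var.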

What is the best approach to compute the trace of a (sparse) matrix product efficiently in python

Another option is (A.conj().multiply(B)).sum().

In [111]: Dimension = 2**12

In [112]: A = rand(Dimension, Dimension, density=0.001, format='csr')
     ...: B = rand(Dimension, Dimension, density=0.001, format='csr')

Compare to sum((A.conj().T @ B).diagonal()):

In [113]: sum((A.conj().T @ B).diagonal())
Out[113]: 4.152218112255467

In [114]: (A.conj().multiply(B)).sum()
Out[114]: 4.152218112255466

In [115]: %timeit sum((A.conj().T @ B).diagonal())
2.7 ms ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [116]: %timeit (A.conj().multiply(B)).sum()
1.12 ms ± 4.39 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

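Of course, for larger values of Dimension, the relative performance difference is much greater (for dense matrices it would be O(Dimension**3) for the full matrix multiply vs O(Dimension**2) for the elementwise multiply; with sparse storage both are cheaper, but the full product still does far more work):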

In [119]: Dimension = 2**14

In [120]: A = rand(Dimension, Dimension, density=0.001, format='csr')
     ...: B = rand(Dimension, Dimension, density=0.001, format='csr')

In [121]: sum((A.conj().T @ B).diagonal())
Out[121]: 69.23254213582365

In [122]: (A.conj().multiply(B)).sum()
Out[122]: 69.23254213582364

In [123]: %timeit sum((A.conj().T @ B).diagonal())
124 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [124]: %timeit (A.conj().multiply(B)).sum()
8.67 ms ± 63.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
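The reason both expressions agree is that the j-th diagonal entry of A^H B is the sum over i of conj(A[i, j]) * B[i, j], so the trace only ever pairs entries that sit at the same position in A and B. The sketch below checks this on a tiny example in pure Python, using dictionaries keyed by (row, column) as a stand-in for sparse matrices; the function names and the sample data are made up for illustration and are not part of scipy.

```python
from collections import defaultdict


def trace_full_product(A, B):
    """Build every entry of C = A^H @ B, then sum the diagonal."""
    rows_of_B = defaultdict(list)
    for (i, k), b in B.items():
        rows_of_B[i].append((k, b))

    C = defaultdict(complex)
    for (i, j), a in A.items():
        # A^H has conj(a) at (j, i), which meets row i of B.
        for k, b in rows_of_B[i]:
            C[(j, k)] += a.conjugate() * b
    return sum(v for (j, k), v in C.items() if j == k)


def trace_elementwise(A, B):
    """Sum conj(A[i, j]) * B[i, j] over positions stored in both A and B."""
    return sum(a.conjugate() * B[key] for key, a in A.items() if key in B)


# Small hypothetical matrices with integer-valued complex entries.
A = {(0, 0): 1 + 2j, (1, 0): 3j, (2, 1): 4 + 0j}
B = {(0, 0): 2 - 1j, (1, 0): 1 + 1j, (2, 1): 5j, (0, 2): 7 + 0j}

print(trace_full_product(A, B), trace_elementwise(A, B))
```

The full product also computes off-diagonal entries (here the one at position (0, 2)) that the trace then discards, which is the wasted work the elementwise form avoids.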
