When Is the System Call set_tid_address Used

when is the system call set_tid_address used?

The clone() syscall accepts a CLONE_CHILD_CLEARTID flag. When it is set, the kernel clears the value at child_tidptr (another clone() argument) when the child thread exits and does a futex wake-up on that address. This is used to implement pthread_join(): the joining thread waits on the futex.

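The initial thread is not created by clone() with that flag, so set_tid_address() lets it register such an address after the fact. That makes it possible to pthread_join() the initial thread. More information in the following LKML threads: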

[patch] threading fix, tid-2.5.47-A3

[patch] user-vm-unlock-2.5.31-A2

As to why some programs call set_tid_address() and others don't, the answer is easy. Programs linked (directly or indirectly) to libpthread call set_tid_address. ls is linked to librt, which is linked to libpthread, so it runs the initialization for NPTL.
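To see what the call does, here is a minimal sketch that issues set_tid_address directly through syscall(2). The helper names register_tid_slot and current_tid are made up for illustration. Per set_tid_address(2), the call always succeeds and returns the caller's thread ID.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to use *slot as this thread's
 * clear_child_tid address. On thread exit the kernel may write 0 there
 * and do a futex wake-up, so the storage must stay valid until then. */
static long register_tid_slot(int *slot)
{
    return syscall(SYS_set_tid_address, slot);
}

/* Hypothetical helper: the calling thread's kernel thread ID. */
static long current_tid(void)
{
    return syscall(SYS_gettid);
}
```

Calling register_tid_slot() on a static int and comparing the result with current_tid() should give the same number. Don't do this in a real pthreads program: it replaces the address the threading library registered for that thread.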

Why is sys_fork not used by glibc's implementation of fork?

I looked at the commit where Ulrich Drepper added that code to glibc, and there wasn't any explanation in the commit log (or elsewhere).

Have a look at Linux's implementation of fork, though:

return _do_fork(SIGCHLD, 0, 0, NULL, NULL, 0);

And here is clone:

return _do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr, tls);

Obviously, they are almost exactly the same. The only difference is that when calling clone, you can set various flags, pass a stack pointer for the new process (newsp), pass the parent and child TID pointers, and pass a TLS value. fork doesn't take any arguments.

Looking at Drepper's code, the clone flags are CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD. If fork were used, the only flag would be SIGCHLD.

Here is what the clone manpage says about those extra flags:

CLONE_CHILD_CLEARTID (since Linux 2.5.49)
Erase child thread ID at location ctid in child memory when the child
exits, and do a wakeup on the futex at that address. The address
involved may be changed by the set_tid_address(2) system call. This is
used by threading libraries.

CLONE_CHILD_SETTID (since Linux 2.5.49)
Store child thread ID at location ctid in child memory.

...And you can see that he does pass a pointer to where the kernel should first store the child's thread ID and then later do a futex wakeup. Is glibc doing a futex wait on that address somewhere? I don't know. If so, that would explain why Drepper chose to use clone.

(And if not, it would be just one more example of the extreme accumulation of cruft which is our beloved glibc! If you wanted to find some nice, clean, well-maintained code, just keep moving and go have a look at musl libc!)
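For experimenting with that flag combination, here is a rough sketch that makes a fork-like child with the raw clone syscall and lets the child check that the kernel stored its ID at the ctid address. The helper name fork_like_clone is made up. The argument order is an assumption that holds on x86-64 and aarch64 only: x86-64 takes child_tid fourth and tls fifth, aarch64 swaps them. Since CLONE_SETTLS is not set, the tls value is ignored, so the sketch passes the same pointer in both positions. This is not how glibc itself is written.

```c
#define _GNU_SOURCE
#include <linux/sched.h>   /* CLONE_CHILD_SETTID, CLONE_CHILD_CLEARTID */
#include <signal.h>        /* SIGCHLD */
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a fork-like child with the flags discussed
 * above, wait for it, and return its exit status (0 means the child found
 * its own ID in tid_slot). Returns -1 on error.
 * Assumes x86-64 or aarch64 argument order for the raw clone syscall. */
static int fork_like_clone(void)
{
    static volatile pid_t tid_slot;  /* the child gets its own copy of this */
    unsigned long flags = CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD;

    /* Stack and parent_tid are 0, as in a plain fork. */
    long pid = syscall(SYS_clone, flags, 0L, 0L, &tid_slot, &tid_slot);
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: the kernel stored our ID before we returned to user space. */
        _exit(tid_slot == (pid_t)syscall(SYS_getpid) ? 0 : 1);
    }

    /* Parent: reap the child and report its exit status. */
    int status;
    if (waitpid((pid_t)pid, &status, 0) != (pid_t)pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The parent never sees the stored value, because without CLONE_VM the child writes to its own copy of the memory. That is why the check runs in the child.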

sbrk system call in unix

sbrk is not a system call in Linux. It's a library function implemented in libc which uses the brk system call. That is why your strace shows brk being used.
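A small sketch to try this yourself: the hypothetical helper move_break below moves the program break with sbrk() and reports how far it moved. Running a program that calls it under strace should show brk calls rather than sbrk, assuming a glibc-style libc.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: move the program break by `increment` bytes and
 * return how far it actually moved (0 if sbrk() failed). */
static intptr_t move_break(intptr_t increment)
{
    char *before = sbrk(increment);   /* returns the previous break */
    if (before == (char *)-1)
        return 0;

    char *after = sbrk(0);            /* an increment of 0 only queries the break */
    return (intptr_t)(after - before);
}
```

For example, move_break(4096) followed by move_break(-4096) grows the break by one page and then puts it back.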

Which linux system call is used by ls command in linux to display the folder/file name?

Most of the system calls there are noise from loading shared libraries at startup. The interesting things happen here:

openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 2 entries */, 32768) = 48
getdents(3, /* 0 entries */, 32768) = 0
close(3)

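The openat(2) system call is used to open the current directory ("."). The path is resolved relative to the current working directory because the dirfd argument is the special value AT_FDCWD (it is not a flag). The O_DIRECTORY flag makes the call fail unless the path names a directory. The returned file descriptor is then used to read the directory's contents.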

The actual directory data is read using the getdents(2) system call. In this case ls called it twice: the first call returned the entries, and only the 0 returned by the second call told it that there was no more data. Finally, the file descriptor is closed after it's done.

If you were to write your own program, however, you wouldn't call these directly -- instead you'd use opendir(3), readdir(3), and closedir(3) to read a directory. They're portable (POSIX-compliant), and they insulate you from the details of the underlying system calls. They're also easier to use, IMO.
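A minimal sketch of that portable approach, with a made-up helper name list_directory. On Linux with glibc these library calls typically end up as the openat/getdents/close sequence shown in the strace output above.

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: print every entry name in `path` to `out`.
 * Returns the number of entries, or -1 if the directory cannot be opened. */
static int list_directory(const char *path, FILE *out)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    /* readdir() returns NULL at the end of the directory (or on error). */
    while ((entry = readdir(dir)) != NULL) {
        fprintf(out, "%s\n", entry->d_name);
        count++;
    }

    closedir(dir);
    return count;
}
```

Calling list_directory(".", stdout) prints the same names ls -a would show, in whatever order the filesystem returns them.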

Why do C++ and strace disagree on how long the open() system call is taking?

Comparing elapsed time with execution time is like comparing apples with orange juice. (One of them is missing the pulp :) ) To open a file, the system has to find and read the appropriate directory entry... and if the paths are deep, it might need to read a number of directory entries. If the entries are not cached, they will need to be read from disk, which will involve a disk seek. While the disk heads are moving, and while the sector is spinning around to where the disk heads are, the wall clock keeps ticking, but the CPU can be doing other stuff (if there is work to do). So that counts as elapsed time -- the inexorable clock ticks on -- but not execution time.
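To see the two kinds of time side by side, here is a rough sketch that reads both a wall clock and the process CPU clock around a single open() call. The helper names seconds_between and time_open are made up, and the choice of CLOCK_MONOTONIC and CLOCK_PROCESS_CPUTIME_ID is an assumption, not necessarily what the original program or strace measures.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Difference b - a in seconds. */
static double seconds_between(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Hypothetical helper: open and close `path`, storing the elapsed
 * (wall-clock) and CPU seconds spent around the open() call.
 * Returns 0 on success, -1 if open() fails. */
static int time_open(const char *path, double *elapsed_s, double *cpu_s)
{
    struct timespec wall0, wall1, cpu0, cpu1;

    clock_gettime(CLOCK_MONOTONIC, &wall0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu0);
    int fd = open(path, O_RDONLY);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu1);
    clock_gettime(CLOCK_MONOTONIC, &wall1);

    if (fd == -1)
        return -1;
    close(fd);

    *elapsed_s = seconds_between(wall0, wall1);
    *cpu_s = seconds_between(cpu0, cpu1);
    return 0;
}
```

On a cold cache with a deep path, elapsed_s would be expected to exceed cpu_s by a wide margin, since the process is waiting on the disk instead of executing. With everything cached the two numbers tend to be close.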

type of syscall a process or program is making

Pipe the output to wc -l to get the number of lines in the statistics. Since the statistics are written to standard error, you'll need to do some redirection for this.

strace -c cat abc.txt 2>&1 >/dev/null | wc -l

You'll also need to subtract 4 from this, because of the header, total, and divider lines.


