On Linux, Is access() Faster Than stat()?

Performance of fopen vs stat

stat is probably better, since it doesn't have to allocate resources for actually reading the file. You won't have to call fclose to release those resources, and you may also benefit from caching of recently checked files.

When in doubt, test it out. Time a big loop that checks for, say, 1000 files using each method, with a realistic mix of filenames that exist and don't exist.

If you have the source code for stat and fopen, you should be able to read through it and get an idea as to which will require more resources.
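A timing loop along those lines could be sketched as follows. This is only a rough harness, not a rigorous benchmark: the function names (exists_stat, exists_access, exists_fopen, ns_per_call) are made up for illustration, and it assumes a POSIX system with clock_gettime(). Note that the fopen variant also needs read permission, so it is not an exact equivalent of the other two.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helpers: each returns 1 if the check succeeds, 0 otherwise. */
int exists_stat(const char *path)
{
    struct stat sb;
    return stat(path, &sb) == 0;
}

int exists_access(const char *path)
{
    return access(path, F_OK) == 0;
}

int exists_fopen(const char *path)
{
    FILE *f = fopen(path, "r");   /* also fails if the file is unreadable */
    if (f == NULL)
        return 0;
    fclose(f);                    /* release the stream fopen allocated */
    return 1;
}

/* Average nanoseconds per call of check(), run iters times over n paths. */
double ns_per_call(int (*check)(const char *), const char *const *paths,
                   int n, int iters)
{
    struct timespec t0, t1;
    volatile int sink = 0;        /* keeps the calls from being optimized out */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        for (int j = 0; j < n; j++)
            sink += check(paths[j]);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / ((double)iters * n);
}
```

Call ns_per_call() once per method with the same array of paths and compare the results. Repeat the runs a few times, since the first pass also pays for populating the kernel's caches.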

What is the fastest way to detect file size is not zero without knowing the file descriptor?

You should probably benchmark it for yourself.

I've measured

//Real-time System-time
272.58 ns(R) 170.11 ns(S) //lseek
366.44 ns(R) 366.28 ns(S) //fstat
812.77 ns(R) 711.69 ns(S) //stat("/etc/profile",&sb)

on my Linux laptop. The numbers fluctuate a little between runs, but lseek is usually somewhat faster than fstat. Both need an open fd, though, and opening is quite expensive at about 1.6 µs, so stat is probably the best choice for your case.


As tom-karzes has noted, the cost of stat should depend on the number of directory components in the path. I tried it on a PATH_MAX-long "/foo/foo/.../foo" directory, and there I'm getting about 80 µs.
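For the stat route, a minimal sketch might look like this. The helper name file_is_nonempty is made up for illustration, and treating anything that is not a regular file as "not non-empty" is an assumption about what the caller wants. Also keep in mind that some special files (many procfs entries, for example) report a size of 0 even though reading them returns data.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical helper.
   Returns 1 if path is a regular file with size > 0,
           0 if it is empty or not a regular file,
          -1 if stat() itself fails (errno is set by stat). */
int file_is_nonempty(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    if (!S_ISREG(sb.st_mode))
        return 0;
    return sb.st_size > 0;
}
```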

Why is the same C program sometimes much faster

Modern CPUs change their frequency dynamically, so you should always measure not only wall time (astronomical time) but also the number of CPU cycles. perf stat (actually, perf stat -e task-clock,cycles,instructions is enough) shows the mean CPU core frequency while the program was running on the cycles line, provided a cpu-clock or task-clock event is also measured to supply the time (cycles divided by time gives GHz):

#### cycles    #    1,653 GHz

#### cycles    #    2,579 GHz

This is Intel Turbo Boost (2), https://en.wikipedia.org/wiki/Intel_Turbo_Boost (AMD has https://en.wikipedia.org/wiki/AMD_Turbo_Core). Both react very quickly, so while cpupower -c all frequency-info is running, the real frequency is low (1.3 GHz); but when your program puts a high load on the CPU, it scales its frequency up to a higher level within several microseconds.

Sometimes it is possible to turn it off in the BIOS to get more uniform measurements: http://www.intel.com/content/www/us/en/support/processors/000005641.html

How is Intel® Turbo Boost Technology enabled or disabled? -
Intel® Turbo Boost Technology is typically enabled by default. You can only disable and enable the technology through a switch in the BIOS. There are no other user controllable settings.

Or you may try some magic MSR writing (don't write random values into random MSR registers; it may break something or hang the PC): https://askubuntu.com/questions/619875/disabling-intel-turbo-boost-in-ubuntu answer by Maythux: "wrmsr -pC 0x1a0 0x4000850089"

The other lines from perf stat (7361 vs. 7360 million instructions, 1228 vs. 1227 million branches, 64 million branch mispredictions) indicate that the program was the same and that the same code was executed in both runs (no external randomness). You may also try perf stat -d (better still, select some working hardware events from perf stat -d and list them manually in perf stat -e cpu-clock,....) to check for differences in cache events between runs.

File stat() vs access() to check permissions on a directory

Either is equivalent for your needs. access() is a cleaner wrapper if you're not going to do anything with the stat structure that you populate.

Just be mindful that you are creating a race when doing this. The permissions can change between calling stat()/access() and when you actually try and use the directory. Hell, the directory could even be deleted and recreated in that time.

It's better to just try to open what you need and check for EACCES (the errno for "permission denied"). Checking stat() or access() first will not guarantee that a subsequent operation won't fail with EACCES.
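The "just try it" approach can be sketched like this. The helper name open_dir and its err out-parameter are made up for illustration; the sketch assumes Linux or another system that supports O_DIRECTORY and O_CLOEXEC.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens path as a directory without any prior check.
   On success returns the descriptor and sets *err to 0.
   On failure returns -1 and stores errno in *err, e.g. EACCES for
   "permission denied", ENOENT for a missing path, ENOTDIR for a non-directory. */
int open_dir(const char *path, int *err)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);

    if (fd < 0) {
        *err = errno;
        return -1;
    }
    *err = 0;
    return fd;
}
```

The returned descriptor can then be handed to fdopendir() or used as the dirfd argument of openat(), so later operations act on the directory that was actually opened rather than on whatever the path names by then.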

access() Security Hole

That is a TOCTOU race (time of check to time of use). A malicious user could replace a file he has access to with a symlink to something he doesn't have access to between the access() and the open() calls. Use faccessat() or fstat(). In general, open a file once, and use f*() functions on it (e.g. fchown(), ...).
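The "open once, then use f*() functions" pattern might look roughly like the following. The helper name open_regular_file is made up for illustration, and requiring a regular file is an assumed policy. O_NOFOLLOW only rejects a symlink in the final path component, so this is a sketch of the idea rather than complete hardening.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: opens path read-only, refusing a symlink as the last
   component, then inspects the already-open descriptor with fstat().
   Returns the descriptor, or -1 if the open fails or the object is not a
   regular file. */
int open_regular_file(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOFOLLOW | O_CLOEXEC);
    if (fd < 0)
        return -1;

    struct stat sb;
    /* fstat() describes the object we hold open, not whatever the path
       happens to name now, so the answer cannot go stale. */
    if (fstat(fd, &sb) != 0 || !S_ISREG(sb.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Every later check or change (fstat(), fchown(), fchmod(), ...) then goes through the descriptor, so swapping the path afterwards has no effect on what the program touches.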

Why is using fstat not recommended

I believe that what should be made clear here is that both fs.stat and fs.access are not recommended for the particular case of checking whether a file is accessible before opening it. As the question rightly mentions, this can trigger race conditions. The functions exists() and existsSync() were deprecated (around version 4) for this reason (and a few others related to the API): they were often misused for this purpose.

When you try to open a file, the operation will already produce an error if the file is inaccessible, so that is where such cases should be handled. Otherwise, there is more than one reasonable way to check if a file exists.

Also note that, as of version 6.8.0, existsSync() is undeprecated! See discussion and the 6.8.0 changelog. The same rules above apply: only use it if you do not intend to open the file afterwards.


