Performance of fopen vs stat
stat is probably better, since it doesn't have to allocate resources for actually reading the file. You won't have to call fclose to release those resources, and you may also benefit from caching of recently checked files.
When in doubt, test it out. Time a big loop that checks for 1000 files using each method, with an appropriate mix of filenames that exist and don't exist.
If you have the source code for stat and fopen, you should be able to read through it and get an idea of which will require more resources.
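A minimal sketch of such a timing test, assuming a POSIX system. The names exists_stat, exists_fopen and time_checks are made up for this illustration, and the list of paths to check is up to you:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Returns 1 if stat() succeeds on path, 0 otherwise. */
int exists_stat(const char *path)
{
    struct stat sb;
    assert(path != NULL);
    return stat(path, &sb) == 0;
}

/* Returns 1 if path can be opened for reading, 0 otherwise. */
int exists_fopen(const char *path)
{
    FILE *fp;
    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    fclose(fp); /* fopen allocated a stream, so it has to be released */
    return 1;
}

/* Average nanoseconds per call of check() over n paths, repeated reps times. */
double time_checks(int (*check)(const char *), const char *const *paths,
                   size_t n, int reps)
{
    struct timespec t0, t1;
    volatile int sink = 0; /* keeps the calls from being optimized away */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i++)
            sink += check(paths[i]);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec))
           / ((double)n * reps);
}
```

Call time_checks(exists_stat, paths, n, reps) and time_checks(exists_fopen, paths, n, reps) on the same array and compare the two averages. Note that the two checks are not strictly equivalent: fopen also fails on files you may not read, while stat only needs search permission on the directories leading to the file.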
What is the fastest way to detect file size is not zero without knowing the file descriptor?
You should probably benchmark it for yourself.
I've measured
//Real-time System-time
272.58 ns(R) 170.11 ns(S) //lseek
366.44 ns(R) 366.28 ns(S) //fstat
812.77 ns(R) 711.69 ns(S) //stat("/etc/profile",&sb)
on my Linux laptop. It fluctuates a little between runs, but lseek is usually somewhat faster than fstat (roughly 90 ns in the run above). However, both need a file descriptor, and opening a file is quite expensive at about 1.6 µs, so stat is probably the best choice for your case.
As tom-karzes has noted, the cost of stat should depend on the number of directory components in the path. I tried it on a PATH_MAX-long "/foo/foo/.../foo" directory and there I'm getting about 80 µs.
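For reference, the stat-based check could look like the sketch below. The function name file_is_nonempty and its return convention are assumptions made for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Returns 1 if path names a regular file with a nonzero size,
 * 0 if it is empty or not a regular file,
 * and -1 if stat() fails (errno is set by stat).
 */
int file_is_nonempty(const char *path)
{
    struct stat sb;

    assert(path != NULL);
    if (stat(path, &sb) != 0)
        return -1;
    return S_ISREG(sb.st_mode) && sb.st_size > 0;
}
```

No file descriptor is needed, which is the point made above: the cost of open is avoided entirely.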
Why is the same C program sometimes much faster
Modern CPUs change their frequency dynamically, so you should always measure not only wall time (astronomical time) but also the number of CPU cycles. perf stat (actually, perf stat -e task-clock,cycles,instructions is enough) shows the mean CPU core frequency while the program was running on the cycles line, provided a cpu-clock/task-clock event was measured as well to supply the time (cycles divided by time gives GHz):
#### cycles # 1,653 GHz
#### cycles # 2,579 GHz
This is Intel Turbo Boost (2), https://en.wikipedia.org/wiki/Intel_Turbo_Boost (AMD has https://en.wikipedia.org/wiki/AMD_Turbo_Core). Both react very quickly, so while cpupower -c all frequency-info is running, the real frequency is low (1.3); but when there is high load from your program, the CPU will scale its frequency to a higher level within several microseconds.
Sometimes it is possible to turn it off in the BIOS to get more uniform measurements: http://www.intel.com/content/www/us/en/support/processors/000005641.html
How is Intel® Turbo Boost Technology enabled or disabled? -
Intel® Turbo Boost Technology is typically enabled by default. You can only disable and enable the technology through a switch in the BIOS. There are no other user controllable settings.
Or you may try some magic MSR writing (don't write random values into random MSRs; it may break something or hang the PC): https://askubuntu.com/questions/619875/disabling-intel-turbo-boost-in-ubuntu answer by Maythux: "wrmsr -pC 0x1a0 0x4000850089"
Other lines from perf stat: 7361-7360 mln instructions, 1228-1227 mln branches, and 64 mln branch mispredictions indicate that the program was the same and the same code was executed (no external randomness). You may also try perf stat -d (better still, select some working hardware events from perf stat -d and list them manually in perf stat -e cpu-clock,...) to check for differences in cache events between runs.
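The GHz figure next to the cycles line is plain arithmetic: cycles divided by the measured time. A tiny sketch, where the helper name mean_ghz is made up and the time is assumed to be given in milliseconds, as task-clock reports it:

```c
#include <assert.h>

/* Mean core frequency in GHz from a cycle count and a task-clock time in ms. */
double mean_ghz(unsigned long long cycles, double task_clock_ms)
{
    assert(task_clock_ms > 0.0);
    /* cycles / seconds = Hz; seconds = ms / 1e3; GHz = Hz / 1e9 */
    return (double)cycles / (task_clock_ms * 1e6);
}
```

With hypothetical numbers, 2,000,000,000 cycles over 1000 ms of task-clock gives 2 GHz. If the same program reports a very different figure on a slow run, the core was clocked lower during that run.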
File stat() vs access() to check permissions on a directory
Either is equivalent for your needs. access() is a cleaner wrapper if you're not going to do anything with the stat structure that you populate.
Just be mindful that you are creating a race when doing this. The permissions can change between calling stat()/access() and when you actually try to use the directory. Hell, the directory could even be deleted and recreated in that time.
It's better to just try to open what you need and check for EACCES (permission denied; some operations report EPERM instead). Checking stat() or access() will not guarantee that a subsequent operation won't fail with a permission error.
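A sketch of the "just try it" approach for a directory, assuming POSIX opendir(). The helper name try_open_dir and its return convention are made up for this example:

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/*
 * Tries to open dirpath. On success stores the stream in *out and returns 0.
 * On failure stores NULL in *out and returns the errno value from opendir().
 */
int try_open_dir(const char *dirpath, DIR **out)
{
    assert(dirpath != NULL && out != NULL);
    *out = opendir(dirpath);
    if (*out == NULL)
        return errno; /* e.g. EACCES, ENOENT or ENOTDIR */
    return 0;
}
```

The caller handles the error code at the point of use and calls closedir() on the stream when done. There is no separate check that could go stale before the directory is actually used.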
access() Security Hole
That is a TOCTOU race (time of check to time of use). A malicious user could replace a file he has access to with a symlink to something he doesn't have access to between the access() and the open() calls. Use faccessat() or fstat(). In general, open a file once, and use f*() functions on it (e.g. fchown(), ...).
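One way to follow the "open once, then use f*() functions" advice, as a sketch. The helper name open_regular_nofollow is made up, and refusing symlinks with O_NOFOLLOW is an extra precaution chosen for this example, not something the answer above prescribes:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Opens path read-only, refusing a symlink as the final path component,
 * then checks the already-open descriptor with fstat().
 * Returns the descriptor, or -1 if the open fails or it is not a regular file.
 */
int open_regular_nofollow(const char *path)
{
    struct stat sb;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_NOFOLLOW);
    if (fd < 0)
        return -1;
    if (fstat(fd, &sb) != 0 || !S_ISREG(sb.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Because fstat() inspects the descriptor rather than the name, whatever it reports is the object that was actually opened, even if the path is changed afterwards.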
Why is using fstat not recommended
I believe that what should be made clear here is that both fs.stat and fs.access are not recommended for the particular case of checking the accessibility of a file before opening it. As mentioned in the question, this can trigger race conditions. The functions exists() and existsSync() were deprecated (around version 4) for this reason (and a few others related to the API): they were often misused for this purpose.
When you try to open a file, the operation will already fail with an error if the file is inaccessible, so that is where such cases should be handled. Otherwise, there is more than one reasonable way to check if a file exists.
Also note that, as of version 6.8.0, existsSync() is undeprecated! See the discussion and the 6.8.0 changelog. The same rules above apply: only use it if you do not intend to open the file afterwards.