Symlinks - Performance Hit

I have created a file testfile.txt with 1000 lines of blablabla in it, and created a local symlink (testfile.link.txt) to it:

$ ls -n
total 12
lrwxrwxrwx 1 1000 1000    12 2012-09-26 14:09 testfile.link.txt -> testfile.txt
-rw-r--r-- 1 1000 1000 10000 2012-09-26 14:08 testfile.txt

(The -n switch is only there to hide my super-secret username.:))
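
For anyone who wants to reproduce the setup, a minimal sketch follows. It assumes a POSIX shell with mktemp available and works in a throwaway directory; the variable names (workdir, file_size, link_target) are mine and not part of the original test.

```shell
#!/bin/sh
# Sketch: recreate testfile.txt and testfile.link.txt in a scratch directory.
set -eu

workdir=$(mktemp -d)   # scratch directory (assumption: mktemp is available)
cd "$workdir"

# 1000 lines of "blablabla": 9 characters plus a newline, i.e. 10 bytes per line.
i=0
while [ "$i" -lt 1000 ]; do
    echo blablabla
    i=$((i + 1))
done > testfile.txt

# Relative symlink in the same directory, as in the listing above.
ln -s testfile.txt testfile.link.txt

file_size=$(( $(wc -c < testfile.txt) ))   # arithmetic strips any padding from wc
link_target=$(readlink testfile.link.txt)

echo "file size: $file_size bytes, link target: $link_target"
```

The file comes out at 10000 bytes, matching the listing, and the size of 12 shown for the link is simply the length of the string testfile.txt.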

I then ran 10 rounds for each file, with every round running cat on the file 1000 times and sending the output to /dev/null.
(Results are in seconds.)

Accessing the file directly:

$ for j in `seq 1 10`; do ( time -p ( for i in `seq 1 1000`; do cat testfile.txt >/dev/null; done ) ) 2>&1 | grep 'real'; done
real 2.32
real 2.33
real 2.33
real 2.33
real 2.33
real 2.32
real 2.32
real 2.33
real 2.32
real 2.33

Accessing through symlink:

$ for j in `seq 1 10`; do ( time -p ( for i in `seq 1 1000`; do cat testfile.link.txt >/dev/null; done ) ) 2>&1 | grep 'real'; done
real 2.30
real 2.31
real 2.36
real 2.32
real 2.32
real 2.31
real 2.31
real 2.31
real 2.32
real 2.32

Measured on (a rather old install of) Ubuntu:

$ uname -srvm
Linux 2.6.32-43-generic #97-Ubuntu SMP Wed Sep 5 16:43:09 UTC 2012 i686

Of course it's a dumbed-down example, but based on this I wouldn't expect too much of a performance degradation when using symlinks.

I personally think that using symlinks is more practical:

As you said, your deployment process will be simpler.
You can also easily use versioning and roll-back if you include some kind of timestamp or version number in the directory names (e.g. my_web_files.v1, my_web_files.v2), and use the "official" name in the symlink (e.g. my_web_files) pointing to the "live" version. If you want to change the version, just re-link to another versioned directory.
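
The re-linking step could look something like the sketch below. It reuses the directory names from the example above, runs in a scratch directory, and assumes an ln that supports -n (GNU coreutils and the BSDs do, but it is not in POSIX). The variables live_before, live_after and live_rollback exist only for illustration.

```shell
#!/bin/sh
# Sketch: versioned directories with one "live" symlink that gets re-pointed.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

mkdir my_web_files.v1 my_web_files.v2
echo "version 1" > my_web_files.v1/index.html
echo "version 2" > my_web_files.v2/index.html

# Initial deployment: the official name points at v1.
ln -sfn my_web_files.v1 my_web_files
live_before=$(readlink my_web_files)

# Deploy v2 by re-pointing the link. Without -n, ln would follow the existing
# link and create my_web_files.v1/my_web_files.v2 instead of replacing it.
ln -sfn my_web_files.v2 my_web_files
live_after=$(readlink my_web_files)

# Roll back by pointing at the old directory again.
ln -sfn my_web_files.v1 my_web_files
live_rollback=$(readlink my_web_files)

echo "before: $live_before, after deploy: $live_after, after rollback: $live_rollback"
```

Note that ln -sfn removes the old link before creating the new one, so there is a very short window in which the name does not exist. If that matters, one common alternative is to create the new link under a temporary name and rename it over the old one.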

Is there any significant performance hit for serving files with PHP?

The more processes involved, the more performance will suffer. So you can expect some performance hit, but you will need to measure how large it is and then decide whether it is worth it for your auth checks. In my experience, the cost is marginal.

One more thing: don't forget scaling. When you tie up your PHP processes streaming files, you reduce the total number of processes available to serve other requests.

If you're worried about scale and performance, do everything you can to serve this content up-stream. For example:

Perform the auth check in PHP, then issue a redirect to a CDN with a sufficiently large keyspace (e.g. UUIDs). You might have to rotate files in this keyspace periodically if you're worried about people reusing these URLs.
Require that the auth has already been performed, and have the load balancers check the auth tokens against an IdP.

When you implement it in PHP, make sure to use something like readfile with output buffering disabled. Otherwise, you're increasing the size of your web service process by the size of the content, which could cause out-of-memory errors.

Performance hit of using php versus html with 'img src'

Based on your comments above, I would say this sounds like a very inefficient way to do it, mostly because it stops normal caching. If somebody is likely to automate scraping of your full size images, then they will find a way around it (e.g. Selenium RC).

If your only concern is someone scraping the images, then use another solution. Here are some other solutions:

How do I prevent site scraping?
Protection from screen scraping

The honeypot is a very common implementation.

Sharing git repo symlinks between Windows and WSL

This issue occurs because Windows and Linux (or at least the emulated version) disagree about the size of a symlink. On Windows, a symlink's size is in blocks, so a 6-character symlink will be 4096 bytes in size. On Linux, a symlink's size is the number of bytes it contains (in this example, 6).

One of the things that Git writes into the index to keep track of whether a file has changed is the size. When you perform any sort of update of the index, such as with git reset --hard, Git writes all of this metadata into the index, including the size. When you run git status, Git compares this metadata against what the filesystem reports, and if anything differs, it marks the file as modified.
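
The Linux side of this is easy to check. The sketch below creates a symlink with a 6-character target in a scratch directory and reads the link's own size from ls -ln; the file names target and link are hypothetical.

```shell
#!/bin/sh
# Sketch: on Linux, a symlink's reported size is the length of its target string.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

echo "hello" > target   # the name "target" is 6 characters long
ln -s target link

# Field 5 of `ls -ln` is the size of the link itself (ls -l does not follow it).
link_size=$(ls -ln link | awk '{print $5}')

# Length of the stored target string, for comparison.
target_len=$(printf '%s' "$(readlink link)" | wc -c)
target_len=$((target_len))   # strip any padding from wc

echo "link size: $link_size, target length: $target_len"
```

Inside a repository, git ls-files --debug prints the size recorded in the index for each entry, which should make it possible to compare that stored value with what each environment reports.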

It is possible to control whether certain information is checked in the index, since some tools can produce bogus info (for example, JGit doesn't write device and inode numbers), but size is always checked, because it is seen as a good indicator of whether a file has changed.

Since this is a fundamental disagreement between how Windows and how WSL see the symlink, this really can't be fixed. You could try asking the Git for Windows project if they'd be willing to work around this issue in Git for Windows, but I suspect the answer is probably going to be no since changing it would likely have a performance impact for all Windows users.

Makefile replace symlink list of files

Let's take this in stages.

First, to create a symbolic link, and remove a preexisting link of that name, if there is one:

ln -fs filename linkname

Now to make a list of the existing files:

existing_files = $(wildcard dir1/dir2/*.txt)

So far, so good. Let's suppose this gives us dir1/dir2/red.txt dir1/dir2/green.txt.

# The next line shows where I would want to put the symbolic links
symlinks = $(wildcard new_dir1/new_dir2/*.txt)

That will give you a list of the things that already exist in that directory, which is probably not what you intend. We must construct the list of links we want from the list of files we have:

filenames := $(notdir $(existing_files))
symlinks := $(addprefix new_dir1/new_dir2/, $(filenames))

Now for a rule or rules to build the symlinks. We could write two explicit rules:

new_dir1/new_dir2/red.txt: dir1/dir2/red.txt
    ln -fs dir1/dir2/red.txt new_dir1/new_dir2

new_dir1/new_dir2/green.txt: dir1/dir2/green.txt
    ln -fs dir1/dir2/green.txt new_dir1/new_dir2

but that is horribly redundant, and besides we don't know the file names beforehand. First we can remove some of the redundancy by defining a variable and using the automatic variable $<:

DEST_DIR := new_dir1/new_dir2

$(DEST_DIR)/red.txt: dir1/dir2/red.txt
    ln -fs $< $(DEST_DIR)

$(DEST_DIR)/green.txt: dir1/dir2/green.txt
    ln -fs $< $(DEST_DIR)

Now we can see how to replace these rules with a pattern rule:

$(DEST_DIR)/%.txt: dir1/dir2/%.txt
    ln -fs $< $(DEST_DIR)

Now all we need is a master rule that requires the links:

.PHONY: make_some_links
make_some_links: $(symlinks)
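
One caveat worth checking before relying on these rules: ln -s stores the target string verbatim, and a relative target is resolved from the directory that contains the link, not from the directory where ln was run. The shell sketch below reproduces the recipe's command in a scratch directory with the example's paths; the variables stored_target, relative_ok and absolute_ok are only there for illustration, and using an absolute target is just one possible fix.

```shell
#!/bin/sh
# Sketch: a relative target passed to ln -s is resolved from the link's directory.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

mkdir -p dir1/dir2 new_dir1/new_dir2
echo "red" > dir1/dir2/red.txt

# Same form as the recipe above: creates new_dir1/new_dir2/red.txt.
ln -fs dir1/dir2/red.txt new_dir1/new_dir2
stored_target=$(readlink new_dir1/new_dir2/red.txt)

# -e follows the link, so it fails when the link dangles.
if [ -e new_dir1/new_dir2/red.txt ]; then relative_ok=yes; else relative_ok=no; fi

# One possible fix: store an absolute target instead.
ln -fs "$PWD/dir1/dir2/red.txt" new_dir1/new_dir2
if [ -e new_dir1/new_dir2/red.txt ]; then absolute_ok=yes; else absolute_ok=no; fi

echo "stored: $stored_target, relative works: $relative_ok, absolute works: $absolute_ok"
```

In the Makefile, the equivalent change would be something like ln -fs $(abspath $<) $(DEST_DIR), assuming GNU Make. Also keep in mind that recipe lines must start with a tab character, and that $(DEST_DIR) has to exist before the links can be created.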
