Is there a good way to detect a stale NFS mount
You could write a small C program that calls stat() and checks for ESTALE:
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    struct stat st;

    /* stat() on a stale NFS mount point fails with errno == ESTALE */
    if (stat("/mnt/some_stale", &st) == -1 && errno == ESTALE) {
        printf("/mnt/some_stale is stale\n");
        return EXIT_SUCCESS;
    }
    return EXIT_FAILURE;
}
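A quick way to try the checker is to compile and run it. The sketch below embeds a slightly generalized version (taking the path to test as a command-line argument) in a heredoc purely to keep the example self-contained; /mnt/some_stale is still just an illustrative path, and gcc is assumed to be available.

```shell
# Assumes gcc is installed; the source is the same checker as above,
# generalized to take the path to test as argv[1].
cat > check_stale.c <<'EOF'
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    struct stat st;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return EXIT_FAILURE;
    }
    /* stat() fails with errno == ESTALE on a stale NFS file handle */
    if (stat(argv[1], &st) == -1 && errno == ESTALE) {
        printf("%s is stale\n", argv[1]);
        return EXIT_SUCCESS;
    }
    return EXIT_FAILURE;
}
EOF
gcc -o check_stale check_stale.c
./check_stale /mnt/some_stale && echo "stale" || echo "not stale (or some other error)"
```

Note that the program exits nonzero for any failure other than ESTALE (including a path that simply does not exist), so the caller can only conclude "stale" from a success exit.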
Loop to test all NFS mount points
In your loop it should be:
read -r -t1 < <(stat -t "$i" 2>&-)
As written, it only reads the first array value, and $i is never used.
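Putting that line into an actual loop over every NFS mount point might look like the sketch below. Rather than an array, it parses the mount table directly; the /proc/mounts field layout (device, mount point, fstype, options, ...) is standard, and the file argument is parameterized only so the function is easy to exercise against a sample table.

```shell
# Sketch: check every nfs/nfs4 mount point listed in a mount table
# (defaults to /proc/mounts) with the timed stat trick above.
check_nfs_mounts() {
    local mtab="${1:-/proc/mounts}" dev mnt fstype rest
    while read -r dev mnt fstype rest; do
        case "$fstype" in
            nfs|nfs4)
                # stat hangs on a dead server, so give up after 1s;
                # an empty REPLY means stat produced no output in time.
                read -r -t1 < <(stat -t "$mnt" 2>&-) || true
                if [[ -z "$REPLY" ]]; then
                    echo "stale or unreachable: $mnt"
                fi
                ;;
        esac
    done < "$mtab"
}
```

(Mount points containing spaces appear escaped as \040 in /proc/mounts; the sketch does not decode those.)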
Linux Shell Script: How to detect NFS Mount-point (or the Server) is dead?
The stat command is a somewhat cleaner way:
statresult=$(stat /my/mountpoint 2>&1 | grep -i "stale")
if [ "${statresult}" != "" ]; then
    # result not empty: mountpoint is stale; remove it
    umount -f /my/mountpoint
fi
Additionally, you can use rpcinfo to detect whether the remote nfs share is available:
rpcinfo -t remote.system.net nfs > /dev/null 2>&1
if [ $? -eq 0 ]; then
    echo "Remote NFS share available."
fi
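One caveat: rpcinfo itself can hang for a long time when the remote host is unreachable, so it may be worth capping it with timeout as well. A small sketch (the host name is illustrative, and GNU coreutils timeout is assumed):

```shell
# Succeeds only if the remote host answers an NFS RPC probe within
# 5 seconds; timeout kills a hung rpcinfo.
nfs_available() {
    timeout 5 rpcinfo -t "$1" nfs > /dev/null 2>&1
}

if nfs_available remote.system.net; then
    echo "Remote NFS share available."
fi
```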
Added 2013-07-15T14:31:18-05:00:
I looked into this further as I am also working on a script that needs to recognize stale mountpoints. Inspired by one of the replies to "Is there a good way to detect a stale NFS mount", I think the following may be the most reliable way to check for staleness of a specific mountpoint in bash:
read -t1 < <(stat -t "/my/mountpoint")
if [ $? -eq 1 ]; then
    echo "NFS mount stale. Removing..."
    umount -f -l /my/mountpoint
fi
The read -t1 construct reliably times out the subshell if the stat command hangs for some reason.
Added 2013-07-17T12:03:23-05:00:
Although read -t1 < <(stat -t "/my/mountpoint") works, there doesn't seem to be a way to mute its error output when the mountpoint is stale. Adding > /dev/null 2>&1 either within the subshell or at the end of the command line breaks it. Using a simple test, if [ -d /path/to/mountpoint ]; then ... fi, also works, and may be preferable in scripts. After much testing, it is what I ended up using.
Added 2013-07-19T13:51:27-05:00:
A reply to my question "How can I use read timeouts with stat?" provided additional detail about muting the output of stat (or rpcinfo) when the target is not available and the command hangs for a few minutes before it would time out on its own. While [ -d /some/mountpoint ]
can be used to detect a stale mountpoint, there is no similar alternative for rpcinfo, and hence use of read -t1
redirection is the best option. The output from the subshell can be muted with 2>&-. Here is an example from CodeMonkey's response:
mountpoint="/my/mountpoint"
read -t1 < <(stat -t "$mountpoint" 2>&-)
if [[ -z "$REPLY" ]]; then
    echo "NFS mount stale. Removing..."
    umount -f -l "$mountpoint"
fi

Note the test is for an empty $REPLY: if stat succeeds, its output ends up in $REPLY, so an empty $REPLY means stat either failed or timed out.
Perhaps now this question is fully answered :).
Check if NFS Directory Mounted without Large Hangs on Failure
OK, I managed to solve this using the timeout command; I checked back here to see that BroSlow had updated his answer with a very similar solution. Thank you, BroSlow, for your help.
To solve the problem, the code I used is:
if [[ $(timeout 5s ls /nfs/machine | wc -l) -gt 0 ]]; then
    echo "can see machine"
else
    echo "cannot see machine"
fi
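A variant of the same idea leans on timeout's exit status instead of counting output lines: timeout exits with status 124 when it has to kill the command, and otherwise passes through ls's own status, so the if can test the command directly (the path is illustrative):

```shell
# Succeeds only if ls completes within 5 seconds; on a hung NFS mount
# timeout kills ls and reports failure instead of blocking.
check_machine() {
    if timeout 5s ls "$1" > /dev/null 2>&1; then
        echo "can see machine"
    else
        echo "cannot see machine"
    fi
}

check_machine /nfs/machine
```

Unlike the wc -l version, this reports success for an empty but healthy directory, which may or may not be what you want.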
I then reduced this to a single line command so that it could be run through ssh and put inside of a loop (to loop through hosts and execute this command).
Stale file handle error when a process tries to read a file that another process has already deleted
This is totally expected. The NFS specification is clear about the use of file handles after an object (be it a file or a directory) has been deleted. Section 4 addresses this directly. For example:
The persistent filehandle will become stale or invalid when the file system object is removed. When the server is presented with a persistent filehandle that refers to a deleted object, it MUST return an error of NFS4ERR_STALE.
This is such a common problem, it even has its own entry in section A.10 of the NFS FAQ, which says one common cause of ESTALE errors is that:
The file handle refers to a deleted file. After a file is deleted on the server, clients don't find out until they try to access the file with a file handle they had cached from a previous LOOKUP. Using rsync or mv to replace a file while it is in use on another client is a common scenario that results in an ESTALE error.
The expected resolution is that your client app must close and reopen the file to see what has happened. Or, as the FAQ says:
... to recover from an ESTALE error, an application must close the file or directory where the error occurred, and reopen it so the NFS client can resolve the pathname again and retrieve the new file handle.
Nagios SNMP Process check hangs on stale nfs mount
Fairly simple: NFS is designed to tolerate server reboots. Calls to an NFS file system mounted hard
will therefore block and wait for the server to respond. This is to ensure that no data is lost and no processes are killed - they simply 'stall' - which will be the problem you're having.
There's a mount option for NFS that avoids this problem: specify soft
when mounting (either in fstab, or with -o soft
when mounting manually).
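For reference, a hypothetical fstab line using the soft option might look like this (the server name, export path, and the timeo/retrans tuning values are all illustrative; soft is the part that matters here):

```
server.example.com:/export/data  /mnt/data  nfs  soft,timeo=30,retrans=2  0  0
```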
Be warned though - you'll get errors when accessing the NFS mount. Most things will tolerate this scenario, but it's always possible that badly written scripts or programs will fall over.
What does 'stale file handle' in Linux mean?
When the directory is deleted, the inode for that directory (and the inodes for its contents) are recycled. The pointer your shell has to that directory's inode (and its contents' inodes) is now no longer valid. When the directory is restored from backup, the old inodes are not (necessarily) reused; the directory and its contents are stored on arbitrary new inodes. The only thing that stays the same is that the parent directory reuses the same name for the restored directory (because you told it to).
Now if you attempt to access the contents of the directory that your original shell is still pointing to, it communicates that request to the file system as a request for the original inode, which has since been recycled (and may even be in use for something entirely different now). So you get a stale file handle
message because you asked for some nonexistent data.
When you perform a cd
operation, the shell reevaluates the inode location of whatever destination you give it. Now that your shell knows the new inode for the directory (and the new inodes for its contents), future requests for its contents will be valid.