Seeing Too Many Lsof Can't Identify Protocol

Socket Descriptor Leak - lsof Can't Identify Protocol?

You are probably creating the sockets inside a function that is called in a loop and never closing them, so every call leaks one descriptor.
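A minimal sketch of the fix, assuming the leak has the shape described above: a helper that opens one socket per call. The class name LoopedConnect and the method ping are hypothetical and only for illustration. The try-with-resources block closes the socket on every path, including when an exception is thrown.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class LoopedConnect {
    // Hypothetical helper: opens one connection per call and always closes it.
    static boolean ping(InetAddress host, int port) throws IOException {
        try (Socket s = new Socket(host, port)) {
            return s.isConnected();
        } // s.close() runs here, whether or not an exception was thrown
    }

    public static void main(String[] args) throws IOException {
        // A throwaway local listener so the sketch runs without external services.
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            for (int i = 0; i < 10; i++) {
                System.out.println("connected: " + ping(server.getInetAddress(), server.getLocalPort()));
            }
        }
    }
}
```

If the descriptor count still grows with a loop like this, the leak is somewhere else in the call path.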

Troubleshooting 'Too many files open' with lsof


Definition

  • java - The process with the open file.
  • 25426 - This should be the real PID. If not, please post the lsof header so we can tell what the column is.
  • 420 w - The file descriptor number followed by the mode it was opened in (r for read, w for write, u for read and write).
  • 0,8 - The major and minor device numbers.
  • 273664482 - The inode of the file.
  • pipe - A FIFO pipe that is open in your application.

Interpretation

You are not closing all your streams. Many of the open file descriptors, in read or write mode, point at unnamed pipes. The most common cause is calling Runtime.getRuntime().exec() and then leaving the streams associated with the process open. You can close them with the Commons IO utility library or close them yourself.

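    // IOUtils is org.apache.commons.io.IOUtils from Apache Commons IO.
    Process p = null;
    try
    {
        p = Runtime.getRuntime().exec("something");
        // ... read from or write to the process here ...
    }
    finally
    {
        if (p != null)
        {
            // Close all three pipes, whether or not they were used.
            IOUtils.closeQuietly(p.getOutputStream());
            IOUtils.closeQuietly(p.getInputStream());
            IOUtils.closeQuietly(p.getErrorStream());
        }
    }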

If that is not the problem, you'll need to dig into your code base and identify where the leaky streams are and plug them.

Find which thread is causing too many open files issue and why duplicate node ids in lsof output

The most likely thing is that you are opening resources and then not properly closing them. Make sure you use appropriate methods such as try-with-resources or try-finally blocks to tidy up.

To find the problem you should route all your IO through a class and then keep track of open and close, possibly even remembering the stack trace. You can then query that and see where you are leaking resources.
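One way to sketch that tracking class, assuming every resource in the application is registered and closed through it. The name ResourceTracker and its methods are hypothetical. It remembers a Throwable captured at the point of opening, so the stack trace of every still-open resource can be printed later.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ResourceTracker {
    // Each still-open resource mapped to a Throwable captured where it was opened.
    private static final Map<Closeable, Throwable> OPEN = new ConcurrentHashMap<>();

    // Register a freshly opened resource and hand it back to the caller.
    static <T extends Closeable> T opened(T resource) {
        OPEN.put(resource, new Throwable("opened here"));
        return resource;
    }

    // Forget the resource, then close it.
    static void close(Closeable resource) throws IOException {
        OPEN.remove(resource);
        resource.close();
    }

    static int openCount() {
        return OPEN.size();
    }

    // Print the opening stack trace of everything that was never closed.
    static void dumpLeaks() {
        for (Throwable where : OPEN.values()) {
            where.printStackTrace();
        }
    }
}
```

Usage would look like `InputStream in = ResourceTracker.opened(new FileInputStream(path));` followed later by `ResourceTracker.close(in);`. Note that the map holds strong references, so leaked objects are never garbage collected while tracking is on; this is a debugging aid, not something to leave enabled in production.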

Too many open files error but lsof shows a legal number of open files

It turns out the problem was that my program was running as an upstart init script, and that the exec stanza does not invoke a shell. ulimit and the settings in limits.conf apply only to user processes in a shell.

I verified this by changing the exec stanza to

exec sudo -u username java $JAVA_OPTS -jar program.jar

which runs java in username's default shell. That allowed the program to use as many open files as it needs.

I have seen it mentioned that you can also call ulimit -n prior to invoking the command; for an upstart script I think you would use a script stanza instead.

I found ls /proc/{pid}/fd | wc -l to be a better diagnostic than lsof, because it gives a precise count of the open file descriptors. By monitoring that I could see that the failures occurred right at 4096 open fds. I don't know where that 4096 comes from; it's not in /etc anywhere, so I guess it's compiled into the kernel.
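The same count can be taken from inside the JVM, which is handy for logging it periodically. This is a rough sketch that assumes a Linux system with /proc mounted; the class name FdCount is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FdCount {
    // Counts the entries in /proc/self/fd, one per open descriptor of this process.
    // The directory listing itself holds a descriptor while it runs, so the
    // result can be one higher than the steady-state count.
    static int openFds() throws IOException {
        Path fdDir = Paths.get("/proc/self/fd");
        int count = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(fdDir)) {
            for (Path ignored : entries) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("open fds: " + openFds());
    }
}
```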

HttpUrlConnection Leaking Sockets with Can't identify protocol error message: even after closing input stream and disconnecting socket

I don't know if this is completely on target but it sounds similar.

There was a problem in the JRE that wasn't fixed until JRE 7. I don't know whether the fix was eventually backported to 6; it had not been the last time I checked. If you passed a hostname to a Socket and it threw an UnknownHostException, the socket would leak a file descriptor until the garbage collector collected the dead socket object. The workaround is to resolve the hostname manually and give the socket the IP address instead, or to upgrade the JRE.

I could not locate the original bug report in Oracle's bug database that has the exact fix version.
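The workaround of resolving the hostname first could look roughly like this. The class name ResolveFirst and the method connect are hypothetical. The point is only that the lookup happens before any Socket object exists, so a failed lookup has nothing to leak.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;

class ResolveFirst {
    // Resolve the name, then connect by address.
    static Socket connect(String host, int port) throws IOException {
        // Throws UnknownHostException here, before a socket has been created.
        InetAddress address = InetAddress.getByName(host);
        return new Socket(address, port);
    }
}
```

The caller still owns the returned socket and has to close it, for example with `try (Socket s = ResolveFirst.connect(host, port)) { ... }`.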

Socket accept - Too many open files

There are multiple places where Linux can have limits on the number of file descriptors you are allowed to open.

You can check the following:

cat /proc/sys/fs/file-max

That will give you the system-wide limit on file descriptors.

On the shell level, this will tell you your personal limit:

ulimit -n

This can be changed in /etc/security/limits.conf; it's the nofile parameter.

However, if you're closing your sockets correctly, you shouldn't hit this limit unless you're opening a lot of simultaneous connections. It sounds like something is preventing your sockets from being closed properly. I would verify that every one of them really is closed.
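To see how close a Java process is to its per-process limit, the JVM can report both numbers itself. This sketch assumes a HotSpot/OpenJDK runtime on a Unix-like system, where the platform bean implements com.sun.management.UnixOperatingSystemMXBean; the class name FdLimit is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

class FdLimit {
    // Returns {open, max} descriptor counts, or null if the JVM does not expose them.
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = openAndMax();
        if (counts == null) {
            System.out.println("descriptor counts not available on this JVM");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```

Logging these two values from the server's accept loop shows whether the open count climbs steadily toward the maximum, which would point to a leak rather than to genuine load.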


