Linux TCP connect with Select() fails at testserver
With a non-blocking socket, the connect() call may return 0 while the connection is still not ready.
The connect() section can be written like this (my connect wrapper, modeled on the Python implementation):
if (FAIL_CHECK(connect(sock, (struct sockaddr *) &channel, sizeof(channel)) &&
        errno != EINPROGRESS))
{
    gko_log(WARNING, "connect error");
    ret = HOST_DOWN_FAIL;
    goto CONNECT_END;
}
/** Wait for write bit to be set **/
#if HAVE_POLL
{
    struct pollfd pollfd;
    pollfd.fd = sock;
    pollfd.events = POLLOUT;
    /* send_sec is in seconds, timeout in ms */
    select_ret = poll(&pollfd, 1, (int)(send_sec * 1000 + 1));
}
#else
{
    FD_ZERO(&wset);
    FD_SET(sock, &wset);
    select_ret = select(sock + 1, 0, &wset, 0, &send_timeout);
}
#endif /* HAVE_POLL */
if (select_ret < 0)
{
    gko_log(FATAL, "select/poll error on connect");
    ret = HOST_DOWN_FAIL;
    goto CONNECT_END;
}
if (!select_ret)
{
    gko_log(FATAL, "connect timeout on connect");
    ret = HOST_DOWN_FAIL;
    goto CONNECT_END;
}
The corresponding code segment from the Python implementation:
res = connect(s->sock_fd, addr, addrlen);
if (s->sock_timeout > 0.0) {
    if (res < 0 && errno == EINPROGRESS && IS_SELECTABLE(s)) {
        timeout = internal_select(s, 1);
        if (timeout == 0) {
            /* Bug #1019808: in case of an EINPROGRESS,
               use getsockopt(SO_ERROR) to get the real
               error. */
            socklen_t res_size = sizeof res;
            (void)getsockopt(s->sock_fd, SOL_SOCKET,
                             SO_ERROR, &res, &res_size);
            if (res == EISCONN)
                res = 0;
            errno = res;
        }
        else if (timeout == -1) {
            res = errno; /* had error */
        }
        else
            res = EWOULDBLOCK; /* timed out */
    }
}
if (res < 0)
    res = errno;
Connect Timeout with Alarm()
signal is a massively under-specified interface and should be avoided in new code. On some versions of Linux, I believe it provides "BSD semantics", which means (among other things) that it sets SA_RESTART by default.
Use sigaction instead, do not specify SA_RESTART, and you should be good to go.
...
Well, except for the general fragility and unavoidable race conditions, that is. connect will return EINTR for any signal, not just SIGALRM. More troublesome, if the system happens to be under heavy load, it could take more than 5 seconds between the call to alarm and the call to connect, in which case you will miss the signal and block in connect forever.
Your earlier attempt, using non-blocking sockets with connect and select, was a much better idea. I would suggest debugging that.
Synchronizing a Test Server During Tests
You can just attempt to connect to the server before starting the test suite, as part of the initialization process.
For example, I usually have a function like this in my tests:
// waitForServer attempts to establish a TCP connection to localhost:<port>
// in a given amount of time. It returns upon a successful connection;
// otherwise exits with an error.
func waitForServer(port string) {
	backoff := 50 * time.Millisecond
	for i := 0; i < 10; i++ {
		conn, err := net.DialTimeout("tcp", ":"+port, 1*time.Second)
		if err != nil {
			time.Sleep(backoff)
			continue
		}
		err = conn.Close()
		if err != nil {
			log.Fatal(err)
		}
		return
	}
	log.Fatalf("Server on port %s not up after 10 attempts", port)
}
Then in my TestMain() I do:
func TestMain(m *testing.M) {
	go startServer()
	waitForServer(serverPort)
	// run the suite
	os.Exit(m.Run())
}
socket.error: [Errno 10013] An attempt was made to access a socket in a way forbidden by its access permissions
On Windows Vista/7, with UAC, administrator accounts run programs in unprivileged mode by default.
Programs must prompt for administrator access before they run as administrator, with the ever-so-familiar UAC dialog. Since Python scripts aren't directly executable, there's no "Run as Administrator" context menu option.
It's possible to use ctypes.windll.shell32.IsUserAnAdmin() to detect whether the script has admin access, and ShellExecuteEx with the 'runas' verb on python.exe, with sys.argv[0] as a parameter, to prompt the UAC dialog if needed.
Socket accept - Too many open files
There are multiple places where Linux can have limits on the number of file descriptors you are allowed to open.
You can check the following:
cat /proc/sys/fs/file-max
That will give you the system-wide limit on file descriptors.
On the shell level, this will tell you your personal limit:
ulimit -n
This can be changed in /etc/security/limits.conf; it's the nofile parameter.
However, if you're closing your sockets correctly, you shouldn't receive this unless you're opening a lot of simultaneous connections. It sounds like something is preventing your sockets from being closed appropriately. I would verify that they are being handled properly.
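The per-process limit that ulimit -n reports can also be read from inside the program, which may help when debugging the accept failure. The following is a rough sketch using getrlimit and setrlimit with RLIMIT_NOFILE; the helper names get_nofile_limit and raise_nofile_limit are hypothetical. Raising the soft limit only postpones the error if descriptors are being leaked.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: read the soft and hard limits on open file
 * descriptors for the current process. Returns 0 on success, -1 on error. */
int get_nofile_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;               /* the value `ulimit -n` shows */
    *hard = rl.rlim_max;               /* ceiling for unprivileged raises */
    return 0;
}

/* Hypothetical helper: raise the soft limit up to the hard limit.
 * Returns 0 on success, -1 on error. */
int raise_nofile_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &rl);
}

/* Print both limits, e.g. at server start-up. */
void print_nofile_limit(void)
{
    rlim_t soft, hard;
    if (get_nofile_limit(&soft, &hard) == 0)
        printf("nofile soft=%llu hard=%llu\n",
               (unsigned long long)soft, (unsigned long long)hard);
}
```

Logging these values next to the number of descriptors the server currently holds makes it easier to tell a genuinely low limit from a leak.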