Lowest Latency Notification Method Between Processes Under Linux

Generally... there is almost no difference between the OS-level methods.

Setup:

  1. Two processes pinned (CPU affinity) to two different CPUs.
  2. One process sleeps (nanosleep) for N microseconds, measures the current time,
    and then notifies the other process (a C sketch of the semaphore variant is shown below).
  3. The other process wakes, measures the current time, and compares it to the notifier's timestamp.
  4. Average, standard deviation, median, and 95th percentile are calculated over 1K samples, after a warm-up of 100 notifications.
  5. OS: Linux 2.6.35 x86_64
  6. CPU: Intel i5 M460
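
A minimal sketch of what this setup could look like for the semaphore method (not the original harness: warm-up, CPU pinning, and statistics are omitted, and all names are illustrative; compile with gcc -pthread):

/* Semaphore sketch: the notifier sleeps, timestamps, and posts a
 * process-shared semaphore; the waiter blocks in sem_wait(), timestamps,
 * and the difference is the wake-up latency. */
#define _DEFAULT_SOURCE
#include <semaphore.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

typedef struct {
    sem_t sem;
    struct timespec sent;              /* taken just before sem_post() */
} shared_t;

static double diff_us(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
}

int main(void) {
    shared_t *sh = mmap(NULL, sizeof *sh, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    sem_init(&sh->sem, 1, 0);          /* 1 = shared between processes */

    if (fork() == 0) {                 /* waiter */
        struct timespec woke;
        for (int i = 0; i < 1000; i++) {
            sem_wait(&sh->sem);
            clock_gettime(CLOCK_MONOTONIC, &woke);
            printf("%.2f us\n", diff_us(sh->sent, woke));
        }
        _exit(0);
    }

    struct timespec nap = { .tv_sec = 0, .tv_nsec = 10 * 1000 };  /* N = 10 us */
    for (int i = 0; i < 1000; i++) {                              /* notifier */
        nanosleep(&nap, NULL);
        clock_gettime(CLOCK_MONOTONIC, &sh->sent);
        sem_post(&sh->sem);
    }
    wait(NULL);
    return 0;
}

The other methods swap only the wait/notify pair; the signal and pipe variants are sketched after their result tables below.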

Results:

Semaphore (sem_wait/sem_post - kernel - futex):

sleep (us)   mean ±sd (us)   median (us)   95% (us)
1            4.98 ±18.7      3.78          5.04
10           4.14 ±14.8      3.54          4.00
100          20.60 ±29.4     22.96         26.96
1000         49.42 ±37.6     30.62         78.75
10000        63.20 ±22.0     68.38         84.38

Signal (kill/sigwait)

sleep (us)   mean ±sd (us)   median (us)   95% (us)
1            4.69 ±3.8       4.21          5.39
10           5.91 ±14.8      4.19          7.45
100          23.90 ±17.7     23.41         35.90
1000         47.38 ±28.0     35.27         81.16
10000        60.80 ±19.9     68.50         82.36
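
A rough sketch of the signal variant, under the same assumptions as the semaphore sketch above (timestamping elided, everything illustrative):

/* Signal sketch: the waiter blocks SIGUSR1 and picks it up with sigwait();
 * the notifier sends it with kill(). */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_BLOCK, &set, NULL);   /* block it so sigwait() can claim it */

    pid_t waiter = fork();
    if (waiter == 0) {
        int sig;
        for (;;) {
            sigwait(&set, &sig);          /* wake-up point: timestamp here */
        }
    }

    for (int i = 0; i < 1000; i++) {
        usleep(10);                       /* the N-microsecond sleep */
        kill(waiter, SIGUSR1);            /* the notification */
    }
    kill(waiter, SIGKILL);                /* tear down the waiter */
    waitpid(waiter, NULL, 0);
    return 0;
}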

Pipe (pipe + write/read)

sleep (us)   mean ±sd (us)   median (us)   95% (us)
1            3.75 ±5.9       3.46          4.45
10           4.42 ±3.5       3.84          5.18
100          23.32 ±25.6     24.17         38.05
1000         51.17 ±35.3     46.34         74.75
10000        64.69 ±31.0     67.95         86.80
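
A rough sketch of the pipe variant under the same assumptions; the socketpair variant below it differs only in that socketpair(AF_UNIX, SOCK_STREAM, 0, fd) replaces pipe(fd):

/* Pipe sketch: the notifier writes one byte, the waiter blocks in read(). */
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];
    pipe(fd);                            /* fd[0] = read end, fd[1] = write end */

    if (fork() == 0) {                   /* waiter */
        char c;
        close(fd[1]);                    /* not writing from this side */
        while (read(fd[0], &c, 1) == 1) {
            /* wake-up point: timestamp here */
        }
        _exit(0);
    }

    close(fd[0]);                        /* not reading from this side */
    for (int i = 0; i < 1000; i++) {     /* notifier */
        usleep(10);                      /* the N-microsecond sleep */
        write(fd[1], "x", 1);            /* the notification */
    }
    close(fd[1]);                        /* EOF lets the waiter exit */
    wait(NULL);
    return 0;
}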

Socket (socketpair + write/read)

sleep (us)   mean ±sd (us)   median (us)   95% (us)
1            6.07 ±3.2       5.55          6.78
10           7.00 ±7.1       5.51          8.50
100          27.57 ±14.1     28.39         50.86
1000         56.75 ±25.7     50.82         88.74
10000        73.89 ±16.8     77.54         88.46

As a reference, busy waiting:

sleep (us)   mean ±sd (us)   median (us)   95% (us)
1            0.17 ±0.5       0.13          0.23
10           0.15 ±0.3       0.13          0.19
100          0.17 ±0.3       0.16          0.21
1000         0.22 ±0.1       0.18          0.35
10000        0.38 ±0.3       0.30          0.78
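
The busy-waiting reference can be approximated by spinning on a counter in a shared mapping instead of blocking in the kernel; an illustrative sketch (note that the waiter burns a whole core):

/* Busy-wait sketch: the notifier bumps a counter in shared memory,
 * the waiter spins until it changes. No system call on the wake-up path. */
#define _DEFAULT_SOURCE
#include <stdatomic.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    atomic_int *flag = mmap(NULL, sizeof *flag, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    atomic_store(flag, 0);

    if (fork() == 0) {                   /* waiter */
        int seen = 0;
        for (int i = 0; i < 1000; i++) {
            while (atomic_load(flag) == seen)
                ;                        /* spin: burns a whole core */
            seen = atomic_load(flag);
            /* wake-up point: timestamp here */
        }
        _exit(0);
    }

    for (int i = 1; i <= 1000; i++) {    /* notifier */
        usleep(10);                      /* the N-microsecond sleep */
        atomic_store(flag, i);           /* each new value = one notification */
    }
    return 0;
}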

Fastest (low latency) method for Inter-Process Communication between Java and C/C++

I just tested latency from Java on my Core i5 2.8 GHz, sending/receiving only a single byte
between 2 freshly spawned Java processes, without assigning specific CPU cores with taskset:

TCP         - 25 microseconds
Named pipes - 15 microseconds

Now explicitly specifying core masks, like taskset 1 java Srv or taskset 2 java Cli:

TCP, same cores:                       30 microseconds
TCP, explicit different cores:         22 microseconds
Named pipes, same core:                4-5 microseconds !!!!
Named pipes, taskset different cores:  7-8 microseconds !!!!

So:

TCP overhead is visible;
scheduling overhead (or core caches?) is also a culprit.

At the same time, Thread.sleep(0) (which, as strace shows, results in a single sched_yield() Linux kernel call) takes 0.3 microseconds, so named pipes scheduled on a single core still have a lot of overhead.

Some shared memory measurements:
September 14, 2009 – Solace Systems announced today that its Unified Messaging Platform API can achieve an average latency of less than 700 nanoseconds using a shared memory transport.
http://solacesystems.com/news/fastest-ipc-messaging/

P.S. - I tried shared memory the next day, in the form of memory-mapped files.
If busy waiting is acceptable, we can reduce the latency to 0.3 microseconds
for passing a single byte with code like this:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

MappedByteBuffer mem =
    new RandomAccessFile("/tmp/mapped.txt", "rw").getChannel()
        .map(FileChannel.MapMode.READ_WRITE, 0, 1);

while (true) {
    while (mem.get(0) != 5) Thread.sleep(0); // waiting for client request
    mem.put(0, (byte) 10);                   // sending the reply
}

Notes: Thread.sleep(0) is needed so the 2 processes can see each other's changes
(I don't know of another way yet). If the 2 processes are forced onto the same core with taskset,
the latency becomes 1.5 microseconds - that's the context switch delay.

P.P.S. - and 0.3 microseconds is a good number! The following code takes exactly 0.1 microseconds, while doing only a primitive string concatenation:

int j=123456789;
String ret = "my-record-key-" + j + "-in-db";

P.P.P.S. - I hope this is not too much off-topic, but I finally tried replacing Thread.sleep(0) with incrementing a static volatile int variable (the JVM happens to flush CPU caches when doing so) and obtained - a record! - 72 nanoseconds latency for Java-to-Java process communication!

When forced onto the same CPU core, however, the volatile-incrementing JVMs never yield control to each other, producing exactly 10 milliseconds of latency - the Linux time quantum seems to be 5 ms... So this should be used only if there is a spare core - otherwise sleep(0) is safer.

Under Linux, is there any language that offers lower latency than C? (apart from assembler)

The difference between C and asm is unlikely to be a major factor in response latency. After all, before reaching your code, the system will have to run a fair bit of C code in the Linux kernel first, in order to schedule in your process. You'd be better off doing things like turning on threaded interrupt handlers, setting real-time priorities, and disabling BIOS features that may cause system-management-mode traps.
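
To illustrate the real-time priority suggestion, here is a sketch of requesting the SCHED_FIFO class so the process preempts normal tasks as soon as it wakes (requires root or CAP_SYS_NICE; the priority value 80 is arbitrary):

#include <sched.h>
#include <stdio.h>

int main(void) {
    struct sched_param sp = { .sched_priority = 80 };  /* 1..99 for SCHED_FIFO */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)   /* 0 = this process */
        perror("sched_setscheduler");
    /* ... latency-sensitive work here ... */
    return 0;
}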

Efficient (low latency) ways to communicate between processes?

If you can use the NDK, you can use a shared mmapped file object; Linux revolves around this.

Fastest technique to pass messages between processes on Linux?

I would suggest looking at this also: How to use shared memory with Linux in C.

Basically, I'd drop network protocols such as TCP and UDP when doing IPC on a single machine. These have packetization overhead and are bound to even more resources (e.g. ports, the loopback interface).
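
Along the lines of the linked answer, here is a rough sketch of POSIX shared memory between two unrelated processes; the segment name and size are made up, access synchronization is left out, and older glibc needs -lrt:

/* Shared memory sketch: one side creates and maps a named segment,
 * the other side opens the same name and maps it too. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv) {
    const char *name = "/ipc-demo";                   /* illustrative name */
    int creator = (argc > 1);                         /* any argument = creator side */

    int fd = shm_open(name, creator ? O_CREAT | O_RDWR : O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (creator && ftruncate(fd, 4096) != 0) { perror("ftruncate"); return 1; }

    char *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }

    if (creator) {
        strcpy(mem, "hello through shared memory");   /* visible to the other process */
    } else {
        printf("reader sees: %s\n", mem);
        shm_unlink(name);                             /* remove the name when done */
    }
    return 0;
}

Run one instance with any argument to create and write the segment, then a second instance without arguments to read it.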


