Linux IPC: Shared Memory Recovery

It might be a better idea to alter your producer design:

  • Instead of unconditionally using IPC_CREAT, it could first check whether there is an existing segment that could be re-used.

  • You could also consider using mmap-based shared memory instead, which is more flexible in some ways.

  • You could use some other indicator, such as a lock file, to determine if the shared memory interface is still viable.
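As a rough illustration of the first option, a producer could try an exclusive create and fall back to the existing segment when one is already there. This is only a sketch: the helper name open_or_create_segment and its `created` out-parameter are hypothetical, not part of any existing interface.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: return the id of the segment for `key`, creating it
 * only if it does not exist yet. *created is set to 1 when this call
 * created the segment and to 0 when an existing one was re-used. */
int open_or_create_segment(key_t key, size_t size, int mode, int *created)
{
    /* IPC_EXCL makes shmget() fail with EEXIST instead of silently
     * returning a segment that already exists. */
    int id = shmget(key, size, IPC_CREAT | IPC_EXCL | mode);
    if (id != -1) {
        *created = 1;
        return id;
    }
    if (errno != EEXIST)
        return -1;

    /* A segment already exists: look it up without creating anything.
     * A size of 0 accepts whatever size the existing segment has. */
    *created = 0;
    return shmget(key, 0, 0);
}
```

The caller can then decide, based on `created`, whether the segment contents need to be initialised or can be trusted as they are.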

However, if for some reason these are not options (someone else controls the producer code for example) then read on.

There are several things you can do:

  1. Use shmctl() to 'stat' your memory segment:

    // return true if the shared memory region is still 'useful/usable'
    bool checkShm(int shmId)
    {
        struct shmid_ds statBuf;
        int res = shmctl(shmId, IPC_STAT, &statBuf);  // IPC_STAT needs a pointer to the buffer
        if (res == -1) return false;
        ...

  2. Check if the region is marked for deletion (Linux specific):

        if ((statBuf.shm_perm.mode & SHM_DEST) != 0) return false;

  3. Assuming you attached after the producer, and the producer is the creator process, check whether it detached after you.
     Caveat: it could have reattached again if your design allows this.

        if (statBuf.shm_cpid == statBuf.shm_lpid) return false;

  4. Check that the PID of the creator process belongs to a running process.
     Caveat: the PID could have been recycled by a new process.

        if (getpgid(statBuf.shm_cpid) == -1) return false;

     Note: you could use kill(statBuf.shm_cpid, 0) instead if the producer does not run as a different user.

  5. You might also want to check if the file has been modified.
     A key point is that ftok uses the inode number, not the actual filename as the man page suggests, so you need to be careful using it:

        struct stat fstatBuf;
        res = stat(fileName, &fstatBuf);
        if (res == -1) return false; // if the file has disappeared it could be a bad sign!
        if (fstatBuf.st_ino != savedInode) return false;

Having done all this you should now have a reasonably good way to check if the SHM you think is still useful is actually being used by the 'producer' you think it is.
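Put together, the checks might look roughly like the function below. It is a sketch rather than a drop-in implementation: the name shm_still_useful, the check_last_op switch (the "last operation" check only makes sense if you attached after the producer) and the key_file / saved_inode parameters are assumptions made for illustration.

```c
#define _GNU_SOURCE              /* SHM_DEST is a Linux extension */
#include <stdbool.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return true if the segment still looks usable.
 * check_last_op: also fail when the creator performed the last shmat/shmdt.
 * key_file:      file that was passed to ftok(), or NULL to skip that check.
 * saved_inode:   inode number of key_file recorded when you first attached. */
bool shm_still_useful(int shmid, bool check_last_op,
                      const char *key_file, ino_t saved_inode)
{
    struct shmid_ds ds;

    /* 1. 'stat' the segment; this fails once the id is no longer valid. */
    if (shmctl(shmid, IPC_STAT, &ds) == -1) return false;

    /* 2. Marked for deletion (Linux specific). */
    if ((ds.shm_perm.mode & SHM_DEST) != 0) return false;

    /* 3. The creator was the last process to attach or detach. */
    if (check_last_op && ds.shm_cpid == ds.shm_lpid) return false;

    /* 4. The creator PID no longer refers to a running process. */
    if (getpgid(ds.shm_cpid) == -1) return false;

    /* 5. The file behind the ftok() key vanished or was replaced. */
    if (key_file != NULL) {
        struct stat st;
        if (stat(key_file, &st) == -1) return false;
        if (st.st_ino != saved_inode) return false;
    }
    return true;
}
```

The caveats from the list still apply: a recycled PID or a producer that re-attaches can make the result misleading in either direction.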


  6. Clean up the stale shared memory segment

You are now free to detach from the segment with shmdt() and to try to remove it with shmctl(shmid, IPC_RMID, NULL). The consumer process might not have permission to remove it if the creator did not grant it.
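A small sketch of that clean-up, distinguishing "removed" from "detached but not allowed to remove". The helper name release_segment and its return convention are made up for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: detach from the segment and try to remove it.
 * Returns  0 if the segment was detached and marked for removal,
 *          1 if it was detached but we lack permission to remove it,
 *         -1 on any other failure (errno is left set by the failing call). */
int release_segment(const void *addr, int shmid)
{
    if (shmdt(addr) == -1)
        return -1;

    if (shmctl(shmid, IPC_RMID, NULL) == 0)
        return 0;

    /* EPERM: we are neither owner nor creator and are not privileged. */
    return (errno == EPERM) ? 1 : -1;
}
```

A return value of 1 is not necessarily an error for a consumer; it just means that the stale segment stays around until someone with the right permissions removes it.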


  7. Attach to the replacement shared memory segment

You are then in principle able to attach to any new shared memory segment created by a replacement producer process:

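auto key = ftok(<somefile>, <someid>);
int shmid = shmget(key, 0, 0);  // look up the existing segment; shmat() takes an id, not a key
void* memArea = shmat(shmid, NULL, 0);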
// check errors and do stuff...

But a cruel and interesting punishment awaits you: it will not work immediately. You have to wait a while and retry periodically. I guess this is until the operating system has had a chance to clean up the old memory segment.

I found that ftok() returns -1 for a while despite the file existing and having the same inode as the original file.
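The wait-and-retry step could be sketched as follows. The function name attach_with_retry and the attempts / delay_sec parameters are assumptions; suitable values depend on how quickly the replacement producer comes up.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

/* Hypothetical helper: retry ftok() + shmget() + shmat() until the
 * replacement segment can be attached. Returns the mapped address, or
 * NULL if every attempt failed. */
void *attach_with_retry(const char *path, int proj_id,
                        int attempts, unsigned delay_sec)
{
    for (int i = 0; i < attempts; i++) {
        key_t key = ftok(path, proj_id);
        if (key != (key_t)-1) {
            /* Look up only; never create the segment from the consumer. */
            int shmid = shmget(key, 0, 0);
            if (shmid != -1) {
                void *mem = shmat(shmid, NULL, 0);
                if (mem != (void *)-1)
                    return mem;
            }
        }
        if (i + 1 < attempts)
            sleep(delay_sec);   /* give the producer time to come back */
    }
    return NULL;
}
```

For example, attach_with_retry(path, id, 30, 1) would poll once per second for roughly half a minute before giving up.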

Relationship between shared memory and files

The path given to ftok is just a placeholder in the "everything is a file" tradition.

After some consideration, I would say mmap is a simpler, safer and more effective API. I personally would avoid shmget etc. completely if possible.

It is particularly awkward to clean up after shmget(); see:
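For comparison, a file-backed mmap() version can be quite short. This is a minimal sketch, assuming a plain file is acceptable as backing store; the function name map_shared_file is invented here.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `size` bytes of the file at `path` as memory
 * shared with every other process that maps the same file.
 * Returns the mapping, or NULL on failure. */
void *map_shared_file(const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return NULL;

    /* Make sure the file is large enough to back the whole mapping. */
    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return NULL;
    }

    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping stays valid after close() */
    return (mem == MAP_FAILED) ? NULL : mem;
}
```

Here the shared region has an ordinary file name, so the usual file tools and permissions apply, and removing it is a matter of munmap() plus unlink().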

  • Linux IPC: shared memory recovery

See also:

  • Linux shared memory: shmget() vs mmap()?
  • System V IPC vs POSIX IPC
  • How do you correctly cleanup and re-use SysV shared memory segments?

An argument used in favour of System V in one of the linked questions is that POSIX was less widely implemented. I don't know if that was true then, but it seems even less likely to be true now, given the dominance of GNU/Linux and BSD derivatives. Even the legendarily 'unique' AIX claims POSIX compliance these days.

How do you correctly cleanup and re-use SysV shared memory segments?

I was misled into thinking it was proper form to call shmctl(segmentId, IPC_RMID) as soon as the process designated as the owner has attached to the shared memory.

In fact IPC_RMID should not be called until all processes have attached.

Part of the answer is here:

https://comp.unix.programmer.narkive.com/iLg3PhfZ/shmctl-ipc-rmid-oddity

It seems that IPC_RMID makes the segment private (its key becomes IPC_PRIVATE), so that no new process can look it up by key and attach to it.

A way of guaranteeing a unique segment is deliberately using IPC_PRIVATE to start with:

id = shmget(IPC_PRIVATE, size, IPC_CREAT | mode);

This also avoids the need to use ftok() and risk colliding with another segment.
Unfortunately I cannot use that here as the interface is predicated on identifying the segment with ftok(). At least I understand the issue here.
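Where IPC_PRIVATE is an option, for instance when every user of the segment is a child created with fork() after the attach, the create / attach / remove sequence might be sketched like this. The helper name create_private_segment is an assumption.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: create a private segment, attach it, and mark it
 * for removal straight away. Children forked afterwards inherit the
 * attachment; the kernel frees the memory when the last one detaches. */
void *create_private_segment(size_t size, int mode)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | mode);
    if (id == -1)
        return NULL;

    void *mem = shmat(id, NULL, 0);

    /* Safe here because this process has already attached (or failed to,
     * in which case the segment is destroyed immediately). */
    shmctl(id, IPC_RMID, NULL);

    return (mem == (void *)-1) ? NULL : mem;
}
```

Because the segment is already marked for removal, nothing is left behind even if every process using it crashes.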

Someone wiser may be able to chip in with better ways of cleaning up before re-use.

See also https://www.linuxquestions.org/questions/programming-9/shmctl-ipc_rmid-precludes-further-attachments-574636/

Also consider this question and answer: Linux IPC: shared memory recovery

IPC - How to redirect a command output to a shared memory segment in child

How to redirect the stdout of ls -l

We must shed more light on the processes (parent and children) involved in this code.
How many processes does your program create during its run?
The correct answer is three.
Two processes are the parent and the explicitly forked child.
The third one is created by the system("ls -l") call.
This function implicitly forks another process that executes (by calling an exec family function) the "ls -l" shell command. What you need to redirect is the output of the child process created by the system() function. Sadly, system() does not establish any IPC between the participants. If you need to work with the output, do not use system().

I agree with @leeduhem: popen() could be the best approach.
It works like system(), i.e. it forks a new process and executes "ls -l".
In addition, it establishes a pipe between the participants, so it is easy to capture the child's output and do whatever you want with it:

char buff[1024];
FILE *fd;

// instead of system("ls -l")
fd = popen("ls -l", "r");
// check for errors

while(fgets(buff, sizeof(buff), fd) != NULL)
{
// write to the shared memory
}

pclose(fd);

If you do not want to use the popen() function, you can write a similar one yourself.
The general approach is:

  1. open a pipe()
  2. fork() a new process
  3. in the child, redirect stdout to the write end of the pipe using dup2()
  4. in the child, call a suitable exec() function (probably execl()) to execute "ls -l"
  5. in the parent, read from the read end of the pipe.
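Those five steps could be sketched as below. The function name capture_ls and its buffer interface are assumptions for the example; `out` could just as well point into an attached shared memory segment. execlp() is used here instead of execl() so that ls is found via PATH.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "ls -l <dir>" and copy at most cap-1 bytes of
 * its output into `out` (NUL-terminated). Returns the number of bytes
 * stored, or -1 on failure. */
ssize_t capture_ls(const char *dir, char *out, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) == -1)             /* step 1 */
        return -1;

    pid_t pid = fork();                          /* step 2 */
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);                           /* child only writes */
        dup2(fds[1], STDOUT_FILENO);             /* step 3 */
        close(fds[1]);
        execlp("ls", "ls", "-l", dir, (char *)NULL);  /* step 4 */
        _exit(127);                              /* exec failed */
    }

    close(fds[1]);                               /* parent only reads */
    size_t used = 0;
    ssize_t n;
    while ((n = read(fds[0], out + used, cap - 1 - used)) > 0)  /* step 5 */
        used += (size_t)n;
    out[used] = '\0';

    /* Closing the read end before waiting lets the child exit even if
     * the buffer filled up before all of its output was consumed. */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)used;
}
```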

I have a c++ program running on linux, is it possible to have it periodically store state snapshots in shared memory for post-crash recovery?

Just an idea (not tried) on Unix-like systems.

Do a fork(2) and send a SIGTRAP signal to the child process (or any other signal that produces a core dump).

fork() makes a copy of the original process, so the core dump captures its full memory state. It can then be analysed with gdb (or similar). Of course, this is not for recovery...

You can create a gdbinit file and dump the variables from a script that calls gdb with the core file.
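A minimal sketch of the fork-and-dump idea, with the hypothetical name snapshot_via_core. Whether a core file is actually written, and where, depends on the core size limit (ulimit -c) and on the system's core pattern, which this code does not touch.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a copy of the current process and let the copy
 * die from SIGTRAP, whose default action is to terminate with a core dump.
 * Returns the signal that terminated the child, or -1 on failure. */
int snapshot_via_core(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        signal(SIGTRAP, SIG_DFL);  /* undo any handler inherited from the parent */
        raise(SIGTRAP);
        _exit(1);                  /* not reached unless the signal is blocked */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The parent carries on running after the call; only the forked copy is terminated.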

Why is shared memory needed? Would it not be good enough to dump the state to disk?


I think this can be used for recovery as well. Perl's -u command-line argument does a similar thing: it parses the script file and dumps a core file. This core file can then be used by the undump program to load the core directly into memory and start perl without the parsing phase.

How does shared memory work behind the scene in Linux?

Every process has its own virtual memory space. To simplify things a bit, you can imagine that a process has all possible memory addresses 0x00000000..0xffffffff available for itself. One consequence of this is that a process cannot use memory allocated to any other process; this is absolutely essential for both stability and security.

Behind the scenes, the kernel manages the allocations of all processes and maps them to physical memory, making sure they don't overlap. Of course, not all addresses are in fact mapped, only those that are being used. This is done by means of pages, and with the help of the memory management unit (MMU) in the CPU hardware.

Creating shared memory (shmget) allocates a chunk of memory that does not belong to any particular process. It just sits there. From the kernel's point of view, it doesn't matter who uses it. So a process has to request access to it; that's the role of shmat. In response, the kernel maps the shared memory into the process's virtual memory space. This way, the process can read and write it. Because it's the same memory, all processes that have "attached" to it see the same contents. Any change a process makes is visible to the other processes as well.
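To see this in action, the sketch below lets a child process write a value into a segment and the parent read it back through its own, separate attachment. The function name child_writes_parent_reads is made up for the example, and waitpid() stands in for proper synchronisation.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the child attaches and writes `value`; the parent
 * attaches separately and reads it. Returns what the parent saw, or -1. */
int child_writes_parent_reads(int value)
{
    /* The segment exists independently of any process. */
    int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        int *p = shmat(id, NULL, 0);   /* map into the child's address space */
        if (p == (void *)-1)
            _exit(1);
        *p = value;
        shmdt(p);
        _exit(0);
    }

    int result = -1;
    if (pid > 0) {
        waitpid(pid, NULL, 0);         /* crude synchronisation: wait for the write */
        int *p = shmat(id, NULL, 0);   /* map into the parent's address space */
        if (p != (void *)-1) {
            result = *p;
            shmdt(p);
        }
    }
    shmctl(id, IPC_RMID, NULL);        /* nobody is attached any more */
    return result;
}
```

The two shmat() calls may well return different addresses in the two processes; what is shared is the underlying memory, not the virtual address.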

C - System V - remove shared memory segment

As a follow-up to the comments, this shows how to mark a shared memory segment for destruction:

shmid1 = shmget(key1, 1024, 0666 | IPC_CREAT);
...
shmctl(shmid1, IPC_RMID, NULL);

