SIGSTOP/SIGCONT POSIX behavior

I'm playing around with signals: SIGSTOP and SIGCONT in particular. Here is a test program I wrote. The idea is to create a chain of N + 1 processes (including the main process). Each one has to wait for its child to stop, then stop itself. The main process has to wake up its child when the latter has stopped.

To do so, the function f recursively creates the process chain. Each process calls sigsuspend to wait for SIGCHLD, except the last child, which stops itself directly. When its child has stopped, a process receives SIGCHLD and can then stop in turn. When the main process receives SIGCHLD, all the other processes are stopped, so it sends SIGCONT to its child. Each process then sends SIGCONT to its own child and exits, except the last child, which just exits.

I tried to keep it readable: I removed the return-code checks and added some comments.

When I run the program, everything seems to work except the SIGCONT chain: some processes are woken up, but not all of them. Looking at the running processes afterwards (with ps, for example), everything seems fine: no blocked processes remain. I don't really understand what is wrong in this program. Any help or hint would be welcome.

Here is a sample trace. As you can see, the "fork chain" goes well: each process suspends on SIGCHLD. Then the last child is spawned and stops, which triggers a "SIGCHLD chain" up through the parents, because each process stops itself in turn. When the main process is notified of a SIGCHLD, it sends SIGCONT to its child, which wakes up and in turn sends SIGCONT to its own child, and so on. Notice that this chain is not complete:

$ ./bin/trycont 
n   pid     log
0   6257    "suspending on SIGCHLD"
1   6258    "suspending on SIGCHLD"
2   6259    "suspending on SIGCHLD"
3   6260    "suspending on SIGCHLD"
4   6261    "suspending on SIGCHLD"
5   6262    "last child - stopping"
4   6261    "got SIGCHLD"
4   6261    "stopping"
3   6260    "got SIGCHLD"
3   6260    "stopping"
2   6259    "got SIGCHLD"
2   6259    "stopping"
1   6258    "got SIGCHLD"
1   6258    "stopping"
0   6257    "got SIGCHLD"
0   6257    "sending SIGCONT to 6258"
1   6258    "awakened - sending SIGCONT to 6259"
2   6259    "awakened - sending SIGCONT to 6260"
# <- not the expected trace

Here is the program: src/trycont.c

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>

/* number of created processes with fork
 */
#define N 5

#define printHeader() printf("n\tpid\tlog\n");
#define printMsg(i, p, str, ...) printf("%d\t%d\t" #str "\n", i, p, ##__VA_ARGS__)

void f(int n);
void handler(int sig);

sigset_t set;
struct sigaction action;

int main(int argc, char *argv[])
{
    /* mask SIGCHLD
     */
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    sigprocmask(SIG_SETMASK, &set, NULL);

    /* handler will be called when SIGCHLD is sent to the process
     * during the handler, SIGCHLD will be masked (sa_mask)
     */
    action.sa_mask = set;
    action.sa_handler = handler;
    action.sa_flags = 0;

    /* SIGCHLD will trigger action
     */
    sigaction(SIGCHLD, &action, NULL);

    /* start
     */
    printHeader();
    f(N);

    exit(EXIT_SUCCESS);
}

void f(int n)
{
    pid_t p, pc;
    int myIndex;

    myIndex = N - n;
    p = getpid();

    if (n == 0)
    {
        /* last child
         */
        printMsg(myIndex, p, "last child - stopping");
        kill(p, SIGSTOP);
        printMsg(myIndex, p, "END REACHED");
        exit(EXIT_SUCCESS);
    }

    pc = fork();

    if (pc == 0)
    {
        /* recursion
         */
        f(n - 1);

        /* never reached
         * because of exit
         */
    }

    /* father
     */

    /* suspending on SIGCHLD
     * need to unmask the signal
     * and suspend
     */
    printMsg(myIndex, p, "suspending on SIGCHLD");

    sigfillset(&set);
    sigdelset(&set, SIGCHLD);
    sigsuspend(&set);

    printMsg(myIndex, p, "got SIGCHLD");

    if (n < N)
    {
        /* child process
         * but not last
         */
        printMsg(myIndex, p, "stopping");
        kill(p, SIGSTOP);

        printMsg(myIndex, p, "awakened - sending SIGCONT to %d", pc);
        kill(pc, SIGCONT);
    }
    else
    {
        /* root process
         */
        printMsg(myIndex, p, "sending SIGCONT to %d", pc);
        kill(pc, SIGCONT);
    }

    exit(EXIT_SUCCESS);
}

void handler(int sig)
{
    switch (sig)
    {
    case SIGCHLD:
        /* when the process received SIGCHLD
         * we can ignore upcoming SIGCHLD
         */
        action.sa_handler = SIG_IGN;
        sigaction(SIGCHLD, &action, NULL);
        break;
    default:
        break;
    }
}

Here is a Makefile if you need one:

CC=gcc
DEFINES=-D_POSIX_C_SOURCE
STD=-std=c11 -Wall -Werror
OPTS=-O2
CFLAGS=$(STD) $(DEFINES) $(OPTS) -g
LDFLAGS=

SRC=src
OBJ=obj
BIN=bin

DIRS=$(BIN) $(OBJ)

.PHONY: mkdirs clean distclean

all: mkdirs $(BIN)/trycont

$(BIN)/%: $(OBJ)/%.o
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $<

$(OBJ)/%.o: $(SRC)/%.c
	$(CC) $(CFLAGS) -c -o $@ $<

mkdirs:
	- mkdir $(DIRS)

clean:
	rm -vf -- $(OBJ)/*.o

distclean: clean
	rm -vfr -- $(DIRS)

Some (all?) of your descendant processes are dying of a system-generated SIGHUP when the first process terminates.

This is expected POSIX behavior under certain circumstances.

When you start the root process from your shell, it is a process group leader, and its descendants are members of that group. When that leader terminates, the process group is orphaned. When the system detects a newly-orphaned process group in which any member is stopped, then every member of the process group is sent a SIGHUP followed by a SIGCONT.
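To check the group membership described above on your own system, here is a minimal sketch. The helper name child_shares_group is hypothetical (it is not part of your program); it forks once and reports whether the child ended up in the caller's process group.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child and report whether it landed in
 * the caller's process group.
 * Returns 1 if it did, 0 if it did not, -1 on error. */
int child_shares_group(void)
{
    pid_t parent_group = getpgrp();
    pid_t pc = fork();
    int status;

    if (pc < 0)
        return -1;

    if (pc == 0) {
        /* child: encode the answer in the exit status */
        _exit(getpgrp() == parent_group ? 0 : 1);
    }

    if (waitpid(pc, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status) == 0;
}
```

Since a forked child inherits its parent's process group unless someone calls setpgid, this should return 1, which is why every process in your chain belongs to the root's group.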

So, some of your descendant processes are still stopped when the leader terminates, and thus everyone receives a SIGHUP followed by a SIGCONT, which for practical purposes means they die of SIGHUP, since its default action is to terminate the process.
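One way to confirm this diagnosis is to catch SIGHUP and log it. The following is only a sketch: got_sighup, on_sighup and install_sighup_probe are hypothetical names, and I am assuming you would call install_sighup_probe() in main before f(N), so that every forked process inherits the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flag, set once SIGHUP has been delivered. */
volatile sig_atomic_t got_sighup = 0;

static void on_sighup(int sig)
{
    static const char msg[] = "got SIGHUP\n";
    ssize_t written;

    (void)sig;
    got_sighup = 1;

    /* write() is async-signal-safe; printf() is not. */
    written = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)written;
}

/* Hypothetical helper: install the probe.
 * Returns 0 on success, -1 on error (like sigaction). */
int install_sighup_probe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);

    return sigaction(SIGHUP, &sa, NULL);
}
```

Be aware that installing the handler changes the outcome: a caught SIGHUP no longer kills the process, so the stopped descendants resume on the SIGCONT that follows and carry on. That makes it a diagnostic aid rather than a fix.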

Exactly which descendants are still stopped (or even just merrily advancing toward exit() ) is a timing race. On my system, the leader terminates so quickly that none of the descendants are able to print anything.
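A possible way to avoid the race is to make sure no process exits while one of its descendants is still alive. The sketch below is not your program: it swaps the sigsuspend/SIGCHLD handler for waitpid with WUNTRACED to detect that the child has stopped, then waits for the child to exit after sending SIGCONT. The names chain_link and run_chain are hypothetical, and the trace output is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical replacement for f(): one link of the chain.
 * Returns 0 once this process and all its descendants finished cleanly. */
static int chain_link(int n, int is_root)
{
    pid_t pc = -1;
    int status;

    if (n > 0) {
        pc = fork();
        if (pc < 0)
            return 1;
        if (pc == 0)
            _exit(chain_link(n - 1, 0));

        /* block until the child reports that it has stopped */
        if (waitpid(pc, &status, WUNTRACED) < 0 || !WIFSTOPPED(status))
            return 1;
    }

    /* everyone but the root stops here until its parent sends SIGCONT */
    if (!is_root)
        kill(getpid(), SIGSTOP);

    if (pc > 0) {
        kill(pc, SIGCONT);

        /* outlive the child, so the leader is the last one to exit */
        if (waitpid(pc, &status, 0) < 0)
            return 1;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return 1;
    }

    return 0;
}

/* Builds a chain of depth + 1 processes, the caller being the root. */
int run_chain(int depth)
{
    assert(depth >= 0);
    return chain_link(depth, 1);
}
```

Calling run_chain(N) from main would take the place of f(N). Because every parent waits for its child, the processes exit from the end of the chain back to the root, so the group leader never terminates while a member of its group is stopped.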
