Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch signals using a self-pipe (follow-up to #98 and #132) #143

Merged
merged 1 commit into from
Dec 3, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
106 changes: 41 additions & 65 deletions ttyplot.c
Original file line number Diff line number Diff line change
Expand Up @@ -54,10 +54,7 @@
#define T_LLCR ACS_LLCORNER
#endif

static sigset_t block_sigset;
static sigset_t empty_sigset;
static volatile sig_atomic_t sigint_pending = 0;
static volatile sig_atomic_t sigwinch_pending = 0;
static int signal_read_fd, signal_write_fd;
static cchar_t plotchar, max_errchar, min_errchar;
static struct timeval now;
static double td;
Expand Down Expand Up @@ -281,40 +278,12 @@ static void paint_plot(void) {
move(0,0);
}

static void resize(int signum) {
(void)signum;
sigwinch_pending = 1;
}

static void finish(int signum) {
(void)signum;
sigint_pending = 1;
}

static int pselect_without_signal_starvation(
int nfds,
fd_set * readfds,
fd_set * writefds,
fd_set * exceptfds,
const struct timespec * timeout,
const sigset_t * sigmask) {
// With high pressure on file descriptors (e.g. with "ttyplot < /dev/zero")
// a call to `pselect` could stall signal delivery for 20+ seconds on Linux.
// To avoid that situation, we first do a call to `pselect` that is dedicated
// to signal delivery and that only.
// (Related: https://stackoverflow.com/q/62315082)
const struct timespec zero_timeout = { .tv_sec = 0, .tv_nsec = 0 };

// First call, signal delivery only
const int select_ret = pselect(0, NULL, NULL, NULL, &zero_timeout, sigmask);

const bool signal_received = ((select_ret == -1) && (errno == EINTR));
if (signal_received) {
return select_ret;
}

// Second call
return pselect(nfds, readfds, writefds, exceptfds, timeout, sigmask);
// Send signals through a pipe, in order to catch them without race conditions.
// pselect() could be an alternative, but it is unreliable on Linux.
// (Related: https://stackoverflow.com/q/62315082)
static void signal_handler(int signum) {
const unsigned char signal_number = (unsigned char) signum; // signum is either 2 (SIGINT) or 28 (SIGWINCH)
write(signal_write_fd, &signal_number, 1);
}

static void redraw_screen(const char * errstr) {
Expand Down Expand Up @@ -585,39 +554,42 @@ int main(int argc, char *argv[]) {
cbreak();
}

sigemptyset(&empty_sigset);

sigemptyset(&block_sigset);
sigaddset(&block_sigset, SIGINT);
sigaddset(&block_sigset, SIGWINCH);
sigprocmask(SIG_BLOCK, &block_sigset, NULL);
signal(SIGWINCH, signal_handler);
signal(SIGINT, signal_handler);

signal(SIGWINCH, resize);
signal(SIGINT, finish);
int signal_fds[2];
if (pipe(signal_fds) != 0) {
perror("pipe");
exit(1);
}
signal_read_fd = signal_fds[0];
signal_write_fd = signal_fds[1];

while(1) {
// Block until (a) we receive a signal or (b) stdin can be read without blocking
// or (c) timeout expires, in order to reduce use of CPU and power while idle
fd_set read_fds;
FD_ZERO(&read_fds);
int select_nfds = 0;
FD_SET(signal_read_fd, &read_fds);
int select_nfds = signal_read_fd + 1;
if (stdin_is_open) {
FD_SET(STDIN_FILENO, &read_fds);
select_nfds = STDIN_FILENO + 1;
if (STDIN_FILENO >= select_nfds)
select_nfds = STDIN_FILENO + 1;
}
if (tty != -1) {
FD_SET(tty, &read_fds);
if (tty >= select_nfds)
select_nfds = tty + 1;
}
// Refresh the clock on the next full second.
const int nanoseconds_per_second = 1e9;
const int nanoseconds_remaining = nanoseconds_per_second - now.tv_usec * 1000;
struct timespec timeout = {
.tv_sec = nanoseconds_remaining / nanoseconds_per_second,
.tv_nsec = nanoseconds_remaining % nanoseconds_per_second
const int microseconds_per_second = 1e6;
const int microseconds_remaining = microseconds_per_second - now.tv_usec;
struct timeval timeout = {
.tv_sec = microseconds_remaining / microseconds_per_second,
.tv_usec = microseconds_remaining % microseconds_per_second
};
const int select_ret = pselect_without_signal_starvation(select_nfds, &read_fds, NULL, NULL, &timeout, &empty_sigset);
const int select_ret = select(select_nfds, &read_fds, NULL, NULL, &timeout);
Copy link
Collaborator

@hartwork hartwork Dec 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgar-bonet I only notice now that with this falling back to plain select rather than pselect, this would not fix the race but just move the race elsewhere? Am I missing something?

Also, if we do blocking reads and writes to the pipe, isn't that a problem, given that writing only works when there already is a reader waiting? Do we need to put it in non-blocking mode and read as many signals as there are rather than just one?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no race here. Signals are never blocked, so the write to the pipe will always happen as soon as a signal arrives. And the signal will be handled on the next loop iteration.

Blocking reads should not block, as per the select promise. We may put all the file descriptors in non-blocking mode if we want to be extra defensive. Writes could only block if the pipe gets full, which can only happen if you send several thousand signals per second. You may be able to do that programmatically, but not by resizing the terminal or typing Ctrl-C.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[..] Signals are never blocked [..]

@edgar-bonet while you say it like that, do we need to bring back signal blocking around interaction with ncurses a la pull request #130 then?

[T]he write to the pipe will always happen as soon as a signal arrives. And the signal will be handled on the next loop iteration.

Blocking reads should not block, as per the select promise.

Understood, yes.

We may put all the file descriptors in non-blocking mode if we want to be extra defensive. Writes could only block if the pipe gets full, which can only happen if you send several thousand signals per second. You may be able to do that programmatically, but not by resizing the terminal or typing Ctrl-C.

I mis-remembered then probably: I thought that writes to a pipe always block until there is a reader 🤔 man 7 pipe is with you.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to bring back signal blocking around interaction with ncurses a la pull request #130 then?

We don't. The signal handler does nothing but a write, which is async-signal-safe as per POSIX. The signal can really interrupt us any time without disrupting the program.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgar-bonet isn't the important question whether the ncurses functions will stop their job half way when interrupted by a signal or not? From a quick look the man pages of erase and refresh say nothing about signals. I think that's what we'd need to know to be sure of no disruption. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an ncurses function is interrupted by a signal, it will stop halfway... then resume from where it stopped, and not even be aware that it had been interrupted. It is not very different from what happens to any running program during a hardware interrupt.

Most of these functions are not async-signal-safe, but this does not mean it is unsafe to interrupt them: it means it is unsafe to call them from a signal handler.

That being said, it is conceivable that some ncurses function is buggy and mishandles a syscall failing with EINTR. In that case it would make sense to work around that bug by blocking signals around the call. However, I would avoid implementing workarounds for bugs we do not even know they exist.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgar-bonet okay, let's note though that when we learn about that later, it will have been this move back to select from pselect that caused it and that — compared to 1.5.2 — now we have select without blocking signals while we had select with blocking signals in 1.5.2. There are multiple ways to see this and so going forward with #143 means that you are taking responsibility for dropping these blocks compared to 1.5.2; I cannot take the responsibility, because I cannot guarantee, that it will not cause new problems. Is that aware and okay with you? I'll do a last round of review then…

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had select with blocking signals in 1.5.2.

Let me take a look at 1.5.2 again... I see. That version called ncurses from the SIGWINCH signal handler, which is inherently unsafe and can lead, e.g., to memory corruption. This is reported in issue #62.

That version also blocked signals around calls to ncurses. This is a reasonable mitigation strategy: it prevents ncurses from being called on top of an interrupted ncurses function, and should thus prevent ncurses from corrupting its own state. The mitigation is not perfect though, and not guaranteed to always work. For example, it cannot prevent ncurses from calling malloc on top of an interrupted malloc call.

Now that we have a signal handler that is completely async-signal-safe, that mitigation has lost its purpose.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgar-bonet I'll take that as "I'm sure enough that a block-do-unblock wrapper is not needed to take full responsibility for not re-adding some". Good.


gettimeofday(&now, NULL);

Expand All @@ -626,17 +598,21 @@ int main(int argc, char *argv[]) {
redraw_needed = true;

// Handle signals.
if (sigint_pending) {
break;
}
if (sigwinch_pending) {
sigwinch_pending = 0;
endwin();
initscr();
erase();
refresh();
gethw();
redraw_needed = true;
if (select_ret > 0 && FD_ISSET(signal_read_fd, &read_fds)) {
unsigned char signal_number;
const ssize_t count = read(signal_read_fd, &signal_number, 1);
if (count > 0) {
if (signal_number == SIGINT)
break;
if (signal_number == SIGWINCH) {
endwin();
initscr();
erase();
refresh();
gethw();
redraw_needed = true;
}
}
}

// Handle user's keystrokes.
Expand Down