Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch signals using a self-pipe (follow-up to #98 and #132) #143

Merged
merged 1 commit into from
Dec 3, 2023

Conversation

edgar-bonet
Copy link
Collaborator

As noticed by @hartwork, pselect is not a reliable way of catching signals on Linux: in case of high input pressure, signal delivery is starved. Pull request #132 partially fixed the issue by issuing two consecutive pselect calls, but this left a race window between these two calls.

This pull request fixes the issue without race conditions by using plain select() and a self-pipe, which makes signal delivery reliable.

This is one of two possible alternatives, the other one being signalfd (#142).

Copy link
Collaborator

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgar-bonet works well in practices, I will likely favor this over alternative #142 (..). Please see four tiny to small things below:

ttyplot.c Outdated Show resolved Hide resolved
ttyplot.c Outdated Show resolved Hide resolved
ttyplot.c Outdated Show resolved Hide resolved
ttyplot.c Outdated Show resolved Hide resolved
@hartwork hartwork added this to the 1.6.0 milestone Dec 1, 2023
@hartwork hartwork changed the title Catch signals using a self-pipe Catch signals using a self-pipe (follow-up to #98 and #132) Dec 1, 2023
struct timespec timeout = { .tv_sec = 0, .tv_nsec = 500e6 }; // 500 milliseconds for refreshing the clock
const int select_ret = pselect_without_signal_starvation(select_nfds, &read_fds, NULL, NULL, &timeout, &empty_sigset);
struct timeval timeout = { .tv_sec = 0, .tv_usec = 500e3 }; // 500 milliseconds for refreshing the clock
const int select_ret = select(select_nfds, &read_fds, NULL, NULL, &timeout);
Copy link
Collaborator

@hartwork hartwork Dec 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgar-bonet I only notice now that with this falling back to plain select rather than pselect, this would not fix the race but just move the race elsewhere? Am I missing something?

Also, if we do blocking reads and writes to the pipe, isn't that a problem, given that writing only works when there already is a reader waiting? Do we need to put it in non-blocking mode and read as many signals as there are rather than just one?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no race here. Signals are never blocked, so the write to the pipe will always happen as soon as a signal arrives. And the signal will be handled on the next loop iteration.

Blocking reads should not block, as per the select promise. We may put all the file descriptors in non-blocking mode if we want to be extra defensive. Writes could only block if the pipe gets full, which can only happen if you send several thousand signals per second. You may be able to do that programmatically, but not by resizing the terminal or typing Ctrl-C.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[..] Signals are never blocked [..]

@edgar-bonet while you say it like that, do we need to bring back signal blocking around interaction with ncurses a la pull request #130 then?

[T]he write to the pipe will always happen as soon as a signal arrives. And the signal will be handled on the next loop iteration.

Blocking reads should not block, as per the select promise.

Understood, yes.

We may put all the file descriptors in non-blocking mode if we want to be extra defensive. Writes could only block if the pipe gets full, which can only happen if you send several thousand signals per second. You may be able to do that programmatically, but not by resizing the terminal or typing Ctrl-C.

I mis-remembered then probably: I thought that writes to a pipe always block until there is a reader 🤔 man 7 pipe is with you.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to bring back signal blocking around interaction with ncurses a la pull request #130 then?

We don't. The signal handler does nothing but a write, which is async-signal-safe as per POSIX. The signal can really interrupt us any time without disrupting the program.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgar-bonet isn't the important question whether the ncurses functions will stop their job half way when interrupted by a signal or not? From a quick look the man pages of erase and refresh say nothing about signals. I think that's what we'd need to know to be sure of no disruption. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an ncurses function is interrupted by a signal, it will stop halfway... then resume from where it stopped, and not even be aware that it had been interrupted. It is not very different from what happens to any running program during a hardware interrupt.

Most of these functions are not async-signal-safe, but this does not mean it is unsafe to interrupt them: it means it is unsafe to call them from a signal handler.

That being said, it is conceivable that some ncurses function is buggy and mishandles a syscall failing with EINTR. In that case it would make sense to work around that bug by blocking signals around the call. However, I would avoid implementing workarounds for bugs we do not even know they exist.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgar-bonet okay, let's note though that when we learn about that later, it will have been this move back to select from pselect that caused it and that — compared to 1.5.2 — now we have select without blocking signals while we had select with blocking signals in 1.5.2. There are multiple ways to see this and so going forward with #143 means that you are taking responsibility for dropping these blocks compared to 1.5.2; I cannot take the responsibility, because I cannot guarantee, that it will not cause new problems. Is that aware and okay with you? I'll do a last round of review then…

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had select with blocking signals in 1.5.2.

Let me take a look at 1.5.2 again... I see. That version called ncurses from the SIGWINCH signal handler, which is inherently unsafe and can lead, e.g., to memory corruption. This is reported in issue #62.

That version also blocked signals around calls to ncurses. This is a reasonable mitigation strategy: it prevents ncurses from being called on top of an interrupted ncurses function, and should thus prevent ncurses from corrupting its own state. The mitigation is not perfect though, and not guaranteed to always work. For example, it cannot prevent ncurses from calling malloc on top of an interrupted malloc call.

Now that we have a signal handler that is completely async-signal-safe, that mitigation has lost its purpose.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgar-bonet I'll take that as "I'm sure enough that a block-do-unblock wrapper is not needed to take full responsibility for not re-adding some". Good.

ttyplot.c Outdated Show resolved Hide resolved
ttyplot.c Outdated Show resolved Hide resolved
Copy link
Collaborator

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgar-bonet it seems we have conflicts now and on more place where const could be added (if you like), other than that, this seems good to go 👍

ttyplot.c Outdated
// pselect() could be an alternative, but it is unreliable on Linux.
// (Related: https://stackoverflow.com/q/62315082)
static void signal_handler(int signum) {
unsigned char signal_number = (unsigned char) signum; // signum is either 2 (SIGINT) or 28 (SIGWINCH)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There's one more opportunity for const here, if you like.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for keeping a watch on this. 😉 Fixed.

On Linux, pselect() does not reliably catch signals when there is high
pressure on the watched file descriptors. This was mostly fixed in
commit e94cecd by calling pselect()
twice. This solution, however, leaves a race window between the two
calls.

Fix the pselect() issue without races by using plain select() and a
self-pipe. This makes signal delivery reliable.
@edgar-bonet edgar-bonet merged commit 1deb137 into tenox7:development Dec 3, 2023
7 checks passed
@edgar-bonet edgar-bonet deleted the self-pipe branch December 3, 2023 11:37
hartwork added a commit that referenced this pull request Dec 3, 2023
Fix signal handling (regression from #143)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants