-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Catch signals using a self-pipe (follow-up to #98 and #132) #143
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edgar-bonet works well in practices, I will likely favor this over alternative #142 (..). Please see four tiny to small things below:
struct timespec timeout = { .tv_sec = 0, .tv_nsec = 500e6 }; // 500 milliseconds for refreshing the clock | ||
const int select_ret = pselect_without_signal_starvation(select_nfds, &read_fds, NULL, NULL, &timeout, &empty_sigset); | ||
struct timeval timeout = { .tv_sec = 0, .tv_usec = 500e3 }; // 500 milliseconds for refreshing the clock | ||
const int select_ret = select(select_nfds, &read_fds, NULL, NULL, &timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edgar-bonet I only notice now that with this falling back to plain select
rather than pselect
, this would not fix the race but just move the race elsewhere? Am I missing something?
Also, if we do blocking reads and writes to the pipe, isn't that a problem, given that writing only works when there already is a reader waiting? Do we need to put it in non-blocking mode and read as many signals as there are rather than just one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no race here. Signals are never blocked, so the write
to the pipe will always happen as soon as a signal arrives. And the signal will be handled on the next loop iteration.
Blocking reads should not block, as per the select
promise. We may put all the file descriptors in non-blocking mode if we want to be extra defensive. Writes could only block if the pipe gets full, which can only happen if you send several thousand signals per second. You may be able to do that programmatically, but not by resizing the terminal or typing Ctrl-C.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[..] Signals are never blocked [..]
@edgar-bonet while you say it like that, do we need to bring back signal blocking around interaction with ncurses a la pull request #130 then?
[T]he
write
to the pipe will always happen as soon as a signal arrives. And the signal will be handled on the next loop iteration.Blocking reads should not block, as per the
select
promise.
Understood, yes.
We may put all the file descriptors in non-blocking mode if we want to be extra defensive. Writes could only block if the pipe gets full, which can only happen if you send several thousand signals per second. You may be able to do that programmatically, but not by resizing the terminal or typing Ctrl-C.
I mis-remembered then probably: I thought that writes to a pipe always block until there is a reader 🤔 man 7 pipe
is with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to bring back signal blocking around interaction with ncurses a la pull request #130 then?
We don't. The signal handler does nothing but a write
, which is async-signal-safe as per POSIX. The signal can really interrupt us any time without disrupting the program.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edgar-bonet isn't the important question whether the ncurses functions will stop their job half way when interrupted by a signal or not? From a quick look the man pages of erase
and refresh
say nothing about signals. I think that's what we'd need to know to be sure of no disruption. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an ncurses function is interrupted by a signal, it will stop halfway... then resume from where it stopped, and not even be aware that it had been interrupted. It is not very different from what happens to any running program during a hardware interrupt.
Most of these functions are not async-signal-safe, but this does not mean it is unsafe to interrupt them: it means it is unsafe to call them from a signal handler.
That being said, it is conceivable that some ncurses function is buggy and mishandles a syscall failing with EINTR
. In that case it would make sense to work around that bug by blocking signals around the call. However, I would avoid implementing workarounds for bugs we do not even know they exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edgar-bonet okay, let's note though that when we learn about that later, it will have been this move back to select
from pselect
that caused it and that — compared to 1.5.2 — now we have select
without blocking signals while we had select
with blocking signals in 1.5.2. There are multiple ways to see this and so going forward with #143 means that you are taking responsibility for dropping these blocks compared to 1.5.2; I cannot take the responsibility, because I cannot guarantee, that it will not cause new problems. Is that aware and okay with you? I'll do a last round of review then…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we had
select
with blocking signals in 1.5.2.
Let me take a look at 1.5.2 again... I see. That version called ncurses from the SIGWINCH
signal handler, which is inherently unsafe and can lead, e.g., to memory corruption. This is reported in issue #62.
That version also blocked signals around calls to ncurses. This is a reasonable mitigation strategy: it prevents ncurses from being called on top of an interrupted ncurses function, and should thus prevent ncurses from corrupting its own state. The mitigation is not perfect though, and not guaranteed to always work. For example, it cannot prevent ncurses from calling malloc
on top of an interrupted malloc
call.
Now that we have a signal handler that is completely async-signal-safe, that mitigation has lost its purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edgar-bonet I'll take that as "I'm sure enough that a block-do-unblock wrapper is not needed to take full responsibility for not re-adding some". Good.
a704c8a
to
d6ef6b0
Compare
d6ef6b0
to
552ea52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edgar-bonet it seems we have conflicts now and on more place where const
could be added (if you like), other than that, this seems good to go 👍
ttyplot.c
Outdated
// pselect() could be an alternative, but it is unreliable on Linux. | ||
// (Related: https://stackoverflow.com/q/62315082) | ||
static void signal_handler(int signum) { | ||
unsigned char signal_number = (unsigned char) signum; // signum is either 2 (SIGINT) or 28 (SIGWINCH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(There's one more opportunity for const
here, if you like.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for keeping a watch on this. 😉 Fixed.
On Linux, pselect() does not reliably catch signals when there is high pressure on the watched file descriptors. This was mostly fixed in commit e94cecd by calling pselect() twice. This solution, however, leaves a race window between the two calls. Fix the pselect() issue without races by using plain select() and a self-pipe. This makes signal delivery reliable.
552ea52
to
384b215
Compare
Fix signal handling (regression from #143)
As noticed by @hartwork,
pselect
is not a reliable way of catching signals on Linux: in case of high input pressure, signal delivery is starved. Pull request #132 partially fixed the issue by issuing two consecutivepselect
calls, but this left a race window between these two calls.This pull request fixes the issue without race conditions by using plain select() and a self-pipe, which makes signal delivery reliable.
This is one of two possible alternatives, the other one being
signalfd
(#142).