For quite a while I experienced strange failures of some commands in gvim
(e.g. errors on closing the window of :Gblame), but couldn’t figure out why the
hell they occurred. Overall it probably took more than 20 hours to identify the
issue and solve it without causing any other issues. At the end of the
day, it took a single line to solve the issue…
A lot of time was spent identifying when it happens. The answer is: when gvim
is started directly by the window manager (euclid-wm here). If there was some
intermediate process (e.g. terminal, script, anything), the issue didn’t occur.
A simple way to test for the issue:
:!echo text
text
Command terminated
It looks like echo is failing, but it works, so it must be the shell, but that
works too… What is really broken is the communication of the child’s exit code to
its parent process.
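The symptom can be reproduced outside of gvim. Below is a minimal sketch in C, assuming Linux (POSIX.1-2001) semantics for an ignored SIGCHLD; the helper name wait_errno_with_sigchld_ignored is made up for illustration. The child runs and exits normally, yet the parent cannot collect its exit code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ignores SIGCHLD, forks a child that exits with
 * code 7 and tries to collect that code. Returns 0 if waitpid() got the
 * status, otherwise the errno left by waitpid() (or by fork()). */
int wait_errno_with_sigchld_ignored(void)
{
    void (*prev)(int) = signal(SIGCHLD, SIG_IGN);
    assert(prev != SIG_ERR);

    int err = 0;
    pid_t pid = fork();
    if (pid < 0) {
        err = errno;
    } else if (pid == 0) {
        _exit(7);                 /* the child itself works fine */
    } else {
        int status = 0;
        /* With SIGCHLD ignored the kernel reaps the child on its own,
         * so there is no status left for the parent to pick up. */
        if (waitpid(pid, &status, 0) < 0)
            err = errno;
    }

    signal(SIGCHLD, prev);        /* restore the previous disposition */
    return err;
}
```

On Linux the returned value should be ECHILD ("No child processes"), which is the kind of failure a program sees when it runs a command and then asks for its result.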
This line in euclid-wm’s sources caused the issue:
//this is to avoid leaving zombies
signal(SIGCHLD, SIG_IGN);
And indeed, commenting this line out causes euclid-wm to leave a bunch of processes in zombie state, but it fixes the error in gvim! I needed a way to solve the issue without leaving zombies hanging around.
Replacing signal(...) with a proper handler for SIGCHLD didn’t help. A loop
calling waitpid() did nothing useful. Commenting out setsid() in the spawn()
function changed nothing (I thought that something was wrong with the associated
controlling terminals). Forking one more time in spawn() had no effect. Not
execing in spawn(), closing standard file descriptors, explicitly waiting for
each child process: nothing helped.
Searches on the Web, in books on GNU/Linux programming…
Literally nothing answered what it could be and how one can fix it. I still
don’t understand why it had such a strange effect, but now the reason is at least
clear enough: child processes inherit the signal dispositions of their parent
(on Linux an ignored signal stays ignored even across exec), and the shell
feels bad when SIGCHLD is ignored.
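The inheritance itself is easy to observe. Here is a minimal sketch, again assuming Linux/POSIX behaviour; the function name child_inherits_ignored_sigchld is hypothetical. Strictly speaking, what the child inherits is the signal disposition (ignore, default, or handler). The sketch checks this only across fork() and leaves exec out to stay self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child reports that SIGCHLD
 * is ignored in it, 0 if it reports something else, -1 on error. */
int child_inherits_ignored_sigchld(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    void (*prev)(int) = signal(SIGCHLD, SIG_IGN);   /* as the WM does */
    assert(prev != SIG_ERR);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: query the disposition without changing it. */
        struct sigaction sa;
        char flag = '?';
        if (sigaction(SIGCHLD, NULL, &sa) == 0)
            flag = (sa.sa_handler == SIG_IGN) ? '1' : '0';
        _exit(write(fds[1], &flag, 1) == 1 ? 0 : 1);
    }

    close(fds[1]);
    char flag = '?';
    ssize_t n = (pid > 0) ? read(fds[0], &flag, 1) : -1;
    close(fds[0]);

    if (pid > 0)
        waitpid(pid, NULL, 0);    /* blocks until the child is gone; the
                                     ECHILD failure is expected here */
    signal(SIGCHLD, prev);

    if (n != 1 || flag == '?')
        return -1;
    return flag == '1';
}
```

A return value of 1 means the child started its life with SIGCHLD already ignored, without ever asking for it.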
The solution is to call
signal(SIGCHLD, SIG_DFL);
to restore the default behaviour in each child process after forking, i.e.:
if (fork() == 0) {
    /* ... */
    signal(SIGCHLD, SIG_DFL);
    /* ... */
    exec(/* ... */);
}
The solution is trivial, but it surely isn’t the most obvious one.
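The effect of that one line can be checked with a small experiment. This is only a sketch under Linux/POSIX assumptions and not euclid-wm's actual code: the function status_seen_by_child and its reset_sigchld parameter are invented for illustration. The exec step is replaced by the child doing what a launched program would do, that is, starting a process of its own and waiting for its exit code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical experiment: the parent ignores SIGCHLD (like the window
 * manager), the child optionally restores SIG_DFL (the fix) and then
 * tries to collect the exit code (7) of its own child.
 * Returns the code the child managed to collect, or -1 if it could not. */
int status_seen_by_child(int reset_sigchld)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    void (*prev)(int) = signal(SIGCHLD, SIG_IGN);
    assert(prev != SIG_ERR);

    pid_t child = fork();
    if (child == 0) {
        int seen = -1, status = 0;
        close(fds[0]);
        if (reset_sigchld)
            signal(SIGCHLD, SIG_DFL);     /* the one-line fix */

        /* A real spawn() would exec here; this stands in for the
         * launched program running a command of its own. */
        pid_t grandchild = fork();
        if (grandchild == 0)
            _exit(7);
        if (grandchild > 0 &&
            waitpid(grandchild, &status, 0) == grandchild &&
            WIFEXITED(status))
            seen = WEXITSTATUS(status);

        _exit(write(fds[1], &seen, sizeof seen) == (ssize_t)sizeof seen ? 0 : 1);
    }

    close(fds[1]);
    int seen = -1;
    ssize_t n = (child > 0) ? read(fds[0], &seen, sizeof seen) : -1;
    close(fds[0]);

    if (child > 0)
        waitpid(child, NULL, 0);  /* parent still ignores SIGCHLD, so this
                                     only waits for the child to vanish */
    signal(SIGCHLD, prev);
    return n == (ssize_t)sizeof seen ? seen : -1;
}
```

Calling status_seen_by_child(1) should yield 7, while status_seen_by_child(0) should yield -1 on Linux. The parent keeps ignoring SIGCHLD in both cases, so it still leaves no zombies behind.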