A tty is hung up by __tty_hangup() setting file->f_op to
hung_up_tty_fops, which is skipped on ttys whose write operation isn't
tty_write(). This means that, for example, /dev/console whose write
op is redirected_tty_write() is never actually marked hung up.
Because n_tty_read() uses the hung up status to decide whether to
abort the waiting readers, the lack of hung-up marking can lead to the
following scenario.
1. A session contains two processes. The leader and its child. The
child ignores SIGHUP.
2. The leader exits and starts disassociating from the controlling
terminal (/dev/console).
3. __tty_hangup() skips setting f_op to hung_up_tty_fops.
4. SIGHUP is delivered and ignored.
5. tty_ldisc_hangup() is invoked. It wakes up the waits which should
clear the read lockers of tty->ldisc_sem.
6. The reader wakes up but because tty_hung_up_p() is false, it
doesn't abort and goes back to sleep while read-holding
tty->ldisc_sem.
7. The leader progresses to tty_ldisc_lock() in tty_ldisc_hangup()
and is now stuck in D sleep indefinitely waiting for
tty->ldisc_sem.
The following is Alan's explanation on why some ttys aren't hung up.
http://lkml.kernel.org/r/20171101170908.6ad08580@alans-desktop
1. It broke the serial consoles because they would hang up and close
down the hardware. With tty_port that *should* be fixable properly
for any cases remaining.
2. The console layer was (and still is) completely broken and doens't
refcount properly. So if you turn on console hangups it breaks (as
indeed does freeing consoles and half a dozen other things).
As neither can be fixed quickly, this patch works around the problem
by introducing a new flag, TTY_HUPPING, which is used solely to tell
n_tty_read() that hang-up is in progress for the console and the
readers should be aborted regardless of the hung-up status of the
device.
The following is a sample hung task warning caused by this issue.
INFO: task agetty:2662 blocked for more than 120 seconds.
Not tainted 4.11.3-dbg-tty-lockup-02478-gfd6c7ee-dirty #28
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
0 2662 1 0x00000086
Call Trace:
__schedule+0x267/0x890
schedule+0x36/0x80
schedule_timeout+0x23c/0x2e0
ldsem_down_write+0xce/0x1f6
tty_ldisc_lock+0x16/0x30
tty_ldisc_hangup+0xb3/0x1b0
__tty_hangup+0x300/0x410
disassociate_ctty+0x6c/0x290
do_exit+0x7ef/0xb00
do_group_exit+0x3f/0xa0
get_signal+0x1b3/0x5d0
do_signal+0x28/0x660
exit_to_usermode_loop+0x46/0x86
do_syscall_64+0x9c/0xb0
entry_SYSCALL64_slow_path+0x25/0x25
The following is the repro. Run "$PROG /dev/console". The parent
process hangs in D state.
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <signal.h>
#include <time.h>
#include <termios.h>
int main(int argc, char **argv)
{
struct sigaction sact = { .sa_handler = SIG_IGN };
struct timespec ts1s = { .tv_sec = 1 };
pid_t pid;
int fd;
if (argc < 2) {
fprintf(stderr, "test-hung-tty /dev/$TTY\n");
return 1;
}
/* fork a child to ensure that it isn't already the session leader */
pid = fork();
if (pid < 0) {
perror("fork");
return 1;
}
if (pid > 0) {
/* top parent, wait for everyone */
while (waitpid(-1, NULL, 0) >= 0)
;
if (errno != ECHILD)
perror("waitpid");
return 0;
}
/* new session, start a new session and set the controlling tty */
if (setsid() < 0) {
perror("setsid");
return 1;
}
fd = open(argv[1], O_RDWR);
if (fd < 0) {
perror("open");
return 1;
}
if (ioctl(fd, TIOCSCTTY, 1) < 0) {
perror("ioctl");
return 1;
}
/* fork a child, sleep a bit and exit */
pid = fork();
if (pid < 0) {
perror("fork");
return 1;
}
if (pid > 0) {
nanosleep(&ts1s, NULL);
printf("Session leader exiting\n");
exit(0);
}
/*
* The child ignores SIGHUP and keeps reading from the controlling
* tty. Because SIGHUP is ignored, the child doesn't get killed on
* parent exit and the bug in n_tty makes the read(2) block the
* parent's control terminal hangup attempt. The parent ends up in
* D sleep until the child is explicitly killed.
*/
sigaction(SIGHUP, &sact, NULL);
printf("Child reading tty\n");
while (1) {
char buf[1024];
if (read(fd, buf, sizeof(buf)) < 0) {
perror("read");
return 1;
}
}
return 0;
}
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alan Cox <alan@llwyncelyn.cymru>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
with de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.
The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull poll annotations from Al Viro:
"This introduces a __bitwise type for POLL### bitmap, and propagates
the annotations through the tree. Most of that stuff is as simple as
'make ->poll() instances return __poll_t and do the same to local
variables used to hold the future return value'.
Some of the obvious brainos found in process are fixed (e.g. POLLIN
misspelled as POLL_IN). At that point the amount of sparse warnings is
low and most of them are for genuine bugs - e.g. ->poll() instance
deciding to return -EINVAL instead of a bitmap. I hadn't touched those
in this series - it's large enough as it is.
Another problem it has caught was eventpoll() ABI mess; select.c and
eventpoll.c assumed that corresponding POLL### and EPOLL### were
equal. That's true for some, but not all of them - EPOLL### are
arch-independent, but POLL### are not.
The last commit in this series separates userland POLL### values from
the (now arch-independent) kernel-side ones, converting between them
in the few places where they are copied to/from userland. AFAICS, this
is the least disruptive fix preserving poll(2) ABI and making epoll()
work on all architectures.
As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
it will trigger only on what would've triggered EPOLLWRBAND on other
architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
at all on sparc. With this patch they should work consistently on all
architectures"
* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
make kernel-side POLL... arch-independent
eventpoll: no need to mask the result of epi_item_poll() again
eventpoll: constify struct epoll_event pointers
debugging printk in sg_poll() uses %x to print POLL... bitmap
annotate poll(2) guts
9p: untangle ->poll() mess
->si_band gets POLL... bitmap stored into a user-visible long field
ring_buffer_poll_wait() return value used as return value of ->poll()
the rest of drivers/*: annotate ->poll() instances
media: annotate ->poll() instances
fs: annotate ->poll() instances
ipc, kernel, mm: annotate ->poll() instances
net: annotate ->poll() instances
apparmor: annotate ->poll() instances
tomoyo: annotate ->poll() instances
sound: annotate ->poll() instances
acpi: annotate ->poll() instances
crypto: annotate ->poll() instances
block: annotate ->poll() instances
x86: annotate ->poll() instances
...
We added support for EXTPROC back in 2010 in commit 26df6d1340 ("tty:
Add EXTPROC support for LINEMODE") and the intent was to allow it to
override some (all?) ICANON behavior. Quoting from that original commit
message:
There is a new bit in the termios local flag word, EXTPROC.
When this bit is set, several aspects of the terminal driver
are disabled. Input line editing, character echo, and mapping
of signals are all disabled. This allows the telnetd to turn
off these functions when in linemode, but still keep track of
what state the user wants the terminal to be in.
but the problem turns out that "several aspects of the terminal driver
are disabled" is a bit ambiguous, and you can really confuse the n_tty
layer by setting EXTPROC and then causing some of the ICANON invariants
to no longer be maintained.
This fixes at least one such case (TIOCINQ) becoming unhappy because of
the confusion over whether ICANON really means ICANON when EXTPROC is set.
This basically makes TIOCINQ match the case of read: if EXTPROC is set,
we ignore ICANON. Also, make sure to reset the ICANON state ie EXTPROC
changes, not just if ICANON changes.
Fixes: 26df6d1340 ("tty: Add EXTPROC support for LINEMODE")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Cc: Jiri Slaby <jslaby@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that the SPDX tag is in all tty files, that identifies the license
in a specific and legally-defined manner. So the extra GPL text wording
can be removed as it is no longer needed at all.
This is done on a quest to remove the 700+ different ways that files in
the kernel describe the GPL license text. And there's unneeded stuff
like the address (sometimes incorrect) for the FSF which is never
needed.
No copyright headers or other non-license-description text was removed.
Cc: Jiri Slaby <jslaby@suse.com>
Cc: James Hogan <jhogan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the drivers/tty files files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of
the full boiler plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: David Sterba <dsterba@suse.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ray Jui <rjui@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Joachim Eastwood <manabian@gmail.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Tobias Klauser <tklauser@distanz.ch>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Richard Genoud <richard.genoud@gmail.com>
Cc: Alexander Shiyan <shc_work@mail.ru>
Cc: Baruch Siach <baruch@tkos.co.il>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: "Uwe Kleine-König" <kernel@pengutronix.de>
Cc: Pat Gefre <pfg@sgi.com>
Cc: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Vladimir Zapolskiy <vz@mleia.com>
Cc: Sylvain Lemieux <slemieux.tyco@gmail.com>
Cc: Carlo Caione <carlo@caione.org>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Andy Gross <andy.gross@linaro.org>
Cc: David Brown <david.brown@linaro.org>
Cc: "Andreas Färber" <afaerber@suse.de>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: Laxman Dewangan <ldewangan@nvidia.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Patrice Chotard <patrice.chotard@st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Korsgaard <jacmet@sunsite.dk>
Cc: Timur Tabi <timur@tabi.org>
Cc: Tony Prisk <linux@prisktech.co.nz>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Jiri Slaby <jslaby@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
OpenSSH expects the (non-blocking) read() of pty master to return
EAGAIN only if it has received all of the slave-side output after
it has received SIGCHLD. This used to work on pre-3.12 kernels.
This fix effectively forces non-blocking read() and poll() to
block for parallel i/o to complete for all ttys. It also unwinds
these changes:
1) f8747d4a46
tty: Fix pty master read() after slave closes
2) 52bce7f8d4
pty, n_tty: Simplify input processing on final close
3) 1a48632ffe
pty: Fix input race when closing
Inspired by analysis and patch from Marc Aurele La France <tsi@tuyoix.net>
Reported-by: Volth <openssh@volth.com>
Reported-by: Marc Aurele La France <tsi@tuyoix.net>
BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=52
BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=2492
Signed-off-by: Brian Bloniarz <brian.bloniarz@gmail.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On final port close (and thus final tty close), only output flow
control requests in the input data should be processed. Ignore all
other input data, including parity errors, overruns and breaks.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to fcntl(2), "a SIGIO signal is sent whenever input
or output becomes possible on that file descriptor", i.e.
after the output buffer was full and now has space for new data.
But in fact SIGIO is sent after every write.
n_tty_write() should set TTY_DO_WRITE_WAKEUP only when
not all data could be written to the buffer.
[pjh: Also fixes missed SIGIO if amt written just happens to be
[ amount still to write
Signed-off-by: Johannes Stezenbach <js@sig21.net>
[pjh: minor patch edits and re-submit]
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since n_tty_check_unthrottle() is only called from n_tty_read()
which only originates from a userspace read(), the tty count cannot
be 0; the read() guarantees the file descriptor has not yet been
released.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If signal-driven i/o is disabled while write wakeup is pending (ie.,
n_tty_write() has set TTY_DO_WRITE_WAKEUP but then signal-driven i/o
is disabled), the TTY_DO_WRITE_WAKEUP bit will never be cleared and
will cause tty_wakeup() to always call n_tty_write_wakeup.
Unconditionally clear the write wakeup, and since kill_fasync()
already checks if the fasync ptr is null, call kill_fasync()
unconditionally as well.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Only the N_TTY line discipline implements the signal-driven i/o
notification enabled/disabled by fcntl(F_SETFL, O_ASYNC). The ldisc
fasync() notification is sent to the ldisc when the enable state has
changed (the tty core is notified via the fasync() VFS file operation).
The N_TTY line discipline used the enable state to change the wakeup
condition (minimum_to_wake = 1) for notifying the signal handler i/o is
available. However, just the presence of data is sufficient and necessary
to signal i/o is available, so changing minimum_to_wake is unnecessary
(and creates a race condition with read() and poll() which may be
concurrently updating minimum_to_wake).
Furthermore, since the kill_fasync() VFS helper performs no action if
the fasync list is empty, calling unconditionally is preferred; if
signal driven i/o just has been disabled, no signal will be sent by
kill_fasync() anyway so notification of the change via the ldisc
fasync() method is superfluous.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A read() in non-canonical mode when VMIN > 0 and VTIME == 0 does not
complete until at least VMIN chars have been read (or the user buffer is
full). In this infrequent read mode, n_tty_read() attempts to reduce
wakeups by computing the amount of data still necessary to complete the
read (minimum_to_wake) and only waking the read()/poll() when that much
unread data has been processed. This is the only read mode for which
new data does not necessarily generate a wakeup.
However, this optimization is broken and commonly leads to hung reads
even though the necessary amount of data has been received. Since the
optimization is of marginal value anyway, just remove the whole
thing. This also remedies a race between a concurrent poll() and
read() in this mode, where the poll() can reset the minimum_to_wake
of the read() (and vice versa).
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In canonical read mode, each line read and logged is pushed separately
with tty_audit_push(). For all single-threaded processes and multi-threaded
processes reading from only one tty, this patch has no effect; the last line
read will still be the entry pushed to the audit log because the tty
association cannot have changed between tty_audit_add_data() and
tty_audit_push().
For multi-threaded processes reading from different ttys concurrently,
the audit log will have mixed log entries anyway. Consider two ttys
audited concurrently:
CPU0 CPU1
---------- ------------
tty_audit_add_data(ttyA)
tty_audit_add_data(ttyB)
tty_audit_push()
tty_audit_add_data(ttyB)
tty_audit_push()
This patch will now cause the ttyB output to be split into separate
audit log entries.
However, this possibility is equally likely without this patch:
CPU0 CPU1
---------- ------------
tty_audit_add_data(ttyB)
tty_audit_add_data(ttyA)
tty_audit_push()
tty_audit_add_data(ttyB)
tty_audit_push()
Mixed canonical and non-canonical reads have similar races.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The tty termios bits cannot change while n_tty_read() is in the
i/o loop; the termios_rwsem ensures mutual exclusion with termios
changes in n_tty_set_termios(). Check L_ICANON() directly and
eliminate icanon parameter.
NB: tty_audit_add_data() => tty_audit_buf_get() => tty_audit_buf_alloc()
is a single path; ie., tty_audit_buf_get() and tty_audit_buf_alloc()
have no other callers.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tty audit never logs pty master reads, but packet mode only works for
pty masters, so tty_audit_add_data() was never logging packet mode
anyway.
Don't audit packet mode data. As those are the lone call sites, remove
tty_put_user().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move is_ignored() to drivers/tty/tty_io.c and re-declare in file
scope.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reduce global tty symbols; move and rename tty_ldisc_begin() as
n_tty_init() and redefine the N_TTY ldisc ops as file scope.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The chars_in_buffer() line discipline method serves no functional
purpose, other than as a (dubious) debugging aid for mostly bit-rotting
drivers. Despite being documented as an optional method, every caller
is unconditionally executed (although conditionally compiled).
Furthermore, direct tty->ldisc access without an ldisc ref is unsafe.
Lastly, N_TTY's chars_in_buffer() has warned of removal since 3.12.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Although n_tty_check_unthrottle() has a valid ldisc reference (since
the tty core gets the ldisc ref in tty_read() before calling the line
discipline read() method), it does not have a valid ldisc reference to
the "other" pty of a pty pair. Since getting an ldisc reference for
tty->link essentially open-codes tty_wakeup(), just replace with the
equivalent tty_wakeup().
Cc: <stable@vger.kernel.org>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Instead of compare-and-set, just compute 'found'.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a temporary for the computed source address and substitute
where appropriate. No functional change.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge the multiple tty_copy_to_user() calls into a single copy
sequence within tty_copy_to_user().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since not all ttys are devices (eg., SysV ptys), dev_*() printk macros
cannot be used. Define tty_*() printk macros that output in similar
format to dev_*() macros (ie., <driver> <tty>: .....).
Transform the most-trivial printk( LEVEL ...) usage to tty_*() usage.
NB: The function name has been eliminated from messages with unique
context, or prefixed to the format when given.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40d5e0905a ("n_tty: Fix EOF push handling") fixed EOF push
for reads. However, that approach still allows a condition mismatch
between poll() and read(), where poll() returns POLLIN but read()
blocks. This state can happen when a previous read() returned because
the user buffer was full and the next character was an EOF not at the
beginning of the line. While the next read() will properly identify
the condition and advance the read buffer tail without improperly
indicating an EOF file condition (ie., read() will not mistakenly
return 0), poll() will mistakenly indicate POLLIN.
Although a possible solution would be to peek at the input buffer
in n_tty_poll(), the better solution in this patch is to eat the
EOF during the previous read() (ie., fix the problem by eliminating
the condition).
The current canon line buffer copy limits the scan for next end-of-line
to the smaller of either,
a. the remaining user buffer size
b. completed lines in the input buffer
When the remaining user buffer size is exactly one less than the
end-of-line marked by EOF push, the EOF is not scanned nor skipped
but left for subsequent reads. In the example below, the scan
index 'eol' has stopped at the EOF because it is past the scan
limit of 5 (not because it has found the next set bit in read_flags)
user buffer [*nr = 5] _ _ _ _ _
read_flags 0 0 0 0 0 1
input buffer h e l l o [EOF]
^ ^
/ /
tail eol
result: found = 0, tail += 5, *nr += 5
Instead, allow the scan to peek ahead 1 byte (while still limiting the
scan to completed lines in the input buffer). For the example above,
result: found = 1, tail += 6, *nr += 5
Because the scan limit is now bumped +1 byte, when the scan is
completed, the tail advance and the user buffer copy limit is
re-clamped to *nr when EOF is _not_ found.
Fixes: 40d5e0905a ("n_tty: Fix EOF push handling")
Cc: <stable@vger.kernel.org> # 3.12+
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The data to audit/record is in the 'from' buffer (ie., the input
read buffer).
Fixes: 72586c6061 ("n_tty: Fix auditing support for cannonical mode")
Cc: stable <stable@vger.kernel.org> # 4.1+
Cc: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce API functions to restart and cancel tty buffer work, rather
than manipulate buffer work directly.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The job_control() check in n_tty_read() has nearly identical purpose
and results as tty_check_change(). Both functions' purpose is to
determine if the current task's pgrp is the foreground pgrp for the tty,
and if not, to signal the current pgrp.
Introduce __tty_check_change() which takes the signal to send
and performs the shared operations for job control() and
tty_check_change().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Waking the reader immediately upon receipt of TTY_BREAK or TTY_PARITY
chars has no effect on the outcome of read():
1. Only non-canonical/EXTPROC mode applies since canonical mode
will not return data until a line termination is received anyway
2. EXTPROC mode - the reader will always be woken by the input worker
3. Non-canonical modes
a. MIN == 0, TIME == 0
b. MIN == 0, TIME > 0
c. MIN > 0, TIME > 0
minimum_to_wake is always 1 in these modes so the reader will always
be woken by the input worker
d. MIN > 0, TIME == 0
although the reader will not be woken by the input worker unless the
minimum data is received, the reader would not otherwise have
returned the received data
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
My colleague ran into a program stall on a x86_64 server, where
n_tty_read() was waiting for data even if there was data in the buffer
in the pty. kernel stack for the stuck process looks like below.
#0 [ffff88303d107b58] __schedule at ffffffff815c4b20
#1 [ffff88303d107bd0] schedule at ffffffff815c513e
#2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
#3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
#4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
#5 [ffff88303d107dd0] tty_read at ffffffff81368013
#6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
#7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
#8 [ffff88303d107f00] sys_read at ffffffff811a4306
#9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7
There seems to be two problems causing this issue.
First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
updates ldata->commit_head using smp_store_release() and then checks
the wait queue using waitqueue_active(). However, since there is no
memory barrier, __receive_buf() could return without calling
wake_up_interactive_poll(), and at the same time, n_tty_read() could
start to wait in wait_woken() as in the following chart.
__receive_buf() n_tty_read()
------------------------------------------------------------------------
if (waitqueue_active(&tty->read_wait))
/* Memory operations issued after the
RELEASE may be completed before the
RELEASE operation has completed */
add_wait_queue(&tty->read_wait, &wait);
...
if (!input_available_p(tty, 0)) {
smp_store_release(&ldata->commit_head,
ldata->read_head);
...
timeout = wait_woken(&wait,
TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------
The second problem is that n_tty_read() also lacks a memory barrier
call and could also cause __receive_buf() to return without calling
wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken()
as in the chart below.
__receive_buf() n_tty_read()
------------------------------------------------------------------------
spin_lock_irqsave(&q->lock, flags);
/* from add_wait_queue() */
...
if (!input_available_p(tty, 0)) {
/* Memory operations issued after the
RELEASE may be completed before the
RELEASE operation has completed */
smp_store_release(&ldata->commit_head,
ldata->read_head);
if (waitqueue_active(&tty->read_wait))
__add_wait_queue(q, wait);
spin_unlock_irqrestore(&q->lock,flags);
/* from add_wait_queue() */
...
timeout = wait_woken(&wait,
TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------
There are also other places in drivers/tty/n_tty.c which have similar
calls to waitqueue_active(), so instead of adding many memory barrier
calls, this patch simply removes the call to waitqueue_active(),
leaving just wake_up*() behind.
This fixes both problems because, even though the memory access before
or after the spinlocks in both wake_up*() and add_wait_queue() can
sneak into the critical section, it cannot go past it and the critical
section assures that they will be serialized (please see "INTER-CPU
ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
better explanation). Moreover, the resulting code is much simpler.
Latency measurement using a ping-pong test over a pty doesn't show any
visible performance drop.
Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Other serial driver work wants to build on patches now in 4.2-rc4 so
merge the branch so this can properly happen.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
task_pgrp requires an rcu or tasklist lock to be obtained if the returned pid
is to be dereferenced, which kill_pgrp does. Obtain an RCU lock for the
duration of use.
Signed-off-by: Patrick Donnelly <batrick@batbytes.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When handling signalling char, claim the termios write lock before
signalling waiting readers and writers to prevent further i/o
before flushing the echo and output buffers. This prevents a
userspace signal handler which may output from racing the terminal
flush.
Reference: Bugzilla #99351 ("Output truncated in ssh session after...")
Fixes: commit d2b6f44779 ("n_tty: Fix signal handling flushes")
Reported-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes up a merge issue with the amba-pl011.c driver, and we want
the fixes in this branch as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 32f13521ca
("n_tty: Line copy to user buffer in canonical mode")
changed cannonical mode copying to use copy_to_user
but missed adding the call to the audit framework.
Add in the appropriate functions to get audit support.
Fixes: 32f13521ca ("n_tty: Line copy to user buffer in canonical mode")
Reported-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There was a hardcoded value of 4096 which should have been N_TTY_BUF_SIZE.
This caused reads from tty to fail with EFAULT when they shouldn't have
done if N_TTY_BUF_SIZE was declared to be something other than 4096.
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A read() from a pty master may mistakenly indicate EOF (errno == -EIO)
after the pty slave has closed, even though input data remains to be read.
For example,
pty slave | input worker | pty master
| |
| | n_tty_read()
pty_write() | | input avail? no
add data | | sleep
schedule worker --->| | .
|---> flush_to_ldisc() | .
pty_close() | fill read buffer | .
wait for worker | wakeup reader --->| .
| read buffer full? |---> input avail ? yes
|<--- yes - exit worker | copy 4096 bytes to user
TTY_OTHER_CLOSED <---| |<--- kick worker
| |
**** New read() before worker starts ****
| | n_tty_read()
| | input avail? no
| | TTY_OTHER_CLOSED? yes
| | return -EIO
Several conditions are required to trigger this race:
1. the ldisc read buffer must become full so the input worker exits
2. the read() count parameter must be >= 4096 so the ldisc read buffer
is empty
3. the subsequent read() occurs before the kicked worker has processed
more input
However, the underlying cause of the race is that data is pipelined, while
tty state is not; ie., data already written by the pty slave end is not
yet visible to the pty master end, but state changes by the pty slave end
are visible to the pty master end immediately.
Pipeline the TTY_OTHER_CLOSED state through input worker to the reader.
1. Introduce TTY_OTHER_DONE which is set by the input worker when
TTY_OTHER_CLOSED is set and either the input buffers are flushed or
input processing has completed. Readers/polls are woken when
TTY_OTHER_DONE is set.
2. Reader/poll checks TTY_OTHER_DONE instead of TTY_OTHER_CLOSED.
3. A new input worker is started from pty_close() after setting
TTY_OTHER_CLOSED, which ensures the TTY_OTHER_DONE state will be
set if the last input worker is already finished (or just about to
exit).
Remove tty_flush_to_ldisc(); no in-tree callers.
Fixes: 52bce7f8d4 ("pty, n_tty: Simplify input processing on final close")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96311
BugLink: http://bugs.launchpad.net/bugs/1429756
Cc: <stable@vger.kernel.org> # 3.19+
Reported-by: Andy Whitcroft <apw@canonical.com>
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tty_name no longer uses the buf parameter, so remove it along with all
the 64 byte stack buffers that used to be passed in.
Mostly generated by the coccinelle script
@depends on patch@
identifier buf;
constant C;
expression tty;
@@
- char buf[C];
<+...
- tty_name(tty, buf)
+ tty_name(tty)
...+>
allmodconfig compiles, so I'm fairly confident the stack buffers
weren't used for other purposes as well.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BRKINT and ISIG requires input and output flush when a signal char
is received. However, the order of operations is significant since
parallel i/o may be ongoing.
Merge the signal handling for BRKINT with ISIG handling.
Process the signal first. This ensures any ongoing i/o is aborted;
without this, a waiting writer may continue writing after the flush
occurs and after the signal char has been echoed.
Write lock the termios_rwsem, which excludes parallel writers from
pushing new i/o until after the output buffers are flushed; claiming
the write lock is necessary anyway to exclude parallel readers while
the read buffer is flushed.
Subclass the termios_rwsem for ptys since the slave pty performing
the flush may appear to reorder the termios_rwsem->tty buffer lock
lock order; adding annotation clarifies that
slave tty_buffer lock-> slave termios_rwsem -> master tty_buffer lock
is a valid lock order.
Flush the echo buffer. In this context, the echo buffer is 'output'.
Otherwise, the output will appear discontinuous because the output buffer
was cleared which contains older output than the echo buffer.
Open-code the read buffer flush since the input worker does not need
kicking (this is the input worker).
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In canon mode, the read buffer head will advance over the buffer tail
if the input > 4095 bytes without receiving a line termination char.
Discard additional input until a line termination is received.
Before evaluating for overflow, the 'room' value is normalized for
I_PARMRK and 1 byte is reserved for line termination (even in !icanon
mode, in case the mode is switched). The following table shows the
transform:
actual buffer | 'room' value before overflow calc
space avail | !I_PARMRK | I_PARMRK
--------------------------------------------------
0 | -1 | -1
1 | 0 | 0
2 | 1 | 0
3 | 2 | 0
4+ | 3 | 1
When !icanon or when icanon and the read buffer contains newlines,
normalized 'room' values of -1 and 0 are clamped to 0, and
'overflow' is 0, so read_head is not adjusted and the input i/o loop
exits (setting no_room if called from flush_to_ldisc()). No input
is discarded since the reader does have input available to read
which ensures forward progress.
When icanon and the read buffer does not contain newlines and the
normalized 'room' value is 0, then overflow and room are reset to 1,
so that the i/o loop will process the next input char normally
(except for parity errors which are ignored). Thus, erasures, signalling
chars, 7-bit mode, etc. will continue to be handled properly.
If the input char processed was not a line termination char, then
the canon_head index will not have advanced, so the normalized 'room'
value will now be -1 and 'overflow' will be set, which indicates the
read_head can safely be reset, effectively erasing the last char
processed.
If the input char processed was a line termination, then the
canon_head index will have advanced, so 'overflow' is cleared to 0,
the read_head is not reset, and 'room' is cleared to 0, which exits
the i/o loop (because the reader now have input available to read
which ensures forward progress).
Note that it is possible for a line termination to be received, and
for the reader to copy the line to the user buffer before the
input i/o loop is ready to process the next input char. This is
why the i/o loop recomputes the room/overflow state with every
input char while handling overflow.
Finally, if the input data was processed without receiving
a line termination (so that overflow is still set), the pty
driver must receive a write wakeup. A pty writer may be waiting
to write more data in n_tty_write() but without unthrottling
here that wakeup will not arrive, and forward progress will halt.
(Normally, the pty writer is woken when the reader reads data out
of the buffer and more space become available).
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If PARMRK is enabled, the available read buffer space computation is
overly-pessimistic, which results in severely throttled i/o, even
in the absence of parity errors. For example, if the 4k read buffer
contains 1k processed data, the input worker will compute available
space of 333 bytes, despite 3k being available. At 1365 chars of
processed data, 0 space available is computed.
*Divide remaining space* by 3, truncating down (if left == 2, left = 0).
Reported-by: Christian Riesch <christian.riesch@omicron.at>
Conflicts:
drivers/tty/n_tty.c
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add commit_head buffer index, which the producer-side publishes
after input processing in non-canon mode. This ensures the consumer-side
observes correctly-ordered writes in non-canonical mode (ie., the buffer
data is written before the buffer index is advanced). Fix consumer-side
uses of read_cnt() to use commit_head instead.
Add required memory barriers to the tail index to guarantee
the consumer-side has completed the loads before the producer-side
begins writing new data. Open-code the producer-side receive_room()
into the i/o loop.
Remove no-longer-referenced receive_room().
Based on work by Christian Riesch <christian.riesch@omicron.at>
Cc: Christian Riesch <christian.riesch@omicron.at>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The adjustments performed by receive_room() are to ensure a line
termination can always be written to the read buffer. However,
these adjustments are irrelevant to the throttle threshold (because
the threshold < buffer limit).
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The tty driver will be mistakenly throttled if a line termination
has not been received, and the line exceeds 3967 chars. Thus, it is
possible for the driver to stop sending when it has not yet sent
the newline. This does not apply to the pty driver.
Don't throttle until at least one line termination has been
received.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The input worker never reschedules itself; it only processes input until
either there is no more input or the read buffer is full. So the reader
is responsible for restarting the input worker only if the read buffer
was previously full (no_room == 1) _and_ space is now available to process
more input because the reader has consumed data from the read buffer.
However, computing the actual space available is not required to determine
if the reader has consumed data from the read buffer. This condition is
evaluated in 5 situations, each of which the space avail is already known:
1. n_tty_flush_buffer() - the read buffer is empty; kick the worker
2. n_tty_set_termios() - no data has been consumed; do not kick the worker
(although it may have kicked the reader so data _will be_ consumed)
3. n_tty_check_unthrottle - avail space > 3968; kick the worker
4. n_tty_read, before leaving - only kick the worker if the reader has
moved the tail. This prevents unnecessarily kicking the worker
when timeout-style reading is used.
5. n_tty_read, before sleeping - although it is possible for the read
buffer to be full and input_available_p() to be false, this can
only happen when the input worker is racing the reader, in which
case the reader will have been woken and won't sleep.
Rename n_tty_set_room() to n_tty_kick_worker() to reflect what the
function actually does.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit c4dc304677.
This fix is superseded by commit 52bce7f8d4,
'pty, n_tty: Simplify input processing on final close'.
The final close now waits for input processing to complete before
destroying the pty, so poll() does not need to special case this
condition.
Cc: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Here's the big tty/serial driver update for 3.19-rc1.
There are a number of TTY core changes/fixes in here from Peter Hurley
that have all been teted in linux-next for a long time now. There are
also the normal serial driver updates as well, full details in the
changelog below.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlSOD/MACgkQMUfUDdst+ymW+wCfbSzoYMRObIImMPWfoQtxkvvN
rpkAnAtyEP/zZIfkQIuKTSH6FJxocF8V
=WZt3
-----END PGP SIGNATURE-----
Merge tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here's the big tty/serial driver update for 3.19-rc1.
There are a number of TTY core changes/fixes in here from Peter Hurley
that have all been teted in linux-next for a long time now. There are
also the normal serial driver updates as well, full details in the
changelog below"
* tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (219 commits)
serial: pxa: hold port.lock when reporting modem line changes
tty-hvsi_lib: Deletion of an unnecessary check before the function call "tty_kref_put"
tty: Deletion of unnecessary checks before two function calls
n_tty: Fix read_buf race condition, increment read_head after pushing data
serial: of-serial: add PM suspend/resume support
Revert "serial: of-serial: add PM suspend/resume support"
Revert "serial: of-serial: fix up PM ops on no_console_suspend and port type"
serial: 8250: don't attempt a trylock if in sysrq
serial: core: Add big-endian iotype
serial: samsung: use port->fifosize instead of hardcoded values
serial: samsung: prefer to use fifosize from driver data
serial: samsung: fix style problems
serial: samsung: wait for transfer completion before clock disable
serial: icom: fix error return code
serial: tegra: clean up tty-flag assignments
serial: Fix io address assign flow with Fintek PCI-to-UART Product
serial: mxs-auart: fix tx_empty against shift register
serial: mxs-auart: fix gpio change detection on interrupt
serial: mxs-auart: Fix mxs_auart_set_ldisc()
serial: 8250_dw: Use 64-bit access for OCTEON.
...
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:
- 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.
This instruments might_sleep() checks to catch places that nest
blocking primitives - such as mutex usage in a wait loop. Such
bugs can result in hard to debug races/hangs.
Another category of invalid nesting that this facility will detect
is the calling of blocking functions from within schedule() ->
sched_submit_work() -> blk_schedule_flush_plug().
There's some potential for false positives (if secondary blocking
primitives themselves are not ready yet for this facility), but the
kernel will warn once about such bugs per bootup, so the warning
isn't much of a nuisance.
This feature comes with a number of fixes, for problems uncovered
with it, so no messages are expected normally.
- Another round of sched/numa optimizations and refinements, for
CONFIG_NUMA_BALANCING=y.
- Another round of sched/dl fixes and refinements.
Plus various smaller fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched: Add missing rcu protection to wake_up_all_idle_cpus
sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
sched/numa: Init numa balancing fields of init_task
sched/deadline: Remove unnecessary definitions in cpudeadline.h
sched/cpupri: Remove unnecessary definitions in cpupri.h
sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
sched/fair: Fix stale overloaded status in the busiest group finding logic
sched: Move p->nr_cpus_allowed check to select_task_rq()
sched/completion: Document when to use wait_for_completion_io_*()
sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
sched/fair: Kill task_struct::numa_entry and numa_group::task_list
sched: Refactor task_struct to use numa_faults instead of numa_* pointers
sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
sched/deadline: Reschedule from switched_from_dl() after a successful pull
sched/deadline: Push task away if the deadline is equal to curr during wakeup
sched/deadline: Add deadline rq status print
sched/deadline: Fix artificial overrun introduced by yield_task_dl()
sched/rt: Clean up check_preempt_equal_prio()
sched/core: Use dl_bw_of() under rcu_read_lock_sched()
sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
...
Commit 19e2ad6a09 ("n_tty: Remove overflow
tests from receive_buf() path") moved the increment of read_head into
the arguments list of read_buf_addr(). Function calls represent a
sequence point in C. Therefore read_head is incremented before the
character c is placed in the buffer. Since the circular read buffer is
a lock-less design since commit 6d76bd2618
("n_tty: Make N_TTY ldisc receive path lockless"), this creates a race
condition that leads to communication errors.
This patch modifies the code to increment read_head _after_ the data
is placed in the buffer and thus fixes the race for non-SMP machines.
To fix the problem for SMP machines, memory barriers must be added in
a separate patch.
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit f95499c303 ("n_tty: Don't wait for buffer work in read() loop")
introduces a race window where a pty master can be signalled that the pty
slave was closed before all the data that the slave wrote is delivered.
Commit f8747d4a46 ("tty: Fix pty master read() after slave closes") fixed the
problem in case of n_tty_read, but the problem still exists for n_tty_poll.
This can be seen by running 'for ((i=0; i<100;i++));do ./test.py ;done'
where test.py is:
import os, select, pty
(pid, pty_fd) = pty.fork()
if pid == 0:
os.write(1, 'This string should be received by parent')
else:
poller = select.epoll()
poller.register( pty_fd, select.EPOLLIN )
ready = poller.poll( 1 * 1000 )
for fd, events in ready:
if not events & select.EPOLLIN:
print 'missed POLLIN event'
else:
print os.read(fd, 100)
poller.close()
The string from the slave is missed several times.
This patch takes the same approach as the fix for read and special cases
this condition for poll.
Tested on 3.16.
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When releasing one end of a pty pair, that end may just have written
to the other, which the input processing worker, flush_to_ldisc(), is
still working on but has not completed the copy to the other end's
read buffer. So input may not appear to be available to a waiting
reader but yet TTY_OTHER_CLOSED is now observed. The n_tty line
discipline has worked around this by waiting for input processing
to complete and then re-checking if input is available before
exiting with -EIO.
Since the tty/ldisc lock reordering, the wait for input processing
to complete can now occur during final close before setting
TTY_OTHER_CLOSED. In this way, a waiting reader is guaranteed to
see input available (if any) before observing TTY_OTHER_CLOSED.
Reviewed-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The stale comment refers to lock behavior which was eliminated in
commit 6d76bd2618,
n_tty: Make N_TTY ldisc receive path lockless.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Packet mode can only be set for a pty master, and a pty master is
always in raw mode since its termios cannot be changed.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Reviewed-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The pty master read() can miss the wake up for a packet mode
status change. For example,
CPU 0 | CPU 1
n_tty_read() | n_tty_packet_mode_flush()
... | .
if (packet & link->ctrl_status) { | .
/* no new ctrl_status ATM */ | .
| spin_lock
| ctrl_status |= TIOCPKT_FLUSHREAD
| spin_unlock
| wake_up(link->read_wait)
} |
set_current_state(TASK_INTERRUPTIBLE) |
... |
The pty master read() will now sleep (assuming there is no input) having
missed the read_wait wakeup.
Set the task state before the condition test.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Reviewed-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The slave's ctrl_lock serializes updates to the ctrl_status field
only, whereas the master's ctrl_lock serializes updates to the
packet mode enable (ie., the master does not have ctrl_status and
the slave does not have packet mode). Thus, claiming the slave's
ctrl_lock to access ->packet is useless.
Unlocked reads of ->packet are already smp-safe.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Reviewed-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Interrupts are enabled in the n_tty_read() loop, ioctl(TIOCPKT)
and pty driver flush_buffer() routine; no need to save and restore
local interrupt state.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Reviewed-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
n_tty_{read,write} are wait loops with sleeps in. Wait loops rely on
task_struct::state and sleeps do too, since that's the only means of
actually sleeping. Therefore the nested sleeps destroy the wait loop
state.
Fix this by using the new woken_wake_function and wait_woken() stuff,
which registers wakeups in wait and thereby allows shrinking the
task_state::state changes to the actual sleep part.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/20140924082242.323011233@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When the N_TTY line discipline receives data and wakes readers to
process the input, polling writers are also mistakenly woken. This
is because, although readers and writers are differentiated by
different wait queues (tty->read_wait & tty->write_wait), both
wait queues are polled together. Thus, reader wakeups without poll
flags still cause poll(POLLOUT) to wakeup.
For received data, wakeup readers with POLLIN. Preserve the
unspecific wakeup in n_tty_packet_mode_flush(), as this action
should flag both POLLIN and POLLOUT.
Fixes epoll_wait() for edge-triggered EPOLLOUT.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If INPCK is not set, input parity detection should be disabled. This means
parity errors should not be received from the tty driver, and the data
received should be treated normally.
SUS v3, 11.2.2, General Terminal Interface - Input Modes, states:
"If INPCK is set, input parity checking shall be enabled. If INPCK is
not set, input parity checking shall be disabled, allowing output parity
generation without input parity errors. Note that whether input parity
checking is enabled or disabled is independent of whether parity detection
is enabled or disabled (see Control Modes). If parity detection is enabled
but input parity checking is disabled, the hardware to which the terminal
is connected shall recognize the parity bit, but the terminal special file
shall not check whether or not this bit is correctly set."
Ignore parity errors reported by the tty driver when INPCK is not set, and
handle the received data normally.
Fixes: Bugzilla #71681, 'Improvement of n_tty_receive_parity_error from n_tty.c'
Reported-by: Ivan <athlon_@mail.ru>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The tty atomic_write_lock does not provide an exclusion guarantee for
the tty driver if the termios settings are LECHO & !OPOST. And since
it is unexpected and not allowed to call TTY buffer helpers like
tty_insert_flip_string concurrently, this may lead to crashes when
concurrect writers call pty_write. In that case the following two
writers:
* the ECHOing from a workqueue and
* pty_write from the process
race and can overflow the corresponding TTY buffer like follows.
If we look into tty_insert_flip_string_fixed_flag, there is:
int space = __tty_buffer_request_room(port, goal, flags);
struct tty_buffer *tb = port->buf.tail;
...
memcpy(char_buf_ptr(tb, tb->used), chars, space);
...
tb->used += space;
so the race of the two can result in something like this:
A B
__tty_buffer_request_room
__tty_buffer_request_room
memcpy(buf(tb->used), ...)
tb->used += space;
memcpy(buf(tb->used), ...) ->BOOM
B's memcpy is past the tty_buffer due to the previous A's tb->used
increment.
Since the N_TTY line discipline input processing can output
concurrently with a tty write, obtain the N_TTY ldisc output_lock to
serialize echo output with normal tty writes. This ensures the tty
buffer helper tty_insert_flip_string is not called concurrently and
everything is fine.
Note that this is nicely reproducible by an ordinary user using
forkpty and some setup around that (raw termios + ECHO). And it is
present in kernels at least after commit
d945cb9cce (pty: Rework the pty layer to
use the normal buffering logic) in 2.6.31-rc3.
js: add more info to the commit log
js: switch to bool
js: lock unconditionally
js: lock only the tty->ops->write call
References: CVE-2014-0196
Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When echoes cannot be flushed to output (usually because the tty
has no more write room) and L_ECHO is subsequently turned off, then
when L_ECHO is turned back on, stale echoes are output.
Output completed echoes regardless of the L_ECHO setting:
1. before normal writes to that tty
2. if the tty was stopped by soft flow control and is being
restarted
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: <stable@vger.kernel.org> # 3.13.x
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit eafbe67f84,
n_tty: Refactor input_available_p() by call site
broke poll() when TIME_CHAR(tty) and MIN_CHAR(tty) are both 0.
When TIME_CHAR and MIN_CHAR are both 0, input is available if the
read_cnt is 1 (not 0).
Reported-by: Eric Dumazet <edumazet@google.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stephane Eranian <eranian@google.com>
Tested-by: David Ahern <dsahern@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With block processing of echoed output, observed output order is still
required. Push completed echoes and echo commands prior to output.
Introduce echo_mark echo buffer index, which tracks completed echo
commands; ie., those submitted via commit_echoes but which may not
have been committed. Ensure that completed echoes are output prior
to subsequent terminal writes in process_echoes().
Fixes newline/prompt output order in cooked mode shell.
Cc: <stable@vger.kernel.org> # 3.12.x : 39434ab n_tty: Fix missing newline echo
Reported-by: Karl Dahlke <eklhad@comcast.net>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Karl Dahlke <eklhad@comcast.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
readline() inadvertently triggers an error recovery path when
pastes larger than 4k overrun the line discipline buffer. The
error recovery path discards input when the line discipline buffer
is full and operating in canonical mode and no newline has been
received. Because readline() changes the termios to non-canonical
mode to read the line char-by-char, the line discipline buffer
can become full, and then when readline() restores termios back
to canonical mode for the caller, the now-full line discipline
buffer triggers the error recovery.
When changing termios from non-canon to canon mode and the read
buffer contains data, simulate an EOF push _without_ the
DISABLED_CHAR in the read buffer.
Importantly for the readline() problem, the termios can be
changed back to non-canonical mode without changes to the read
buffer occurring; ie., as if the previous termios change had not
happened (as long as no intervening read took place).
Preserve existing userspace behavior which allows '\0's already
received in non-canon mode to be read as '\0's in canon mode
(rather than trigger add'l EOF pushes or an actual EOF).
Patch based on original proposal and discussion here
https://bugzilla.kernel.org/show_bug.cgi?id=55991
by Stas Sergeev <stsp@users.sourceforge.net>
Reported-by: Margarita Manterola <margamanterola@gmail.com>
Cc: Maximiliano Curia <maxy@gnuservers.com.ar>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Acked-by: Stas Sergeev <stsp@users.sourceforge.net>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Distinguish if caller is n_tty_poll() or n_tty_read(), and
set the read/wakeup threshold accordingly.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Although n_tty_receive_char_closing() only has one call-site,
let the compiler inline instead.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit e60d27c4d8,
n_tty: Factor LNEXT processing from per-char i/o path,
mistakenly inlined the non-inline alias, n_tty_receive_char(),
for the inline function, n_tty_receive_char_inline().
As n_tty_receive_char() is intended for slow-path char
processing only, un-inline it.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
N_TTY's direct and flow-controlled flavors of the .receive_buf()
method are nearly identical; fold together.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When L_ECHONL is on, newlines are echoed regardless of the L_ECHO
state; if set, ensure accumulated echoes are flushed before finishing
the current input processing and before more output.
Cc: <stable@vger.kernel.org> # 3.12.x
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With multiple, concurrent readers (each waiting to acquire the
atomic_read_lock mutex), a departing reader may mistakenly reset
minimum_to_wake after a new reader has already set a new value.
Protect the minimum_to_wake reset with the atomic_read_lock critical
section.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Although the maximum allowable canonical line is specified to
be 255 bytes (MAX_CANON), the practical limit has actually been
the size of the line discipline read buffer (N_TTY_BUF_SIZE == 4096).
Commit 32f13521ca,
n_tty: Line copy to user buffer in canonical mode, limited the
line copy to 4095 bytes. With a completely full line discipline
read buffer and a userspace buffer > 4095, _no_ data was copied,
and the read() syscall returned 0, indicating EOF.
Fix the interval arithmetic to compute the correct number of bytes
to copy to userspace in the range [1..4096].
Cc: <stable@vger.kernel.org> # 3.12.x
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit cbfd0340ae,
'n_tty: Process echoes in blocks', introduced an error when
consuming the echo buffer tail to prevent buffer overrun, where
the incorrect operation code byte is checked to determine how
far to advance the tail to the next echo byte.
Check the correct byte for the echo operation code byte.
Cc: <stable@vger.kernel.org> # 3.12.x : c476f65 tty: incorrect test of echo_buf() result for ECHO_OP_START
Cc: <stable@vger.kernel.org> # 3.12.x
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A departing reader must restart a flush_to_ldisc() worker _before_
the next reader enters the read loop; this is to avoid the new reader
concluding no more i/o is available and prematurely exiting, when the
old reader simply hasn't re-started the worker yet.
Cc: stable <stable@vger.kernel.org> # 3.12
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
test echo_buf() result for ECHO_OP_START
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit f95499c303,
n_tty: Don't wait for buffer work in read() loop
creates a race window which can cause a pty master read()
to miss the last pty slave write(s) and return -EIO instead,
thus signalling the pty slave is closed. This can happen when
the pty slave is written and immediately closed but before the
tty buffer i/o loop receives the new input; the pty master
read() is scheduled, sees its read buffer is empty and the
pty slave has been closed, and exits.
Because tty_flush_to_ldisc() has significant performance impact
for parallel i/o, rather than revert the commit, special case this
condition (ie., when the read buffer is empty and the 'other' pty
has been closed) and, only then, wait for buffer work to complete
before re-testing if the read buffer is still empty.
As before, subsequent pty master reads return any available data
until no more data is available, and then returns -EIO to
indicate the pty slave has closed.
Reported-by: Mikael Pettersson <mikpelinux@gmail.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Mikael Pettersson <mikpelinux@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove braces from single-statement conditional in
n_tty_set_termios.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 40d5e0905a,
'n_tty: Fix EOF push handling' introduced a subtle state
change error wrt EOF push handling when the termios is
changed from non-canonical to canonical mode.
Reset line_start to the current read_tail index, not 0.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
LNEXT processing accounts for ~15% of total cpu time in end-to-end
tty i/o; factor the lnext test/clear from the per-char i/o path.
Instead, attempt to immediately handle the literal next char if not
at the end of this received buffer; otherwise, handle the first char
of the next received buffer as the literal next char, then continue
with normal i/o.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Always pre-figure the space available in the read_buf and limit
the inbound receive request to that amount.
For compatibility reasons with the non-flow-controlled interface,
n_tty_receive_buf() will continue filling read_buf until all data
has been received or receive_room() returns 0.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Convert to modal receive_buf processing; factor char receive
processing for unusual termios settings out of normal per-char
i/o path.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Factor 'special' per-char processing into standalone fn,
n_tty_receive_char_special(), which handles processing for chars
marked in the char_map.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Relocate the IXANY restart tty test to code paths where the
the received char is not START_CHAR, STOP_CHAR, INTR_CHAR,
QUIT_CHAR or SUSP_CHAR.
Fixes the condition when ISIG if off and one of INTR_CHAR,
QUIT_CHAR or SUSP_CHAR does not restart i/o.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>