Commit Graph

145 Commits

Chuck Lever
88770b8de3 svcrdma: Fix stale comment
Commit 7d81ee8722 ("svcrdma: Single-stage RDMA Read") changed the
behavior of svc_rdma_recvfrom() but neglected to update the
documenting comment.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-18 12:09:08 -04:00
Chuck Lever
baf6d18b11 svcrdma: Prevent page release when nothing was received
I noticed that svc_rqst_release_pages() was still unnecessarily
releasing a page when svc_rdma_recvfrom() returns zero.

Fixes: a53d5cb064 ("svcrdma: Avoid releasing a page in svc_xprt_release()")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17 13:18:04 -04:00
Chuck Lever
c5d68d25bd svcrdma: Clean up allocation of svc_rdma_recv_ctxt
The physical device's favored NUMA node ID is available when
allocating a recv_ctxt. Use that value instead of relying on the
assumption that the memory allocation happens to be running on a
node close to the device.

This cleanup eliminates the hack of destroying recv_ctxts that
were not created by the receive CQ thread -- recv_ctxts are now
always allocated on a "good" node.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12 12:16:35 -04:00
NeilBrown
948f072ada SUNRPC: always free ctxt when freeing deferred request
Since the ->xprt_ctxt pointer was added to svc_deferred_req, it has not
been sufficient to use kfree() to free a deferred request.  We may need
to free the ctxt as well.

As freeing the ctxt is all that ->xpo_release_rqst() does, we repurpose
it to explicitly do that even when the ctxt is not stored in an rqst.
So we now have ->xpo_release_ctxt(), which is given an xprt and a ctxt
that may have been taken either from an rqst or from a dreq.  The
caller is now responsible for clearing that pointer after the call to
->xpo_release_ctxt().

We also clear dr->xprt_ctxt when the ctxt is moved into a new rqst when
revisiting a deferred request.  This ensures there is only one pointer
to the ctxt, so the risk of double freeing in future is reduced.  The
new code in svc_xprt_release which releases both the ctxt and any
rq_deferred depends on this.
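
A minimal sketch of the resulting free path (the helper name
svc_free_deferred_req() is illustrative, not taken from this patch):

	static void svc_free_deferred_req(struct svc_xprt *xprt,
					  struct svc_deferred_req *dr)
	{
		/* kfree() alone would leak the transport ctxt */
		if (dr->xprt_ctxt)
			xprt->xpt_ops->xpo_release_ctxt(xprt, dr->xprt_ctxt);
		dr->xprt_ctxt = NULL;
		kfree(dr);
	}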

Fixes: 773f91b2cf ("SUNRPC: Fix NFSD's request deferral on RDMA transports")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-14 15:55:02 -04:00
Chuck Lever
319951eba0 SUNRPC: Remove ->xpo_secure_port()
There's no need for the cost of this extra virtual function call
during every RPC transaction: the RQ_SECURE bit can be set properly
in ->xpo_recvfrom() instead.
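
Roughly, each ->xpo_recvfrom() can now perform the check itself; a
sketch, assuming the svc_port_is_privileged()/svc_addr() helpers:

	/* formerly a separate ->xpo_secure_port() call per RPC */
	if (svc_port_is_privileged(svc_addr(rqstp)))
		set_bit(RQ_SECURE, &rqstp->rq_flags);
	else
		clear_bit(RQ_SECURE, &rqstp->rq_flags);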

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:55 -05:00
Chuck Lever
983084b267 SUNRPC: Remove svc_rqst::rq_xprt_hlen
Clean up: This field is now always set to zero.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-19 12:25:39 -04:00
Chuck Lever
773f91b2cf SUNRPC: Fix NFSD's request deferral on RDMA transports
Trond Myklebust reports an NFSD crash in svc_rdma_sendto(). Further
investigation shows that the crash occurred while NFSD was handling
a deferred request.

This patch addresses two inter-related issues that prevent request
deferral from working correctly for RPC/RDMA requests:

1. Prevent the crash by ensuring that the original
   svc_rqst::rq_xprt_ctxt value is available when the request is
   revisited. Otherwise svc_rdma_sendto() does not have a Receive
   context available with which to construct its reply.

2. Possibly since before commit 71641d99ce ("svcrdma: Properly
   compute .len and .buflen for received RPC Calls"),
   svc_rdma_recvfrom() did not include the transport header in the
   returned xdr_buf. There should have been no need for svc_defer()
   and friends to save and restore that header, as of that commit.
   This issue is addressed in a backport-friendly way by simply
   having svc_rdma_recvfrom() set rq_xprt_hlen to zero
   unconditionally, just as svc_tcp_recvfrom() does. This enables
   svc_deferred_recv() to correctly reconstruct an RPC message
   received via RPC/RDMA.

Reported-by: Trond Myklebust <trondmy@hammerspace.com>
Link: https://lore.kernel.org/linux-nfs/82662b7190f26fb304eb0ab1bb04279072439d4e.camel@hammerspace.com/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
2022-04-06 14:01:27 -04:00
Chuck Lever
8dcc5721da svcrdma: Split the svcrdma_wc_receive() tracepoint
There are currently three separate purposes being served by a single
tracepoint here. They need to be split up.

svcrdma_wc_recv:
 - status is always zero, so there's no value in recording it.
 - vendor_err is meaningless unless status is not zero, so
   there's no value in recording it.
 - This tracepoint is needed only when developing modifications,
   so it should be left disabled most of the time.

svcrdma_wc_recv_flush:
 - As above, needed only rarely, and not an error.

svcrdma_wc_recv_err:
 - received is always zero, so there's no value in recording it.
 - This tracepoint can be left enabled because completion
   errors are run-time problems (except for FLUSHED_ERR).
 - Tracepoint name now ends in _err to reflect its purpose.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-04 15:40:15 -04:00
Chuck Lever
e3eded5e81 svcrdma: Clean up dto_q critical section in svc_rdma_recvfrom()
This, to me, seems less cluttered and less redundant. I was hoping
it could help reduce lock contention on the dto_q lock by reducing
the size of the critical section, but alas, the only improvement is
readability.
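
The shape of the slimmed-down critical section, sketched (see the
patch itself for the exact code):

	spin_lock(&rdma_xprt->sc_rq_dto_lock);
	ctxt = svc_rdma_next_recv_ctxt(&rdma_xprt->sc_rq_dto_q);
	if (ctxt)
		list_del(&ctxt->rc_list);
	else
		/* No new messages: let the transport sleep again */
		clear_bit(XPT_DATA, &xprt->xpt_flags);
	spin_unlock(&rdma_xprt->sc_rq_dto_lock);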

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-31 15:58:48 -04:00
Chuck Lever
5533c4f4b9 svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_arg
These fields are no longer used.

The size of struct svc_rdma_recv_ctxt is now less than 300 bytes on
x86_64, down from 2440 bytes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-31 15:57:48 -04:00
Chuck Lever
9af723be86 svcrdma: Remove sc_read_complete_q
Now that svc_rdma_recvfrom() waits for Read completion,
sc_read_complete_q is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-31 15:57:48 -04:00
Chuck Lever
7d81ee8722 svcrdma: Single-stage RDMA Read
Currently the generic RPC server layer calls svc_rdma_recvfrom()
twice to retrieve an RPC message that uses Read chunks. I'm not
exactly sure why this design was chosen originally.

Instead, let's wait for the Read chunk completion inline in the
first call to svc_rdma_recvfrom().

The goal is to eliminate some page allocator churn.
rdma_read_complete() replaces pages in the second svc_rqst by
calling put_page() repeatedly while the upper layer waits for the
request to be constructed, which adds unnecessary NFS WRITE round-
trip latency.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
2021-03-31 15:57:39 -04:00
Chuck Lever
82011c80b3 SUNRPC: Move svc_xprt_received() call sites
Currently, XPT_BUSY is not cleared until xpo_recvfrom returns.
That effectively blocks the receipt and handling of the next RPC
message until the current one has been taken off the transport.
This strict ordering is a requirement for socket transports.

For our kernel RPC/RDMA transport implementation, however, dequeuing
an ingress message is nothing more than a list_del(). The transport
can safely be marked un-busy as soon as that is done.

To keep the changes simpler, this patch just moves the
svc_xprt_received() call site from svc_handle_xprt() into the
transports, so that the actual optimization can be done in a
subsequent patch.
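
In other words, each ->xpo_recvfrom() implementation now makes this
call itself once the ingress message is safely dequeued (sketch):

	/* clear XPT_BUSY; another thread may now receive on this xprt */
	svc_xprt_received(rqstp->rq_xprt);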

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever
e844d307d4 svcrdma: Add a "deferred close" helper
Refactor a bit of commonly used logic so that every site that wants
a close deferred to an nfsd thread does all the right things
(set_bit(XPT_CLOSE) then enqueue).

Also, once XPT_CLOSE is set on a transport, it is never cleared. If
XPT_CLOSE is already set, then the close is already being handled
and the enqueue can be skipped.
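
Per the description above, the helper boils down to something like:

	void svc_xprt_deferred_close(struct svc_xprt *xprt)
	{
		/* XPT_CLOSE is never cleared, so a second enqueue
		 * would be redundant */
		if (!test_and_set_bit(XPT_CLOSE, &xprt->xpt_flags))
			svc_xprt_enqueue(xprt);
	}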

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever
c558d47596 svcrdma: Maintain a Receive water mark
Post more Receives when the number of pending Receives drops below
a water mark. The batch mechanism is disabled if the underlying
device cannot support a reasonably-sized Receive Queue.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever
7b748c30cc svcrdma: Use svc_rdma_refresh_recvs() in wc_receive
Replace svc_rdma_post_recv() with the new batch receive mechanism.
For the moment it is posting just a single Receive WR at a time,
so no change in behavior is expected.

Since svc_rdma_wc_receive() was the last call site for
svc_rdma_post_recv(), it is removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever
77f0a2aa5c svcrdma: Add a batch Receive posting mechanism
Introduce a server-side mechanism similar to commit e340c2d6ef
("xprtrdma: Reduce the doorbell rate (Receive)") to post Receive
WRs in batch. Its first consumer is svc_rdma_post_recvs(), which
posts the initial set of Receive WRs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever
c6b7ed8f94 svcrdma: Remove stale comment for svc_rdma_wc_receive()
xprt pinning was removed in commit 365e9992b9 ("svcrdma: Remove
transport reference counting"), but this comment was not updated
to reflect that change.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 10:19:05 -04:00
Chuck Lever
072db263e1 svcrdma: RPCDBG_FACILITY is no longer used
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 10:19:05 -04:00
Chuck Lever
bade4be69a svcrdma: Revert "svcrdma: Reduce Receive doorbell rate"
I tested commit 43042b90ca ("svcrdma: Reduce Receive doorbell
rate") with mlx4 (IB) and software iWARP and didn't find any
issues. However, I recently got my hardware iWARP setup back on
line (FastLinQ) and it's crashing hard on this commit (confirmed
via bisect).

The failure mode is complex.
 - After a connection is established, the first Receive completes
   normally.
 - But the second and third Receives have garbage in their Receive
   buffers. The server responds with ERR_VERS as a result.
 - When the client tears down the connection to retry, a couple
   of posted Receives flush twice, and that corrupts the recv_ctxt
   free list.
 - __svc_rdma_free then faults or loops infinitely while destroying
   the xprt's recv_ctxts.

Since 43042b90ca ("svcrdma: Reduce Receive doorbell rate") does
not fix a bug but is a scalability enhancement, it's safe and
appropriate to revert it while working on a replacement.

Fixes: 43042b90ca ("svcrdma: Reduce Receive doorbell rate")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-11 15:26:07 -05:00
Chuck Lever
dd2d055b27 svcrdma: DMA-sync the receive buffer in svc_rdma_recvfrom()
The Receive completion handler doesn't look at the contents of the
Receive buffer. The DMA sync isn't terribly expensive but it's one
less thing that needs to be done by the Receive completion handler,
which is single-threaded (per svc_xprt). This helps scalability.
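
The sync simply moves into the receive path, roughly:

	/* now done in svc_rdma_recvfrom(), not in the completion
	 * handler */
	ib_dma_sync_single_for_cpu(rdma_xprt->sc_pd->device,
				   ctxt->rc_recv_sge.addr,
				   ctxt->rc_byte_len, DMA_FROM_DEVICE);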

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-01-25 09:36:28 -05:00
Chuck Lever
43042b90ca svcrdma: Reduce Receive doorbell rate
This is similar to commit e340c2d6ef ("xprtrdma: Reduce the
doorbell rate (Receive)") which added Receive batching to the
client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:28 -05:00
Chuck Lever
df971cd853 svcrdma: Convert rdma_stat_recv to a per-CPU counter
Receives are frequent events. Avoid the overhead of a memory bus
lock cycle for counting a value that is hardly ever used.
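
The percpu_counter pattern avoids that locked read-modify-write on
the hot path; a sketch:

	#include <linux/percpu_counter.h>

	struct percpu_counter rdma_stat_recv;	/* percpu_counter_init()
						 * is called once at
						 * module init */

	/* hot path: a purely CPU-local update */
	percpu_counter_inc(&rdma_stat_recv);

	/* rarely-read side (e.g. /proc) sums the per-CPU deltas */
	s64 total = percpu_counter_sum_positive(&rdma_stat_recv);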

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:28 -05:00
Chuck Lever
d96962e6d0 svcrdma: Use the new parsed chunk list when pulling Read chunks
As a pre-requisite for handling multiple Read chunks in each Read
list, convert svc_rdma_recv_read_chunk() to use the new parsed Read
chunk list.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:23 -05:00
Chuck Lever
7954c8503b svcrdma: Remove chunk list pointers
Clean up: These pointers are no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:23 -05:00
Chuck Lever
9d0b09d5ef svcrdma: Support multiple write chunks when pulling up
When counting the number of SGEs needed to construct a Send request,
do not count result payloads. Likewise, when copying the Reply message
into the pull-up buffer, do not copy result payloads into the Send
buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:22 -05:00
Chuck Lever
7a1cbfa180 svcrdma: Use parsed chunk lists to construct RDMA Writes
Refactor: Instead of re-parsing the ingress RPC Call transport
header when constructing RDMA Writes, use the new parsed chunk lists
for the Write list and Reply chunk, which are version-agnostic and
already XDR-decoded.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:22 -05:00
Chuck Lever
58b2e0fefa svcrdma: Use parsed chunk lists to detect reverse direction replies
Refactor: Don't duplicate header decoding smarts here. Instead, use
the new parsed chunk lists.

Note that the XID sanity test is also removed. The XID is already
looked up by the cb handler, and is rejected if it's not recognized.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:22 -05:00
Chuck Lever
eb3de6a49d svcrdma: Use parsed chunk lists to derive the inv_rkey
Refactor: Don't duplicate header decoding smarts here. Instead, use
the new parsed chunk lists.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:22 -05:00
Chuck Lever
78147ca8b4 svcrdma: Add a "parsed chunk list" data structure
This simple data structure binds the location of each data payload
inside of an RPC message to the chunk that will be used to push it
to or pull it from the client.

There are several benefits to this small additional overhead:

 * It enables support for more than one chunk in incoming Read and
   Write lists.

 * It translates the version-specific on-the-wire format into a
   generic in-memory structure, enabling support for multiple
   versions of the RPC/RDMA transport protocol.

 * It enables the server to re-organize a chunk list if it needs to
   adjust where Read chunk data lands in server memory without
   altering the contents of the XDR-encoded Receive buffer.

Construction of these lists is done while sanity checking each
incoming RPC/RDMA header. Subsequent patches will make use of the
generated data structures.
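
The shape of the structure, roughly as introduced by this patch:

	struct svc_rdma_segment {
		u32			rs_handle;
		u32			rs_length;
		u64			rs_offset;
	};

	struct svc_rdma_chunk {
		struct list_head	ch_list;
		u32			ch_position;	/* Read chunks */
		u32			ch_length;
		u32			ch_payload_length;
		u32			ch_segcount;
		struct svc_rdma_segment	ch_segments[];
	};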

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 13:00:22 -05:00
Chuck Lever
365e9992b9 svcrdma: Remove transport reference counting
Jason tells me that a ULP cannot rely on getting an ESTABLISHED
and DISCONNECTED event pair for each connection, so transport
reference counting in the CM event handler will never be reliable.

Now that we have ib_drain_qp(), svcrdma should no longer need to
hold transport references while Sends and Receives are posted. So
remove the get/put call sites in the CM event handlers.

This eliminates a significant source of locked memory bus traffic.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-28 10:18:14 -04:00
Chuck Lever
64d2642251 svcrdma: Fix another Receive buffer leak
During a connection tear down, the Receive queue is flushed before
the device resources are freed. Typically, all the Receives flush
with IB_WR_FLUSH_ERR.

However, any pending successful Receives flush with IB_WR_SUCCESS,
and the server automatically posts a fresh Receive to replace the
completing one. This happens even after the connection has closed
and the RQ is drained. Receives that are posted after the RQ is
drained appear never to complete, causing a Receive resource leak.
The leaked Receive buffer is left DMA-mapped.

To prevent these late-posted recv_ctxt's from leaking, block new
Receive posting after XPT_CLOSE is set.
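
The guard is a simple flag test before posting; sketched:

	/* Completions that arrive after close has begun must not
	 * replenish the Receive Queue */
	if (test_bit(XPT_CLOSE, &rdma_xprt->sc_xprt.xpt_flags))
		return false;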

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-28 10:18:13 -04:00
Chuck Lever
007140ee9b svcrdma: Record Receive completion ID in svc_rdma_decode_rqst
When recording a trace event in the Receive path, tie decoding
results and errors to an incoming Receive completion.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever
9b3bcf8c5c svcrdma: Introduce Receive completion IDs
Set up a completion ID in each svc_rdma_recv_ctxt. The ID is used
to match an incoming Receive completion to a transport and to a
previous ib_post_recv().
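
Sketch of the ID setup, close to the patch itself:

	static void svc_rdma_recv_cid_init(struct svcxprt_rdma *rdma,
					   struct rpc_rdma_cid *cid)
	{
		/* ties a completion to its CQ and its ib_post_recv() */
		cid->ci_queue_id = rdma->sc_rq_cq->res.id;
		cid->ci_completion_id =
			atomic_inc_return(&rdma->sc_completion_ids);
	}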

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever
f60a08697d svcrdma: Add common XDR decoders for RDMA and Read segments
Clean up: De-duplicate some code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever
07e9a6325a SUNRPC: Add helpers for decoding list discriminators symbolically
Use these helpers in a few spots to demonstrate their use.

The remaining open-coded discriminator checks in rpcrdma will be
addressed in subsequent patches.
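
The helpers test the XDR discriminator word symbolically; a sketch
using the xdr_item_is_present() name:

	p = xdr_inline_decode(xdr, sizeof(*p));
	if (!p)
		return false;
	if (xdr_item_is_present(p)) {
		/* decode the optional item */
	}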

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever
ba6cc97738 svcrdma: Consolidate send_error helper functions
Final refactor: Replace internals of svc_rdma_send_error() with a
simple call to svc_rdma_send_error_msg().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever
d1f6e2369c svcrdma: Add @rctxt parameter to svc_rdma_send_error() functions
Another step towards making svc_rdma_send_error_msg() and
svc_rdma_send_error() similar enough to eliminate one of them.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:28:24 -04:00
Chuck Lever
27ce629444 svcrdma: Rename tracepoints that record header decoding errors
Clean up: Use a consistent naming convention so that these trace
points can be enabled quickly via a glob.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18 10:21:21 -04:00
Chuck Lever
ea740bd5f5 svcrdma: Fix backchannel return code
Way back when I was writing the RPC/RDMA server-side backchannel
code, I misread the TCP backchannel reply handler logic. When
svc_tcp_recvfrom() successfully receives a backchannel reply, it
does not return -EAGAIN. It sets XPT_DATA and returns zero.

Update svc_rdma_recvfrom() to return zero. Here, XPT_DATA doesn't
need to be set again: it is set whenever a new message is received,
behind a spin lock in a single-threaded context.

Also, if handling the cb reply is not successful, the message is
simply dropped. There's no special message framing to deal with as
there is in the TCP case.

Now that the handle_bc_reply() return value is ignored, I've removed
the dprintk call sites in the error exit of handle_bc_reply() in
favor of trace points in other areas that already report the error
cases.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-05-18 10:21:21 -04:00
Chuck Lever
23cf1ee1f1 svcrdma: Fix leak of svc_rdma_recv_ctxt objects
Utilize the xpo_release_rqst transport method to ensure that each
rqstp's svc_rdma_recv_ctxt object is released even when the server
cannot return a Reply for that rqstp.
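
A sketch of the release path, close to the patch itself:

	static void svc_rdma_release_rqst(struct svc_rqst *rqstp)
	{
		struct svc_rdma_recv_ctxt *ctxt = rqstp->rq_xprt_ctxt;
		struct svc_xprt *xprt = rqstp->rq_xprt;
		struct svcxprt_rdma *rdma =
			container_of(xprt, struct svcxprt_rdma, sc_xprt);

		rqstp->rq_xprt_ctxt = NULL;
		if (ctxt)
			svc_rdma_recv_ctxt_put(rdma, ctxt);
	}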

Without this fix, each RPC whose Reply cannot be sent leaks one
svc_rdma_recv_ctxt. This is a 2.5KB structure, a 4KB DMA-mapped
Receive buffer, and any pages that might be part of the Reply
message.

The leak is infrequent unless the network fabric is unreliable or
Kerberos is in use, as GSS sequence window overruns, which result
in connection loss, are more common on fast transports.

Fixes: 3a88092ee3 ("svcrdma: Preserve Receive buffer until svc_rdma_sendto")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-04-17 12:40:38 -04:00
Chuck Lever
aee4b74a3f svcrdma: Fix double sync of transport header buffer
Performance optimization: Avoid syncing the transport buffer twice
when Reply buffer pull-up is necessary.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:33 -04:00
Chuck Lever
6fd5034db4 svcrdma: Refactor chunk list encoders
Same idea as the receive-side changes I did a while back: use
xdr_stream helpers rather than open-coding the XDR chunk list
encoders. This builds the Reply transport header from beginning to
end without backtracking.
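
The encoding pattern, sketched:

	__be32 *p;

	p = xdr_reserve_space(&sctxt->sc_stream, sizeof(*p));
	if (!p)
		return -EMSGSIZE;
	*p = xdr_one;	/* discriminator: a chunk follows */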

As additional clean-ups, fill in documenting comments for the XDR
encoders and sprinkle some trace points in the new encoding
functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:33 -04:00
Chuck Lever
2fe8c44633 svcrdma: De-duplicate code that locates Write and Reply chunks
Cache the locations of the Requester-provided Write list and Reply
chunk so that the Send path doesn't need to parse the Call header
again.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:32 -04:00
Chuck Lever
e604aad2ca svcrdma: Use struct xdr_stream to decode ingress transport headers
The logic that checks incoming network headers has to be scrupulous.

De-duplicate: replace open-coded buffer overflow checks with the use
of xdr_stream helpers that are used most everywhere else XDR
decoding is done.

One minor change to the sanity checks: instead of checking the
length of individual segments, cap the length of the whole chunk
to be sure it can fit in the set of pages available in rq_pages.
This should be a better test of whether the server can handle the
chunks in each request.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:32 -04:00
Chuck Lever
412055398b nfsd: Fix NFSv4 READ on RDMA when using readv
svcrdma expects that the payload falls precisely into the xdr_buf
page vector. This does not seem to be the case for
nfsd4_encode_readv().

This code is called only when fops->splice_read is missing or when
RQ_SPLICE_OK is clear, so it's not a noticeable problem in many
common cases.

Add new transport method: ->xpo_read_payload so that when a READ
payload does not fit exactly in rq_res's page vector, the XDR
encoder can inform the RPC transport exactly where that payload is,
without the payload's XDR pad.

That way, when a Write chunk is present, the transport knows what
byte range in the Reply message is supposed to be matched with the
chunk.

Note that the Linux NFS server implementation of NFS/RDMA can
currently handle only one Write chunk per RPC-over-RDMA message.
This simplifies the implementation of this fix.
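
The new method is reached through a thin wrapper that the XDR
encoder calls; a sketch:

	/* called by the NFSv4 READ encoder once the payload's
	 * location in rq_res is known */
	int svc_encode_read_payload(struct svc_rqst *rqstp,
				    unsigned int offset,
				    unsigned int length)
	{
		return rqstp->rq_xprt->xpt_ops->xpo_read_payload(rqstp,
								 offset,
								 length);
	}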

Fixes: b042098063 ("nfsd4: allow exotic read compounds")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:31 -04:00
Chuck Lever
4866073e6d svcrdma: Use llist for managing cache of recv_ctxts
Use a wait-free mechanism for managing the svc_rdma_recv_ctxts free
list. Subsequently, sc_recv_lock can be eliminated.
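
Sketch of the llist usage (note that llist_del_first() requires
consumers to be serialized; producers may be concurrent):

	/* put: wait-free push, no lock */
	llist_add(&ctxt->rc_node, &rdma->sc_recv_ctxts);

	/* get: pop the most recently added ctxt */
	node = llist_del_first(&rdma->sc_recv_ctxts);
	if (node)
		ctxt = llist_entry(node, struct svc_rdma_recv_ctxt,
				   rc_node);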

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19 10:59:28 -04:00
Chuck Lever
8820bcaa5b svcrdma: Remove syslog warnings in work completion handlers
These can result in a lot of log noise, and they can be triggered by
client misbehavior. Since there are trace points in these
handlers now, there's no need to spam the log.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06 15:37:15 -05:00
J. Bruce Fields
95503d295a svcrpc: fix unlikely races preventing queueing of sockets
In the rpc server, when something happens that might be a reason to
wake up a thread to do something, what we do is

	- modify xpt_flags, sk_sock->flags, xpt_reserved, or
	  xpt_nr_rqsts to indicate the new situation
	- call svc_xprt_enqueue() to decide whether to wake up a thread.

svc_xprt_enqueue may require multiple conditions to be true before
queueing up a thread to handle the xprt.  In the SMP case, one of the
other CPUs may have set another required condition, and in that case,
although both CPUs run svc_xprt_enqueue(), it's possible that neither
call sees the writes done by the other CPU in time, and neither one
recognizes that all the required conditions have been set.  A socket
could therefore be ignored indefinitely.

Add memory barriers to ensure that any svc_xprt_enqueue() call will
always see the conditions changed by other CPUs before deciding to
ignore a socket.
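
The pattern is the usual publish/observe barrier pairing; a sketch,
not the literal patch:

	/* CPU A: publish the new condition, then try to wake */
	set_bit(XPT_DATA, &xprt->xpt_flags);
	smp_mb__after_atomic();
	svc_xprt_enqueue(xprt);

	/* CPU B, inside svc_xprt_enqueue(): pair with the store side
	 * before re-checking the full set of wakeup conditions */
	smp_rmb();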

I've never seen this race reported.  In the unlikely event it happens,
another event will usually come along and the problem will fix itself.
So I don't think this is worth backporting to stable.

Chuck tried this patch and said "I don't see any performance
regressions, but my server has only a single last-level CPU cache."

Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06 15:37:14 -05:00
Chuck Lever
97bce63408 svcrdma: Optimize the logic that selects the R_key to invalidate
o Select the R_key to invalidate while the CPU cache still contains
  the received RPC Call transport header, rather than waiting until
  we're about to send the RPC Reply.

o Choose Send With Invalidate if there is exactly one distinct R_key
  in the received transport header. If there's more than one, the
  client will have to perform local invalidation after it has
  already waited for remote invalidation.
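
Sketch of the resulting Send-side choice (ib_send_wr fields as in
the verbs API):

	if (rctxt->rc_inv_rkey) {
		/* exactly one distinct R_key seen at Receive time */
		send_wr->opcode = IB_WR_SEND_WITH_INV;
		send_wr->ex.invalidate_rkey = rctxt->rc_inv_rkey;
	} else {
		send_wr->opcode = IB_WR_SEND;
	}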

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-28 18:36:03 -05:00