The various inputs & outputs of the crypto functions as well as the
values of the ECDH keys can be considered security sensitive. They
should therefore not end up in dmesg by mistake. This patch introduces a
new SMP_DBG macro which requires explicit compilation with -DDEBUG to be
enabled. All crypto related data logs now use this macro instead of
BT_DBG.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds basic OOB pairing support when we've received the remote
OOB data. This includes tracking the remote r value (in smp->rr) as well
as doing the appropriate f4() call when needed. Previously the OOB rand
would have been stored in smp->rrnd however these are actually two
independent values so we need separate variables for them. Na/Nb in the
spec maps to smp->prnd/rrnd and ra/rb maps to smp->rr with smp->pr to
come once local OOB data is supported.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If we have OOB data available for the remote device in question we
should set the OOB flag appropriately in the SMP pairing request or
response.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds proper support for passing LE OOB data to the
hci_add_remote_oob_data() function. For LE the 192-bit values are not
valid and should therefore be passed as NULL values.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To be able to support OOB data for LE pairing we need to store the
address type of the remote device. This patch extends the relevant
functions and data types with a bdaddr_type variable.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There's no need to duplicate code for the 192 vs 192+256 variants of the
OOB data functions. This is also helpful to pave the way to support LE
SC OOB data where only 256 bit data is provided.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When Secure Connections-only mode is enabled we should reject any
pairing command that does not have Secure Connections set in the
authentication requirements. This patch adds the appropriate logic for
this to the command handlers of Pairing Request/Response and Security
Request.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When doing SMP over BR/EDR some of the routines can be shared with the
LE functionality whereas others needs to be split into their own BR/EDR
specific branches. This patch implements the split of BR/EDR specific
SMP code from the LE-only code, making sure SMP over BR/EDR works as
specified.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds the very basic code for creating and destroying SMP
L2CAP channels for BR/EDR connections.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To make it possible to use LE SC functionality over BR/EDR with pre-4.1
controllers (that do not support BR/EDR SC links) it's useful to be able
to force LE SC operations even over a traditional SSP protected link.
This patch adds a debugfs switch to force a special debug flag which is
used to skip the checks for BR/EDR SC support.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For LE Secure Connections we want to trigger cross transport key
generation only if a new link key was actually created during the BR/EDR
connection. This patch adds a new flag to track this information.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The HCI_USE_DEBUG_KEYS flag is intended to force our side to always use
debug keys for pairing. This means both BR/EDR SSP as well as SMP with
LE Secure Connections. This patch updates the SMP code to use the debug
keys instead of generating a random local key pair when the flag is set.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since we don not actively try to clear the keypress notification bit we
might get these PDUs. To avoid failing the pairing process add a simple
dummy handler for these for now.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
According to the LE SC specification the initiating device sends its
DHKey check first and the non-initiating devices sends its DHKey check
as a response to this. It's also important that the non-initiating
device doesn't send the response if it's still waiting for user input.
In order to synchronize all this a new flag is added.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The passkey entry mechanism involves either both sides requesting the
user for a passkey, or one side requesting the passkey while the other
one displays it. The behavior as far as SMP PDUs are concerned are
considerably different from numeric comparison and therefore requires
several new functions to handle it.
In essence passkey entry involves both sides gradually committing to
each bit of the passkey which involves 20 rounds of pairing confirm and
pairing random PDUS being sent in both directions.
This patch adds a new smp->passkey_round variable to track the current
round of the passkey commitment and reuses the variables already present
in struct hci_conn for the passkey and entered key count.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We need to set the correct Link Key type based on the properties of the
LE SC pairing that it was derived from. If debug keys were used the type
should be a debug key, and the authenticated vs unauthenticated
information should be set on what kind of security level was reached.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If the just-works method was chosen we shouldn't send anything to user
space but simply proceed with sending the DHKey Check PDU. This patch
adds the necessary code for it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
After generating the LTK we should set the correct type (normal SC or
debug) and authentication information for it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
It is very unlikely, but to have a 100% guarantee of the generated key
type we need to reject any keys which happen to match the debug key.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We need to be able to detect if the remote side used a debug key for the
pairing. This patch adds the debug key defines and sets a flag to
indicate that a debug key was used. The debug private key (debug_sk) is
also added in this patch but will only be used in a subsequent patch
when local debug key support is implemented.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds code to select the authentication method for Secure
Connections based on the local and remote capabilities. A new
DSP_PASSKEY method is also added for displaying the passkey - something
that is not part of legacy SMP pairing.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For Secure Connections we'll select the authentication method as soon as
we receive the public key, but only use it later (both when actually
triggering the method as well as when determining the quality of the
resulting LTK). Store the method therefore in the SMP context.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
As the last step of the LE SC pairing process it's time to generate and
distribute keys. The generation part is unique to LE SC and so this
patch adds a dedicated function for it. We also clear the distribution
bits for keys which are not distributed with LE SC, so that the code
shared with legacy SMP will not go ahead and try to distribute them.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Once we receive the DHKey check PDU it's time to first verify that the
value is correct and then proceed with encrypting the link.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
With LE SC, once the user has responded to the numeric comparison it's
time to send DHKey check values in both directions. The DHKey check
value is generated using new smp_f5 and smp_f6 cryptographic functions.
The smp_f5 function is responsible for generating the LTK and the MacKey
values whereas the smp_f6 function takes the MacKey as input and
generates the DHKey Check value.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
After the Pairing Confirm and Random PDUs have been exchanged in LE SC
it's time to generate a numeric comparison value using a new smp_g2
cryptographic function (which also builds on AES-CMAC). This patch adds
the smp_g2 implementation and updates the Pairing Random PDU handler to
proceed with the value genration and user confirmation.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When LE SC is being used we should always respond to it by sending our
local random number. This patch adds a convenience function for it which
also contains a check for the pre-requisite public key exchange
completion
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Once the public key exchange is complete the next step is for the
non-initiating device to send a SMP Pairing Confirm PDU to the
initiating device. This requires the use of a new smp_f4 confirm value
generation function which in turn builds on the AES-CMAC cryptographic
function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a handler function for the LE SC SMP Public Key PDU.
When we receive the key we proceed with generating the shared DHKey
value from the remote public key and local private key.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When the initial pairing request & response PDUs have been exchanged and
both have had the LE SC bit set the next step is to generate a ECDH
key pair and to send the public key to the remote side. This patch adds
basic support for generating the key pair and sending the public key
using the new Public Key SMP PDU. It is the initiating device that sends
the public key first and the non-initiating device responds by sending
its public key respectively (in a subsequent patch).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a simple ECC library that will act as a fundamental
building block for LE Secure Connections. The library has a simple API
consisting of two functions: one for generating a public/private key
pair and another one for generating a Diffie-Hellman key from a local
private key and a remote public key.
The code has been taken from https://github.com/kmackay/easy-ecc and
modified to conform with the kernel coding style.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Most of the LE Secure Connections SMP crypto functions build on top of
the AES-CMAC function. This patch adds access to AES-CMAC in the kernel
crypto subsystem by allocating a crypto_hash handle for it in a similar
way that we have one for AES-CBC.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Depending on whether Secure Connections is enabled or not we may need to add
the link key generation bit to the key distribution. This patch does the
necessary modifications to the build_pairing_cmd() function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that hci_find_ltk_by_addr is the only LTK lookup function there's no
need to keep the long name anymore. This patch shortens the function
name to simply hci_find_ltk.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that LTKs are always looked up based on bdaddr (with EDiv/Rand
checks done after a successful lookup) the hci_find_ltk function is not
needed anymore. This patch removes the function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
LTKs derived from Secure Connections based pairing are symmetric, i.e.
they should match both master and slave role. This patch updates the LTK
lookup functions to ignore the desired role when dealing with SC LTKs.
Furthermore, with Secure Connections the EDiv and Rand values are not
used and should always be set to zero. This patch updates the LTK lookup
to first use the bdaddr as key and then do the necessary verifications
of EDiv and Rand based on whether the found LTK is for SC or not.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since LE Secure Connections is a purely host-side feature we should
offer the Secure Connections mgmt setting for any adapter with LE
support. This patch updates the supported settings value and the
set_secure_conn command handler accordingly.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since the HCI_SC_ENABLED flag will also be used for controllers without
BR/EDR Secure Connections support whenever we need to check specifically
for SC for BR/EDR we also need to check that the controller actually
supports it. This patch adds a convenience macro for check all the
necessary conditions and converts the places in the code that need it to
use it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When the looked-up LTK is one generated by Secure Connections pairing
the security level it gives is BT_SECURITY_FIPS. This patch updates the
LTK request event handler to correctly set this level.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We need a dedicated LTK type for LTK resulting from a Secure Connections
based SMP pairing. This patch adds a new define for it and ensures that
both the New LTK event as well as the Load LTKs command supports it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch updates the functions which map the SMP authentication
request to a security level and vice-versa to take into account the
Secure Connections feature.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a new SMP flag for tracking whether Secure Connections
is in use and sets the flag when both remote and local side have elected
to use Secure Connections.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If we haven't enabled SC support on our side we should use the same mask
for the authentication requirement as we were using before SC support
was added, otherwise we should use the extended mask for SC.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds basic SMP defines for commands, error codes and PDU
definitions for the LE Secure Connections feature.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The elements must be u32 sized for the used hash function.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Sven-Haegar Koch reported the issue:
sims:~# iptables -A OUTPUT -m set --match-set testset src -j ACCEPT
iptables: Invalid argument. Run `dmesg' for more information.
In syslog:
x_tables: ip_tables: set.3 match: invalid size 48 (kernel) != (user) 32
which was introduced by the counter extension in ipset.
The patch fixes the alignment issue with introducing a new set match
revision with the fixed underlying 'struct ip_set_counter_match'
structure.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When the set was full (hash type and maxelem reached), it was not
possible to update the extension part of already existing elements.
The patch removes this limitation.
Fixes: https://bugzilla.netfilter.org/show_bug.cgi?id=880
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When we get a Link Key Notification HCI event we should already have a
hci_conn object. This should have been created either in the Connection
Request event handler, the hci_connect_acl() function or the
hci_cs_create_conn() function (if the request was not sent by the
kernel).
Since the only case that we'd end up not having a hci_conn in the Link
Key Notification event handler would be essentially broken hardware it's
safe to simply bail out from the function if this happens.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To allow brport device to return current brport flags set on port. Add
returned flags to nested IFLA_PROTINFO netlink msg built in dflt getlink.
With this change, netlink msg returned for bridge_getlink contains the port's
offloaded flag settings (the port's SELF settings).
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
When the swdev device learns a new mac/vlan on a port, it sends some async
notification to the driver and the driver installs an FDB in the device.
To give a holistic system view, the learned mac/vlan should be reflected
in the bridge's FBD table, so the user, using normal iproute2 cmds, can view
what is currently learned by the device. This API on the bridge driver gives
a way for the swdev driver to install an FBD entry in the bridge FBD table.
(And remove one).
This is equivalent to the device running these cmds:
bridge fdb [add|del] <mac> dev <dev> vid <vlan id> master
This patch needs some extra eyeballs for review, in paricular around the
locking and contexts.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To notify switch driver of change in STP state of bridge port, add new
.ndo op and provide switchdev wrapper func to call ndo op. Use it in bridge
code then.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netdevice represents a port in a switch, it will expose
IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
belong to one physical switch.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The goal of this is to provide a possibility to support various switch
chips. Drivers should implement relevant ndos to do so. Now there is
only one ndo defined:
- for getting physical switch id is in place.
Note that user can use random port netdevice to access the switch.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So this can be reused for identification of other "items" as well.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple
u16 vid to drivers from there.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current name might seem that this actually offloads the fdb entry to
hw. So rename it to clearly present that this for hardware address
addition/removal.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The maximum size of ATR_REQ and ATR_RES is 64 bytes.
The maximum number of General Bytes is calculated by
the maximum number of data bytes in the ATR_REQ/ATR_RES,
substracted by the number of mandatory data bytes.
ATR_REQ: 16 mandatory data bytes, giving a maximum of
48 General Bytes.
ATR_RES: 17 mandatory data bytes, giving a maximum of
47 General Bytes.
Regression introduced in commit a99903ec.
Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Fixing: net/nfc/nci/ntf.c:106:31: warning: cast to restricted __le16
message when building with make C=1 CF=-D__CHECK_ENDIAN__
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Fix warnings:
net/nfc/llcp_commands.c:421:14: warning: incorrect type in assignment (different base types)
net/nfc/llcp_commands.c:421:14: expected unsigned short [unsigned] [usertype] miux
net/nfc/llcp_commands.c:421:14: got restricted __be16
net/nfc/llcp_commands.c:477:14: warning: incorrect type in assignment (different base types)
net/nfc/llcp_commands.c:477:14: expected unsigned short [unsigned] [usertype] miux
net/nfc/llcp_commands.c:477:14: got restricted __be16
Procedure to reproduce:
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Some pipe are only created by other host (different than the
Terminal Host).
The pipe values will for example be notified by
NFC_HCI_ADM_NOTIFY_PIPE_CREATED.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
se_io allows to send apdu over the CLF to the embedded
Secure Element.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Some tag might get deactivated after some read or write tentative.
This may happen for example with Mifare Ultralight C tag when trying
to read the last 4 blocks (starting block 0x2c) configured as write
only.
NFC_CMD_ACTIVATE_TARGET will try to reselect the tag in order to
detect if it got remove from the field or if it is still present.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
nci_rf_deactivate_req only support NCI_DEACTIVATE_TYPE_IDLE_MODE.
In some situation, it might be necessary to be able to support other
NCI_DEACTIVATE_TYPE such as NCI_DEACTIVATE_TYPE_SLEEP_MODE in order for
example to reactivate the selected target.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
A notification for rf deaction can be IDLE_MODE, SLEEP_MODE,
SLEEP_AF_MODE and DISCOVERY. According to each type and the NCI
state machine is different (see figure 10 RF Communication State
Machine in NCI specification)
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The nci status byte was ignored. In case of tag reading for example,
if the tag is removed from the antenna there is no way for the upper
layers (aka: stack) to get inform about such event.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
To pave the way for future fixed channels to be added easily we should
track both the local and remote mask on a per-L2CAP connection (struct
l2cap_conn) basis. So far the code has used a global variable in a racy
way which anyway needs fixing.
This patch renames the existing conn->fixed_chan_mask that tracked
the remote mask to conn->remote_fixed_chan and adds a new variable
conn->local_fixed_chan to track the local mask. Since the HS support
info is now available in the local mask we can remove the
conn->hs_enabled variable.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When switching from UICC to another, the CLF may signals to the Terminal
Host that some existing pipe are cleared for future update.
This notification needs to be "acked" by the Terminal Host with a ANY_OK
message.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
If our terminal connect with other host like UICC, it may create
a pipe with us, the host controller will notify us new pipe
created, after that UICC will open that pipe, if we don't handle
that request, UICC may failed to continue initialize which may
lead to card emulation feature failed to work
Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
se_io allows to send apdu over the CLF to the embedded Secure Element.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Some NFC controller using NCI protocols may need a proprietary commands
flow to disable a secure element
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Some NFC controller using NCI protocols may need a proprietary commands
flow to enable a secure element
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Some NFC controller using NCI protocols may need a proprietary commands
flow to discover all available secure element
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Fix sparse warning introduced by commit: 9e87f9a9c4
It was generating the following warning:
net/nfc/nci/ntf.c:170:7: sparse: symbol 'nci_get_prop_rf_protocol' was not declared. Should it be static?
Procedure to reproduce it:
# apt-get install sparse
git checkout 9e87f9a9c4
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
se_io allows to send apdu over the CLF to the embedded Secure Element.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Currently, it leaks when the allocation fails.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
All it does is indicate whether a xprt has already been deleted from
a list or not, which is unnecessary since we use list_del_init and it's
always set and checked under the sv_lock anyway.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
rtnl_link_get_net() holds a reference on the 'struct net', we need to release
it in case of error.
CC: Eric W. Biederman <ebiederm@xmission.com>
Fixes: b51642f6d7 ("net: Enable a userns root rtnl calls that are safe for unprivilged users")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It incorrectly identifies itself as "IPv4" packet logging.
Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This can be used by drivers that cannot reliably map tx status
information onto specific skbs.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This op works like .tx_status, except it does not need access to the
skb. This will be used by drivers that cannot match tx status
information to specific packets.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some variables are assigned unconditionally, remove their
initialisations to help avoid introducing errors later.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the regulatory settings change, some channels might become invalid.
Disconnect interfaces acting on these channels, after giving userspace
code a grace period to leave them.
This mode is currently opt-in, and not all interface operating modes are
supported for regulatory-enforcement checks. A wiphy that wishes to use
the new enforcement code must specify an appropriate regulatory flag,
and all its supported interface modes must be supported by the checking
code.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com>
[fix some indentation, typos]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes a crash on attempting to calculate the frame duration for a VHT
packet (which needs to be handled by hw/driver instead).
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Before signaling the deactivation, send a deactivation request if in
RFST_DISCOVERY state because neard assumes polling is stopped and will
try to restart it.
Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When the deactivation type reported by RF_DEACTIVATE_NTF is Discovery, go in
RFST_DISCOVERY state. The NFCC stays in Poll mode and/or Listen mode.
Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
As specified in NCI 1.0 and NCI 1.1, when using the NFC-DEP RF Interface, the
DH and the NFCC shall only use the Static RF Connection for data communication
with a Remote NFC Endpoint.
Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Target responds to the ATR_REQ with the ATR_RES. Configure the General
Bytes in ATR_RES with the first three octets equal to the NFC Forum LLCP
magic number, followed by some LLC Parameters TLVs described in section
4.5 of [LLCP].
Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Changes:
* Extract the Listen mode activation parameters from RF_INTF_ACTIVATED_NTF.
* Store the General Bytes of ATR_REQ.
* Signal that Target mode is activated in case of an activation in NFC-DEP.
* Update the NCI state accordingly.
* Use the various constants defined in nfc.h.
* Fix the ATR_REQ and ATR_RES maximum size. As per NCI 1.0 and NCI 1.1, the
Activation Parameters for both Poll and Listen mode contain all the bytes of
ATR_REQ/ATR_RES starting and including Byte 3 as defined in [DIGITAL].
In [DIGITAL], the maximum size of ATR_REQ/ATR_RES is 64 bytes and they are
numbered starting from Byte 1.
Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Send LA_SEL_INFO and LF_PROTOCOL_TYPE with NFC-DEP protocol enabled.
Configure 212 Kbit/s and 412 Kbit/s bit rates for Listen F.
Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Target mode protocols are given to the nci_start_poll() function
but were previously ignored.
To enable P2P Target, when NFC-DEP is requested as a Target mode protocol, add
NFC-A and NFC-F Passive Listen modes in RF_DISCOVER_CMD command.
Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
list_for_each_entry_safe() is necessary if list objects are deleted from
the list while traversing it. Not the case here, so we can use the base
list_for_each_entry variant.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When an NFC-DEP target receives an ATN PDU, its
supposed to respond with a similar ATN PDU.
When the Target receives an I PDU with the PNI
one less than the current PNI and the last PDU
sent was an ATN PDU, the Target is to resend the
last non-ATN PDU that it has sent. This is
described in section 14.12.3.4 of the NFC Digital
Protocol Spec.
The digital layer's NFC-DEP code doesn't implement
this so add that support.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When an NFC-DEP Initiator times out when waiting for
a DEP_RES from the Target, its supposed to send an
ATN to the Target. The Target should respond to the
ATN with a similar ATN PDU and the Initiator can then
resend the last non-ATN PDU that it sent. No more
than 'N(retry,atn)' are to be send where
2 <= 'N(retry,atn)' <= 5. If the Initiator had just
sent a NACK PDU when the timeout occurred, it is to
continue sending NACKs until 'N(retry,nack)' NACKs
have been send. This is described in section
14.12.5.6 of the NFC-DEP Digital Protocol Spec.
The digital layer's NFC-DEP code doesn't implement
this so add that support.
The value chosen for 'N(retry,atn)' is 2.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When an NFC-DEP Target receives a NACK PDU with
a PNI equal to 1 less than the current PNI, it
is supposed to re-send the last PDU. This is
implied in section 14.12.5.4 of the NFC Digital
Protocol Spec.
The digital layer's NFC-DEP code doesn't implement
Target-side NACK handing so add it. The last PDU
that was sent is saved in the 'nfc_digital_dev'
structure's 'saved_skb' member. The skb will have
an additional reference taken to ensure that the skb
isn't freed when the driver performs a kfree_skb()
on the skb. The length of the skb/PDU is also saved
so the length can be restored when re-sending the PDU
in the skb (the driver will perform an skb_pull() so
an skb_push() needs to be done to restore the skb's
data pointer/length).
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When an NFC-DEP Initiator receives a frame with
an incorrect CRC or with a parity error, and the
frame is at least 4 bytes long, its supposed to
send a NACK to the Target. The Initiator can
send up to 'N(retry,nack)' consecutive NACKs
where 2 <= 'N(retry,nack)' <= 5. When the limit
is exceeded, a PROTOCOL EXCEPTION is raised.
Any other type of transmission error is to be
ignored and the Initiator should continue
waiting for a new frame. This is described
in section 14.12.5.4 of the NFC Digital Protocol
Spec.
The digital layer's NFC-DEP code doesn't implement
any of this so add it. This support diverges from
the spec in two significant ways:
a) NACKs will be sent for ANY error reported by the
driver except a timeout. This is done because
there is currently no way for the digital layer
to distinguish a CRC or parity error from any
other type of error reported by the driver.
b) All other errors will cause a PROTOCOL EXCEPTION
even frames with CRC errors that are less than 4
bytes.
The value chosen for 'N(retry,nack)' is 2.
Targets do not send NACK PDUs.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When the peer in an NFC-DEP exchange has a
packet to send that is larger than the local
maximum payload, it sets the 'MI' bit in the
'I' PDU. This indicates that NFC-DEP chaining
is to occur.
When such a PDU is received, the local side
responds with an 'ACK' PDU and this continues
until the peer sends an 'I' PDU with the 'MI'
bit cleared. This indicates that the chaining
sequence is complete and the entire packet has
been transferred.
Receiving chained PDUs is currently not supported
by the digital layer so add that support. When a
chaining sequence is initiated by the peer, the
digital layer will allocate an skb large enough
to hold 8 maximum sized frame payloads. The maximum
payload can range from 64 to 254 bytes so 8 * 254 =
2032 seems like a reasonable compromise between
potentially wasting memory and constantly reallocating
new, larger skbs.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When the NFC-DEP code is given a packet to send
that is larger than the peer's maximum payload,
its supposed to set the 'MI' bit in the 'I' PDU's
Protocol Frame Byte (PFB). Setting this bit
indicates that NFC-DEP chaining is to occur.
When NFC-DEP chaining is progress, sender 'I' PDUs
are acknowledged with 'ACK' PDUs until the last 'I'
PDU in the chain (which has the 'MI' bit cleared)
is responded to with a normal 'I' PDU. This can
occur while in Initiator mode or in Target mode.
Sender NFC-DEP chaining is currently not implemented
in the digital layer so add that support. Unfortunately,
since sending a frame may require writing the CRC to the
end of the data, the relevant data part of the original
skb must be copied for each intermediate frame.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The maximum payload for NFC-DEP exchanges (i.e., the
number of bytes between SoD and EoD) is negotiated
using the ATR_REQ, ATR_RES, and PSL_REQ commands.
The valid maximum lengths are 64, 128, 192, and 254
bytes.
Currently, NFC-DEP code assumes that both sides are
always using 254 byte maximums and ignores attempts
by the peer to change it. Instead, implement the
negotiation code, enforce the local maximum when
receiving data from the peer, and don't send payloads
that exceed the remote's maximum. The default local
maximum is 254 bytes.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
NFC-DEP DEP_REQ and DEP_RES exchanges using 'I'
and 'ACK/NACK' PDUs have a sequence number called
the Packet Number Information (PNI). The PNI
is incremented (modulo 4) after every DEP_REQ/
DEP_RES pair and should be verified by the digital
layer code. That verification isn't always done,
though, so add code to make sure that it is done.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
According to chapter 14 of the NFC-DEP Digital
Protocol Spec., the NAD byte should never be
present in DEP_REQ or DEP_RES frames. However,
this is not enforced so add that enforcement code.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When in Target mode, the Initiator specifies whether
subsequent DEP_REQ and DEP_RES frames will include
a DID byte by the value passed in the ATR_REQ. If
the DID value in the ATR_REQ is '0' then no DID
byte will be included. If the DID value is between
'1' and '14' then a DID byte containing the same
value must be included in subsequent DEP_REQ and
DEP_RES frames. Any other DID value is invalid.
This is specified in sections 14.8.1.2 and 14.8.2.2
of the NFC Digital Protocol Spec.
Checking the DID value (if it should be there at all),
is not currently supported by the digital layer's
NFC-DEP code. Add this support by remembering the
DID value in the ATR_REQ, checking the DID value of
received DEP_REQ frames (if it should be there at all),
and including the remembered DID value in DEP_RES
frames when appropriate.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When in Initiator mode, the digital layer's
NFC-DEP code always sets the Device ID (DID)
value in the ATR_REQ to '0'. This means that
subsequent DEP_REQ and DEP_RES frames must
never include a DID byte. This is specified
in sections 14.8.1.1 and 14.8.2.1 of the NFC
Digital Protocol Spec.
Currently, the digital layer's NFC-DEP code
doesn't enforce this rule so add code to ensure
that there is no DID byte in DEP_RES frames.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Rearrange some of the code in digital_in_recv_dep_res()
and digital_tg_recv_dep_req() so the initial code looks
similar. The real reason is prepare the code for some
upcoming patches that require these changes.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When digital_in_send_cmd() or digital_tg_send_cmd()
fail, they do not free the skb that was passed to
them so the routine that allocated the skb should
free it. Currently, there are several routines in
the NFC-DEP code that don't do this so make them.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This option has been marked for deprecation and removal for
a little more than two years, but it's not been very clearly
signalled since it was always possible to just select it.
Make it unselectable now to signal anyone who's still using
it after all this time more clearly. They can still get it
back, but only by patching the kernel.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull networking fixes from David Miller:
"Several small fixes here:
1) Don't crash in tg3 driver when the number of tx queues has been
configured to be different from the number of rx queues. From
Thadeu Lima de Souza Cascardo.
2) VLAN filter not disabled properly in promisc mode in ixgbe driver,
from Vlad Yasevich.
3) Fix OOPS on dellink op in VTI tunnel driver, from Xin Long.
4) IPV6 GRE driver WCCP code checks skb->protocol for ETH_P_IP
instead of ETH_P_IPV6, whoops. From Yuri Chislov.
5) Socket matching in ping driver is buggy when packet AF does not
match socket's AF. Fix from Jane Zhou.
6) Fix checksum calculation errors in VXLAN due to where the
udp_tunnel6_xmit_skb() helper gets it's saddr/daddr from. From
Alexander Duyck.
7) Fix 5G detection problem in rtlwifi driver, from Larry Finger.
8) Fix NULL deref in tcp_v{4,6}_send_reset, from Eric Dumazet.
9) Various missing netlink attribute verifications in bridging code,
from Thomas Graf.
10) tcp_recvmsg() unconditionally calls ipv4 ip_recv_error even for
ipv6 sockets, whoops. Fix from Willem de Bruijn"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks
bridge: Sanitize IFLA_EXT_MASK for AF_BRIDGE:RTM_GETLINK
bridge: Add missing policy entry for IFLA_BRPORT_FAST_LEAVE
net: Check for presence of IFLA_AF_SPEC
net: Validate IFLA_BRIDGE_MODE attribute length
bridge: Validate IFLA_BRIDGE_FLAGS attribute length
stmmac: platform: fix default values of the filter bins setting
net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
net: dsa: bcm_sf2: reset switch prior to initialization
net: dsa: bcm_sf2: fix unmapping registers in case of errors
tg3: fix ring init when there are more TX than RX channels
tcp: fix possible NULL dereference in tcp_vX_send_reset()
rtlwifi: Change order in device startup
rtlwifi: rtl8821ae: Fix 5G detection problem
Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse"
vxlan: Fix boolean flip in VXLAN_F_UDP_ZERO_CSUM6_[TX|RX]
ip6_udp_tunnel: Fix checksum calculation
net-timestamp: Fix a documentation typo
net/ping: handle protocol mismatching scenario
af_packet: fix sparse warning
...
Add a new directory heirarchy under the debugfs sunrpc/ directory:
sunrpc/
rpc_xprt/
<xprt id>/
Within that directory, we can put files that give info about the
xprts. We do have the (minor) problem that there is no succinct,
unique identifier for rpc_xprts. So we generate them synthetically
with a static atomic_t counter.
For now, this directory just holds an "info" file, but we may add
other files to it in the future.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
It's possible to get a dump of the RPC task queue by writing a value to
/proc/sys/sunrpc/rpc_debug. If you write any value to that file, you get
a dump of the RPC client task list into the log buffer. This is a rather
inconvenient interface however, and makes it hard to get immediate info
about the task queue.
Add a new directory hierarchy under debugfs:
sunrpc/
rpc_clnt/
<clientid>/
Within each clientid directory we create a new "tasks" file that will
dump info similar to what shows up in the log buffer, but with a few
small differences -- we avoid printing raw kernel addresses in favor of
symbolic names and the XID is also displayed.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Instead of keeping track of all those special cases where
VLAN interfaces have no bss_conf.chandef, just make sure
they have the same as the AP interface they belong to.
Among others, this fixes a crash getting a VLAN's channel
from userspace since a NULL channel is returned as a good
result (return value 0) for VLANs since the commit below.
Cc: stable@vger.kernel.org [3.18 only]
Fixes: c12bc4885f ("mac80211: return the vif's chandef in ieee80211_cfg_get_channel()")
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[rewrite commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
One of the cases for an invalid channel definition is that
the channel pointer is NULL, in which case the warning is
a bit late since we'll dereference the pointer. Bail out
of the function upon warning about this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This resolves linking problems with CONFIG_IPV6=n:
net/built-in.o: In function `redirect_tg6':
xt_REDIRECT.c:(.text+0x6d021): undefined reference to `nf_nat_redirect_ipv6'
Reported-by: Andreas Ruprecht <rupran@einserver.de>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds the missing bits to allow to match per meta l4proto from
the bridge. Example:
nft add rule bridge filter input ether type {ip, ip6} meta l4proto udp counter
Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch exports the functions nft_reject_iphdr_validate and
nft_reject_ip6hdr_validate to use it in follow up patches.
These functions check if the IPv4/IPv6 header is correct.
Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
add a __nfct_init_offset annotation member to struct nf_conn to make
it clear which members are covered by the memset when the conntrack
is allocated.
This avoids zeroing timer_list and ct_net; both are already inited
explicitly.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The maximum value for the hitcount parameter is given by
"ip_pkt_list_tot" parameter (default: 20).
Exceeding this value on the command line will cause the rule to be
rejected. The parameter is also readonly, i.e. it cannot be changed
without module unload or reboot.
Store size per table, then base nstamps[] size on the hitcount instead.
The module parameter is retained for backwards compatibility.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The Bluetooth spec states that automatically flushable packets may not
be sent over a LE-U link.
Signed-off-by: Steven Walter <stevenrwalter@gmail.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
These patches various bugfixes and cleanups for using NFS over RDMA, including
better error handling and performance improvements by using pad optimization.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJUdPv2AAoJENfLVL+wpUDrJkMQAKjtPZHLMcj+eHm4f1ZKLJxy
GSrUZV21TU9tL0NVE/5An8US6hoLwHpNXsW8o+gHTAeGRyiCmIaNXGd1Ql/4PYRH
zfzdNXoaJAh1N5iXX11fF3gOWqx/SolqzO2xLDVETK/3lAvq0VwMYoMElBQB6qQW
8sN3z8yVuz/9Ia9oGIFhqu1B6dcKPHkQDMtmsElGxeEX/+9yEg4HUKx+kZDtV0Uj
8/JM8Jh1FKRCQT/P6INkRItdY5KaSJGFc43BkC/8lbugfxa5XCyu/m/qMr9FJsDV
nM6rwaiVcmR/mvD3fL82+Jg/M+P9VUHQ1/Az0sV9G+fEoHH/1Mey3LfMzNpUmf9v
bykrPRuzXkPPQgN1VnjSaF2RF+CWwV9Nme1VVXM/zj8gHX1mcmQF/wPRxDuLjCrt
EObAFsvHOwDTZZmYp9bG5kc6IvwvT8aeeVQMJ4q4PSGD3w8AtoIyJDn+Ee0LFD1K
Zw0oZpTJpI4t7DVxGBSdo2wZWuMU/UKqGqGtGJ+ljXfTRuuq968Q5j5ujaA9vf0v
C9igYTU8hq4teMzhZrfR1jtTWoSS+5zamb1KtvAZy8gsht2PQVgE9xka2k/AV8uE
ul/w5HU4OV+QIrHNbiu7BE8B2Ags6smpdHMqn9fqLBwvG+JEwbWqk1zeTsajxzq+
hkvKkkMq6JjDbsDf96Yk
=YIru
-----END PGP SIGNATURE-----
Merge tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next
Pull NFS client RDMA changes for 3.19 from Anna Schumaker:
"NFS: Client side changes for RDMA
These patches various bugfixes and cleanups for using NFS over RDMA, including
better error handling and performance improvements by using pad optimization.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>"
* tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
xprtrdma: Display async errors
xprtrdma: Enable pad optimization
xprtrdma: Re-write rpcrdma_flush_cqs()
xprtrdma: Refactor tasklet scheduling
xprtrdma: unmap all FMRs during transport disconnect
xprtrdma: Cap req_cqinit
xprtrdma: Return an errno from rpcrdma_register_external()
These patches fixes for iostats and SETCLIENTID in addition to cleaning
up the nfs4_init_callback() function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJUdfhQAAoJENfLVL+wpUDr9x4QANWmG6xjEU7RuIBLalOoxit6
eXnEDNZlwYp6NCkYktHVWaTqXdwq7fdGX+3p4eiwNg3C3SrJHkpWvjeFd4KT/SyP
B/w3vYG3o5H01i3Mb5kCD2uW0gbS9soZXpIg+uHHOl43yZzneC0vPTLhQ/h+9zPc
B32XcgQhvLIHR3LWrC/+uolsa31lcyya0W45PX+iCpzHF7i9qRBrNLODkTlw1hNQ
eEnYIvVy5oW00zlHJUYiTHP3e+0EJn5PAngdYbqiboJ9mK7DbB0QDwqyvJbIT7ql
WAip6cNcJnSv1eiVYqDwlR1ok8drK5X7yQCT3lcLzAMDznLsSAL1Itu0h2Ay3z61
f8XCyTwI0izq0DbdrMcPoqPSitqyM8nkPElnOuitXwzEroPaG40OF67yss3+ixbl
JeQZ+u35pnpCkKUaZdCK3Pn83StxmUaBcFx8eg30NBc0SN13Eiz6aZcGEperClrR
RwMLDUhUtAMcMRunRRxiN9lHafPqqeDeJre7uky0p0sU9CsH+1n5qIKLmk+Ber2d
ZS29TobdR7ktfjQ52XazMAIFzI7r1v4Zn7ziH/WRbvKoKw9ICcyoUGNy9CS5LyWu
BxWCZAJmby5H1i7V7ituJK8TImb81L4aPW06hHX9k+0SoTrgFjas//4jZYfysJ9N
PvR0MRNfMFhuBlsTj7Gy
=PMdS
-----END PGP SIGNATURE-----
Merge tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next
Pull pull additional NFS client changes for 3.19 from Anna Schumaker:
"NFS: Generic client side changes from Chuck
These patches fixes for iostats and SETCLIENTID in addition to cleaning
up the nfs4_init_callback() function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>"
* tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
NFS: Clean up nfs4_init_callback()
NFS: SETCLIENTID XDR buffer sizes are incorrect
SUNRPC: serialize iostats updates
TCP timestamping introduced MSG_ERRQUEUE handling for TCP sockets.
If the socket is of family AF_INET6, call ipv6_recv_error instead
of ip_recv_error.
This change is more complex than a single branch due to the loadable
ipv6 module. It reuses a pre-existing indirect function call from
ping. The ping code is safe to call, because it is part of the core
ipv6 module and always present when AF_INET6 sockets are active.
Fixes: 4ed2d765 (net-timestamp: TCP timestamping)
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
It may also be worthwhile to add WARN_ON_ONCE(sk->family == AF_INET6)
to ip_recv_error.
Signed-off-by: David S. Miller <davem@davemloft.net>
Only search for IFLA_EXT_MASK if the message actually carries a
ifinfomsg header and validate minimal length requirements for
IFLA_EXT_MASK.
Fixes: 6cbdceeb ("bridge: Dump vlan information from a bridge port")
Cc: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: c2d3babf ("bridge: implement multicast fast leave")
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Payload is currently accessed blindly and may exceed valid message
boundaries.
Fixes: 407af3299 ("bridge: Add netlink interface to configure vlans on bridge ports")
Cc: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having it as a sub-event for RSSI thresholds is very ugly,
but luckily no userspace actually uses the events yet.
Move the event to its own function call internally and to
its own event attribute in nl80211.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use standard SKB list APIs associated with struct sk_buff_head to
manage socket outgoing packet chain and name table outgoing packet
chain, having relevant code simpler and more readable.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use standard SKB list APIs associated with struct sk_buff_head to
manage link's receive queue to simplify its relevant code cemplexity.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use standard SKB list APIs associated with struct sk_buff_head to
manage link's deferred queue, simplifying relevant code.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use standard SKB list APIs associated with struct sk_buff_head to
manage link transmission queue, having relevant code more clean.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pseudo message types of BUNDLE_CLOSED as well as BUNDLE_OPEN are
used to flag whether or not more messages can be bundled into a data
packet in the outgoing transmission queue. Obviously, no more messages
can be appended after the packet has been sent and is waiting to be
acknowledged and deleted. These message types do in reality represent
a send-side local implementation flag, and are not defined as part of
the protocol. It is therefore safe to move it to to where it belongs,
that is, the control area (TIPC_SKB_CB) of the buffer.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In original tipc_link_push_packet(), it pushes messages from protocol
message queue, retransmission queue and next_out queue. But as the two
first queues are removed, we can simplify its relevant code through
deleting tipc_link_push_queue().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TIPC retransmission queue is intended to record which messages
should be retransmitted when bearer is not congested. However,
as the retransmission queue becomes useless with the removal of
bearer congestion mechanism, it should be removed.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TIPC protocol message queue is intended to save one protocol message
when bearer is congested so that the message stored in the queue can
be immediately transmitted when bearer congestion is released. However,
as now the protocol queue has no mission any more with the removal of
bearer congestion mechanism, it should be removed.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The node subscribe infrastructure represents a virtual base class, so
its users, such as struct tipc_port and struct publication, can derive
its implemented functionalities. However, after the removal of struct
tipc_port, struct publication is left as its only single user now. So
defining an abstract infrastructure for one user becomes no longer
reasonable. If corresponding new functions associated with the
infrastructure are moved to name_table.c file, the node subscription
infrastructure can be removed as well.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "init_net" test in function addrconf_exit_net is introduced
in commit 44a6bd29 [Create ipv6 devconf-s for namespaces] to avoid freeing
init_net. In commit c900a800 [ipv6: fix bad free of addrconf_init_net],
function addrconf_init_net will allocate memory for every net regardless of
init_net. In this case, it is unnecessary to make "init_net" test.
CC: Hong Zhiguo <honkiko@gmail.com>
CC: Octavian Purdila <opurdila@ixiacom.com>
CC: Pavel Emelyanov <xemul@openvz.org>
CC: Cong Wang <cwang@twopensource.com>
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change remote checksum offload to call remcsum_adjust. This also
eliminates the optimization to skip an IP header as part of the
adjustment (really does not seem to be much of a win).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FQ/pacing has a clamp of delay of 125 ms, to avoid some possible harm.
It turns out this delay is too small to allow pacing low rates :
Some ISP setup very aggressive policers as low as 16kbit.
Now TCP stack has spurious rtx prevention, it seems safe to increase
this fixed parameter, without adding a qdisc attribute.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Much of the code can be shared by moving it into helper functions
for the CQM event sending.
Also move the code closer together, even in the header file.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull nfsd bugfixes from Bruce Fields:
"These fix one mishandling of the case when security labels are
configured out, and two races in the 4.1 backchannel code"
* 'for-3.18' of git://linux-nfs.org/~bfields/linux:
nfsd: Fix slot wake up race in the nfsv4.1 callback code
SUNRPC: Fix locking around callback channel reply receive
nfsd: correctly define v4.2 support attributes
Occasionally mountstats reports a negative retransmission rate.
Ensure that two RPCs completing concurrently don't confuse the sums
in the transport's op_metrics array.
Since pNFS filelayout can invoke rpc_count_iostats() on another
transport from xprt_release(), we can't rely on simply holding the
transport_lock in xprt_release(). There's nothing for it but hard
serialization. One spin lock per RPC operation should make this as
painless as it can be.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
After commit ca777eff51 ("tcp: remove dst refcount false sharing for
prequeue mode") we have to relax check against skb dst in
tcp_v[46]_send_reset() if prequeue dropped the dst.
If a socket is provided, a full lookup was done to find this socket,
so the dst test can be skipped.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88191
Reported-by: Jaša Bartelj <jasa.bartelj@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Fixes: ca777eff51 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 5195c14c8b.
If the conntrack clashes with an existing one, it is left out of
the unconfirmed list, thus, crashing when dropping the packet and
releasing the conntrack since golden rule is that conntracks are
always placed in any of the existing lists for traceability reasons.
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88841
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP checksum calculation for VXLAN tunnels is currently using the
socket addresses instead of the actual packet source and destination
addresses. As a result the checksum calculated is incorrect in some
cases.
Also uh->check was being set twice, first it was set to 0, and then it is
set again in udp6_set_csum. This change removes the redundant assignment
to 0.
Fixes: acbf74a7 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.")
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An async error upcall is a hard error, and should be reported in
the system log.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The Linux NFS/RDMA server used to reject NFSv3 WRITE requests when
pad optimization was enabled. That bug was fixed by commit
e560e3b510 ("svcrdma: Add zero padding if the client doesn't send
it").
We can now enable pad optimization on the client, which helps
performance and is supported now by both Linux and Solaris servers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Currently rpcrdma_flush_cqs() attempts to avoid code duplication,
and simply invokes rpcrdma_recvcq_upcall and rpcrdma_sendcq_upcall.
1. rpcrdma_flush_cqs() can run concurrently with provider upcalls.
Both flush_cqs() and the upcalls were invoking ib_poll_cq() in
different threads using the same wc buffers (ep->rep_recv_wcs
and ep->rep_send_wcs), added by commit 1c00dd0776 ("xprtrmda:
Reduce calls to ib_poll_cq() in completion handlers").
During transport disconnect processing, this sometimes resulted
in the same reply getting added to the rpcrdma_tasklets_g list
more than once, which corrupted the list.
2. The upcall functions drain only a limited number of CQEs,
thanks to the poll budget added by commit 8301a2c047
("xprtrdma: Limit work done by completion handler").
Fixes: a7bc211ac9 ("xprtrdma: On disconnect, don't ignore ... ")
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=276
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Restore the separate function that schedules the reply handling
tasklet. I need to call it from two different paths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When using RPCRDMA_MTHCAFMR memory registration, after a few
transport disconnect / reconnect cycles, ib_map_phys_fmr() starts to
return EINVAL because the provider has exhausted its map pool.
Make sure that all FMRs are unmapped during transport disconnect,
and that ->send_request remarshals them during an RPC retransmit.
This resets the transport's MRs to ensure that none are leaked
during a disconnect.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Recent work made FRMR registration and invalidation completions
unsignaled. This greatly reduces the adapter interrupt rate.
Every so often, however, a posted send Work Request is allowed to
signal. Otherwise, the provider's Work Queue will wrap and the
workload will hang.
The number of Work Requests that are allowed to remain unsignaled is
determined by the value of req_cqinit. Currently, this is set to the
size of the send Work Queue divided by two, minus 1.
For FRMR, the send Work Queue is the maximum number of concurrent
RPCs (currently 32) times the maximum number of Work Requests an
RPC might use (currently 7, though some adapters may need more).
For mlx4, this is 224 entries. This leaves completion signaling
disabled for 111 send Work Requests.
Some providers hold back dispatching Work Requests until a CQE is
generated. If completions are disabled, then no CQEs are generated
for quite some time, and that can stall the Work Queue.
I've seen this occur running xfstests generic/113 over NFSv4, where
eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM
because the Work Queue has overflowed. The connection is dropped
and re-established.
Cap the rep_cqinit setting so completions are not left turned off
for too long.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The RPC/RDMA send_request method and the chunk registration code
expects an errno from the registration function. This allows
the upper layers to distinguish between a recoverable failure
(for example, temporary memory exhaustion) and a hard failure
(for example, a bug in the registration logic).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit e1bd95bf7c ("crypto: algif - zeroize IV buffer") and
2a6af25bef ("crypto: algif - zeroize message digest buffer")
added memzero_explicit() calls on buffers that are later on
passed back to sock_kfree_s().
This is a discussed follow-up that, instead, extends the sock
API and adds sock_kzfree_s(), which internally uses kzfree()
instead of kfree() for passing the buffers back to slab.
Having sock_kzfree_s() allows to keep the changes more minimal
by just having a drop-in replacement instead of adding
memzero_explicit() calls everywhere before sock_kfree_s().
In kzfree(), the compiler is not allowed to optimize the memset()
away and thus there's no need for memzero_explicit(). Both,
sock_kfree_s() and sock_kzfree_s() are wrappers for
__sock_kfree_s() and call into kfree() resp. kzfree(); here,
__sock_kfree_s() needs to be explicitly inlined as we want the
compiler to optimize the call and condition away and thus it
produces e.g. on x86_64 the _same_ assembler output for
sock_kfree_s() before and after, and thus also allows for
avoiding code duplication.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If there are no channels allowing 80 MHz to be used, then the
station isn't really VHT capable even if the driver and device
support it in general. In this case, exclude the VHT capability
IE from probe request frames.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We have a channel pointer, and we use its center frequency
to look up a channel pointer - which will thus be exactly
the same as the original pointer.
Remove that pointless lookup and just use the pointer.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It's always set to the same value as CONFIG_TRACEPOINTS, so we can just
use that instead.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
It's always set to whatever CONFIG_SUNRPC_DEBUG is, so just use that.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
ping_lookup() may return a wrong sock if sk_buff's and sock's protocols
dont' match. For example, sk_buff's protocol is ETH_P_IPV6, but sock's
sk_family is AF_INET, in that case, if sk->sk_bound_dev_if is zero, a wrong
sock will be returned.
the fix is to "continue" the searching, if no matching, return NULL.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Jane Zhou <a17711@motorola.com>
Signed-off-by: Yiwei Zhao <gbjc64@motorola.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
af_packet produces lots of these:
net/packet/af_packet.c:384:39: warning: incorrect type in return expression (different modifiers)
net/packet/af_packet.c:384:39: expected struct page [pure] *
net/packet/af_packet.c:384:39: got struct page *
this seems to be because sparse does not realize that _pure
refers to function, not the returned pointer.
Tweak code slightly to avoid the warning.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using GRE redirection in WCCP, it sets the wrong skb->protocol,
that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic.
Fixes: c12b395a46 ("gre: Support GRE over IPv6")
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Yuri Chislov <yuri.chislov@gmail.com>
Tested-by: Yuri Chislov <yuri.chislov@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sparse warnings about non-static declaration of static functions
in the new tipc netlink API.
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter/ipvs updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, this includes the NAT redirection support for nf_tables, the
cgroup support for nft meta and conntrack zone support for the connlimit
match. Coming after those, a bunch of sparse warning fixes, missing
netns bits and cleanups. More specifically, they are:
1) Prepare IPv4 and IPv6 NAT redirect code to use it from nf_tables,
patches from Arturo Borrero.
2) Introduce the nf_tables redir expression, from Arturo Borrero.
3) Remove an unnecessary assignment in ip_vs_xmit/__ip_vs_get_out_rt().
Patch from Alex Gartrell.
4) Add nft_log_dereference() macro to the nf_log infrastructure, patch
from Marcelo Leitner.
5) Add some extra validation when registering logger families, also
from Marcelo.
6) Some spelling cleanups from stephen hemminger.
7) Fix sparse warning in nf_logger_find_get().
8) Add cgroup support to nf_tables meta, patch from Ana Rey.
9) A Kconfig fix for the new redir expression and fix sparse warnings in
the new redir expression.
10) Fix several sparse warnings in the netfilter tree, from
Florian Westphal.
11) Reduce verbosity when OOM in nfnetlink_log. User can basically do
nothing when this situation occurs.
12) Add conntrack zone support to xt_connlimit, again from Florian.
13) Add netnamespace support to the h323 conntrack helper, contributed
by Vasily Averin.
14) Remove unnecessary nul-pointer checks before free_percpu() and
module_put(), from Markus Elfring.
15) Use pr_fmt in nfnetlink_log, again patch from Marcelo Leitner.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tracepoints inside the main loop on xs_tcp_data_recv that allow
us to keep an eye on what's happening during each phase of it.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
...so we can keep track of when calls are sent and replies received.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
...just around svc_send, svc_recv and svc_process for now.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The supported bandwidth field is a two-bit field, not a bitmap,
so treat it accordingly when disabling 80+80 or 160 MHz.
Note that we can only advertise "80+80 and 160" or "160", not
"80+80" by itself, so disabling 160 also disables 80+80.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
... and make it handle multi-segment iovecs - deals with that
"fix this later" issue for free. A bit of shame, really - it
had been there since 2.3.15pre3 when the whole thing went into the
tree, practically a historical artefact by now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now the vti_link_ops do not point the .dellink, for fb tunnel device
(ip_vti0), the net_device will be removed as the default .dellink is
unregister_netdevice_queue,but the tunnel still in the tunnel list,
then if we add a new vti tunnel, in ip_tunnel_find():
hlist_for_each_entry_rcu(t, head, hash_node) {
if (local == t->parms.iph.saddr &&
remote == t->parms.iph.daddr &&
link == t->parms.link &&
==> type == t->dev->type &&
ip_tunnel_key_match(&t->parms, flags, key))
break;
}
the panic will happen, cause dev of ip_tunnel *t is null:
[ 3835.072977] IP: [<ffffffffa04103fd>] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
[ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
[ 3835.073008] Oops: 0000 [#1] SMP
.....
[ 3835.073008] Stack:
[ 3835.073008] ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
[ 3835.073008] ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
[ 3835.073008] ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
[ 3835.073008] Call Trace:
[ 3835.073008] [<ffffffffa0411924>] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
[ 3835.073008] [<ffffffffa040b2e3>] vti_newlink+0x43/0x70 [ip_vti]
[ 3835.073008] [<ffffffff8150d4da>] rtnl_newlink+0x4fa/0x5f0
[ 3835.073008] [<ffffffff812f68bb>] ? nla_strlcpy+0x5b/0x70
[ 3835.073008] [<ffffffff81508fb0>] ? rtnl_link_ops_get+0x40/0x60
[ 3835.073008] [<ffffffff8150d11f>] ? rtnl_newlink+0x13f/0x5f0
[ 3835.073008] [<ffffffff81509cf4>] rtnetlink_rcv_msg+0xa4/0x270
[ 3835.073008] [<ffffffff8126adf5>] ? sock_has_perm+0x75/0x90
[ 3835.073008] [<ffffffff81509c50>] ? rtnetlink_rcv+0x30/0x30
[ 3835.073008] [<ffffffff81529e39>] netlink_rcv_skb+0xa9/0xc0
[ 3835.073008] [<ffffffff81509c48>] rtnetlink_rcv+0x28/0x30
....
modprobe ip_vti
ip link del ip_vti0 type vti
ip link add ip_vti0 type vti
rmmod ip_vti
do that one or more times, kernel will panic.
fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in
which we skip the unregister of fb tunnel device. do the same on ip6_vti.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change has no functional impact and simply addresses some coding
style issues detected by checkpatch. Specifically this change
adjusts "if" statements which also include the assignment of a
variable.
No changes to the resultant object files result as determined by objdiff.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds SKB_GSO_TCPV4 to the list of supported GSO types handled by
the IPv6 GSO offloads. Without this change VXLAN tunnels running over IPv6
do not currently handle IPv4 TCP TSO requests correctly and end up handing
the non-segmented frame off to the device.
Below is the before and after for a simple netperf TCP_STREAM test between
two endpoints tunneling IPv4 over a VXLAN tunnel running on IPv6 on top of
a 1Gb/s network adapter.
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.29 0.88 Before
87380 16384 16384 10.03 895.69 After
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ieee802154/fakehard.c
A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
in 'net-next'.
Add build fix into the merge from Stephen Rothwell in openvswitch, the
logging macros take a new initial 'log' argument, a new call was added
in 'net' so when we merge that in here we have to explicitly add the
new 'log' arg to it else the build fails.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix BUG when decrypting empty packets in mac80211, from Ronald Wahl.
2) nf_nat_range is not fully initialized and this is copied back to
userspace, from Daniel Borkmann.
3) Fix read past end of b uffer in netfilter ipset, also from Dan
Carpenter.
4) Signed integer overflow in ipv4 address mask creation helper
inet_make_mask(), from Vincent BENAYOUN.
5) VXLAN, be2net, mlx4_en, and qlcnic need ->ndo_gso_check() methods to
properly describe the device's capabilities, from Joe Stringer.
6) Fix memory leaks and checksum miscalculations in openvswitch, from
Pravin B SHelar and Jesse Gross.
7) FIB rules passes back ambiguous error code for unreachable routes,
making behavior confusing for userspace. Fix from Panu Matilainen.
8) ieee802154fake_probe() doesn't release resources properly on error,
from Alexey Khoroshilov.
9) Fix skb_over_panic in add_grhead(), from Daniel Borkmann.
10) Fix access of stale slave pointers in bonding code, from Nikolay
Aleksandrov.
11) Fix stack info leak in PPP pptp code, from Mathias Krause.
12) Cure locking bug in IPX stack, from Jiri Bohac.
13) Revert SKB fclone memory freeing optimization that is racey and can
allow accesses to freed up memory, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (71 commits)
tcp: Restore RFC5961-compliant behavior for SYN packets
net: Revert "net: avoid one atomic operation in skb_clone()"
virtio-net: validate features during probe
cxgb4 : Fix DCB priority groups being returned in wrong order
ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
openvswitch: Don't validate IPv6 label masks.
pptp: fix stack info leak in pptp_getname()
brcmfmac: don't include linux/unaligned/access_ok.h
cxgb4i : Don't block unload/cxgb4 unload when remote closes TCP connection
ipv6: delete protocol and unregister rtnetlink when cleanup
net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too
bonding: fix curr_active_slave/carrier with loadbalance arp monitoring
mac80211: minstrel_ht: fix a crash in rate sorting
vxlan: Inline vxlan_gso_check().
can: m_can: update to support CAN FD features
can: m_can: fix incorrect error messages
can: m_can: add missing delay after setting CCCR_INIT bit
can: m_can: fix not set can_dlc for remote frame
can: m_can: fix possible sleep in napi poll
can: m_can: add missing message RAM initialization
...
John W. Linville says:
====================
pull request: wireless-next 2014-11-21
Please pull this batch of updates intended for the 3.19 stream...
For the mac80211 bits, Johannes says:
"It has been a while since my last pull request, so we accumulated
another relatively large set of changes:
* TDLS off-channel support set from Arik/Liad, with some support
patches I did
* custom regulatory fixes from Arik
* minstrel VHT fix (and a small optimisation) from Felix
* add back radiotap vendor namespace support (myself)
* random MAC address scanning for cfg80211/mac80211/hwsim (myself)
* CSA improvements (Luca)
* WoWLAN Net Detect (wake on network found) support (Luca)
* and lots of other smaller changes from many people"
For the Bluetooth bits, Johan says:
"Here's another set of patches for 3.19. Most of it is again fixes and
cleanups to ieee802154 related code from Alexander Aring. We've also got
better handling of hardware error events along with a proper API for HCI
drivers to notify the HCI core of such situations. There's also a minor
fix for mgmt events as well as a sparse warning fix. The code for
sending HCI commands synchronously also gets a fix where we might loose
the completion event in the case of very fast HW (particularly easily
reproducible with an emulated HCI device)."
And...
"Here's another bluetooth-next pull request for 3.19. We've got:
- Various fixes, cleanups and improvements to ieee802154/mac802154
- Support for a Broadcom BCM20702A1 variant
- Lots of lockdep fixes
- Fixed handling of LE CoC errors that should trigger SMP"
For the Atheros bits, Kalle says:
"One ath6kl patch and rest for ath10k, but nothing really major which
stands out. Most notable:
o fix resume (Bartosz)
o firmware restart is now faster and more reliable (Michal)
o it's now possible to test hardware restart functionality without
crashing the firmware using hw-restart parameter with
simulate_fw_crash debugfs file (Michal)"
On top of that...both ath9k and mwifiex get their usual level of
updates. Of note is the ath9k spectral scan work from Oleksij Rempel.
I also pulled from the wireless tree in order to avoid some merge issues.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit a6111d3c "vlan: Pass SIOC[SG]HWTSTAMP ioctls to real device"
intended to enable hardware time stamping on VLAN interfaces, but
passing SIOCSHWTSTAMP is only half of the story. This patch adds
the second half, by letting user space find out the time stamping
capabilities of the device backing a VLAN interface.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit c3ae62af8e ("tcp: should drop incoming frames without ACK
flag set") was created to mitigate a security vulnerability in which a
local attacker is able to inject data into locally-opened sockets by
using TCP protocol statistics in procfs to quickly find the correct
sequence number.
This broke the RFC5961 requirement to send a challenge ACK in response
to spurious RST packets, which was subsequently fixed by commit
7b514a886b ("tcp: accept RST without ACK flag").
Unfortunately, the RFC5961 requirement that spurious SYN packets be
handled in a similar manner remains broken.
RFC5961 section 4 states that:
... the handling of the SYN in the synchronized state SHOULD be
performed as follows:
1) If the SYN bit is set, irrespective of the sequence number, TCP
MUST send an ACK (also referred to as challenge ACK) to the remote
peer:
<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
After sending the acknowledgment, TCP MUST drop the unacceptable
segment and stop processing further.
By sending an ACK, the remote peer is challenged to confirm the loss
of the previous connection and the request to start a new connection.
A legitimate peer, after restart, would not have a TCB in the
synchronized state. Thus, when the ACK arrives, the peer should send
a RST segment back with the sequence number derived from the ACK
field that caused the RST.
This RST will confirm that the remote peer has indeed closed the
previous connection. Upon receipt of a valid RST, the local TCP
endpoint MUST terminate its connection. The local TCP endpoint
should then rely on SYN retransmission from the remote end to
re-establish the connection.
This patch lets SYN packets through the discard added in c3ae62af8e,
so that spurious SYN packets are properly dealt with as per the RFC.
The challenge ACK is sent unconditionally and is rate-limited, so the
original vulnerability is not reintroduced by this patch.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not sure what I was thinking, but doing anything after
releasing a refcount is suicidal or/and embarrassing.
By the time we set skb->fclone to SKB_FCLONE_FREE, another cpu
could have released last reference and freed whole skb.
We potentially corrupt memory or trap if CONFIG_DEBUG_PAGEALLOC is set.
Reported-by: Chris Mason <clm@fb.com>
Fixes: ce1a4ea3f1 ("net: avoid one atomic operation in skb_clone()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_NAME_TABLE_GET command to the new tipc netlink API.
This command supports dumping the name table of all nodes.
Netlink logical layout of name table response message:
-> name table
-> publication
-> type
-> lower
-> upper
-> scope
-> node
-> ref
-> key
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_NET_SET command to the new tipc netlink API.
This command can set the network id and network (tipc) address.
Netlink logical layout of network set message:
-> net
[ -> id ]
[ -> address ]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_NET_GET command to the new tipc netlink API.
This command dumps the network id of the node.
Netlink logical layout of returned network data:
-> net
-> id
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_NODE_GET to the new tipc netlink API.
This command can dump the address and node status of all nodes in the
tipc cluster.
Netlink logical layout of returned node/address data:
-> node
-> address
-> up flag
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_MEDIA_SET command to the new tipc netlink API.
This command can set one or more link properties for a particular
media.
Netlink logical layout of bearer set message:
-> media
-> name
-> link properties
[ -> tolerance ]
[ -> priority ]
[ -> window ]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_MEDIA_GET command to the new tipc netlink API.
This command supports dumping all information about all defined
media as well as getting all information about a specific media.
The information about a media includes name and link properties.
Netlink logical layout of media get response message:
-> media
-> name
-> link properties
-> tolerance
-> priority
-> window
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_LINK_RESET_STATS command to the new netlink API.
This command resets the link statistics for a particular link.
Netlink logical layout of link reset message:
-> link
-> name
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_LINK_SET to the new tipc netlink API.
This command can set one or more link properties for a particular
link.
Netlink logical layout of link set message:
-> link
-> name
-> properties
[ -> tolerance ]
[ -> priority ]
[ -> window ]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_LINK_GET command to the new tipc netlink API.
This command supports dumping all information about all links
(including the broadcast link) or getting all information about a
specific link (not the broadcast link).
The information about a link includes name, transmission info,
properties and link statistics.
As the tipc broadcast link is special we unfortunately have to treat
it specially. It is a deliberate decision not to abstract the
broadcast link on this (API) level.
Netlink logical layout of link response message:
-> port
-> name
-> MTU
-> RX
-> TX
-> up flag
-> active flag
-> properties
-> priority
-> tolerance
-> window
-> statistics
-> rx_info
-> rx_fragments
-> rx_fragmented
-> rx_bundles
-> rx_bundled
-> tx_info
-> tx_fragments
-> tx_fragmented
-> tx_bundles
-> tx_bundled
-> msg_prof_tot
-> msg_len_cnt
-> msg_len_tot
-> msg_len_p0
-> msg_len_p1
-> msg_len_p2
-> msg_len_p3
-> msg_len_p4
-> msg_len_p5
-> msg_len_p6
-> rx_states
-> rx_probes
-> rx_nacks
-> rx_deferred
-> tx_states
-> tx_probes
-> tx_nacks
-> tx_acks
-> retransmitted
-> duplicates
-> link_congs
-> max_queue
-> avg_queue
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_PUBL_GET command to the new tipc netlink API.
This command supports dumping of all publications for a specific
socket.
Netlink logical layout of request message:
-> socket
-> reference
Netlink logical layout of response message:
-> publication
-> type
-> lower
-> upper
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_SOCK_GET command to the new tipc netlink API.
This command supports dumping of all available sockets with their
associated connection or publication(s). It could be extended to reply
with a single socket if the NLM_F_DUMP isn't set.
The information about a socket includes reference, address, connection
information / publication information.
Netlink logical layout of response message:
-> socket
-> reference
-> address
[
-> connection
-> node
-> socket
[
-> connected flag
-> type
-> instance
]
]
[
-> publication flag
]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_BEARER_SET command to the new tipc netlink API.
This command can set one or more link properties for a particular
bearer.
Netlink logical layout of bearer set message:
-> bearer
-> name
-> link properties
[ -> tolerance ]
[ -> priority ]
[ -> window ]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_BEARER_GET command to the new tipc netlink API.
This command supports dumping all data about all bearers or getting
all information about a specific bearer.
The information about a bearer includes name, link priorities and
domain.
Netlink logical layout of bearer get message:
-> bearer
-> name
Netlink logical layout of returned bearer information:
-> bearer
-> name
-> link properties
-> priority
-> tolerance
-> window
-> domain
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A new netlink API for tipc that can disable or enable a tipc bearer.
The new API is separated from the old API because of a bug in the
user space client (tipc-config). The problem is that older versions
of tipc-config has a very low receive limit and adding commands to
the legacy genl_opts struct causes the ctrl_getfamily() response
message to grow, subsequently breaking the tool.
The new API utilizes netlink policies for input validation. Where the
top-level netlink attributes are tipc-logical entities, like bearer.
The top level entities then contain nested attributes. In this case
a name, nested link properties and a domain.
Netlink commands implemented in this patch:
TIPC_NL_BEARER_ENABLE
TIPC_NL_BEARER_DISABLE
Netlink logical layout of bearer enable message:
-> bearer
-> name
[ -> domain ]
[
-> properties
-> priority
]
Netlink logical layout of bearer disable message:
-> bearer
-> name
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's just silly to hold the skb destructor argument around inside
skb->cb[] as we currently do in SCTP.
Nowadays, we're sort of cheating on data accounting in the sense
that due to commit 4c3a5bdae2 ("sctp: Don't charge for data in
sndbuf again when transmitting packet"), we orphan the skb already
in the SCTP output path, i.e. giving back charged data memory, and
use a different destructor only to make sure the sk doesn't vanish
on skb destruction time. Thus, cb[] is still valid here as we
operate within the SCTP layer. (It's generally actually a big
candidate for future rework, imho.)
However, storing the destructor in the cb[] can easily cause issues
should an non sctp_packet_set_owner_w()'ed skb ever escape the SCTP
layer, since cb[] may get overwritten by lower layers and thus can
corrupt the chunk pointer. There are no such issues at present,
but lets keep the chunk in destructor_arg, as this is the actual
purpose for it.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending packets out with PF_PACKET, SOCK_RAW, ensure that the
packet is at least as long as the device's expected link layer header.
This check already exists in tpacket_snd, but not in packet_snd.
Also rate limit the warning in tpacket_snd.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This tc action allows to work with vlan tagged skbs. Two supported
sub-actions are header pop and header push.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So it can be used from out of openvswitch code.
Did couple of cosmetic changes on the way, namely variable naming and
adding support for 8021AD proto.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
note that skb_make_writable already exists in net/netfilter/core.c
but does something slightly different.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use them to push skb->vlan_tci into the payload and avoid code
duplication.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Name fits better. Plus there's going to be introduced
__vlan_insert_tag later on.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Always returns the same skb it gets, so change to void.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace duplicated code by calling skb_postpull_rcsum
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains two bugfixes for your net tree, they are:
1) Validate netlink group from nfnetlink to avoid an out of bound array
access. This should only happen with superuser priviledges though.
Discovered by Andrey Ryabinin using trinity.
2) Don't push ethernet header before calling the netfilter output hook
for multicast traffic, this breaks ebtables since it expects to see
skb->data pointing to the network header, patch from Linus Luessing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless 2014-11-20
Please full this little batch of fixes intended for the 3.18 stream!
For the mac80211 patch, Johannes says:
"Here's another last minute fix, for minstrel HT crashing
depending on the value of some uninitialised stack."
On top of that...
Ben Greear fixes an ath9k regression in which a BSSID mask is
miscalculated.
Dmitry Torokhov corrects an error handling routing in brcmfmac which
was checking an unsigned variable for a negative value.
Johannes Berg avoids a build problem in brcmfmac for arches where
linux/unaligned/access_ok.h and asm/unaligned.h conflict.
Mathy Vanhoef addresses another brcmfmac issue so as to eliminate a
use-after-free of the URB transfer buffer if a timeout occurs.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes an old regression introduced by commit
b0d0d915 (ipx: remove the BKL).
When a recvmsg syscall blocks waiting for new data, no data can be sent on the
same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked.
This breaks mars-nwe (NetWare emulator):
- the ncpserv process reads the request using recvmsg
- ncpserv forks and spawns nwconn
- ncpserv calls a (blocking) recvmsg and waits for new requests
- nwconn deadlocks in sendmsg on the same socket
Commit b0d0d915 has simply replaced BKL locking with
lock_sock/release_sock. Unlike now, BKL got unlocked while
sleeping, so a blocking recvmsg did not block a concurrent
sendmsg.
Only keep the socket locked while actually working with the socket data and
release it prior to calling skb_recv_datagram().
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When userspace doesn't provide a mask, OVS datapath generates a fully
unwildcarded mask for the flow by copying the flow and setting all bits
in all fields. For IPv6 label, this creates a mask that matches on the
upper 12 bits, causing the following error:
openvswitch: netlink: Invalid IPv6 flow label value (value=ffffffff, max=fffff)
This patch ignores the label validation check for masks, avoiding this
error.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
another relatively large set of changes:
* TDLS off-channel support set from Arik/Liad, with some support
patches I did
* custom regulatory fixes from Arik
* minstrel VHT fix (and a small optimisation) from Felix
* add back radiotap vendor namespace support (myself)
* random MAC address scanning for cfg80211/mac80211/hwsim (myself)
* CSA improvements (Luca)
* WoWLAN Net Detect (wake on network found) support (Luca)
* and lots of other smaller changes from many people
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUbghkAAoJEDBSmw7B7bqroosP/RABYXUMua+k3Ccq7T4eU4jV
AEO2p2gt5nHBzEl1NCJtdUTJkZ6ftT7ehvAkV2zboB0FBoAoBTbZ8YDtcBBiWaY6
wJ5TYuOl6LFo7csAxWxpCzUxW3M+iq26itpyW9Zt9WWxP4QLSNPFyXEXV3SEh45n
HcDW9A0SE+mgdaTQ2LEMBJ5XWxG/Ic7i9Xn6Py3o4x7NsTB4EqFNOD0WXcPCq7M0
H8xlsIYIBYoGNMsV/2Nu7CEgcSXfDLqWcs9uPHQMCvWPjx/vIoEyOgTwJlE9bQHh
2tloc1LBP6XKQ6g2bJ/pBaQVnZGugcOJhD6KUq3ckNm9qIP1ZtRmJslH4V6pUSCS
eGl3TcAKSSE4BWIa7AaETWXeoH4X68dO7PM7pnflQRCQLzCJRbDWCdqjBst/AxBT
6hvAFAvExEcWBkNVSTJ2egRk/C9cDFKRaCWQ1h4wX9yvh+8efe1D0DDWLW9a9qv1
LsoGJE72BZdXn2CaQEME+CjTd3fWmn6u729d/c863cq2kspCSOof0QD0X9uWhBUx
BvqtgbQjGZzAvHFcjBd7yRd5hz0aDfLyBL59bq2IBzaU1QmyekNPqzSMSD+5ZlCp
uxEeE5AY2+pcNZV1KRtkvgAByfUgAVd0FHZcVb8SIM6QZ3IhqiOuzxuXtxv6hrYP
V/76B+ath4Sv1IPF56ex
=q4e6
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-john-2014-11-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg <johannes@sipsolutions.net> says:
"It has been a while since my last pull request, so we accumulated
another relatively large set of changes:
* TDLS off-channel support set from Arik/Liad, with some support
patches I did
* custom regulatory fixes from Arik
* minstrel VHT fix (and a small optimisation) from Felix
* add back radiotap vendor namespace support (myself)
* random MAC address scanning for cfg80211/mac80211/hwsim (myself)
* CSA improvements (Luca)
* WoWLAN Net Detect (wake on network found) support (Luca)
* and lots of other smaller changes from many people"
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The functions free_percpu() and module_put() test whether their argument
is NULL and then return immediately. Thus the test around the call is
not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently we can't lookup tunnels with wildcard endpoints.
This patch adds a method to lookup these tunnels in the
receive path.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
pim6_protocol was added when initiation, but it not deleted.
Similarly, unregister RTNL_FAMILY_IP6MR rtnetlink.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
use {compat_,}rw_copy_check_uvector(). As the result, we are
guaranteed that all iovecs seen in ->msg_iov by ->sendmsg()
and ->recvmsg() will pass access_ok().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Kernel-side struct msghdr is (currently) using the same layout as
userland one, but it's not a one-to-one copy - even without considering
32bit compat issues, we have msg_iov, msg_name and msg_control copied
to kernel[1]. It's fairly localized, so we get away with a few functions
where that knowledge is needed (and we could shrink that set even
more). Pretty much everything deals with the kernel-side variant and
the few places that want userland one just use a bunch of force-casts
to paper over the differences.
The thing is, kernel-side definition of struct msghdr is *not* exposed
in include/uapi - libc doesn't see it, etc. So we can add struct user_msghdr,
with proper annotations and let the few places that ever deal with those
beasts use it for userland pointers. Saner typechecking aside, that will
allow to change the layout of kernel-side msghdr - e.g. replace
msg_iov/msg_iovlen there with struct iov_iter, getting rid of the need
to modify the iovec as we copy data to/from it, etc.
We could introduce kernel_msghdr instead, but that would create much more
noise - the absolute majority of the instances would need to have the
type switched to kernel_msghdr and definition of struct msghdr in
include/linux/socket.h is not going to be seen by userland anyway.
This commit just introduces user_msghdr and switches the few places that
are dealing with userland-side msghdr to it.
[1] actually, it's even trickier than that - we copy msg_control for
sendmsg, but keep the userland address on recvmsg.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
depending on the value of some uninitialised stack.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUa70EAAoJEDBSmw7B7bqr40gQAMgNUKILuYTyHZaKR7teV8aG
uVFzrMdmNjCZIsxAuH1X8VNOJZEAB+Ro+lTybNjC/tTPyA+Yu3lDCQjSYiE8UOFk
mDrNFG0tjZHGn3axceLIhujfPqXEks/wQ98tDEJU2eMLxS0nClwyTXjhc1IxzUBO
/L2HlhGWADRZ6FIV7yhRO33KN1l3Q8oxW79ybJq/5JCcclfGBt3OpYvi7AW6M4iK
qfgTfS6GbpcRRc4+HVA/cgYOFaRmMI71bvqMgvh8MgmqfQeRw+WSOaUiLx795CsR
OKrIQgdpVp2JHwHkFXIgBy+EBBbXtCozvxKX3u+YKPabj4GC/6dkT370nuvv9hkz
IzTxt+yNBTqJawum7vM6WypgT4wCZOR0JbY0vDW5jYLJnDfGJkYFbny7WtRtCVf7
xWzIixkyWLNFs6swbRdIKoU5GNw2fNe7az4fRoUqNtZGSPHiK7oD5sEkr81hWUV5
5XsifngqekfdVvIGTckpy2SAikxgkrtHOQFIGJz+PN1CjvDT++66Opr3AHnwWeSP
IAjmoVOCiyIBzXlcsZJmAfPa4Hvk+NAPRKCx6N25q/+ESpOZRARWFRV3VuPeGzQW
+SiC4V9VYWHvW9VPJ37WkAFYv5tPaLLyURdU7p6MSwuc0RCPgtcvXty2QFbjQLdf
k0apC9DZydQPF2kre6Pv
=L2Jb
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-john-2014-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg <johannes@sipsolutions.net> says:
"Here's another last minute fix, for minstrel HT crashing
depending on the value of some uninitialised stack."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The __module_get() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The proc_remove() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.
We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()
Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)
Instead of open coding memcpy_fromiovecend(), simply use this helper.
This leaves in socket write queue clean fast clone skbs.
This was tested against our fastopen packetdrill tests.
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the queue mapping earlier, skb->queue_mapping is more likely than
skb->data to still be in d-cache.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This allows drivers with a firmware or chip-based rate lookup table to
use the most recent default rate selection without having to get it from
per-packet data or explicit ieee80211_get_tx_rate calls
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Let the other listeners being notified when a new or del interface
command has been issued, thus reducing later necessary request to be in
sync with current context.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Replace NL80211_ATTR_IFACE_SOCKET_OWNER attribute with more generic
NL80211_ATTR_SOCKET_OWNER that can be used with other commands
that interface creation.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Callback add_virtual_intf is supposed to return ERR_PTR and trying to
return NULL results in some "Unable to handle kernel paging request",
etc. As it may be complicated to debug & trace, let's catch it (WARN).
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Explicitly initialize the DFS state and beacon found state when handling
channels in the custom regulatory path.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Acked-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some channels fields were not being updated in the custom regulatory
path. Update them according to the code in handle_channel().
Signed-off-by: Jonathan Doron <jonathanx.doron@intel.com>
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Acked-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The rate mask code currently assumes that a rate is legacy if
IEEE80211_TX_RC_MCS is not set. This might be the cause of bogus VHT
rates being reported with minstrel_ht.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When 20MHz chandef is used, 40MHz rates shouldn't be
used (by the rate-control algorithm), even if the sta
ht capabilities indicate support for it.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Singed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow drivers to support NL80211_SCAN_FLAG_RANDOM_ADDR with software
based scanning and generate a random MAC address for them for every
scan request with the flag.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In order to use the scan and scheduled scan request pointers during
RX to check for randomisation, make them accessible using RCU.
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add the necessary feature flags and a scan flag to support using
random MAC addresses for scan while unassociated.
The configuration for this supports an arbitrary MAC address
value and mask, so that any kind of configuration (e.g. fixed
OUI or full 46-bit random) can be requested. Full 46-bit random
is the default when no other configuration is passed.
Also add a small helper function to use the addr/mask correctly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
local->scan_req was tested in the previous line, so it
can't be NULL.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a new WoWLAN API to enable net-detect as a wake up trigger.
Net-detect allows the device to scan in the background while the
host is asleep to wake up the host system when a matching network
is found.
Reuse the scheduled scan attributes to specify how the scan is
performed while suspended and the matches that will trigger a
wake event.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For net detect, we will need to reuse most of the scheduled scan
parsing function, but not all, so split out the attributes parsing
part out of the main start sched_scan function.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In TDLS (e.g., TDLS off-channel) there is a requirement for
some drivers to supply an unused TID between the AP and the
device to the FW, to allow sending PTI requests and to allow
the FW to aggregate on a specific TID for better throughput.
To ensure that the allocated TID is indeed unused, this patch
introduces an API for blocking the driver from TXing on that
TID.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the HW supports IEEE80211_HW_QUEUE_CONTROL, allow
flushing only specific queues rather than all of them.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When receiving a TDLS channel switch request or response, parse the frame
and call a new tdls_recv_channel_switch op in the low level driver with
the parsed data.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implement the cfg80211 TDLS channel switch ops and introduce new mac80211
ones for low-level drivers.
Verify low-level driver support for the new ops when using the relevant
wiphy feature bit. Also verify the peer supports channel switching before
passing the command down.
Add a new STA flag to track the off-channel state with the TDLS peer and
make sure to cancel the channel-switch if the peer STA is unexpectedly
removed.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
These are used in TDLS channel switching code.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Introduce commands to initiate and cancel TDLS channel-switching. Once
TDLS channel-switching is started, the lower level driver is responsible
for continually initiating channel-switch operations and returning to
the base (AP) channel to listen for beacons from time to time.
Upon cancellation of the channel-switch all communication between the
relevant TDLS peers will continue on the base channel.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Split the data-generating from the Tx-sending functionality, as we do
not want to send templates to the lower driver. Also add an optional
chandef argument to the data-generating portion. It will be used for
channel-switch templates.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The AP or peer can prohibit TDLS channel switch via a bit in the
extended capabilities IE. Parse the IE and track this bit. Set an
appropriate STA flag if both the AP and peer STA support TDLS
channel-switching.
Add the new STA flag and the missing TDLS_INITIATOR to debugfs.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Define some related TDLS protocol constants and advertise channel switch
support in the extended-capabilities IE when the feature bit is defined.
Actually supporting TDLS channel-switching also requires support for
some new nl80211 commands, to be introduced by future patches.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add the BSS coex IE in case we support HT40 channels, as mandated by
section 8.5.13 in IEEE802.11 2012.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This information element is mandatory in case TDLS channel-switching is to
be supported. The channels given are ones supported and allowed to be
active in the current regulatory setting.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For some TDLS channel switch implementations data frames need to be
sent by the firmware based on a template. This template should be
created by mac80211, and thus needs to properly be built from an
802.3 frame into an 802.11 frame. In addition, the device will need
the key information so the select_key handler needs to be run.
However, the driver/device will be responsible for all of the crypto
encapsulation, as the sequence numbers etc. cannot be built by the
host anyway in this case since it's a template to be used multiple
times.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Factor out the 802.11 header building code from the xmit function
to be able to use it separately in a later commit.
While at it, fix up some documentation.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Instead of passing the band as a parameter to ieee80211_xmit()
and ieee80211_tx(), move it outside of the two functions while
making sure info->band is set up before calling them.
This removes the parameter and simplifies the follow commit.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since the TDLS peer station might not receive the teardown
packet (e.g., when in PS), this makes sure the packet is
retransmitted - this time through the AP - if the TDLS peer
didn't ACK the packet.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allows setting of an skb's flags - if needed - when calling
ieee80211_subif_start_xmit().
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Both xprt_lookup_rqst() and xprt_complete_rqst() require that you
take the transport lock in order to avoid races with xprt_transmit().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch converts the hdev->link_keys list to be protected through
RCU, thereby eliminating the need to hold the hdev lock while accessing
the list.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When a connection is requested the conn->pending_sec_level value gets
set to whatever level the user requested the connection to be. During
the pairing process there are various sanity checks to try to ensure
that the right length PIN or right IO Capability is used to satisfy the
target security level. However, when we finally get hold of the link key
that is to be used we should still set the actual final security level
from the key type.
This way when we eventually get an Encrypt Change event the correct
value gets copied to conn->sec_level.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In __hci_cmd_sync_ev() and __hci_req_sync() if the hci_req_run() call
fails and we return from the functions we should ensure that the state
doesn't remain in TASK_INTERRUPTIBLE that we just set it to. This patch
fixes missing calls to set_current_state(TASK_RUNNING) in both places.
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The commit 5935839ad7
"mac80211: improve minstrel_ht rate sorting by throughput & probability"
introduced a crash on rate sorting that occurs when the rate added to
the sorting array is faster than all the previous rates. Due to an
off-by-one error, it reads the rate index from tp_list[-1], which
contains uninitialized stack garbage, and then uses the resulting index
for accessing the group rate stats, leading to a crash if the garbage
value is big enough.
Cc: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If icmp_rcv() has successfully processed the incoming ICMP datagram, we
should use consume_skb() rather than kfree_skb() because a hit on the likes
of perf -e skb:kfree_skb is not called-for.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg <johan.hedberg@gmail.com> says:
"Here's another bluetooth-next pull request for 3.19. We've got:
- Various fixes, cleanups and improvements to ieee802154/mac802154
- Support for a Broadcom BCM20702A1 variant
- Lots of lockdep fixes
- Fixed handling of LE CoC errors that should trigger SMP"
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This brings in some mwifiex changes that further patches will
need to work on top to not cause merge conflicts.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Doing things like hci_conn_hash_flush() while holding the hdev lock is
risky since its synchronous pending work cancellation could cause the
L2CAP layer to try to reacquire the hdev lock. Right now there doesn't
seem to be any obvious places where this would for certain happen but
it's already enough to cause lockdep to start warning against the hdev
and the work struct locks being taken in the "wrong" order:
[ +0.000373] mgmt-tester/1603 is trying to acquire lock:
[ +0.000292] ((&conn->pending_rx_work)){+.+.+.}, at: [<c104266d>] flush_work+0x0/0x181
[ +0.000270]
but task is already holding lock:
[ +0.000000] (&hdev->lock){+.+.+.}, at: [<c13b9a80>] hci_dev_do_close+0x166/0x359
[ +0.000000]
which lock already depends on the new lock.
[ +0.000000]
the existing dependency chain (in reverse order) is:
[ +0.000000]
-> #1 (&hdev->lock){+.+.+.}:
[ +0.000000] [<c105ea8f>] lock_acquire+0xe3/0x156
[ +0.000000] [<c140c663>] mutex_lock_nested+0x54/0x375
[ +0.000000] [<c13d644b>] l2cap_recv_frame+0x293/0x1a9c
[ +0.000000] [<c13d7ca4>] process_pending_rx+0x50/0x5e
[ +0.000000] [<c1041a3f>] process_one_work+0x21c/0x436
[ +0.000000] [<c1041e3d>] worker_thread+0x1be/0x251
[ +0.000000] [<c1045a22>] kthread+0x94/0x99
[ +0.000000] [<c140f801>] ret_from_kernel_thread+0x21/0x30
[ +0.000000]
-> #0 ((&conn->pending_rx_work)){+.+.+.}:
[ +0.000000] [<c105e158>] __lock_acquire+0xa07/0xc89
[ +0.000000] [<c105ea8f>] lock_acquire+0xe3/0x156
[ +0.000000] [<c1042696>] flush_work+0x29/0x181
[ +0.000000] [<c1042864>] __cancel_work_timer+0x76/0x8f
[ +0.000000] [<c104288c>] cancel_work_sync+0xf/0x11
[ +0.000000] [<c13d4c18>] l2cap_conn_del+0x72/0x183
[ +0.000000] [<c13d8953>] l2cap_disconn_cfm+0x49/0x55
[ +0.000000] [<c13be37a>] hci_conn_hash_flush+0x7a/0xc3
[ +0.000000] [<c13b9af6>] hci_dev_do_close+0x1dc/0x359
[ +0.012038] [<c13bbe38>] hci_unregister_dev+0x6e/0x1a3
[ +0.000000] [<c12d33c1>] vhci_release+0x28/0x47
[ +0.000000] [<c10dd6a9>] __fput+0xd6/0x154
[ +0.000000] [<c10dd757>] ____fput+0xd/0xf
[ +0.000000] [<c1044bb2>] task_work_run+0x6b/0x8d
[ +0.000000] [<c1001bd2>] do_notify_resume+0x3c/0x3f
[ +0.000000] [<c140fa70>] work_notifysig+0x29/0x31
[ +0.000000]
other info that might help us debug this:
[ +0.000000] Possible unsafe locking scenario:
[ +0.000000] CPU0 CPU1
[ +0.000000] ---- ----
[ +0.000000] lock(&hdev->lock);
[ +0.000000] lock((&conn->pending_rx_work));
[ +0.000000] lock(&hdev->lock);
[ +0.000000] lock((&conn->pending_rx_work));
[ +0.000000]
*** DEADLOCK ***
Fully fixing this would require some quite heavy refactoring to change
how the hdev lock and hci_conn instances are handled together. A simpler
solution for now which this patch takes is to try ensure that the hdev
workqueue is empty before proceeding with the various cleanup calls,
including hci_conn_hash_flush().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The common short form of "randomizer" is "rand" in many places
(including the Bluetooth specification). The shorter version also makes
for easier to read code with less forced line breaks. This patch renames
all occurences of "randomizer" to "rand" in the Bluetooth subsystem
code.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For now the mgmt commands dealing with remote OOB data are strictly
BR/EDR-only. This patch fixes missing checks for the passed address type
so that any non-BR/EDR value triggers the appropriate error response.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Currently all the conntrack lookups are done using default zone.
In case the skb has a ct attached (e.g. template) we should use this zone
for lookups instead. This makes connlimit work with connections assigned
to other zones.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ebtables on the OUTPUT chain (NF_BR_LOCAL_OUT) would not work as expected
for both locally generated IGMP and MLD queries. The IP header specific
filter options are off by 14 Bytes for netfilter (actual output on
interfaces is fine).
NF_HOOK() expects the skb->data to point to the IP header, not the
ethernet one (while dev_queue_xmit() does not). Luckily there is an
br_dev_queue_push_xmit() helper function already - let's just use that.
Introduced by eb1d164143
("bridge: Add core IGMP snooping support")
Ebtables example:
$ ebtables -I OUTPUT -p IPv6 -o eth1 --logical-out br0 \
--log --log-level 6 --log-ip6 --log-prefix="~EBT: " -j DROP
before (broken):
~EBT: IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
SRC=64a4:39c2:86dd:6000:0000:0020:0001:fe80 IPv6 \
DST=0000:0000:0000:0004:64ff:fea4:39c2:ff02, \
IPv6 priority=0x3, Next Header=2
after (working):
~EBT: IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
SRC=fe80:0000:0000:0000:0004:64ff:fea4:39c2 IPv6 \
DST=ff02:0000:0000:0000:0000:0000:0000:0001, \
IPv6 priority=0x0, Next Header=0
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Make sure the netlink group exists, otherwise you can trigger an out
of bound array memory access from the netlink_bind() path. This splat
can only be triggered only by superuser.
[ 180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28
[ 180.204249] index 9 is out of range for type 'int [9]'
[ 180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122
[ 180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
+04/01/2014
[ 180.206498] 0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8
[ 180.207220] ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8
[ 180.207887] ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000
[ 180.208639] Call Trace:
[ 180.208857] dump_stack (lib/dump_stack.c:52)
[ 180.209370] ubsan_epilogue (lib/ubsan.c:174)
[ 180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400)
[ 180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467)
[ 180.210986] netlink_bind (net/netlink/af_netlink.c:1483)
[ 180.211495] SYSC_bind (net/socket.c:1541)
Moreover, define the missing nf_tables and nf_acct multicast groups too.
Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch changes the byteorder handling for short and panid handling.
We now except to get little endian in nl802154 for these attributes.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the 802.15.4 constraints WPAN_NUM_ defines into
"net/ieee802154.h" which should contain all necessary 802.15.4 related
information. Also rename these defines to a common name which is
IEEE802154_MAX_CHANNEL and IEEE802154_MAX_PAGE.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for deleting a wpan interface via nl802154.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for setting an extended address while
registration a new interface. If ieee802154_is_valid_extended_addr
getting as parameter and invalid extended address then the perm address
is fallback. This is useful to make some default handling while for
example default registration of a wpan interface while phy registration.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a new nl802154 command for adding a new interface
according to a wpan phy via nl802154.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This parameter was grabbed from wireless implementation with the
identically wireless dev struct. We don't need this right now and so we
remove it. Maybe we will add it later again if we found any real reason
to have such parameter.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch replace the depracted IEEE802154_DEV to the new introduced
NL802154_IFTYPE_NODE types. There is a backwards compatibility to have
the identical types for both enum definitions. Also remove some inlcude
issue with "linux/nl802154.h", because the export nl_policy inside this
header it was always necessary to have an include of "net/rtnetlink.h"
before. The reason for this is more complicated. Nevertheless we removed
this now, because "linux/nl802154.h" is the depracted netlink interface.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We don't and we can't name it zigbee anymore. This patch removes
deprecated information for project website.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patches removes the const keyword in variables which are non
pointers. There is no sense to declare call by value parameters as
const.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patches removes the const keyword in variables which are non
pointers. There is no sense to declare call by value parameters as
const.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patches removes the const keyword in variables which are non
pointers. There is no sense to declare call by value parameters as const.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes some prototypes which are not used anymore.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():
skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:
<IRQ>
[<ffffffff80578226>] ? skb_put+0x3a/0x3b
[<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
[<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
[<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
[<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
[<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
[<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
[<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
[<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
[<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
[<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.
However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.
The IGMP case doesn't have this issue as commit 57e1ab6ead
("igmp: refine skb allocations") stores the allocation size in
the cb[].
Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().
Reported-by: Wei Liu <lw1a2.jing@gmail.com>
Fixes: 72e09ad107 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David L Stevens <david.stevens@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
RSS (Receive Side Scaling) typically uses Toeplitz hash and a 40 or 52 bytes
RSS key.
Some drivers use a constant (and well known key), some drivers use a random
key per port, making bonding setups hard to tune. Well known keys increase
attack surface, considering that number of queues is usually a power of two.
This patch provides infrastructure to help drivers doing the right thing.
netdev_rss_key_fill() should be used by drivers to initialize their RSS key,
even if they provide ethtool -X support to let user redefine the key later.
A new /proc/sys/net/core/netdev_rss_key file can be used to get the host
RSS key even for drivers not providing ethtool -x support, in case some
applications want to precisely setup flows to match some RX queues.
Tested:
myhost:~# cat /proc/sys/net/core/netdev_rss_key
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41:36:40:74:b6:15:ca:27:44:aa:b3:4d:72
myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
0: 0 1 2 3 4 5 6 7
RSS hash key:
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar says:
====================
Open vSwitch
Following fixes are accumulated in ovs-repo.
Three of them are related to protocol processing, one is
related to memory leak in case of error and one is to
fix race.
Patch "Validate IPv6 flow key and mask values" has conflicts
with net-next, Let me know if you want me to send the patch
for net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Solves possible lockup issues that can be seen from firmware DCB agents calling
into the DCB app api.
DCB firmware event queues can be tied in with NAPI so that dcb events are
generated in softIRQ context. This can results in calls to dcb_*app()
functions which try to take the dcb_lock.
If the the event triggers while we also have the dcb_lock because lldpad or
some other agent happened to be issuing a get/set command we could see a cpu
lockup.
This code was not originally written with firmware agents in mind, hence
grabbing dcb_lock from softIRQ context was not considered.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no reason to limit the amount of possible links to a
neighboring node to 2. If we have more then two bearers we can also
establish more links.
Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Reviewed-By: Jon Maloy <jon.maloy@ericsson.com>
cc: Ying Xue <ying.xue@windriver.com>
cc: Erik Hugne <erik.hugne@ericsson.com>
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter updates for your net tree,
they are:
1) Fix missing initialization of the range structure (allocated in the
stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann.
2) Make sure the data we receive from userspace contains the req_version
structure, otherwise return an error incomplete on truncated input.
From Dan Carpenter.
3) Fix handling og skb->sk which may cause incorrect handling
of connections from a local process. Via Simon Horman, patch from
Calvin Owens.
4) Fix wrong netns in nft_compat when setting target and match params
structure.
5) Relax chain type validation in nft_compat that was recently included,
this broke the matches that need to be run from the route chain type.
Now iptables-test.py automated regression tests report success again
and we avoid the only possible problematic case, which is the use of
nat targets out of nat chain type.
6) Use match->table to validate the tablename, instead of the match->name.
Again patch for nft_compat.
7) Restore the synchronous release of objects from the commit and abort
path in nf_tables. This is causing two major problems: splats when using
nft_compat, given that matches and targets may sleep and call_rcu is
invoked from softirq context. Moreover Patrick reported possible event
notification reordering when rules refer to anonymous sets.
8) Fix race condition in between packets that are being confirmed by
conntrack and the ctnetlink flush operation. This happens since the
removal of the central spinlock. Thanks to Jesper D. Brouer to looking
into this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Trying to add an unreachable route incorrectly returns -ESRCH if
if custom FIB rules are present:
[root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
RTNETLINK answers: Network is unreachable
[root@localhost ~]# ip rule add to 55.66.77.88 table 200
[root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
RTNETLINK answers: No such process
[root@localhost ~]#
Commit 83886b6b63 ("[NET]: Change "not found"
return value for rule lookup") changed fib_rules_lookup()
to use -ESRCH as a "not found" code internally, but for user space it
should be translated into -ENETUNREACH. Handle the translation centrally in
ipv4-specific fib_lookup(), leaving the DECnet case alone.
On a related note, commit b7a71b51ee
("ipv4: removed redundant conditional") removed a similar translation from
ip_route_input_slow() prematurely AIUI.
Fixes: b7a71b51ee ("ipv4: removed redundant conditional")
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Highlights include:
- Stable patches to fix NFSv4.x delegation reclaim error paths
- Fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2 features
- Fix a use-after-free problem with pNFS block layouts
- Fix a memory leak in the pNFS files O_DIRECT code
- Replace an intrusive and Oops-prone performance fix in the NFSv4 atomic
open code with a safer one-line version and revert the two original patches.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUZol9AAoJEGcL54qWCgDyNQQQALnngvpPR51BoO/iTz9ruXol
fGZy0SRIlTUKm1ArsQsQ+HGbV5K0hgP3Tg+z2AtEEZ8u/2Fi2Bqdl6+eNY12tKHd
uUctDdM5TXLrETAn1UULrnd2eX1cvPMBfOlXlAdNHHsGEgC7w7YQ+rzGwnls+HDy
LYXzY7Y3jYGdTMaRgZc5YRdtd8JBpCxciRvPEQLDIobwP0JnZC1afTLe1XInqB2I
TZ4NTHT+DEWA+Ou1P2deL7+RuJNEAeWWBvULJy76n4BqKvN/HNedOO5HyBYXrwSd
3UX3wbx9CWRxN1F0EqNKxjxZ/597JwqBeNoTDRcofLsqumUfAOtlbym1EahcD3Ls
pykopNfgUhGuhxolStmuHdS6CnyQPERpR5lFZcDp7XtcwSq4FcwD8DRzLJMZW5dg
N1lkfFlwQN3rqdk/NEHL+IxS41Hlk4HXjMoP6MNbRtqzIN6tW9tvC4MtAWd1aYxO
YuUW281pbWxXQ731s0kTIrMUdQ9vGSRBMcbnO9rL3o+xkh8y5SPVkx9lhdhJN0UD
VbQ5Ws/xZ54bD1PfyYb+Yx659lI8MSFOsDuMuLmDtfYnVicHwCA3H63StvQ3ihf/
q0gu8Iex9YbNNjf7IfYGuWPmPn3gwPBoURPC0bcZvMPdY6DXodU6Oj4BRTQ5VCie
9N0pt2wp2eRjaSzD7r5A
=8YN6
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
- stable patches to fix NFSv4.x delegation reclaim error paths
- fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2
features
- fix a use-after-free problem with pNFS block layouts
- fix a memory leak in the pNFS files O_DIRECT code
- replace an intrusive and Oops-prone performance fix in the NFSv4
atomic open code with a safer one-line version and revert the two
original patches"
* tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor
NFS: Don't try to reclaim delegation open state if recovery failed
NFSv4: Ensure that we call FREE_STATEID when NFSv4.x stateids are revoked
NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE
NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
NFS: SEEK is an NFS v4.2 feature
nfs: Fix use of uninitialized variable in nfs_getattr()
nfs: Remove bogus assignment
nfs: remove spurious WARN_ON_ONCE in write path
pnfs/blocklayout: serialize GETDEVICEINFO calls
nfs: fix pnfs direct write memory leak
Revert "NFS: nfs4_do_open should add negative results to the dcache."
Revert "NFS: remove BUG possibility in nfs4_open_and_get_state"
NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT
When passed BDADDR_ANY the Remove Remote OOB Data comand is specified to
clear all entries. This patch adds the necessary check and calls
hci_remote_oob_data_clear() when necessary.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds some extra debug logs to L2CAP related code. These are
mainly to help track locking issues but will probably be useful for
debugging other types of issues as well.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that the SMP related key lists are converted to RCU there is nothing
in smp_cmd_sign_info() or smp_cmd_ident_addr_info() that would require
taking the hdev lock (including the smp_distribute_keys call). This
patch removes this unnecessary locking.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch set converts the hdev->identity_resolving_keys list to use
RCU to eliminate the need to use hci_dev_lock/unlock.
An additional change that must be done is to remove use of
CRYPTO_ALG_ASYNC for the hdev-specific AES crypto context. The reason is
that this context is used for matching RPAs and the loop that does the
matching is under the RCU read lock, i.e. is an atomic section which
cannot sleep.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch set converts the hdev->long_term_keys list to use RCU to
eliminate the need to use hci_dev_lock/unlock.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The insufficient authentication/encryption errors indicate to the L2CAP
client that it should try to elevate the security level. Since there
really isn't any exception to this rule it makes sense to fully handle
it on the kernel side instead of pushing the responsibility to user
space.
This patch adds special handling of these two error codes and calls
smp_conn_security() with the elevated security level if necessary.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
So far smp_sufficient_security() has returned false if we're encrypted
with an STK but do have an LTK available. However, for the sake of LE
CoC servers we do want to let the incoming connection through even
though we're only encrypted with the STK.
This patch adds a key preference parameter to smp_sufficient_security()
with two possible values (enum used instead of bool for readability).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For LE CoC L2CAP servers we don't do security level elevation during the
BT_CONNECT2 state (instead LE CoC simply sends an immediate error
response if the security level isn't high enough). Therefore if we get a
security level change while an LE CoC channel is in the BT_CONNECT2
state we should simply do nothing.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
dp read operations depends on ovs_dp_cmd_fill_info(). This API
needs to looup vport to find dp name, but vport lookup can
fail. Therefore to keep vport reference alive we need to
take ovs lock.
Introduced by commit 6093ae9aba ("openvswitch: Minimize
dp and vport critical sections").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
match_validate() enforce that a mask matching on NDP attributes has also an
exact match on ICMPv6 type.
The ICMPv6 type, which is 8-bit wide, is stored in the 'tp.src' field of
'struct sw_flow_key', which is 16-bit wide.
Therefore, an exact match on ICMPv6 type should only check the first 8 bits.
This commit fixes a bug that prevented flows with an exact match on NDP field
from being installed
Introduced by commit 03f0d916aa ("openvswitch: Mega flow implementation").
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
The checksum of ICMPv6 packets uses the IP pseudoheader as part of
the calculation, unlike ICMP in IPv4. This was not implemented,
which means that modifying the IP addresses of an ICMPv6 packet
would cause the checksum to no longer be correct as the psuedoheader
did not match.
Introduced by commit 3fdbd1ce11 ("openvswitch: add ipv6 'set' action").
Reported-by: Neal Shrader <icosahedral@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Need to free memory in case of sample action error.
Introduced by commit 651887b0c2 ("openvswitch: Sample
action without side effects").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
John W. Linville says:
====================
pull request: wireless 2014-11-13
Please pull this set of a few more wireless fixes intended for the
3.18 stream...
For the mac80211 bits, Johannes says:
"This has just one fix, for an issue with the CCMP decryption
that can cause a kernel crash. I'm not sure it's remotely
exploitable, but it's an important fix nonetheless."
For the iwlwifi bits, Emmanuel says:
"Two fixes here - we weren't updating mac80211 if a scan
was cut short by RFKILL which confused cfg80211. As a
result, the latter wouldn't allow to run another scan.
Liad fixes a small bug in the firmware dump."
On top of that...
Arend van Spriel corrects a channel width conversion that caused a
WARNING in brcmfmac.
Hauke Mehrtens avoids a NULL pointer dereference in b43.
Larry Finger hits a trio of rtlwifi bugs left over from recent
backporting from the Realtek vendor driver.
Miaoqing Pan fixes a clocking problem in ath9k that could affect
packet timestamps and such.
Stanislaw Gruszka addresses an payload alignment issue that has been
plaguing rt2x00.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After removal of the central spinlock nf_conntrack_lock, in
commit 93bb0ceb75 ("netfilter: conntrack: remove central
spinlock nf_conntrack_lock"), it is possible to race against
get_next_corpse().
The race is against the get_next_corpse() cleanup on
the "unconfirmed" list (a per-cpu list with seperate locking),
which set the DYING bit.
Fix this race, in __nf_conntrack_confirm(), by removing the CT
from unconfirmed list before checking the DYING bit. In case
race occured, re-add the CT to the dying list.
While at this, fix coding style of the comment that has been
updated.
Fixes: 93bb0ceb75 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
Reported-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add dependency on INET to fix following build error. I have also
fixed MPLS dependency.
ERROR: "ip_route_output_flow" [net/openvswitch/openvswitch.ko]
undefined!
make[1]: *** [__modpost] Error 1
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/chelsio/cxgb4vf/sge.c
drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c
sge.c was overlapping two changes, one to use the new
__dev_alloc_page() in net-next, and one to use s->fl_pg_order in net.
ixgbe_phy.c was a set of overlapping whitespace changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) sunhme driver lacks DMA mapping error checks, based upon a report by
Meelis Roos.
2) Fix memory leak in mvpp2 driver, from Sudip Mukherjee.
3) DMA memory allocation sizes are wrong in systemport ethernet driver,
fix from Florian Fainelli.
4) Fix use after free in mac80211 defragmentation code, from Johannes
Berg.
5) Some networking uapi headers missing from Kbuild file, from Stephen
Hemminger.
6) TUN driver gets csum_start offset wrong when VLAN accel is enabled,
and macvtap has a similar bug, from Herbert Xu.
7) Adjust several tunneling drivers to set dev->iflink after registry,
because registry sets that to -1 overwriting whatever we did. From
Steffen Klassert.
8) Geneve forgets to set inner tunneling type, causing GSO segmentation
to fail on some NICs. From Jesse Gross.
9) Fix several locking bugs in stmmac driver, from Fabrice Gasnier and
Giuseppe CAVALLARO.
10) Fix spurious timeouts with NewReno on low traffic connections, from
Marcelo Leitner.
11) Fix descriptor updates in enic driver, from Govindarajulu
Varadarajan.
12) PPP calls bpf_prog_create() with locks held, which isn't kosher.
Fix from Takashi Iwai.
13) Fix NULL deref in SCTP with malformed INIT packets, from Daniel
Borkmann.
14) psock_fanout selftest accesses past the end of the mmap ring, fix
from Shuah Khan.
15) Fix PTP timestamping for VLAN packets, from Richard Cochran.
16) netlink_unbind() calls in netlink pass wrong initial argument, from
Hiroaki SHIMODA.
17) vxlan socket reuse accidently reuses a socket when the address
family is different, so we have to explicitly check this, from
Marcelo Lietner.
18) Fix missing include in nft_reject_bridge.c breaking the build on ppc
and other architectures, from Guenter Roeck.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
vxlan: Do not reuse sockets for a different address family
smsc911x: power-up phydev before doing a software reset.
lib: rhashtable - Remove weird non-ASCII characters from comments
net/smsc911x: Fix delays in the PHY enable/disable routines
net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode
netlink: Properly unbind in error conditions.
net: ptp: fix time stamp matching logic for VLAN packets.
cxgb4 : dcb open-lldp interop fixes
selftests/net: psock_fanout seg faults in sock_fanout_read_ring()
net: bcmgenet: apply MII configuration in bcmgenet_open()
net: bcmgenet: connect and disconnect from the PHY state machine
net: qualcomm: Fix dependency
ixgbe: phy: fix uninitialized status in ixgbe_setup_phy_link_tnx
net: phy: Correctly handle MII ioctl which changes autonegotiation.
ipv6: fix IPV6_PKTINFO with v4 mapped
net: sctp: fix memory leak in auth key management
net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
net: ppp: Don't call bpf_prog_create() in ppp_lock
net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set
cxgb4 : Fix bug in DCB app deletion
...
In DC world, GSO packets initially cooked by tcp_sendmsg() are usually
big, as sk_pacing_rate is high.
When network is congested, cwnd can be smaller than the GSO packets
found in socket write queue. tcp_write_xmit() splits GSO packets
using the available cwnd, and we end up sending a single GSO packet,
consuming all available cwnd.
With GRO aggregation on the receiver, we might handle a single GRO
packet, sending back a single ACK.
1) This single ACK might be lost
TLP or RTO are forced to attempt a retransmit.
2) This ACK releases a full cwnd, sender sends another big GSO packet,
in a ping pong mode.
This behavior does not fill the pipes in the best way, because of
scheduling artifacts.
Make sure we always have at least two GSO packets in flight.
This allows us to safely increase GRO efficiency without risking
spurious retransmits.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reallocation is only required for shrinking and expanding and both rely
on a mutex for synchronization and callers of rhashtable_init() are in
non atomic context. Therefore, no reason to continue passing allocation
hints through the API.
Instead, use GFP_KERNEL and add __GFP_NOWARN | __GFP_NORETRY to allow
for silent fall back to vzalloc() without the OOM killer jumping in as
pointed out by Eric Dumazet and Eric W. Biederman.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently mutex_is_held can only test locks in the that are global
since it takes no arguments. This prevents rhashtable from being
used in places where locks are lock, e.g., per-namespace locks.
This patch adds a parent field to mutex_is_held and rhashtable_params
so that local locks can be used (and tested).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled. This patch modifies netfilter so that we can rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled. This patch modifies netlink so that we can rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Large receive offloading is known to cause problems if received packets
are passed to other host. Therefore the kernel disables it by calling
dev_disable_lro() whenever a network device is enslaved in a bridge or
forwarding is enabled for it (or globally). For virtual devices we need
to disable LRO on the underlying physical device (which is actually
receiving the packets).
Current dev_disable_lro() code handles this propagation for a vlan
(including 802.1ad nested vlan), macvlan or a vlan on top of a macvlan.
It doesn't handle other stacked devices and their combinations, in
particular propagation from a bond to its slaves which often causes
problems in virtualization setups.
As we now have generic data structures describing the upper-lower device
relationship, dev_disable_lro() can be generalized to disable LRO also
for all lower devices (if any) once it is disabled for the device
itself.
For bonding and teaming devices, it is necessary to disable LRO not only
on current slaves at the moment when dev_disable_lro() is called but
also on any slave (port) added later.
v2: use lower device links for all devices (including vlan and macvlan)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4/fou.c: In function ‘ip_tunnel_encap_del_fou_ops’:
net/ipv4/fou.c:861:1: warning: no return statement in function returning non-void [-Wreturn-type]
Fixes: a8c5f90fb5 ("ip_tunnel: Ops registration for secondary encap (fou, gue)")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
kick_requests() can put linger requests on the notarget list. This
means we need to clear the much-overloaded req->r_req_lru_item in
__unregister_linger_request() as well, or we get an assertion failure
in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).
AFAICT the assumption was that registered linger requests cannot be on
any of req->r_req_lru_item lists, but that's clearly not the case.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Requests have to be unlinked from both osd->o_requests (normal
requests) and osd->o_linger_requests (linger requests) lists when
clearing req->r_osd. Otherwise __unregister_linger_request() gets
confused and we trip over a !list_empty(&osd->o_linger_requests)
assert in __remove_osd().
MON=1 OSD=1:
# cat remove-osd.sh
#!/bin/bash
rbd create --size 1 test
DEV=$(rbd map test)
ceph osd out 0
sleep 3
rbd map dne/dne # obtain a new osdmap as a side effect
rbd unmap $DEV & # will block
sleep 3
ceph osd in 0
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
In case of OOM, there's nothing userspace can do.
If there's no room to put the payload in __build_packet_message(),
jump to nla_put_failure which already performs the corresponding
error reporting.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
net/bridge/br_netfilter.c:870:6: symbol 'br_netfilter_enable' was not declared. Should it be static?
no; add include
net/ipv4/netfilter/nft_reject_ipv4.c:22:6: symbol 'nft_reject_ipv4_eval' was not declared. Should it be static?
yes
net/ipv6/netfilter/nf_reject_ipv6.c:16:6: symbol 'nf_send_reset6' was not declared. Should it be static?
no; add include
net/ipv6/netfilter/nft_reject_ipv6.c:22:6: symbol 'nft_reject_ipv6_eval' was not declared. Should it be static?
yes
net/netfilter/core.c:33:32: symbol 'nf_ipv6_ops' was not declared. Should it be static?
no; add include
net/netfilter/xt_DSCP.c:40:57: cast truncates bits from constant value (ffffff03 becomes 3)
net/netfilter/xt_DSCP.c:57:59: cast truncates bits from constant value (ffffff03 becomes 3)
add __force, 3 is what we want.
net/ipv4/netfilter/nf_log_arp.c:77:6: symbol 'nf_log_arp_packet' was not declared. Should it be static?
yes
net/ipv4/netfilter/nf_reject_ipv4.c:17:6: symbol 'nf_send_reset' was not declared. Should it be static?
no; add include
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
For a long time we couldn't actually use __xfrm_policy_link in
xfrm_policy_insert because the latter wanted to do hashing at
a specific position.
Now that __xfrm_policy_link no longer does hashing it can now
be safely used in xfrm_policy_insert to kill some duplicate code,
finally reuniting general policies with socket policies.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Back in 2003 when I added policy expiration, I half-heartedly
did a clean-up and renamed xfrm_sk_policy_link/xfrm_sk_policy_unlink
to __xfrm_policy_link/__xfrm_policy_unlink, because the latter
could be reused for all policies. I never actually got around
to using __xfrm_policy_link for non-socket policies.
Later on hashing was added to all xfrm policies, including socket
policies. In fact, we don't need hashing on socket policies at
all since they're always looked up via a linked list.
This patch restores xfrm_sk_policy_link/xfrm_sk_policy_unlink
as wrappers around __xfrm_policy_link/__xfrm_policy_unlink so
that it's obvious we're dealing with socket policies.
This patch also removes hashing from __xfrm_policy_link as for
now it's only used by socket policies which do not need to be
hashed. Ironically this will in fact allow us to use this helper
for non-socket policies which I shall do later.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Server channels in BT_LISTEN state should use L2CAP_NESTING_PARENT. This
patch fixes the nesting value for the 6lowpan channel.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There's no reason why all users of L2CAP would need to worry about
initializing chan->nesting to L2CAP_NESTING_NORMAL (which is important
since 0 is the same as NESTING_SMP). This patch moves the initialization
to the common place that's used to create all new channels, i.e. the
l2cap_chan_create() function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The teardown callback for L2CAP channels is problematic in that it is
explicitly called for all types of channels from l2cap_chan_del(),
meaning it's not possible to hard-code a nesting level when taking the
socket lock. The simplest way to have a correct nesting level for the
socket locking is to use the same value as for the chan. This also means
that the other places trying to lock parent sockets need to be update to
use the chan value (since L2CAP_NESTING_PARENT is defined as 2 whereas
SINGLE_DEPTH_NESTING has the value 1).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
By default lockdep considers all L2CAP channels equal. This would mean
that we get warnings if a channel is locked when another one's lock is
tried to be acquired in the same thread. This kind of inter-channel
locking dependencies exist in the form of parent-child channels as well
as any channel wishing to elevate the security by requesting procedures
on the SMP channel.
To eliminate the chance for these lockdep warnings we introduce a
nesting level for each channel and use that when acquiring the channel
lock. For now there exists the earlier mentioned three identified
categories: SMP, "normal" channels and parent channels (i.e. those in
BT_LISTEN state). The nesting level is defined as atomic_t since we need
access to it before the lock is actually acquired.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a new interframe spacing time handling into mac802154
layer. Interframe spacing time is a time period between each transmit.
This patch adds a high resolution timer into mac802154 and starts on
xmit complete with corresponding interframe spacing expire time if
ifs_handling is true. We make it variable because it depends if
interframe spacing time is handled by transceiver or mac802154. At the
timer complete function we wake the netdev queue again. This avoids
new frame transmit in range of interframe spacing time.
For synced driver we add no handling of interframe spacing time. This
is currently a lack of support in all synced xmit drivers. I suppose
it's working because the latency of workqueue which is needed to call
spi_sync.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Fix the build failures that result from the use of pr_debug
without the referenced char * arrays being defined.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even if netlink_kernel_cfg::unbind is implemented the unbind() method is
not called, because cfg->unbind is omitted in __netlink_kernel_create().
And fix wrong argument of test_bit() and off by one problem.
At this point, no unbind() method is implemented, so there is no real
issue.
Fixes: 4f52090052 ("netlink: have netlink per-protocol bind function return an error code.")
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Cc: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of calling fou and gue functions directly from ip_tunnel
use ops for these that were previously registered. This patch adds the
logic to add and remove encapsulation operations for ip_tunnel,
and modified fou (and gue) to register with ip_tunnels.
This patch also addresses a circular dependency between ip_tunnel
and fou that was causing link errors when CONFIG_NET_IP_TUNNEL=y
and CONFIG_NET_FOU=m. References to fou an gue have been removed from
ip_tunnel.c
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Standardize function pointer uses.
Convert calling style from:
(*foo)(args...);
to:
foo(args...);
Other miscellanea:
o Add braces around loops with single ifs on multiple lines
o Realign arguments around these functions
o Invert logic in if to return immediately.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the normal kernel debugging mechanism which also
enables dynamic_debug at the same time.
Other miscellanea:
o Remove sysctl for irda_debug
o Remove function tracing like uses (use ftrace instead)
o Coalesce formats
o Realign arguments
o Remove unnecessary OOM messages
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing xtables matches and targets, when used from nft_compat, may
sleep from the destroy path, ie. when removing rules. Since the objects
are released via call_rcu from softirq context, this results in lockdep
splats and possible lockups that may be hard to reproduce.
Patrick also indicated that delayed object release via call_rcu can
cause us problems in the ordering of event notifications when anonymous
sets are in place.
So, this patch restores the synchronous object release from the commit
and abort paths. This includes a call to synchronize_rcu() to make sure
that no packets are walking on the objects that are going to be
released. This is slowier though, but it's simple and it resolves the
aforementioned problems.
This is a partial revert of c7c32e7 ("netfilter: nf_tables: defer all
object release via rcu") that was introduced in 3.16 to speed up
interaction with userspace.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Instead of the match->name, which is of course not relevant.
Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Check for nat chain dependency only, which is the one that can
actually crash the kernel. Don't care if mangle, filter and security
specific match and targets are used out of their scope, they are
harmless.
This restores iptables-compat with mangle specific match/target when
used out of the OUTPUT chain, that are actually emulated through filter
chains, which broke when performing strict validation.
Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
warning: (NETFILTER_XT_TARGET_REDIRECT) selects NF_NAT_REDIRECT_IPV4 which has unmet direct dependencies (NET && INET && NETFILTER && NF_NAT_IPV4)
warning: (NETFILTER_XT_TARGET_REDIRECT) selects NF_NAT_REDIRECT_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_NAT_IPV6)
Fixes: 8b13edd ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables")
Fixes: 9de920e ("netfilter: refactor NAT redirect IPv6 code to use it from nf_tables")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The mgmt_user_passkey_request and related functions do not do anything
else except read access to hdev->id. This member never changes after the
hdev creation so there is no need to acquire a lock to read it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Any code calling bt_accept_dequeue() to get a new child socket from a
server socket should use lock_sock_nested to avoid lockdep warnings due
to the parent and child sockets being locked at the same time. The
l2cap_sock_accept() function is already doing this correctly but a
second place calling bt_accept_dequeue() is the code path from
l2cap_sock_teardown_cb() that calls l2cap_sock_cleanup_listen().
This patch fixes the proper nested locking annotation and thereby avoids
the following style of lockdep warning.
[ +0.000224] [ INFO: possible recursive locking detected ]
[ +0.000222] 3.17.0+ #1153 Not tainted
[ +0.000130] ---------------------------------------------
[ +0.000227] l2cap-tester/562 is trying to acquire lock:
[ +0.000210] (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+...}, at: [<c1393f47>] bt_accept_dequeue+0x68/0x11b
[ +0.000467]
but task is already holding lock:
[ +0.000186] (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+...}, at: [<c13b949a>] lock_sock+0xa/0xc
[ +0.000421]
other info that might help us debug this:
[ +0.000199] Possible unsafe locking scenario:
[ +0.000117] CPU0
[ +0.000000] ----
[ +0.000000] lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP);
[ +0.000000] lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP);
[ +0.000000]
*** DEADLOCK ***
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for setting listen before transmit mode via
nl802154 framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch add support for setting mac frame retries setting via
nl802154 framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch add support for max csma backoffs setting via nl802154
framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for setting backoff exponents via nl802154
framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for setting short address via nl802154 framework.
Also added a comment because a 0xffff seems to be valid address that we
don't have a short address. This is a valid setting but we need
more checks in upper layers to don't allow this address as source address.
Also the current netlink interface doesn't allow to set the short_addr
to 0xffff. Same for the 0xfffe short address which describes a not
allocated short address.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for setting pan_id via nl802154 framework.
Adding a comment because setting 0xffff as pan_id seems to be valid
setting. The pan_id 0xffff as source pan is invalid. I am not sure now
about this setting but for the current netlink interface this is an
invalid setting, so we do the same now. Maybe we need to change that
when we have coordinator support and association support.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds page and channel setting support to nl802154 framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a netdev notifier for interface renaming. We have a name
attribute inside of subif data struct. This is needed to have always the
actual netdev name in sdata name attribute.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch changes the module description like wireless which is IEEE
802.11 "subsystem" and not "implementation".
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds an unique id for an wpan_phy. This behaviour is mostly
grabbed from wireless stack. This is needed for upcomming patches which
identify the wpan netdev while NETDEV_CHANGENAME in netdev notify function.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Instead of always re-lock the iflist_mtx at multiple interfaces we lock
the complete for each loop at start and at the end.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch is just a cleanup to name the temporary variable for
protected list for each loop as tmp.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch move the iface unregistration into iface.c file to have
a behaviour which is similar like mac80211. Also iface handling should
be inside iface.c file only.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
ip_vs_prepare_tunneled_skb() ignores ->sk when allocating a new
skb, either unconditionally setting ->sk to NULL or allowing
the uninitialized ->sk from a newly allocated skb to leak through
to the caller.
This patch properly copies ->sk and increments its reference count.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
And use the more common mechanisms directly.
Other miscellanea:
o Coalesce formats
o Add missing newlines
o Realign arguments
o Remove unnecessary OOM message logging as
there's a generic stack dump already on OOM.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use IS_ENABLED(CONFIG_IPV6), to enable this code if IPv6 is
a module.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: c8e6ad0829 ("ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there are only three neigh tables in the whole kernel:
arp table, ndisc table and decnet neigh table. What's more,
we don't support registering multiple tables per family.
Therefore we can just make these tables statically built-in.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A very minimal and simple user space application allocating an SCTP
socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
the socket again will leak the memory containing the authentication
key from user space:
unreferenced object 0xffff8800837047c0 (size 16):
comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
hex dump (first 16 bytes):
01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811c88d8>] __kmalloc+0xe8/0x270
[<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
[<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
[<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
[<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
[<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
[<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
This is bad because of two things, we can bring down a machine from
user space when auth_enable=1, but also we would leave security sensitive
keying material in memory without clearing it after use. The issue is
that sctp_auth_create_key() already sets the refcount to 1, but after
allocation sctp_auth_set_key() does an additional refcount on it, and
thus leaving it around when we free the socket.
Fixes: 65b07e5d0d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
in the form of:
------------ INIT[PARAM: SET_PRIMARY_IP] ------------>
While the INIT chunk parameter verification dissects through many things
in order to detect malformed input, it misses to actually check parameters
inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary
IP address' parameter in ASCONF, which has as a subparameter an address
parameter.
So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS
or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0
and thus sctp_get_af_specific() returns NULL, too, which we then happily
dereference unconditionally through af->from_addr_param().
The trace for the log:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
PGD 0
Oops: 0000 [#1] SMP
[...]
Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa01e9c62>] [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
[...]
Call Trace:
<IRQ>
[<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
[<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
[<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
[<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
[<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
[<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
[<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
[<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
[<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[...]
A minimal way to address this is to check for NULL as we do on all
other such occasions where we know sctp_get_af_specific() could
possibly return with NULL.
Fixes: d6de309759 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.
All messages are still ratelimited.
Some KERN_<LEVEL> uses are changed to KERN_DEBUG.
This may have some negative impact on messages that were
emitted at KERN_INFO that are not not enabled at all unless
DEBUG is defined or dynamic_debug is enabled. Even so,
these messages are now _not_ emitted by default.
This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings". For backward compatibility,
the sysctl is not removed, but it has no function. The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.c
Miscellanea:
o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar says:
====================
Open vSwitch
Following batch of patches brings feature parity between upstream
ovs and out of tree ovs module.
Two features are added, first adds support to export egress
tunnel information for a packet. This is used to improve
visibility in network traffic. Second feature allows userspace
vswitchd process to probe ovs module features. Other patches
are optimization and code cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neaten and standardize the logging output.
Other miscellanea:
o Use pr_notice_once instead of a guard flag.
o Convert existing pr_<level> uses too.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alternative to RPS/RFS is to use hardware support for multiple
queues.
Then split a set of million of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.
Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.
We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on socket structure which cpu delivered last packet
is enough to solve the problem.
After accept(), connect(), or even file descriptor passing around
processes, applications can use :
int cpu;
socklen_t len = sizeof(cpu);
getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
And use this information to put the socket into the right silo
for optimal performance, as all networking stack should run
on the appropriate cpu, without need to send IPI (RPS/RFS).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_mark_napi_id() is used to record for a flow napi id of incoming
packets for busypoll sake.
We should do this only on established flows, not on listeners.
This was 'working' by virtue of the socket cloning, but doing
this on SYN packets in unecessary cache line dirtying.
Even if we move sk_napi_id in the same cache line than sk_lock,
we are working to make SYN processing lockless, so it is desirable
to set sk_napi_id only for established flows.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Put duplicate detection into its own RX handler, and separate
out the conditions a bit to make the code more readable.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When kfree() is all that's needed to free an object protected by RCU
there's a kfree_rcu() convenience function that can be used. This patch
updates the 6lowpan code to use this, thereby eliminating the need for
the separate peer_free() function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We could be reading 8 bytes into a 4 byte buffer here. It seems
harmless but adding a check is the right thing to do and it silences a
static checker warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch fixes a regression that was introduced by commit
cb77c3ec07. In addition to BT_CONFIG,
BT_CONNECTED is also a state in which we may get a remote name and need
to indicate over mgmt the connection status. This scenario is
particularly likely to happen for incoming connections that do not need
authentication since there the hci_conn state will reach BT_CONNECTED
before the remote name is received.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This fixes the following sparse warning:
net/bluetooth/amp.c:152:53: warning: Variable length array is used.
The warning itself is probably harmless since this kind of usage of
shash_desc is present also in other places in the kernel (there's even a
convenience macro SHASH_DESC_ON_STACK available for defining such stack
variables). However, dynamically allocated versions are also used in
several places of the kernel (e.g. kernel/kexec.c and lib/digsig.c)
which have the benefit of not exhibiting the sparse warning.
Since there are no more sparse warnings in the Bluetooth subsystem after
fixing this one it is now easier to spot whenever new ones might get
introduced by future patches.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When doing GRO processing for UDP tunnels, we never add
SKB_GSO_UDP_TUNNEL to gso_type - only the type of the inner protocol
is added (such as SKB_GSO_TCPV4). The result is that if the packet is
later resegmented we will do GSO but not treat it as a tunnel. This
results in UDP fragmentation of the outer header instead of (i.e.) TCP
segmentation of the inner header as was originally on the wire.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-11-07
Please pull this batch of updates intended for the 3.19 stream!
For the mac80211 bits, Johannes says:
"This relatively large batch of changes is comprised of the following:
* large mac80211-hwsim changes from Ben, Jukka and a bit myself
* OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
University in Prague and Volkswagen Group Research
* minstrel VHT work from Karl
* more CSA work from Luca
* WMM admission control support in mac80211 (myself)
* various smaller fixes, spelling corrections, and minor API additions"
For the Bluetooth bits, Johan says:
"Here's the first bluetooth-next pull request for 3.19. The vast majority
of patches are for ieee802154 from Alexander Aring with various fixes
and cleanups. There are also several LE/SMP fixes as well as improved
support for handling LE devices that have lost their pairing information
(the patches from Alfonso). Jukka provides a couple of stability fixes
for 6lowpan and Szymon conformance fixes for RFCOMM. For the HCI drivers
we have one new USB ID for an Acer controller as well as a reset
handling fix for H5."
For the Atheros bits, Kalle says:
"Major changes are:
o ethtool support (Ben)
o print dev string prefix with debug hex buffers dump (Michal)
o debugfs file to read calibration data from the firmware verification
purposes (me)
o fix fw_stats debugfs file, now results are more reliable (Michal)
o firmware crash counters via debugfs (Ben&me)
o various tracing points to debug firmware (Rajkumar)
o make it possible to provide firmware calibration data via a file (me)
And we have quite a lot of smaller fixes and clean up."
For the iwlwifi bits, Emmanuel says:
"The big new thing here is netdetect which allows the
firmware to wake up the platform when a specific network
is detected. Along with that I have fixes for d3 operation.
The usual amount of rate scaling stuff - we now support STBC.
The other commit that stands out is Johannes's work on
devcoredump. He basically starts to use the standard
infrastructure he built."
Along with that are the usual sort of updates and such for ath9k,
brcmfmac, wil6210, and a handful of other bits here and there...
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ever since raw_probe_proto_opt was added it had the problem of
causing the user iov to be read twice, once during the probe for
the protocol header and once again in ip_append_data.
This is a potential security problem since it means that whatever
we're probing may be invalid. This patch plugs the hole by
firstly advancing the iov so we don't read the same spot again,
and secondly saving what we read the first time around for use
by ip_append_data.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function raw_probe_proto_opt tries to extract the first two
bytes from the user input in order to seed the IPsec lookup for
ICMP packets. In doing so it's processing iovec by hand and
overcomplicating things.
This patch replaces the manual iovec processing with a call to
memcpy_fromiovecend.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
that can cause a kernel crash. I'm not sure it's remotely
exploitable, but it's an important fix nonetheless.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUYJNkAAoJEDBSmw7B7bqrgooP/ivTw/pJJ5i5wLsfAUUNV1tI
FDe3+w/kBsD0d0lNsnKoyq58iORyYIxHtmLhvFV+yuBMvpfADTNm9CkLDDjg939m
mU6/kjO6wbHv/SoLSL4rss/AggGaEsu+3ggGpmhXl2JApSUIVjw0z9XixNcU4Vnz
CEjIw9jsnKA5UOeH2cv+fwQLXImXKSsofAf2NgZ3zJ6Ci0zLOZtr/ZmO2o4IU9Do
HWp3CHsaehoWlaqtPx6r8JaodE7XPEPtim66aWoUaeSYueL9iYKJU7BS2jkTWJnQ
Kv8JuVQIQPlMsTuvTR8oHvpHZ29m0Z6mc8XsKeNu6UCNzoOBGS2Ak2YZwcAeqSLr
oQRDDBc4aKsY5YWcaChPx6N8gOEXRr4cQgfyY79o5vaPPJwixxLMPUUOU1rKHC3D
E61YfQ/tfoFWzEitksX14P3Gynas6abySku5mhM5y2L/1XFdDHugZjzRjcBv/1ry
J9x44k2EWbduvz8GBGGethmKMghKYYTqfB5rWh1cKxafgBEFt61ZcE4LuqaNnP17
sg+IlXFxZjJLnhhg0YL2oZt+++/CyCLl9G+mx6wZE4HLqsl26ElZ6ql2kekJ5baV
HKsPyBa2r48icJfrogA9WpjumI8b19ztG5OrsF0AQmK77IVSj80JFVnsTBal9/Y7
Ao2xMau9OghJq8AEJpOi
=H7HX
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-john-2014-11-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg <johannes@sipsolutions.net> says:
"This has just one fix, for an issue with the CCMP decryption
that can cause a kernel crash. I'm not sure it's remotely
exploitable, but it's an important fix nonetheless."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Tuning coalescing parameters on NIC can be really hard.
Servers can handle both bulk and RPC like traffic, with conflicting
goals : bulk flows want as big GRO packets as possible, RPC want minimal
latencies.
To reach big GRO packets on 10Gbe NIC, one can use :
ethtool -C eth0 rx-usecs 4 rx-frames 44
But this penalizes rpc sessions, with an increase of latencies, up to
50% in some cases, as NICs generally do not force an interrupt when
a packet with TCP Push flag is received.
Some NICs do not have an absolute timer, only a timer rearmed for every
incoming packet.
This patch uses a different strategy : Let GRO stack decides what do do,
based on traffic pattern.
Packets with Push flag wont be delayed.
Packets without Push flag might be held in GRO engine, if we keep
receiving data.
This new mechanism is off by default, and shall be enabled by setting
/sys/class/net/ethX/gro_flush_timeout to a value in nanosecond.
To fully enable this mechanism, drivers should use napi_complete_done()
instead of napi_complete().
Tested:
Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)
Without this feature, we send back about 305,000 ACK per second.
GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce number of ACK packets. (811/19.2 = 42)
Receiver performs less calls to upper stacks, less wakes up.
This also reduces cpu usage on the sender, as it receives less ACK
packets.
Note that reducing number of wakes up increases cpu efficiency, but can
decrease QPS, as applications wont have the chance to warmup cpu caches
doing a partial read of RPC requests/answers if they fit in one skb.
B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average: eth0 811269.80 305732.30 1199462.57 19705.72 0.00
0.00 0.50
B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average: eth0 811577.30 19230.80 1199916.51 1239.80 0.00
0.00 0.50
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When transferring from the original range in nf_nat_masquerade_{ipv4,ipv6}()
we copy over values from stack in from min_proto/max_proto due to uninitialized
range variable in both, nft_masq_{ipv4,ipv6}_eval. As we only initialize
flags at this time from nft_masq struct, just zero out the rest.
Fixes: 9ba1f726be ("netfilter: nf_tables: add new nft_masq expression")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Allow setting bandwidth related regulatory flags. These flags are mapped
to the corresponding channel flags in the specified range.
Make sure the new flags are consulted when calculating the maximum
bandwidth allowed by a regulatory-rule.
Also allow propagating the GO_CONCURRENT modifier from a reg-rule to a
channel.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Radiotap vendor namespace data might still be useful, but we
reverted it because it used too much space in the RX status.
Put it back, but address the space problem by using a single
bit only and putting everything else into the skb->data.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For multi-vif channel switches, we want to send
NL80211_CMD_CH_SWITCH_NOTIFY to the userspace to let it decide whether
other interfaces need to be moved as well. This is needed when we
want a P2P GO interface to follow the channel of a station, for
example.
Modify the code so that all interfaces can send CSA notifications.
Additionally, send notifications for STA CSA as well.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Send a channel switch notification to userspace when a channel switch
is requested or when we react to a remote CSA.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a new NL80211_CH_SWITCH_STARTED_NOTIFY message that can be sent to
the userspace when a channel switch process has started. This allows
userspace to take action, for instance, by requesting other interfaces
to switch channel as necessary.
This patch introduces a function that allows the drivers to send this
notification. It should be used when the driver starts processing a
channel switch initiated by a remote device (eg. when a STA receives a
CSA from the AP) and when it successfully starts a userspace-triggered
channel switch (eg. when hostapd triggers a channel swith in the AP).
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The device_timestamp value was left out of the event trace for
drv_pre_channel_switch by mistake. Add it.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There was a mistake when merging commit 6d027bcc (mac80211: add
pre_channel_switch driver operation) for upstream. The assignment of
the values in the ch_switch structure came below the call to
drv_pre_channel_switch. Fix the order.
Fixes: 6d027bcc (mac80211: add pre_channel_switch driver operation)
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This new flag is useful for suppressing error logging while probing
for datapath features using flow commands. For backwards
compatibility reasons the commands are executed normally, but error
logging is suppressed.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
struct dp_upcall_info has pointer to pkt_key which is already
available in OVS_CB. This also simplifies upcall handling
for gso packet.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
OVS need to flow key for flow lookup in recic action. OVS
does key extract in recic action. Most of cases we could
use OVS_CB packet key directly and can avoid packet flow key
extract. SET action we can update flow-key along with packet
to keep it consistent. But there are some action like MPLS
pop which forces OVS to do flow-extract. In such cases we
can mark flow key as invalid so that subsequent recirc
action can do full flow extract.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
OVS vswitch has extended IPFIX exporter to export tunnel headers
to improve network visibility.
To export this information userspace needs to know egress tunnel
for given packet. By extending packet attributes datapath can
export egress tunnel info for given packet. So that userspace
can ask for egress tunnel info in userspace action. This
information is used to build IPFIX data for given flow.
Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Acked-by: Romain Lenglet <rlenglet@vmware.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
vport can be compiled as modules, therefore openvswitch needs
to export few symbols. Export them as GPL symbols.
CC: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
This patch adds a netif_running check while trying to change the address
attributes via ioctl. While netif_running is true these attributes
should be only readable.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a hacked solution for an interface dump with a running
lowpan interface. This will crash because lowpan and wpan interface use
the same arphdr. To change the arphdr will change the UAPI, this patch
checks on mtu which should on lowpan interface always different than
IEEE802154_MTU.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds rtnl lock hold mechanism while accessing wpan_dev
attributes. Furthermore these attributes should be protected by rtnl
lock and netif_running only.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the setting of dev_addr on a monitor device. This
address should be zero. A monitor should only sniff and send raw frames
out. The address should be never used by upper layers and receiving
frame parsing.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for wpan_dev dump via nl802154 framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for dumping wpan_phy attributes via nl802154.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a basic nl802154 framework. Most of this code was
grabbed from nl80211 framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds new sysfs entries for wpan_phy index and name. This
needed for the new 802.15.4 userspace tool.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a wpan_dev_list list into cfg802154_registered_device
struct. Also adding new wpan_dev into this list while
cfg802154_netdev_notifier_call. This behaviour is mostly grab from
wireless core.c implementation and is needed for preparing nl802154
framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds an iftype argument to the wpan_dev. This is needed to
get the interface type from netdev ieee802154_ptr. The subif data struct
can only accessible in mac802154 branch.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a new cfg802154_rdev_list to remember all registered
cfg802154_registered_device structs. This is needed to prepare the
upcomming nl802154 framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch renames the wpan_phy_alloc function to wpan_phy_new. This
naming convention is like wireless and "wiphy_new" function.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the mac_params from subif data struct. Instead we
manipulate the wpan attributes directly.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves all mac pib attributes into the wpan_dev struct.
Furthermore we can easier access these attributes over the netdev
802154_ptr pointer. Currently this is only possible over a complicated
callback structure in mac802154 because subif data structure is
accessable inside mac802154 only.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This allows you to filter traffic by process control group (cgroup).
Signed-off-by: Ana Rey <anarey@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Remove the dependency on the "warning" sysctl (net_msg_warn)
which is only used by the LIMIT_NETDEBUG macro.
Convert the LIMIT_NETDEBUG use in DCCP_WARN to the more
common net_warn_ratelimited mechanism.
This still ratelimits based on the net_ratelimit()
function, but removes the check for the sysctl.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reverts commit:
a7807d73 ("Bluetooth: 6lowpan: Avoid memory leak if memory allocation
fails")
which was wrong suggested by Alexander Aring. The function skb_unshare
run also kfree_skb on failure.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.18.x
As NIC multicast filtering isn't perfect, and some platforms are
quite content to spew broadcasts, we should not trigger an event
for skb:kfree_skb when we do not have a match for such an incoming
datagram. We do though want to avoid sweeping the matter under the
rug entirely, so increment a suitable statistic.
This incorporates feedback from David L. Stevens, Karl Neiss and Eric
Dumazet.
V3 - use bool per David Miller
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that both macvtap and tun are using skb_copy_datagram_iter, we
can kill the abomination that is skb_copy_datagram_const_iovec.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds skb_copy_datagram_iter, which is identical to
skb_copy_datagram_iovec except that it operates on iov_iter
instead of iovec.
Eventually all users of skb_copy_datagram_iovec should switch
over to iov_iter and then we can remove skb_copy_datagram_iovec.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless 2014-11-06
Please pull this batch of fixes intended for the 3.18 stream...
For the mac80211 bits, Johannes says:
"This contains another small set of fixes for 3.18, these are all
over the place and most of the bugs are old, one even dates back
to the original mac80211 we merged into the kernel."
For the iwlwifi bits, Emmanuel says:
"I fix here two issues that are related to the firmware
loading flow. A user reported that he couldn't load the
driver because the rfkill line was pulled up while we
were running the calibrations. This was happening while
booting the system: systemd was restoring the "disable
wifi settings" and that raised an RFKILL interrupt during
the calibration. Our driver didn't handle that properly
and this is now fixed."
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar says:
====================
Open vSwitch
First two patches are related to OVS MPLS support. Rest of patches
are mostly refactoring and minor improvements to openvswitch.
v1-v2:
- Fix conflicts due to "gue: Remote checksum offload"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we ensure that the skb is freed on every error path in IPHC
decompression which makes it easy to introduce skb leaks. By centralising
the skb_free into the receive function it makes future decompression routines
easier to maintain. It does come at the expense of ensuring that the skb
passed into the decompression routine must not be copied.
Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Commit 64ce207306 ("[NET]: Make NETDEBUG pure printk wrappers")
originally had these NETDEBUG printks as always emitting.
Commit a2a316fd06 ("[NET]: Replace CONFIG_NET_DEBUG with sysctl")
added a net_msg_warn sysctl to these NETDEBUG uses.
Convert these NETDEBUG uses to normal pr_info calls.
This changes the output prefix from "ESP: " to include
"IPSec: " for the ipv4 case and "IPv6: " for the ipv6 case.
These output lines are now like the other messages in the files.
Other miscellanea:
Neaten the arithmetic spacing to be consistent with other
arithmetic spacing in the files.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These messages aren't useful as there's a generic dump_stack()
on OOM.
Neaten the comment and if test above the OOM by separating the
assign in if into an allocation then if test.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the ports phys are connected to the switches internal MDIO bus,
we need to connect the phy to the slave netdev, otherwise
auto-negotiation etc, does not work.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 4bba3925 ("[PKT_SCHED]: Prefix tc actions with act_")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for tunnels with local or
remote wildcard endpoints. With this we get a
NBMA tunnel mode like we have it for ipv4 and
sit tunnels.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we need the IP6_TNL_F_CAP_XMIT capabiltiy to transmit
packets through an ipv6 tunnel. This capability is set when the
tunnel gets configured, based on the tunnel endpoint addresses.
On tunnels with wildcard tunnel endpoints, we need to do the
capabiltiy checking on a per packet basis like it is done in
the receive path.
This patch extends ip6_tnl_xmit_ctl() to take local and remote
addresses as parameters to allow for per packet capabiltiy
checking.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Opcodes in switch/case in hci_cmd_status_evt are not sorted
by value. This patch restores proper ordering.
Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If role switch was rejected by the controller and HCI Event: Command Status
returned with status "Command Disallowed" (0x0C) the flag
HCI_CONN_RSWITCH_PEND remains set. No further role switches are
possible as this flag prevents us from sending any new HCI Switch Role
requests and the only way to clear it is to receive a valid
HCI Event Switch Role.
This patch clears the flag if command was rejected.
2013-01-01 00:03:44.209913 < HCI Command: Switch Role (0x02|0x000b) plen 7
bdaddr BC:C6:DB:C4:6F:79 role 0x00
Role: Master
2013-01-01 00:03:44.210867 > HCI Event: Command Status (0x0f) plen 4
Switch Role (0x02|0x000b) status 0x0c ncmd 1
Error: Command Disallowed
Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Commit 7ec7c4a9a6 (mac80211: port CCMP to
cryptoapi's CCM driver) introduced a regression when decrypting empty
packets (data_len == 0). This will lead to backtraces like:
(scatterwalk_start) from [<c01312f4>] (scatterwalk_map_and_copy+0x2c/0xa8)
(scatterwalk_map_and_copy) from [<c013a5a0>] (crypto_ccm_decrypt+0x7c/0x25c)
(crypto_ccm_decrypt) from [<c032886c>] (ieee80211_aes_ccm_decrypt+0x160/0x170)
(ieee80211_aes_ccm_decrypt) from [<c031c628>] (ieee80211_crypto_ccmp_decrypt+0x1ac/0x238)
(ieee80211_crypto_ccmp_decrypt) from [<c032ef28>] (ieee80211_rx_handlers+0x870/0x1d24)
(ieee80211_rx_handlers) from [<c0330c7c>] (ieee80211_prepare_and_rx_handle+0x8a0/0x91c)
(ieee80211_prepare_and_rx_handle) from [<c0331260>] (ieee80211_rx+0x568/0x730)
(ieee80211_rx) from [<c01d3054>] (__carl9170_rx+0x94c/0xa20)
(__carl9170_rx) from [<c01d3324>] (carl9170_rx_stream+0x1fc/0x320)
(carl9170_rx_stream) from [<c01cbccc>] (carl9170_usb_tasklet+0x80/0xc8)
(carl9170_usb_tasklet) from [<c00199dc>] (tasklet_hi_action+0x88/0xcc)
(tasklet_hi_action) from [<c00193c8>] (__do_softirq+0xcc/0x200)
(__do_softirq) from [<c0019734>] (irq_exit+0x80/0xe0)
(irq_exit) from [<c0009c10>] (handle_IRQ+0x64/0x80)
(handle_IRQ) from [<c000c3a0>] (__irq_svc+0x40/0x4c)
(__irq_svc) from [<c0009d44>] (arch_cpu_idle+0x2c/0x34)
Such packets can appear for example when using the carl9170 wireless driver
because hardware sometimes generates garbage when the internal FIFO overruns.
This patch adds an additional length check.
Cc: stable@vger.kernel.org
Fixes: 7ec7c4a9a6 ("mac80211: port CCMP to cryptoapi's CCM driver")
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
OVS does mask validation even if it does not need to convert
netlink mask attributes to mask structure. ovs_nla_get_match()
caller can pass NULL mask structure pointer if the caller does
not need mask. Therefore NULL check is required in SW_FLOW_KEY*
macros. Following patch does not convert mask netlink attributes
if mask pointer is NULL, so we do not need these checks in
SW_FLOW_KEY* macro.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Andy Zhou <azhou@nicira.com>
There are two separate API to allocate and copy actions list. Anytime
OVS needs to copy action list, it needs to call both functions.
Following patch moves action allocation to copy function to avoid
code duplication.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
The 'flow' memeber was chosen for removal because it's only used
in ovs_execute_actions() we can pass it as argument to this
function.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
If the internal device is not up, it should drop received
packets. Sometimes it receive the broadcast or multicast
packets, and the ip protocol stack will casue more cpu
usage wasted.
Signed-off-by: Chunhe Li <lichunhe@huawei.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Avoid recursive read_rcu_lock() by using the lighter weight
get_dp_rcu() API. Add proper locking assertions to get_dp().
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Split up ovs_flow_cmd_fill_info() to make it easier to cache parts of a
dump reply. This will be used to streamline flow_dump in a future patch.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
skb_clone() NULL check is implemented in do_output(), as past of the
common (fast) path. Refactoring so that NULL check is done in the
slow path, immediately after skb_clone() is called.
Besides optimization, this change also improves code readability by
making the skb_clone() NULL check consistent within OVS datapath
module.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
There are many possible ways that a flow can be invalid so we've
added logging for most of them. This adds logs for the remaining
possible cases so there isn't any ambiguity while debugging.
CC: Federico Iezzi <fiezzi@enter.it>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
These two cases used to be treated differently for IPv4/IPv6,
but they are now identical.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Ths simplifies flow-table-destroy API. No need to pass explicit
parameter about context.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.
Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.
Cc: Ravi K <rkerur@gmail.com>
Cc: Leo Alterman <lalterman@nicira.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Device can export MPLS GSO support in dev->mpls_features same way
it export vlan features in dev->vlan_features. So it is safe to
remove NETIF_F_GSO_MPLS redundant flag.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
When filling netlink info, dport is being returned as flags. Fix
instances to return correct value.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():
skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:
<IRQ>
[<ffffffff80578226>] ? skb_put+0x3a/0x3b
[<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
[<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
[<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
[<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
[<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
[<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
[<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
[<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
[<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
[<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.
However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.
The IGMP case doesn't have this issue as commit 57e1ab6ead
("igmp: refine skb allocations") stores the allocation size in
the cb[].
Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().
Reported-by: Wei Liu <lw1a2.jing@gmail.com>
Fixes: 72e09ad107 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David L Stevens <david.stevens@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a single fixed string is smaller code size than using
a format and many string arguments.
Reduces overall code size a little.
$ size net/ipv4/igmp.o* net/ipv6/mcast.o* net/ipv6/ip6_flowlabel.o*
text data bss dec hex filename
34269 7012 14824 56105 db29 net/ipv4/igmp.o.new
34315 7012 14824 56151 db57 net/ipv4/igmp.o.old
30078 7869 13200 51147 c7cb net/ipv6/mcast.o.new
30105 7869 13200 51174 c7e6 net/ipv6/mcast.o.old
11434 3748 8580 23762 5cd2 net/ipv6/ip6_flowlabel.o.new
11491 3748 8580 23819 5d0b net/ipv6/ip6_flowlabel.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ueki Kohei reported that when we are using NewReno with connections that
have a very low traffic, we may timeout the connection too early if a
second loss occurs after the first one was successfully acked but no
data was transfered later. Below is his description of it:
When SACK is disabled, and a socket suffers multiple separate TCP
retransmissions, that socket's ETIMEDOUT value is calculated from the
time of the *first* retransmission instead of the *latest*
retransmission.
This happens because the tcp_sock's retrans_stamp is set once then never
cleared.
Take the following connection:
Linux remote-machine
| |
send#1---->(*1)|--------> data#1 --------->|
| | |
RTO : :
| | |
---(*2)|----> data#1(retrans) ---->|
| (*3)|<---------- ACK <----------|
| | |
| : :
| : :
| : :
16 minutes (or more) :
| : :
| : :
| : :
| | |
send#2---->(*4)|--------> data#2 --------->|
| | |
RTO : :
| | |
---(*5)|----> data#2(retrans) ---->|
| | |
| | |
RTO*2 : :
| | |
| | |
ETIMEDOUT<----(*6)| |
(*1) One data packet sent.
(*2) Because no ACK packet is received, the packet is retransmitted.
(*3) The ACK packet is received. The transmitted packet is acknowledged.
At this point the first "retransmission event" has passed and been
recovered from. Any future retransmission is a completely new "event".
(*4) After 16 minutes (to correspond with retries2=15), a new data
packet is sent. Note: No data is transmitted between (*3) and (*4).
The socket's timeout SHOULD be calculated from this point in time, but
instead it's calculated from the prior "event" 16 minutes ago.
(*5) Because no ACK packet is received, the packet is retransmitted.
(*6) At the time of the 2nd retransmission, the socket returns
ETIMEDOUT.
Therefore, now we clear retrans_stamp as soon as all data during the
loss window is fully acked.
Reported-by: Ueki Kohei
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".
When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.
Having a helper like this means there will be less places to touch
during that transformation.
Based upon descriptions and patch from Al Viro.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add processing of the remote checksum offload option in both the normal
path as well as the GRO path. The implements patching the affected
checksum to derive the offloaded checksum.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure
remote checksum offload on an IP tunnel. Add logic in gue_build_header
to insert remote checksum offload option.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new GSO type, SKB_GSO_TUNNEL_REMCSUM, which indicates remote
checksum offload being done (in this case inner checksum must not
be offloaded to the NIC).
Added logic in __skb_udp_tunnel_segment to handle remote checksum
offload case.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add functions and basic definitions for processing standard flags,
private flags, and control messages. This includes definitions
to compute length of optional fields corresponding to a set of flags.
Flag validation is in validate_gue_flags function. This checks for
unknown flags, and that length of optional fields is <= length
in guehdr hlen.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In __skb_udp_tunnel_segment if outer UDP checksums are enabled and
ip_summed is not already CHECKSUM_PARTIAL, set up checksum offload
if device features allow it.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move fou_build_header out of ip_tunnel.c and into fou.c splitting
it up into fou_build_header, gue_build_header, and fou_build_udp.
This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap
to call fou_build_header or gue_build_header based on the tunnel
encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen
functions which are called by ip_encap_hlen. New net/fou.h has
prototypes and defines for this.
Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
can use FOU/GUE and fou module is also selected.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the af_ieee802154 defines and use the
IEEE802154_EXTENDED_ADDR_LEN. We should do this everywhere in the
802.15.4 subsystem because af_ieee802154 should be normally an uapi
header.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adding support for a perm extended address. This is useful
when a device supports an eeprom with a programmed static extended address.
If a device doesn't support such eeprom or serial registers then the
driver should generate a random extended address.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch cleanups the ieee802154_be64_to_le64 to have a similar
function like ieee802154_le64_to_be64 only with switched source and
destionation types.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds an ieee802154_vif similar like the ieee80211_vif which
holds the interface type and maybe further more attributes like the
ieee80211_vif structure.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a default interface registration for a wpan interface
type. Currently the 802.15.4 subsystem need to call userspace tools to
add an interface. This patch is like mac80211 handling for registration
a station interface type by default.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the get_phy callback from mlme ops structure. Instead
we doing a dereference via ieee802154_ptr dev pointer. For backwards
compatibility we need to run get_device after dereference wpan_phy via
ieee802154_ptr.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch meld mac802154_netdev_register into ieee802154_if_add
function. Also we have now only one alloc_netdev call with one interface
setup routine "ieee802154_if_setup" instead two different one for each
interface type. This patch checks via runtime the interface type and do
different handling now. Additional we add the wpan_dev struct in
ieee802154_sub_if_data and set the new ieee802154_ptr while netdev
registration. This behaviour is very similar the mac80211 netdev
registration functionality.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the dev_hold call inside of nl-phy ieee802154_add_iface
function. The ieee802154_add_iface is the only one function which use the
ieee802154_if_add function and contains the corresponding dev_put call.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves and renames the mac802154_add_iface and
mac802154_netdev_register functions into iface.c. The function
mac802154_add_iface is renamed to ieee802154_if_add which is a similar naming
convention like mac80211.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves and rename the mac802154_del_iface function into
iface.c and rename the function to ieee802154_if_remove which is a similar
naming convention like mac80211.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The include/net/nl802154.h file contains a lot of prototypes which are
not used inside of ieee802154 subsystem. This patch removes this file
and make the only one used prototype "ieee802154_nl_start_confirm" as
static declaration in ieee802154/nl-mac.c
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch reworks the wpan_phy index incrementation. It's now similar
like wireless wiphy index incrementation. We move the wpan_phy index
attribute inside of cfg802154_registered_device and use atomic
operations instead locking mechanism via wpan_phy_mutex.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The pernet ops aren't ever unregistered, which causes a memory
leak and an OOPs if the module is ever reinserted.
Fixes: 0b5e8b8eea ("net: Add Geneve tunneling protocol driver")
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geneve does not currently set the inner protocol type when
transmitting packets. This causes GSO segmentation to fail on NICs
that do not support Geneve offloading.
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The return value of seq_printf() is soon to be removed. Remove the
checks from seq_printf() in favor of seq_has_overflowed().
Link: http://lkml.kernel.org/r/20141104142236.GA10239@salvia
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Since adding a new function to seq_file (seq_has_overflowed())
there isn't any value for functions called from seq_show to
return anything. Remove the int returns of the various
print_tuple/<foo>_print_tuple functions.
Link: http://lkml.kernel.org/p/f2e8cf8df433a197daa62cbaf124c900c708edc7.1412031505.git.joe@perches.com
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The seq_printf() and friends are having their return values removed.
The print_conntrack() returns the result of seq_printf(), which is
meaningless when seq_printf() returns void. Might as well remove the
return values of print_conntrack() as well.
Link: http://lkml.kernel.org/r/20141029220107.465008329@goodmis.org
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The "else" block is on several lines and use bracket.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
following:
* large mac80211-hwsim changes from Ben, Jukka and a bit myself
* OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
University in Prague and Volkswagen Group Research
* minstrel VHT work from Karl
* more CSA work from Luca
* WMM admission control support in mac80211 (myself)
* various smaller fixes, spelling corrections, and minor API additions
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUWMs/AAoJEDBSmw7B7bqrmBQQAIbfAe7wH1WifRtOnhw3zWQQ
K36+Edf3HlQ+EIkSs63QousRj2e7pGDOyhzMWLaqsmeTLteUtlGbr7qwiJO1QZdf
Ml2V5O2s+b8hUIClDBVQF2L6+GGUmRUdQqvDDhkN1guoxD/Nk8cNtsRkSdiXWJWy
R48NzvYDflBhc8uqPtR8jDb10eM3c00YP9HB+w9hYAfizD+FRue7UNp4MQIqwp9V
HdKRT6L2n/6QA+Mzse0rMDes5qI7nIUNgj+hjqgJSnhITPMgGR5j/pitnVHrr81M
ngOipBFG3svsQrwZh8nM4Llp0cM4Gs+GlgCieu9+TJpr2sY00Z3kYcp0pxtDoSxz
Wblqz9n/bnW9mrkEfl12XqwwT5vguchwHoZ9cXhejDxSawWXoTRx20uW4ahO8ArA
kWwwjTBVsQ5WMCtOBiqggzNKghwCc2ILmcZnjGdg9aNXcWsmQ4vyeCfG2QxBz/UB
Grv/f9NSy6mzKQ34yv+lyR7rFZ8XcT03EVAnZSYz8X0ZZGxwtFupRp1RrBh1KPtD
TJoe6Q71FfHKYRJ2xgygYkQFo+r9d0BKBeerq+Vu2hBeaqyi4aUwSj7d1sUaaq6N
tL8fmAUqFjVOOUFeH1g07Xke5QD+yrEC7sJKkeRMfcRGB+dEa+2m3I5p4WDz9bWM
AEvFSsYr/I9KI4d1huXD
=6GIj
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-john-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg <johannes@sipsolutions.net> says:
"This relatively large batch of changes is comprised of the
following:
* large mac80211-hwsim changes from Ben, Jukka and a bit myself
* OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
University in Prague and Volkswagen Group Research
* minstrel VHT work from Karl
* more CSA work from Luca
* WMM admission control support in mac80211 (myself)
* various smaller fixes, spelling corrections, and minor API additions"
Conflicts:
drivers/net/wireless/ath/wil6210/cfg80211.c
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch allows to set ECN on a per-route basis in case the sysctl
tcp_ecn is not set to 1. In other words, when ECN is set for specific
routes, it provides a tcp_ecn=1 behaviour for that route while the rest
of the stack acts according to the global settings.
One can use 'ip route change dev $dev $net features ecn' to toggle this.
Having a more fine-grained per-route setting can be beneficial for various
reasons, for example, 1) within data centers, or 2) local ISPs may deploy
ECN support for their own video/streaming services [1], etc.
There was a recent measurement study/paper [2] which scanned the Alexa's
publicly available top million websites list from a vantage point in US,
Europe and Asia:
Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely
blamed to commit 255cac91c3 ("tcp: extend ECN sysctl to allow server-side
only ECN") ;)); the break in connectivity on-path was found is about
1 in 10,000 cases. Timeouts rather than receiving back RSTs were much
more common in the negotiation phase (and mostly seen in the Alexa
middle band, ranks around 50k-150k): from 12-thousand hosts on which
there _may_ be ECN-linked connection failures, only 79 failed with RST
when _not_ failing with RST when ECN is not requested.
It's unclear though, how much equipment in the wild actually marks CE
when buffers start to fill up.
We thought about a fallback to non-ECN for retransmitted SYNs as another
global option (which could perhaps one day be made default), but as Eric
points out, there's much more work needed to detect broken middleboxes.
Two examples Eric mentioned are buggy firewalls that accept only a single
SYN per flow, and middleboxes that successfully let an ECN flow establish,
but later mark CE for all packets (so cwnd converges to 1).
[1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
[2] http://ecn.ethz.ch/
Joint work with Daniel Borkmann.
Reference: http://thread.gmane.org/gmane.linux.network/335797
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function cookie_check_timestamp(), both called from IPv4/6 context,
is being used to decode the echoed timestamp from the SYN/ACK into TCP
options used for follow-up communication with the peer.
We can remove ECN handling from that function, split it into a separate
one, and simply rename the original function into cookie_decode_options().
cookie_decode_options() just fills in tcp_option struct based on the
echoed timestamp received from the peer. Anything that fails in this
function will actually discard the request socket.
While this is the natural place for decoding options such as ECN which
commit 172d69e63c ("syncookies: add support for ECN") added, we argue
that in particular for ECN handling, it can be checked at a later point
in time as the request sock would actually not need to be dropped from
this, but just ECN support turned off.
Therefore, we split this functionality into cookie_ecn_ok(), which tells
us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled.
This prepares for per-route ECN support: just looking at the tcp_ecn sysctl
won't be enough anymore at that point; if the timestamp indicates ECN
and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric.
This would mean adding a route lookup to cookie_check_timestamp(), which
we definitely want to avoid. As we already do a route lookup at a later
point in cookie_{v4,v6}_check(), we can simply make use of that as well
for the new cookie_ecn_ok() function w/o any additional cost.
Joint work with Daniel Borkmann.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Was a bit more difficult to read than needed due to magic shifts;
add defines and document the used encoding scheme.
Joint work with Daniel Borkmann.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver uses devm_gpiod_get_index(..., index) so that the index refers
directly to the GpioIo resource under the ACPI device. The problem with
this is that if the ordering changes we get wrong GPIOs.
With ACPI 5.1 _DSD we can now use names instead to reference GPIOs
analogous to Device Tree. However, we still have systems out there that do
not provide _DSD at all. These systems must be supported as well.
Luckily we now have acpi_dev_add_driver_gpios() that can be used to provide
mappings for systems where _DSD is not provided and still take advantage of
_DSD if it exists.
This patch changes the driver to create default GPIO mappings if we are
running on ACPI system.
While there we can drop the indices completely and use devm_gpiod_get()
with name instead.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: John W. Linville <linville@tuxdriver.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
over the place and most of the bugs are old, one even dates back
to the original mac80211 we merged into the kernel.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUWJSYAAoJEDBSmw7B7bqr17IP/3RFbqI4S/ceZzrVNLtvGDUd
MIlkGhngDdhpFhSxTdOH4opFM/j9bkwndk0F35d4r94mCeB5eJQKrtUfeon/7aft
AKaRa3CNsEVQgCempCYOKGwlJZQQ86IL6IvU4CW5CTNHENUBLA83KHqX+6Aoumhm
mdJxhSzmB53Qn1bteIJXyJmOjgxQvBZggBIF/25Xnosb3FBH3hvPsH0qbIKZaicy
PlD5JWk9UseySjLNwk1/jriQ4koF5Dy/BVRyQ/0fRYswdmS3o2EiC4JOWjsOfIUi
NE9Ax+DAKvHHGYNcsX/hXsPJTc6fYgq3INEZBvnK04GHVFVGLq1WoEIfOeLugK7o
j7OIEJbkKAQjJSnEpB9Y6YHO/jPXEokJjUNT7VuZJqLElp4Hd8K9jnhKD9jkZBA6
TGjNO5NJqgGdlxnq3nu4+XFh9StAam6J1Ey1TWarc6Kxd8Gtg3Ymkj3cO46rHcQU
JX3i3RGlYqibEQ0NVtZ4EfnGjtcGx0Vbf+yAc9ZpWzKFvX9YKS1wuOd5i/eZI8bb
hxMjHFwmViV3Ifk9GjBNKioXkCpEfk9Q3pKzRllHQn56ueTu1mBvAfIe93PRm9kR
y/giIZvHEhs8VH2PHVuHzT16YMVnNfQniAi+BK73QWC3zAhj1ss3xN33+Q8FfpMM
xw/prlY9IAH2A9zis1Vz
=8uwd
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-john-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg <johannes@sipsolutions.net> says:
"This contains another small set of fixes for 3.18, these are all
over the place and most of the bugs are old, one even dates back
to the original mac80211 we merged into the kernel."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
else is unnecessary after return 0 in __udp4_lib_rcv()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
remove __inline__ / inline and let compiler decide what to do
with static functions
Inspired-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
If you use RAW sockets the transport header offset is not set by the
ipv6 stack so when we get to the udp header compression it does not
compress the right part of the packet.
This patch adds a check for this scenario and sets the transport
header offset.
Signed-off-by: Simon Vincent <simon.vincent@xsilon.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Netlink command for nl80211_send_mpath() should be NL80211_CMD_NEW_MPATH.
Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Drivers might want to know also when mac80211 has
completed reconfiguring after resume (e.g. in order
to know when frames can be passed to mac80211).
Rename restart_complete() to a more-generic reconfig_complete(),
and add a new enum to indicate the reconfiguration type.
Update the current users with the new prototype.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Deliver up to 128 frames during service period instead of 8 if
unlimited is specified by the client during association.
8 was just an arbitrary value; so is 128 since unlimited can
be any number.
However for large traffic bursts, increasing this value looks
reasonable. Also, it seems that a few certification tests
expect more frames to be delivered during SP.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the RIC data element (RDE) is included in the IEs coming
from userspace for an association request, its handling is
currently broken as any IEs that are contained within it would
be split off from it and inserted again after all the IEs that
mac80211 generates (e.g. HT, VHT.)
To fix this, treat the RIC element specially, and stop after
it only when we find something that doesn't actually belong to
it. This assumes userspace is actually correctly building it,
directly after the fast BSS transition IE and before all the
others like extended capabilities.
This leaves as a potential problem the case where userspace is
building the following IEs:
[RDE] [vendor resource description] [vendor non-resource IE]
In this case, we'd erroneously consider all three IEs to be
part of the RIC data together, and not split them between the
two vendor IEs. Unfortunately, it isn't easily possible to
distinguish vendor IEs, so this isn't easy to fix. Luckily,
this case is rare as normally wpa_supplicant will include an
extended capabilities IE in the IEs, and that certainly will
break the two vendor IEs apart correctly.
Reviewed-by: Eliad Peller <eliad@wizery.com>
Reviewed-by: Beni Lev <beni.lev@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch adds 802.11p OCB (Outside the Context of a BSS) mode
support.
When communicating in OCB mode a mandatory wildcard BSSID
(48 '1' bits) is used.
The EDCA parameters handling function was changed to support
802.11p specific values.
The insertion of a newly discovered STAs is done in the similar way
as in the IBSS mode -- through the deferred insertion.
The OCB mode uses a periodic 'housekeeping task' for expiration of
disconnected STAs (in the similar manner as in the MESH mode).
New Kconfig option for verbose OCB debugging outputs is added.
Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch adds new iface type (NL80211_IFTYPE_OCB) representing
the OCB (Outside the Context of a BSS) mode.
When establishing a connection to the network a cfg80211_join_ocb
function is called (particular nl80211_command is added as well).
A mandatory parameters during the ocb_join operation are 'center
frequency' and 'channel width (5/10 MHz)'.
Changes done in mac80211 are minimal possible required to avoid
many warnings (warning: enumeration value 'NL80211_IFTYPE_OCB'
not handled in switch) during compilation. Full functionality
(where needed) is added in the following patch.
Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The configured tx power is often limited by hardware capabilities,
channel settings, antenna configuration, etc.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[fix tracing compilation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch fixes the following sparse warnings in rfcomm/core.c:
net/bluetooth/rfcomm/core.c:391:16: warning: dubious: x | !y
net/bluetooth/rfcomm/core.c:546:24: warning: dubious: x | !y
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
rtnl_lock_unregistering*() take rtnl_lock() -- a mutex -- inside a
wait loop. The wait loop relies on current->state to function, but so
does mutex_lock(), nesting them makes for the inner to destroy the
outer state.
Fix this using the new wait_woken() bits.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com>
Cc: stephen hemminger <stephen@networkplumber.org>
Cc: Tom Gundersen <teg@jklm.no>
Cc: Tom Herbert <therbert@google.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20141029173110.GE15602@worktop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
rfcomm_run() is a tad broken in that is has a nested wait loop. One
cannot rely on p->state for the outer wait because the inner wait will
overwrite it.
Fix this using the new wait_woken() facility.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Alexander Holler <holler@ahsoftware.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Joe Perches <joe@perches.com>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Libor Pechacek <lpechacek@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vignesh Raman <Vignesh_Raman@mentor.com>
Cc: linux-bluetooth@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull ceph fixes from Sage Weil:
"There is a GFP flag fix from Mike Christie, an error code fix from
Jan, and fixes for two unnecessary allocations (kmalloc and workqueue)
from Ilya. All are well tested.
Ilya has one other fix on the way but it didn't get tested in time"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: eliminate unnecessary allocation in process_one_ticket()
rbd: Fix error recovery in rbd_obj_read_sync()
libceph: use memalloc flags for net IO
rbd: use a single workqueue for all devices
Yaogong replaces TCP out of order receive queue by an RB tree.
As netem already does a private skb->{next/prev/tstamp} union
with a 'struct rb_node', lets do this in a cleaner way.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise it gets overwritten by register_netdev().
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipip6_tunnel_init() sets the dev->iflink via a call to
ipip6_tunnel_bind_dev(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for ipv6 tunnels. Fix this by using ipip6_tunnel_init() as the
ndo_init function. Then ipip6_tunnel_init() is called after
dev->iflink is set to -1 from register_netdevice().
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vti6_dev_init() sets the dev->iflink via a call to
vti6_link_config(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for vti6 tunnels. Fix this by using vti6_dev_init() as the
ndo_init function. Then vti6_dev_init() is called after
dev->iflink is set to -1 from register_netdevice().
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_tnl_dev_init() sets the dev->iflink via a call to
ip6_tnl_link_config(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the
ndo_init function. Then ip6_tnl_dev_init() is called after
dev->iflink is set to -1 from register_netdevice().
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_rx_action() can mask irqs a single time to transfert sd->poll_list
into a private list, for a very short duration.
Then, napi_complete() can avoid masking irqs again,
and net_rx_action() only needs to mask irq again in slow path.
This patch removes 2 couples of irq mask/unmask per typical NAPI run,
more if multiple napi were triggered.
Note this also allows to give control back to caller (do_softirq())
more often, so that other softirq handlers can be called a bit earlier,
or ksoftirqd can be wakeup earlier under pressure.
This was developed while testing an alternative to RX interrupt
mitigation to reduce latencies while keeping or improving GRO
aggregation on fast NIC.
Idea is to test napi->gro_list at the end of a napi->poll() and
reschedule one NAPI poll, but after servicing a full round of
softirqs (timers, TX, rcu, ...). This will be allowed only if softirq
is currently serviced by idle task or ksoftirqd, and resched not needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix:
net/bridge/netfilter/nft_reject_bridge.c:
In function 'nft_reject_br_send_v6_unreach':
net/bridge/netfilter/nft_reject_bridge.c:240:3:
error: implicit declaration of function 'csum_ipv6_magic'
csum_ipv6_magic(&nip6h->saddr, &nip6h->daddr,
^
make[3]: *** [net/bridge/netfilter/nft_reject_bridge.o] Error 1
Seen with powerpc:allmodconfig.
Fixes: 523b929d54 ("netfilter: nft_reject_bridge: don't use IP stack to reject traffic")
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to Management Interface API 'Start Discovery' command should
generate a Command Complete event on failure. Currently kernel is
sending Command Status on early errors. This results in userspace
ignoring such event due to invalid size.
bluetoothd[28499]: src/adapter.c:trigger_start_discovery()
bluetoothd[28499]: src/adapter.c:cancel_passive_scanning()
bluetoothd[28499]: src/adapter.c:start_discovery_timeout()
bluetoothd[28499]: src/adapter.c:start_discovery_complete() status 0x0a
bluetoothd[28499]: Wrong size of start discovery return parameters
Reported-by: Jukka Taimisto <jtt@codenomicon.com>
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Upon receiving the last fragment, all but the first fragment
are freed, but the multicast check for statistics at the end
of the function refers to the current skb (the last fragment)
causing a use-after-free bug.
Since multicast frames cannot be fragmented and we check for
this early in the function, just modify that check to also
do the accounting to fix the issue.
Cc: stable@vger.kernel.org
Reported-by: Yosef Khyal <yosefx.khyal@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The debufs entry for the BR/EDR whitelist is confusing since there is
a controller debugfs entry with the name white_list and both are two
different things.
With the BR/EDR whitelist, the actual interface in use is the device
list and thus just include all values from the internal BR/EDR whitelist
in the device_list debugfs entry.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
After this commit, the attribute XFRMA_REPLAY_VAL is added when no ESN replay
value is defined. Thus sequence number values are always notified to userspace.
Signed-off-by: dingzhi <zhi.ding@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Running make C=2 occurs warning:
symbol 'mac802154_config_ops' was not declared. Should it be static?
This patch adds a missing include in cfg.c to solve this warning.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Running make C=2 occurs in warnings:
symbol 'wpan_phy_class' was not declared. Should it be static?
symbol 'wpan_phy_sysfs_init' was not declared. Should it be static?
symbol 'wpan_phy_sysfs_exit' wasnot declared. Should it be static?
This patch adds a missing include "sysfs.h" to solve these warnings.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The sk_prot is irda's own set of protocol handlers, so irda should
statically know what that function is anyway, without using an indirect
pointer. And as it happens, we know *exactly* what that pointer is
statically: it's NULL, because irda doesn't define a disconnect
operation.
So calling that function is doubly wrong, and will just cause an oops.
Reported-by: Martin Lang <mlg.hessigheim@gmail.com>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some Bluetooth drivers require to reset the upper stack. To avoid having
all drivers send HCI Hardware Error events, provide a generic function
to wrap the reset functionality.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The current kernel options do not make it clear which modules are for
Bluetooth Classic (BR/EDR) and which are for Bluetooth Low Energy (LE).
To make it really clear, introduce BT_BREDR and BT_LE options with
proper dependencies into the different modules. Both new options
default to y to not create a regression with previous kernel config
files.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When the HCI_Hardware_Error event is send by the controller or
injected by the driver, then at least print an error message.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When the HCI_Reset command returns, the status needs to be checked. It
is unlikely that HCI_Reset actually fails, but when it fails, it is a
bad idea to reset all values since the controller will have not reset
its values in that case.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Currently the ieee802154 6lowpan interface operates on wpan interfaces
only. Setting the wpan mac address over 6lowpan interface is complex and
maybe we can't never do this. This patch removes the set of mac address
handling in ieee802154 6lowpan interface for now.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch use the validation function to check if an extended address
is valid or not while set the extended address.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
All PHY attributes should be directly set to the transceiver after netlink.
MAC attributes should be set by interface up. Currently the macparams
netlink cmd contains mixed attributes of phy and mac settings. This patch
moves all phy settings to the netlink receive function for setting macparams.
This is the only way which doesn't change the userspace API and keep the
deprecated netlink interface alive.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the setting of hardware panid address filtering
inside of interface up instead doing it it directly inside of netlink
interface. The netlink call which can only be called when netif isn't
running sets only the necessary panid value in sdata. After an
interface up the address filter will be set with this value.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the setting of hardware short address filtering
inside of interface up instead doing it it directly inside of netlink
interface. The netlink call which can only be called when netif isn't
running sets only the necessary short_addr value in sdata. After an
interface up the address filter will be set with this value.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the setting of hardware extended address filtering
inside of interface up instead doing it directly inside of netlink interface.
Also we don't need to set the sdata extended attribute in netlink. This
is already done by ndo_set_mac_address of net_device_ops.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch changes the actual behaviour for setting address attributes.
We should not change addresses while netif_running is true. Furthermore
when netif_running is running the address attributes becomes read only
and we can remove locking mechanism in receive and transmit hothpaths
of 802.15.4 subsystem.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the wpan_phy callbacks for add and del an interface
on a phy. Instead we introduce deprecated cfg802154 callbacks for this.
Furthermore we introduce a new netlink interface nl802154 which use
different callbacks. The deprecated function is to have a backwards
compatibility with the current netlink interface.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduce a function to get the cfg802154_registered_device
from a wpan_phy.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduces mac802154_config_ops struct. Like wireless this
struct should be the only one interface between ieee802154 to mac802154
or possible HardMAC drivers.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduce the cfg802154_registered_device struct. Like
cfg80211_registered_device in wireless this should contain similar
functionality for cfg802154. This patch should not change any behaviour.
We just adds cfg802154_registered_device as container for wpan_phy struct.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the default channel setting. A channel is always set
and there is no default channel setting according 802.15.4.
Drivers should set the default channel and page in probing routine. This
behaviour is currently a lack of all 802.15.4 drivers.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
__hci_cmd_sync_ev(), __hci_req_sync() could miss wake_up_interrupt from
hci_req_sync_complete() because hci_cmd_work() workqueue and its response
could be completed before they are ready to get the signal through
add_wait_queue(), set_current_state(TASK_INTERRUPTIBLE).
Signed-off-by: Chan-yeol Park <chanyeol.park@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Commit c27a3e4d66 ("libceph: do not hard code max auth ticket len")
while fixing a buffer overlow tried to keep the same as much of the
surrounding code as possible and introduced an unnecessary kmalloc() in
the unencrypted ticket path. It is likely to fail on huge tickets, so
get rid of it.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
If a driver supports reading EEPROM but no EEPROM is installed in the system,
the driver's get_eeprom_len function returns 0. ethtool will subsequently
try to read that zero-length EEPROM anyway. If the driver does not support
EEPROM access at all, this operation will return -EOPNOTSUPP. If the driver
does support EEPROM access but no EEPROM is installed, the operation will
return -EINVAL. Return -EOPNOTSUPP in both cases for consistency.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kconfig already allows mpls to be built as module. Following patch
fixes Makefile to do same.
CC: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mpls gso handler needs to pull skb after segmenting skb.
CC: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter/ipvs fixes for net
The following patchset contains fixes for netfilter/ipvs. This round of
fixes is larger than usual at this stage, specifically because of the
nf_tables bridge reject fixes that I would like to see in 3.18. The
patches are:
1) Fix a null-pointer dereference that may occur when logging
errors. This problem was introduced by 4a4739d56b ("ipvs: Pull
out crosses_local_route_boundary logic") in v3.17-rc5.
2) Update hook mask in nft_reject_bridge so we can also filter out
packets from there. This fixes 36d2af5 ("netfilter: nf_tables: allow
to filter from prerouting and postrouting"), which needs this chunk
to work.
3) Two patches to refactor common code to forge the IPv4 and IPv6
reject packets from the bridge. These are required by the nf_tables
reject bridge fix.
4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject
packets from the bridge. The idea is to forge the reject packets and
inject them to the original port via br_deliver() which is now
exported for that purpose.
5) Restrict nft_reject_bridge to bridge prerouting and input hooks.
the original skbuff may cloned after prerouting when the bridge stack
needs to flood it to several bridge ports, it is too late to reject
the traffic.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Most code avoids having a default case in interface type switch
statements already, to make it easier to find places that need
to be extended. Change the code in the __cfg80211_leave() and
nl80211_key_allowed() functions to not have a default case.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Restrict the reject expression to the prerouting and input bridge
hooks. If we allow this to be used from forward or any other later
bridge hook, if the frame is flooded to several ports, we'll end up
sending several reject packets, one per cloned packet.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If the packet is received via the bridge stack, this cannot reject
packets from the IP stack.
This adds functions to build the reject packet and send it from the
bridge stack. Comments and assumptions on this patch:
1) Validate the IPv4 and IPv6 headers before further processing,
given that the packet comes from the bridge stack, we cannot assume
they are clean. Truncated packets are dropped, we follow similar
approach in the existing iptables match/target extensions that need
to inspect layer 4 headers that is not available. This also includes
packets that are directed to multicast and broadcast ethernet
addresses.
2) br_deliver() is exported to inject the reject packet via
bridge localout -> postrouting. So the approach is similar to what
we already do in the iptables reject target. The reject packet is
sent to the bridge port from which we have received the original
packet.
3) The reject packet is forged based on the original packet. The TTL
is set based on sysctl_ip_default_ttl for IPv4 and per-net
ipv6.devconf_all hoplimit for IPv6.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:
* nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_ip6hdr_put(): to build the IPv6 header.
* nf_reject_ip6_tcphdr_put(): to build the TCP header.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:
* nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_iphdr_put(): to build the IPv4 header.
* nf_reject_ip_tcphdr_put(): to build the TCP header.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use codespell to find spelling errors.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway. Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <davem@davemloft.net>
Some drivers are unable to perform TX completions in a bound time.
They instead call skb_orphan()
Problem is skb_fclone_busy() has to detect this case, otherwise
we block TCP retransmits and can freeze unlucky tcp sessions on
mostly idle hosts.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1f3279ae0c ("tcp: avoid retransmits of TCP packets hanging in host queues")
Signed-off-by: David S. Miller <davem@davemloft.net>
Challenge ACK is described in RFC 5961, fix typo.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, skb_inner_network_header is used but this does not account
for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
handles TEB and also should work with IP encapsulation in which case
inner mac and inner network headers are the same.
Tested: Ran TCP_STREAM over GRE, worked as expected.
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes checkpatch warning:
"WARNING: Prefer seq_puts to seq_printf"
Signed-off-by: Michele Baldessari <michele@acksyn.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is often quite helpful to be able to know the state of a transport
outside of the application itself (for troubleshooting purposes or for
monitoring purposes). Add it under /proc/net/sctp/remaddr.
Signed-off-by: Michele Baldessari <michele@acksyn.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we cache them, the kernel will reuse them, independently of
whether forwarding is enabled or not. Which means that if forwarding is
disabled on the input interface where the first routing request comes
from, then that unreachable result will be cached and reused for
other interfaces, even if forwarding is enabled on them. The opposite
is also true.
This can be verified with two interfaces A and B and an output interface
C, where B has forwarding enabled, but not A and trying
ip route get $dst iif A from $src && ip route get $dst iif B from $src
Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only count packets that failed cookie-authentication.
We can get SYNCOOKIESFAILED > 0 while we never even sent a single cookie.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fallback device is in ipv6 mode by default.
The mode can not be changed in runtime, so there
is no way to decapsulate ip4in6 packets coming from
various sources without creating the specific tunnel
ifaces for each peer.
This allows to update the fallback tunnel device, but only
the mode could be changed. Usual command should work for the
fallback device: `ip -6 tun change ip6tnl0 mode any`
The fallback device can not be hidden from the packet receiver
as a regular tunnel, but there is no need for synchronization
as long as we do single assignment.
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexey Andriyanov <alan@al-an.info>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do assignment before if condition and test !skb like in rawv6_recvmsg()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
remove __inline__ / inline and let compiler decide what to do
with static functions
Inspired-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apply commit e0f36310f7
("ipx: remove unnecessary casting on ntohl")
to all seq_printf/08lX
Inspired-by: "David S. Miller" <davem@davemloft.net>
Inspired-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for reading switch registers with 'ethtool -d'.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some chips it is possible to access the switch eeprom.
Add infrastructure support for it.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some switches provide chip temperature data.
Add support for reporting it through the hwmon subsystem.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting skb->protocol to a private protocol type may result in warning
messages such as
e1000e 0000:00:19.0 em1: checksum_partial proto=dada!
This happens if the L3 protocol is IP or IPv6 and skb->ip_summed is set
to CHECKSUM_PARTIAL. Looking through the code, it appears that changing
skb->protocol for transmitted packets is not necessary and may actually
be harmful. For example, it prevents purposely unmodified (from a DSA
perspective) network drivers from properly setting up their transmit
checksum offload pointers since they inspect skb->protocol to set up the
IPv4 header or IPv6 header pointers. So don't unnecessarily change the
protocol field.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The internal representation of the LE white list needs to be cleared
when receiving a successful HCI_Reset command. A reset of the controller
is expected to start with an empty LE white list.
When the LE white list is not cleared on controller reset, the passive
background scanning might skip programming the remote devices. Only
changes to the LE white list are programmed when passive background
is started.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org # 3.17.x
This was accidentally changed from list_for_each_entry_safe() to
list_for_each_entry() so now it has a use after free bug. I've changed
it back.
Fixes: 9030582963 ('Bluetooth: 6lowpan: Converting rwlocks to use RCU')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Simon Horman says:
====================
The single patch in this series fixes some minor fallout from adding
support IPv6 real servers in IPv4 virtual-services and vice versa.
It should not have any run-time affect other than perhaps saving a few cycles.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently, despite the comment right before the function,
nf_log_register allows registering two loggers on with the same type and
end up overwriting the previous register.
Not a real issue today as current tree doesn't have two loggers for the
same type but it's better to get this protected.
Also make sure that all of its callers do error checking.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Wrap up a common call pattern in an easier to handle call.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When an interface is deleted, an ongoing hardware scan is canceled and
the driver must abort the scan, at the very least reporting completion
while the interface is removed.
However, if it scheduled the work that might only run after everything
is said and done, which leads to cfg80211 warning that the scan isn't
reported as finished yet; this is no fault of the driver, it already
did, but mac80211 hasn't processed it.
To fix this situation, flush the delayed work when the interface being
removed is the one that was executing the scan.
Cc: stable@vger.kernel.org
Reported-by: Sujith Manoharan <sujith@msujith.org>
Tested-by: Sujith Manoharan <sujith@msujith.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch has ceph's lib code use the memalloc flags.
If the VM layer needs to write data out to free up memory to handle new
allocation requests, the block layer must be able to make forward progress.
To handle that requirement we use structs like mempools to reserve memory for
objects like bios and requests.
The problem is when we send/receive block layer requests over the network
layer, net skb allocations can fail and the system can lock up.
To solve this, the memalloc related flags were added. NBD, iSCSI
and NFS uses these flags to tell the network/vm layer that it should
use memory reserves to fullfill allcation requests for structs like
skbs.
I am running ceph in a bunch of VMs in my laptop, so this patch was
not tested very harshly.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
This patch adds basic support for monitor mode. Also change the open
call that we set the transceiver mac setting on an interface up. Futher
patches will add a better handling while interface up an interface.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds error handling after skb_clone and deliver only if
skb_clone was successful.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch replace the !netif_running(sdata->dev) instead we doing a
!ieee802154_sdata_running(sdata). Also move this in two separate if
branches to compare with mac80211 code.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds rx stats incrementation when the monitor interface
recevied a frame.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes netif_rx_ni call. Instead we call netif_receive_skb,
we can do that since commit c5c47e67bc
("mac802154: rx: use tasklet instead workqueue").
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes pkt_type set to PACKET_HOST while monitor receiving.
This should be PACKET_OTHERHOST on monitor mode which already set
before.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a new hardware flag which indicate that the transceiver
doesn't support check for bad checksum via hardware. Also add a handling of
this while receive.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch change the actual crc handling while receive. Currently the
IEEE802154_HW_RX_OMIT_CKSUM flag is used to filter a frame with a bad crc.
This patch changes the behaviour of IEEE802154_HW_RX_OMIT_CKSUM to add a
crc while receiving for the monitor interface. After monitor receiving
we remove the crc for frame parsing. This affect the driver layer
because all drivers sets IEEE802154_HW_RX_OMIT_CKSUM and deliver without
checksum.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes a not used parameter in ieee802154_deliver_skb.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch splits the IEEE802154_HW_OMIT_CKSUM hardware flag into
IEEE802154_HW_TX_OMIT_CKSUM and IEEE802154_HW_RX_OMIT_CKSUM. This is
useful to deliver the received crc from the driver layer to the monitor
interface. At the moment we can't do that without change the xmit
handling.
The received checksum should be visible in monitor mode only.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a new driver operation to bring the transceiver into
promiscuous mode.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes an unnecessary include of driver-ops header file.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In neigh_parms_release() we loop over all entries to find the entry given in
argument and being able to remove it from the list. By using a double linked
list, we can avoid this loop.
Here are some numbers with 30 000 dummy interfaces configured:
Before the patch:
$ time rmmod dummy
real 2m0.118s
user 0m0.000s
sys 1m50.048s
After the patch:
$ time rmmod dummy
real 1m9.970s
user 0m0.000s
sys 0m47.976s
Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
napi_schedule() can be called from any context and has to mask hard
irqs.
Add a variant that can only be called from hard interrupts handlers
or when irqs are already masked.
Many NIC drivers can use it from their hard IRQ handler instead of
generic variant.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The WARN_ON in inet_evict_bucket can be triggered by a valid case:
inet_frag_kill and inet_evict_bucket can be running in parallel on the
same queue which means that there has been at least one more ref added
by a previous inet_frag_find call, but inet_frag_kill can delete the
timer before inet_evict_bucket which will cause the WARN_ON() there to
trigger since we'll have refcnt!=1. Now, this case is valid because the
queue is being "killed" for some reason (removed from the chain list and
its timer deleted) so it will get destroyed in the end by one of the
inet_frag_put() calls which reaches 0 i.e. refcnt is still valid.
CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>
Fixes: b13d3cbfb8 ("inet: frag: move eviction of queues to work queue")
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the evictor is running it adds some chosen frags to a local list to
be evicted once the chain lock has been released but at the same time
the *frag_queue can be running for some of the same queues and it
may call inet_frag_kill which will wait on the chain lock and
will then delete the queue from the wrong list since it was added in the
eviction one. The fix is simple - check if the queue has the evict flag
set under the chain lock before deleting it, this is safe because the
evict flag is set only under that lock and having the flag set also means
that the queue has been detached from the chain list, so no need to delete
it again.
An important note to make is that we're safe w.r.t refcnt because
inet_frag_kill and inet_evict_bucket will sync on the del_timer operation
where only one of the two can succeed (or if the timer is executing -
none of them), the cases are:
1. inet_frag_kill succeeds in del_timer
- then the timer ref is removed, but inet_evict_bucket will not add
this queue to its expire list but will restart eviction in that chain
2. inet_evict_bucket succeeds in del_timer
- then the timer ref is kept until the evictor "expires" the queue, but
inet_frag_kill will remove the initial ref and will set
INET_FRAG_COMPLETE which will make the frag_expire fn just to remove
its ref.
In the end all of the queue users will do an inet_frag_put and the one
that reaches 0 will free it. The refcount balance should be okay.
CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>
Fixes: b13d3cbfb8 ("inet: frag: move eviction of queues to work queue")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Tested-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a sysctl that causes an interface's optimistic addresses
to be considered equivalent to other non-deprecated addresses
for source address selection purposes. Preferred addresses
will still take precedence over optimistic addresses, subject
to other ranking in the source address selection algorithm.
This is useful where different interfaces are connected to
different networks from different ISPs (e.g., a cell network
and a home wifi network).
The current behaviour complies with RFC 3484/6724, and it
makes sense if the host has only one interface, or has
multiple interfaces on the same network (same or cooperating
administrative domain(s), but not in the multiple distinct
networks case.
For example, if a mobile device has an IPv6 address on an LTE
network and then connects to IPv6-enabled wifi, while the wifi
IPv6 address is undergoing DAD, IPv6 connections will try use
the wifi default route with the LTE IPv6 address, and will get
stuck until they time out.
Also, because optimistic nodes can receive frames, issue
an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC
flag appropriately set). A second RTM_NEWADDR is sent if DAD
completes (the address flags have changed), otherwise an
RTM_DELADDR is sent.
Also: add an entry in ip-sysctl.txt for optimistic_dad.
Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While testing upcoming Yaogong patch (converting out of order queue
into an RB tree), I hit the max reordering level of linux TCP stack.
Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.
Allow a new max limit of 300, and add a sysctl to allow admins to even
allow bigger (or lower) values if needed.
[1] Aggregation of links, per packet load balancing, fabrics not doing
deep packet inspections, alternative TCP congestion modules...
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch generalizes commit d6a4a10411 ("tcp: GSO should be TSQ
friendly") to protocols using skb_set_owner_w()
TCP uses its own destructor (tcp_wfree) and needs a more complex scheme
as explained in commit 6ff50cd555 ("tcp: gso: do not generate out of
order packets")
This allows UDP sockets using UFO to get proper backpressure,
thus avoiding qdisc drops and excessive cpu usage.
Here are performance test results (macvlan on vlan):
- Before
# netperf -t UDP_STREAM ...
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 65507 60.00 144096 1224195 1258.56
212992 60.00 51 0.45
Average: CPU %user %nice %system %iowait %steal %idle
Average: all 0.23 0.00 25.26 0.08 0.00 74.43
- After
# netperf -t UDP_STREAM ...
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 65507 60.00 109593 0 957.20
212992 60.00 109593 957.20
Average: CPU %user %nice %system %iowait %steal %idle
Average: all 0.18 0.00 8.38 0.02 0.00 91.43
[edumazet] Rewrote patch and changelog.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NetworkManager might want to know that it changed when the router advertisement
arrives.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Vijay Subramanian <vijaynsu@cisco.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
The helper function can't ever create negative values, so use
u32 pointers as the function arguments as the caller does.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
radar_detect_width is unused since commit 97dc94f1d9
("cfg80211: remove channel_switch combination check")
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Due to the time it takes to process the beacon that started the CSA
process, we may be late for the switch if we try to reach exactly
beacon 0. To avoid that, use count - 1 when calculating the switch time.
Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If we are switching from an HT40+ to an HT40- channel (or vice-versa),
we need the secondary channel offset IE to specify what is the
post-CSA offset to be used. This applies both to beacons and to probe
responses.
In ieee80211_parse_ch_switch_ie() we were ignoring this IE from
beacons and using the *current* HT information IE instead. This was
causing us to use the same offset as before the switch.
Fix that by using the secondary channel offset IE also for beacons and
don't ever use the pre-switch offset. Additionally, remove the
"beacon" argument from ieee80211_parse_ch_switch_ie(), since it's not
needed anymore.
Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The m->pool_to[] array has "maxpools" number of elements. It's
allocated in svc_pool_map_alloc_arrays() which we called earlier in the
function. This test should be >= instead of >.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Userspace can add keys to an AP mode interface before start_ap has been
called. If there have been no calls to start_ap/stop_ap in the mean
time, the keys will still be around when the interface is brought down.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[adjust comments, fix AP_VLAN case]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use spin_lock_bh() as the code is called from softirq in networking subsystem.
This is needed to prevent deadlocks when 6lowpan link is in use.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the sysfs handling in a own file. This is like wireless
sysfs file handling.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch cleanups the open_count variable increment in open and close
calls of netdev.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
These functions can be static in mac_cmd file.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
These channel attributes was part of "channel context switch while xmit"
which was removed by commit dc67c6b30f
("mac802154: tx: remove xmit channel context switch"). This patch
removes these unnecessary variables and use the current_page and
current_channel by wpan_phy struct now.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
These variables should already be zero, so we remove the extra assign to
zero.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds synchronization handling in start and stop driver ops
calls. This patch is mostly grab from mac80211 which was introduced by
commit ea77f12f2c ("mac80211: remove
tasklet enable/disable"). This is to be sure that we don't run into same
issues.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the current handling of started boolean. This is
actually dead code, because mac802154_netdev_register can't never be
called before ieee802154_register_hw. This means that local->started is
always be true when mac802154_netdev_register is called. Instead we
using this now like mac80211 to indicate that an instance of sdata is
running.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This variable should be handled like ieee80211_local struct of mac80211.
We rename this variable to started now to have the same name convention.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch reworks the handling for setting the state like mac80211. We
use bit's instead a bool variable. The mutex is not needed because it use
test and set bits which are atomic operations.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the driver ops callbacks inside of wpan_phy struct.
It was used to check if a phy supports this driver ops call. We do this
now via hardware flags.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch replaces all directly called driver ops by previous
introduced driver-ops function wrappers.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduce a driver-ops header file with function wrappers to
call the driver ops. These wrappers checking on right context
information and warn if optional driver ops are called when these aren't
implemented. This behaviour is like mac80211 driver-ops header file,
just without function tracing calls.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The ieee802154_ops structure should be never changed during runtime.
This patch declare this structure as const to avoid a runtime change.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
These functions can be static inside the iface file, because it's not
used anywhere else. This patch moves these functions into iface file.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the monitor implementation file and put all monitor
stuff into iface file. It's now small enough to put all necessary
handling into iface.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
These days we allow simultaneous LE scanning and advertising. Checking
for whether advertising is enabled or not is therefore not a reliable
way to determine whether directed advertising was used to trigger the
connection creation. The appropriate place to check (instead of the hdev
context) is the connection role that's stored in the hci_conn. This
patch fixes such a check in le_conn_timeout() which could otherwise lead
to incorrect HCI commands being sent.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.16.x
The le_conn_timeout() may call hci_le_conn_failed() which in turn may
call hci_conn_del(). Trying to use the _sync variant for cancelling the
conn timeout from hci_conn_del() could therefore result in a deadlock.
This patch converts hci_conn_del() to use the non-sync variant so the
deadlock is not possible.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.16.x
ERROR: "lockdep_ovsl_is_held" [net/openvswitch/vport-gre.ko] undefined!
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original motivation for this change was to allow the helper to be used
in files other than actions.c as part of work on an odp select group
action.
It was as pointed out by Thomas Graf that this helper would be best off
living in netlink.h. Furthermore, I think that the generic nature of this
helper means it is best off in netlink.h regardless of if it is used more
than one .c file or not. Thus, I would like it considered independent of
the work on an odp select group action.
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless 2014-10-28
Please pull this batch of fixes intended for the 3.18 stream!
For the mac80211 bits, Johannes says:
"Here are a few fixes for the wireless stack: one fixes the
RTS rate, one for a debugfs file, one to return the correct
channel to userspace, a sanity check for a userspace value
and the remaining two are just documentation fixes."
For the iwlwifi bits, Emmanuel says:
"I revert here a patch that caused interoperability issues.
dvm gets a fix for a bug that was reported by many users.
Two minor fixes for BT Coex and platform power fix that helps
reducing latency when the PCIe link goes to low power states."
In addition...
Felix Fietkau adds a couple of ath code fixes related to regulatory
rule enforcement.
Hauke Mehrtens fixes a build break with bcma when CONFIG_OF_ADDRESS
is not set.
Karsten Wiese provides a trio of minor fixes for rtl8192cu.
Kees Cook prevents a potential information leak in rtlwifi.
Larry Finger also brings a trio of minor fixes for rtlwifi.
Rafał Miłecki adds a device ID to the bcma bus driver.
Rickard Strandqvist offers some strn* -> strl* changes in brcmfmac
to eliminate non-terminated string issues.
Sujith Manoharan avoids some ath9k stalls by enabling HW queue control
only for MCC.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If there is a mismatch between enabled tagging protocols and the
protocol the switch supports, error out, rather than continue with a
situation which is unlikely to work.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
cc: alexander.h.duyck@intel.com
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The internal and netdev vport remain part of openvswitch.ko. Encap
vports including vxlan, gre, and geneve can be built as separate
modules and are loaded on demand. Modules can be unloaded after use.
Datapath ports keep a reference to the vport module during their
lifetime.
Allows to remove the error prone maintenance of the global list
vport_ops_list.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a device ndo_start_xmit() calls again dev_queue_xmit(),
lockdep can complain because dev_queue_xmit() is re-entered and the
spinlocks protecting tx queues share a common lockdep class.
Same issue was fixed for ieee802154 in commit "20e7c4e80dcd"
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The rwlocks are converted to use RCU. This helps performance as the
irq locks are not needed any more.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This reverts commits c6992e9ef2 and
4cd3362da8.
The reason for the revert is that we cannot have more than one module
initialization function and the SMP one breaks the build with modular
kernels. As the proper fix for this is right now looking non-trivial
it's better to simply revert the problematic patches in order to keep
the upstream tree compilable.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
It is a precondition of the function that daddr be equal to dest->addr.ip
if dest is non-NULL, so this additional assignment is just confusing for
stupid engineers like me.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Use daddr instead of reaching into dest.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
introduce two configs:
- hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
depend on
- visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use
that solves several problems:
- tracing and others that wish to use eBPF don't need to depend on NET.
They can use BPF_SYSCALL to allow loading from userspace or select BPF
to use it directly from kernel in NET-less configs.
- in 3.18 programs cannot be attached to events yet, so don't force it on
- when the rest of eBPF infra is there in 3.19+, it's still useful to
switch it off to minimize kernel size
bloat-o-meter on x64 shows:
add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)
tested with many different config combinations. Hopefully didn't miss anything.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows
the AP devices to keep track of the hardware-address-to-IP-address
mapping of the mobile devices within the WLAN network.
The AP will learn this mapping via observing DHCP, ARP, and NS/NA
frames. When a request for such information is made (i.e. ARP request,
Neighbor Solicitation), the AP will respond on behalf of the
associated mobile device. In the process of doing so, the AP will drop
the multicast request frame that was intended to go out to the wireless
medium.
It was recommended at the LKS workshop to do this implementation in
the bridge layer. vxlan.c is already doing something very similar.
The DHCP snooping code will be added to the userspace application
(hostapd) per the recommendation.
This RFC commit is only for IPv4. A similar approach in the bridge
layer will be taken for IPv6 as well.
Signed-off-by: Kyeyoon Park <kyeyoonp@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Allow to recycle a TCP port in conntrack when the change role from
server to client, from Marcelo Leitner.
2) Fix possible off by one access in ip_set_nfnl_get_byindex(), patch
from Dan Carpenter.
3) alloc_percpu returns NULL on error, no need for IS_ERR() in nf_tables
chain statistic updates. From Sabrina Dubroca.
4) Don't compile ip options in bridge netfilter, this mangles the packet
and bridge should not alter layer >= 3 headers when forwarding packets.
Patch from Herbert Xu and tested by Florian Westphal.
5) Account the final NLMSG_DONE message when calculating the size of the
nflog netlink batches. Patch from Florian Westphal.
6) Fix a possible netlink attribute length overflow with large packets.
Again from Florian Westphal.
7) Release the skbuff if nfnetlink_log fails to put the final
NLMSG_DONE message. This fixes a leak on error. This shouldn't ever
happen though, otherwise this means we miscalculate the netlink batch
size, so spot a warning if this ever happens so we can track down the
problem. This patch from Houcheng Lin.
8) Look at the right list when recycling targets in the nft_compat,
patch from Arturo Borrero.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This new expression provides NAT in the redirect flavour, which is to
redirect packets to local machine.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch refactors the IPv6 code so it can be usable both from xt and
nf_tables.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch refactors the IPv4 code so it can be usable both from xt and
nf_tables.
A similar patch follows-up to handle IPv6.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The code looks for an already loaded target, and the correct list to search
is nft_target_list, not nft_match_list.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Let compiler decide what to do with static void __ipxitf_put()
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
use %08X instead of %08lX and remove casting.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
unsigned char *sha (source) was already in original git version
but was never used.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTS rate, one for a debugfs file, one to return the correct
channel to userspace, a sanity check for a userspace value
and the remaining two are just documentation fixes.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUSUotAAoJEDBSmw7B7bqr2TwP/2EJiYMOXhLTM5F/sEaGP5aX
+hN+/Za3hAuLu3GkYXnIEw8uJL0ooDpUkQLoyV7AUoqVKhgqCuMQPWbklU0Ns2Og
qubAl9BRY1DPgMtdGa3mMMJ3GYkIC8Hbh6kTOPqYASVZWover5NRKFlp2jp1uhbf
ypv0IfIJVO323s90TWv1ZQsTtnGxMtL9DaLwqBNKN8nSGKxe62cUZsQN+H5KGm4N
/n7eN62XPkidFUsTmdAXHfcgEpGv82rtSpxWmSrwxDbQEj12xkP66cRTuomZJ5v1
981OIzcxtV0ngLjfnoSGev6bvgO2TDEbvQScIsZiqfnaJuBfzPEAaghnWozox3op
dfolKkD3LecLGcxVVGJlKddxm3K4+2q7tuwkfDcxKNx2KFqtOqM6gY8z1uXYX8MW
Jv7669nwpKgWM0e3hsxz6WJauEsdWRVzarmimK/Ymitu0RgNmXTVbdvvFVSTenuZ
0HLqfr7Uk0gw5gQgWfj4F0qjNxzmjhnw/pz+c1DRtYs6w6SGToCqcm5yRU6f8pLt
SHc3LJ67xK5RnOq8+KJ8o92MfE29HH2CTzLzgghNrLnwqcYBTApuCr0OtpQb04Zf
AgQRMq61IGXwCDiVFE1ElpRgrW4/aekUZh/JB8pGHlhrAvl+HwNYVek66MBDeMyl
akmLeHhrCkuWstDHHz+o
=m/XT
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-john-2014-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg <johannes@sipsolutions.net> says:
"Here are a few fixes for the wireless stack: one fixes the
RTS rate, one for a debugfs file, one to return the correct
channel to userspace, a sanity check for a userspace value
and the remaining two are just documentation fixes."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch changes the naming convention of mac802154 rx file. It should
be more named like mac80211 stack. Furthermore we introduce a new frame
parsing implementation which is much similar the mac80211
implementation.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Instead of twice lock and unlock mechanism this patch hold these locks
only once at one position.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the skb_reset_mac_header call before frame parsing
while wpan rx and before monitor deliver functionality.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a PACKET_OTHERHOST setting when a monitor interface
receives a skb. All receiving skb's to the monitor interface should
be PACKET_OTHERHOST.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds CHECKSUM_UNNECESSARY to skb->ip_summed before delivery.
There exist no transceiver with IP checksum functionality.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the skb->protocol setting to the position when it's
needed. It's only needed when frame parsing was successful.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the mac802154_subif_rx function and do the necessary
calls inside of ieee802154_rx function. The ieee802154_rx is small
enough to move the functionality inside this function.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This removes the call of monitor receive funktion when any interface
type call xmit. There exist no such use case that a monitor interface
should receive the actual sending frame. One use case could be that a
wpan interface and monitor interface could be running at the same time
on one phy. Then the monitor interface receives the wpan frames also.
Furthermore we adding support for promiscous mode setting. With
promiscous mode setting we can't run a wpan and monitor interface at the
same time.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes all relevant receiving functions inclusive frame
parsing into rx file. Like mac80211 we should implement the complete
receive handling and parsing in this file.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch is similar like d20ef63d32
("mac80211: document ieee80211_rx() context requirement"). The
netif_receive_skb call requires with softirqs disabled. This patch
adds a warning if softirqs are pending while calling ieee802154_rx.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tasklets have much less overhead than workqueues. This patch also
removes the heap allocation for the worker on receiving path.
Like mac80211 we should prefer use a tasklet here instead a workqueue to
getting fast out of interrupt context when ieee802154_rx_irqsafe is
called by driver. Like wireless inside the tasklet context we should
call netif_receive_skb instead netif_rx_ni anymore.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch replaces the memcpy with a put_unaligned_le16. The placement
of crc inside of PSDU can also be unaligned. With memcpy this can fail
on some architectures.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
As we have decouple decompression from data delivery we can now rename all
occurences of process_data in receive path.
Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
As process_data now returns just error codes fix up the calls to this
function to only drop the skb if an error code is returned.
Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Separating skb delivery from decompression ensures that we can support further
decompression schemes and removes the mixed return value of error codes with
NET_RX_FOO.
Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When CONFIG_MAC80211_RC_MINSTREL_VHT is set, the module param
minstrel_vht_only tells minstrel_ht whether to allow the mix of ht rates
with vht rates.
ATM, minstrel_ht skips ht rates when minstrel_vht_only is true, but it does
that even if vht is not supported, which makes the sta rates fallback to
legacy as no ht rate gets enabled.
Fixes: 9208247d74 ("mac80211: minstrel_ht: add basic support for VHT rates <= 3SS@80MHz")
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
All the callers of ieee80211_mgd_probe_ap_send return right
after they call the flush() callback. This means that calling
flush() is uneeded since its meaning is to wait until the
queues of the device are empty.
Devices that know how to report status on Tx will do so using
the regular path (ieee80211_tx_status) and this status will
trigger the continuation of the flow of the probe
(ieee80211_sta_tx_notify).
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This is useful when creating virtual interfaces.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This is useful when creating virtual interfaces.
Keeps udev from mucking with things it shouldn't, since
the default MAC is never seen by udev when specified on
the cmd-line during creation.
Signed-off-by: Ben Greear <greearb@candelatech.com>
[check for feature flag in nl80211 to force drivers to set it]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This will be helpful when using the mac80211_hwsim
wiphys and automated testing. Let user create the
vifs as needed, and named as expected.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Support creating wiphy devices with an optional name.
This will be used by hwsim to have better automated control
over virtual radio creation/deletion.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Kernel will attempt to use the name if it is supplied,
but if name cannot be used for some reason, the default
phyX name will be used instead.
Signed-off-by: Ben Greear <greearb@candelatech.com>
[while at it, use wiphy_name() instead of dev_name(),
fix format string issue reported by Kees Cook]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Do not reuse skb if it was pfmemalloc tainted, otherwise
future frame might be dropped anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the worker information struct out of skb control block.
Instead control block we declare it static inside of tx.c file. We can do
that, because the worker can't be used twice at the same time. It's
protected by stop and wake netdev queue.
This patch fix an issue that the "struct ieee802154_xmit_cb" doesn't fit
into the skb control block on some kernel configuartion reported by
kbuild test robot.
It was introduced by commit fe24371d66
("mac802154: tx: remove kmalloc in xmit hotpath").
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch changes the naming convention of the tx functions like
mac80211. Just with an 802154 instead 80211 inside the name.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the stats increment of successful transmitted packets
in the right place when the skb was really successful transmitted.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch replace the pr_foo printout function to netdev_foo printout
function. Inside the xmit handling, the interface is already known.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch holds rtnl lock while sync xmit inside of workqueue.
Otherwise we could down the interface while worker xmit handling.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch renames the existsing xmit callback to xmit_sync and
introduces an asynchronous xmit_async function. If ieee802154_ops
doesn't provide the xmit_async callback, then we have a fallback to
the xmit_sync callback.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In case of an error we should call kfree_skb instead of consume_skb which
is called by ieee802154_xmit_complete function.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch uses the queue utility helpers inside the xmit worker of
mac802154 subsystem.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a new file net/mac802154/util.c which contains utility
functions for drivers, etc. This file contains functions to start and
stop queues for all virtual interfaces, this is useful for asynchronous
handling by driver level.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the channel hopping feature before xmit. There are
several issues to provide a real channel hopping (timing requirements,
etc...).
We don't have any known kernelspace protocol which really use this
feature. And I don't know an real user of this feature.
We simply drop this feature now.
This patch removes also the hold of pib lock which isn't needed by any
real driver xmit callback implementation.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduce some new stack variables to avoid multiple
dereferencing inside the xmit worker function.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the kmalloc allocation for workqueue data. This patch
replaces the kmalloc and uses the control block of skb. The control block
has enough space and isn't use by any other layer in this case.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the netdev xmit callback functions into the tx.c file.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
percpu tcp_md5sig_pool contains memory blobs that ultimately
go through sg_set_buf().
-> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
This requires that whole area is in a physically contiguous portion
of memory. And that @buf is not backed by vmalloc().
Given that alloc_percpu() can use vmalloc() areas, this does not
fit the requirements.
Replace alloc_percpu() by a static DEFINE_PER_CPU() as tcp_md5sig_pool
is small anyway, there is no gain to dynamically allocate it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 765cf9976e ("tcp: md5: remove one indirection level in tcp_md5sig_pool")
Reported-by: Crestez Dan Leonard <cdleonard@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver_ops callback function is never used by any driver.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Small rename to use the name workqueue than dev_workqueue. To bring the
same naming convention like wireless into 802.15.4.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This function adds a wrapper to call netdev_priv to getting the sdata
attribute. This is similar like the IEEE80211_DEV_TO_SUB_IF function
inside wireless stack implementation.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch replace the mac802154_to_priv macro with a static inline
function named hw_to_local. This brings a similar naming convention like
mac80211 stack.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch renamens the slaves attribute in sdata to interfaces and
slaves_mtx to iflist_mtx. This is similar like the mac80211 stack naming
convention.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch renames the hw attribute in struct ieee802154_sub_if_data to
local. This avoid confusing with the struct ieee802154_hw hw; inside of
local struct.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Like wireless this structure should named ieee802154_sub_if_data and not
mac802154_sub_if_data. This patch renames the struct and variables to
sdata instead priv sometimes.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch rename the mac802154_priv to ieee802154_local. The
mac802154_priv structure is like ieee80211_local and so we name it
ieee802154_local.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The identical struct of the wireless stack implementation is named
ieee80211_hw. This is useful to name the variable hw instead of get
confusing with netdev dev variable.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the ieee802154 header into include/linux instead
include/net. Similar like wireless which have the ieee80211 header
inside of include/linux.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Like the wireless core.c file this file contains function for phy
allocation and freeing. Move this file to core.c to get similar
behaviour.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The wpan-phy header contains the wpan_phy struct information. Later this
header will be have similar function like cfg80211 header. The cfg80211
header contains the wiphy struct which is identically the wpan_phy
struct inside 802.15.4 subsystem.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The wpan.c file contains the interface handling functions now. It's similar
like the mac80211 iface.c file. This patch renames this file to iface.c to
have similar naming convention in mac802154.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the mac802154.h internal header to ieee802154_i.h like
the wireless stack ieee80211_i.h file. This avoids confusing with the
not internal header include/net/mac802154.h header. Additional we get
the same naming conversion like mac80211 for this file.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The ieee802154_dev functionality contains various function for
allocation and registration of an ieee802154_dev. This is equal to the
net/mac80211/main.c file. This patch rename the ieee802154_dev.c to
main.c to have the same behaviour.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds self-tests for the c1 and s1 crypto functions used for
SMP pairing. The data used is the sample data from the core
specification.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a basic skeleton for SMP self-tests. The tests are put
behind a new configuration option since running them will slow down the
boot process. For now there are no actual tests defined but those will
come in a subsequent patch.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In order to make unit testing possible we need to make the SMP crypto
functions only take the crypto context instead of the full SMP context
(the latter would require having hci_dev, hci_conn, l2cap_chan,
l2cap_conn, etc around). The drawback is that we no-longer get the
involved hdev in the debug logs, but this is really the only way to make
simple unit tests for the code.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
use clkoff_cp for hci_cp_read_clock_offset instead of cp
(already defined above).
Suggested-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes an unnecessary tailing semicolon after macro
define. Otherwise we get a trailing semicolon while using this macro.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fixs a typo in address filter defines from IEEE802515 to
IEEE802154.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fix a typo and fix align instead allign.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the FSF address in files which belongs to ieee802154
and mac802154.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In the error case where credits is greater than max_credits there
is a missing l2cap_chan_unlock before returning.
Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Tested-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Currently there are potentially 2 skb_copy_expand calls in IPHC
decompression. This patch replaces this with one call to
skb_cow which will check to see if there is enough headroom
first to ensure it's only done if necessary and will handle
alignment issues for cache.
As skb_cow uses pskb_expand_head we ensure the skb isn't shared from
bluetooth and ieee802.15.4 code that use the IPHC decompression.
Signed-off-by: Martin Townsend <martin.townsend@xsilon.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
netif_rx() only returns NET_RX_DROP and NET_RX_SUCCESS, not returns
negative value
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Value returned by this macro might be used as bit value so it should
return either 0 or 1 to avoid possible bugs (similar to NSC bug)
when shifting it.
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
rfcomm_send_nsc expects CR to be either 0 or 1 since it is later
passed to __mcc_type macro and shitfed. Unfortunatelly CR extracted
from received frame type was not sanitized and shifted value was passed
resulting in bogus response.
Note: shifted value was also passed to other functions but was used
only in if satements so this bug appears only for NSC case.
The CR bit in the value octet shall be set to the same value
as the CR bit in the type field octet of the not supported command
frame but the CR bit for NCS response should be set to 0 since it is
always a response.
This was affecting TC_RFC_BV_25_C PTS qualification test.
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Systematically removing the LE connection parameters and autoconnect
action is inconvenient for rebonding without disconnecting from
userland (i.e. unpairing followed by repairing without
disconnecting). The parameters will be lost after unparing and
userland needs to take care of book-keeping them and re-adding them.
This patch allows userland to forget about parameter management when
rebonding without disconnecting. It defers clearing the connection
parameters when unparing without disconnecting, giving a chance of
keeping the parameters if a repairing happens before the connection is
closed.
Signed-off-by: Alfonso Acosta <fons@spotify.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This patch ensure that the rtnl lock is hold while newlink callback.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch improves the packet registration handling. Instead of
registration with module init we have a open count variable and
registration the lowpan packet handler when it's needed.
The open count variable should be protected by RTNL.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
NULL-checking conn->dev_class is pointless since the variable is
defined as an array, i.e. it will always be non-NULL.
Signed-off-by: Alfonso Acosta <fons@spotify.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
There are scenarios when autoconnecting to a device after the
reception of an ADV_IND report (action 0x02), in which userland
might want to examine the report's contents.
For instance, the Service Data might have changed and it would be
useful to know ahead of time before starting any GATT procedures.
Also, the ADV_IND may contain Manufacturer Specific data which would
be lost if not propagated to userland. In fact, this patch results
from the need to rebond with a device lacking persistent storage which
notifies about losing its LTK in ADV_IND reports.
This patch appends the ADV_IND report which triggered the
autoconnection to the EIR Data in the Device Connected event.
Signed-off-by: Alfonso Acosta <fons@spotify.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The values of a lot of the mgmt_device_connected() parameters come
straight from a hci_conn object. We can simplify the function by passing
the full hci_conn pointer to it.
Signed-off-by: Alfonso Acosta <fons@spotify.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This patch fix ERR_PTR(-rc) to ERR_PTR(rc). The variable rc is already a
negative errno value.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fix byte order handling in reassembly code of 802.15.4
6LoWPAN fragmentation handling.
net/ieee802154/reassembly.c:58:43: warning: restricted __be16 degrades to integer
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fix byteorder issues with fragment tag of generation 802.15.4
6LoWPAN fragment header.
net/ieee802154/6lowpan_rtnl.c:278:54: warning restricted __be16 degrades to integer
net/ieee802154/6lowpan_rtnl.c:278:18: warning: incorrect type in assignment (different base types)
net/ieee802154/6lowpan_rtnl.c:278:18: expected restricted __be16 [usertype] frag_tag
net/ieee802154/6lowpan_rtnl.c:278:18: got unsigned short
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There is no point processing pkts which are PACKET_OTHERHOST
in 6lowpan as they are discarded as soon as they reach the
ipv6 layer. Therefore we should drop them in the 6lowpan layer.
Signed-off-by: Simon Vincent <simon.vincent@xsilon.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The kernel should reserve enough room in the skb so that the DONE
message can always be appended. However, in case of e.g. new attribute
erronously not being size-accounted for, __nfulnl_send() will still
try to put next nlmsg into this full skbuf, causing the skb to be stuck
forever and blocking delivery of further messages.
Fix issue by releasing skb immediately after nlmsg_put error and
WARN() so we can track down the cause of such size mismatch.
[ fw@strlen.de: add tailroom/len info to WARN ]
Signed-off-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work.
The nla length includes the size of the nla struct, so anything larger
results in u16 integer overflow.
This patch is similar to
9cefbbc9c8 (netfilter: nfnetlink_queue: cleanup copy_range usage).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We currently neither account for the nlattr size, nor do we consider
the size of the trailing NLMSG_DONE when allocating nlmsg skb.
This can result in nflog to stop working, as __nfulnl_send() re-tries
sending forever if it failed to append NLMSG_DONE (which will never
work if buffer is not large enough).
Reported-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 462fb2af97
bridge : Sanitize skb before it enters the IP stack
broke when IP options are actually used because it mangles the
skb as if it entered the IP stack which is wrong because the
bridge is supposed to operate below the IP stack.
Since nobody has actually requested for parsing of IP options
this patch fixes it by simply reverting to the previous approach
of ignoring all IP options, i.e., zeroing the IPCB.
If and when somebody who uses IP options and actually needs them
to be parsed by the bridge complains then we can revisit this.
Reported-by: David Newall <davidn@davidnewall.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch save the fn before doing rt6_backtrack.
Hence, without redo-ing the fib6_lookup(), saved_fn can be used
to redo rt6_select() with RT6_LOOKUP_F_REACHABLE off.
Some minor changes I think make sense to review as a single patch:
* Remove the 'out:' goto label.
* Remove the 'reachable' variable. Only use the 'strict' variable instead.
After this patch, "failing ip6_ins_rt()" should be the only case that
requires a redo of fib6_lookup().
Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When there is a RTF_CACHE hit, no need to redo fib6_lookup()
with reachable=0.
Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is the prep work to reduce the number of calls to fib6_lookup().
The BACKTRACK macro could be hard-to-read and error-prone due to
its side effects (mainly goto).
This patch is to:
1. Replace BACKTRACK macro with a function (fib6_backtrack) with the following
return values:
* If it is backtrack-able, returns next fn for retry.
* If it reaches the root, returns NULL.
2. The caller needs to decide if a backtrack is needed (by testing
rt == net->ipv6.ip6_null_entry).
3. Rename the goto labels in ip6_pol_route() to make the next few
patches easier to read.
Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove trailing whitespace in tcp.h icmp.c syncookies.c
Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow drivers to iterate all stations currently uploaded to them.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
During reconfig the station list is traversed in order and station are
added back to the driver. Make sure the stations are added to the driver
in the same order they were added to mac80211.
This has a real side effect - some drivers (iwlwifi) require TDLS
stations to be added only after the AP station for the same network.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some drivers need to know which station is the TDLS link initiator.
Expose this value via the mac80211 ieee80211_sta structure.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When an IV is generated, only the MAC header is moved back. The network
header location remains the same relative to the skb head, as the new IV
is using headroom space that was reserved in advance.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Export ieee80211_ie_split function, so it can be reused by
drivers which need to insert additional elements.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When displaying a rate through debugfs minstrel_ht guesses its flags
comparing group indexes. Since 3ec373c421 ("mac80211: minstrel_ht:
include type (cck/ht) in rates flag"), the rate flags of interest are
present in the mcs_group-s, so use it.
While improving the code, this also fixes a smatch false positive
"error: testing array offset 'i' after use" in minstrel_ht_stats_dump.
This warning only triggers after 9208247d74 ("mac80211: minstrel_ht:
add basic support for VHT rates <= 3SS@80MHz") with
CONFIG_MAC80211_RC_MINSTREL_VHT unset because then MINSTREL_VHT_GROUP_0
is above MINSTREL_GROUPS_NB and smatch only barks when the "testing
array offset" seems to prevent possible out of bonds accesses (which
does not happen here since i < ARRAY_SIZE(mi->groups)).
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Thanks to Andrea Arcangeli for pointing out these checks are
obviously unnecessary given the preceding calculations.
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The commit "net: Save TX flow hash in sock and set in skbuf on xmit"
introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate
and record flow hash(sk_txhash) in the socket structure. sk_txhash is used
to set skb->hash which is used to spread flows across multiple TXQs.
But, the above routines are invoked before the source port of the connection
is created. Because of this all outgoing connections that just differ in the
source port get hashed into the same TXQ.
This patch fixes this problem for IPv4/6 by invoking the the above routines
after the source port is available for the socket.
Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit")
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pskb_may_pull() maybe change skb->data and make nh and exthdr pointer
oboslete, so recompute the nd and exthdr
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The crafted header start address is from a driver supplied buffer, which
one can reasonably expect to be aligned on a 4-bytes boundary.
However ATM the TSO helper API is only used by ethernet drivers and
the tcp header will then be aligned to a 2-bytes only boundary from the
header start address.
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ->ip_set_list[] array is initialized in ip_set_net_init() and it
has ->ip_set_max elements so this check should be >= instead of >
otherwise we are off by one.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When a port that was used to listen for inbound connections gets closed
and reused for outgoing connections (like rsh ends up doing for stderr
flow), current we may reject the SYN/ACK packet for the new connection
because tcp_conntracks states forbirds a port to become a client while
there is still a TIME_WAIT entry in there for it.
As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout
for it is 120s, there is a ~60s window that the application can end up
opening a port that conntrack will end up blocking.
This patch fixes this by simply allowing such state transition: if we
see a SYN, in TIME_WAIT state, on REPLY direction, move it to sSS. Note
that the rest of the code already handles this situation, more
specificly in tcp_packet(), first switch clause.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When a key is tainted during resume, it is no longer programmed
into the device; however, it's uploaded flag may (will) be set.
Clear the flag when not programming it because it's tainted to
avoid attempting to remove it again later.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use the currently existing APIs between mac80211 and the low
level driver to implement WMM admission control.
The low level driver needs to report the media time used by
each transmitted packet in ieee80211_tx_status. Based on that
information, mac80211 will modify the QoS parameters of the
admission controlled Access Category when the limit is
reached. Once the original QoS parameters can be restored,
mac80211 will do so.
One issue with this approach is that management frames will
also erroneously be downgraded, but the upside is that the
implementation is simple. In the future, it can be extended
to driver- or device-based implementations that are better.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's no reason to ever set invalid CW_min/CW_max to the
drivers, we should catch it in higher layers. However, the
consequences of setting it wrong can be quite severe, so
double-check at a low level and error out for invalid data.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
During the review of the corresponding wpa_supplicant patches we
noticed that the only way for it to detect that this functionality
is supported currently is to check for the command support. This
can be misleading though, as the command was also designed to, in
the future, support pure 802.11 TSPECs.
Expose the WMM-TSPEC feature flag to nl80211 so later we can also
expose an 802.11-TSPEC feature flag (if needed) to differentiate
the two cases.
Note: this change isn't needed in 3.18 as there's no driver there
yet that supports the functionality at all.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize syncp.
Fixes: 22e0f8b932 "net: sched: make bstats per cpu and estimator RCU safe"
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The synchronize_rcu() in netlink_release() introduces unacceptable
latency. Reintroduce minimal lookup so we can drop the
synchronize_rcu() until socket destruction has been RCUfied.
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Locking dependency detected below possible unsafe locking scenario:
CPU0 CPU1
T0: tipc_named_rcv() tipc_rcv()
T1: [grab nametble write lock]* [grab node lock]*
T2: tipc_update_nametbl() tipc_node_link_up()
T3: tipc_nodesub_subscribe() tipc_nametbl_publish()
T4: [grab node lock]* [grab nametble write lock]*
The opposite order of holding nametbl write lock and node lock on
above two different paths may result in a deadlock. If we move the
the updating of the name table after link state named out of node
lock, the reverse order of holding locks will be eliminated, and
as a result, the deadlock risk.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The debugfs_remove() function can safely take NULL parameters
so the additionally null test isn't required, and there's no
other reason to have it here, so remove it.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
[rewrite commit message, re-introduce blank line after assert]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the new CONFIG_MAC80211_RC_MINSTREL_VHT is not set (default 'N'),
there is no behavioral change including in sampling and MCS_GROUP_RATES
remains 8.
Otherwise MCS_GROUP_RATES is 10, and a module parameter *vht_only*
(default 'true'), restricts the rates selection to VHT when VHT is
supported.
Regarding the debugfs stats buffer:
It is explicitly increased from 8k to 32k to fit every rates incl. when
both HT and VHT rates are enabled, as for the format, before:
type rate tpt eprob *prob ret *ok(*cum) ok( cum)
HT20/LGI ABCDP MCS0 0.0 0.0 0.0 1 0( 0) 0( 0)
after:
type rate tpt eprob *prob ret *ok(*cum) ok( cum)
HT20/LGI ABCDP MCS0 0.0 0.0 0.0 1 0( 0) 0( 0)
VHT40/LGI MCS5/2 0.0 0.0 0.0 0 0( 0) 0( 0)
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ATM, we grep cck rates idx with idx / MCS_GROUP_RATES ==
MINSTREL_CCK_GROUP.
Matching neither-cck-non-ht rates could be done by replacing '==' with
'>', however it would be less versatile or explicit.
This will allow to match VHT rates with IEEE80211_TX_RC_VHT_MCS.
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
No functional change.
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since 5935839ad7 ("mac80211: improve minstrel_ht rate sorting by
throughput & probability"), the rate indexes are manipulated via u8's
and hence allow for a maximum of 256 mcs_group entries in
minstrel_mcs_groups.
ATM, minstrel_ht advertizes support up to 3HTSS@40MHz, consuming:
8(MCS_GROUP_RATES) * (3(SS)*2(GI)*2(BW)+1(CCK)), i.e. 104 entries.
Support for 3VHTSS@80MHz will require:
10(MCS_GROUP_RATES) * (3(SS)*2(GI)*2(BW)+1(CCK)) +
10(MCS_GROUP_RATES) * (3(SS)*2(GI)*3(BW)), i.e. 130 + 180 entries.
This change moves from u8s to u16s where necessary.
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This showed up as a sparse warning (with higher verbosity) and is
certainly correct - the change flags should be unsigned. It's not
that important since high flag numbers aren't used and bitwise
operations would still work.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
if ->encapsulation is set we have to use inner_tcp_hdrlen and add the
size of the inner network headers too.
This is 'mostly harmless'; tbf might send skb that is slightly over
quota or drop skb even if it would have fit.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL. This can happen when GSO is used for header verification.
However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.
Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.
However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.
It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_gso_segment() has a 'features' argument representing offload features
available to the output path.
A few handlers, e.g. GRE, instead re-fetch the features of skb->dev and use
those instead of the provided ones when handing encapsulation/tunnels.
Depending on dev->hw_enc_features of the output device skb_gso_segment() can
then return NULL even when the caller has disabled all GSO feature bits,
as segmentation of inner header thinks device will take care of segmentation.
This e.g. affects the tbf scheduler, which will silently drop GRE-encap GSO skbs
that did not fit the remaining token quota as the segmentation does not work
when device supports corresponding hw offload capabilities.
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter fixes for net
The following patchset contains netfilter fixes for your net tree,
they are:
1) Fix missing MODULE_LICENSE() in the new nf_reject_ipv{4,6} modules.
2) Restrict nat and masq expressions to the nat chain type. Otherwise,
users may crash their kernel if they attach a nat/masq rule to a non
nat chain.
3) Fix hook validation in nft_compat when non-base chains are used.
Basically, initialize hook_mask to zero.
4) Make sure you use match/targets in nft_compat from the right chain
type. The existing validation relies on the table name which can be
avoided by
5) Better netlink attribute validation in nft_nat. This expression has
to reject the configuration when no address and proto configurations
are specified.
6) Interpret NFTA_NAT_REG_*_MAX if only if NFTA_NAT_REG_*_MIN is set.
Yet another sanity check to reject incorrect configurations from
userspace.
7) Conditional NAT attribute dumping depending on the existing
configuration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The optional NL80211_ATTR_MGMT_SUBTYPE and NL80211_ATTR_REASON_CODE
attributes can now be included in NL80211_CMD_DEL_STATION to indicate to
the driver which frame (Deauthentication/Disassociation) and reason code
in that frame should be used to indicate removal to the specific
station. This is used by drivers that implement AP SME and generate
those frames internally.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ATM an HT rc_stats line is 106 chars.
Times 8(MCS_GROUP_RATES)*3(SS)*2(GI)*2(BW) + CCK(4), i.e. x100, this is
well above the current 8192 - sizeof(*ms) currently allocated.
Fix this by squeezing the output as follows (not that we're short on
memory but this also improves readability and range, the new format adds
one more digit to *ok/*cum and ok/cum):
- Before (HT) (106 ch):
type rate throughput ewma prob this prob retry this succ/attempt success attempts
CCK/LP 5.5M 0.0 0.0 0.0 0 0( 0) 0 0
HT20/LGI ABCDP MCS0 0.0 0.0 0.0 1 0( 0) 0 0
- After (75 ch):
type rate tpt eprob *prob ret *ok(*cum) ok( cum)
CCK/LP 5.5M 0.0 0.0 0.0 0 0( 0) 0( 0)
HT20/LGI ABCDP MCS0 0.0 0.0 0.0 1 0( 0) 0( 0)
- Align non-HT format Before (non-HT) (83 ch):
rate throughput ewma prob this prob this succ/attempt success attempts
ABCDP 6 0.0 0.0 0.0 0( 0) 0 0
54 0.0 0.0 0.0 0( 0) 0 0
- After (61 ch):
rate tpt eprob *prob *ok(*cum) ok( cum)
ABCDP 1 0.0 0.0 0.0 0( 0) 0( 0)
54 0.0 0.0 0.0 0( 0) 0( 0)
*This also adds dynamic checks for overflow, lowers the size of the
non-HT request (allowing > 30 entries) and replaces the buddy-rounded
allocations (s/sizeof(*ms) + 8192/8192).
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This makes it easier to add new parameters for the del_station calls
without having to modify all drivers that use this.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull networking fixes from David Miller:
"A quick batch of bug fixes:
1) Fix build with IPV6 disabled, from Eric Dumazet.
2) Several more cases of caching SKB data pointers across calls to
pskb_may_pull(), thus referencing potentially free'd memory. From
Li RongQing.
3) DSA phy code tests operation presence improperly, instead of going:
if (x->ops->foo)
r = x->ops->foo(args);
it was going:
if (x->ops->foo(args))
r = x->ops->foo(args);
Fix from Andew Lunn"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
Net: DSA: Fix checking for get_phy_flags function
ipv6: fix a potential use after free in sit.c
ipv6: fix a potential use after free in ip6_offload.c
ipv4: fix a potential use after free in gre_offload.c
tcp: fix build error if IPv6 is not enabled
The check for the presence or not of the optional switch function
get_phy_flags() called the function, rather than checked to see if it
is a NULL pointer. This causes a derefernce of a NULL pointer on all
switch chips except the sf2, the only switch to implement this call.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 6819563e64 ("net: dsa: allow switch drivers to specify phy_device::dev_flags")
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
been sitting in MST's tree during my vacation. I changed a function name
and made one trivial change, then they spent two days in linux-next.
Thanks,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUQFBQAAoJENkgDmzRrbjxJRIP/1yCQRElQewxURSmJelyqCdU
0mHYB0R9Mf3tfre1xnofqs2lWeSMc/4ptKHsVR6pupoztSwnz7HsLHfEFvFJh4mj
KsaqYElxkNxTcfyHwLjyJS0/J6tG1tYypXGiimTBS0bvFHL3XZdimVgJ6WvX+gO7
YSaDEX8/EqCERafslS5+gKJlz3drDOnCZCe9y4BDSmsvl2k7bkpSxIn8vsR6jIC0
c5JpUy6QVF+3XA/J932M7yRs+xpqxNoUWiyY3ar9o3CtQAaQB0ZAetSxY6hTfvVc
GlNFzCifdsaQwsl2SVsE2h6tWaRhtMtcGWQuhHThIPyIf8XxhYyBRY2FLo70LMz1
eqtwy6F/Bg/nzUsdee4PZBMeoKHlAEL12RpsEKgfUoLzj16Aqa8ll+Agbglbkw8G
f3d2FwzKAlpY5NwHETC1wYy52PJ3efqksRWuhokmYpxNSbHJS/lsiJOE7272/4Qr
MtXuvRmo22tf34XFd5y7zqWjgZ58eeFOqQWi/K+6ZgpqVOvikjrXXKEuiVdjO0ZD
kTVR/sQKiR+79rzENk80XBhWaMveECNXF1TiZ/3MmURkmEOBRQMxRQ20BX3exvna
AJ/WVA5DcfXZc1yyqknE1NLGrvSBMJENH13x2QPwrqNWAryOOKuF1VKKIwWlDw5j
vtx5nXiJa8YYdxI2TJCN
=JK6x
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"One cc: stable commit, the rest are a series of minor cleanups which
have been sitting in MST's tree during my vacation. I changed a
function name and made one trivial change, then they spent two days in
linux-next"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
virtio-rng: refactor probe error handling
virtio_scsi: drop scan callback
virtio_balloon: enable VQs early on restore
virtio_scsi: fix race on device removal
virito_scsi: use freezable WQ for events
virtio_net: enable VQs early on restore
virtio_console: enable VQs early on restore
virtio_scsi: enable VQs early on restore
virtio_blk: enable VQs early on restore
virtio_scsi: move kick event out from virtscsi_init
virtio_net: fix use after free on allocation failure
9p/trans_virtio: enable VQs early
virtio_console: enable VQs early
virtio_blk: enable VQs early
virtio_net: enable VQs early
virtio: add API to enable VQs early
virtio_net: minor cleanup
virtio-net: drop config_mutex
virtio_net: drop config_enable
virtio-blk: drop config_mutex
...
pskb_may_pull() maybe change skb->data and make iph pointer oboslete,
fix it by geting ip header length directly.
Fixes: ca15a078 (sit: generate icmpv6 error when receiving icmpv4 error)
Cc: Oussama Ghorbel <ghorbel@pivasoftware.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pskb_may_pull() maybe change skb->data and make opth pointer oboslete,
so set the opth again
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pskb_may_pull() may change skb->data and make greh pointer oboslete;
so need to reassign greh;
but since first calling pskb_may_pull already ensured that skb->data
has enough space for greh, so move the reference of greh before second
calling pskb_may_pull(), to avoid reassign greh.
Fixes: 7a7ffbabf9("ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC")
Cc: Wei-Chun Chao <weichunc@plumgrid.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Include fixes for netrom and dsa (Fabian Frederick and Florian
Fainelli)
2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO.
3) Several SKB use after free fixes (vxlan, openvswitch, vxlan,
ip_tunnel, fou), from Li ROngQing.
4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy.
5) Use after free in virtio_net, from Michael S Tsirkin.
6) Fix flow mask handling for megaflows in openvswitch, from Pravin B
Shelar.
7) ISDN gigaset and capi bug fixes from Tilman Schmidt.
8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin.
9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov.
10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong
Wang and Eric Dumazet.
11) Don't overwrite end of SKB when parsing malformed sctp ASCONF
chunks, from Daniel Borkmann.
12) Don't call sock_kfree_s() with NULL pointers, this function also has
the side effect of adjusting the socket memory usage. From Cong Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
bna: fix skb->truesize underestimation
net: dsa: add includes for ethtool and phy_fixed definitions
openvswitch: Set flow-key members.
netrom: use linux/uaccess.h
dsa: Fix conversion from host device to mii bus
tipc: fix bug in bundled buffer reception
ipv6: introduce tcp_v6_iif()
sfc: add support for skb->xmit_more
r8152: return -EBUSY for runtime suspend
ipv4: fix a potential use after free in fou.c
ipv4: fix a potential use after free in ip_tunnel_core.c
hyperv: Add handling of IP header with option field in netvsc_set_hash()
openvswitch: Create right mask with disabled megaflows
vxlan: fix a free after use
openvswitch: fix a use after free
ipv4: dst_entry leak in ip_send_unicast_reply()
ipv4: clean up cookie_v4_check()
ipv4: share tcp_v4_save_options() with cookie_v4_check()
ipv4: call __ip_options_echo() in cookie_v4_check()
atm: simplify lanai.c by using module_pci_driver
...
Interpret NFTA_NAT_REG_ADDR_MAX if NFTA_NAT_REG_ADDR_MIN is present,
otherwise, skip it. Same thing with NFTA_NAT_REG_PROTO_MAX.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We have to validate that we at least get an NFTA_NAT_REG_ADDR_MIN or
NFTA_NFT_REG_PROTO_MIN attribute. Reject the configuration if none
of them are present.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We have to validate the real chain type to ensure that matches/targets
are not used out from their scope (eg. MASQUERADE in nat chain type).
The existing validation relies on the table name, but this is not
sufficient since userspace can fool us by using the appropriate table
name with a different chain type.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
net/dsa/slave.c uses functions and structures declared in phy_fixed.h
but does not explicitely include it, while dsa.h needs structure
declarations for 'struct ethtool_wolinfo' and 'struct ethtool_eee', fix
those by including the correct header files.
Fixes: ec9436baed ("net: dsa: allow drivers to do link adjustment")
Fixes: ce31b31c68 ("net: dsa: allow updating fixed PHY link information")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds missing memset which are required to initialize
flow key member. For example for IP flow we need to initialize
ip.frag for all cases.
Found by inspection.
This bug is introduced by commit 0714812134
("openvswitch: Eliminate memset() from flow_extract").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit ec8a2e5621 ("tipc: same receive
code path for connection protocol and data messages") we omitted the
the possiblilty that an arriving message extracted from a bundle buffer
may be a multicast message. Such messages need to be to be delivered to
the socket via a separate function, tipc_sk_mcast_rcv(). As a result,
small multicast messages arriving as members of a bundle buffer will be
silently dropped.
This commit corrects the error by considering this case in the function
tipc_link_bundle_rcv().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line
misses") added a regression for SO_BINDTODEVICE on IPv6.
This is because we still use inet6_iif() which expects that IP6 control
block is still at the beginning of skb->cb[]
This patch adds tcp_v6_iif() helper and uses it where necessary.
Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif
parameter to it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pskb_may_pull() maybe change skb->data and make uh pointer oboslete,
so reload uh and guehdr
Fixes: 37dd0247 ("gue: Receive side for Generic UDP Encapsulation")
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pskb_may_pull() maybe change skb->data and make eth pointer oboslete,
so set eth after pskb_may_pull()
Fixes:3d7b46cd("ip_tunnel: push generic protocol handling to ip_tunnel module")
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If megaflows are disabled, the userspace does not send the netlink attribute
OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask.
sw_flow_mask_set() sets every bytes (in 'range') of the mask to 0xff, even the
bytes that represent padding for struct sw_flow, or the bytes that represent
fields that may not be set during ovs_flow_extract().
This is a problem, because when we extract a flow from a packet,
we do not memset() anymore the struct sw_flow to 0.
This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(),
which operates on the netlink attributes rather than on the mask key. Using
this approach we are sure that only the bytes that the user provided in the
flow are matched.
Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we
now return with an error.
This bug is introduced by commit 0714812134
("openvswitch: Eliminate memset() from flow_extract").
Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pskb_may_pull() called by arphdr_ok can change skb->data, so put the arp
setting after arphdr_ok to avoid the use the freed memory
Fixes: 0714812134 ("openvswitch: Eliminate memset() from flow_extract.")
Cc: Jesse Gross <jesse@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_setup_cork() called inside ip_append_data() steals dst entry from rt to cork
and in case errors in __ip_append_data() nobody frees stolen dst entry
Fixes: 2e77d89b2f ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
Signed-off-by: Vasily Averin <vvs@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can retrieve opt from skb, no need to pass it as a parameter.
And opt should always be non-NULL, no need to check.
Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cookie_v4_check() allocates ip_options_rcu in the same way
with tcp_v4_save_options(), we can just make it a helper function.
Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
missed that cookie_v4_check() still calls ip_options_echo() which uses
IPCB(). It should use TCPCB() at TCP layer, so call __ip_options_echo()
instead.
Fixes: commit 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All functions used struct vport *vport except
ovs_vport_find_upcall_portid.
This fixes 1 kerneldoc warning
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
s/sock/gs
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ndo_gso_check which a device can define to indicate whether is
is capable of doing GSO on a packet. This funciton would be called from
the stack to determine whether software GSO is needed to be done. A
driver should populate this function if it advertises GSO types for
which there are combinations that it wouldn't be able to handle. For
instance a device that performs UDP tunneling might only implement
support for transparent Ethernet bridging type of inner packets
or might have limitations on lengths of inner headers.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull percpu consistent-ops changes from Tejun Heo:
"Way back, before the current percpu allocator was implemented, static
and dynamic percpu memory areas were allocated and handled separately
and had their own accessors. The distinction has been gone for many
years now; however, the now duplicate two sets of accessors remained
with the pointer based ones - this_cpu_*() - evolving various other
operations over time. During the process, we also accumulated other
inconsistent operations.
This pull request contains Christoph's patches to clean up the
duplicate accessor situation. __get_cpu_var() uses are replaced with
with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().
Unfortunately, the former sometimes is tricky thanks to C being a bit
messy with the distinction between lvalues and pointers, which led to
a rather ugly solution for cpumask_var_t involving the introduction of
this_cpu_cpumask_var_ptr().
This converts most of the uses but not all. Christoph will follow up
with the remaining conversions in this merge window and hopefully
remove the obsolete accessors"
* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
irqchip: Properly fetch the per cpu offset
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
Revert "powerpc: Replace __get_cpu_var uses"
percpu: Remove __this_cpu_ptr
clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
sparc: Replace __get_cpu_var uses
avr32: Replace __get_cpu_var with __this_cpu_write
blackfin: Replace __get_cpu_var uses
tile: Use this_cpu_ptr() for hardware counters
tile: Replace __get_cpu_var uses
powerpc: Replace __get_cpu_var uses
alpha: Replace __get_cpu_var
ia64: Replace __get_cpu_var uses
s390: cio driver &__get_cpu_var replacements
s390: Replace __get_cpu_var uses
mips: Replace __get_cpu_var uses
MIPS: Replace __get_cpu_var uses in FPU emulator.
arm: Replace __this_cpu_ptr with raw_cpu_ptr
...
Pull Ceph updates from Sage Weil:
"There is the long-awaited discard support for RBD (Guangliang Zhao,
Josh Durgin), a pile of RBD bug fixes that didn't belong in late -rc's
(Ilya Dryomov, Li RongQing), a pile of fs/ceph bug fixes and
performance and debugging improvements (Yan, Zheng, John Spray), and a
smattering of cleanups (Chao Yu, Fabian Frederick, Joe Perches)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
ceph: fix divide-by-zero in __validate_layout()
rbd: rbd workqueues need a resque worker
libceph: ceph-msgr workqueue needs a resque worker
ceph: fix bool assignments
libceph: separate multiple ops with commas in debugfs output
libceph: sync osd op definitions in rados.h
libceph: remove redundant declaration
ceph: additional debugfs output
ceph: export ceph_session_state_name function
ceph: include the initial ACL in create/mkdir/mknod MDS requests
ceph: use pagelist to present MDS request data
libceph: reference counting pagelist
ceph: fix llistxattr on symlink
ceph: send client metadata to MDS
ceph: remove redundant code for max file size verification
ceph: remove redundant io_iter_advance()
ceph: move ceph_find_inode() outside the s_mutex
ceph: request xattrs if xattr_version is zero
rbd: set the remaining discard properties to enable support
rbd: use helpers to handle discard for layered images correctly
...
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, but virtio 9p device
adds self to channel list within probe, at which point VQ can be
used in violation of the spec.
To fix, call virtio_device_ready before using VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
TCP Small queues tries to keep number of packets in qdisc
as small as possible, and depends on a tasklet to feed following
packets at TX completion time.
Choice of tasklet was driven by latencies requirements.
Then, TCP stack tries to avoid reorders, by locking flows with
outstanding packets in qdisc in a given TX queue.
What can happen is that many flows get attracted by a low performing
TX queue, and cpu servicing TX completion has to feed packets for all of
them, making this cpu 100% busy in softirq mode.
This became particularly visible with latest skb->xmit_more support
Strategy adopted in this patch is to detect when tcp_wfree() is called
from ksoftirqd and let the outstanding queue for this flow being drained
before feeding additional packets, so that skb->ooo_okay can be set
to allow select_queue() to select the optimal queue :
Incoming ACKS are normally handled by different cpus, so this patch
gives more chance for these cpus to take over the burden of feeding
qdisc with future packets.
Tested:
lpaa23:~# ./super_netperf 1400 --google-pacing-rate 3028000 -H lpaa24 -l 3600 &
lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:18 AM eth1 595448.00 1190564.00 38381.09 1760253.12 0.00 0.00 1.00
06:16:19 AM eth1 594858.00 1189686.00 38340.76 1758952.72 0.00 0.00 0.00
06:16:20 AM eth1 597017.00 1194019.00 38480.79 1765370.29 0.00 0.00 1.00
06:16:21 AM eth1 595450.00 1190936.00 38380.19 1760805.05 0.00 0.00 0.00
06:16:22 AM eth1 596385.00 1193096.00 38442.56 1763976.29 0.00 0.00 1.00
06:16:23 AM eth1 598155.00 1195978.00 38552.97 1768264.60 0.00 0.00 0.00
06:16:24 AM eth1 594405.00 1188643.00 38312.57 1757414.89 0.00 0.00 1.00
06:16:25 AM eth1 593366.00 1187154.00 38252.16 1755195.83 0.00 0.00 0.00
06:16:26 AM eth1 593188.00 1186118.00 38232.88 1753682.57 0.00 0.00 1.00
06:16:27 AM eth1 596301.00 1192241.00 38440.94 1762733.09 0.00 0.00 0.00
Average: eth1 595457.30 1190843.50 38381.69 1760664.84 0.00 0.00 0.50
lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog
backlog 7606336b 2513p requeues 167982
backlog 224072b 74p requeues 566
backlog 581376b 192p requeues 5598
backlog 181680b 60p requeues 1070
backlog 5305056b 1753p requeues 110166 // Here, this TX queue is attracting flows
backlog 157456b 52p requeues 1758
backlog 672216b 222p requeues 3025
backlog 60560b 20p requeues 24541
backlog 448144b 148p requeues 21258
lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect
Immediate jump to full bandwidth, and traffic is properly
shard on all tx queues.
lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:46 AM eth1 1397632.00 2795397.00 90081.87 4133031.26 0.00 0.00 1.00
06:16:47 AM eth1 1396874.00 2793614.00 90032.99 4130385.46 0.00 0.00 0.00
06:16:48 AM eth1 1395842.00 2791600.00 89966.46 4127409.67 0.00 0.00 1.00
06:16:49 AM eth1 1395528.00 2791017.00 89946.17 4126551.24 0.00 0.00 0.00
06:16:50 AM eth1 1397891.00 2795716.00 90098.74 4133497.39 0.00 0.00 1.00
06:16:51 AM eth1 1394951.00 2789984.00 89908.96 4125022.51 0.00 0.00 0.00
06:16:52 AM eth1 1394608.00 2789190.00 89886.90 4123851.36 0.00 0.00 1.00
06:16:53 AM eth1 1395314.00 2790653.00 89934.33 4125983.09 0.00 0.00 0.00
06:16:54 AM eth1 1396115.00 2792276.00 89984.25 4128411.21 0.00 0.00 1.00
06:16:55 AM eth1 1396829.00 2793523.00 90030.19 4130250.28 0.00 0.00 0.00
Average: eth1 1396158.40 2792297.00 89987.09 4128439.35 0.00 0.00 0.50
lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog
backlog 7900052b 2609p requeues 173287
backlog 878120b 290p requeues 589
backlog 1068884b 354p requeues 5621
backlog 996212b 329p requeues 1088
backlog 984100b 325p requeues 115316
backlog 956848b 316p requeues 1781
backlog 1080996b 357p requeues 3047
backlog 975016b 322p requeues 24571
backlog 990156b 327p requeues 21274
(All 8 TX queues get a fair share of the traffic)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike normal kfree() it is never right to call sock_kfree_s() with
a NULL pointer, because sock_kfree_s() also has the side effect of
discharging the memory from the sockets quota.
Signed-off-by: David S. Miller <davem@davemloft.net>
It is okay to free a NULL pointer but not okay to mischarge the socket optmem
accounting. Compile test only.
Reported-by: rucsoftsec@gmail.com
Cc: Chien Yen <chien.yen@oracle.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent cfusbl was used instead of first structure member 'layer'
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_nh_match does not match nexthops correctly. Example:
ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
nexthop via 192.168.122.15 dev eth0
Del command is successful and route is removed. After this patch
applied, the route is correctly matched and result is:
RTNETLINK answers: No such process
Please consider this for stable trees as well.
Fixes: 4e902c5741 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We worked hard to improve tcp_ack() performance, by not accessing
skb_shinfo() in fast path (cd7d8498c9 tcp: change tcp_skb_pcount()
location)
We still have one spurious access because of ACK timestamping,
added in commit e1c8a607b2 ("net-timestamp: ACK timestamp for
bytestreams")
By checking if sk_tsflags has SOF_TIMESTAMPING_TX_ACK set,
we can avoid two cache line misses for the common case.
While we are at it, add two prefetchw() :
One in tcp_ack() to bring skb at the head of write queue.
One in tcp_clean_rtx_queue() loop to bring following skb,
as we will delete skb from the write queue and dirty skb->next->prev.
Add a couple of [un]likely() clauses.
After this patch, tcp_ack() is no longer the most consuming
function in tcp stack.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f363e45fd1 ("net/ceph: make ceph_msgr_wq non-reentrant")
effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq. This is
wrong - libceph is very much a memory reclaim path, so restore it.
Cc: stable@vger.kernel.org # needs backporting for < 3.12
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Tested-by: Micha Krause <micha@krausam.de>
Reviewed-by: Sage Weil <sage@redhat.com>
For requests with multiple ops, separate ops with commas instead of \t,
which is a field separator here.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Bring in missing osd ops and strings, use macros to eliminate multiple
points of maintenance.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
this allow pagelist to present data that may be sent multiple times.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
TCP Small Queues (tcp_tsq_handler()) can hold one reference on
sk->sk_wmem_alloc, preventing skb->ooo_okay being set.
We should relax test done to set skb->ooo_okay to take care
of this extra reference.
Minimal truesize of skb containing one byte of payload is
SKB_TRUESIZE(1)
Without this fix, we have more chance locking flows into the wrong
transmit queue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
queue_work() doesn't "fail to queue", it returns false if work was
already on a queue, which can't happen here since we allocate
event_work right before we queue it. So don't bother at all.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Use the more common pr_warn.
Other miscellanea:
o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
If the state variable is krealloced successfully, map->osd_state will be
freed, once following two reallocation failed, and exit the function
without resetting map->osd_state, map->osd_state become a wild pointer.
fix it by resetting them after krealloc successfully.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
We want "cbc(aes)" algorithm, so select CRYPTO_CBC too, not just
CRYPTO_AES. Otherwise on !CRYPTO_CBC kernels we fail rbd map/mount
with
libceph: error -2 building auth method x request
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Both not yet registered (r_linger && list_empty(&r_linger_item)) and
registered linger requests should use the new tid on resend to avoid
the dup op detection logic on the OSDs, yet we were doing this only for
"registered" case. Factor out and simplify the "registered" logic and
use the new helper for "not registered" case as well.
Fixes: http://tracker.ceph.com/issues/8806
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
This scenario is not limited to ASCONF, just taken as one
example triggering the issue. When receiving ASCONF probes
in the form of ...
-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
[...]
---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>
... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed
ASCONFs and have increasing serial numbers, we process such
ASCONF chunk(s) marked with !end_of_packet and !singleton,
since we have not yet reached the SCTP packet end. SCTP does
only do verification on a chunk by chunk basis, as an SCTP
packet is nothing more than just a container of a stream of
chunks which it eats up one by one.
We could run into the case that we receive a packet with a
malformed tail, above marked as trailing JUNK. All previous
chunks are here goodformed, so the stack will eat up all
previous chunks up to this point. In case JUNK does not fit
into a chunk header and there are no more other chunks in
the input queue, or in case JUNK contains a garbage chunk
header, but the encoded chunk length would exceed the skb
tail, or we came here from an entirely different scenario
and the chunk has pdiscard=1 mark (without having had a flush
point), it will happen, that we will excessively queue up
the association's output queue (a correct final chunk may
then turn it into a response flood when flushing the
queue ;)): I ran a simple script with incremental ASCONF
serial numbers and could see the server side consuming
excessive amount of RAM [before/after: up to 2GB and more].
The issue at heart is that the chunk train basically ends
with !end_of_packet and !singleton markers and since commit
2e3216cd54 ("sctp: Follow security requirement of responding
with 1 packet") therefore preventing an output queue flush
point in sctp_do_sm() -> sctp_cmd_interpreter() on the input
chunk (chunk = event_arg) even though local_cork is set,
but its precedence has changed since then. In the normal
case, the last chunk with end_of_packet=1 would trigger the
queue flush to accommodate possible outgoing bundling.
In the input queue, sctp_inq_pop() seems to do the right thing
in terms of discarding invalid chunks. So, above JUNK will
not enter the state machine and instead be released and exit
the sctp_assoc_bh_rcv() chunk processing loop. It's simply
the flush point being missing at loop exit. Adding a try-flush
approach on the output queue might not work as the underlying
infrastructure might be long gone at this point due to the
side-effect interpreter run.
One possibility, albeit a bit of a kludge, would be to defer
invalid chunk freeing into the state machine in order to
possibly trigger packet discards and thus indirectly a queue
flush on error. It would surely be better to discard chunks
as in the current, perhaps better controlled environment, but
going back and forth, it's simply architecturally not possible.
I tried various trailing JUNK attack cases and it seems to
look good now.
Joint work with Vlad Yasevich.
Fixes: 2e3216cd54 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving a e.g. semi-good formed connection scan in the
form of ...
-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
---------------- ASCONF_a; ASCONF_b ----------------->
... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!
The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.
Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():
While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54
changed the precedence, so that as long as we get bundled, incoming
chunks we try possible bundling on outgoing queue as well. Before
this commit, we would just flush the output queue.
Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.
Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.
Joint work with Vlad Yasevich.
Fixes: 2e3216cd54 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6f4c618ddb ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:
skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
end:0x440 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
<IRQ>
[<ffffffff8144fb1c>] skb_put+0x5c/0x70
[<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
[<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
[<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
[<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
[<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
[<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
[<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
[<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
[<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
[<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
[<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
[<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
[<ffffffff81497078>] ip_local_deliver+0x98/0xa0
[<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
[<ffffffff81496ac5>] ip_rcv+0x275/0x350
[<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
[<ffffffff81460588>] netif_receive_skb+0x58/0x60
This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...
-------------- INIT[ASCONF; ASCONF_ACK] ------------->
<----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
------------------ ASCONF; UNKNOWN ------------------>
... where ASCONF chunk of length 280 contains 2 parameters ...
1) Add IP address parameter (param length: 16)
2) Add/del IP address parameter (param length: 255)
... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.
The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.
In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.
When walking to the next parameter this time, we proceed
with ...
length = ntohs(asconf_param->param_hdr.length);
asconf_param = (void *)asconf_param + length;
... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.
Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.
Joint work with Vlad Yasevich.
Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set hook_mask to zero for non-base chains, otherwise people may hit
bogus errors from the xt_check_target() and xt_check_match() when
validating the uninitialized hook_mask.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It affects non-(V)HT rates and can lead to selecting an rts_cts rate
that is not a basic rate or way superior to the reference rate (ATM
rates[0] used for the 1st attempt of the protected frame data).
E.g, assuming drivers register growing (bitrate) sorted tables of
ieee80211_rate-s, having :
- rates[0].idx == d'2 and basic_rates == b'10100
will select rts_cts idx b'10011 & ~d'(BIT(2)-1), i.e. 1, likewise
- rates[0].idx == d'2 and basic_rates == b'10001
will select rts_cts idx b'10000
The first is not a basic rate and the second is > rates[0].
Also, wrt severity of the addressed misbehavior, ATM we only have one
rts_cts_rate_idx rather than one per rate table entry, so this idx might
still point to bitrates > rates[1..MAX_RATES].
Fixes: 5253ffb8c9 ("mac80211: always pick a basic rate to tx RTS/CTS for pre-HT rates")
Cc: stable@vger.kernel.org
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In kernel we have %*pE specifier to print an escaped buffer. All users
now switched to that approach.
This fixes a bug as well. The current implementation wrongly prints
octal numbers: only two first digits are used in case when 3 are
required and the rest of the string ends up cut off.
Additionally by default the \f, \v, \a, and \e are escaped to their
alphabetic representation. It's safe to do since it is currently used
for messaging only.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "John W . Linville" <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics and
a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp
was renamed to strncasecmp, and strnicmp made into a wrapper for the new
strncasecmp to avoid breaking existing users.
To allow the compat wrapper strnicmp to be removed at some point in the
future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics and
a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp
was renamed to strncasecmp, and strnicmp made into a wrapper for the new
strncasecmp to avoid breaking existing users.
To allow the compat wrapper strnicmp to be removed at some point in the
future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds the missing validation code to avoid the use of nat/masq from
non-nat chains. The validation assumes two possible configuration
scenarios:
1) Use of nat from base chain that is not of nat type. Reject this
configuration from the nft_*_init() path of the expression.
2) Use of nat from non-base chain. In this case, we have to wait until
the non-base chain is referenced by at least one base chain via
jump/goto. This is resolved from the nft_*_validate() path which is
called from nf_tables_check_loops().
The user gets an -EOPNOTSUPP in both cases.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull vfs updates from Al Viro:
"The big thing in this pile is Eric's unmount-on-rmdir series; we
finally have everything we need for that. The final piece of prereqs
is delayed mntput() - now filesystem shutdown always happens on
shallow stack.
Other than that, we have several new primitives for iov_iter (Matt
Wilcox, culled from his XIP-related series) pushing the conversion to
->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c
cleanups and fixes (including the external name refcounting, which
gives consistent behaviour of d_move() wrt procfs symlinks for long
and short names alike) and assorted cleanups and fixes all over the
place.
This is just the first pile; there's a lot of stuff from various
people that ought to go in this window. Starting with
unionmount/overlayfs mess... ;-/"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits)
fs/file_table.c: Update alloc_file() comment
vfs: Deduplicate code shared by xattr system calls operating on paths
reiserfs: remove pointless forward declaration of struct nameidata
don't need that forward declaration of struct nameidata in dcache.h anymore
take dname_external() into fs/dcache.c
let path_init() failures treated the same way as subsequent link_path_walk()
fix misuses of f_count() in ppp and netlink
ncpfs: use list_for_each_entry() for d_subdirs walk
vfs: move getname() from callers to do_mount()
gfs2_atomic_open(): skip lookups on hashed dentry
[infiniband] remove pointless assignments
gadgetfs: saner API for gadgetfs_create_file()
f_fs: saner API for ffs_sb_create_file()
jfs: don't hash direct inode
[s390] remove pointless assignment of ->f_op in vmlogrdr ->open()
ecryptfs: ->f_op is never NULL
android: ->f_op is never NULL
nouveau: __iomem misannotations
missing annotation in fs/file.c
fs: namespace: suppress 'may be used uninitialized' warnings
...
Pull security subsystem updates from James Morris.
Mostly ima, selinux, smack and key handling updates.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
integrity: do zero padding of the key id
KEYS: output last portion of fingerprint in /proc/keys
KEYS: strip 'id:' from ca_keyid
KEYS: use swapped SKID for performing partial matching
KEYS: Restore partial ID matching functionality for asymmetric keys
X.509: If available, use the raw subjKeyId to form the key description
KEYS: handle error code encoded in pointer
selinux: normalize audit log formatting
selinux: cleanup error reporting in selinux_nlmsg_perm()
KEYS: Check hex2bin()'s return when generating an asymmetric key ID
ima: detect violations for mmaped files
ima: fix race condition on ima_rdwr_violation_check and process_measurement
ima: added ima_policy_flag variable
ima: return an error code from ima_add_boot_aggregate()
ima: provide 'ima_appraise=log' kernel option
ima: move keyring initialization to ima_init()
PKCS#7: Handle PKCS#7 messages that contain no X.509 certs
PKCS#7: Better handling of unsupported crypto
KEYS: Overhaul key identification when searching for asymmetric keys
KEYS: Implement binary asymmetric key ID handling
...
Pull networking fixes from David Miller:
"This set fixes a bunch of fallout from the changes that went in during
this merge window, particularly:
- Fix fsl_pq_mdio (Claudiu Manoil) and fm10k (Pranith Kumar) build
failures.
- Several networking drivers do atomic_set() on page counts where
that's not exactly legal. From Eric Dumazet.
- Make __skb_flow_get_ports() work cleanly with unaligned data, from
Alexander Duyck.
- Fix some kernel-doc buglets in rfkill and netlabel, from Fabian
Frederick.
- Unbalanced enable_irq_wake usage in bcmgenet and systemport
drivers, from Florian Fainelli.
- pxa168_eth needs to depend on HAS_DMA, from Geert Uytterhoeven.
- Multi-dequeue in the qdisc layer severely bypasses the fairness
limits the previous code used to enforce, reintroduce in a way that
at the same time doesn't compromise bulk dequeue opportunities.
From Jesper Dangaard Brouer.
- macvlan receive path unnecessarily hops through a softirq by using
netif_rx() instead of netif_receive_skb(). From Jason Baron"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (51 commits)
net: systemport: avoid unbalanced enable_irq_wake calls
net: bcmgenet: avoid unbalanced enable_irq_wake calls
net: bcmgenet: fix off-by-one in incrementing read pointer
net: fix races in page->_count manipulation
mlx4: fix race accessing page->_count
ixgbe: fix race accessing page->_count
igb: fix race accessing page->_count
fm10k: fix race accessing page->_count
net/phy: micrel: Add clock support for KSZ8021/KSZ8031
flow-dissector: Fix alignment issue in __skb_flow_get_ports
net: filter: fix the comments
Documentation: replace __sk_run_filter with __bpf_prog_run
macvlan: optimize the receive path
macvlan: pass 'bool' type to macvlan_count_rx()
drivers: net: xgene: Add 10GbE ethtool support
drivers: net: xgene: Add 10GbE support
drivers: net: xgene: Preparing for adding 10GbE support
dtb: Add 10GbE node to APM X-Gene SoC device tree
Documentation: dts: Update section header for APM X-Gene
MAINTAINERS: Update APM X-Gene section
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUNZK4AAoJEAAOaEEZVoIVI08P/iM7eaIVRnqaqtWw/JBzxiba
EMDlJYUBSlv6lYk9s8RJT4bMmcmGAKSYzVAHSoPahzNcqTDdFLeDTLGxJ8uKBbjf
d1qRRdH1yZHGUzCvJq3mEendjfXn435Y3YburUxjLfmzrzW7EbMvndiQsS5dhAm9
PEZ+wrKF/zFL7LuXa1YznYrbqOD/GRsJAXGEWc3kNwfS9avephVG/RI3GtpI2PJj
RY1mf8P7+WOlrShYoEuUo5aqs01MnU70LbqGHzY8/QKH+Cb0SOkCHZPZyClpiA+G
MMJ+o2XWcif3BZYz+dobwz/FpNZ0Bar102xvm2E8fqByr/T20JFjzooTKsQ+PtCk
DetQptrU2gtyZDKtInJUQSDPrs4cvA13TW+OEB1tT8rKBnmyEbY3/TxBpBTB9E6j
eb/V3iuWnywR3iE+yyvx24Qe7Pov6deM31s46+Vj+GQDuWmAUJXemhfzPtZiYpMT
exMXTyDS3j+W+kKqHblfU5f+Bh1eYGpG2m43wJVMLXKV7NwDf8nVV+Wea962ga+w
BAM3ia4JRVgRWJBPsnre3lvGT5kKPyfTZsoG+kOfRxiorus2OABoK+SIZBZ+c65V
Xh8VH5p3qyCUBOynXlHJWFqYWe2wH0LfbPrwe9dQwTwON51WF082EMG5zxTG0Ymf
J2z9Shz68zu0ok8cuSlo
=Hhee
-----END PGP SIGNATURE-----
Merge tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux
Pull file locking related changes from Jeff Layton:
"This release is a little more busy for file locking changes than the
last:
- a set of patches from Kinglong Mee to fix the lockowner handling in
knfsd
- a pile of cleanups to the internal file lease API. This should get
us a bit closer to allowing for setlease methods that can block.
There are some dependencies between mine and Bruce's trees this cycle,
and I based my tree on top of the requisite patches in Bruce's tree"
* tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
locks: set fl_owner for leases to filp instead of current->files
locks: give lm_break a return value
locks: __break_lease cleanup in preparation of allowing direct removal of leases
locks: remove i_have_this_lease check from __break_lease
locks: move freeing of leases outside of i_lock
locks: move i_lock acquisition into generic_*_lease handlers
locks: define a lm_setup handler for leases
locks: plumb a "priv" pointer into the setlease routines
nfsd: don't keep a pointer to the lease in nfs4_file
locks: clean up vfs_setlease kerneldoc comments
locks: generic_delete_lease doesn't need a file_lock at all
nfsd: fix potential lease memory leak in nfs4_setlease
locks: close potential race in lease_get_mtime
security: make security_file_set_fowner, f_setown and __f_setown void return
locks: consolidate "nolease" routines
locks: remove lock_may_read and lock_may_write
lockd: rip out deferred lock handling from testlock codepath
NFSD: Get reference of lockowner when coping file_lock
...
This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.
The only case it is valid is when page->_count is 0
Fixes: 540eb7bf0b ("net: Update alloc frag to reduce get/put page usage and recycle pages")
Signed-off-by: Eric Dumaze <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses a kernel unaligned access bug seen on a sparc64 system
with an igb adapter. Specifically the __skb_flow_get_ports was returning a
be32 pointer which was then having the value directly returned.
In order to prevent this it is actually easier to simply not populate the
ports or address values when an skb is not present. In this case the
assumption is that the data isn't needed and rather than slow down the
faster aligned accesses by making them have to assume the unaligned path on
architectures that don't support efficent unaligned access it makes more
sense to simply switch off the bits that were copying the source and
destination address/port for the case where we only care about the protocol
types and lengths which are normally 16 bit fields anyway.
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. sk_run_filter has been renamed, sk_filter() is using SK_RUN_FILTER.
2. Remove wrong comments about storing intermediate value.
3. replace sk_run_filter with __bpf_prog_run for check_load_and_stores's
comments
Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
minimal configurations where EPOLL, PERF_EVENTS, etc are disabled,
but NET is enabled, are failing to build with link error:
kernel/built-in.o: In function `bpf_prog_load':
syscall.c:(.text+0x3b728): undefined reference to `anon_inode_getfd'
fix it by selecting ANON_INODES when NET is enabled
Reported-by: Michal Sojka <sojkam1@fel.cvut.cz>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net-next
This batch contains two fixes for what you have in your net-next,
they are:
1) Remove nf_send_reset6() from header file. This function now resides
in the nf_reject_ipv6 module. Reported by Eric Dumazet.
2) Fix wrong NFT_REJECT_ICMPX_MAX definition and adjust code to fix
errors reported by Dan Carpenter's static analysis tools.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-10-09
Please pull this batch of fixes intended for the 3.18 stream!
Andrea Merello makes rtl818x_pci use a more reasonable transmission
rate for HW generated frames.
Fabian Frederick tweaks some kernel-doc bits to avoid warnings.
Larry Finger corrects a possible unaligned access in the rtlwifi code.
Marek Puzyniak avoids a kernel panic in ath9k_hw_reset.
Sujith Manoharan goes for the hat trick -- he fixes a smatch warning
in the shared ath code, he fixes a crash in ath9k, and he corrects
a sequence number assignment problem in ath9k too.
For ease of merging, I pulled the last bits of the wireless tree as well...
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ATM, specifying the frequency when connecting sends a void 'supported
rates' EID.
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
[fix memory leak in error path]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It is okay to enable DFS for channel contexts
based drivers as long as no combination advertises
radar detection and multi-channel operation at the
same time.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull percpu updates from Tejun Heo:
"A lot of activities on percpu front. Notable changes are...
- percpu allocator now can take @gfp. If @gfp doesn't contain
GFP_KERNEL, it tries to allocate from what's already available to
the allocator and a work item tries to keep the reserve around
certain level so that these atomic allocations usually succeed.
This will replace the ad-hoc percpu memory pool used by
blk-throttle and also be used by the planned blkcg support for
writeback IOs.
Please note that I noticed a bug in how @gfp is interpreted while
preparing this pull request and applied the fix 6ae833c7fe
("percpu: fix how @gfp is interpreted by the percpu allocator")
just now.
- percpu_ref now uses longs for percpu and global counters instead of
ints. It leads to more sparse packing of the percpu counters on
64bit machines but the overhead should be negligible and this
allows using percpu_ref for refcnting pages and in-memory objects
directly.
- The switching between percpu and single counter modes of a
percpu_ref is made independent of putting the base ref and a
percpu_ref can now optionally be initialized in single or killed
mode. This allows avoiding percpu shutdown latency for cases where
the refcounted objects may be synchronously created and destroyed
in rapid succession with only a fraction of them reaching fully
operational status (SCSI probing does this when combined with
blk-mq support). It's also planned to be used to implement forced
single mode to detect underflow more timely for debugging.
There's a separate branch percpu/for-3.18-consistent-ops which cleans
up the duplicate percpu accessors. That branch causes a number of
conflicts with s390 and other trees. I'll send a separate pull
request w/ resolutions once other branches are merged"
* 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
percpu: fix how @gfp is interpreted by the percpu allocator
blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
percpu_ref: add PERCPU_REF_INIT_* flags
percpu_ref: decouple switching to percpu mode and reinit
percpu_ref: decouple switching to atomic mode and killing
percpu_ref: add PCPU_REF_DEAD
percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
percpu_ref: replace pcpu_ prefix with percpu_
percpu_ref: minor code and comment updates
percpu_ref: relocate percpu_ref_reinit()
Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
Revert "percpu: free percpu allocation info for uniprocessor system"
percpu-refcount: make percpu_ref based on longs instead of ints
percpu-refcount: improve WARN messages
percpu: fix locking regression in the failure path of pcpu_alloc()
percpu-refcount: add @gfp to percpu_ref_init()
proportions: add @gfp to init functions
percpu_counter: add @gfp to percpu_counter_init()
percpu_counter: make percpu_counters_lock irq-safe
...
Restore the quota fairness between qdisc's, that we broke with commit
5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE").
Before that commit, the quota in __qdisc_run() were in packets as
dequeue_skb() would only dequeue a single packet, that assumption
broke with bulk dequeue.
We choose not to account for the number of packets inside the TSO/GSO
packets (accessable via "skb_gso_segs"). As the previous fairness
also had this "defect". Thus, GSO/TSO packets counts as a single
packet.
Further more, we choose to slack on accuracy, by allowing a bulk
dequeue try_bulk_dequeue_skb() to exceed the "packets" limit, only
limited by the BQL bytelimit. This is done because BQL prefers to get
its full budget for appropriate feedback from TX completion.
In future, we might consider reworking this further and, if it allows,
switch to a time-based model, as suggested by Eric. Right now, we only
restore old semantics.
Joint work with Eric, Hannes, Daniel and Jesper. Hannes wrote the
first patch in cooperation with Daniel and Jesper. Eric rewrote the
patch.
Fixes: 5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fix following warning.
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'header_len'
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'data_len'
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'max_page_order'
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'errcode'
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'gfp_mask'
Acutually the descriptions exist, but missing "@" in front.
This problem start to happen when following commit was merged
into Linus's tree during 3.18-rc1 merge period.
commit 2e4e441071
net: add alloc_skb_with_frags() helper
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to store ieee80211_vif_use_reserved_context()
result and test it before returning.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Channel switch with multiple channel contexts should now work fine.
Remove check that disallows switches when multiple contexts are in
use.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Instead of immediately reopening the queues (in case of block_tx),
calling the post_channel_switch operation and sending the
notification, wait for the first beacon on the new channel. This
makes sure that we don't lose packets if the AP/GO is not on the new
channel yet.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As a counterpart to the pre_channel_switch operation, add a
post_channel_switch operation. This allows the drivers to go back to
a normal configuration after the channel switch is completed.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some drivers may need to prepare for a channel switch also when it is
initiated from the remote side (eg. station, P2P client). To make
this possible, add a generic callback that can be called for all
interface types.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The Extended Channel Switching capability bit in the extended
capabilities element must be set if the driver supports CSA on
non-beaconing interfaces.
Since this capability needs to be set during driver registration, the
extended_capabiliities global variable needs to be moved to the local
structure so that it can be modified.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some devices may need the device timestamp in order to synchronize the
channel switch. To pass this value back to the driver, add it to the
channel switch structure and copy the device_timestamp value received
in the rx info structure into it.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The nl80211 channel switch count attribute
(NL80211_ATTR_CH_SWITCH_COUNT) is specified as u32, but the
specification uses u8 for the counter. To make sure strange things
don't happen without informing the user, sanity check the value and
return -EINVAL if it doesn't fit in u8.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implement get_mpp and dump_mpp cfg80211_ops to export the content of the
802.11s mesh proxy path table to userspace.
Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add two new cfg80211 operations for querying a table with proxied mesh
paths.
Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Correct the kernel-doc, the parameter is called "blocked" not "state".
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The chandef of the channel context a vif is using may be different
than the chandef of the vif itself. For instance, the bandwidth used
by the vif may be narrower than the one configured in the channel
context. To avoid confusion, return the vif's chandef in
ieee80211_cfg_get_channel() instead of the chandef of the channel
context.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 5935839ad7 ("mac80211: improve minstrel_ht rate sorting by
throughput & probability") replaced the constant 8 with MCS_GROUP_RATES
when getting the number of streams of an HT MCS. See commit 7a5e3fa2c8
("mac80211: minstrel_ht: replace some occurences of MCS_GROUP_RATES").
Fixes: 5935839ad7 ("mac80211: improve minstrel_ht rate sorting by throughput & probability")
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Forgot to add an entry to the struct description of sta_info.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
we used to check for "nobody else could start doing anything with
that opened file" by checking that refcount was 2 or less - one
for descriptor table and one we'd acquired in fget() on the way to
wherever we are. That was race-prone (somebody else might have
had a reference to descriptor table and do fget() just as we'd
been checking) and it had become flat-out incorrect back when
we switched to fget_light() on those codepaths - unlike fget(),
it doesn't grab an extra reference unless the descriptor table
is shared. The same change allowed a race-free check, though -
we are safe exactly when refcount is less than 2.
It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading
to ppp one) and 2.6.17 for sendmsg() (netlink one). OTOH,
netlink hadn't grown that check until 3.9 and ppp used to live
in drivers/net, not drivers/net/ppp until 3.1. The bug existed
well before that, though, and the same fix used to apply in old
location of file.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
no secid argument in netlbl_cfg_unlbl_static_del
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
"Most notable changes in here:
1) By far the biggest accomplishment, thanks to a large range of
contributors, is the addition of multi-send for transmit. This is
the result of discussions back in Chicago, and the hard work of
several individuals.
Now, when the ->ndo_start_xmit() method of a driver sees
skb->xmit_more as true, it can choose to defer the doorbell
telling the driver to start processing the new TX queue entires.
skb->xmit_more means that the generic networking is guaranteed to
call the driver immediately with another SKB to send.
There is logic added to the qdisc layer to dequeue multiple
packets at a time, and the handling mis-predicted offloads in
software is now done with no locks held.
Finally, pktgen is extended to have a "burst" parameter that can
be used to test a multi-send implementation.
Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
virtio_net
Adding support is almost trivial, so export more drivers to
support this optimization soon.
I want to thank, in no particular or implied order, Jesper
Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
David Tat, Hannes Frederic Sowa, and Rusty Russell.
2) PTP and timestamping support in bnx2x, from Michal Kalderon.
3) Allow adjusting the rx_copybreak threshold for a driver via
ethtool, and add rx_copybreak support to enic driver. From
Govindarajulu Varadarajan.
4) Significant enhancements to the generic PHY layer and the bcm7xxx
driver in particular (EEE support, auto power down, etc.) from
Florian Fainelli.
5) Allow raw buffers to be used for flow dissection, allowing drivers
to determine the optimal "linear pull" size for devices that DMA
into pools of pages. The objective is to get exactly the
necessary amount of headers into the linear SKB area pre-pulled,
but no more. The new interface drivers use is eth_get_headlen().
From WANG Cong, with driver conversions (several had their own
by-hand duplicated implementations) by Alexander Duyck and Eric
Dumazet.
6) Support checksumming more smoothly and efficiently for
encapsulations, and add "foo over UDP" facility. From Tom
Herbert.
7) Add Broadcom SF2 switch driver to DSA layer, from Florian
Fainelli.
8) eBPF now can load programs via a system call and has an extensive
testsuite. Alexei Starovoitov and Daniel Borkmann.
9) Major overhaul of the packet scheduler to use RCU in several major
areas such as the classifiers and rate estimators. From John
Fastabend.
10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
Duyck.
11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
Dumazet.
12) Add Datacenter TCP congestion control algorithm support, From
Florian Westphal.
13) Reorganize sk_buff so that __copy_skb_header() is significantly
faster. From Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
netlabel: directly return netlbl_unlabel_genl_init()
net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
net: description of dma_cookie cause make xmldocs warning
cxgb4: clean up a type issue
cxgb4: potential shift wrapping bug
i40e: skb->xmit_more support
net: fs_enet: Add NAPI TX
net: fs_enet: Remove non NAPI RX
r8169:add support for RTL8168EP
net_sched: copy exts->type in tcf_exts_change()
wimax: convert printk to pr_foo()
af_unix: remove 0 assignment on static
ipv6: Do not warn for informational ICMP messages, regardless of type.
Update Intel Ethernet Driver maintainers list
bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
tipc: fix bug in multicast congestion handling
net: better IFF_XMIT_DST_RELEASE support
net/mlx4_en: remove NETDEV_TX_BUSY
3c59x: fix bad split of cpu_to_le32(pci_map_single())
net: bcmgenet: fix Tx ring priority programming
...
No need to store netlbl_unlabel_genl_init result and test it before returning.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to copy exts->type when committing the change, otherwise
it would be always 0. This is a quick fix for -net and -stable,
for net-next tcf_exts will be removed.
Fixes: commit 33be627159 ("net_sched: act: use standard struct list_head")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull nfsd updates from Bruce Fields:
"Highlights:
- support the NFSv4.2 SEEK operation (allowing clients to support
SEEK_HOLE/SEEK_DATA), thanks to Anna.
- end the grace period early in a number of cases, mitigating a
long-standing annoyance, thanks to Jeff
- improve SMP scalability, thanks to Trond"
* 'for-3.18' of git://linux-nfs.org/~bfields/linux: (55 commits)
nfsd: eliminate "to_delegation" define
NFSD: Implement SEEK
NFSD: Add generic v4.2 infrastructure
svcrdma: advertise the correct max payload
nfsd: introduce nfsd4_callback_ops
nfsd: split nfsd4_callback initialization and use
nfsd: introduce a generic nfsd4_cb
nfsd: remove nfsd4_callback.cb_op
nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence
nfsd: fix nfsd4_cb_recall_done error handling
nfsd4: clarify how grace period ends
nfsd4: stop grace_time update at end of grace period
nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients
nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls
nfsd: serialize nfsdcltrack upcalls for a particular client
nfsd: pass extra info in env vars to upcalls to allow for early grace period end
nfsd: add a v4_end_grace file to /proc/fs/nfsd
lockd: add a /proc/fs/lockd/nlm_end_grace file
nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE
nfsd: remove redundant boot_time parm from grace_done client tracking op
...
Highlights include:
Stable fixes:
- fix an NFSv4.1 state renewal regression
- fix open/lock state recovery error handling
- fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
- fix statd when reconnection fails
- Don't wake tasks during connection abort
- Don't start reboot recovery if lease check fails
- fix duplicate proc entries
Features:
- pNFS block driver fixes and clean ups from Christoph
- More code cleanups from Anna
- Improve mmap() writeback performance
- Replace use of PF_TRANS with a more generic mechanism for avoiding
deadlocks in nfs_release_page
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUMpFYAAoJEGcL54qWCgDywHYP/A7XNykwOGhoHVP1Cgr3xqoz
gVhAw97AEMZE8xSNVEGS++pJTe59JVzsIsYAwdHMwePV33l3zyzYorae6N9p7zWF
0xVaNQ4qNLVhbrNLAoB5KA/c3/jMnNjF5t15+8akZad5pt4kXLlhSKjyVpdEEtJE
A0eneXShMYEeLZoOJhpQt5bsw0OZ8YbWWEMjGlDqyeelvV3K1+zfivQOoyX6hS4w
XFkPEDmU7zunE/xFP9ZoUaVdLO0TvOWfEZ7STWoHm7NuWfPQiDb9w1mTnuZbZyka
ssezoGcitzwsjCcQ5e1iKTOoFRIsm/zYXFQgFQL7VFMBU1Tss9Of8047EyDkqcPF
GxctsGg0gQ2FkG7yx7JH7AKpyibOIuByQrQQ916coWSf7K0L4H4Rcky3vryroylP
1e1RI49xu215OTm+dLvlvYCv55bqCrTmaUGImZac18+ixD2eh6MNfW2ubSdxk89L
U2rTFV09Bd52N7IQOGQx1FBEI2ZnIFUV4UaFz7v+rGFxOnk6+WYe+iWyb4wC70Yc
8Jh/gTIQDd5aghql3FTieMOyfEvO6Re4pLMXmqEWMAevicx2t8DwkJriRu6X8Iy2
rlDlBPwu5QmRWC20Dc897f0VajwDtwdeB8puod7nobOWzOfx4FrNqLJ+jR3pmHUk
0otvJytqemXt+zkqqHKK
=/OQi
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable fixes:
- fix an NFSv4.1 state renewal regression
- fix open/lock state recovery error handling
- fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
- fix statd when reconnection fails
- don't wake tasks during connection abort
- don't start reboot recovery if lease check fails
- fix duplicate proc entries
Features:
- pNFS block driver fixes and clean ups from Christoph
- More code cleanups from Anna
- Improve mmap() writeback performance
- Replace use of PF_TRANS with a more generic mechanism for avoiding
deadlocks in nfs_release_page"
* tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (66 commits)
NFSv4.1: Fix an NFSv4.1 state renewal regression
NFSv4: fix open/lock state recovery error handling
NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
NFS: Fabricate fscache server index key correctly
SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
NFSv3: Fix missing includes of nfs3_fs.h
NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page()
NFS: avoid waiting at all in nfs_release_page when congested.
NFS: avoid deadlocks with loop-back mounted NFS filesystems.
MM: export page_wakeup functions
SCHED: add some "wait..on_bit...timeout()" interfaces.
NFS: don't use STABLE writes during writeback.
NFSv4: use exponential retry on NFS4ERR_DELAY for async requests.
rpc: Add -EPERM processing for xs_udp_send_request()
rpc: return sent and err from xs_sendpages()
lockd: Try to reconnect if statd has moved
SUNRPC: Don't wake tasks during connection abort
Fixing lease renewal
nfs: fix duplicate proc entries
pnfs/blocklayout: Fix a 64-bit division/remainder issue in bl_map_stripe
...
1/ Step down as dmaengine maintainer see commit 08223d80df "dmaengine
maintainer update"
2/ Removal of net_dma, as it has been marked 'broken' since 3.13 (commit
7787380336 "net_dma: mark broken"), without reports of performance
regression.
3/ Miscellaneous fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUKDLKAAoJEB7SkWpmfYgC7wwP/iNHqRjf1suMUTBIF3P6Hgbe
VCUwh0IkuujMPDG46WRn6cYzarRxVPLoGaLHLPszgjI6pmGPVv19wqeDOlUxtcmr
0iQWEWv/zqseaAIW+4gj/WYCyMgKil49EUBJKCZCfNmIaad+e0pr8f0uE5yOkHPM
tqWoZERu9A4dlXGr1TjeOZVzdnPrCt92MrLDN6ZZ6tMuJaEc5PauaLxKTeGy5fYj
UB+k1xJQzECbsYfpB+uCVYl5/qPO1rNyuBYS8THCsW+JYmrbbfH2kkF2lo2FaUpO
8Yd50FtzXHKWwAt7BzfIwU2M7x0wRmryrC/xsQi6M+WmVeHYvvHUIpzaA66xRZ5x
fCy3Fu8sEnmnmboAbh2v2c5uTycqRl2xPzbpLAuxglloXIxzi3ckp6ESF/Z4SldH
oxIoEievN7lah3vKgvlHZYcWDzrYr8EKf/EzFe9RqDBQDKtzDzre1H9Uivr387Vm
uFUcGHYG/GXuX47C7EUsMtaSW2UEoR2ytw/HR6CKFPTVXwAzEO6kA9vg0EqL0iIq
2wVLgavlZuwegmaUBgnr+bgVZMvVN7OU7fAIRVe5xNO6itrPKvheSlQthmRiiq9C
uzOu4PS6PexqzHUNPCcJpCsj+lawmCSrE0bxtPzTA/CQInVgWs219V9+W5Gn/0YA
EARN9k6ueX9PZPQrPQLm
=BBBv
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine
Pull dmaengine updates from Dan Williams:
"Even though this has fixes marked for -stable, given the size and the
needed conflict resolutions this is 3.18-rc1/merge-window material.
These patches have been languishing in my tree for a long while. The
fact that I do not have the time to do proper/prompt maintenance of
this tree is a primary factor in the decision to step down as
dmaengine maintainer. That and the fact that the bulk of drivers/dma/
activity is going through Vinod these days.
The net_dma removal has not been in -next. It has developed simple
conflicts against mainline and net-next (for-3.18).
Continuing thanks to Vinod for staying on top of drivers/dma/.
Summary:
1/ Step down as dmaengine maintainer see commit 08223d80df
"dmaengine maintainer update"
2/ Removal of net_dma, as it has been marked 'broken' since 3.13
(commit 7787380336 "net_dma: mark broken"), without reports of
performance regression.
3/ Miscellaneous fixes"
* tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
net: make tcp_cleanup_rbuf private
net_dma: revert 'copied_early'
net_dma: simple removal
dmaengine maintainer update
dmatest: prevent memory leakage on error path in thread
ioat: Use time_before_jiffies()
dmaengine: fix xor sources continuation
dma: mv_xor: Rename __mv_xor_slot_cleanup() to mv_xor_slot_cleanup()
dma: mv_xor: Remove all callers of mv_xor_slot_cleanup()
dma: mv_xor: Remove unneeded mv_xor_clean_completed_slots() call
ioat: Use pci_enable_msix_exact() instead of pci_enable_msix()
drivers: dma: Include appropriate header file in dca.c
drivers: dma: Mark functions as static in dma_v3.c
dma: mv_xor: Add DMA API error checks
ioat/dca: Use dev_is_pci() to check whether it is pci device
Use current logging functions and add module name prefix.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no reason to emit a log message for these.
Based upon a suggestion from Hannes Frederic Sowa.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
As we may defragment the packet in IPv4 PRE_ROUTING and refragment
it after POST_ROUTING we should save the value of frag_max_size.
This is still very wrong as the bridge is supposed to leave the
packets intact, meaning that the right thing to do is to use the
original frag_list for fragmentation.
Unfortunately we don't currently guarantee that the frag_list is
left untouched throughout netfilter so until this changes this is
the best we can do.
There is also a spot in FORWARD where it appears that we can
forward a packet without going through fragmentation, mark it
so that we can fix it later.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
One aim of commit 50100a5e39 ("tipc:
use pseudo message to wake up sockets after link congestion") was
to handle link congestion abatement in a uniform way for both unicast
and multicast transmit. However, the latter doesn't work correctly,
and has been broken since the referenced commit was applied.
If a user now sends a burst of multicast messages that is big
enough to cause broadcast link congestion, it will be put to sleep,
and not be waked up when the congestion abates as it should be.
This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS,
is set correctly, but in the wrong field. Instead of setting it in the
'action_flags' field of the arrival node struct, it is by mistake set
in the dummy node struct that is owned by the broadcast link, where it
will never tested for. Second, we cannot use the same flag for waking
up unicast and multicast users, since the function tipc_node_unlock()
needs to pick the wakeup pseudo messages to deliver from different
queues. It must hence be able to distinguish between the two cases.
This commit solves this problem by adding a new flag
TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user().
The latter is to be called by tipc_node_unlock() when the named flag,
now set in the correct field, is encountered.
v2: using explicit 'unsigned int' declaration instead of 'uint', as
per comment from David Miller.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFT_REJECT_ICMPX_MAX should be __NFT_REJECT_ICMPX_MAX - 1.
nft_reject_icmp_code() and nft_reject_icmpv6_code() are called from the
packet path, so BUG_ON in case we try to access an unknown abstracted
ICMP code. This should not happen since we already validate this from
nft_reject_{inet,bridge}_init().
Fixes: 51b0a5d ("netfilter: nft_reject: introduce icmp code abstraction for inet and bridge")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.
Current handling of IFF_XMIT_DST_RELEASE is not optimal.
Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y
The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.
As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.
This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.
Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :
dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
netif_keep_dst(dev);
Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.
The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Probably not a big deal, but we'd better just use the
one we get in retry loop.
Fixes: commit 22e0f8b932 ("net: sched: make bstats per cpu and estimator RCU safe")
Reported-by: Joe Perches <joe@perches.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a openvswitch compilation error when CONFIG_INET is not set:
=====================================================
In file included from include/net/geneve.h:4:0,
from net/openvswitch/flow_netlink.c:45:
include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads':
>> include/net/udp_tunnel.h💯2: error: implicit declaration of function 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration]
>> return iptunnel_handle_offloads(skb, udp_csum, type);
>> ^
>> >> include/net/udp_tunnel.h💯2: warning: return makes pointer from integer without a cast
>> >> cc1: some warnings being treated as errors
=====================================================
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a sparse warning introduced by commit:
f579668406 (openvswitch: Add support for
Geneve tunneling.) caught by kbuild test robot:
reproduce:
# apt-get install sparse
# git checkout f579668406
# make ARCH=x86_64 allmodconfig
# make C=1 CF=-D__CHECK_ENDIAN__
#
#
# sparse warnings: (new ones prefixed by >>)
#
# >> net/openvswitch/vport-geneve.c:109:15: sparse: incorrect type in assignment (different base types)
# net/openvswitch/vport-geneve.c:109:15: expected restricted __be16 [usertype] sport
# net/openvswitch/vport-geneve.c:109:15: got int
# >> net/openvswitch/vport-geneve.c:110:56: sparse: incorrect type in argument 3 (different base types)
# net/openvswitch/vport-geneve.c:110:56: expected unsigned short [unsigned] [usertype] value
# net/openvswitch/vport-geneve.c:110:56: got restricted __be16 [usertype] sport
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Try to reduce number of possible fn_sernum mutation by constraining them
to their namespace.
Also remove rt_genid which I forgot to remove in 705f1c869d ("ipv6:
remove rt6i_genid").
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also renamed struct fib6_walker_t to fib6_walker and enum fib_walk_state_t
to fib6_walk_state as recommended by Cong Wang.
Cc: Cong Wang <cwang@twopensource.com>
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marking this as static allows compiler to inline it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using the tcf_proto pointer 'tp' from inside the classifiers callback
is not valid because it may have been cleaned up by another call_rcu
occuring on another CPU.
'tp' is currently being used by tcf_unbind_filter() in this patch we
move instances of tcf_unbind_filter outside of the call_rcu() context.
This is safe to do because any running schedulers will either read the
valid class field or it will be zeroed.
And all schedulers today when the class is 0 do a lookup using the
same call used by the tcf_exts_bind(). So even if we have a running
classifier hit the null class pointer it will do a lookup and get
to the same result. This is particularly fragile at the moment because
the only way to verify this is to audit the schedulers call sites.
Reported-by: Cong Wang <xiyou.wangconf@gmail.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not RCU safe to destroy the action chain while there
is a possibility of readers accessing it. Move this code
into the rcu callback using the same rcu callback used in the
code patch to make a change to head.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes the tcf_proto argument from the ematch code paths that
only need it to reference the net namespace. This allows simplifying
qdisc code paths especially when we need to tear down the ematch
from an RCU callback. In this case we can not guarentee that the
tcf_proto structure is still valid.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some TSO engines might have a too heavy setup cost, that impacts
performance on hosts sending small bursts (2 MSS per packet).
This patch adds a device gso_min_segs, allowing drivers to set
a minimum segment size for TSO packets, according to the NIC
performance.
Tested on a mlx4 NIC, this allows to get a ~110% increase of
throughput when sending 2 MSS per packet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case we find a general query with non-zero number of sources, we
are dropping the skb as it's malformed.
RFC3376, section 4.1.8. Number of Sources (N):
This number is zero in a General Query or a Group-Specific Query,
and non-zero in a Group-and-Source-Specific Query.
Therefore, reflect that by using kfree_skb() instead of consume_skb().
Fixes: d679c5324d ("igmp: avoid drop_monitor false positives")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use new ethtool [sg]et_tunable() to set tx_copybread (inline threshold)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Standard qdisc API to setup a timer implies an atomic operation on every
packet dequeue : qdisc_unthrottled()
It turns out this is not really needed for FQ, as FQ has no concept of
global qdisc throttling, being a qdisc handling many different flows,
some of them can be throttled, while others are not.
Fix is straightforward : add a 'bool throttle' to
qdisc_watchdog_schedule_ns(), and remove calls to qdisc_unthrottled()
in sch_fq.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Its unfortunate we have to walk again skb list to find the tail
after segmentation, even if data is probably hot in cpu caches.
skb_segment() can store the tail of the list into segs->prev,
and validate_xmit_skb_list() can immediately get the tail.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Openvswitch implementation is completely agnostic to the options
that are in use and can handle newly defined options without
further work. It does this by simply matching on a byte array
of options and allowing userspace to setup flows on this array.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Singed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the size of the flow key grows, it can put some pressure on the
stack. This is particularly true in ovs_flow_cmd_set(), which needs several
copies of the key on the stack. One of those uses is logically separate,
so this factors it out to reduce stack pressure and improve readibility.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the flow information that is matched for tunnels and
the tunnel data passed around with packets is the same. However,
as additional information is added this is not necessarily desirable,
as in the case of pointers.
This adds a new structure for tunnel metadata which currently contains
only the existing struct. This change is purely internal to the kernel
since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version
of OVS_KEY_ATTR_TUNNEL that is translated at flow setup.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some tunnel formats have mechanisms for indicating that packets are
OAM frames that should be handled specially (either as high priority or
not forwarded beyond an endpoint). This provides support for allowing
those types of packets to be matched.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As new protocols are added, the size of the flow key tends to
increase although few protocols care about all of the fields. In
order to optimize this for hashing and matching, OVS uses a variable
length portion of the key. However, when fields are extracted from
the packet we must still zero out the entire key.
This is no longer necessary now that OVS implements masking. Any
fields (or holes in the structure) which are not part of a given
protocol will be by definition not part of the mask and zeroed out
during lookup. Furthermore, since masking already uses variable
length keys this zeroing operation automatically benefits as well.
In principle, the only thing that needs to be done at this point
is remove the memset() at the beginning of flow. However, some
fields assume that they are initialized to zero, which now must be
done explicitly. In addition, in the event of an error we must also
zero out corresponding fields to signal that there is no valid data
present. These increase the total amount of code but very little of
it is executed in non-error situations.
Removing the memset() reduces the profile of ovs_flow_extract()
from 0.64% to 0.56% when tested with large packets on a 10G link.
Suggested-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a device level support for Geneve -- Generic Network
Virtualization Encapsulation. The protocol is documented at
http://tools.ietf.org/html/draft-gross-geneve-01
Only protocol layer Geneve support is provided by this driver.
Openvswitch can be used for configuring, set up and tear down
functional Geneve tunnels.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently association restarts do not take into consideration the
state of the socket. When a restart happens, the current assocation
simply transitions into established state. This creates a condition
where a remote system, through a the restart procedure, may create a
local association that is no way reachable by user. The conditions
to trigger this are as follows:
1) Remote does not acknoledge some data causing data to remain
outstanding.
2) Local application calls close() on the socket. Since data
is still outstanding, the association is placed in SHUTDOWN_PENDING
state. However, the socket is closed.
3) The remote tries to create a new association, triggering a restart
on the local system. The association moves from SHUTDOWN_PENDING
to ESTABLISHED. At this point, it is no longer reachable by
any socket on the local system.
This patch addresses the above situation by moving the newly ESTABLISHED
association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
the COOKIE-ACK chunk. This way, the restarted associate immidiately
enters the shutdown procedure and forces the termination of the
unreachable association.
Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-10-03
Please pull tihs batch of updates intended for the 3.18 stream!
For the iwlwifi bits, Emmanuel says:
"I have here a few things that depend on the latest mac80211's changes:
RRM, TPC, Quiet Period etc... Eyal keeps improving our rate control
and we have a new device ID. This last patch should probably have
gone to wireless.git, but at that stage, I preferred to send it to
-next and CC stable."
For (most of) the Atheros bits, Kalle says:
"The only new feature is testmode support from me. Ben added a new method
to crash the firmware with an assert for debug purposes. As usual, we
have lots of smaller fixes from Michal. Matteo fixed a Kconfig
dependency with debugfs. I fixed some warnings recently added to
checkpatch."
For the NFC bits, Samuel says:
"We've had major updates for TI and ST Microelectronics drivers, and a
few NCI related changes.
For TI's trf7970a driver:
- Target mode support for trf7970a
- Suspend/resume support for trf7970a
- DT properties additions to handle different quirks
- A bunch of fixes for smartphone IOP related issues
For ST Microelectronics' ST21NFCA and ST21NFCB drivers:
- ISO15693 support for st21nfcb
- checkpatch and sparse related warning fixes
- Code cleanups and a few minor fixes
Finally, Marvell added ISO15693 support to the NCI stack, together with a
couple of NCI fixes."
For the Bluetooth bits, Johan says:
"This 3.18 pull request replaces the one I did on Monday ("bluetooth-next
2014-09-22", which hasn't been pulled yet). The additions since the last
request are:
- SCO connection fix for devices not supporting eSCO
- Cleanups regarding the SCO establishment logic
- Remove unnecessary return value from logging functions
- Header compression fix for 6lowpan
- Cleanups to the ieee802154/mrf24j40 driver
Here's a copy from previous request that this one replaces:
'
Here are some more patches for 3.18. They include various fixes to the
btusb HCI driver, a fix for LE SMP, as well as adding Jukka to the
MAINTAINERS file for generic 6LoWPAN (as requested by Alexander Aring).
I've held on to this pull request a bit since we were waiting for a SCO
related fix to get sorted out first. However, since the merge window is
getting closer I decided not to wait for it. If we do get the fix sorted
out there'll probably be a second small pull request later this week.
'"
And,
"Unless 3.17 gets delayed this will probably be our last -next pull request for
3.18. We've got:
- New Marvell hardware supportr
- Multicast support for 6lowpan
- Several of 6lowpan fixes & cleanups
- Fix for a (false-positive) lockdep warning in L2CAP
- Minor btusb cleanup"
On top of all that comes the usual sort of updates to ath5k, ath9k,
ath10k, brcmfmac, mwifiex, and wil6210. This time around there are
also a number of rtlwifi updates to enable some new hardware and
to reconcile the in-kernel drivers with some newer releases of the
Realtek vendor drivers. Also of note is some device tree work for
the bcma bus.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains another batch with Netfilter/IPVS updates
for net-next, they are:
1) Add abstracted ICMP codes to the nf_tables reject expression. We
introduce four reasons to reject using ICMP that overlap in IPv4
and IPv6 from the semantic point of view. This should simplify the
maintainance of dual stack rule-sets through the inet table.
2) Move nf_send_reset() functions from header files to per-family
nf_reject modules, suggested by Patrick McHardy.
3) We have to use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) everywhere in the
code now that br_netfilter can be modularized. Convert remaining spots
in the network stack code.
4) Use rcu_barrier() in the nf_tables module removal path to ensure that
we don't leave object that are still pending to be released via
call_rcu (that may likely result in a crash).
5) Remove incomplete arch 32/64 compat from nft_compat. The original (bad)
idea was to probe the word size based on the xtables match/target info
size, but this assumption is wrong when you have to dump the information
back to userspace.
6) Allow to filter from prerouting and postrouting in the nf_tables bridge.
In order to emulate the ebtables NAT chains (which are actually simple
filter chains with no special semantics), we have support filtering from
this hooks too.
7) Add explicit module dependency between xt_physdev and br_netfilter.
This provides a way to detect if the user needs br_netfilter from
the configuration path. This should reduce the breakage of the
br_netfilter modularization.
8) Cleanup coding style in ip_vs.h, from Simon Horman.
9) Fix crash in the recently added nf_tables masq expression. We have
to register/unregister the notifiers to clean up the conntrack table
entries from the module init/exit path, not from the rule addition /
deletion path. From Arturo Borrero.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when vlan filtering is turned on on the bridge, the bridge
will drop all traffic untill the user configures the filter. This
isn't very nice for ports that don't care about vlans and just
want untagged traffic.
A concept of a default_pvid was recently introduced. This patch
adds filtering support for default_pvid. Now, ports that don't
care about vlans and don't define there own filter will belong
to the VLAN of the default_pvid and continue to receive untagged
traffic.
This filtering can be disabled by setting default_pvid to 0.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if the pvid is not set, we return an illegal vlan value
even though the pvid value is set to 0. Since pvid of 0 is currently
invalid, just return 0 instead. This makes the current and future
checks simpler.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the user to set and retrieve default_pvid
value. A new value can only be stored when vlan filtering
is disabled.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The result of a negated container has to be inverted before checking for
early ending.
This fixes my previous attempt (17c9c82326) to
make inverted containers work correctly.
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f7f1de51ed ("net: dsa: start and stop the PHY state machine")
add calls to phy_start() in dsa_slave_open() respectively phy_stop() in
dsa_slave_close().
We also call phy_start_aneg() in dsa_slave_create(), and this call is
messing up with the PHY state machine, since we basically start the
auto-negotiation, and later on restart it when calling phy_start().
phy_start() does not currently handle the PHY_FORCING or PHY_AN states
properly, but such a fix would be too invasive for this window.
Fixes: f7f1de51ed ("net: dsa: start and stop the PHY state machine")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SKB_FCLONE_UNAVAILABLE has overloaded meaning depending on type of skb.
1: If skb is allocated from head_cache, it indicates fclone is not available.
2: If skb is a companion fclone skb (allocated from fclone_cache), it indicates
it is available to be used.
To avoid confusion for case 2 above, this patch replaces
SKB_FCLONE_UNAVAILABLE with SKB_FCLONE_FREE where appropriate. For fclone
companion skbs, this indicates it is free for use.
SKB_FCLONE_UNAVAILABLE will now simply indicate skb is from head_cache and
cannot / will not have a companion fclone.
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In xmit path, we build a flowi6 which will be used for the output route lookup.
We are sending a GRE packet, neither IPv4 nor IPv6 encapsulated packet, thus the
protocol should be IPPROTO_GRE.
Fixes: c12b395a46 ("gre: Support GRE over IPv6")
Reported-by: Matthieu Ternisien d'Ouville <matthieu.tdo@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows configuring IPIP, sit, and GRE tunnels to use GUE.
This is very similar to fou excpet that we need to insert the GUE header
in addition to the UDP header on transmit.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support receiving for GUE packets in the fou module. The
fou module now supports direct foo-over-udp (no encapsulation header)
and GUE. To support this a type parameter is added to the fou netlink
parameters.
For a GUE socket we define gue_udp_recv, gue_gro_receive, and
gue_gro_complete to handle the specifics of the GUE protocol. Most
of the code to manage and configure sockets is common with the fou.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes fou[46]_gro_receive and fou[46]_gro_complete
functions. The v4 or v6 variants were chosen for the UDP offloads
based on the address family of the socket this is not necessary
or correct. Alternatively, this patch adds is_ipv6 to napi_gro_skb.
This is set in udp6_gro_receive and unset in udp4_gro_receive. In
fou_gro_receive the value is used to select the correct inet_offloads
for the protocol of the outer IP header.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adjusting max_header for the tunnel interface based on egress
device we need to account for any extra bytes in secondary encapsulation
(e.g. FOU).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_gro_receive() is only called from tcp_gro_receive() which is
not in a module.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Validation of skb can be pretty expensive :
GSO segmentation and/or checksum computations.
We can do this without holding qdisc lock, so that other cpus
can queue additional packets.
Trick is that requeued packets were already validated, so we carry
a boolean so that sch_direct_xmit() can validate a fresh skb list,
or directly use an old one.
Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads
host.
Turning TSO on or off had no effect on throughput, only few more cpu
cycles. Lock contention on qdisc lock disappeared.
Same if disabling TX checksum offload.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I got a report of a double free happening at RDS slab cache. One
suspicion was that may be somewhere we were doing a sock_hold/sock_put
on an already freed sock. Thus after providing a kernel with the
following change:
static inline void sock_hold(struct sock *sk)
{
- atomic_inc(&sk->sk_refcnt);
+ if (!atomic_inc_not_zero(&sk->sk_refcnt))
+ WARN(1, "Trying to hold sock already gone: %p (family: %hd)\n",
+ sk, sk->sk_family);
}
The warning successfuly triggered:
Trying to hold sock already gone: ffff81f6dda61280 (family: 21)
WARNING: at include/net/sock.h:350 sock_hold()
Call Trace:
<IRQ> [<ffffffff8adac135>] :rds:rds_send_remove_from_sock+0xf0/0x21b
[<ffffffff8adad35c>] :rds:rds_send_drop_acked+0xbf/0xcf
[<ffffffff8addf546>] :rds_rdma:rds_ib_recv_tasklet_fn+0x256/0x2dc
[<ffffffff8009899a>] tasklet_action+0x8f/0x12b
[<ffffffff800125a2>] __do_softirq+0x89/0x133
[<ffffffff8005f30c>] call_softirq+0x1c/0x28
[<ffffffff8006e644>] do_softirq+0x2c/0x7d
[<ffffffff8006e4d4>] do_IRQ+0xee/0xf7
[<ffffffff8005e625>] ret_from_intr+0x0/0xa
<EOI>
Looking at the call chain above, the only way I think this would be
possible is if somewhere we already released the same socket->sock which
is assigned to the rds_message at rds_send_remove_from_sock. Which seems
only possible to happen after the tear down done on rds_release.
rds_release properly calls rds_send_drop_to to drop the socket from any
rds_message, and some proper synchronization is in place to avoid race
with rds_send_drop_acked/rds_send_remove_from_sock. However, I still see
a very narrow window where it may be possible we touch a sock already
released: when rds_release races with rds_send_drop_acked, we check
RDS_MSG_ON_CONN to avoid cleanup on the same rds_message, but in this
specific case we don't clear rm->m_rs. In this case, it seems we could
then go on at rds_send_drop_to and after it returns, the sock is freed
by last sock_put on rds_release, with concurrently we being at
rds_send_remove_from_sock; then at some point in the loop at
rds_send_remove_from_sock we process an rds_message which didn't have
rm->m_rs unset for a freed sock, and a possible sock_hold on an sock
already gone at rds_release happens.
This hopefully address the described condition above and avoids a double
free on "second last" sock_put. In addition, I removed the comment about
socket destruction on top of rds_send_drop_acked: we call rds_send_drop_to
in rds_release and we should have things properly serialized there, thus
I can't see the comment being accurate there.
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I see two problems if we consider the sock->ops->connect attempt to fail in
rds_tcp_conn_connect. The first issue is that for example we don't remove the
previously added rds_tcp_connection item to rds_tcp_tc_list at
rds_tcp_set_callbacks, which means that on a next reconnect attempt for the
same rds_connection, when rds_tcp_conn_connect is called we can again call
rds_tcp_set_callbacks, resulting in duplicated items on rds_tcp_tc_list,
leading to list corruption: to avoid this just make sure we call
properly rds_tcp_restore_callbacks before we exit. The second issue
is that we should also release the sock properly, by setting sock = NULL
only if we are returning without error.
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TSO and GSO segmented packets already benefit from bulking
on their own.
The TSO packets have always taken advantage of the only updating
the tailptr once for a large packet.
The GSO segmented packets have recently taken advantage of
bulking xmit_more API, via merge commit 53fda7f7f9 ("Merge
branch 'xmit_list'"), specifically via commit 7f2e870f2a ("net:
Move main gso loop out of dev_hard_start_xmit() into helper.")
allowing qdisc requeue of remaining list. And via commit
ce93718fb7 ("net: Don't keep around original SKB when we
software segment GSO frames.").
This patch allow further bulking of TSO/GSO packets together,
when dequeueing from the qdisc.
Testing:
Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
netperf-wrapper. Bulking several TSO show no performance regressions
(requeues were in the area 32 requeues/sec).
Bulking several GSOs does show small regression or very small
improvement (requeues were in the area 8000 requeues/sec).
Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
latency. Base-case, which is "normal" GSO bulking, sees varying
high-prio queue delay between 0.38ms to 0.47ms. Bulking several GSOs
together, result in a stable high-prio queue delay of 0.50ms.
Using igb at 100Mbit/s with GSO bulking, shows an improvement.
Base-case sees varying high-prio queue delay between 2.23ms to 2.35ms
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
sending/processing an entire skb list.
This patch implements qdisc bulk dequeue, by allowing multiple packets
to be dequeued in dequeue_skb().
The optimization principle for this is two fold, (1) to amortize
locking cost and (2) avoid expensive tailptr update for notifying HW.
(1) Several packets are dequeued while holding the qdisc root_lock,
amortizing locking cost over several packet. The dequeued SKB list is
processed under the TXQ lock in dev_hard_start_xmit(), thus also
amortizing the cost of the TXQ lock.
(2) Further more, dev_hard_start_xmit() will utilize the skb->xmit_more
API to delay HW tailptr update, which also reduces the cost per
packet.
One restriction of the new API is that every SKB must belong to the
same TXQ. This patch takes the easy way out, by restricting bulk
dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the
qdisc only have attached a single TXQ.
Some detail about the flow; dev_hard_start_xmit() will process the skb
list, and transmit packets individually towards the driver (see
xmit_one()). In case the driver stops midway in the list, the
remaining skb list is returned by dev_hard_start_xmit(). In
sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
To avoid overshooting the HW limits, which results in requeuing, the
patch limits the amount of bytes dequeued, based on the drivers BQL
limits. In-effect bulking will only happen for BQL enabled drivers.
Small amounts for extra HoL blocking (2x MTU/0.24ms) were
measured at 100Mbit/s, with bulking 8 packets, but the
oscillating nature of the measurement indicate something, like
sched latency might be causing this effect. More comparisons
show, that this oscillation goes away occationally. Thus, we
disregard this artifact completely and remove any "magic" bulking
limit.
For now, as a conservative approach, stop bulking when seeing TSO and
segmented GSO packets. They already benefit from bulking on their own.
A followup patch add this, to allow easier bisect-ability for finding
regressions.
Jointed work with Hannes, Daniel and Florian.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to register the notifiers in the masquerade expression from
the the module _init and _exit path.
This fixes crashes when removing the masquerade rule with no
ipt_MASQUERADE support in place (which was masking the problem).
Fixes: 9ba1f72 ("netfilter: nf_tables: add new nft_masq expression")
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Conflicts:
drivers/net/usb/r8152.c
net/netfilter/nfnetlink.c
Both r8152 and nfnetlink conflicts were simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
You can use physdev to match the physical interface enslaved to the
bridge device. This information is stored in skb->nf_bridge and it is
set up by br_netfilter. So, this is only available when iptables is
used from the bridge netfilter path.
Since 34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the br_netfilter code is modular. To reduce the impact of this change,
we can autoload the br_netfilter if the physdev match is used since
we assume that the users need br_netfilter in place.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This allows us to emulate the NAT table in ebtables, which is actually
a plain filter chain that hooks at prerouting, output and postrouting.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This code was based on the wrong asumption that you can probe based
on the match/target private size that we get from userspace. This
doesn't work at all when you have to dump the info back to userspace
since you don't know what word size the userspace utility is using.
Currently, the extensions that require arch compat are limit match
and the ebt_mark match/target. The standard targets are not used by
the nft-xt compat layer, so they are not affected. We can work around
this limitation with a new revision that uses arch agnostic types.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In 34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the bridge netfilter code has been modularized.
Use IS_ENABLED instead of ifdef to cover the module case.
Fixes: 34666d4 ("netfilter: bridge: move br_netfilter out of the core")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Move nf_send_reset() and nf_send_reset6() to nf_reject_ipv4 and
nf_reject_ipv6 respectively. This code is shared by x_tables and
nf_tables.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch introduces the NFT_REJECT_ICMPX_UNREACH type which provides
an abstraction to the ICMP and ICMPv6 codes that you can use from the
inet and bridge tables, they are:
* NFT_REJECT_ICMPX_NO_ROUTE: no route to host - network unreachable
* NFT_REJECT_ICMPX_PORT_UNREACH: port unreachable
* NFT_REJECT_ICMPX_HOST_UNREACH: host unreachable
* NFT_REJECT_ICMPX_ADMIN_PROHIBITED: administratevely prohibited
You can still use the specific codes when restricting the rule to match
the corresponding layer 3 protocol.
I decided to not overload the existing NFT_REJECT_ICMP_UNREACH to have
different semantics depending on the table family and to allow the user
to specify ICMP family specific codes if they restrict it to the
corresponding family.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We did not return error if multicast packet transmit failed.
This might not be desired so return error also in this case.
If there are multiple 6lowpan devices where the multicast packet
is sent, then return error even if sending to only one of them fails.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Make sure that we are able to return EAGAIN from l2cap_chan_send()
even for multicast packets. The error code was ignored unncessarily.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
If skb_unshare() returns NULL, then we leak the original skb.
Solution is to use temp variable to hold the new skb.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The earlier multicast commit 36b3dd250d ("Bluetooth: 6lowpan:
Ensure header compression does not corrupt IPv6 header") lost one
skb free which then caused memory leak.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The L2CAP connection's channel list lock (conn->chan_lock) must never be
taken while already holding a channel lock (chan->lock) in order to
avoid lock-inversion and lockdep warnings. So far the l2cap_chan_connect
function has acquired the chan->lock early in the function and then
later called l2cap_chan_add(conn, chan) which will try to take the
conn->chan_lock. This violates the correct order of taking the locks and
may lead to the following type of lockdep warnings:
-> #1 (&conn->chan_lock){+.+...}:
[<c109324d>] lock_acquire+0x9d/0x140
[<c188459c>] mutex_lock_nested+0x6c/0x420
[<d0aab48e>] l2cap_chan_add+0x1e/0x40 [bluetooth]
[<d0aac618>] l2cap_chan_connect+0x348/0x8f0 [bluetooth]
[<d0cc9a91>] lowpan_control_write+0x221/0x2d0 [bluetooth_6lowpan]
-> #0 (&chan->lock){+.+.+.}:
[<c10928d8>] __lock_acquire+0x1a18/0x1d20
[<c109324d>] lock_acquire+0x9d/0x140
[<c188459c>] mutex_lock_nested+0x6c/0x420
[<d0ab05fd>] l2cap_connect_cfm+0x1dd/0x3f0 [bluetooth]
[<d0a909c4>] hci_le_meta_evt+0x11a4/0x1260 [bluetooth]
[<d0a910eb>] hci_event_packet+0x3ab/0x3120 [bluetooth]
[<d0a7cb08>] hci_rx_work+0x208/0x4a0 [bluetooth]
CPU0 CPU1
---- ----
lock(&conn->chan_lock);
lock(&chan->lock);
lock(&conn->chan_lock);
lock(&chan->lock);
Before calling l2cap_chan_add() the channel is not part of the
conn->chan_l list, and can therefore only be accessed by the L2CAP user
(such as l2cap_sock.c). We can therefore assume that it is the
responsibility of the user to handle mutual exclusion until this point
(which we can see is already true in l2cap_sock.c by it in many places
touching chan members without holding chan->lock).
Since the hci_conn and by exctension l2cap_conn creation in the
l2cap_chan_connect() function depend on chan details we cannot simply
add a mutex_lock(&conn->chan_lock) in the beginning of the function
(since the conn object doesn't yet exist there). What we can do however
is move the chan->lock taking later into the function where we already
have the conn object and can that way take conn->chan_lock first.
This patch implements the above strategy and does some other necessary
changes such as using __l2cap_chan_add() which assumes conn->chan_lock
is held, as well as adding a second needed label so the unlocking
happens as it should.
Reported-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Tested-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch demonstrates the effect of delaying update of HW tailptr.
(based on earlier patch by Jesper)
burst=1 is the default. It sends one packet with xmit_more=false
burst=2 sends one packet with xmit_more=true and
2nd copy of the same packet with xmit_more=false
burst=3 sends two copies of the same packet with xmit_more=true and
3rd copy with xmit_more=false
Performance with ixgbe (usec 30):
burst=1 tx:9.2 Mpps
burst=2 tx:13.5 Mpps
burst=3 tx:14.5 Mpps full 10G line rate
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for being able to propagate port states to e.g: notifiers
or other kernel parts, do not manipulate the port state directly, but
instead use a helper function which will allow us to do a bit more than
just setting the state.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the following crash:
[ 63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648
[ 63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000
[ 63.980094] RIP: 0010:[<ffffffff817e6d07>] [<ffffffff817e6d07>] u32_destroy_key+0x27/0x6d
[ 63.980094] RSP: 0018:ffff880117dffcc0 EFLAGS: 00010202
[ 63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000
[ 63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b
[ 63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000
[ 63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001
[ 63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30
[ 63.980094] FS: 0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[ 63.980094] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0
[ 63.980094] Stack:
[ 63.980094] ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68
[ 63.980094] ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8
[ 63.980094] ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a
[ 63.980094] Call Trace:
[ 63.980094] [<ffffffff817e6d68>] u32_delete_key_freepf_rcu+0x1b/0x1d
[ 63.980094] [<ffffffff810cb4c7>] rcu_process_callbacks+0x3bb/0x691
[ 63.980094] [<ffffffff810cb3cd>] ? rcu_process_callbacks+0x2c1/0x691
[ 63.980094] [<ffffffff817e6d4d>] ? u32_destroy_key+0x6d/0x6d
[ 63.980094] [<ffffffff810780a4>] __do_softirq+0x142/0x323
[ 63.980094] [<ffffffff810782a8>] run_ksoftirqd+0x23/0x53
[ 63.980094] [<ffffffff81092126>] smpboot_thread_fn+0x203/0x221
[ 63.980094] [<ffffffff81091f23>] ? smpboot_unpark_thread+0x33/0x33
[ 63.980094] [<ffffffff8108e44d>] kthread+0xc9/0xd1
[ 63.980094] [<ffffffff819e00ea>] ? do_wait_for_common+0xf8/0x125
[ 63.980094] [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61
[ 63.980094] [<ffffffff819e43ec>] ret_from_fork+0x7c/0xb0
[ 63.980094] [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61
tp could be freed in call_rcu callback too, the order is not guaranteed.
John Fastabend says:
====================
Its worth noting why this is safe. Any running schedulers will either
read the valid class field or it will be zeroed.
All schedulers today when the class is 0 do a lookup using the
same call used by the tcf_exts_bind(). So even if we have a running
classifier hit the null class pointer it will do a lookup and get
to the same result. This is particularly fragile at the moment because
the only way to verify this is to audit the schedulers call sites.
====================
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call skb_set_inner_protocol to set inner Ethernet protocol to
protocol being encapsulation by GRE before tunnel_xmit. This is
needed for GSO if UDP encapsulation (fou) is being done.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPV4
before tunnel_xmit. This is needed if UDP encapsulation (fou) is
being done.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPV6
before tunnel_xmit. This is needed if UDP encapsulation (fou) is
being done.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_udp_segment is the function called from udp4_ufo_fragment to
segment a UDP tunnel packet. This function currently assumes
segmentation is transparent Ethernet bridging (i.e. VXLAN
encapsulation). This patch generalizes the function to
operate on either Ethertype or IP protocol.
The inner_protocol field must be set to the protocol of the inner
header. This can now be either an Ethertype or an IP protocol
(in a union). A new flag in the skbuff indicates which type is
effective. skb_set_inner_protocol and skb_set_inner_ipproto
helper functions were added to set the inner_protocol. These
functions are called from the point where the tunnel encapsulation
is occuring.
When skb_udp_tunnel_segment is called, the function to segment the
inner packet is selected based on the inner IP or Ethertype. In the
case of an IP protocol encapsulation, the function is derived from
inet[6]_offloads. In the case of Ethertype, skb->protocol is
set to the inner_protocol and skb_mac_gso_segment is called. (GRE
currently does this, but it might be possible to lookup the protocol
in offload_base and call the appropriate segmenation function
directly).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fast clone cloning can actually avoid an atomic_inc(), if we
guarantee prior clone_ref value is 1.
This requires a change kfree_skbmem(), to perform the
atomic_dec_and_test() on clone_ref before setting fclone to
SKB_FCLONE_UNAVAILABLE.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ccid_activate is only called by __init ccid_initialize_builtins in same module.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
dccp_mib_init is only called by __init dccp_init in same module.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lets use a proper structure to clearly document and implement
skb fast clones.
Then, we might experiment more easily alternative layouts.
This patch adds a new skb_fclone_busy() helper, used by tcp and xfrm,
to stop leaking of implementation details.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we have two different policies for orphan sockets
that repeatedly stall on zero window ACKs. If a socket gets
a zero window ACK when it is transmitting data, the RTO is
used to probe the window. The socket is aborted after roughly
tcp_orphan_retries() retries (as in tcp_write_timeout()).
But if the socket was idle when it received the zero window ACK,
and later wants to send more data, we use the probe timer to
probe the window. If the receiver always returns zero window ACKs,
icsk_probes keeps getting reset in tcp_ack() and the orphan socket
can stall forever until the system reaches the orphan limit (as
commented in tcp_probe_timer()). This opens up a simple attack
to create lots of hanging orphan sockets to burn the memory
and the CPU, as demonstrated in the recent netdev post "TCP
connection will hang in FIN_WAIT1 after closing if zero window is
advertised." http://www.spinics.net/lists/netdev/msg296539.html
This patch follows the design in RTO-based probe: we abort an orphan
socket stalling on zero window when the probe timer reaches both
the maximum backoff and the maximum RTO. For example, an 100ms RTT
connection will timeout after roughly 153 seconds (0.3 + 0.6 +
.... + 76.8) if the receiver keeps the window shut. If the orphan
socket passes this check, but the system already has too many orphans
(as in tcp_out_of_resources()), we still abort it but we'll also
send an RST packet as the connection may still be active.
In addition, we change TCP_USER_TIMEOUT to cover (life or dead)
sockets stalled on zero-window probes. This changes the semantics
of TCP_USER_TIMEOUT slightly because it previously only applies
when the socket has pending transmission.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reported-by: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
cipso_v4_cache_init is only called by __init cipso_v4_init
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip4_frags_ctl_register is only called by __init ipfrag_init
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dsa_switch_suspend() and dsa_switch_resume() functions are only used
when PM_SLEEP is enabled, so they need #ifdef CONFIG_PM_SLEEP protection
to avoid a compiler warning.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Proper CHECKSUM_COMPLETE support needs to adjust skb->csum
when we remove one header. Its done using skb_gro_postpull_rcsum()
In the case of IPv4, we know that the adjustment is not really needed,
because the checksum over IPv4 header is 0. Lets add a comment to
ease code comprehension and avoid copy/paste errors.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3243acd37f
("ieee802154: add __init to lowpan_frags_sysctl_register")
added __init to lowpan_frags_ns_sysctl_register instead of
lowpan_frags_sysctl_register
Suggested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
No caller uses the return value, so make this function return void.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
lowpan_frags_sysctl_register is only called by __init lowpan_net_frag_init
(part of the lowpan module).
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
irlan_open is only called by __init irlan_init in same module.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric reports build failure with
CONFIG_BRIDGE_NETFILTER=n
We insist to build br_nf_core.o unconditionally, but we must only do so
if br_netfilter was enabled, else it fails to build due to
functions being defined to empty stubs (and some structure members
being defined out).
Also, BRIDGE_NETFILTER=y|m makes no sense when BRIDGE=n.
Fixes: 34666d467 (netfilter: bridge: move br_netfilter out of the core)
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet noticed that all no-nonexthop or no-gateway routes which
are already marked DST_HOST (e.g. input routes routes) will always be
invalidated during sk_dst_check. Thus per-socket dst caching absolutely
had no effect and early demuxing had no effect.
Thus this patch removes rt6i_genid: fn_sernum already gets modified during
add operations, so we only must ensure we mutate fn_sernum during ipv6
address remove operations. This is a fairly cost extensive operations,
but address removal should not happen that often. Also our mtu update
functions do the same and we heard no complains so far. xfrm policy
changes also cause a call into fib6_flush_trees. Also plug a hole in
rt6_info (no cacheline changes).
I verified via tracing that this change has effect.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After previous patches to simplify qstats the qstats can be
made per cpu with a packed union in Qdisc struct.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes the use of qstats->qlen variable from the classifiers
and makes it an explicit argument to gnet_stats_copy_queue().
The qlen represents the qdisc queue length and is packed into
the qstats at the last moment before passnig to user space. By
handling it explicitely we avoid, in the percpu stats case, having
to figure out which per_cpu variable to put it in.
It would probably be best to remove it from qstats completely
but qstats is a user space ABI and can't be broken. A future
patch could make an internal only qstats structure that would
avoid having to allocate an additional u32 variable on the
Qdisc struct. This would make the qstats struct 128bits instead
of 128+32.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds helpers to manipulate qstats logic and replaces locations
that touch the counters directly. This simplifies future patches
to push qstats onto per cpu counters.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to run qdisc's without locking statistics and estimators
need to be handled correctly.
To resolve bstats make the statistics per cpu. And because this is
only needed for qdiscs that are running without locks which is not
the case for most qdiscs in the near future only create percpu
stats when qdiscs set the TCQ_F_CPUSTATS flag.
Next because estimators use the bstats to calculate packets per
second and bytes per second the estimator code paths are updated
to use the per cpu statistics.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Negated expressions and sub-expressions need to have their flags checked for
TCF_EM_INVERT and their result negated accordingly.
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 8a29111c7c ("net: gro: allow to build full sized skb")
I added a regression for linear skb that traditionally force GRO
to use the frag_list fallback.
Erez Shitrit found that at most two segments were aggregated and
the "if (skb_gro_len(p) != pinfo->gso_size)" test was failing.
This is because pinfo at this spot still points to the last skb in the
chain, instead of the first one, where we find the correct gso_size
information.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 8a29111c7c ("net: gro: allow to build full sized skb")
Reported-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
pull request: netfilter/ipvs updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next,
most relevantly they are:
1) Four patches to make the new nf_tables masquerading support
independent of the x_tables infrastructure. This also resolves a
compilation breakage if the masquerade target is disabled but the
nf_tables masq expression is enabled.
2) ipset updates via Jozsef Kadlecsik. This includes the addition of the
skbinfo extension that allows you to store packet metainformation in the
elements. This can be used to fetch and restore this to the packets through
the iptables SET target, patches from Anton Danilov.
3) Add the hash:mac set type to ipset, from Jozsef Kadlecsick.
4) Add simple weighted fail-over scheduler via Simon Horman. This provides
a fail-over IPVS scheduler (unlike existing load balancing schedulers).
Connections are directed to the appropriate server based solely on
highest weight value and server availability, patch from Kenny Mathis.
5) Support IPv6 real servers in IPv4 virtual-services and vice versa.
Simon Horman informs that the motivation for this is to allow more
flexibility in the choice of IP version offered by both virtual-servers
and real-servers as they no longer need to match: An IPv4 connection
from an end-user may be forwarded to a real-server using IPv6 and
vice versa. No ip_vs_sync support yet though. Patches from Alex Gartrell
and Julian Anastasov.
6) Add global generation ID to the nf_tables ruleset. When dumping from
several different object lists, we need a way to identify that an update
has ocurred so userspace knows that it needs to refresh its lists. This
also includes a new command to obtain the 32-bits generation ID. The
less significant 16-bits of this ID is also exposed through res_id field
in the nfnetlink header to quickly detect the interference and retry when
there is no risk of ID wraparound.
7) Move br_netfilter out of the bridge core. The br_netfilter code is
built in the bridge core by default. This causes problems of different
kind to people that don't want this: Jesper reported performance drop due
to the inconditional hook registration and I remember to have read complains
on netdev from people regarding the unexpected behaviour of our bridging
stack when br_netfilter is enabled (fragmentation handling, layer 3 and
upper inspection). People that still need this should easily undo the
damage by modprobing the new br_netfilter module.
8) Dump the set policy nf_tables that allows set parameterization. So
userspace can keep user-defined preferences when saving the ruleset.
From Arturo Borrero.
9) Use __seq_open_private() helper function to reduce boiler plate code
in x_tables, From Rob Jones.
10) Safer default behaviour in case that you forget to load the protocol
tracker. Daniel Borkmann and Florian Westphal detected that if your
ruleset is stateful, you allow traffic to at least one single SCTP port
and the SCTP protocol tracker is not loaded, then any SCTP traffic may
be pass through unfiltered. After this patch, the connection tracking
classifies SCTP/DCCP/UDPlite/GRE packets as invalid if your kernel has
been compiled with support for these modules.
====================
Trivially resolved conflict in include/linux/skbuff.h, Eric moved some
netfilter skbuff members around, and the netfilter tree adjusted the
ifdef guards for the bridging info pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
Suggested by Stephen. Also drop inline keyword and let compiler decide.
gcc 4.7.3 decides to no longer inline tcp_ecn_check_ce, so split it up.
The actual evaluation is not inlined anymore while the ECN_OK test is.
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
After Octavian Purdilas tcp ipv4/ipv6 unification work this helper only
has a single callsite.
While at it, convert name to lowercase, suggested by Stephen.
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Svcrdma currently advertises 1MB, which is too large. The correct value
is the minimum of RPCSVC_MAXPAYLOAD and the max scatter-gather allowed
in an NFSRDMA IO chunk * the host page size. This bug is usually benign
because the Linux X64 NFSRDMA client correctly limits the payload size to
the correct value (64*4096 = 256KB). But if the Linux client is PPC64
with a 64KB page size, then the client will indeed use a payload size
that will overflow the server.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This variable i is overwritten to 0 by following code
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With proliferation of bit fields in sk_buff, __copy_skb_header() became
quite expensive, showing as the most expensive function in a GSO
workload.
__copy_skb_header() performance is also critical for non GSO TCP
operations, as it is used from skb_clone()
This patch carefully moves all the fields that were not copied in a
separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more
Then I moved all other fields and all other copied fields in a section
delimited by headers_start[0]/headers_end[0] section so that we
can use a single memcpy() call, inlined by compiler using long
word load/stores.
I also tried to make all copies in the natural orders of sk_buff,
to help hardware prefetching.
I made sure sk_buff size did not change.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set multicast support for 6lowpan network interface.
This is needed in every network interface that supports IPv6.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If skb is going to multiple destinations, then make sure that we
do not overwrite the common IPv6 headers. So before compressing
the IPv6 headers, we copy the skb and that is then sent to 6LoWPAN
Bluetooth devices.
This is a similar patch as what was done for IEEE 802.154 6LoWPAN
in commit f19f4f9525 ("ieee802154: 6lowpan: ensure header compression
does not corrupt ipv6 header")
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Given following iptables ruleset:
-P FORWARD DROP
-A FORWARD -m sctp --dport 9 -j ACCEPT
-A FORWARD -p tcp --dport 80 -j ACCEPT
-A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT
One would assume that this allows SCTP on port 9 and TCP on port 80.
Unfortunately, if the SCTP conntrack module is not loaded, this allows
*all* SCTP communication, to pass though, i.e. -p sctp -j ACCEPT,
which we think is a security issue.
This is because on the first SCTP packet on port 9, we create a dummy
"generic l4" conntrack entry without any port information (since
conntrack doesn't know how to extract this information).
All subsequent packets that are unknown will then be in established
state since they will fallback to proto_generic and will match the
'generic' entry.
Our originally proposed version [1] completely disabled generic protocol
tracking, but Jozsef suggests to not track protocols for which a more
suitable helper is available, hence we now mitigate the issue for in
tree known ct protocol helpers only, so that at least NAT and direction
information will still be preserved for others.
[1] http://www.spinics.net/lists/netfilter-devel/msg33430.html
Joint work with Daniel Borkmann.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We want to know in which cases the user explicitly sets the policy
options. In that case, we also want to dump back the info.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We need to make sure that the saved skb exists when
resuming or suspending a CoC channel. This can happen if
initial credits is 0 when channel is connected.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This work adds the DataCenter TCP (DCTCP) congestion control
algorithm [1], which has been first published at SIGCOMM 2010 [2],
resp. follow-up analysis at SIGMETRICS 2011 [3] (and also, more
recently as an informational IETF draft available at [4]).
DCTCP is an enhancement to the TCP congestion control algorithm for
data center networks. Typical data center workloads are i.e.
i) partition/aggregate (queries; bursty, delay sensitive), ii) short
messages e.g. 50KB-1MB (for coordination and control state; delay
sensitive), and iii) large flows e.g. 1MB-100MB (data update;
throughput sensitive). DCTCP has therefore been designed for such
environments to provide/achieve the following three requirements:
* High burst tolerance (incast due to partition/aggregate)
* Low latency (short flows, queries)
* High throughput (continuous data updates, large file
transfers) with commodity, shallow buffered switches
The basic idea of its design consists of two fundamentals: i) on the
switch side, packets are being marked when its internal queue
length > threshold K (K is chosen so that a large enough headroom
for marked traffic is still available in the switch queue); ii) the
sender/host side maintains a moving average of the fraction of marked
packets, so each RTT, F is being updated as follows:
F := X / Y, where X is # of marked ACKs, Y is total # of ACKs
alpha := (1 - g) * alpha + g * F, where g is a smoothing constant
The resulting alpha (iow: probability that switch queue is congested)
is then being used in order to adaptively decrease the congestion
window W:
W := (1 - (alpha / 2)) * W
The means for receiving marked packets resp. marking them on switch
side in DCTCP is the use of ECN.
RFC3168 describes a mechanism for using Explicit Congestion Notification
from the switch for early detection of congestion, rather than waiting
for segment loss to occur.
However, this method only detects the presence of congestion, not
the *extent*. In the presence of mild congestion, it reduces the TCP
congestion window too aggressively and unnecessarily affects the
throughput of long flows [4].
DCTCP, as mentioned, enhances Explicit Congestion Notification (ECN)
processing to estimate the fraction of bytes that encounter congestion,
rather than simply detecting that some congestion has occurred. DCTCP
then scales the TCP congestion window based on this estimate [4],
thus it can derive multibit feedback from the information present in
the single-bit sequence of marks in its control law. And thus act in
*proportion* to the extent of congestion, not its *presence*.
Switches therefore set the Congestion Experienced (CE) codepoint in
packets when internal queue lengths exceed threshold K. Resulting,
DCTCP delivers the same or better throughput than normal TCP, while
using 90% less buffer space.
It was found in [2] that DCTCP enables the applications to handle 10x
the current background traffic, without impacting foreground traffic.
Moreover, a 10x increase in foreground traffic did not cause any
timeouts, and thus largely eliminates TCP incast collapse problems.
The algorithm itself has already seen deployments in large production
data centers since then.
We did a long-term stress-test and analysis in a data center, short
summary of our TCP incast tests with iperf compared to cubic:
This test measured DCTCP throughput and latency and compared it with
CUBIC throughput and latency for an incast scenario. In this test, 19
senders sent at maximum rate to a single receiver. The receiver simply
ran iperf -s.
The senders ran iperf -c <receiver> -t 30. All senders started
simultaneously (using local clocks synchronized by ntp).
This test was repeated multiple times. Below shows the results from a
single test. Other tests are similar. (DCTCP results were extremely
consistent, CUBIC results show some variance induced by the TCP timeouts
that CUBIC encountered.)
For this test, we report statistics on the number of TCP timeouts,
flow throughput, and traffic latency.
1) Timeouts (total over all flows, and per flow summaries):
CUBIC DCTCP
Total 3227 25
Mean 169.842 1.316
Median 183 1
Max 207 5
Min 123 0
Stddev 28.991 1.600
Timeout data is taken by measuring the net change in netstat -s
"other TCP timeouts" reported. As a result, the timeout measurements
above are not restricted to the test traffic, and we believe that it
is likely that all of the "DCTCP timeouts" are actually timeouts for
non-test traffic. We report them nevertheless. CUBIC will also include
some non-test timeouts, but they are drawfed by bona fide test traffic
timeouts for CUBIC. Clearly DCTCP does an excellent job of preventing
TCP timeouts. DCTCP reduces timeouts by at least two orders of
magnitude and may well have eliminated them in this scenario.
2) Throughput (per flow in Mbps):
CUBIC DCTCP
Mean 521.684 521.895
Median 464 523
Max 776 527
Min 403 519
Stddev 105.891 2.601
Fairness 0.962 0.999
Throughput data was simply the average throughput for each flow
reported by iperf. By avoiding TCP timeouts, DCTCP is able to
achieve much better per-flow results. In CUBIC, many flows
experience TCP timeouts which makes flow throughput unpredictable and
unfair. DCTCP, on the other hand, provides very clean predictable
throughput without incurring TCP timeouts. Thus, the standard deviation
of CUBIC throughput is dramatically higher than the standard deviation
of DCTCP throughput.
Mean throughput is nearly identical because even though cubic flows
suffer TCP timeouts, other flows will step in and fill the unused
bandwidth. Note that this test is something of a best case scenario
for incast under CUBIC: it allows other flows to fill in for flows
experiencing a timeout. Under situations where the receiver is issuing
requests and then waiting for all flows to complete, flows cannot fill
in for timed out flows and throughput will drop dramatically.
3) Latency (in ms):
CUBIC DCTCP
Mean 4.0088 0.04219
Median 4.055 0.0395
Max 4.2 0.085
Min 3.32 0.028
Stddev 0.1666 0.01064
Latency for each protocol was computed by running "ping -i 0.2
<receiver>" from a single sender to the receiver during the incast
test. For DCTCP, "ping -Q 0x6 -i 0.2 <receiver>" was used to ensure
that traffic traversed the DCTCP queue and was not dropped when the
queue size was greater than the marking threshold. The summary
statistics above are over all ping metrics measured between the single
sender, receiver pair.
The latency results for this test show a dramatic difference between
CUBIC and DCTCP. CUBIC intentionally overflows the switch buffer
which incurs the maximum queue latency (more buffer memory will lead
to high latency.) DCTCP, on the other hand, deliberately attempts to
keep queue occupancy low. The result is a two orders of magnitude
reduction of latency with DCTCP - even with a switch with relatively
little RAM. Switches with larger amounts of RAM will incur increasing
amounts of latency for CUBIC, but not for DCTCP.
4) Convergence and stability test:
This test measured the time that DCTCP took to fairly redistribute
bandwidth when a new flow commences. It also measured DCTCP's ability
to remain stable at a fair bandwidth distribution. DCTCP is compared
with CUBIC for this test.
At the commencement of this test, a single flow is sending at maximum
rate (near 10 Gbps) to a single receiver. One second after that first
flow commences, a new flow from a distinct server begins sending to
the same receiver as the first flow. After the second flow has sent
data for 10 seconds, the second flow is terminated. The first flow
sends for an additional second. Ideally, the bandwidth would be evenly
shared as soon as the second flow starts, and recover as soon as it
stops.
The results of this test are shown below. Note that the flow bandwidth
for the two flows was measured near the same time, but not
simultaneously.
DCTCP performs nearly perfectly within the measurement limitations
of this test: bandwidth is quickly distributed fairly between the two
flows, remains stable throughout the duration of the test, and
recovers quickly. CUBIC, in contrast, is slow to divide the bandwidth
fairly, and has trouble remaining stable.
CUBIC DCTCP
Seconds Flow 1 Flow 2 Seconds Flow 1 Flow 2
0 9.93 0 0 9.92 0
0.5 9.87 0 0.5 9.86 0
1 8.73 2.25 1 6.46 4.88
1.5 7.29 2.8 1.5 4.9 4.99
2 6.96 3.1 2 4.92 4.94
2.5 6.67 3.34 2.5 4.93 5
3 6.39 3.57 3 4.92 4.99
3.5 6.24 3.75 3.5 4.94 4.74
4 6 3.94 4 5.34 4.71
4.5 5.88 4.09 4.5 4.99 4.97
5 5.27 4.98 5 4.83 5.01
5.5 4.93 5.04 5.5 4.89 4.99
6 4.9 4.99 6 4.92 5.04
6.5 4.93 5.1 6.5 4.91 4.97
7 4.28 5.8 7 4.97 4.97
7.5 4.62 4.91 7.5 4.99 4.82
8 5.05 4.45 8 5.16 4.76
8.5 5.93 4.09 8.5 4.94 4.98
9 5.73 4.2 9 4.92 5.02
9.5 5.62 4.32 9.5 4.87 5.03
10 6.12 3.2 10 4.91 5.01
10.5 6.91 3.11 10.5 4.87 5.04
11 8.48 0 11 8.49 4.94
11.5 9.87 0 11.5 9.9 0
SYN/ACK ECT test:
This test demonstrates the importance of ECT on SYN and SYN-ACK packets
by measuring the connection probability in the presence of competing
flows for a DCTCP connection attempt *without* ECT in the SYN packet.
The test was repeated five times for each number of competing flows.
Competing Flows 1 | 2 | 4 | 8 | 16
------------------------------
Mean Connection Probability 1 | 0.67 | 0.45 | 0.28 | 0
Median Connection Probability 1 | 0.65 | 0.45 | 0.25 | 0
As the number of competing flows moves beyond 1, the connection
probability drops rapidly.
Enabling DCTCP with this patch requires the following steps:
DCTCP must be running both on the sender and receiver side in your
data center, i.e.:
sysctl -w net.ipv4.tcp_congestion_control=dctcp
Also, ECN functionality must be enabled on all switches in your
data center for DCTCP to work. The default ECN marking threshold (K)
heuristic on the switch for DCTCP is e.g., 20 packets (30KB) at
1Gbps, and 65 packets (~100KB) at 10Gbps (K > 1/7 * C * RTT, [4]).
In above tests, for each switch port, traffic was segregated into two
queues. For any packet with a DSCP of 0x01 - or equivalently a TOS of
0x04 - the packet was placed into the DCTCP queue. All other packets
were placed into the default drop-tail queue. For the DCTCP queue,
RED/ECN marking was enabled, here, with a marking threshold of 75 KB.
More details however, we refer you to the paper [2] under section 3).
There are no code changes required to applications running in user
space. DCTCP has been implemented in full *isolation* of the rest of
the TCP code as its own congestion control module, so that it can run
without a need to expose code to the core of the TCP stack, and thus
nothing changes for non-DCTCP users.
Changes in the CA framework code are minimal, and DCTCP algorithm
operates on mechanisms that are already available in most Silicon.
The gain (dctcp_shift_g) is currently a fixed constant (1/16) from
the paper, but we leave the option that it can be chosen carefully
to a different value by the user.
In case DCTCP is being used and ECN support on peer site is off,
DCTCP falls back after 3WHS to operate in normal TCP Reno mode.
ss {-4,-6} -t -i diag interface:
... dctcp wscale:7,7 rto:203 rtt:2.349/0.026 mss:1448 cwnd:2054
ssthresh:1102 ce_state 0 alpha 15 ab_ecn 0 ab_tot 735584
send 10129.2Mbps pacing_rate 20254.1Mbps unacked:1822 retrans:0/15
reordering:101 rcv_space:29200
... dctcp-reno wscale:7,7 rto:201 rtt:0.711/1.327 ato:40 mss:1448
cwnd:10 ssthresh:1102 fallback_mode send 162.9Mbps pacing_rate
325.5Mbps rcv_rtt:1.5 rcv_space:29200
More information about DCTCP can be found in [1-4].
[1] http://simula.stanford.edu/~alizade/Site/DCTCP.html
[2] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf
[3] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf
[4] http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00
Joint work with Florian Westphal and Glenn Judd.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
DataCenter TCP (DCTCP) determines cwnd growth based on ECN information
and ACK properties, e.g. ACK that updates window is treated differently
than DUPACK.
Also DCTCP needs information whether ACK was delayed ACK. Furthermore,
DCTCP also implements a CE state machine that keeps track of CE markings
of incoming packets.
Therefore, extend the congestion control framework to provide these
event types, so that DCTCP can be properly implemented as a normal
congestion algorithm module outside of the core stack.
Joint work with Daniel Borkmann and Glenn Judd.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The congestion control ops "cwnd_event" currently supports
CA_EVENT_FAST_ACK and CA_EVENT_SLOW_ACK events (among others).
Both FAST and SLOW_ACK are only used by Westwood congestion
control algorithm.
This removes both flags from cwnd_event and adds a new
in_ack_event callback for this. The goal is to be able to
provide more detailed information about ACKs, such as whether
ECE flag was set, or whether the ACK resulted in a window
update.
It is required for DataCenter TCP (DCTCP) congestion control
algorithm as it makes a different choice depending on ECE being
set or not.
Joint work with Daniel Borkmann and Glenn Judd.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a flag to TCP congestion algorithms that allows
for requesting to mark IPv4/IPv6 sockets with transport as ECN
capable, that is, ECT(0), when required by a congestion algorithm.
It is currently used and needed in DataCenter TCP (DCTCP), as it
requires both peers to assert ECT on all IP packets sent - it
uses ECN feedback (i.e. CE, Congestion Encountered information)
from switches inside the data center to derive feedback to the
end hosts.
Therefore, simply add a new flag to icsk_ca_ops. Note that DCTCP's
algorithm/behaviour slightly diverges from RFC3168, therefore this
is only (!) enabled iff the assigned congestion control ops module
has requested this. By that, we can tightly couple this logic really
only to the provided congestion control ops.
Joint work with Florian Westphal and Glenn Judd.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split assignment and initialization from one into two functions.
This is required by followup patches that add Datacenter TCP
(DCTCP) congestion control algorithm - we need to be able to
determine if the connection is moderated by DCTCP before the
3WHS has finished.
As we walk the available congestion control list during the
assignment, we are always guaranteed to have Reno present as
it's fixed compiled-in. Therefore, since we're doing the
early assignment, we don't have a real use for the Reno alias
tcp_init_congestion_ops anymore and can thus remove it.
Actual usage of the congestion control operations are being
made after the 3WHS has finished, in some cases however we
can access get_info() via diag if implemented, therefore we
need to zero out the private area for those modules.
Joint work with Daniel Borkmann and Glenn Judd.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This completes the cls_rsvp conversion to RCU safe
copy, update semantics.
As a result all cases of tcf_exts_change occur on
empty lists now.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clearly the following change is not expected:
- if (!cp.perfect && !cp.h)
- cp.alloc_hash = cp.hash;
+ if (!cp->perfect && cp->h)
+ cp->alloc_hash = cp->hash;
Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When kmemdup() fails, we should return -ENOMEM.
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do not wish to disturb dropwatch or perf drop profiles with an ARP
we will ignore.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2014-09-25
1) Remove useless hash_resize_mutex in xfrm_hash_resize().
This mutex is used only there, but xfrm_hash_resize()
can't be called concurrently at all. From Ying Xue.
2) Extend policy hashing to prefixed policies based on
prefix lenght thresholds. From Christophe Gouault.
3) Make the policy hash table thresholds configurable
via netlink. From Christophe Gouault.
4) Remove the maximum authentication length for AH.
This was needed to limit stack usage. We switched
already to allocate space, so no need to keep the
limit. From Herbert Xu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: commit f187bc6efb ("ipv4: No need to set generic neighbour pointer")
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow switches driver to query and enable/disable EEE on a per-port
basis by implementing the ethtool_{get,set}_eee settings and delegating
these operations to the switch driver.
set_eee() will need to coordinate with the PHY driver to make sure that
EEE is enabled, the link-partner supports it and the auto-negotiation
result is satisfactory.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever a per-port network device is used/unused, invoke the switch
driver port_enable/port_disable callbacks to allow saving as much power
as possible by disabling unused parts of the switch (RX/TX logic, memory
arrays, PHYs...). We supply a PHY device argument to make sure the
switch driver can act on the PHY device if needed (like putting/taking
the PHY out of deep low power mode).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dsa_slave_open() should start the PHY library state machine for its PHY
interface, and dsa_slave_close() should stop the PHY library state
machine accordingly.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a cleanup which follows the idea in commit e11ecddf51 (tcp: use
TCP_SKB_CB(skb)->tcp_flags in input path),
and it may reduce register pressure since skb->cb[] access is fast,
bacause skb is probably in a register.
v2: remove variable th
v3: reword the changelog
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our goal is to access no more than one cache line access per skb in
a write or receive queue when doing the various walks.
After recent TCP_SKB_CB() reorganizations, it is almost done.
Last part is tcp_skb_pcount() which currently uses
skb_shinfo(skb)->gso_segs, which is a terrible choice, because it needs
3 cache lines in current kernel (skb->head, skb->end, and
shinfo->gso_segs are all in 3 different cache lines, far from skb->cb)
This very simple patch reuses space currently taken by tcp_tw_isn
only in input path, as tcp_skb_pcount is only needed for skb stored in
write queue.
This considerably speeds up tcp_ack(), granted we avoid shinfo->tx_flags
to get SKBTX_ACK_TSTAMP, which seems possible.
This also speeds up all sack processing in general.
This speeds up tcp_sendmsg() because it no longer has to access/dirty
shinfo.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP maintains lists of skb in write queue, and in receive queues
(in order and out of order queues)
Scanning these lists both in input and output path usually requires
access to skb->next, TCP_SKB_CB(skb)->seq, and TCP_SKB_CB(skb)->end_seq
These fields are currently in two different cache lines, meaning we
waste lot of memory bandwidth when these queues are big and flows
have either packet drops or packet reorders.
We can move TCP_SKB_CB(skb)->header at the end of TCP_SKB_CB, because
this header is not used in fast path. This allows TCP to search much faster
in the skb lists.
Even with regular flows, we save one cache line miss in fast path.
Thanks to Christoph Paasch for noticing we need to cleanup
skb->cb[] (IPCB/IP6CB) before entering IP stack in tx path,
and that I forgot IPCB use in tcp_v4_hnd_req() and tcp_v4_save_options().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6_opt_accepted() assumes IP6CB(skb) holds the struct inet6_skb_parm
that it needs. Lets not assume this, as TCP stack might use a different
place.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_options_echo() assumes struct ip_options is provided in &IPCB(skb)->opt
Lets break this assumption, but provide a helper to not change all call points.
ip_send_unicast_reply() gets a new struct ip_options pointer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6gre_tunnel_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.
So return NULL if the tunnel that should be created already
exists.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vti6_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.
So return NULL if the tunnel that should be created already
exists.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_tnl_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.
So return NULL if the tunnel that should be created already
exists.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_dma was the only external user so this can become local to tcp.c
again.
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that tcp_dma_try_early_copy() is gone nothing ever sets
copied_early.
Also reverts "53240c208776 tcp: Fix possible double-ack w/ user dma"
since it is no longer necessary.
Cc: Ali Saidi <saidi@engin.umich.edu>
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Neal Cardwell <ncardwell@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
and there is no plan to fix it.
This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
Reverting the remainder of the net_dma induced changes is deferred to
subsequent patches.
Marked for stable due to Roman's report of a memory leak in
dma_pin_iovec_pages():
https://lkml.org/lkml/2014/9/3/177
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: David Whipple <whipple@securedatainnovations.ch>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: <stable@vger.kernel.org>
Reported-by: Roman Gushchin <klamm@yandex-team.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
With this alias, we don't need to load manually the module before adding an
ip6gretap interface with iproute2.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cache skb_shinfo(skb) in a variable to avoid computing it multiple
times.
Reorganize the tests to remove one indentation level.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the duplicated comment
"/* The following definitions are for users of the vport subsytem: */"
in vport.h
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
nf pull request for net
This series contains netfilter fixes for net, they are:
1) Fix lockdep splat in nft_hash when releasing sets from the
rcu_callback context. We don't the mutex there anymore.
2) Remove unnecessary spinlock_bh in the destroy path of the nf_tables
rbtree set type from rcu_callback context.
3) Fix another lockdep splat in rhashtable. None of the callers hold
a mutex when calling rhashtable_destroy.
4) Fix duplicated error reporting from nfnetlink when aborting and
replaying a batch.
5) Fix a Kconfig issue reported by kbuild robot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
csum_partial() is a generic function which is not optimised for small fixed
length calculations, and its use requires to store "from" and "to" values in
memory while we already have them available in registers. This also has impact,
especially on RISC processors. In the same spirit as the change done by
Eric Dumazet on csum_replace2(), this patch rewrites inet_proto_csum_replace4()
taking into account RFC1624.
I spotted during a NATted tcp transfert that csum_partial() is one of top 5
consuming functions (around 8%), and the second user of csum_partial() is
inet_proto_csum_replace4().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While profiling TCP stack, I noticed one useless atomic operation
in tcp_sendmsg(), caused by skb_header_release().
It turns out all current skb_header_release() users have a fresh skb,
that no other user can see, so we can avoid one atomic operation.
Introduce __skb_header_release() to clearly document this.
This gave me a 1.5 % improvement on TCP_RR workload.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-09-22
Please pull this batch of updates intended for the 3.18 stream...
For the mac80211 bits, Johannes says:
"This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
timeout support, a number of changes for TDLS, early support for radio
resource measurement and many fixes. Also, I'm changing a number of
places to clear key memory when it's freed and Intel claims copyright
for code they developed."
For the bluetooth bits, Johan says:
"Here are some more patches intended for 3.18. Most of them are cleanups
or fixes for SMP. The only exception is a fix for BR/EDR L2CAP fixed
channels which should now work better together with the L2CAP
information request procedure."
For the iwlwifi bits, Emmanuel says:
"I fix here dvm which was broken by my last pull request. Arik
continues to work on TDLS and Luca solved a few issues in CT-Kill. Eyal
keeps digging into rate scaling code, more to come soon. Besides this,
nothing really special here."
Beyond that, there are the usual big batches of updates to ath9k, b43,
mwifiex, and wil6210 as well as a handful of other bits here and there.
Also, rtlwifi gets some btcoexist attention from Larry.
Please let me know if there are problems!
====================
Had to adjust the wil6210 code to comply with Joe Perches's recent
change in net-next to make the netdev_*() routines return void instead
of 'int'.
Signed-off-by: David S. Miller <davem@davemloft.net>