Commit Graph

36499 Commits

Author SHA1 Message Date
Johan Hedberg
c7a3d57db6 Bluetooth: Introduce SMP_DBG macro for low-level debuging
The various inputs & outputs of the crypto functions as well as the
values of the ECDH keys can be considered security sensitive. They
should therefore not end up in dmesg by mistake. This patch introduces a
new SMP_DBG macro which requires explicit compilation with -DDEBUG to be
enabled. All crypto related data logs now use this macro instead of
BT_DBG.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:21 +01:00
Johan Hedberg
a29b073351 Bluetooth: Add basic LE SC OOB support for remote OOB data
This patch adds basic OOB pairing support when we've received the remote
OOB data. This includes tracking the remote r value (in smp->rr) as well
as doing the appropriate f4() call when needed. Previously the OOB rand
would have been stored in smp->rrnd however these are actually two
independent values so we need separate variables for them. Na/Nb in the
spec maps to smp->prnd/rrnd and ra/rb maps to smp->rr with smp->pr to
come once local OOB data is supported.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:21 +01:00
Johan Hedberg
02b05bd8b0 Bluetooth: Set SMP OOB flag if OOB data is available
If we have OOB data available for the remote device in question we
should set the OOB flag appropriately in the SMP pairing request or
response.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:21 +01:00
Johan Hedberg
86df9200c7 Bluetooth: Add support for adding remote OOB data for LE
This patch adds proper support for passing LE OOB data to the
hci_add_remote_oob_data() function. For LE the 192-bit values are not
valid and should therefore be passed as NULL values.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:21 +01:00
Johan Hedberg
6928a9245f Bluetooth: Store address type with OOB data
To be able to support OOB data for LE pairing we need to store the
address type of the remote device. This patch extends the relevant
functions and data types with a bdaddr_type variable.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:21 +01:00
Johan Hedberg
81328d5cca Bluetooth: Unify remote OOB data functions
There's no need to duplicate code for the 192 vs 192+256 variants of the
OOB data functions. This is also helpful to pave the way to support LE
SC OOB data where only 256 bit data is provided.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:20 +01:00
Johan Hedberg
903b71c78d Bluetooth: Add SC-only mode support for SMP
When Secure Connections-only mode is enabled we should reject any
pairing command that does not have Secure Connections set in the
authentication requirements. This patch adds the appropriate logic for
this to the command handlers of Pairing Request/Response and Security
Request.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:20 +01:00
Johan Hedberg
b5ae344d4c Bluetooth: Add full SMP BR/EDR support
When doing SMP over BR/EDR some of the routines can be shared with the
LE functionality whereas others needs to be split into their own BR/EDR
specific branches. This patch implements the split of BR/EDR specific
SMP code from the LE-only code, making sure SMP over BR/EDR works as
specified.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:20 +01:00
Johan Hedberg
ef8efe4bf8 Bluetooth: Add skeleton for BR/EDR SMP channel
This patch adds the very basic code for creating and destroying SMP
L2CAP channels for BR/EDR connections.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:20 +01:00
Johan Hedberg
858cdc78be Bluetooth: Add debugfs switch for forcing SMP over BR/EDR
To make it possible to use LE SC functionality over BR/EDR with pre-4.1
controllers (that do not support BR/EDR SC links) it's useful to be able
to force LE SC operations even over a traditional SSP protected link.
This patch adds a debugfs switch to force a special debug flag which is
used to skip the checks for BR/EDR SC support.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:20 +01:00
Johan Hedberg
fe8bc5ac67 Bluetooth: Add hci_conn flag for new link key generation
For LE Secure Connections we want to trigger cross transport key
generation only if a new link key was actually created during the BR/EDR
connection. This patch adds a new flag to track this information.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:20 +01:00
Johan Hedberg
70157ef539 Bluetooth: Use debug keys for SMP when HCI_USE_DEBUG_KEYS is set
The HCI_USE_DEBUG_KEYS flag is intended to force our side to always use
debug keys for pairing. This means both BR/EDR SSP as well as SMP with
LE Secure Connections. This patch updates the SMP code to use the debug
keys instead of generating a random local key pair when the flag is set.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:19 +01:00
Johan Hedberg
1408bb6efb Bluetooth: Add dummy handler for LE SC keypress notification
Since we don not actively try to clear the keypress notification bit we
might get these PDUs. To avoid failing the pairing process add a simple
dummy handler for these for now.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:19 +01:00
Johan Hedberg
d3e54a876e Bluetooth: Fix DHKey Check sending order for slave role
According to the LE SC specification the initiating device sends its
DHKey check first and the non-initiating devices sends its DHKey check
as a response to this. It's also important that the non-initiating
device doesn't send the response if it's still waiting for user input.
In order to synchronize all this a new flag is added.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:19 +01:00
Johan Hedberg
38606f1418 Bluetooth: Add passkey entry support for LE SC
The passkey entry mechanism involves either both sides requesting the
user for a passkey, or one side requesting the passkey while the other
one displays it. The behavior as far as SMP PDUs are concerned are
considerably different from numeric comparison and therefore requires
several new functions to handle it.

In essence passkey entry involves both sides gradually committing to
each bit of the passkey which involves 20 rounds of pairing confirm and
pairing random PDUS being sent in both directions.

This patch adds a new smp->passkey_round variable to track the current
round of the passkey commitment and reuses the variables already present
in struct hci_conn for the passkey and entered key count.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:19 +01:00
Johan Hedberg
e3befab970 Bluetooth: Fix BR/EDR Link Key type when derived through LE SC
We need to set the correct Link Key type based on the properties of the
LE SC pairing that it was derived from. If debug keys were used the type
should be a debug key, and the authenticated vs unauthenticated
information should be set on what kind of security level was reached.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:19 +01:00
Johan Hedberg
dddd3059e3 Bluetooth: Add support for SC just-works pairing
If the just-works method was chosen we shouldn't send anything to user
space but simply proceed with sending the DHKey Check PDU. This patch
adds the necessary code for it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:19 +01:00
Johan Hedberg
d378a2d776 Bluetooth: Set correct LTK type and authentication for SC
After generating the LTK we should set the correct type (normal SC or
debug) and authentication information for it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:18 +01:00
Johan Hedberg
6c0dcc5014 Bluetooth: Add check for accidentally generating a debug key
It is very unlikely, but to have a 100% guarantee of the generated key
type we need to reject any keys which happen to match the debug key.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:18 +01:00
Johan Hedberg
aeb7d461f9 Bluetooth: Detect SMP SC debug keys
We need to be able to detect if the remote side used a debug key for the
pairing. This patch adds the debug key defines and sets a flag to
indicate that a debug key was used. The debug private key (debug_sk) is
also added in this patch but will only be used in a subsequent patch
when local debug key support is implemented.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:18 +01:00
Johan Hedberg
5e3d3d9b3c Bluetooth: Add selection of the SC authentication method
This patch adds code to select the authentication method for Secure
Connections based on the local and remote capabilities. A new
DSP_PASSKEY method is also added for displaying the passkey - something
that is not part of legacy SMP pairing.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:18 +01:00
Johan Hedberg
783e057462 Bluetooth: Track authentication method in SMP context
For Secure Connections we'll select the authentication method as soon as
we receive the public key, but only use it later (both when actually
triggering the method as well as when determining the quality of the
resulting LTK). Store the method therefore in the SMP context.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:18 +01:00
Johan Hedberg
6a77083af5 Bluetooth: Add support for LE SC key generation
As the last step of the LE SC pairing process it's time to generate and
distribute keys. The generation part is unique to LE SC and so this
patch adds a dedicated function for it. We also clear the distribution
bits for keys which are not distributed with LE SC, so that the code
shared with legacy SMP will not go ahead and try to distribute them.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:18 +01:00
Johan Hedberg
6433a9a2c4 Bluetooth: Add support for LE SC DHKey check PDU
Once we receive the DHKey check PDU it's time to first verify that the
value is correct and then proceed with encrypting the link.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:17 +01:00
Johan Hedberg
760b018b6c Bluetooth: Add support for handling LE SC user response
With LE SC, once the user has responded to the numeric comparison it's
time to send DHKey check values in both directions. The DHKey check
value is generated using new smp_f5 and smp_f6 cryptographic functions.
The smp_f5 function is responsible for generating the LTK and the MacKey
values whereas the smp_f6 function takes the MacKey as input and
generates the DHKey Check value.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:17 +01:00
Johan Hedberg
191dc7fe2d Bluetooth: Add support for LE SC numeric comparison
After the Pairing Confirm and Random PDUs have been exchanged in LE SC
it's time to generate a numeric comparison value using a new smp_g2
cryptographic function (which also builds on AES-CMAC). This patch adds
the smp_g2 implementation and updates the Pairing Random PDU handler to
proceed with the value genration and user confirmation.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:17 +01:00
Johan Hedberg
dcee2b3221 Bluetooth: Add LE SC support for responding to Pairing Confirm PDU
When LE SC is being used we should always respond to it by sending our
local random number. This patch adds a convenience function for it which
also contains a check for the pre-requisite public key exchange
completion

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:17 +01:00
Johan Hedberg
cbbbe3e242 Bluetooth: Add support for sending LE SC Confirm value
Once the public key exchange is complete the next step is for the
non-initiating device to send a SMP Pairing Confirm PDU to the
initiating device. This requires the use of a new smp_f4 confirm value
generation function which in turn builds on the AES-CMAC cryptographic
function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:17 +01:00
Johan Hedberg
d8f8edbe93 Bluetooth: Add handler function for receiving LE SC public key
This patch adds a handler function for the LE SC SMP Public Key PDU.
When we receive the key we proceed with generating the shared DHKey
value from the remote public key and local private key.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:17 +01:00
Johan Hedberg
3b19146d23 Bluetooth: Add basic support for sending our LE SC public key
When the initial pairing request & response PDUs have been exchanged and
both have had the LE SC bit set the next step is to generate a ECDH
key pair and to send the public key to the remote side. This patch adds
basic support for generating the key pair and sending the public key
using the new Public Key SMP PDU. It is the initiating device that sends
the public key first and the non-initiating device responds by sending
its public key respectively (in a subsequent patch).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:17 +01:00
Johan Hedberg
05ddb47a91 Bluetooth: Add ECC library for LE Secure Connections
This patch adds a simple ECC library that will act as a fundamental
building block for LE Secure Connections. The library has a simple API
consisting of two functions: one for generating a public/private key
pair and another one for generating a Diffie-Hellman key from a local
private key and a remote public key.

The code has been taken from https://github.com/kmackay/easy-ecc and
modified to conform with the kernel coding style.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:16 +01:00
Johan Hedberg
407cecf6c7 Bluetooth: Add basic support for AES-CMAC
Most of the LE Secure Connections SMP crypto functions build on top of
the AES-CMAC function. This patch adds access to AES-CMAC in the kernel
crypto subsystem by allocating a crypto_hash handle for it in a similar
way that we have one for AES-CBC.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:16 +01:00
Johan Hedberg
df8e1a4c73 Bluetooth: Set link key generation bit if necessary for LE SC
Depending on whether Secure Connections is enabled or not we may need to add
the link key generation bit to the key distribution. This patch does the
necessary modifications to the build_pairing_cmd() function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:16 +01:00
Johan Hedberg
f3a73d97b3 Bluetooth: Rename hci_find_ltk_by_addr to hci_find_ltk
Now that hci_find_ltk_by_addr is the only LTK lookup function there's no
need to keep the long name anymore. This patch shortens the function
name to simply hci_find_ltk.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:16 +01:00
Johan Hedberg
0ac3dbf999 Bluetooth: Remove unused hci_find_ltk function
Now that LTKs are always looked up based on bdaddr (with EDiv/Rand
checks done after a successful lookup) the hci_find_ltk function is not
needed anymore. This patch removes the function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:16 +01:00
Johan Hedberg
5378bc5622 Bluetooth: Update LTK lookup to correctly deal with SC LTKs
LTKs derived from Secure Connections based pairing are symmetric, i.e.
they should match both master and slave role. This patch updates the LTK
lookup functions to ignore the desired role when dealing with SC LTKs.

Furthermore, with Secure Connections the EDiv and Rand values are not
used and should always be set to zero. This patch updates the LTK lookup
to first use the bdaddr as key and then do the necessary verifications
of EDiv and Rand based on whether the found LTK is for SC or not.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:16 +01:00
Johan Hedberg
a3209694f8 Bluetooth: Add mgmt_set_secure_conn support for any LE adapter
Since LE Secure Connections is a purely host-side feature we should
offer the Secure Connections mgmt setting for any adapter with LE
support. This patch updates the supported settings value and the
set_secure_conn command handler accordingly.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:15 +01:00
Johan Hedberg
710f11c08e Bluetooth: Use custom macro for testing BR/EDR SC enabled
Since the HCI_SC_ENABLED flag will also be used for controllers without
BR/EDR Secure Connections support whenever we need to check specifically
for SC for BR/EDR we also need to check that the controller actually
supports it. This patch adds a convenience macro for check all the
necessary conditions and converts the places in the code that need it to
use it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:15 +01:00
Johan Hedberg
8f5eeca321 Bluetooth: Set the correct security level for SC LTKs
When the looked-up LTK is one generated by Secure Connections pairing
the security level it gives is BT_SECURITY_FIPS. This patch updates the
LTK request event handler to correctly set this level.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:15 +01:00
Johan Hedberg
23fb8de376 Bluetooth: Add mgmt support for LE Secure Connections LTK types
We need a dedicated LTK type for LTK resulting from a Secure Connections
based SMP pairing. This patch adds a new define for it and ensures that
both the New LTK event as well as the Load LTKs command supports it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:15 +01:00
Johan Hedberg
d2eb9e10f7 Bluetooth: Update SMP security level to/from auth_req for SC
This patch updates the functions which map the SMP authentication
request to a security level and vice-versa to take into account the
Secure Connections feature.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:15 +01:00
Johan Hedberg
6566877694 Bluetooth: Add SMP flag for SC and set it when necessary.
This patch adds a new SMP flag for tracking whether Secure Connections
is in use and sets the flag when both remote and local side have elected
to use Secure Connections.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:15 +01:00
Johan Hedberg
0edb14de56 Bluetooth: Make auth_req mask dependent on SC enabled or not
If we haven't enabled SC support on our side we should use the same mask
for the authentication requirement as we were using before SC support
was added, otherwise we should use the extended mask for SC.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:14 +01:00
Johan Hedberg
e65392e2cc Bluetooth: Add basic SMP defines for LE Secure Connections
This patch adds basic SMP defines for commands, error codes and PDU
definitions for the LE Secure Connections feature.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 16:51:14 +01:00
Jozsef Kadlecsik
cac3763967 netfilter: ipset: Explicitly add padding elements to hash:net, net and hash:net, port, net
The elements must be u32 sized for the used hash function.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik
77b4311d20 netfilter: ipset: Allocate the proper size of memory when /0 networks are supported
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik
25a76f3463 netfilter: ipset: Simplify cidr handling for hash:*net* types
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik
59de79cf57 netfilter: ipset: Indicate when /0 networks are supported
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik
a51b9199b1 netfilter: ipset: Alignment problem between 64bit kernel 32bit userspace
Sven-Haegar Koch reported the issue:

sims:~# iptables -A OUTPUT -m set --match-set testset src -j ACCEPT
iptables: Invalid argument. Run `dmesg' for more information.

In syslog:
x_tables: ip_tables: set.3 match: invalid size 48 (kernel) != (user) 32

which was introduced by the counter extension in ipset.

The patch fixes the alignment issue with introducing a new set match
revision with the fixed underlying 'struct ip_set_counter_match'
structure.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:35 +01:00
Jozsef Kadlecsik
86ac79c7be netfilter: ipset: Support updating extensions when the set is full
When the set was full (hash type and maxelem reached), it was not
possible to update the extension part of already existing elements.
The patch removes this limitation.

Fixes: https://bugzilla.netfilter.org/show_bug.cgi?id=880
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:34 +01:00
Johan Hedberg
82c13d42bb Bluetooth: Simplify Link Key Notification event handling logic
When we get a Link Key Notification HCI event we should already have a
hci_conn object. This should have been created either in the Connection
Request event handler, the hci_connect_acl() function or the
hci_cs_create_conn() function (if the request was not sent by the
kernel).

Since the only case that we'd end up not having a hci_conn in the Link
Key Notification event handler would be essentially broken hardware it's
safe to simply bail out from the function if this happens.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-03 10:39:43 +01:00
Scott Feldman
2c3c031c8f bridge: add brport flags to dflt bridge_getlink
To allow brport device to return current brport flags set on port.  Add
returned flags to nested IFLA_PROTINFO netlink msg built in dflt getlink.
With this change, netlink msg returned for bridge_getlink contains the port's
offloaded flag settings (the port's SELF settings).

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:24 -08:00
Scott Feldman
065c212a9e bridge: move private brport flags to if_bridge.h so port drivers can use flags
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:22 -08:00
Scott Feldman
cf6b8e1eed bridge: add API to notify bridge driver of learned FBD on offloaded device
When the swdev device learns a new mac/vlan on a port, it sends some async
notification to the driver and the driver installs an FDB in the device.
To give a holistic system view, the learned mac/vlan should be reflected
in the bridge's FBD table, so the user, using normal iproute2 cmds, can view
what is currently learned by the device.  This API on the bridge driver gives
a way for the swdev driver to install an FBD entry in the bridge FBD table.
(And remove one).

This is equivalent to the device running these cmds:

  bridge fdb [add|del] <mac> dev <dev> vid <vlan id> master

This patch needs some extra eyeballs for review, in paricular around the
locking and contexts.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:22 -08:00
Scott Feldman
38dcf357ae bridge: call netdev_sw_port_stp_update when bridge port STP status changes
To notify switch driver of change in STP state of bridge port, add new
.ndo op and provide switchdev wrapper func to call ndo op. Use it in bridge
code then.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:22 -08:00
Jiri Pirko
aecbe01e74 net-sysfs: expose physical switch id for particular device
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:22 -08:00
Jiri Pirko
82f2841291 rtnl: expose physical switch id for particular device
The netdevice represents a port in a switch, it will expose
IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
belong to one physical switch.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:21 -08:00
Jiri Pirko
007f790c82 net: introduce generic switch devices support
The goal of this is to provide a possibility to support various switch
chips. Drivers should implement relevant ndos to do so. Now there is
only one ndo defined:
- for getting physical switch id is in place.

Note that user can use random port netdevice to access the switch.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:20 -08:00
Jiri Pirko
02637fce3e net: rename netdev_phys_port_id to more generic name
So this can be reused for identification of other "items" as well.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:19 -08:00
Jiri Pirko
f6f6424ba7 net: make vid as a parameter for ndo_fdb_add/ndo_fdb_del
Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple
u16 vid to drivers from there.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:18 -08:00
Jiri Pirko
93859b13fa bridge: convert flags in fbd entry into bitfields
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:17 -08:00
Jiri Pirko
020ec6ba2a bridge: rename fdb_*_hw to fdb_*_hw_addr to avoid confusion
The current name might seem that this actually offloads the fdb entry to
hw. So rename it to clearly present that this for hardware address
addition/removal.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:16 -08:00
Julien Lefrique
e479ce4797 NFC: NCI: Fix max length of General Bytes in ATR_RES
The maximum size of ATR_REQ and ATR_RES is 64 bytes.
The maximum number of General Bytes is calculated by
the maximum number of data bytes in the ATR_REQ/ATR_RES,
substracted by the number of mandatory data bytes.

ATR_REQ: 16 mandatory data bytes, giving a maximum of
48 General Bytes.
ATR_RES: 17 mandatory data bytes, giving a maximum of
47 General Bytes.

Regression introduced in commit a99903ec.

Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 22:59:28 +01:00
Christophe Ricard
3ff24012dd NFC: nci: Fix warning: cast to restricted __le16
Fixing: net/nfc/nci/ntf.c:106:31: warning: cast to restricted __le16
message when building with make C=1 CF=-D__CHECK_ENDIAN__

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 22:49:17 +01:00
Christophe Ricard
e5b53c0a2e NFC: Fix warning "warning: incorrect type in assignment"
Fix warnings:
net/nfc/llcp_commands.c:421:14: warning: incorrect type in assignment (different base types)
net/nfc/llcp_commands.c:421:14:    expected unsigned short [unsigned] [usertype] miux
net/nfc/llcp_commands.c:421:14:    got restricted __be16
net/nfc/llcp_commands.c:477:14: warning: incorrect type in assignment (different base types)
net/nfc/llcp_commands.c:477:14:    expected unsigned short [unsigned] [usertype] miux
net/nfc/llcp_commands.c:477:14:    got restricted __be16

Procedure to reproduce:
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 22:49:15 +01:00
Christophe Ricard
b3a55b9c5d NFC: hci: Add specific hci macro to not create a pipe
Some pipe are only created by other host (different than the
Terminal Host).
The pipe values will for example be notified by
NFC_HCI_ADM_NOTIFY_PIPE_CREATED.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 22:48:13 +01:00
Christophe Ricard
cd96db6fd0 NFC: Add se_io NFC operand
se_io allows to send apdu over the CLF to the embedded
Secure Element.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 22:47:56 +01:00
Christophe Ricard
3682f49f32 NFC: netlink: Add new netlink command NFC_CMD_ACTIVATE_TARGET
Some tag might get deactivated after some read or write tentative.
This may happen for example with Mifare Ultralight C tag when trying
to read the last 4 blocks (starting block 0x2c) configured as write
only.
NFC_CMD_ACTIVATE_TARGET will try to reselect the tag in order to
detect if it got remove from the field or if it is still present.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 22:47:37 +01:00
Christophe Ricard
9295b5b569 NFC: nci: Add support for different NCI_DEACTIVATE_TYPE
nci_rf_deactivate_req only support NCI_DEACTIVATE_TYPE_IDLE_MODE.
In some situation, it might be necessary to be able to support other
NCI_DEACTIVATE_TYPE such as NCI_DEACTIVATE_TYPE_SLEEP_MODE in order for
example to reactivate the selected target.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 22:47:17 +01:00
Christophe Ricard
4391590c40 NFC: nci: Add management for NCI state for machine rf_deactivate_ntf
A notification for rf deaction can be IDLE_MODE, SLEEP_MODE,
SLEEP_AF_MODE and DISCOVERY. According to each type and the NCI
state machine is different (see figure 10 RF Communication State
Machine in NCI specification)

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 22:47:07 +01:00
Christophe Ricard
98ff416f97 NFC: nci: Add status byte management in case of error.
The nci status byte was ignored. In case of tag reading for example,
if the tag is removed from the antenna there is no way for the upper
layers (aka: stack) to get inform about such event.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 22:46:47 +01:00
Johan Hedberg
0bd49fc75a Bluetooth: Track both local and remote L2CAP fixed channel mask
To pave the way for future fixed channels to be added easily we should
track both the local and remote mask on a per-L2CAP connection (struct
l2cap_conn) basis. So far the code has used a global variable in a racy
way which anyway needs fixing.

This patch renames the existing conn->fixed_chan_mask that tracked
the remote mask to conn->remote_fixed_chan and adds a new variable
conn->local_fixed_chan to track the local mask. Since the HS support
info is now available in the local mask we can remove the
conn->hs_enabled variable.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-12-02 09:26:50 +01:00
Christophe Ricard
a2ae218298 NFC: hci: Add support for NOTIFY_ALL_PIPE_CLEARED
When switching from UICC to another, the CLF may signals to the Terminal
Host that some existing pipe are cleared for future update.

This notification needs to be "acked" by the Terminal Host with a ANY_OK
message.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 02:02:00 +01:00
Christophe Ricard
deff5aa469 NFC: hci: Add open pipe command handler
If our terminal connect with other host like UICC, it may create
a pipe with us, the host controller will notify us new pipe
created, after that UICC will open that pipe, if we don't handle
that request, UICC may failed to continue initialize which may
lead to card emulation feature failed to work

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 02:02:00 +01:00
Christophe Ricard
a688bf55c5 NFC: nci: Add se_io NCI operand
se_io allows to send apdu over the CLF to the embedded Secure Element.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 02:01:21 +01:00
Christophe Ricard
e9ef9431a3 NFC: nci: Update nci_disable_se to run proprietary commands to disable a secure element
Some NFC controller using NCI protocols may need a proprietary commands
flow to disable a secure element

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 02:01:21 +01:00
Christophe Ricard
93bca2bfa4 NFC: nci: Update nci_enable_se to run proprietary commands to enable a secure element
Some NFC controller using NCI protocols may need a proprietary commands
flow to enable a secure element

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 02:01:21 +01:00
Christophe Ricard
ba4db551bb NFC: nci: Update nci_discover_se to run proprietary commands to discover all available secure element
Some NFC controller using NCI protocols may need a proprietary commands
flow to discover all available secure element

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 02:01:21 +01:00
Christophe Ricard
c7dea2525b NFC: nci: Fix sparse: symbol 'nci_get_prop_rf_protocol' was not declared.
Fix sparse warning introduced by commit: 9e87f9a9c4

It was generating the following warning:
net/nfc/nci/ntf.c:170:7: sparse: symbol 'nci_get_prop_rf_protocol' was not declared. Should it be static?

Procedure to reproduce it:
# apt-get install sparse
  git checkout 9e87f9a9c4
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 01:50:42 +01:00
Christophe Ricard
9b8d32b7ac NFC: hci: Add se_io HCI operand
se_io allows to send apdu over the CLF to the embedded Secure Element.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-12-02 01:49:58 +01:00
John W. Linville
992066c8d3 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-12-01 15:49:58 -05:00
Jeff Layton
067f96ef17 sunrpc: release svc_pool_map reference when serv allocation fails
Currently, it leaks when the allocation fails.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-01 12:45:27 -07:00
Jeff Layton
8d65ef760d sunrpc: eliminate the XPT_DETACHED flag
All it does is indicate whether a xprt has already been deleted from
a list or not, which is unnecessary since we use list_del_init and it's
always set and checked under the sv_lock anyway.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-01 12:45:26 -07:00
Nicolas Dichtel
e0ebde0e13 rtnetlink: release net refcnt on error in do_setlink()
rtnl_link_get_net() holds a reference on the 'struct net', we need to release
it in case of error.

CC: Eric W. Biederman <ebiederm@xmission.com>
Fixes: b51642f6d7 ("net: Enable a userns root rtnl calls that are safe for unprivilged users")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-29 21:05:43 -08:00
David S. Miller
60b7379dc5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-29 20:47:48 -08:00
Steven Noonan
4338c57259 netfilter: nf_log_ipv6: correct typo in module description
It incorrectly identifies itself as "IPv4" packet logging.

Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-28 17:28:17 +01:00
Felix Fietkau
f027c2aca0 mac80211: add ieee80211_tx_status_noskb
This can be used by drivers that cannot reliably map tx status
information onto specific skbs.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 15:01:51 +01:00
Felix Fietkau
7e1cdcbb09 mac0211: add a helper function for fixing up tx status rates
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 15:01:50 +01:00
Felix Fietkau
ad9dda6383 mac80211: pass tx info to ieee80211_lost_packet instead of an skb
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 15:01:50 +01:00
Felix Fietkau
63558a650a mac80211: minstrel_ht: switch to .tx_status_noskb
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 15:01:50 +01:00
Felix Fietkau
fb7acfb8e1 mac80211: minstrel: switch to .tx_status_noskb
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 15:01:50 +01:00
Felix Fietkau
f684565e0a mac80211: add tx_status_noskb to rate_control_ops
This op works like .tx_status, except it does not need access to the
skb. This will be used by drivers that cannot match tx status
information to specific packets.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 15:01:50 +01:00
Felix Fietkau
95943425c0 mac80211: minstrel_ht: move aggregation check to .get_rate()
Preparation for adding a no-skb tx status path

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 15:01:50 +01:00
Johannes Berg
ea372c5452 cfg80211: remove unneeded initialisations in nl80211_set_reg
Some variables are assigned unconditionally, remove their
initialisations to help avoid introducing errors later.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 14:54:31 +01:00
Arik Nemtsov
ad932f046f cfg80211: leave invalid channels on regdomain change
When the regulatory settings change, some channels might become invalid.
Disconnect interfaces acting on these channels, after giving userspace
code a grace period to leave them.

This mode is currently opt-in, and not all interface operating modes are
supported for regulatory-enforcement checks. A wiphy that wishes to use
the new enforcement code must specify an appropriate regulatory flag,
and all its supported interface modes must be supported by the checking
code.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com>
[fix some indentation, typos]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 14:33:41 +01:00
Felix Fietkau
336004e291 mac80211: add more missing checks for VHT tx rates
Fixes a crash on attempting to calculate the frame duration for a VHT
packet (which needs to be handled by hw/driver instead).

Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 14:24:23 +01:00
Julien Lefrique
d7979e130e NFC: NCI: Signal deactivation in Target mode
Before signaling the deactivation, send a deactivation request if in
RFST_DISCOVERY state because neard assumes polling is stopped and will
try to restart it.

Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 14:07:51 +01:00
Julien Lefrique
6ff5462b67 NFC: NCI: Handle Discovery deactivation type
When the deactivation type reported by RF_DEACTIVATE_NTF is Discovery, go in
RFST_DISCOVERY state. The NFCC stays in Poll mode and/or Listen mode.

Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 14:07:51 +01:00
Julien Lefrique
966efbfb0d NFC: Fix a memory leak
Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 14:07:51 +01:00
Julien Lefrique
122c195872 NFC: NCI: Forward data received in Target mode to nfc core
Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 14:07:51 +01:00
Julien Lefrique
485f442fd5 NFC: NCI: Implement Target mode send function
As specified in NCI 1.0 and NCI 1.1, when using the NFC-DEP RF Interface, the
DH and the NFCC shall only use the Static RF Connection for data communication
with a Remote NFC Endpoint.

Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 14:07:51 +01:00
Julien Lefrique
529ee06682 NFC: NCI: Configure ATR_RES general bytes
The Target responds to the ATR_REQ with the ATR_RES. Configure the General
Bytes in ATR_RES with the first three octets equal to the NFC Forum LLCP
magic number, followed by some LLC Parameters TLVs described in section
4.5 of [LLCP].

Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 14:07:51 +01:00
Julien Lefrique
a99903ec45 NFC: NCI: Handle Target mode activation
Changes:

 * Extract the Listen mode activation parameters from RF_INTF_ACTIVATED_NTF.

 * Store the General Bytes of ATR_REQ.

 * Signal that Target mode is activated in case of an activation in NFC-DEP.

 * Update the NCI state accordingly.

 * Use the various constants defined in nfc.h.

 * Fix the ATR_REQ and ATR_RES maximum size. As per NCI 1.0 and NCI 1.1, the
   Activation Parameters for both Poll and Listen mode contain all the bytes of
   ATR_REQ/ATR_RES starting and including Byte 3 as defined in [DIGITAL].
   In [DIGITAL], the maximum size of ATR_REQ/ATR_RES is 64 bytes and they are
   numbered starting from Byte 1.

Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 14:07:51 +01:00
Julien Lefrique
90d78c1396 NFC: NCI: Enable NFC-DEP in Listen A and Listen F
Send LA_SEL_INFO and LF_PROTOCOL_TYPE with NFC-DEP protocol enabled.
Configure 212 Kbit/s and 412 Kbit/s bit rates for Listen F.

Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 14:07:51 +01:00
Julien Lefrique
772dccf4a7 NFC: NCI: Add passive Listen modes in discover request
The Target mode protocols are given to the nci_start_poll() function
but were previously ignored.
To enable P2P Target, when NFC-DEP is requested as a Target mode protocol, add
NFC-A and NFC-F Passive Listen modes in RF_DISCOVER_CMD command.

Signed-off-by: Julien Lefrique <lefrique@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 14:07:51 +01:00
Axel Lin
413df10bbf NFC: llcp: Use list_for_each_entry in llcp_accept_poll
list_for_each_entry_safe() is necessary if list objects are deleted from
the list while traversing it. Not the case here, so we can use the base
list_for_each_entry variant.

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 13:41:44 +01:00
Mark A. Greer
9b5ec0fd58 NFC: digital: Add NFC-DEP Target-side ATN Support
When an NFC-DEP target receives an ATN PDU, its
supposed to respond with a similar ATN PDU.
When the Target receives an I PDU with the PNI
one less than the current PNI and the last PDU
sent was an ATN PDU, the Target is to resend the
last non-ATN PDU that it has sent.  This is
described in section 14.12.3.4 of the NFC Digital
Protocol Spec.

The digital layer's NFC-DEP code doesn't implement
this so add that support.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:40:38 +01:00
Mark A. Greer
384ab1d174 NFC: digital: Add NFC-DEP Initiator-side ATN Support
When an NFC-DEP Initiator times out when waiting for
a DEP_RES from the Target, its supposed to send an
ATN to the Target.  The Target should respond to the
ATN with a similar ATN PDU and the Initiator can then
resend the last non-ATN PDU that it sent.  No more
than 'N(retry,atn)' are to be send where
2 <= 'N(retry,atn)' <= 5.  If the Initiator had just
sent a NACK PDU when the timeout occurred, it is to
continue sending NACKs until 'N(retry,nack)' NACKs
have been send.  This is described in section
14.12.5.6 of the NFC-DEP Digital Protocol Spec.

The digital layer's NFC-DEP code doesn't implement
this so add that support.

The value chosen for 'N(retry,atn)' is 2.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:39:55 +01:00
Mark A. Greer
49dbb14e30 NFC: digital: Add NFC-DEP Target-side NACK Support
When an NFC-DEP Target receives a NACK PDU with
a PNI equal to 1 less than the current PNI, it
is supposed to re-send the last PDU.  This is
implied in section 14.12.5.4 of the NFC Digital
Protocol Spec.

The digital layer's NFC-DEP code doesn't implement
Target-side NACK handing so add it.  The last PDU
that was sent is saved in the 'nfc_digital_dev'
structure's 'saved_skb' member.  The skb will have
an additional reference taken to ensure that the skb
isn't freed when the driver performs a kfree_skb()
on the skb.  The length of the skb/PDU is also saved
so the length can be restored when re-sending the PDU
in the skb (the driver will perform an skb_pull() so
an skb_push() needs to be done to restore the skb's
data pointer/length).

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:39:47 +01:00
Mark A. Greer
a80509c76b NFC: digital: Add NFC-DEP Initiator-side NACK Support
When an NFC-DEP Initiator receives a frame with
an incorrect CRC or with a parity error, and the
frame is at least 4 bytes long, its supposed to
send a NACK to the Target.  The Initiator can
send up to 'N(retry,nack)' consecutive NACKs
where 2 <= 'N(retry,nack)' <= 5.  When the limit
is exceeded, a PROTOCOL EXCEPTION is raised.
Any other type of transmission error is to be
ignored and the Initiator should continue
waiting for a new frame.  This is described
in section 14.12.5.4 of the NFC Digital Protocol
Spec.

The digital layer's NFC-DEP code doesn't implement
any of this so add it.  This support diverges from
the spec in two significant ways:

a) NACKs will be sent for ANY error reported by the
   driver except a timeout.  This is done because
   there is currently no way for the digital layer
   to distinguish a CRC or parity error from any
   other type of error reported by the driver.

b) All other errors will cause a PROTOCOL EXCEPTION
   even frames with CRC errors that are less than 4
   bytes.

The value chosen for 'N(retry,nack)' is 2.

Targets do not send NACK PDUs.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:39:33 +01:00
Mark A. Greer
c12715ab3f NFC: digital: Add NFC-DEP Receive Chaining Support
When the peer in an NFC-DEP exchange has a
packet to send that is larger than the local
maximum payload, it sets the 'MI' bit in the
'I' PDU.  This indicates that NFC-DEP chaining
is to occur.

When such a PDU is received, the local side
responds with an 'ACK' PDU and this continues
until the peer sends an 'I' PDU with the 'MI'
bit cleared.  This indicates that the chaining
sequence is complete and the entire packet has
been transferred.

Receiving chained PDUs is currently not supported
by the digital layer so add that support.  When a
chaining sequence is initiated by the peer, the
digital layer will allocate an skb large enough
to hold 8 maximum sized frame payloads.  The maximum
payload can range from 64 to 254 bytes so 8 * 254 =
2032 seems like a reasonable compromise between
potentially wasting memory and constantly reallocating
new, larger skbs.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:39:21 +01:00
Mark A. Greer
3bd2a5bcc6 NFC: digital: Add NFC-DEP Send Chaining Support
When the NFC-DEP code is given a packet to send
that is larger than the peer's maximum payload,
its supposed to set the 'MI' bit in the 'I' PDU's
Protocol Frame Byte (PFB).  Setting this bit
indicates that NFC-DEP chaining is to occur.

When NFC-DEP chaining is progress, sender 'I' PDUs
are acknowledged with 'ACK' PDUs until the last 'I'
PDU in the chain (which has the 'MI' bit cleared)
is responded to with a normal 'I' PDU.  This can
occur while in Initiator mode or in Target mode.

Sender NFC-DEP chaining is currently not implemented
in the digital layer so add that support.  Unfortunately,
since sending a frame may require writing the CRC to the
end of the data, the relevant data part of the original
skb must be copied for each intermediate frame.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:39:10 +01:00
Mark A. Greer
b08147cbc4 NFC: digital: Implement NFC-DEP max payload lengths
The maximum payload for NFC-DEP exchanges (i.e., the
number of bytes between SoD and EoD) is negotiated
using the ATR_REQ, ATR_RES, and PSL_REQ commands.
The valid maximum lengths are 64, 128, 192, and 254
bytes.

Currently, NFC-DEP code assumes that both sides are
always using 254 byte maximums and ignores attempts
by the peer to change it.  Instead, implement the
negotiation code, enforce the local maximum when
receiving data from the peer, and don't send payloads
that exceed the remote's maximum.  The default local
maximum is 254 bytes.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:38:59 +01:00
Mark A. Greer
485fdc9bb6 NFC: digital: Enforce NFC-DEP PNI sequencing
NFC-DEP DEP_REQ and DEP_RES exchanges using 'I'
and 'ACK/NACK' PDUs have a sequence number called
the Packet Number Information (PNI).  The PNI
is incremented (modulo 4) after every DEP_REQ/
DEP_RES pair and should be verified by the digital
layer code.  That verification isn't always done,
though, so add code to make sure that it is done.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:38:47 +01:00
Mark A. Greer
3e6b0de805 NFC: digital: Ensure no NAD byte in DEP_REQ and DEP_RES frames
According to chapter 14 of the NFC-DEP Digital
Protocol Spec., the NAD byte should never be
present in DEP_REQ or DEP_RES frames.  However,
this is not enforced so add that enforcement code.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:38:36 +01:00
Mark A. Greer
05afedcb89 NFC: digital: Add Target-mode NFC-DEP DID Support
When in Target mode, the Initiator specifies whether
subsequent DEP_REQ and DEP_RES frames will include
a DID byte by the value passed in the ATR_REQ.  If
the DID value in the ATR_REQ is '0' then no DID
byte will be included.  If the DID value is between
'1' and '14' then a DID byte containing the same
value must be included in subsequent DEP_REQ and
DEP_RES frames.  Any other DID value is invalid.
This is specified in sections 14.8.1.2 and 14.8.2.2
of the NFC Digital Protocol Spec.

Checking the DID value (if it should be there at all),
is not currently supported by the digital layer's
NFC-DEP code.  Add this support by remembering the
DID value in the ATR_REQ, checking the DID value of
received DEP_REQ frames (if it should be there at all),
and including the remembered DID value in DEP_RES
frames when appropriate.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:38:24 +01:00
Mark A. Greer
3bc3f88af5 NFC: digital: Ensure no DID in NFC-DEP responses
When in Initiator mode, the digital layer's
NFC-DEP code always sets the Device ID (DID)
value in the ATR_REQ to '0'.  This means that
subsequent DEP_REQ and DEP_RES frames must
never include a DID byte.  This is specified
in sections 14.8.1.1 and 14.8.2.1 of the NFC
Digital Protocol Spec.

Currently, the digital layer's NFC-DEP code
doesn't enforce this rule so add code to ensure
that there is no DID byte in DEP_RES frames.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:38:10 +01:00
Mark A. Greer
6ce306682f NFC: digital: Rearrange NFC-DEP DEP_REQ/DEP_RES Code
Rearrange some of the code in digital_in_recv_dep_res()
and digital_tg_recv_dep_req() so the initial code looks
similar.  The real reason is prepare the code for some
upcoming patches that require these changes.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:37:58 +01:00
Mark A. Greer
b15829ba5e NFC: digital: Fix potential skb leaks in NFC-DEP code
When digital_in_send_cmd() or digital_tg_send_cmd()
fail, they do not free the skb that was passed to
them so the routine that allocated the skb should
free it.  Currently, there are several routines in
the NFC-DEP code that don't do this so make them.

Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-11-28 12:37:47 +01:00
Johannes Berg
24a0aa212e cfg80211: make WEXT compatibility unselectable
This option has been marked for deprecation and removal for
a little more than two years, but it's not been very clearly
signalled since it was always possible to just select it.

Make it unselectable now to signal anyone who's still using
it after all this time more clearly. They can still get it
back, but only by patching the kernel.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-28 12:21:34 +01:00
Linus Torvalds
8e8459719c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Several small fixes here:

   1) Don't crash in tg3 driver when the number of tx queues has been
      configured to be different from the number of rx queues.  From
      Thadeu Lima de Souza Cascardo.

   2) VLAN filter not disabled properly in promisc mode in ixgbe driver,
      from Vlad Yasevich.

   3) Fix OOPS on dellink op in VTI tunnel driver, from Xin Long.

   4) IPV6 GRE driver WCCP code checks skb->protocol for ETH_P_IP
      instead of ETH_P_IPV6, whoops.  From Yuri Chislov.

   5) Socket matching in ping driver is buggy when packet AF does not
      match socket's AF.  Fix from Jane Zhou.

   6) Fix checksum calculation errors in VXLAN due to where the
      udp_tunnel6_xmit_skb() helper gets it's saddr/daddr from.  From
      Alexander Duyck.

   7) Fix 5G detection problem in rtlwifi driver, from Larry Finger.

   8) Fix NULL deref in tcp_v{4,6}_send_reset, from Eric Dumazet.

   9) Various missing netlink attribute verifications in bridging code,
      from Thomas Graf.

  10) tcp_recvmsg() unconditionally calls ipv4 ip_recv_error even for
      ipv6 sockets, whoops.  Fix from Willem de Bruijn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
  net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks
  bridge: Sanitize IFLA_EXT_MASK for AF_BRIDGE:RTM_GETLINK
  bridge: Add missing policy entry for IFLA_BRPORT_FAST_LEAVE
  net: Check for presence of IFLA_AF_SPEC
  net: Validate IFLA_BRIDGE_MODE attribute length
  bridge: Validate IFLA_BRIDGE_FLAGS attribute length
  stmmac: platform: fix default values of the filter bins setting
  net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
  net: dsa: bcm_sf2: reset switch prior to initialization
  net: dsa: bcm_sf2: fix unmapping registers in case of errors
  tg3: fix ring init when there are more TX than RX channels
  tcp: fix possible NULL dereference in tcp_vX_send_reset()
  rtlwifi: Change order in device startup
  rtlwifi: rtl8821ae: Fix 5G detection problem
  Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse"
  vxlan: Fix boolean flip in VXLAN_F_UDP_ZERO_CSUM6_[TX|RX]
  ip6_udp_tunnel: Fix checksum calculation
  net-timestamp: Fix a documentation typo
  net/ping: handle protocol mismatching scenario
  af_packet: fix sparse warning
  ...
2014-11-27 18:05:05 -08:00
Jeff Layton
388f0c7767 sunrpc: add a debugfs rpc_xprt directory with an info file in it
Add a new directory heirarchy under the debugfs sunrpc/ directory:

    sunrpc/
        rpc_xprt/
            <xprt id>/

Within that directory, we can put files that give info about the
xprts. We do have the (minor) problem that there is no succinct,
unique identifier for rpc_xprts. So we generate them synthetically
with a static atomic_t counter.

For now, this directory just holds an "info" file, but we may add
other files to it in the future.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-27 13:14:52 -05:00
Jeff Layton
b4b9d2ccf0 sunrpc: add debugfs file for displaying client rpc_task queue
It's possible to get a dump of the RPC task queue by writing a value to
/proc/sys/sunrpc/rpc_debug. If you write any value to that file, you get
a dump of the RPC client task list into the log buffer. This is a rather
inconvenient interface however, and makes it hard to get immediate info
about the task queue.

Add a new directory hierarchy under debugfs:

    sunrpc/
        rpc_clnt/
            <clientid>/

Within each clientid directory we create a new "tasks" file that will
dump info similar to what shows up in the log buffer, but with a few
small differences -- we avoid printing raw kernel addresses in favor of
symbolic names and the XID is also displayed.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-27 13:14:51 -05:00
Felix Fietkau
2967e031d4 mac80211: copy chandef from AP vif to VLANs
Instead of keeping track of all those special cases where
VLAN interfaces have no bss_conf.chandef, just make sure
they have the same as the AP interface they belong to.

Among others, this fixes a crash getting a VLAN's channel
from userspace since a NULL channel is returned as a good
result (return value 0) for VLANs since the commit below.

Cc: stable@vger.kernel.org [3.18 only]
Fixes: c12bc4885f ("mac80211: return the vif's chandef in ieee80211_cfg_get_channel()")
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[rewrite commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-27 17:36:47 +01:00
Johannes Berg
601555cd75 nl80211: don't crash sending invalid chandef
One of the cases for an invalid channel definition is that
the channel pointer is NULL, in which case the warning is
a bit late since we'll dereference the pointer. Bail out
of the function upon warning about this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-27 17:27:52 +01:00
Pablo Neira Ayuso
b59eaf9e28 netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one module
This resolves linking problems with CONFIG_IPV6=n:

net/built-in.o: In function `redirect_tg6':
xt_REDIRECT.c:(.text+0x6d021): undefined reference to `nf_nat_redirect_ipv6'

Reported-by: Andreas Ruprecht <rupran@einserver.de>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 13:08:42 +01:00
Alvaro Neira
1b63d4b9b5 netfilter: nf_tables_bridge: set the pktinfo for IPv4/IPv6 traffic
This patch adds the missing bits to allow to match per meta l4proto from
the bridge. Example:

  nft add rule bridge filter input ether type {ip, ip6} meta l4proto udp counter

Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 13:08:29 +01:00
Alvaro Neira
68b0faa87d netfilter: nf_tables_bridge: export nft_reject_ip*hdr_validate functions
This patch exports the functions nft_reject_iphdr_validate and
nft_reject_ip6hdr_validate to use it in follow up patches.
These functions check if the IPv4/IPv6 header is correct.

Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 12:58:05 +01:00
Florian Westphal
c41884ce05 netfilter: conntrack: avoid zeroing timer
add a __nfct_init_offset annotation member to struct nf_conn to make
it clear which members are covered by the memset when the conntrack
is allocated.

This avoids zeroing timer_list and ct_net; both are already inited
explicitly.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 12:41:06 +01:00
Florian Westphal
abc86d0f99 netfilter: xt_recent: relax ip_pkt_list_tot restrictions
The maximum value for the hitcount parameter is given by
"ip_pkt_list_tot" parameter (default: 20).

Exceeding this value on the command line will cause the rule to be
rejected.  The parameter is also readonly, i.e. it cannot be changed
without module unload or reboot.

Store size per table, then base nstamps[] size on the hitcount instead.

The module parameter is retained for backwards compatibility.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 12:40:31 +01:00
Steven Walter
f6af675ef5 Bluetooth: Automatically flushable packets aren't allowed on LE links
The Bluetooth spec states that automatically flushable packets may not
be sent over a LE-U link.

Signed-off-by: Steven Walter <stevenrwalter@gmail.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-27 12:12:27 +02:00
Trond Myklebust
ea5264138d NFS: Client side changes for RDMA
These patches various bugfixes and cleanups for using NFS over RDMA, including
 better error handling and performance improvements by using pad optimization.
 
 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUdPv2AAoJENfLVL+wpUDrJkMQAKjtPZHLMcj+eHm4f1ZKLJxy
 GSrUZV21TU9tL0NVE/5An8US6hoLwHpNXsW8o+gHTAeGRyiCmIaNXGd1Ql/4PYRH
 zfzdNXoaJAh1N5iXX11fF3gOWqx/SolqzO2xLDVETK/3lAvq0VwMYoMElBQB6qQW
 8sN3z8yVuz/9Ia9oGIFhqu1B6dcKPHkQDMtmsElGxeEX/+9yEg4HUKx+kZDtV0Uj
 8/JM8Jh1FKRCQT/P6INkRItdY5KaSJGFc43BkC/8lbugfxa5XCyu/m/qMr9FJsDV
 nM6rwaiVcmR/mvD3fL82+Jg/M+P9VUHQ1/Az0sV9G+fEoHH/1Mey3LfMzNpUmf9v
 bykrPRuzXkPPQgN1VnjSaF2RF+CWwV9Nme1VVXM/zj8gHX1mcmQF/wPRxDuLjCrt
 EObAFsvHOwDTZZmYp9bG5kc6IvwvT8aeeVQMJ4q4PSGD3w8AtoIyJDn+Ee0LFD1K
 Zw0oZpTJpI4t7DVxGBSdo2wZWuMU/UKqGqGtGJ+ljXfTRuuq968Q5j5ujaA9vf0v
 C9igYTU8hq4teMzhZrfR1jtTWoSS+5zamb1KtvAZy8gsht2PQVgE9xka2k/AV8uE
 ul/w5HU4OV+QIrHNbiu7BE8B2Ags6smpdHMqn9fqLBwvG+JEwbWqk1zeTsajxzq+
 hkvKkkMq6JjDbsDf96Yk
 =YIru
 -----END PGP SIGNATURE-----

Merge tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next

Pull NFS client RDMA changes for 3.19 from Anna Schumaker:
 "NFS: Client side changes for RDMA

  These patches various bugfixes and cleanups for using NFS over RDMA, including
  better error handling and performance improvements by using pad optimization.

  Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>"

* tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
  xprtrdma: Display async errors
  xprtrdma: Enable pad optimization
  xprtrdma: Re-write rpcrdma_flush_cqs()
  xprtrdma: Refactor tasklet scheduling
  xprtrdma: unmap all FMRs during transport disconnect
  xprtrdma: Cap req_cqinit
  xprtrdma: Return an errno from rpcrdma_register_external()
2014-11-26 17:37:13 -05:00
Trond Myklebust
1702562db4 NFS: Generic client side changes from Chuck
These patches fixes for iostats and SETCLIENTID in addition to cleaning
 up the nfs4_init_callback() function.
 
 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUdfhQAAoJENfLVL+wpUDr9x4QANWmG6xjEU7RuIBLalOoxit6
 eXnEDNZlwYp6NCkYktHVWaTqXdwq7fdGX+3p4eiwNg3C3SrJHkpWvjeFd4KT/SyP
 B/w3vYG3o5H01i3Mb5kCD2uW0gbS9soZXpIg+uHHOl43yZzneC0vPTLhQ/h+9zPc
 B32XcgQhvLIHR3LWrC/+uolsa31lcyya0W45PX+iCpzHF7i9qRBrNLODkTlw1hNQ
 eEnYIvVy5oW00zlHJUYiTHP3e+0EJn5PAngdYbqiboJ9mK7DbB0QDwqyvJbIT7ql
 WAip6cNcJnSv1eiVYqDwlR1ok8drK5X7yQCT3lcLzAMDznLsSAL1Itu0h2Ay3z61
 f8XCyTwI0izq0DbdrMcPoqPSitqyM8nkPElnOuitXwzEroPaG40OF67yss3+ixbl
 JeQZ+u35pnpCkKUaZdCK3Pn83StxmUaBcFx8eg30NBc0SN13Eiz6aZcGEperClrR
 RwMLDUhUtAMcMRunRRxiN9lHafPqqeDeJre7uky0p0sU9CsH+1n5qIKLmk+Ber2d
 ZS29TobdR7ktfjQ52XazMAIFzI7r1v4Zn7ziH/WRbvKoKw9ICcyoUGNy9CS5LyWu
 BxWCZAJmby5H1i7V7ituJK8TImb81L4aPW06hHX9k+0SoTrgFjas//4jZYfysJ9N
 PvR0MRNfMFhuBlsTj7Gy
 =PMdS
 -----END PGP SIGNATURE-----

Merge tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next

Pull pull additional NFS client changes for 3.19 from Anna Schumaker:
  "NFS: Generic client side changes from Chuck

  These patches fixes for iostats and SETCLIENTID in addition to cleaning
  up the nfs4_init_callback() function.

  Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>"

* tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
  NFS: Clean up nfs4_init_callback()
  NFS: SETCLIENTID XDR buffer sizes are incorrect
  SUNRPC: serialize iostats updates
2014-11-26 17:34:14 -05:00
Willem de Bruijn
f4713a3dfa net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks
TCP timestamping introduced MSG_ERRQUEUE handling for TCP sockets.
If the socket is of family AF_INET6, call ipv6_recv_error instead
of ip_recv_error.

This change is more complex than a single branch due to the loadable
ipv6 module. It reuses a pre-existing indirect function call from
ping. The ping code is safe to call, because it is part of the core
ipv6 module and always present when AF_INET6 sockets are active.

Fixes: 4ed2d765 (net-timestamp: TCP timestamping)
Signed-off-by: Willem de Bruijn <willemb@google.com>

----

It may also be worthwhile to add WARN_ON_ONCE(sk->family == AF_INET6)
to ip_recv_error.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 15:45:04 -05:00
Thomas Graf
aa68c20ff3 bridge: Sanitize IFLA_EXT_MASK for AF_BRIDGE:RTM_GETLINK
Only search for IFLA_EXT_MASK if the message actually carries a
ifinfomsg header and validate minimal length requirements for
IFLA_EXT_MASK.

Fixes: 6cbdceeb ("bridge: Dump vlan information from a bridge port")
Cc: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 15:29:01 -05:00
Thomas Graf
6f705d8cfc bridge: Add missing policy entry for IFLA_BRPORT_FAST_LEAVE
Fixes: c2d3babf ("bridge: implement multicast fast leave")
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 15:29:01 -05:00
Thomas Graf
6e8d1c5545 bridge: Validate IFLA_BRIDGE_FLAGS attribute length
Payload is currently accessed blindly and may exceed valid message
boundaries.

Fixes: 407af3299 ("bridge: Add netlink interface to configure vlans on bridge ports")
Cc: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 15:29:00 -05:00
Johannes Berg
98f0334263 cfg80211: clean up beacon loss CQM event
Having it as a sub-event for RSSI thresholds is very ugly,
but luckily no userspace actually uses the events yet.

Move the event to its own function call internally and to
its own event attribute in nl80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-26 20:56:42 +01:00
Ying Xue
a6ca109443 tipc: use generic SKB list APIs to manage TIPC outgoing packet chains
Use standard SKB list APIs associated with struct sk_buff_head to
manage socket outgoing packet chain and name table outgoing packet
chain, having relevant code simpler and more readable.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Ying Xue
f03273f1e2 tipc: use generic SKB list APIs to manage link receive queue
Use standard SKB list APIs associated with struct sk_buff_head to
manage link's receive queue to simplify its relevant code cemplexity.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Ying Xue
bc6fecd409 tipc: use generic SKB list APIs to manage deferred queue of link
Use standard SKB list APIs associated with struct sk_buff_head to
manage link's deferred queue, simplifying relevant code.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Ying Xue
58dc55f256 tipc: use generic SKB list APIs to manage link transmission queue
Use standard SKB list APIs associated with struct sk_buff_head to
manage link transmission queue, having relevant code more clean.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Ying Xue
58d78b328a tipc: use skb_queue_walk_safe marco to simplify link_prepare_wakeup routine
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Ying Xue
99315ad43d tipc: remove unused between routine
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Ying Xue
58311d1690 tipc: eliminate two pseudo message types of BUNDLE_OPEN and BUNDLE_CLOSED
The pseudo message types of BUNDLE_CLOSED as well as BUNDLE_OPEN are
used to flag whether or not more messages can be bundled into a data
packet in the outgoing transmission queue. Obviously, no more messages
can be appended after the packet has been sent and is waiting to be
acknowledged and deleted. These message types do in reality represent
a send-side local implementation flag, and are not defined as part of
the protocol. It is therefore safe to move it to to where it belongs,
that is, the control area (TIPC_SKB_CB) of the buffer.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:17 -05:00
Ying Xue
47b4c9a82f tipc: clean up the process of link pushing packets
In original tipc_link_push_packet(), it pushes messages from protocol
message queue, retransmission queue and next_out queue. But as the two
first queues are removed, we can simplify its relevant code through
deleting tipc_link_push_queue().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:16 -05:00
Ying Xue
7b6f087f98 tipc: remove retransmission queue
TIPC retransmission queue is intended to record which messages
should be retransmitted when bearer is not congested. However,
as the retransmission queue becomes useless with the removal of
bearer congestion mechanism, it should be removed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:16 -05:00
Ying Xue
8965d250c2 tipc: remove protocol message queue
TIPC protocol message queue is intended to save one protocol message
when bearer is congested so that the message stored in the queue can
be immediately transmitted when bearer congestion is released. However,
as now the protocol queue has no mission any more with the removal of
bearer congestion mechanism, it should be removed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:16 -05:00
Ying Xue
a8f48af587 tipc: remove node subscription infrastructure
The node subscribe infrastructure represents a virtual base class, so
its users, such as struct tipc_port and struct publication, can derive
its implemented functionalities. However, after the removal of struct
tipc_port, struct publication is left as its only single user now. So
defining an abstract infrastructure for one user becomes no longer
reasonable. If corresponding new functions associated with the
infrastructure are moved to name_table.c file, the node subscription
infrastructure can be removed as well.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:30:16 -05:00
zhuyj
73cf0e923d ipv6: Remove unnecessary test
The "init_net" test in function addrconf_exit_net is introduced
in commit 44a6bd29 [Create ipv6 devconf-s for namespaces] to avoid freeing
init_net. In commit c900a800 [ipv6: fix bad free of addrconf_init_net],
function addrconf_init_net will allocate memory for every net regardless of
init_net. In this case, it is unnecessary to make "init_net" test.

CC: Hong Zhiguo <honkiko@gmail.com>
CC: Octavian Purdila <opurdila@ixiacom.com>
CC: Pavel Emelyanov <xemul@openvz.org>
CC: Cong Wang <cwang@twopensource.com>
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:27:04 -05:00
Tom Herbert
4fd671ded1 gue: Call remcsum_adjust
Change remote checksum offload to call remcsum_adjust. This also
eliminates the optimization to skip an IP header as part of the
adjustment (really does not seem to be much of a win).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:25:44 -05:00
Eric Dumazet
ced7a04e39 pkt_sched: fq: increase max delay from 125 ms to one second
FQ/pacing has a clamp of delay of 125 ms, to avoid some possible harm.

It turns out this delay is too small to allow pacing low rates :
Some ISP setup very aggressive policers as low as 16kbit.

Now TCP stack has spurious rtx prevention, it seems safe to increase
this fixed parameter, without adding a qdisc attribute.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:08:04 -05:00
Johannes Berg
5b97f49d65 cfg80211: refactor the various CQM event sending code
Much of the code can be shared by moving it into helper functions
for the CQM event sending.

Also move the code closer together, even in the header file.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-26 12:47:38 +01:00
Varka Bhadram
473f3766b5 mac802154: remove unnecessary if statement
Removes unnecessary if statement check for net device. Error check performed
after alloc_netdev().

	ndev = alloc_netdev(sizeof(*sdata) + local->hw.vif_data_size, name,
			    NET_NAME_UNKNOWN, ieee802154_if_setup);
        if (!ndev)
                return ERR_PTR(-ENOMEM);
	..

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-26 06:00:13 +01:00
Varka Bhadram
69bb631ef9 ieee802154: fix spelling mistakes
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-26 06:00:13 +01:00
Linus Torvalds
b914c5b213 Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from Bruce Fields:
 "These fix one mishandling of the case when security labels are
  configured out, and two races in the 4.1 backchannel code"

* 'for-3.18' of git://linux-nfs.org/~bfields/linux:
  nfsd: Fix slot wake up race in the nfsv4.1 callback code
  SUNRPC: Fix locking around callback channel reply receive
  nfsd: correctly define v4.2 support attributes
2014-11-25 19:05:41 -08:00
David S. Miller
d3fc6b3fdd Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
More work from Al Viro to move away from modifying iovecs
by using iov_iter instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-25 20:02:51 -05:00
Chuck Lever
edef1297f3 SUNRPC: serialize iostats updates
Occasionally mountstats reports a negative retransmission rate.
Ensure that two RPCs completing concurrently don't confuse the sums
in the transport's op_metrics array.

Since pNFS filelayout can invoke rpc_count_iostats() on another
transport from xprt_release(), we can't rely on simply holding the
transport_lock in xprt_release(). There's nothing for it but hard
serialization. One spin lock per RPC operation should make this as
painless as it can be.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-11-25 16:22:15 -05:00
Eric Dumazet
c3658e8d0f tcp: fix possible NULL dereference in tcp_vX_send_reset()
After commit ca777eff51 ("tcp: remove dst refcount false sharing for
prequeue mode") we have to relax check against skb dst in
tcp_v[46]_send_reset() if prequeue dropped the dst.

If a socket is provided, a full lookup was done to find this socket,
so the dst test can be skipped.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88191
Reported-by: Jaša Bartelj <jasa.bartelj@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Fixes: ca777eff51 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-25 14:29:18 -05:00
Pablo Neira
43612d7c04 Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse"
This reverts commit 5195c14c8b.

If the conntrack clashes with an existing one, it is left out of
the unconfirmed list, thus, crashing when dropping the packet and
releasing the conntrack since golden rule is that conntracks are
always placed in any of the existing lists for traceability reasons.

Reported-by: Daniel Borkmann <dborkman@redhat.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88841
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-25 14:14:51 -05:00
Alexander Duyck
f3750817a9 ip6_udp_tunnel: Fix checksum calculation
The UDP checksum calculation for VXLAN tunnels is currently using the
socket addresses instead of the actual packet source and destination
addresses.  As a result the checksum calculated is incorrect in some
cases.

Also uh->check was being set twice, first it was set to 0, and then it is
set again in udp6_set_csum.  This change removes the redundant assignment
to 0.

Fixes: acbf74a7 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.")

Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-25 14:12:12 -05:00
Chuck Lever
7ff11de1ba xprtrdma: Display async errors
An async error upcall is a hard error, and should be reported in
the system log.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-11-25 13:39:20 -05:00
Chuck Lever
d5440e27d3 xprtrdma: Enable pad optimization
The Linux NFS/RDMA server used to reject NFSv3 WRITE requests when
pad optimization was enabled. That bug was fixed by commit
e560e3b510 ("svcrdma: Add zero padding if the client doesn't send
it").

We can now enable pad optimization on the client, which helps
performance and is supported now by both Linux and Solaris servers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-11-25 13:39:20 -05:00
Chuck Lever
5c166bef4f xprtrdma: Re-write rpcrdma_flush_cqs()
Currently rpcrdma_flush_cqs() attempts to avoid code duplication,
and simply invokes rpcrdma_recvcq_upcall and rpcrdma_sendcq_upcall.

1. rpcrdma_flush_cqs() can run concurrently with provider upcalls.
   Both flush_cqs() and the upcalls were invoking ib_poll_cq() in
   different threads using the same wc buffers (ep->rep_recv_wcs
   and ep->rep_send_wcs), added by commit 1c00dd0776 ("xprtrmda:
   Reduce calls to ib_poll_cq() in completion handlers").

   During transport disconnect processing, this sometimes resulted
   in the same reply getting added to the rpcrdma_tasklets_g list
   more than once, which corrupted the list.

2. The upcall functions drain only a limited number of CQEs,
   thanks to the poll budget added by commit 8301a2c047
   ("xprtrdma: Limit work done by completion handler").

Fixes: a7bc211ac9 ("xprtrdma: On disconnect, don't ignore ... ")
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=276
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-11-25 13:39:20 -05:00
Chuck Lever
f1a03b76fe xprtrdma: Refactor tasklet scheduling
Restore the separate function that schedules the reply handling
tasklet. I need to call it from two different paths.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-11-25 13:39:20 -05:00
Chuck Lever
467c9674bc xprtrdma: unmap all FMRs during transport disconnect
When using RPCRDMA_MTHCAFMR memory registration, after a few
transport disconnect / reconnect cycles, ib_map_phys_fmr() starts to
return EINVAL because the provider has exhausted its map pool.

Make sure that all FMRs are unmapped during transport disconnect,
and that ->send_request remarshals them during an RPC retransmit.
This resets the transport's MRs to ensure that none are leaked
during a disconnect.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-11-25 13:39:20 -05:00
Chuck Lever
e7104a2a96 xprtrdma: Cap req_cqinit
Recent work made FRMR registration and invalidation completions
unsignaled. This greatly reduces the adapter interrupt rate.

Every so often, however, a posted send Work Request is allowed to
signal. Otherwise, the provider's Work Queue will wrap and the
workload will hang.

The number of Work Requests that are allowed to remain unsignaled is
determined by the value of req_cqinit. Currently, this is set to the
size of the send Work Queue divided by two, minus 1.

For FRMR, the send Work Queue is the maximum number of concurrent
RPCs (currently 32) times the maximum number of Work Requests an
RPC might use (currently 7, though some adapters may need more).

For mlx4, this is 224 entries. This leaves completion signaling
disabled for 111 send Work Requests.

Some providers hold back dispatching Work Requests until a CQE is
generated.  If completions are disabled, then no CQEs are generated
for quite some time, and that can stall the Work Queue.

I've seen this occur running xfstests generic/113 over NFSv4, where
eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM
because the Work Queue has overflowed. The connection is dropped
and re-established.

Cap the rep_cqinit setting so completions are not left turned off
for too long.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-11-25 13:39:20 -05:00
Chuck Lever
92b98361f1 xprtrdma: Return an errno from rpcrdma_register_external()
The RPC/RDMA send_request method and the chunk registration code
expects an errno from the registration function. This allows
the upper layers to distinguish between a recoverable failure
(for example, temporary memory exhaustion) and a hard failure
(for example, a bug in the registration logic).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-11-25 13:39:20 -05:00
Daniel Borkmann
79e886599e crypto: algif - add and use sock_kzfree_s() instead of memzero_explicit()
Commit e1bd95bf7c ("crypto: algif - zeroize IV buffer") and
2a6af25bef ("crypto: algif - zeroize message digest buffer")
added memzero_explicit() calls on buffers that are later on
passed back to sock_kfree_s().

This is a discussed follow-up that, instead, extends the sock
API and adds sock_kzfree_s(), which internally uses kzfree()
instead of kfree() for passing the buffers back to slab.

Having sock_kzfree_s() allows to keep the changes more minimal
by just having a drop-in replacement instead of adding
memzero_explicit() calls everywhere before sock_kfree_s().

In kzfree(), the compiler is not allowed to optimize the memset()
away and thus there's no need for memzero_explicit(). Both,
sock_kfree_s() and sock_kzfree_s() are wrappers for
__sock_kfree_s() and call into kfree() resp. kzfree(); here,
__sock_kfree_s() needs to be explicitly inlined as we want the
compiler to optimize the call and condition away and thus it
produces e.g. on x86_64 the _same_ assembler output for
sock_kfree_s() before and after, and thus also allows for
avoiding code duplication.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-25 22:50:39 +08:00
Johannes Berg
40a11ca83d mac80211: check if channels allow 80 MHz for VHT probe requests
If there are no channels allowing 80 MHz to be used, then the
station isn't really VHT capable even if the driver and device
support it in general. In this case, exclude the VHT capability
IE from probe request frames.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-25 11:05:33 +01:00
Johannes Berg
ea9eba6a8b cfg80211: remove pointless channel lookup in survey code
We have a channel pointer, and we use its center frequency
to look up a channel pointer - which will thus be exactly
the same as the original pointer.

Remove that pointless lookup and just use the pointer.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-25 09:57:27 +01:00
Jeff Layton
1306729b0d sunrpc: eliminate RPC_TRACEPOINTS
It's always set to the same value as CONFIG_TRACEPOINTS, so we can just
use that instead.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 17:33:12 -05:00
Jeff Layton
f895b252d4 sunrpc: eliminate RPC_DEBUG
It's always set to whatever CONFIG_SUNRPC_DEBUG is, so just use that.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 17:31:46 -05:00
Jane Zhou
91a0b60346 net/ping: handle protocol mismatching scenario
ping_lookup() may return a wrong sock if sk_buff's and sock's protocols
dont' match. For example, sk_buff's protocol is ETH_P_IPV6, but sock's
sk_family is AF_INET, in that case, if sk->sk_bound_dev_if is zero, a wrong
sock will be returned.
the fix is to "continue" the searching, if no matching, return NULL.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Jane Zhou <a17711@motorola.com>
Signed-off-by: Yiwei Zhao <gbjc64@motorola.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24 16:48:20 -05:00
Michael S. Tsirkin
6e58040b84 af_packet: fix sparse warning
af_packet produces lots of these:
	net/packet/af_packet.c:384:39: warning: incorrect type in return expression (different modifiers)
	net/packet/af_packet.c:384:39:    expected struct page [pure] *
	net/packet/af_packet.c:384:39:    got struct page *

this seems to be because sparse does not realize that _pure
refers to function, not the returned pointer.

Tweak code slightly to avoid the warning.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24 16:15:36 -05:00
Yuri Chislov
be6572fdb1 ipv6: gre: fix wrong skb->protocol in WCCP
When using GRE redirection in WCCP, it sets the wrong skb->protocol,
that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic.

Fixes: c12b395a46 ("gre: Support GRE over IPv6")
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Yuri Chislov <yuri.chislov@gmail.com>
Tested-by: Yuri Chislov <yuri.chislov@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24 16:11:05 -05:00
Richard Alpe
d8182804cf tipc: fix sparse warnings in new nl api
Fix sparse warnings about non-static declaration of static functions
in the new tipc netlink API.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24 16:10:23 -05:00
David S. Miller
958d03b016 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter/ipvs updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, this includes the NAT redirection support for nf_tables, the
cgroup support for nft meta and conntrack zone support for the connlimit
match. Coming after those, a bunch of sparse warning fixes, missing
netns bits and cleanups. More specifically, they are:

1) Prepare IPv4 and IPv6 NAT redirect code to use it from nf_tables,
   patches from Arturo Borrero.

2) Introduce the nf_tables redir expression, from Arturo Borrero.

3) Remove an unnecessary assignment in ip_vs_xmit/__ip_vs_get_out_rt().
   Patch from Alex Gartrell.

4) Add nft_log_dereference() macro to the nf_log infrastructure, patch
   from Marcelo Leitner.

5) Add some extra validation when registering logger families, also
   from Marcelo.

6) Some spelling cleanups from stephen hemminger.

7) Fix sparse warning in nf_logger_find_get().

8) Add cgroup support to nf_tables meta, patch from Ana Rey.

9) A Kconfig fix for the new redir expression and fix sparse warnings in
   the new redir expression.

10) Fix several sparse warnings in the netfilter tree, from
    Florian Westphal.

11) Reduce verbosity when OOM in nfnetlink_log. User can basically do
    nothing when this situation occurs.

12) Add conntrack zone support to xt_connlimit, again from Florian.

13) Add netnamespace support to the h323 conntrack helper, contributed
    by Vasily Averin.

14) Remove unnecessary nul-pointer checks before free_percpu() and
    module_put(), from Markus Elfring.

15) Use pr_fmt in nfnetlink_log, again patch from Marcelo Leitner.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24 16:00:58 -05:00
Jeff Layton
1a867a0898 sunrpc: add tracepoints in xs_tcp_data_recv
Add tracepoints inside the main loop on xs_tcp_data_recv that allow
us to keep an eye on what's happening during each phase of it.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 12:53:35 -05:00
Jeff Layton
3705ad64f1 sunrpc: add new tracepoints in xprt handling code
...so we can keep track of when calls are sent and replies received.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 12:53:35 -05:00
Jeff Layton
860a0d9e51 sunrpc: add some tracepoints in svc_rqst handling functions
...just around svc_send, svc_recv and svc_process for now.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 12:53:34 -05:00
Johannes Berg
575f05302e mac80211: disable 80+80/160 in VHT correctly
The supported bandwidth field is a two-bit field, not a bitmap,
so treat it accordingly when disabling 80+80 or 160 MHz.

Note that we can only advertise "80+80 and 160" or "160", not
"80+80" by itself, so disabling 160 also disables 80+80.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-24 16:55:28 +01:00
Al Viro
083735f4b0 rds: switch rds_message_copy_from_user() to iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:43 -05:00
Al Viro
c310e72c89 rds: switch ->inc_copy_to_user() to passing iov_iter
instances get considerably simpler from that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:43 -05:00
Al Viro
7424ce6506 [atm] switch vcc_sendmsg() to copy_from_iter()
... and make it handle multi-segment iovecs - deals with that
"fix this later" issue for free.  A bit of shame, really - it
had been there since 2.3.15pre3 when the whole thing went into the
tree, practically a historical artefact by now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:42 -05:00
Al Viro
0f7db23a07 vmci_transport: switch ->enqeue_dgram, ->enqueue_stream and ->dequeue_stream to msghdr
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:42 -05:00
Al Viro
45dcc687f7 tipc_msg_build(): pass msghdr instead of its ->msg_iov
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:41 -05:00
Al Viro
562640f3c3 tipc_sendmsg(): pass msghdr instead of its ->msg_iov
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:40 -05:00
Al Viro
e0eb093e79 switch sctp_user_addto_chunk() and sctp_datamsg_from_user() to passing iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:40 -05:00
Al Viro
8feb2fb2bb switch AF_PACKET and AF_UNIX to skb_copy_datagram_from_iter()
... and kill skb_copy_datagram_iovec()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:39 -05:00
Al Viro
195e952d03 kill zerocopy_sg_from_iovec()
no users left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:39 -05:00
Al Viro
3a654f975b new helpers: skb_copy_datagram_from_iter() and zerocopy_sg_from_iter()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:03:08 -05:00
Al Viro
7eab8d9e8a new helper: memcpy_to_msg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 04:28:51 -05:00
Al Viro
e169371823 switch ipxrtr_route_packet() from iovec to msghdr
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 04:28:49 -05:00
Al Viro
6ce8e9ce59 new helper: memcpy_from_msg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 04:28:48 -05:00
Al Viro
227158db16 new helper: skb_copy_and_csum_datagram_msg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 04:28:44 -05:00
lucien
20ea60ca99 ip_tunnel: the lack of vti_link_ops' dellink() cause kernel panic
Now the vti_link_ops do not point the .dellink, for fb tunnel device
(ip_vti0), the net_device will be removed as the default .dellink is
unregister_netdevice_queue,but the tunnel still in the tunnel list,
then if we add a new vti tunnel, in ip_tunnel_find():

        hlist_for_each_entry_rcu(t, head, hash_node) {
                if (local == t->parms.iph.saddr &&
                    remote == t->parms.iph.daddr &&
                    link == t->parms.link &&
==>                 type == t->dev->type &&
                    ip_tunnel_key_match(&t->parms, flags, key))
                        break;
        }

the panic will happen, cause dev of ip_tunnel *t is null:
[ 3835.072977] IP: [<ffffffffa04103fd>] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
[ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
[ 3835.073008] Oops: 0000 [#1] SMP
.....
[ 3835.073008] Stack:
[ 3835.073008]  ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
[ 3835.073008]  ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
[ 3835.073008]  ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
[ 3835.073008] Call Trace:
[ 3835.073008]  [<ffffffffa0411924>] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
[ 3835.073008]  [<ffffffffa040b2e3>] vti_newlink+0x43/0x70 [ip_vti]
[ 3835.073008]  [<ffffffff8150d4da>] rtnl_newlink+0x4fa/0x5f0
[ 3835.073008]  [<ffffffff812f68bb>] ? nla_strlcpy+0x5b/0x70
[ 3835.073008]  [<ffffffff81508fb0>] ? rtnl_link_ops_get+0x40/0x60
[ 3835.073008]  [<ffffffff8150d11f>] ? rtnl_newlink+0x13f/0x5f0
[ 3835.073008]  [<ffffffff81509cf4>] rtnetlink_rcv_msg+0xa4/0x270
[ 3835.073008]  [<ffffffff8126adf5>] ? sock_has_perm+0x75/0x90
[ 3835.073008]  [<ffffffff81509c50>] ? rtnetlink_rcv+0x30/0x30
[ 3835.073008]  [<ffffffff81529e39>] netlink_rcv_skb+0xa9/0xc0
[ 3835.073008]  [<ffffffff81509c48>] rtnetlink_rcv+0x28/0x30
....

modprobe ip_vti
ip link del ip_vti0 type vti
ip link add ip_vti0 type vti
rmmod ip_vti

do that one or more times, kernel will panic.

fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in
which we skip the unregister of fb tunnel device. do the same on ip6_vti.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-23 21:11:17 -05:00
Ian Morris
e5d08d718a ipv6: coding style improvements (remove assignment in if statements)
This change has no functional impact and simply addresses some coding
style issues detected by checkpatch. Specifically this change
adjusts "if" statements which also include the assignment of a
variable.

No changes to the resultant object files result as determined by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-23 21:00:56 -05:00
Alexander Duyck
b6fef4c6b8 ipv6: Do not treat a GSO_TCPV4 request from UDP tunnel over IPv6 as invalid
This patch adds SKB_GSO_TCPV4 to the list of supported GSO types handled by
the IPv6 GSO offloads.  Without this change VXLAN tunnels running over IPv6
do not currently handle IPv4 TCP TSO requests correctly and end up handing
the non-segmented frame off to the device.

Below is the before and after for a simple netperf TCP_STREAM test between
two endpoints tunneling IPv4 over a VXLAN tunnel running on IPv6 on top of
a 1Gb/s network adapter.

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.29       0.88      Before
 87380  16384  16384    10.03     895.69      After

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-23 14:18:11 -05:00
David S. Miller
1459143386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ieee802154/fakehard.c

A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
in 'net-next'.

Add build fix into the merge from Stephen Rothwell in openvswitch, the
logging macros take a new initial 'log' argument, a new call was added
in 'net' so when we merge that in here we have to explicitly add the
new 'log' arg to it else the build fails.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 22:28:24 -05:00
Linus Torvalds
8a84e01e14 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix BUG when decrypting empty packets in mac80211, from Ronald Wahl.

 2) nf_nat_range is not fully initialized and this is copied back to
    userspace, from Daniel Borkmann.

 3) Fix read past end of b uffer in netfilter ipset, also from Dan
    Carpenter.

 4) Signed integer overflow in ipv4 address mask creation helper
    inet_make_mask(), from Vincent BENAYOUN.

 5) VXLAN, be2net, mlx4_en, and qlcnic need ->ndo_gso_check() methods to
    properly describe the device's capabilities, from Joe Stringer.

 6) Fix memory leaks and checksum miscalculations in openvswitch, from
    Pravin B SHelar and Jesse Gross.

 7) FIB rules passes back ambiguous error code for unreachable routes,
    making behavior confusing for userspace.  Fix from Panu Matilainen.

 8) ieee802154fake_probe() doesn't release resources properly on error,
    from Alexey Khoroshilov.

 9) Fix skb_over_panic in add_grhead(), from Daniel Borkmann.

10) Fix access of stale slave pointers in bonding code, from Nikolay
    Aleksandrov.

11) Fix stack info leak in PPP pptp code, from Mathias Krause.

12) Cure locking bug in IPX stack, from Jiri Bohac.

13) Revert SKB fclone memory freeing optimization that is racey and can
    allow accesses to freed up memory, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (71 commits)
  tcp: Restore RFC5961-compliant behavior for SYN packets
  net: Revert "net: avoid one atomic operation in skb_clone()"
  virtio-net: validate features during probe
  cxgb4 : Fix DCB priority groups being returned in wrong order
  ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
  openvswitch: Don't validate IPv6 label masks.
  pptp: fix stack info leak in pptp_getname()
  brcmfmac: don't include linux/unaligned/access_ok.h
  cxgb4i : Don't block unload/cxgb4 unload when remote closes TCP connection
  ipv6: delete protocol and unregister rtnetlink when cleanup
  net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too
  bonding: fix curr_active_slave/carrier with loadbalance arp monitoring
  mac80211: minstrel_ht: fix a crash in rate sorting
  vxlan: Inline vxlan_gso_check().
  can: m_can: update to support CAN FD features
  can: m_can: fix incorrect error messages
  can: m_can: add missing delay after setting CCCR_INIT bit
  can: m_can: fix not set can_dlc for remote frame
  can: m_can: fix possible sleep in napi poll
  can: m_can: add missing message RAM initialization
  ...
2014-11-21 17:20:36 -08:00
David S. Miller
53b15ef3c2 Merge tag 'master-2014-11-20' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-11-21

Please pull this batch of updates intended for the 3.19 stream...

For the mac80211 bits, Johannes says:

"It has been a while since my last pull request, so we accumulated
another relatively large set of changes:
 * TDLS off-channel support set from Arik/Liad, with some support
   patches I did
 * custom regulatory fixes from Arik
 * minstrel VHT fix (and a small optimisation) from Felix
 * add back radiotap vendor namespace support (myself)
 * random MAC address scanning for cfg80211/mac80211/hwsim (myself)
 * CSA improvements (Luca)
 * WoWLAN Net Detect (wake on network found) support (Luca)
 * and lots of other smaller changes from many people"

For the Bluetooth bits, Johan says:

"Here's another set of patches for 3.19. Most of it is again fixes and
cleanups to ieee802154 related code from Alexander Aring. We've also got
better handling of hardware error events along with a proper API for HCI
drivers to notify the HCI core of such situations. There's also a minor
fix for mgmt events as well as a sparse warning fix. The code for
sending HCI commands synchronously also gets a fix where we might loose
the completion event in the case of very fast HW (particularly easily
reproducible with an emulated HCI device)."

And...

"Here's another bluetooth-next pull request for 3.19. We've got:

 - Various fixes, cleanups and improvements to ieee802154/mac802154
 - Support for a Broadcom BCM20702A1 variant
 - Lots of lockdep fixes
 - Fixed handling of LE CoC errors that should trigger SMP"

For the Atheros bits, Kalle says:

"One ath6kl patch and rest for ath10k, but nothing really major which
stands out. Most notable:

o fix resume (Bartosz)

o firmware restart is now faster and more reliable (Michal)

o it's now possible to test hardware restart functionality without
  crashing the firmware using hw-restart parameter with
  simulate_fw_crash debugfs file (Michal)"

On top of that...both ath9k and mwifiex get their usual level of
updates.  Of note is the ath9k spectral scan work from Oleksij Rempel.

I also pulled from the wireless tree in order to avoid some merge issues.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 16:39:45 -05:00
Richard Cochran
37dd9255b2 vlan: Pass ethtool get_ts_info queries to real device.
Commit a6111d3c "vlan: Pass SIOC[SG]HWTSTAMP ioctls to real device"
intended to enable hardware time stamping on VLAN interfaces, but
passing SIOCSHWTSTAMP is only half of the story. This patch adds
the second half, by letting user space find out the time stamping
capabilities of the device backing a VLAN interface.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:35:53 -05:00
Calvin Owens
0c228e833c tcp: Restore RFC5961-compliant behavior for SYN packets
Commit c3ae62af8e ("tcp: should drop incoming frames without ACK
flag set") was created to mitigate a security vulnerability in which a
local attacker is able to inject data into locally-opened sockets by
using TCP protocol statistics in procfs to quickly find the correct
sequence number.

This broke the RFC5961 requirement to send a challenge ACK in response
to spurious RST packets, which was subsequently fixed by commit
7b514a886b ("tcp: accept RST without ACK flag").

Unfortunately, the RFC5961 requirement that spurious SYN packets be
handled in a similar manner remains broken.

RFC5961 section 4 states that:

   ... the handling of the SYN in the synchronized state SHOULD be
   performed as follows:

   1) If the SYN bit is set, irrespective of the sequence number, TCP
      MUST send an ACK (also referred to as challenge ACK) to the remote
      peer:

      <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

      After sending the acknowledgment, TCP MUST drop the unacceptable
      segment and stop processing further.

   By sending an ACK, the remote peer is challenged to confirm the loss
   of the previous connection and the request to start a new connection.
   A legitimate peer, after restart, would not have a TCB in the
   synchronized state.  Thus, when the ACK arrives, the peer should send
   a RST segment back with the sequence number derived from the ACK
   field that caused the RST.

   This RST will confirm that the remote peer has indeed closed the
   previous connection.  Upon receipt of a valid RST, the local TCP
   endpoint MUST terminate its connection.  The local TCP endpoint
   should then rely on SYN retransmission from the remote end to
   re-establish the connection.

This patch lets SYN packets through the discard added in c3ae62af8e,
so that spurious SYN packets are properly dealt with as per the RFC.

The challenge ACK is sent unconditionally and is rate-limited, so the
original vulnerability is not reintroduced by this patch.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:33:50 -05:00
Eric Dumazet
e7820e39b7 net: Revert "net: avoid one atomic operation in skb_clone()"
Not sure what I was thinking, but doing anything after
releasing a refcount is suicidal or/and embarrassing.

By the time we set skb->fclone to SKB_FCLONE_FREE, another cpu
could have released last reference and freed whole skb.

We potentially corrupt memory or trap if CONFIG_DEBUG_PAGEALLOC is set.

Reported-by: Chris Mason <clm@fb.com>
Fixes: ce1a4ea3f1 ("net: avoid one atomic operation in skb_clone()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:26:32 -05:00
Richard Alpe
1593123a6a tipc: add name table dump to new netlink api
Add TIPC_NL_NAME_TABLE_GET command to the new tipc netlink API.

This command supports dumping the name table of all nodes.

Netlink logical layout of name table response message:
-> name table
    -> publication
        -> type
        -> lower
        -> upper
        -> scope
        -> node
        -> ref
        -> key

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:32 -05:00
Richard Alpe
27c2141672 tipc: add net set to new netlink api
Add TIPC_NL_NET_SET command to the new tipc netlink API.

This command can set the network id and network (tipc) address.

Netlink logical layout of network set message:
-> net
     [ -> id ]
     [ -> address ]

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:31 -05:00
Richard Alpe
fd3cf2ad51 tipc: add net dump to new netlink api
Add TIPC_NL_NET_GET command to the new tipc netlink API.

This command dumps the network id of the node.

Netlink logical layout of returned network data:
-> net
    -> id

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:31 -05:00
Richard Alpe
3e4b6ab58d tipc: add node get/dump to new netlink api
Add TIPC_NL_NODE_GET to the new tipc netlink API.

This command can dump the address and node status of all nodes in the
tipc cluster.

Netlink logical layout of returned node/address data:
-> node
    -> address
    -> up flag

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:31 -05:00
Richard Alpe
1e55417d8f tipc: add media set to new netlink api
Add TIPC_NL_MEDIA_SET command to the new tipc netlink API.

This command can set one or more link properties for a particular
media.

Netlink logical layout of bearer set message:
-> media
    -> name
    -> link properties
        [ -> tolerance ]
        [ -> priority ]
        [ -> window ]

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:31 -05:00
Richard Alpe
46f15c6794 tipc: add media get/dump to new netlink api
Add TIPC_NL_MEDIA_GET command to the new tipc netlink API.

This command supports dumping all information about all defined
media as well as getting all information about a specific media.

The information about a media includes name and link properties.

Netlink logical layout of media get response message:
-> media
    -> name
    -> link properties
        -> tolerance
        -> priority
        -> window

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:31 -05:00
Richard Alpe
ae36342b50 tipc: add link stat reset to new netlink api
Add TIPC_NL_LINK_RESET_STATS command to the new netlink API.

This command resets the link statistics for a particular link.

Netlink logical layout of link reset message:
-> link
    -> name

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:31 -05:00
Richard Alpe
f96ce7a20d tipc: add link set to new netlink api
Add TIPC_NL_LINK_SET to the new tipc netlink API.

This command can set one or more link properties for a particular
link.

Netlink logical layout of link set message:
-> link
    -> name
    -> properties
        [ -> tolerance ]
        [ -> priority ]
        [ -> window ]

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:30 -05:00
Richard Alpe
7be57fc691 tipc: add link get/dump to new netlink api
Add TIPC_NL_LINK_GET command to the new tipc netlink API.

This command supports dumping all information about all links
(including the broadcast link) or getting all information about a
specific link (not the broadcast link).

The information about a link includes name, transmission info,
properties and link statistics.

As the tipc broadcast link is special we unfortunately have to treat
it specially. It is a deliberate decision not to abstract the
broadcast link on this (API) level.

Netlink logical layout of link response message:
    -> port
        -> name
        -> MTU
        -> RX
        -> TX
        -> up flag
        -> active flag
        -> properties
           -> priority
           -> tolerance
           -> window
        -> statistics
            -> rx_info
            -> rx_fragments
            -> rx_fragmented
            -> rx_bundles
            -> rx_bundled
            -> tx_info
            -> tx_fragments
            -> tx_fragmented
            -> tx_bundles
            -> tx_bundled
            -> msg_prof_tot
            -> msg_len_cnt
            -> msg_len_tot
            -> msg_len_p0
            -> msg_len_p1
            -> msg_len_p2
            -> msg_len_p3
            -> msg_len_p4
            -> msg_len_p5
            -> msg_len_p6
            -> rx_states
            -> rx_probes
            -> rx_nacks
            -> rx_deferred
            -> tx_states
            -> tx_probes
            -> tx_nacks
            -> tx_acks
            -> retransmitted
            -> duplicates
            -> link_congs
            -> max_queue
            -> avg_queue

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:30 -05:00
Richard Alpe
1a1a143daf tipc: add publication dump to new netlink api
Add TIPC_NL_PUBL_GET command to the new tipc netlink API.

This command supports dumping of all publications for a specific
socket.

Netlink logical layout of request message:
    -> socket
        -> reference

Netlink logical layout of response message:
    -> publication
        -> type
        -> lower
        -> upper

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:30 -05:00
Richard Alpe
34b78a127c tipc: add sock dump to new netlink api
Add TIPC_NL_SOCK_GET command to the new tipc netlink API.

This command supports dumping of all available sockets with their
associated connection or publication(s). It could be extended to reply
with a single socket if the NLM_F_DUMP isn't set.

The information about a socket includes reference, address, connection
information / publication information.

Netlink logical layout of response message:
-> socket
    -> reference
    -> address
    [
    -> connection
        -> node
        -> socket
        [
        -> connected flag
        -> type
        -> instance
        ]
    ]
    [
    -> publication flag
    ]

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:30 -05:00
Richard Alpe
315c00bc9f tipc: add bearer set to new netlink api
Add TIPC_NL_BEARER_SET command to the new tipc netlink API.

This command can set one or more link properties for a particular
bearer.

Netlink logical layout of bearer set message:
-> bearer
    -> name
    -> link properties
        [ -> tolerance ]
        [ -> priority ]
        [ -> window ]

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:30 -05:00
Richard Alpe
35b9dd7607 tipc: add bearer get/dump to new netlink api
Add TIPC_NL_BEARER_GET command to the new tipc netlink API.

This command supports dumping all data about all bearers or getting
all information about a specific bearer.

The information about a bearer includes name, link priorities and
domain.

Netlink logical layout of bearer get message:
-> bearer
    -> name

Netlink logical layout of returned bearer information:
-> bearer
    -> name
    -> link properties
        -> priority
        -> tolerance
        -> window
    -> domain

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:29 -05:00
Richard Alpe
0655f6a863 tipc: add bearer disable/enable to new netlink api
A new netlink API for tipc that can disable or enable a tipc bearer.

The new API is separated from the old API because of a bug in the
user space client (tipc-config). The problem is that older versions
of tipc-config has a very low receive limit and adding commands to
the legacy genl_opts struct causes the ctrl_getfamily() response
message to grow, subsequently breaking the tool.

The new API utilizes netlink policies for input validation. Where the
top-level netlink attributes are tipc-logical entities, like bearer.
The top level entities then contain nested attributes. In this case
a name, nested link properties and a domain.

Netlink commands implemented in this patch:
TIPC_NL_BEARER_ENABLE
TIPC_NL_BEARER_DISABLE

Netlink logical layout of bearer enable message:
-> bearer
    -> name
    [ -> domain ]
    [
    -> properties
        -> priority
    ]

Netlink logical layout of bearer disable message:
-> bearer
    -> name

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:01:29 -05:00
Daniel Borkmann
f869c91286 net: sctp: keep owned chunk in destructor_arg instead of skb->cb
It's just silly to hold the skb destructor argument around inside
skb->cb[] as we currently do in SCTP.

Nowadays, we're sort of cheating on data accounting in the sense
that due to commit 4c3a5bdae2 ("sctp: Don't charge for data in
sndbuf again when transmitting packet"), we orphan the skb already
in the SCTP output path, i.e. giving back charged data memory, and
use a different destructor only to make sure the sk doesn't vanish
on skb destruction time. Thus, cb[] is still valid here as we
operate within the SCTP layer. (It's generally actually a big
candidate for future rework, imho.)

However, storing the destructor in the cb[] can easily cause issues
should an non sctp_packet_set_owner_w()'ed skb ever escape the SCTP
layer, since cb[] may get overwritten by lower layers and thus can
corrupt the chunk pointer. There are no such issues at present,
but lets keep the chunk in destructor_arg, as this is the actual
purpose for it.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:46:12 -05:00
Willem de Bruijn
9c7077622d packet: make packet_snd fail on len smaller than l2 header
When sending packets out with PF_PACKET, SOCK_RAW, ensure that the
packet is at least as long as the device's expected link layer header.
This check already exists in tpacket_snd, but not in packet_snd.
Also rate limit the warning in tpacket_snd.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:43:07 -05:00
Jiri Pirko
c7e2b9689e sched: introduce vlan action
This tc action allows to work with vlan tagged skbs. Two supported
sub-actions are header pop and header push.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:20:18 -05:00
Jiri Pirko
93515d53b1 net: move vlan pop/push functions into common code
So it can be used from out of openvswitch code.
Did couple of cosmetic changes on the way, namely variable naming and
adding support for 8021AD proto.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:20:18 -05:00
Jiri Pirko
e21951212f net: move make_writable helper into common code
note that skb_make_writable already exists in net/netfilter/core.c
but does something slightly different.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:20:17 -05:00
Jiri Pirko
5968250c86 vlan: introduce *vlan_hwaccel_push_inside helpers
Use them to push skb->vlan_tci into the payload and avoid code
duplication.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:20:17 -05:00
Jiri Pirko
62749e2cb3 vlan: rename __vlan_put_tag to vlan_insert_tag_set_proto
Name fits better. Plus there's going to be introduced
__vlan_insert_tag later on.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:20:17 -05:00
Jiri Pirko
b960a0ac69 vlan: make __vlan_hwaccel_put_tag return void
Always returns the same skb it gets, so change to void.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:20:16 -05:00
Jiri Pirko
1abcd82c20 openvswitch: actions: use skb_postpull_rcsum when possible
Replace duplicated code by calling skb_postpull_rcsum

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:20:16 -05:00
Alexander Couzens
fe1591224a l2tp_eth: allow to set a specific mac address
Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:16:38 -05:00
David S. Miller
7e09dccd07 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains two bugfixes for your net tree, they are:

1) Validate netlink group from nfnetlink to avoid an out of bound array
   access. This should only happen with superuser priviledges though.
   Discovered by Andrey Ryabinin using trinity.

2) Don't push ethernet header before calling the netfilter output hook
   for multicast traffic, this breaks ebtables since it expects to see
   skb->data pointing to the network header, patch from Linus Luessing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 00:12:39 -05:00
David S. Miller
c857781900 Merge tag 'master-2014-11-20' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-11-20

Please full this little batch of fixes intended for the 3.18 stream!

For the mac80211 patch, Johannes says:

"Here's another last minute fix, for minstrel HT crashing
depending on the value of some uninitialised stack."

On top of that...

Ben Greear fixes an ath9k regression in which a BSSID mask is
miscalculated.

Dmitry Torokhov corrects an error handling routing in brcmfmac which
was checking an unsigned variable for a negative value.

Johannes Berg avoids a build problem in brcmfmac for arches where
linux/unaligned/access_ok.h and asm/unaligned.h conflict.

Mathy Vanhoef addresses another brcmfmac issue so as to eliminate a
use-after-free of the URB transfer buffer if a timeout occurs.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 00:07:51 -05:00
Jiri Bohac
01462405f0 ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
This fixes an old regression introduced by commit
b0d0d915 (ipx: remove the BKL).

When a recvmsg syscall blocks waiting for new data, no data can be sent on the
same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked.

This breaks mars-nwe (NetWare emulator):
- the ncpserv process reads the request using recvmsg
- ncpserv forks and spawns nwconn
- ncpserv calls a (blocking) recvmsg and waits for new requests
- nwconn deadlocks in sendmsg on the same socket

Commit b0d0d915 has simply replaced BKL locking with
lock_sock/release_sock. Unlike now, BKL got unlocked while
sleeping, so a blocking recvmsg did not block a concurrent
sendmsg.

Only keep the socket locked while actually working with the socket data and
release it prior to calling skb_recv_datagram().

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-20 22:57:03 -05:00
Joe Stringer
d3052bb5d3 openvswitch: Don't validate IPv6 label masks.
When userspace doesn't provide a mask, OVS datapath generates a fully
unwildcarded mask for the flow by copying the flow and setting all bits
in all fields. For IPv6 label, this creates a mask that matches on the
upper 12 bits, causing the following error:

openvswitch: netlink: Invalid IPv6 flow label value (value=ffffffff, max=fffff)

This patch ignores the label validation check for masks, avoiding this
error.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-20 22:56:13 -05:00
John W. Linville
9a638ddfb0 It has been a while since my last pull request, so we accumulated
another relatively large set of changes:
  * TDLS off-channel support set from Arik/Liad, with some support
    patches I did
  * custom regulatory fixes from Arik
  * minstrel VHT fix (and a small optimisation) from Felix
  * add back radiotap vendor namespace support (myself)
  * random MAC address scanning for cfg80211/mac80211/hwsim (myself)
  * CSA improvements (Luca)
  * WoWLAN Net Detect (wake on network found) support (Luca)
  * and lots of other smaller changes from many people
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUbghkAAoJEDBSmw7B7bqroosP/RABYXUMua+k3Ccq7T4eU4jV
 AEO2p2gt5nHBzEl1NCJtdUTJkZ6ftT7ehvAkV2zboB0FBoAoBTbZ8YDtcBBiWaY6
 wJ5TYuOl6LFo7csAxWxpCzUxW3M+iq26itpyW9Zt9WWxP4QLSNPFyXEXV3SEh45n
 HcDW9A0SE+mgdaTQ2LEMBJ5XWxG/Ic7i9Xn6Py3o4x7NsTB4EqFNOD0WXcPCq7M0
 H8xlsIYIBYoGNMsV/2Nu7CEgcSXfDLqWcs9uPHQMCvWPjx/vIoEyOgTwJlE9bQHh
 2tloc1LBP6XKQ6g2bJ/pBaQVnZGugcOJhD6KUq3ckNm9qIP1ZtRmJslH4V6pUSCS
 eGl3TcAKSSE4BWIa7AaETWXeoH4X68dO7PM7pnflQRCQLzCJRbDWCdqjBst/AxBT
 6hvAFAvExEcWBkNVSTJ2egRk/C9cDFKRaCWQ1h4wX9yvh+8efe1D0DDWLW9a9qv1
 LsoGJE72BZdXn2CaQEME+CjTd3fWmn6u729d/c863cq2kspCSOof0QD0X9uWhBUx
 BvqtgbQjGZzAvHFcjBd7yRd5hz0aDfLyBL59bq2IBzaU1QmyekNPqzSMSD+5ZlCp
 uxEeE5AY2+pcNZV1KRtkvgAByfUgAVd0FHZcVb8SIM6QZ3IhqiOuzxuXtxv6hrYP
 V/76B+ath4Sv1IPF56ex
 =q4e6
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-john-2014-11-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg <johannes@sipsolutions.net> says:

"It has been a while since my last pull request, so we accumulated
another relatively large set of changes:
 * TDLS off-channel support set from Arik/Liad, with some support
   patches I did
 * custom regulatory fixes from Arik
 * minstrel VHT fix (and a small optimisation) from Felix
 * add back radiotap vendor namespace support (myself)
 * random MAC address scanning for cfg80211/mac80211/hwsim (myself)
 * CSA improvements (Luca)
 * WoWLAN Net Detect (wake on network found) support (Luca)
 * and lots of other smaller changes from many people"

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-20 16:09:30 -05:00
Marcelo Leitner
beacd3e8ef netfilter: nfnetlink_log: Make use of pr_fmt where applicable
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-20 14:09:01 +01:00
Markus Elfring
982f405136 netfilter: Deletion of unnecessary checks before two function calls
The functions free_percpu() and module_put() test whether their argument
is NULL and then return immediately. Thus the test around the call is
not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-20 13:08:43 +01:00
Steffen Klassert
fbe68ee875 vti6: Add a lookup method for tunnels with wildcard endpoints.
Currently we can't lookup tunnels with wildcard endpoints.
This patch adds a method to lookup these tunnels in the
receive path.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-11-20 10:03:07 +01:00
Duan Jiong
ffb1388a36 ipv6: delete protocol and unregister rtnetlink when cleanup
pim6_protocol was added when initiation, but it not deleted.
Similarly, unregister RTNL_FAMILY_IP6MR rtnetlink.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-19 16:56:17 -05:00
Al Viro
08adb7dabd fold verify_iovec() into copy_msghdr_from_user()
... and do the same on the compat side of things.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 16:23:49 -05:00
Al Viro
0844932009 {compat_,}verify_iovec(): switch to generic copying of iovecs
use {compat_,}rw_copy_check_uvector().  As the result, we are
guaranteed that all iovecs seen in ->msg_iov by ->sendmsg()
and ->recvmsg() will pass access_ok().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 16:23:16 -05:00
Al Viro
666547ff59 separate kernel- and userland-side msghdr
Kernel-side struct msghdr is (currently) using the same layout as
userland one, but it's not a one-to-one copy - even without considering
32bit compat issues, we have msg_iov, msg_name and msg_control copied
to kernel[1].  It's fairly localized, so we get away with a few functions
where that knowledge is needed (and we could shrink that set even
more).  Pretty much everything deals with the kernel-side variant and
the few places that want userland one just use a bunch of force-casts
to paper over the differences.

The thing is, kernel-side definition of struct msghdr is *not* exposed
in include/uapi - libc doesn't see it, etc.  So we can add struct user_msghdr,
with proper annotations and let the few places that ever deal with those
beasts use it for userland pointers.  Saner typechecking aside, that will
allow to change the layout of kernel-side msghdr - e.g. replace
msg_iov/msg_iovlen there with struct iov_iter, getting rid of the need
to modify the iovec as we copy data to/from it, etc.

We could introduce kernel_msghdr instead, but that would create much more
noise - the absolute majority of the instances would need to have the
type switched to kernel_msghdr and definition of struct msghdr in
include/linux/socket.h is not going to be seen by userland anyway.

This commit just introduces user_msghdr and switches the few places that
are dealing with userland-side msghdr to it.

[1] actually, it's even trickier than that - we copy msg_control for
sendmsg, but keep the userland address on recvmsg.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 16:22:59 -05:00
John W. Linville
6158fb37d1 Here's another last minute fix, for minstrel HT crashing
depending on the value of some uninitialised stack.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUa70EAAoJEDBSmw7B7bqr40gQAMgNUKILuYTyHZaKR7teV8aG
 uVFzrMdmNjCZIsxAuH1X8VNOJZEAB+Ro+lTybNjC/tTPyA+Yu3lDCQjSYiE8UOFk
 mDrNFG0tjZHGn3axceLIhujfPqXEks/wQ98tDEJU2eMLxS0nClwyTXjhc1IxzUBO
 /L2HlhGWADRZ6FIV7yhRO33KN1l3Q8oxW79ybJq/5JCcclfGBt3OpYvi7AW6M4iK
 qfgTfS6GbpcRRc4+HVA/cgYOFaRmMI71bvqMgvh8MgmqfQeRw+WSOaUiLx795CsR
 OKrIQgdpVp2JHwHkFXIgBy+EBBbXtCozvxKX3u+YKPabj4GC/6dkT370nuvv9hkz
 IzTxt+yNBTqJawum7vM6WypgT4wCZOR0JbY0vDW5jYLJnDfGJkYFbny7WtRtCVf7
 xWzIixkyWLNFs6swbRdIKoU5GNw2fNe7az4fRoUqNtZGSPHiK7oD5sEkr81hWUV5
 5XsifngqekfdVvIGTckpy2SAikxgkrtHOQFIGJz+PN1CjvDT++66Opr3AHnwWeSP
 IAjmoVOCiyIBzXlcsZJmAfPa4Hvk+NAPRKCx6N25q/+ESpOZRARWFRV3VuPeGzQW
 +SiC4V9VYWHvW9VPJ37WkAFYv5tPaLLyURdU7p6MSwuc0RCPgtcvXty2QFbjQLdf
 k0apC9DZydQPF2kre6Pv
 =L2Jb
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-john-2014-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg <johannes@sipsolutions.net> says:

"Here's another last minute fix, for minstrel HT crashing
depending on the value of some uninitialised stack."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-19 15:44:40 -05:00
John W. Linville
ab1f5a532c Merge commit '4e6ce4dc7ce71d0886908d55129d5d6482a27ff9' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2014-11-19 15:38:48 -05:00
Markus Elfring
fcd4d35ecc netlink: Deletion of an unnecessary check before the function call "__module_get"
The __module_get() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-19 15:27:40 -05:00
Markus Elfring
ef87c5d6a1 net: pktgen: Deletion of an unnecessary check before the function call "proc_remove"
The proc_remove() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-19 15:20:15 -05:00
Eric Dumazet
355a901e6c tcp: make connect() mem charging friendly
While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.

We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()

Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)

Instead of open coding memcpy_fromiovecend(), simply use this helper.

This leaves in socket write queue clean fast clone skbs.

This was tested against our fastopen packetdrill tests.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-19 14:57:01 -05:00
Felix Fietkau
75769c80e3 mac80211: minstrel_ht: add a small optimization to minstrel_aggr_check
Check the queue mapping earlier, skb->queue_mapping is more likely than
skb->data to still be in d-cache.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 19:31:07 +01:00
Johannes Berg
f815e2b3c0 mac80211: notify drivers on sta rate table changes
This allows drivers with a firmware or chip-based rate lookup table to
use the most recent default rate selection without having to get it from
per-packet data or explicit ieee80211_get_tx_rate calls

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 19:15:50 +01:00
Tomasz Bursztyka
8f894be2df nl80211: Broadcast CMD_NEW_INTERFACE and CMD_DEL_INTERFACE
Let the other listeners being notified when a new or del interface
command has been issued, thus reducing later necessary request to be in
sync with current context.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 19:02:42 +01:00
Jukka Rissanen
18e5ca65e5 nl80211: Replace interface socket owner attribute with more generic one
Replace NL80211_ATTR_IFACE_SOCKET_OWNER attribute with more generic
NL80211_ATTR_SOCKET_OWNER that can be used with other commands
that interface creation.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:54:40 +01:00
Rafał Miłecki
d687cbb703 cfg80211: protect fools returning NULL in add_virtual_intf
Callback add_virtual_intf is supposed to return ERR_PTR and trying to
return NULL results in some "Unable to handle kernel paging request",
etc. As it may be complicated to debug & trace, let's catch it (WARN).

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:50:34 +01:00
Arik Nemtsov
c7ab508190 cfg80211: explicitly initialize some fields in custom reg path
Explicitly initialize the DFS state and beacon found state when handling
channels in the custom regulatory path.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Acked-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:49:33 +01:00
Arik Nemtsov
2e18b38fc8 cfg80211: update missing fields in custom regulatory path
Some channels fields were not being updated in the custom regulatory
path. Update them according to the code in handle_channel().

Signed-off-by: Jonathan Doron <jonathanx.doron@intel.com>
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Acked-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:49:33 +01:00
Felix Fietkau
628c010f1f mac80211: skip legacy rate mask handling for VHT rates
The rate mask code currently assumes that a rate is legacy if
IEEE80211_TX_RC_MCS is not set. This might be the cause of bogus VHT
rates being reported with minstrel_ht.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:46:35 +01:00
Eliad Peller
8b1956f041 mac80211: don't allow 40MHz tx rates in case of 20MHz chandef
When 20MHz chandef is used, 40MHz rates shouldn't be
used (by the rate-control algorithm), even if the sta
ht capabilities indicate support for it.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Singed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:46:30 +01:00
Johannes Berg
a344d6778a mac80211: allow drivers to support NL80211_SCAN_FLAG_RANDOM_ADDR
Allow drivers to support NL80211_SCAN_FLAG_RANDOM_ADDR with software
based scanning and generate a random MAC address for them for every
scan request with the flag.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:46:09 +01:00
Johannes Berg
6ea0a69ca2 mac80211: rcu-ify scan and scheduled scan request pointers
In order to use the scan and scheduled scan request pointers during
RX to check for randomisation, make them accessible using RCU.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:58 +01:00
Johannes Berg
ad2b26abc1 cfg80211: allow drivers to support random MAC addresses for scan
Add the necessary feature flags and a scan flag to support using
random MAC addresses for scan while unassociated.

The configuration for this supports an arbitrary MAC address
value and mask, so that any kind of configuration (e.g. fixed
OUI or full 46-bit random) can be requested. Full 46-bit random
is the default when no other configuration is passed.

Also add a small helper function to use the addr/mask correctly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:52 +01:00
Eliad Peller
ff5db4392c mac80211: remove redundant check
local->scan_req was tested in the previous line, so it
can't be NULL.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:48 +01:00
Luciano Coelho
8cd4d4563e cfg80211: add wowlan net-detect support
Add a new WoWLAN API to enable net-detect as a wake up trigger.
Net-detect allows the device to scan in the background while the
host is asleep to wake up the host system when a matching network
is found.

Reuse the scheduled scan attributes to specify how the scan is
performed while suspended and the matches that will trigger a
wake event.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:45 +01:00
Luciano Coelho
256da02d18 cfg80211: refactor nl80211_start_sched_scan so it can be reused
For net detect, we will need to reuse most of the scheduled scan
parsing function, but not all, so split out the attributes parsing
part out of the main start sched_scan function.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:42 +01:00
Liad Kaufman
b6da911b3c mac80211: synchronously reserve TID per station
In TDLS (e.g., TDLS off-channel) there is a requirement for
some drivers to supply an unused TID between the AP and the
device to the FW, to allow sending PTI requests and to allow
the FW to aggregate on a specific TID for better throughput.

To ensure that the allocated TID is indeed unused, this patch
introduces an API for blocking the driver from TXing on that
TID.

Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:36 +01:00
Liad Kaufman
4f9610d528 mac80211: add specific-queue flushing support
If the HW supports IEEE80211_HW_QUEUE_CONTROL, allow
flushing only specific queues rather than all of them.

Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:30 +01:00
Arik Nemtsov
8a4d32f30d mac80211: add TDLS channel-switch Rx flow
When receiving a TDLS channel switch request or response, parse the frame
and call a new tdls_recv_channel_switch op in the low level driver with
the parsed data.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:26 +01:00
Arik Nemtsov
a7a6bdd067 mac80211: introduce TDLS channel switch ops
Implement the cfg80211 TDLS channel switch ops and introduce new mac80211
ones for low-level drivers.
Verify low-level driver support for the new ops when using the relevant
wiphy feature bit. Also verify the peer supports channel switching before
passing the command down.

Add a new STA flag to track the off-channel state with the TDLS peer and
make sure to cancel the channel-switch if the peer STA is unexpectedly
removed.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:21 +01:00
Arik Nemtsov
5383758443 mac80211: add parsing of TDLS specific IEs
These are used in TDLS channel switching code.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:16 +01:00
Arik Nemtsov
1057d35ede cfg80211: introduce TDLS channel switch commands
Introduce commands to initiate and cancel TDLS channel-switching. Once
TDLS channel-switching is started, the lower level driver is responsible
for continually initiating channel-switch operations and returning to
the base (AP) channel to listen for beacons from time to time.

Upon cancellation of the channel-switch all communication between the
relevant TDLS peers will continue on the base channel.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:12 +01:00
Arik Nemtsov
c273390569 mac80211: prepare TDLS mgmt code for channel-switch templates
Split the data-generating from the Tx-sending functionality, as we do
not want to send templates to the lower driver. Also add an optional
chandef argument to the data-generating portion. It will be used for
channel-switch templates.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:07 +01:00
Arik Nemtsov
9041c1fa57 mac80211: track AP and peer STA TDLS chan-switch support
The AP or peer can prohibit TDLS channel switch via a bit in the
extended capabilities IE. Parse the IE and track this bit. Set an
appropriate STA flag if both the AP and peer STA support TDLS
channel-switching.

Add the new STA flag and the missing TDLS_INITIATOR to debugfs.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:45:04 +01:00
Arik Nemtsov
78632a17ea cfg/mac80211: define TDLS channel switch feature bit
Define some related TDLS protocol constants and advertise channel switch
support in the extended-capabilities IE when the feature bit is defined.

Actually supporting TDLS channel-switching also requires support for
some new nl80211 commands, to be introduced by future patches.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:44:58 +01:00
Arik Nemtsov
2cedd87960 mac80211: add BSS coex IE to TDLS setup frames
Add the BSS coex IE in case we support HT40 channels, as mandated by
section 8.5.13 in IEEE802.11 2012.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:44:52 +01:00
Arik Nemtsov
f0d29cb979 mac80211: add supported channels IE during TDLS setup
This information element is mandatory in case TDLS channel-switching is to
be supported. The channels given are ones supported and allowed to be
active in the current regulatory setting.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:44:47 +01:00
Johannes Berg
7528ec5776 mac80211: add function to create data frame template including key
For some TDLS channel switch implementations data frames need to be
sent by the firmware based on a template. This template should be
created by mac80211, and thus needs to properly be built from an
802.3 frame into an 802.11 frame. In addition, the device will need
the key information so the select_key handler needs to be run.
However, the driver/device will be responsible for all of the crypto
encapsulation, as the sequence numbers etc. cannot be built by the
host anyway in this case since it's a template to be used multiple
times.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:44:43 +01:00
Johannes Berg
4c9451ed94 mac80211: factor out 802.11 header building code
Factor out the 802.11 header building code from the xmit function
to be able to use it separately in a later commit.

While at it, fix up some documentation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:44:38 +01:00
Johannes Berg
73c4e195e6 mac80211: move skb info band assignment out
Instead of passing the band as a parameter to ieee80211_xmit()
and ieee80211_tx(), move it outside of the two functions while
making sure info->band is set up before calling them.

This removes the parameter and simplifies the follow commit.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:44:35 +01:00
Liad Kaufman
1277b4a9f5 mac80211: retransmit TDLS teardown packet through AP if not ACKed
Since the TDLS peer station might not receive the teardown
packet (e.g., when in PS), this makes sure the packet is
retransmitted - this time through the AP - if the TDLS peer
didn't ACK the packet.

Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:44:29 +01:00
Liad Kaufman
24d342c514 mac80211: add option for setting skb flags before xmit
Allows setting of an skb's flags - if needed - when calling
ieee80211_subif_start_xmit().

Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-19 18:44:19 +01:00
J. Bruce Fields
56429e9b3b merge nfs bugfixes into nfsd for-3.19 branch
In addition to nfsd bugfixes, there are some fixes in -rc5 for client
bugs that can interfere with my testing.
2014-11-19 12:06:30 -05:00
Trond Myklebust
093a1468b6 SUNRPC: Fix locking around callback channel reply receive
Both xprt_lookup_rqst() and xprt_complete_rqst() require that you
take the transport lock in order to avoid races with xprt_transmit().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-11-19 12:03:20 -05:00
Johan Hedberg
0378b59770 Bluetooth: Convert link keys list to use RCU
This patch converts the hdev->link_keys list to be protected through
RCU, thereby eliminating the need to hold the hdev lock while accessing
the list.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-19 16:19:47 +01:00
Johan Hedberg
cb6f3f7ace Bluetooth: Fix setting conn->pending_sec_level value from link key
When a connection is requested the conn->pending_sec_level value gets
set to whatever level the user requested the connection to be. During
the pairing process there are various sanity checks to try to ensure
that the right length PIN or right IO Capability is used to satisfy the
target security level. However, when we finally get hold of the link key
that is to be used we should still set the actual final security level
from the key type.

This way when we eventually get an Encrypt Change event the correct
value gets copied to conn->sec_level.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-19 16:17:32 +01:00
Johan Hedberg
22a3ceabf1 Bluetooth: Fix setting state back to TASK_RUNNING
In __hci_cmd_sync_ev() and __hci_req_sync() if the hci_req_run() call
fails and we return from the functions we should ensure that the state
doesn't remain in TASK_INTERRUPTIBLE that we just set it to. This patch
fixes missing calls to set_current_state(TASK_RUNNING) in both places.

Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-19 16:15:55 +01:00
Felix Fietkau
280ba51d60 mac80211: minstrel_ht: fix a crash in rate sorting
The commit 5935839ad7
"mac80211: improve minstrel_ht rate sorting by throughput & probability"

introduced a crash on rate sorting that occurs when the rate added to
the sorting array is faster than all the previous rates. Due to an
off-by-one error, it reads the rate index from tp_list[-1], which
contains uninitialized stack garbage, and then uses the resulting index
for accessing the group rate stats, leading to a crash if the garbage
value is big enough.

Cc: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-18 22:39:16 +01:00
Rick Jones
e3e3217029 icmp: Remove some spurious dropped packet profile hits from the ICMP path
If icmp_rcv() has successfully processed the incoming ICMP datagram, we
should use consume_skb() rather than kfree_skb() because a hit on the likes
of perf -e skb:kfree_skb is not called-for.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 15:28:28 -05:00
Fabian Frederick
54aeba7f06 dev_ioctl: use sizeof(x) instead of sizeof x
Also remove spaces after cast.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 15:27:32 -05:00
Fabian Frederick
e56f735913 net/core: include linux/types.h instead of asm/types.h
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 15:26:32 -05:00
Fabian Frederick
1d2398dc7c net: fix spelling for synchronized
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 15:26:32 -05:00
Fabian Frederick
a77b634367 dccp: spelling s/reseting/resetting
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 15:26:32 -05:00
Fabian Frederick
02c31d2e56 dccp: replace min/casting by min_t
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 15:26:32 -05:00
Fabian Frederick
54da7996b8 dccp: remove blank lines between function/EXPORT_SYMBOL
See Documentation/CodingStyle chapter 6.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 15:26:32 -05:00
Fabian Frederick
6d80c4732b dccp: kerneldoc warning fixes
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 15:26:31 -05:00
John W. Linville
f48ecb19bc Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg <johan.hedberg@gmail.com> says:

"Here's another bluetooth-next pull request for 3.19. We've got:

 - Various fixes, cleanups and improvements to ieee802154/mac802154
 - Support for a Broadcom BCM20702A1 variant
 - Lots of lockdep fixes
 - Fixed handling of LE CoC errors that should trigger SMP"

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-18 15:13:26 -05:00
Johannes Berg
760a52e80f Merge remote-tracking branch 'wireless-next/master' into mac80211-next
This brings in some mwifiex changes that further patches will
need to work on top to not cause merge conflicts.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-18 09:32:44 +01:00
Johan Hedberg
76727c02c1 Bluetooth: Call drain_workqueue() before resetting state
Doing things like hci_conn_hash_flush() while holding the hdev lock is
risky since its synchronous pending work cancellation could cause the
L2CAP layer to try to reacquire the hdev lock. Right now there doesn't
seem to be any obvious places where this would for certain happen but
it's already enough to cause lockdep to start warning against the hdev
and the work struct locks being taken in the "wrong" order:

[  +0.000373] mgmt-tester/1603 is trying to acquire lock:
[  +0.000292]  ((&conn->pending_rx_work)){+.+.+.}, at: [<c104266d>] flush_work+0x0/0x181
[  +0.000270]
but task is already holding lock:
[  +0.000000]  (&hdev->lock){+.+.+.}, at: [<c13b9a80>] hci_dev_do_close+0x166/0x359
[  +0.000000]
which lock already depends on the new lock.

[  +0.000000]
the existing dependency chain (in reverse order) is:
[  +0.000000]
-> #1 (&hdev->lock){+.+.+.}:
[  +0.000000]        [<c105ea8f>] lock_acquire+0xe3/0x156
[  +0.000000]        [<c140c663>] mutex_lock_nested+0x54/0x375
[  +0.000000]        [<c13d644b>] l2cap_recv_frame+0x293/0x1a9c
[  +0.000000]        [<c13d7ca4>] process_pending_rx+0x50/0x5e
[  +0.000000]        [<c1041a3f>] process_one_work+0x21c/0x436
[  +0.000000]        [<c1041e3d>] worker_thread+0x1be/0x251
[  +0.000000]        [<c1045a22>] kthread+0x94/0x99
[  +0.000000]        [<c140f801>] ret_from_kernel_thread+0x21/0x30
[  +0.000000]
-> #0 ((&conn->pending_rx_work)){+.+.+.}:
[  +0.000000]        [<c105e158>] __lock_acquire+0xa07/0xc89
[  +0.000000]        [<c105ea8f>] lock_acquire+0xe3/0x156
[  +0.000000]        [<c1042696>] flush_work+0x29/0x181
[  +0.000000]        [<c1042864>] __cancel_work_timer+0x76/0x8f
[  +0.000000]        [<c104288c>] cancel_work_sync+0xf/0x11
[  +0.000000]        [<c13d4c18>] l2cap_conn_del+0x72/0x183
[  +0.000000]        [<c13d8953>] l2cap_disconn_cfm+0x49/0x55
[  +0.000000]        [<c13be37a>] hci_conn_hash_flush+0x7a/0xc3
[  +0.000000]        [<c13b9af6>] hci_dev_do_close+0x1dc/0x359
[  +0.012038]        [<c13bbe38>] hci_unregister_dev+0x6e/0x1a3
[  +0.000000]        [<c12d33c1>] vhci_release+0x28/0x47
[  +0.000000]        [<c10dd6a9>] __fput+0xd6/0x154
[  +0.000000]        [<c10dd757>] ____fput+0xd/0xf
[  +0.000000]        [<c1044bb2>] task_work_run+0x6b/0x8d
[  +0.000000]        [<c1001bd2>] do_notify_resume+0x3c/0x3f
[  +0.000000]        [<c140fa70>] work_notifysig+0x29/0x31
[  +0.000000]
other info that might help us debug this:

[  +0.000000]  Possible unsafe locking scenario:

[  +0.000000]        CPU0                    CPU1
[  +0.000000]        ----                    ----
[  +0.000000]   lock(&hdev->lock);
[  +0.000000]                                lock((&conn->pending_rx_work));
[  +0.000000]                                lock(&hdev->lock);
[  +0.000000]   lock((&conn->pending_rx_work));
[  +0.000000]
 *** DEADLOCK ***

Fully fixing this would require some quite heavy refactoring to change
how the hdev lock and hci_conn instances are handled together. A simpler
solution for now which this patch takes is to try ensure that the hdev
workqueue is empty before proceeding with the various cleanup calls,
including hci_conn_hash_flush().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-18 08:32:08 +01:00
Johan Hedberg
38da170306 Bluetooth: Use shorter "rand" name for "randomizer"
The common short form of "randomizer" is "rand" in many places
(including the Bluetooth specification). The shorter version also makes
for easier to read code with less forced line breaks. This patch renames
all occurences of "randomizer" to "rand" in the Bluetooth subsystem
code.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-18 01:53:15 +01:00
Johan Hedberg
c19a495c8b Bluetooth: Fix BR/EDR-only address checks for remote OOB data
For now the mgmt commands dealing with remote OOB data are strictly
BR/EDR-only. This patch fixes missing checks for the passed address type
so that any non-BR/EDR value triggers the appropriate error response.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-18 01:53:15 +01:00
Vasily Averin
2c7b5d5dac netfilter: nf_conntrack_h323: lookup route from proper net namespace
Signed-off-by: Vasily Averin <vvs@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-17 12:47:14 +01:00
Florian Westphal
e59ea3df3f netfilter: xt_connlimit: honor conntrack zone if available
Currently all the conntrack lookups are done using default zone.
In case the skb has a ct attached (e.g. template) we should use this zone
for lookups instead.  This makes connlimit work with connections assigned
to other zones.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-17 12:44:20 +01:00
Linus Lüssing
f0b4eeced5 bridge: fix netfilter/NF_BR_LOCAL_OUT for own, locally generated queries
Ebtables on the OUTPUT chain (NF_BR_LOCAL_OUT) would not work as expected
for both locally generated IGMP and MLD queries. The IP header specific
filter options are off by 14 Bytes for netfilter (actual output on
interfaces is fine).

NF_HOOK() expects the skb->data to point to the IP header, not the
ethernet one (while dev_queue_xmit() does not). Luckily there is an
br_dev_queue_push_xmit() helper function already - let's just use that.

Introduced by eb1d164143
("bridge: Add core IGMP snooping support")

Ebtables example:

$ ebtables -I OUTPUT -p IPv6 -o eth1 --logical-out br0 \
	--log --log-level 6 --log-ip6 --log-prefix="~EBT: " -j DROP

before (broken):

~EBT:  IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
	MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
	SRC=64a4:39c2:86dd:6000:0000:0020:0001:fe80 IPv6 \
	DST=0000:0000:0000:0004:64ff:fea4:39c2:ff02, \
	IPv6 priority=0x3, Next Header=2

after (working):

~EBT:  IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
	MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
	SRC=fe80:0000:0000:0000:0004:64ff:fea4:39c2 IPv6 \
	DST=ff02:0000:0000:0000:0000:0000:0000:0001, \
	IPv6 priority=0x0, Next Header=0

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-17 12:38:02 +01:00
Pablo Neira Ayuso
97840cb67f netfilter: nfnetlink: fix insufficient validation in nfnetlink_bind
Make sure the netlink group exists, otherwise you can trigger an out
of bound array memory access from the netlink_bind() path. This splat
can only be triggered only by superuser.

[  180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28
[  180.204249] index 9 is out of range for type 'int [9]'
[  180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122
[  180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
+04/01/2014
[  180.206498]  0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8
[  180.207220]  ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8
[  180.207887]  ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000
[  180.208639] Call Trace:
[  180.208857] dump_stack (lib/dump_stack.c:52)
[  180.209370] ubsan_epilogue (lib/ubsan.c:174)
[  180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400)
[  180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467)
[  180.210986] netlink_bind (net/netlink/af_netlink.c:1483)
[  180.211495] SYSC_bind (net/socket.c:1541)

Moreover, define the missing nf_tables and nf_acct multicast groups too.

Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-17 12:01:13 +01:00
Alexander Aring
ee7b9053bd ieee802154: fix byteorder for short address and panid
This patch changes the byteorder handling for short and panid handling.
We now except to get little endian in nl802154 for these attributes.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:17 +01:00
Alexander Aring
cb41c8dd01 ieee802154: rename and move WPAN_NUM_ defines
This patch moves the 802.15.4 constraints WPAN_NUM_ defines into
"net/ieee802154.h" which should contain all necessary 802.15.4 related
information. Also rename these defines to a common name which is
IEEE802154_MAX_CHANNEL and IEEE802154_MAX_PAGE.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:17 +01:00
Alexander Aring
b821ecd4c8 ieee802154: add del interface command
This patch adds support for deleting a wpan interface via nl802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:17 +01:00
Alexander Aring
0e57547eb7 ieee802154: setting extended address while iface add
This patch adds support for setting an extended address while
registration a new interface. If ieee802154_is_valid_extended_addr
getting as parameter and invalid extended address then the perm address
is fallback. This is useful to make some default handling while for
example default registration of a wpan interface while phy registration.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:16 +01:00
Alexander Aring
f3ea5e4423 ieee802154: add new interface command
This patch adds a new nl802154 command for adding a new interface
according to a wpan phy via nl802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:16 +01:00
Alexander Aring
133d3f3172 mac802154: remove wpan_dev parameter in if_add
This parameter was grabbed from wireless implementation with the
identically wireless dev struct. We don't need this right now and so we
remove it. Maybe we will add it later again if we found any real reason
to have such parameter.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:16 +01:00
Alexander Aring
944742a36d mac802154: use new nl802154 iftype types
This patch replace the depracted IEEE802154_DEV to the new introduced
NL802154_IFTYPE_NODE types. There is a backwards compatibility to have
the identical types for both enum definitions. Also remove some inlcude
issue with "linux/nl802154.h", because the export nl_policy inside this
header it was always necessary to have an include of "net/rtnetlink.h"
before. The reason for this is more complicated. Nevertheless we removed
this now, because "linux/nl802154.h" is the depracted netlink interface.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:15 +01:00
Alexander Aring
cd11d935f2 mac802154: remove deprecated linux-zigbee info
We don't and we can't name it zigbee anymore. This patch removes
deprecated information for project website.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:15 +01:00
Alexander Aring
628b1e1136 mac802154: remove const for non pointer in rdev-ops
This patches removes the const keyword in variables which are non
pointers. There is no sense to declare call by value parameters as
const.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:15 +01:00
Alexander Aring
6d5fb87745 mac802154: remove const for non pointer in cfg ops
This patches removes the const keyword in variables which are non
pointers. There is no sense to declare call by value parameters as
const.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:15 +01:00
Alexander Aring
29cd54b9bf mac802154: remove const for non pointer in driver-ops
This patches removes the const keyword in variables which are non
pointers. There is no sense to declare call by value parameters as const.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:14 +01:00
Alexander Aring
197304b7a5 mac802154: remove unused prototypes
This patch removes some prototypes which are not used anymore.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-17 09:49:14 +01:00
Daniel Borkmann
feb91a02cc ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():

skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:
 <IRQ>
 [<ffffffff80578226>] ? skb_put+0x3a/0x3b
 [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
 [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
 [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
 [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
 [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
 [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
 [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
 [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
 [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
 [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70

mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.

However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.

The IGMP case doesn't have this issue as commit 57e1ab6ead
("igmp: refine skb allocations") stores the allocation size in
the cb[].

Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().

Reported-by: Wei Liu <lw1a2.jing@gmail.com>
Fixes: 72e09ad107 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David L Stevens <david.stevens@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 16:55:06 -05:00
Eric Dumazet
960fb622f8 net: provide a per host RSS key generic infrastructure
RSS (Receive Side Scaling) typically uses Toeplitz hash and a 40 or 52 bytes
RSS key.

Some drivers use a constant (and well known key), some drivers use a random
key per port, making bonding setups hard to tune. Well known keys increase
attack surface, considering that number of queues is usually a power of two.

This patch provides infrastructure to help drivers doing the right thing.

netdev_rss_key_fill() should be used by drivers to initialize their RSS key,
even if they provide ethtool -X support to let user redefine the key later.

A new /proc/sys/net/core/netdev_rss_key file can be used to get the host
RSS key even for drivers not providing ethtool -x support, in case some
applications want to precisely setup flows to match some RX queues.

Tested:

myhost:~# cat /proc/sys/net/core/netdev_rss_key
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41:36:40:74:b6:15:ca:27:44:aa:b3:4d:72

myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
    0:      0     1     2     3     4     5     6     7
RSS hash key:
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 15:59:11 -05:00
David S. Miller
c6ab766e81 Merge branch 'net_ovs' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch
Pravin B Shelar says:

====================
Open vSwitch

Following fixes are accumulated in ovs-repo.
Three of them are related to protocol processing, one is
related to memory leak in case of error and one is to
fix race.
Patch "Validate IPv6 flow key and mask values" has conflicts
with net-next, Let me know if you want me to send the patch
for net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 14:59:01 -05:00
Anish Bhatt
52cff74eef dcbnl : Disable software interrupts before taking dcb_lock
Solves possible lockup issues that can be seen from firmware DCB agents calling
into the DCB app api.

DCB firmware event queues can be tied in with NAPI so that dcb events are
generated in softIRQ context. This can results in calls to dcb_*app()
functions which try to take the dcb_lock.

If the the event triggers while we also have the dcb_lock because lldpad or
some other agent happened to be issuing a  get/set command we could see a cpu
lockup.

This code was not originally written with firmware agents in mind, hence
grabbing dcb_lock from softIRQ context was not considered.

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 14:50:52 -05:00
Fabian Frederick
6f2aed6ad7 net: dsa: replace count*size kzalloc by kcalloc
kcalloc manages count*sizeof overflow.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 14:41:39 -05:00
Fabian Frederick
5bc4b46a70 net: dsa: replace count*size kmalloc by kmalloc_array
kmalloc_array manages count*sizeof overflow.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 14:41:39 -05:00
Fabian Frederick
f35423c137 openvswitch: use PTR_ERR_OR_ZERO
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 14:41:39 -05:00
Holger Brunck
0372bf5c09 tipc: allow one link per bearer to neighboring nodes
There is no reason to limit the amount of possible links to a
neighboring node to 2. If we have more then two bearers we can also
establish more links.

Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Reviewed-By: Jon Maloy <jon.maloy@ericsson.com>
cc: Ying Xue <ying.xue@windriver.com>
cc: Erik Hugne <erik.hugne@ericsson.com>
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 14:27:17 -05:00
David S. Miller
f1227c5c1b Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter updates for your net tree,
they are:

1) Fix missing initialization of the range structure (allocated in the
   stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann.

2) Make sure the data we receive from userspace contains the req_version
   structure, otherwise return an error incomplete on truncated input.
   From Dan Carpenter.

3) Fix handling og skb->sk which may cause incorrect handling
   of connections from a local process. Via Simon Horman, patch from
   Calvin Owens.

4) Fix wrong netns in nft_compat when setting target and match params
   structure.

5) Relax chain type validation in nft_compat that was recently included,
   this broke the matches that need to be run from the route chain type.
   Now iptables-test.py automated regression tests report success again
   and we avoid the only possible problematic case, which is the use of
   nat targets out of nat chain type.

6) Use match->table to validate the tablename, instead of the match->name.
   Again patch for nft_compat.

7) Restore the synchronous release of objects from the commit and abort
   path in nf_tables. This is causing two major problems: splats when using
   nft_compat, given that matches and targets may sleep and call_rcu is
   invoked from softirq context. Moreover Patrick reported possible event
   notification reordering when rules refer to anonymous sets.

8) Fix race condition in between packets that are being confirmed by
   conntrack and the ctnetlink flush operation. This happens since the
   removal of the central spinlock. Thanks to Jesper D. Brouer to looking
   into this.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 14:23:56 -05:00
Panu Matilainen
49dd18ba46 ipv4: Fix incorrect error code when adding an unreachable route
Trying to add an unreachable route incorrectly returns -ESRCH if
if custom FIB rules are present:

[root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
RTNETLINK answers: Network is unreachable
[root@localhost ~]# ip rule add to 55.66.77.88 table 200
[root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
RTNETLINK answers: No such process
[root@localhost ~]#

Commit 83886b6b63 ("[NET]: Change "not found"
return value for rule lookup") changed fib_rules_lookup()
to use -ESRCH as a "not found" code internally, but for user space it
should be translated into -ENETUNREACH. Handle the translation centrally in
ipv4-specific fib_lookup(), leaving the DECnet case alone.

On a related note, commit b7a71b51ee
("ipv4: removed redundant conditional") removed a similar translation from
ip_route_input_slow() prematurely AIUI.

Fixes: b7a71b51ee ("ipv4: removed redundant conditional")
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 14:11:45 -05:00
Ingo Molnar
e9ac5f0fa8 Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying more changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:50:25 +01:00
Linus Torvalds
1afcb6ed0d NFS client bugfixes for Linux 3.18
Highlights include:
 
 - Stable patches to fix NFSv4.x delegation reclaim error paths
 - Fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2 features
 - Fix a use-after-free problem with pNFS block layouts
 - Fix a memory leak in the pNFS files O_DIRECT code
 - Replace an intrusive and Oops-prone performance fix in the NFSv4 atomic
   open code with a safer one-line version and revert the two original patches.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUZol9AAoJEGcL54qWCgDyNQQQALnngvpPR51BoO/iTz9ruXol
 fGZy0SRIlTUKm1ArsQsQ+HGbV5K0hgP3Tg+z2AtEEZ8u/2Fi2Bqdl6+eNY12tKHd
 uUctDdM5TXLrETAn1UULrnd2eX1cvPMBfOlXlAdNHHsGEgC7w7YQ+rzGwnls+HDy
 LYXzY7Y3jYGdTMaRgZc5YRdtd8JBpCxciRvPEQLDIobwP0JnZC1afTLe1XInqB2I
 TZ4NTHT+DEWA+Ou1P2deL7+RuJNEAeWWBvULJy76n4BqKvN/HNedOO5HyBYXrwSd
 3UX3wbx9CWRxN1F0EqNKxjxZ/597JwqBeNoTDRcofLsqumUfAOtlbym1EahcD3Ls
 pykopNfgUhGuhxolStmuHdS6CnyQPERpR5lFZcDp7XtcwSq4FcwD8DRzLJMZW5dg
 N1lkfFlwQN3rqdk/NEHL+IxS41Hlk4HXjMoP6MNbRtqzIN6tW9tvC4MtAWd1aYxO
 YuUW281pbWxXQ731s0kTIrMUdQ9vGSRBMcbnO9rL3o+xkh8y5SPVkx9lhdhJN0UD
 VbQ5Ws/xZ54bD1PfyYb+Yx659lI8MSFOsDuMuLmDtfYnVicHwCA3H63StvQ3ihf/
 q0gu8Iex9YbNNjf7IfYGuWPmPn3gwPBoURPC0bcZvMPdY6DXodU6Oj4BRTQ5VCie
 9N0pt2wp2eRjaSzD7r5A
 =8YN6
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - stable patches to fix NFSv4.x delegation reclaim error paths
   - fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2
     features
   - fix a use-after-free problem with pNFS block layouts
   - fix a memory leak in the pNFS files O_DIRECT code
   - replace an intrusive and Oops-prone performance fix in the NFSv4
     atomic open code with a safer one-line version and revert the two
     original patches"

* tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor
  NFS: Don't try to reclaim delegation open state if recovery failed
  NFSv4: Ensure that we call FREE_STATEID when NFSv4.x stateids are revoked
  NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
  NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE
  NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
  NFS: SEEK is an NFS v4.2 feature
  nfs: Fix use of uninitialized variable in nfs_getattr()
  nfs: Remove bogus assignment
  nfs: remove spurious WARN_ON_ONCE in write path
  pnfs/blocklayout: serialize GETDEVICEINFO calls
  nfs: fix pnfs direct write memory leak
  Revert "NFS: nfs4_do_open should add negative results to the dcache."
  Revert "NFS: remove BUG possibility in nfs4_open_and_get_state"
  NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT
2014-11-15 14:15:16 -08:00
Johan Hedberg
eedbd5812c Bluetooth: Fix clearing remote OOB data through mgmt
When passed BDADDR_ANY the Remove Remote OOB Data comand is specified to
clear all entries. This patch adds the necessary check and calls
hci_remote_oob_data_clear() when necessary.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-15 09:00:29 +01:00
Johan Hedberg
49d1174130 Bluetooth: Add debug logs to help track locking issues
This patch adds some extra debug logs to L2CAP related code. These are
mainly to help track locking issues but will probably be useful for
debugging other types of issues as well.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-15 01:53:27 +01:00
Johan Hedberg
d88b5bbf1a Bluetooth: Remove unnecessary hdev locking in smp.c
Now that the SMP related key lists are converted to RCU there is nothing
in smp_cmd_sign_info() or smp_cmd_ident_addr_info() that would require
taking the hdev lock (including the smp_distribute_keys call). This
patch removes this unnecessary locking.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-15 01:53:27 +01:00
Johan Hedberg
adae20cb2d Bluetooth: Convert IRK list to RCU
This patch set converts the hdev->identity_resolving_keys list to use
RCU to eliminate the need to use hci_dev_lock/unlock.

An additional change that must be done is to remove use of
CRYPTO_ALG_ASYNC for the hdev-specific AES crypto context. The reason is
that this context is used for matching RPAs and the loop that does the
matching is under the RCU read lock, i.e. is an atomic section which
cannot sleep.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-15 01:53:27 +01:00
Johan Hedberg
970d0f1b28 Bluetooth: Convert LTK list to RCU
This patch set converts the hdev->long_term_keys list to use RCU to
eliminate the need to use hci_dev_lock/unlock.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-15 01:53:27 +01:00
Johan Hedberg
3e64b7bd82 Bluetooth: Trigger SMP for the appropriate LE CoC errors
The insufficient authentication/encryption errors indicate to the L2CAP
client that it should try to elevate the security level. Since there
really isn't any exception to this rule it makes sense to fully handle
it on the kernel side instead of pushing the responsibility to user
space.

This patch adds special handling of these two error codes and calls
smp_conn_security() with the elevated security level if necessary.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-15 01:46:50 +01:00
Johan Hedberg
35dc6f834c Bluetooth: Add key preference parameter to smp_sufficient_security
So far smp_sufficient_security() has returned false if we're encrypted
with an STK but do have an LTK available. However, for the sake of LE
CoC servers we do want to let the incoming connection through even
though we're only encrypted with the STK.

This patch adds a key preference parameter to smp_sufficient_security()
with two possible values (enum used instead of bool for readability).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-15 01:46:49 +01:00
Johan Hedberg
fa37c1aa30 Bluetooth: Fix sending incorrect LE CoC PDU in BT_CONNECT2 state
For LE CoC L2CAP servers we don't do security level elevation during the
BT_CONNECT2 state (instead LE CoC simply sends an immediate error
response if the security level isn't high enough). Therefore if we get a
security level change while an LE CoC channel is in the BT_CONNECT2
state we should simply do nothing.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-15 01:46:49 +01:00
Fabian Frederick
a809eff11f Bluetooth: hidp: replace kzalloc/copy_from_user by memdup_user
use memdup_user for rd_data import.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-15 01:30:16 +01:00
Jarno Rajahalme
fecaef85f7 openvswitch: Validate IPv6 flow key and mask values.
Reject flow label key and mask values with invalid bits set.
Introduced by commit 3fdbd1ce11 ("openvswitch: add ipv6 'set'
action").

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-14 15:13:26 -08:00
Pravin B Shelar
8ec609d8b5 openvswitch: Convert dp rcu read operation to locked operations
dp read operations depends on ovs_dp_cmd_fill_info(). This API
needs to looup vport to find dp name, but vport lookup can
fail. Therefore to keep vport reference alive we need to
take ovs lock.

Introduced by commit 6093ae9aba ("openvswitch: Minimize
dp and vport critical sections").

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-11-14 15:13:26 -08:00
Daniele Di Proietto
19e7a3df72 openvswitch: Fix NDP flow mask validation
match_validate() enforce that a mask matching on NDP attributes has also an
exact match on ICMPv6 type.
The ICMPv6 type, which is 8-bit wide, is stored in the 'tp.src' field of
'struct sw_flow_key', which is 16-bit wide.
Therefore, an exact match on ICMPv6 type should only check the first 8 bits.

This commit fixes a bug that prevented flows with an exact match on NDP field
from being installed
Introduced by commit 03f0d916aa ("openvswitch: Mega flow implementation").

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-14 15:13:26 -08:00
Jesse Gross
856447d020 openvswitch: Fix checksum calculation when modifying ICMPv6 packets.
The checksum of ICMPv6 packets uses the IP pseudoheader as part of
the calculation, unlike ICMP in IPv4. This was not implemented,
which means that modifying the IP addresses of an ICMPv6 packet
would cause the checksum to no longer be correct as the psuedoheader
did not match.
Introduced by commit 3fdbd1ce11 ("openvswitch: add ipv6 'set' action").

Reported-by: Neal Shrader <icosahedral@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-14 15:13:26 -08:00
Pravin B Shelar
ab64f16ff2 openvswitch: Fix memory leak.
Need to free memory in case of sample action error.

Introduced by commit 651887b0c2 ("openvswitch: Sample
action without side effects").

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-14 15:13:26 -08:00
David S. Miller
8a5809e0dd Merge tag 'master-2014-11-11' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-11-13

Please pull this set of a few more wireless fixes intended for the
3.18 stream...

For the mac80211 bits, Johannes says:

"This has just one fix, for an issue with the CCMP decryption
that can cause a kernel crash. I'm not sure it's remotely
exploitable, but it's an important fix nonetheless."

For the iwlwifi bits, Emmanuel says:

"Two fixes here - we weren't updating mac80211 if a scan
was cut short by RFKILL which confused cfg80211. As a
result, the latter wouldn't allow to run another scan.
Liad fixes a small bug in the firmware dump."

On top of that...

Arend van Spriel corrects a channel width conversion that caused a
WARNING in brcmfmac.

Hauke Mehrtens avoids a NULL pointer dereference in b43.

Larry Finger hits a trio of rtlwifi bugs left over from recent
backporting from the Realtek vendor driver.

Miaoqing Pan fixes a clocking problem in ath9k that could affect
packet timestamps and such.

Stanislaw Gruszka addresses an payload alignment issue that has been
plaguing rt2x00.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-14 17:10:35 -05:00
bill bonaparte
5195c14c8b netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse
After removal of the central spinlock nf_conntrack_lock, in
commit 93bb0ceb75 ("netfilter: conntrack: remove central
spinlock nf_conntrack_lock"), it is possible to race against
get_next_corpse().

The race is against the get_next_corpse() cleanup on
the "unconfirmed" list (a per-cpu list with seperate locking),
which set the DYING bit.

Fix this race, in __nf_conntrack_confirm(), by removing the CT
from unconfirmed list before checking the DYING bit.  In case
race occured, re-add the CT to the dying list.

While at this, fix coding style of the comment that has been
updated.

Fixes: 93bb0ceb75 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
Reported-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-14 17:43:05 +01:00
Pravin B Shelar
8cd4313aa7 openvswitch: Fix build failure.
Add dependency on INET to fix following build error. I have also
fixed MPLS dependency.

ERROR: "ip_route_output_flow" [net/openvswitch/openvswitch.ko]
undefined!
make[1]: *** [__modpost] Error 1

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-14 01:24:27 -05:00
David S. Miller
076ce44825 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/chelsio/cxgb4vf/sge.c
	drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c

sge.c was overlapping two changes, one to use the new
__dev_alloc_page() in net-next, and one to use s->fl_pg_order in net.

ixgbe_phy.c was a set of overlapping whitespace changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-14 01:01:12 -05:00
Linus Torvalds
5cf5203704 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) sunhme driver lacks DMA mapping error checks, based upon a report by
    Meelis Roos.

 2) Fix memory leak in mvpp2 driver, from Sudip Mukherjee.

 3) DMA memory allocation sizes are wrong in systemport ethernet driver,
    fix from Florian Fainelli.

 4) Fix use after free in mac80211 defragmentation code, from Johannes
    Berg.

 5) Some networking uapi headers missing from Kbuild file, from Stephen
    Hemminger.

 6) TUN driver gets csum_start offset wrong when VLAN accel is enabled,
    and macvtap has a similar bug, from Herbert Xu.

 7) Adjust several tunneling drivers to set dev->iflink after registry,
    because registry sets that to -1 overwriting whatever we did.  From
    Steffen Klassert.

 8) Geneve forgets to set inner tunneling type, causing GSO segmentation
    to fail on some NICs.  From Jesse Gross.

 9) Fix several locking bugs in stmmac driver, from Fabrice Gasnier and
    Giuseppe CAVALLARO.

10) Fix spurious timeouts with NewReno on low traffic connections, from
    Marcelo Leitner.

11) Fix descriptor updates in enic driver, from Govindarajulu
    Varadarajan.

12) PPP calls bpf_prog_create() with locks held, which isn't kosher.
    Fix from Takashi Iwai.

13) Fix NULL deref in SCTP with malformed INIT packets, from Daniel
    Borkmann.

14) psock_fanout selftest accesses past the end of the mmap ring, fix
    from Shuah Khan.

15) Fix PTP timestamping for VLAN packets, from Richard Cochran.

16) netlink_unbind() calls in netlink pass wrong initial argument, from
    Hiroaki SHIMODA.

17) vxlan socket reuse accidently reuses a socket when the address
    family is different, so we have to explicitly check this, from
    Marcelo Lietner.

18) Fix missing include in nft_reject_bridge.c breaking the build on ppc
    and other architectures, from Guenter Roeck.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
  vxlan: Do not reuse sockets for a different address family
  smsc911x: power-up phydev before doing a software reset.
  lib: rhashtable - Remove weird non-ASCII characters from comments
  net/smsc911x: Fix delays in the PHY enable/disable routines
  net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode
  netlink: Properly unbind in error conditions.
  net: ptp: fix time stamp matching logic for VLAN packets.
  cxgb4 : dcb open-lldp interop fixes
  selftests/net: psock_fanout seg faults in sock_fanout_read_ring()
  net: bcmgenet: apply MII configuration in bcmgenet_open()
  net: bcmgenet: connect and disconnect from the PHY state machine
  net: qualcomm: Fix dependency
  ixgbe: phy: fix uninitialized status in ixgbe_setup_phy_link_tnx
  net: phy: Correctly handle MII ioctl which changes autonegotiation.
  ipv6: fix IPV6_PKTINFO with v4 mapped
  net: sctp: fix memory leak in auth key management
  net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
  net: ppp: Don't call bpf_prog_create() in ppp_lock
  net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set
  cxgb4 : Fix bug in DCB app deletion
  ...
2014-11-13 17:54:08 -08:00
Eric Dumazet
d649a7a81f tcp: limit GSO packets to half cwnd
In DC world, GSO packets initially cooked by tcp_sendmsg() are usually
big, as sk_pacing_rate is high.

When network is congested, cwnd can be smaller than the GSO packets
found in socket write queue. tcp_write_xmit() splits GSO packets
using the available cwnd, and we end up sending a single GSO packet,
consuming all available cwnd.

With GRO aggregation on the receiver, we might handle a single GRO
packet, sending back a single ACK.

1) This single ACK might be lost
   TLP or RTO are forced to attempt a retransmit.
2) This ACK releases a full cwnd, sender sends another big GSO packet,
   in a ping pong mode.

This behavior does not fill the pipes in the best way, because of
scheduling artifacts.

Make sure we always have at least two GSO packets in flight.

This allows us to safely increase GRO efficiency without risking
spurious retransmits.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:21:44 -05:00
Thomas Graf
6eba82248e rhashtable: Drop gfp_flags arg in insert/remove functions
Reallocation is only required for shrinking and expanding and both rely
on a mutex for synchronization and callers of rhashtable_init() are in
non atomic context. Therefore, no reason to continue passing allocation
hints through the API.

Instead, use GFP_KERNEL and add __GFP_NOWARN | __GFP_NORETRY to allow
for silent fall back to vzalloc() without the OOM killer jumping in as
pointed out by Eric Dumazet and Eric W. Biederman.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:18:40 -05:00
Herbert Xu
7b4ce23534 rhashtable: Add parent argument to mutex_is_held
Currently mutex_is_held can only test locks in the that are global
since it takes no arguments.  This prevents rhashtable from being
used in places where locks are lock, e.g., per-namespace locks.

This patch adds a parent field to mutex_is_held and rhashtable_params
so that local locks can be used (and tested).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:13:05 -05:00
Herbert Xu
1f501d6252 netfilter: Move mutex_is_held under PROVE_LOCKING
The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled.  This patch modifies netfilter so that we can rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:13:05 -05:00
Herbert Xu
9712756620 netlink: Move mutex_is_held under PROVE_LOCKING
The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled.  This patch modifies netlink so that we can rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:13:05 -05:00
Michal Kubeček
fbe168ba91 net: generic dev_disable_lro() stacked device handling
Large receive offloading is known to cause problems if received packets
are passed to other host. Therefore the kernel disables it by calling
dev_disable_lro() whenever a network device is enslaved in a bridge or
forwarding is enabled for it (or globally). For virtual devices we need
to disable LRO on the underlying physical device (which is actually
receiving the packets).

Current dev_disable_lro() code handles this  propagation for a vlan
(including 802.1ad nested vlan), macvlan or a vlan on top of a macvlan.
It doesn't handle other stacked devices and their combinations, in
particular propagation from a bond to its slaves which often causes
problems in virtualization setups.

As we now have generic data structures describing the upper-lower device
relationship, dev_disable_lro() can be generalized to disable LRO also
for all lower devices (if any) once it is disabled for the device
itself.

For bonding and teaming devices, it is necessary to disable LRO not only
on current slaves at the moment when dev_disable_lro() is called but
also on any slave (port) added later.

v2: use lower device links for all devices (including vlan and macvlan)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 14:48:56 -05:00
Thomas Graf
882288c05e FOU: Fix no return statement warning for !CONFIG_NET_FOU_IP_TUNNELS
net/ipv4/fou.c: In function ‘ip_tunnel_encap_del_fou_ops’:
net/ipv4/fou.c:861:1: warning: no return statement in function returning non-void [-Wreturn-type]

Fixes: a8c5f90fb5 ("ip_tunnel: Ops registration for secondary encap (fou, gue)")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 14:32:00 -05:00
Ilya Dryomov
cc9f1f518c libceph: change from BUG to WARN for __remove_osd() asserts
No reason to use BUG_ON for osd request list assertions.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-11-13 22:26:34 +03:00
Ilya Dryomov
ba9d114ec5 libceph: clear r_req_lru_item in __unregister_linger_request()
kick_requests() can put linger requests on the notarget list.  This
means we need to clear the much-overloaded req->r_req_lru_item in
__unregister_linger_request() as well, or we get an assertion failure
in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).

AFAICT the assumption was that registered linger requests cannot be on
any of req->r_req_lru_item lists, but that's clearly not the case.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-11-13 22:21:14 +03:00
Ilya Dryomov
a390de0208 libceph: unlink from o_linger_requests when clearing r_osd
Requests have to be unlinked from both osd->o_requests (normal
requests) and osd->o_linger_requests (linger requests) lists when
clearing req->r_osd.  Otherwise __unregister_linger_request() gets
confused and we trip over a !list_empty(&osd->o_linger_requests)
assert in __remove_osd().

MON=1 OSD=1:

    # cat remove-osd.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    sleep 3
    rbd map dne/dne # obtain a new osdmap as a side effect
    rbd unmap $DEV & # will block
    sleep 3
    ceph osd in 0

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-11-13 22:21:13 +03:00
Ilya Dryomov
aaef31703a libceph: do not crash on large auth tickets
Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
tickets will have their buffers vmalloc'ed, which leads to the
following crash in crypto:

[   28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0
[   28.686032] IP: [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
[   28.686032] PGD 0
[   28.688088] Oops: 0000 [#1] PREEMPT SMP
[   28.688088] Modules linked in:
[   28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305
[   28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   28.688088] Workqueue: ceph-msgr con_work
[   28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000
[   28.688088] RIP: 0010:[<ffffffff81392b42>]  [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
[   28.688088] RSP: 0018:ffff8800d903f688  EFLAGS: 00010286
[   28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0
[   28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750
[   28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880
[   28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010
[   28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000
[   28.688088] FS:  00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[   28.688088] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0
[   28.688088] Stack:
[   28.688088]  ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32
[   28.688088]  ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020
[   28.688088]  ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010
[   28.688088] Call Trace:
[   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
[   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
[   28.688088]  [<ffffffff81395d32>] blkcipher_walk_done+0x182/0x220
[   28.688088]  [<ffffffff813990bf>] crypto_cbc_encrypt+0x15f/0x180
[   28.688088]  [<ffffffff81399780>] ? crypto_aes_set_key+0x30/0x30
[   28.688088]  [<ffffffff8156c40c>] ceph_aes_encrypt2+0x29c/0x2e0
[   28.688088]  [<ffffffff8156d2a3>] ceph_encrypt2+0x93/0xb0
[   28.688088]  [<ffffffff8156d7da>] ceph_x_encrypt+0x4a/0x60
[   28.688088]  [<ffffffff8155b39d>] ? ceph_buffer_new+0x5d/0xf0
[   28.688088]  [<ffffffff8156e837>] ceph_x_build_authorizer.isra.6+0x297/0x360
[   28.688088]  [<ffffffff8112089b>] ? kmem_cache_alloc_trace+0x11b/0x1c0
[   28.688088]  [<ffffffff8156b496>] ? ceph_auth_create_authorizer+0x36/0x80
[   28.688088]  [<ffffffff8156ed83>] ceph_x_create_authorizer+0x63/0xd0
[   28.688088]  [<ffffffff8156b4b4>] ceph_auth_create_authorizer+0x54/0x80
[   28.688088]  [<ffffffff8155f7c0>] get_authorizer+0x80/0xd0
[   28.688088]  [<ffffffff81555a8b>] prepare_write_connect+0x18b/0x2b0
[   28.688088]  [<ffffffff81559289>] try_read+0x1e59/0x1f10

This is because we set up crypto scatterlists as if all buffers were
kmalloc'ed.  Fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-11-13 22:21:12 +03:00
Jeff Layton
b3ecba0967 sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor
Bruce reported that he was seeing the following BUG pop:

    BUG: sleeping function called from invalid context at mm/slab.c:2846
    in_atomic(): 0, irqs_disabled(): 0, pid: 4539, name: mount.nfs
    2 locks held by mount.nfs/4539:
    #0:  (nfs_clid_init_mutex){+.+.+.}, at: [<ffffffffa01c0a9a>] nfs4_discover_server_trunking+0x4a/0x2f0 [nfsv4]
    #1:  (rcu_read_lock){......}, at: [<ffffffffa00e3185>] gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    Preemption disabled at:[<ffffffff81a4f082>] printk+0x4d/0x4f

    CPU: 3 PID: 4539 Comm: mount.nfs Not tainted 3.18.0-rc1-00013-g5b095e9 #3393
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    ffff880021499390 ffff8800381476a8 ffffffff81a534cf 0000000000000001
    0000000000000000 ffff8800381476c8 ffffffff81097854 00000000000000d0
    0000000000000018 ffff880038147718 ffffffff8118e4f3 0000000020479f00
    Call Trace:
    [<ffffffff81a534cf>] dump_stack+0x4f/0x7c
    [<ffffffff81097854>] __might_sleep+0x114/0x180
    [<ffffffff8118e4f3>] __kmalloc+0x1a3/0x280
    [<ffffffffa00e31d8>] gss_stringify_acceptor+0x58/0xb0 [auth_rpcgss]
    [<ffffffffa00e3185>] ? gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    [<ffffffffa006b438>] rpcauth_stringify_acceptor+0x18/0x30 [sunrpc]
    [<ffffffffa01b0469>] nfs4_proc_setclientid+0x199/0x380 [nfsv4]
    [<ffffffffa01b04d0>] ? nfs4_proc_setclientid+0x200/0x380 [nfsv4]
    [<ffffffffa01bdf1a>] nfs40_discover_server_trunking+0xda/0x150 [nfsv4]
    [<ffffffffa01bde45>] ? nfs40_discover_server_trunking+0x5/0x150 [nfsv4]
    [<ffffffffa01c0acf>] nfs4_discover_server_trunking+0x7f/0x2f0 [nfsv4]
    [<ffffffffa01c8e24>] nfs4_init_client+0x104/0x2f0 [nfsv4]
    [<ffffffffa01539b4>] nfs_get_client+0x314/0x3f0 [nfs]
    [<ffffffffa0153780>] ? nfs_get_client+0xe0/0x3f0 [nfs]
    [<ffffffffa01c83aa>] nfs4_set_client+0x8a/0x110 [nfsv4]
    [<ffffffffa0069708>] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc]
    [<ffffffffa01c9b2f>] nfs4_create_server+0x12f/0x390 [nfsv4]
    [<ffffffffa01c1472>] nfs4_remote_mount+0x32/0x60 [nfsv4]
    [<ffffffff81196489>] mount_fs+0x39/0x1b0
    [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
    [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
    [<ffffffffa01c1396>] nfs_do_root_mount+0x86/0xc0 [nfsv4]
    [<ffffffffa01c1784>] nfs4_try_mount+0x44/0xc0 [nfsv4]
    [<ffffffffa01549b7>] ? get_nfs_version+0x27/0x90 [nfs]
    [<ffffffffa0161a2d>] nfs_fs_mount+0x47d/0xd60 [nfs]
    [<ffffffff81a59c5e>] ? mutex_unlock+0xe/0x10
    [<ffffffffa01606a0>] ? nfs_remount+0x430/0x430 [nfs]
    [<ffffffffa01609c0>] ? nfs_clone_super+0x140/0x140 [nfs]
    [<ffffffff81196489>] mount_fs+0x39/0x1b0
    [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
    [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
    [<ffffffff811b5830>] do_mount+0x210/0xbe0
    [<ffffffff811b54ca>] ? copy_mount_options+0x3a/0x160
    [<ffffffff811b651f>] SyS_mount+0x6f/0xb0
    [<ffffffff81a5c852>] system_call_fastpath+0x12/0x17

Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping
the rcu_read_lock before doing the allocation and then reacquiring it
and redoing the dereference before doing the copy. If we find that the
string has somehow grown in the meantime, we'll reallocate and try again.

Cc: <stable@vger.kernel.org> # v3.17+
Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-13 13:15:49 -05:00
Pablo Neira Ayuso
8225161545 netfilter: nfnetlink_log: remove unnecessary error messages
In case of OOM, there's nothing userspace can do.

If there's no room to put the payload in __build_packet_message(),
jump to nla_put_failure which already performs the corresponding
error reporting.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-13 13:13:00 +01:00
Florian Westphal
5676864431 netfilter: fix various sparse warnings
net/bridge/br_netfilter.c:870:6: symbol 'br_netfilter_enable' was not declared. Should it be static?
  no; add include
net/ipv4/netfilter/nft_reject_ipv4.c:22:6: symbol 'nft_reject_ipv4_eval' was not declared. Should it be static?
  yes
net/ipv6/netfilter/nf_reject_ipv6.c:16:6: symbol 'nf_send_reset6' was not declared. Should it be static?
  no; add include
net/ipv6/netfilter/nft_reject_ipv6.c:22:6: symbol 'nft_reject_ipv6_eval' was not declared. Should it be static?
  yes
net/netfilter/core.c:33:32: symbol 'nf_ipv6_ops' was not declared. Should it be static?
  no; add include
net/netfilter/xt_DSCP.c:40:57: cast truncates bits from constant value (ffffff03 becomes 3)
net/netfilter/xt_DSCP.c:57:59: cast truncates bits from constant value (ffffff03 becomes 3)
  add __force, 3 is what we want.
net/ipv4/netfilter/nf_log_arp.c:77:6: symbol 'nf_log_arp_packet' was not declared. Should it be static?
  yes
net/ipv4/netfilter/nf_reject_ipv4.c:17:6: symbol 'nf_send_reset' was not declared. Should it be static?
  no; add include

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-13 12:14:42 +01:00
Herbert Xu
12bfa8bdba xfrm: Use __xfrm_policy_link in xfrm_policy_insert
For a long time we couldn't actually use __xfrm_policy_link in
xfrm_policy_insert because the latter wanted to do hashing at
a specific position.

Now that __xfrm_policy_link no longer does hashing it can now
be safely used in xfrm_policy_insert to kill some duplicate code,
finally reuniting general policies with socket policies.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-11-13 11:25:04 +01:00
Herbert Xu
53c2e285f9 xfrm: Do not hash socket policies
Back in 2003 when I added policy expiration, I half-heartedly
did a clean-up and renamed xfrm_sk_policy_link/xfrm_sk_policy_unlink
to __xfrm_policy_link/__xfrm_policy_unlink, because the latter
could be reused for all policies.  I never actually got around
to using __xfrm_policy_link for non-socket policies.

Later on hashing was added to all xfrm policies, including socket
policies.  In fact, we don't need hashing on socket policies at
all since they're always looked up via a linked list.

This patch restores xfrm_sk_policy_link/xfrm_sk_policy_unlink
as wrappers around __xfrm_policy_link/__xfrm_policy_unlink so
that it's obvious we're dealing with socket policies.

This patch also removes hashing from __xfrm_policy_link as for
now it's only used by socket policies which do not need to be
hashed.  Ironically this will in fact allow us to use this helper
for non-socket policies which I shall do later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-11-13 11:25:03 +01:00
Johan Hedberg
2773b02422 Bluetooth: Fix correct nesting for 6lowpan server channel
Server channels in BT_LISTEN state should use L2CAP_NESTING_PARENT. This
patch fixes the nesting value for the 6lowpan channel.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-13 09:11:37 +01:00
Johan Hedberg
ff714119a6 Bluetooth: Fix L2CAP nesting level initialization location
There's no reason why all users of L2CAP would need to worry about
initializing chan->nesting to L2CAP_NESTING_NORMAL (which is important
since 0 is the same as NESTING_SMP). This patch moves the initialization
to the common place that's used to create all new channels, i.e. the
l2cap_chan_create() function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-13 09:11:37 +01:00
Johan Hedberg
3b2ab39e26 Bluetooth: Fix L2CAP socket lock nesting level
The teardown callback for L2CAP channels is problematic in that it is
explicitly called for all types of channels from l2cap_chan_del(),
meaning it's not possible to hard-code a nesting level when taking the
socket lock. The simplest way to have a correct nesting level for the
socket locking is to use the same value as for the chan. This also means
that the other places trying to lock parent sockets need to be update to
use the chan value (since L2CAP_NESTING_PARENT is defined as 2 whereas
SINGLE_DEPTH_NESTING has the value 1).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-13 07:49:09 +01:00
Johan Hedberg
abe84903a8 Bluetooth: Use proper nesting annotation for l2cap_chan lock
By default lockdep considers all L2CAP channels equal. This would mean
that we get warnings if a channel is locked when another one's lock is
tried to be acquired in the same thread. This kind of inter-channel
locking dependencies exist in the form of parent-child channels as well
as any channel wishing to elevate the security by requesting procedures
on the SMP channel.

To eliminate the chance for these lockdep warnings we introduce a
nesting level for each channel and use that when acquiring the channel
lock. For now there exists the earlier mentioned three identified
categories: SMP, "normal" channels and parent channels (i.e. those in
BT_LISTEN state). The nesting level is defined as atomic_t since we need
access to it before the lock is actually acquired.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-13 07:49:09 +01:00
Alexander Aring
61f2dcba9a mac802154: add interframe spacing time handling
This patch adds a new interframe spacing time handling into mac802154
layer. Interframe spacing time is a time period between each transmit.
This patch adds a high resolution timer into mac802154 and starts on
xmit complete with corresponding interframe spacing expire time if
ifs_handling is true. We make it variable because it depends if
interframe spacing time is handled by transceiver or mac802154. At the
timer complete function we wake the netdev queue again. This avoids
new frame transmit in range of interframe spacing time.

For synced driver we add no handling of interframe spacing time. This
is currently a lack of support in all synced xmit drivers. I suppose
it's working because the latency of workqueue which is needed to call
spi_sync.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-13 04:51:58 +01:00
Joe Perches
a768851f94 irda: Fix build failures after IRDA_DEBUG->pr_debug
Fix the build failures that result from the use of pr_debug
without the referenced char * arrays being defined.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12 22:01:14 -05:00
Hiroaki SHIMODA
6251edd932 netlink: Properly unbind in error conditions.
Even if netlink_kernel_cfg::unbind is implemented the unbind() method is
not called, because cfg->unbind is omitted in __netlink_kernel_create().
And fix wrong argument of test_bit() and off by one problem.

At this point, no unbind() method is implemented, so there is no real
issue.

Fixes: 4f52090052 ("netlink: have netlink per-protocol bind function return an error code.")
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Cc: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12 15:12:06 -05:00
Tom Herbert
a8c5f90fb5 ip_tunnel: Ops registration for secondary encap (fou, gue)
Instead of calling fou and gue functions directly from ip_tunnel
use ops for these that were previously registered. This patch adds the
logic to add and remove encapsulation operations for ip_tunnel,
and modified fou (and gue) to register with ip_tunnels.

This patch also addresses a circular dependency between ip_tunnel
and fou that was causing link errors when CONFIG_NET_IP_TUNNEL=y
and CONFIG_NET_FOU=m. References to fou an gue have been removed from
ip_tunnel.c

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12 15:01:35 -05:00
Joe Perches
4243cdc2c1 udp: Neaten function pointer calls and add braces
Standardize function pointer uses.

Convert calling style from:
	(*foo)(args...);
to:
	foo(args...);

Other miscellanea:

o Add braces around loops with single ifs on multiple lines
o Realign arguments around these functions
o Invert logic in if to return immediately.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12 14:51:59 -05:00
Joe Perches
955a9d202f irda: Convert IRDA_DEBUG to pr_debug
Use the normal kernel debugging mechanism which also
enables dynamic_debug at the same time.

Other miscellanea:

o Remove sysctl for irda_debug
o Remove function tracing like uses (use ftrace instead)
o Coalesce formats
o Realign arguments
o Remove unnecessary OOM messages

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-12 13:56:41 -05:00
Pablo Neira Ayuso
b326dd37b9 netfilter: nf_tables: restore synchronous object release from commit/abort
The existing xtables matches and targets, when used from nft_compat, may
sleep from the destroy path, ie. when removing rules. Since the objects
are released via call_rcu from softirq context, this results in lockdep
splats and possible lockups that may be hard to reproduce.

Patrick also indicated that delayed object release via call_rcu can
cause us problems in the ordering of event notifications when anonymous
sets are in place.

So, this patch restores the synchronous object release from the commit
and abort paths. This includes a call to synchronize_rcu() to make sure
that no packets are walking on the objects that are going to be
released. This is slowier though, but it's simple and it resolves the
aforementioned problems.

This is a partial revert of c7c32e7 ("netfilter: nf_tables: defer all
object release via rcu") that was introduced in 3.16 to speed up
interaction with userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Pablo Neira Ayuso
afefb6f928 netfilter: nft_compat: use the match->table to validate dependencies
Instead of the match->name, which is of course not relevant.

Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Pablo Neira Ayuso
c918687f5e netfilter: nft_compat: relax chain type validation
Check for nat chain dependency only, which is the one that can
actually crash the kernel. Don't care if mangle, filter and security
specific match and targets are used out of their scope, they are
harmless.

This restores iptables-compat with mangle specific match/target when
used out of the OUTPUT chain, that are actually emulated through filter
chains, which broke when performing strict validation.

Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Pablo Neira Ayuso
2daf1b4d18 netfilter: nft_compat: use current net namespace
Instead of init_net when using xtables over nftables compat.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Pablo Neira Ayuso
baf4750d92 netfilter: nft_redir: fix sparse warnings
>> net/netfilter/nft_redir.c:39:26: sparse: incorrect type in assignment (different base types)
   net/netfilter/nft_redir.c:39:26:    expected unsigned int [unsigned] [usertype] nla_be32
   net/netfilter/nft_redir.c:39:26:    got restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:46:34: sparse: incorrect type in assignment (different base types)
   net/netfilter/nft_redir.c:46:34:    expected unsigned int [unsigned] [usertype] nla_be32
   net/netfilter/nft_redir.c:46:34:    got restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32

Fixes: e9105f1 ("netfilter: nf_tables: add new expression nft_redir")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:00:04 +01:00
Pablo Neira Ayuso
f6c6339d5e netfilter: fix unmet dependencies in NETFILTER_XT_TARGET_REDIRECT
warning: (NETFILTER_XT_TARGET_REDIRECT) selects NF_NAT_REDIRECT_IPV4 which has unmet direct dependencies (NET && INET && NETFILTER && NF_NAT_IPV4)

warning: (NETFILTER_XT_TARGET_REDIRECT) selects NF_NAT_REDIRECT_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_NAT_IPV6)

Fixes: 8b13edd ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables")
Fixes: 9de920e ("netfilter: refactor NAT redirect IPv6 code to use it from nf_tables")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 11:54:12 +01:00
Johan Hedberg
a930430b04 Bluetooth: Remove unnecessary hci_dev_lock/unlock in smp.c
The mgmt_user_passkey_request and related functions do not do anything
else except read access to hdev->id. This member never changes after the
hdev creation so there is no need to acquire a lock to read it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 10:05:25 +01:00
Johan Hedberg
f03567040c Bluetooth: Fix l2cap_sock_teardown_cb lockdep warning
Any code calling bt_accept_dequeue() to get a new child socket from a
server socket should use lock_sock_nested to avoid lockdep warnings due
to the parent and child sockets being locked at the same time. The
l2cap_sock_accept() function is already doing this correctly but a
second place calling bt_accept_dequeue() is the code path from
l2cap_sock_teardown_cb() that calls l2cap_sock_cleanup_listen().

This patch fixes the proper nested locking annotation and thereby avoids
the following style of lockdep warning.

[  +0.000224] [ INFO: possible recursive locking detected ]
[  +0.000222] 3.17.0+ #1153 Not tainted
[  +0.000130] ---------------------------------------------
[  +0.000227] l2cap-tester/562 is trying to acquire lock:
[  +0.000210]  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+...}, at: [<c1393f47>] bt_accept_dequeue+0x68/0x11b
[  +0.000467]
but task is already holding lock:
[  +0.000186]  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+...}, at: [<c13b949a>] lock_sock+0xa/0xc
[  +0.000421]
other info that might help us debug this:
[  +0.000199]  Possible unsafe locking scenario:

[  +0.000117]        CPU0
[  +0.000000]        ----
[  +0.000000]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP);
[  +0.000000]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP);
[  +0.000000]
 *** DEADLOCK ***

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 10:05:25 +01:00
Alexander Aring
c8937a1d11 ieee820154: add lbt setting support
This patch adds support for setting listen before transmit mode via
nl802154 framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:43 +01:00
Alexander Aring
17a3a46bfb ieee820154: add max frame retries setting support
This patch add support for setting mac frame retries setting via
nl802154 framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:42 +01:00
Alexander Aring
a01ba7652c ieee820154: add max csma backoffs setting support
This patch add support for max csma backoffs setting via nl802154
framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:41 +01:00
Alexander Aring
656a999e87 ieee820154: add backoff exponent setting support
This patch adds support for setting backoff exponents via nl802154
framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:40 +01:00
Alexander Aring
9830c62a0b ieee820154: add short_addr setting support
This patch adds support for setting short address via nl802154 framework.
Also added a comment because a 0xffff seems to be valid address that we
don't have a short address. This is a valid setting but we need
more checks in upper layers to don't allow this address as source address.
Also the current netlink interface doesn't allow to set the short_addr
to 0xffff. Same for the 0xfffe short address which describes a not
allocated short address.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:40 +01:00
Alexander Aring
702bf37128 ieee820154: add pan_id setting support
This patch adds support for setting pan_id via nl802154 framework.
Adding a comment because setting 0xffff as pan_id seems to be valid
setting. The pan_id 0xffff as source pan is invalid. I am not sure now
about this setting but for the current netlink interface this is an
invalid setting, so we do the same now. Maybe we need to change that
when we have coordinator support and association support.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:39 +01:00
Alexander Aring
ab0bd56172 ieee820154: add channel set support
This patch adds page and channel setting support to nl802154 framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:38 +01:00
Alexander Aring
be4fd8e5d9 mac802154: add ifname change notifier
This patch adds a netdev notifier for interface renaming. We have a name
attribute inside of subif data struct. This is needed to have always the
actual netdev name in sdata name attribute.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:37 +01:00
Alexander Aring
912f67aec7 mac802154: change module description
This patch changes the module description like wireless which is IEEE
802.11 "subsystem" and not "implementation".

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:37 +01:00
Alexander Aring
6322d50d87 mac802154: add wpan_phy priv id
This patch adds an unique id for an wpan_phy. This behaviour is mostly
grabbed from wireless stack. This is needed for upcomming patches which
identify the wpan netdev while NETDEV_CHANGENAME in netdev notify function.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:36 +01:00
Alexander Aring
2789e6297f mac820154: move mutex locks out of loop
Instead of always re-lock the iflist_mtx at multiple interfaces we lock
the complete for each loop at start and at the end.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:36 +01:00
Alexander Aring
d14e1c71cf mac820154: rename sdata next to tmp
This patch is just a cleanup to name the temporary variable for
protected list for each loop as tmp.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:36 +01:00
Alexander Aring
592dfbfc72 mac820154: move interface unregistration into iface
This patch move the iface unregistration into iface.c file to have
a behaviour which is similar like mac80211. Also iface handling should
be inside iface.c file only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-12 05:10:35 +01:00
Calvin Owens
50656d9df6 ipvs: Keep skb->sk when allocating headroom on tunnel xmit
ip_vs_prepare_tunneled_skb() ignores ->sk when allocating a new
skb, either unconditionally setting ->sk to NULL or allowing
the uninitialized ->sk from a newly allocated skb to leak through
to the caller.

This patch properly copies ->sk and increments its reference count.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-11-12 11:03:04 +09:00
Joe Perches
6c91023dc3 irda: Remove IRDA_<TYPE> logging macros
And use the more common mechanisms directly.

Other miscellanea:

o Coalesce formats
o Add missing newlines
o Realign arguments
o Remove unnecessary OOM message logging as
  there's a generic stack dump already on OOM.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 18:11:00 -05:00
John W. Linville
6164c20228 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-11-11 16:30:48 -05:00
Eric Dumazet
5337b5b75c ipv6: fix IPV6_PKTINFO with v4 mapped
Use IS_ENABLED(CONFIG_IPV6), to enable this code if IPv6 is
a module.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: c8e6ad0829 ("ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 15:32:45 -05:00
WANG Cong
d7480fd3b1 neigh: remove dynamic neigh table registration support
Currently there are only three neigh tables in the whole kernel:
arp table, ndisc table and decnet neigh table. What's more,
we don't support registering multiple tables per family.
Therefore we can just make these tables statically built-in.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 15:23:54 -05:00
Daniel Borkmann
4184b2a79a net: sctp: fix memory leak in auth key management
A very minimal and simple user space application allocating an SCTP
socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
the socket again will leak the memory containing the authentication
key from user space:

unreferenced object 0xffff8800837047c0 (size 16):
  comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
  hex dump (first 16 bytes):
    01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811c88d8>] __kmalloc+0xe8/0x270
    [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
    [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
    [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
    [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
    [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
    [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
    [<ffffffffffffffff>] 0xffffffffffffffff

This is bad because of two things, we can bring down a machine from
user space when auth_enable=1, but also we would leave security sensitive
keying material in memory without clearing it after use. The issue is
that sctp_auth_create_key() already sets the refcount to 1, but after
allocation sctp_auth_set_key() does an additional refcount on it, and
thus leaving it around when we free the socket.

Fixes: 65b07e5d0d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 15:19:11 -05:00
Daniel Borkmann
e40607cbe2 net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
in the form of:

  ------------ INIT[PARAM: SET_PRIMARY_IP] ------------>

While the INIT chunk parameter verification dissects through many things
in order to detect malformed input, it misses to actually check parameters
inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary
IP address' parameter in ASCONF, which has as a subparameter an address
parameter.

So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS
or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0
and thus sctp_get_af_specific() returns NULL, too, which we then happily
dereference unconditionally through af->from_addr_param().

The trace for the log:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
PGD 0
Oops: 0000 [#1] SMP
[...]
Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa01e9c62>]  [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
[...]
Call Trace:
 <IRQ>
 [<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
 [<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
 [<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[...]

A minimal way to address this is to check for NULL as we do on all
other such occasions where we know sctp_get_af_specific() could
possibly return with NULL.

Fixes: d6de309759 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 15:19:10 -05:00
Joe Perches
ba7a46f16d net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited
Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.

All messages are still ratelimited.

Some KERN_<LEVEL> uses are changed to KERN_DEBUG.

This may have some negative impact on messages that were
emitted at KERN_INFO that are not not enabled at all unless
DEBUG is defined or dynamic_debug is enabled.  Even so,
these messages are now _not_ emitted by default.

This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings".  For backward compatibility,
the sysctl is not removed, but it has no function.  The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.c

Miscellanea:

o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 14:10:31 -05:00
David S. Miller
4083c8056d Merge branch 'net_next_ovs' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch
Pravin B Shelar says:

====================
Open vSwitch

Following batch of patches brings feature parity between upstream
ovs and out of tree ovs module.

Two features are added, first adds support to export egress
tunnel information for a packet. This is used to improve
visibility in network traffic. Second feature allows userspace
vswitchd process to probe ovs module features. Other patches
are optimization and code cleanup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:32:25 -05:00
Joe Perches
a2ae6007a4 dsa: Use netdev_<level> instead of printk
Neaten and standardize the logging output.

Other miscellanea:

o Use pr_notice_once instead of a guard flag.
o Convert existing pr_<level> uses too.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:30:57 -05:00
Eric Dumazet
2c8c56e15d net: introduce SO_INCOMING_CPU
Alternative to RPS/RFS is to use hardware support for multiple
queues.

Then split a set of million of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.

Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.

We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on socket structure which cpu delivered last packet
is enough to solve the problem.

After accept(), connect(), or even file descriptor passing around
processes, applications can use :

 int cpu;
 socklen_t len = sizeof(cpu);

 getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);

And use this information to put the socket into the right silo
for optimal performance, as all networking stack should run
on the appropriate cpu, without need to send IPI (RPS/RFS).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:00:06 -05:00
Eric Dumazet
3d97379a67 tcp: move sk_mark_napi_id() at the right place
sk_mark_napi_id() is used to record for a flow napi id of incoming
packets for busypoll sake.
We should do this only on established flows, not on listeners.

This was 'working' by virtue of the socket cloning, but doing
this on SYN packets in unecessary cache line dirtying.

Even if we move sk_napi_id in the same cache line than sk_lock,
we are working to make SYN processing lockless, so it is desirable
to set sk_napi_id only for established flows.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:00:05 -05:00
Johannes Berg
0395442ad2 mac80211: refactor duplicate detection
Put duplicate detection into its own RX handler, and separate
out the conditions a bit to make the code more readable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-11 17:05:43 +01:00
Johan Hedberg
4e79022677 Bluetooth: 6lowpan: Remove unnecessary RCU callback
When kfree() is all that's needed to free an object protected by RCU
there's a kfree_rcu() convenience function that can be used. This patch
updates the 6lowpan code to use this, thereby eliminating the need for
the separate peer_free() function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-11 14:26:02 +01:00
Dan Carpenter
2196937e12 netfilter: ipset: small potential read beyond the end of buffer
We could be reading 8 bytes into a 4 byte buffer here.  It seems
harmless but adding a check is the right thing to do and it silences a
static checker warning.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-11 13:46:37 +01:00
Johan Hedberg
60cb49d2c9 Bluetooth: Fix mgmt connected notification
This patch fixes a regression that was introduced by commit
cb77c3ec07. In addition to BT_CONFIG,
BT_CONNECTED is also a state in which we may get a remote name and need
to indicate over mgmt the connection status. This scenario is
particularly likely to happen for incoming connections that do not need
authentication since there the hci_conn state will reach BT_CONNECTED
before the remote name is received.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-11 10:34:52 +01:00
Johan Hedberg
252670c421 Bluetooth: Fix sparse warning in amp.c
This fixes the following sparse warning:

net/bluetooth/amp.c:152:53: warning: Variable length array is used.

The warning itself is probably harmless since this kind of usage of
shash_desc is present also in other places in the kernel (there's even a
convenience macro SHASH_DESC_ON_STACK available for defining such stack
variables). However, dynamically allocated versions are also used in
several places of the kernel (e.g. kernel/kexec.c and lib/digsig.c)
which have the benefit of not exhibiting the sparse warning.

Since there are no more sparse warnings in the Bluetooth subsystem after
fixing this one it is now easier to spot whenever new ones might get
introduced by future patches.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-11 00:07:29 +01:00
Jesse Gross
cfdf1e1ba5 udptunnel: Add SKB_GSO_UDP_TUNNEL during gro_complete.
When doing GRO processing for UDP tunnels, we never add
SKB_GSO_UDP_TUNNEL to gso_type - only the type of the inner protocol
is added (such as SKB_GSO_TCPV4). The result is that if the packet is
later resegmented we will do GSO but not treat it as a tunnel. This
results in UDP fragmentation of the outer header instead of (i.e.) TCP
segmentation of the inner header as was originally on the wire.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 15:09:45 -05:00
David S. Miller
b92172661e Merge tag 'master-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-11-07

Please pull this batch of updates intended for the 3.19 stream!

For the mac80211 bits, Johannes says:

"This relatively large batch of changes is comprised of the following:
 * large mac80211-hwsim changes from Ben, Jukka and a bit myself
 * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
   University in Prague and Volkswagen Group Research
 * minstrel VHT work from Karl
 * more CSA work from Luca
 * WMM admission control support in mac80211 (myself)
 * various smaller fixes, spelling corrections, and minor API additions"

For the Bluetooth bits, Johan says:

"Here's the first bluetooth-next pull request for 3.19. The vast majority
of patches are for ieee802154 from Alexander Aring with various fixes
and cleanups. There are also several LE/SMP fixes as well as improved
support for handling LE devices that have lost their pairing information
(the patches from Alfonso). Jukka provides a couple of stability fixes
for 6lowpan and Szymon conformance fixes for RFCOMM. For the HCI drivers
we have one new USB ID for an Acer controller as well as a reset
handling fix for H5."

For the Atheros bits, Kalle says:

"Major changes are:

o ethtool support (Ben)

o print dev string prefix with debug hex buffers dump (Michal)

o debugfs file to read calibration data from the firmware verification
  purposes (me)

o fix fw_stats debugfs file, now results are more reliable (Michal)

o firmware crash counters via debugfs (Ben&me)

o various tracing points to debug firmware (Rajkumar)

o make it possible to provide firmware calibration data via a file (me)

And we have quite a lot of smaller fixes and clean up."

For the iwlwifi bits, Emmanuel says:

"The big new thing here is netdetect which allows the
firmware to wake up the platform when a specific network
is detected. Along with that I have fixes for d3 operation.
The usual amount of rate scaling stuff - we now support STBC.
The other commit that stands out is Johannes's work on
devcoredump. He basically starts to use the standard
infrastructure he built."

Along with that are the usual sort of updates and such for ath9k,
brcmfmac, wil6210, and a handful of other bits here and there...

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 14:34:59 -05:00
Herbert Xu
c008ba5bdc ipv4: Avoid reading user iov twice after raw_probe_proto_opt
Ever since raw_probe_proto_opt was added it had the problem of
causing the user iov to be read twice, once during the probe for
the protocol header and once again in ip_append_data.

This is a potential security problem since it means that whatever
we're probing may be invalid.  This patch plugs the hole by
firstly advancing the iov so we don't read the same spot again,
and secondly saving what we read the first time around for use
by ip_append_data.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 14:25:35 -05:00
Herbert Xu
32b5913a93 ipv4: Use standard iovec primitive in raw_probe_proto_opt
The function raw_probe_proto_opt tries to extract the first two
bytes from the user input in order to seed the IPsec lookup for
ICMP packets.  In doing so it's processing iovec by hand and
overcomplicating things.

This patch replaces the manual iovec processing with a call to
memcpy_fromiovecend.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 14:25:35 -05:00
John W. Linville
6168823518 This has just one fix, for an issue with the CCMP decryption
that can cause a kernel crash. I'm not sure it's remotely
 exploitable, but it's an important fix nonetheless.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUYJNkAAoJEDBSmw7B7bqrgooP/ivTw/pJJ5i5wLsfAUUNV1tI
 FDe3+w/kBsD0d0lNsnKoyq58iORyYIxHtmLhvFV+yuBMvpfADTNm9CkLDDjg939m
 mU6/kjO6wbHv/SoLSL4rss/AggGaEsu+3ggGpmhXl2JApSUIVjw0z9XixNcU4Vnz
 CEjIw9jsnKA5UOeH2cv+fwQLXImXKSsofAf2NgZ3zJ6Ci0zLOZtr/ZmO2o4IU9Do
 HWp3CHsaehoWlaqtPx6r8JaodE7XPEPtim66aWoUaeSYueL9iYKJU7BS2jkTWJnQ
 Kv8JuVQIQPlMsTuvTR8oHvpHZ29m0Z6mc8XsKeNu6UCNzoOBGS2Ak2YZwcAeqSLr
 oQRDDBc4aKsY5YWcaChPx6N8gOEXRr4cQgfyY79o5vaPPJwixxLMPUUOU1rKHC3D
 E61YfQ/tfoFWzEitksX14P3Gynas6abySku5mhM5y2L/1XFdDHugZjzRjcBv/1ry
 J9x44k2EWbduvz8GBGGethmKMghKYYTqfB5rWh1cKxafgBEFt61ZcE4LuqaNnP17
 sg+IlXFxZjJLnhhg0YL2oZt+++/CyCLl9G+mx6wZE4HLqsl26ElZ6ql2kekJ5baV
 HKsPyBa2r48icJfrogA9WpjumI8b19ztG5OrsF0AQmK77IVSj80JFVnsTBal9/Y7
 Ao2xMau9OghJq8AEJpOi
 =H7HX
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-john-2014-11-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg <johannes@sipsolutions.net> says:

"This has just one fix, for an issue with the CCMP decryption
that can cause a kernel crash. I'm not sure it's remotely
exploitable, but it's an important fix nonetheless."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-10 13:08:45 -05:00
Eric Dumazet
3b47d30396 net: gro: add a per device gro flush timer
Tuning coalescing parameters on NIC can be really hard.

Servers can handle both bulk and RPC like traffic, with conflicting
goals : bulk flows want as big GRO packets as possible, RPC want minimal
latencies.

To reach big GRO packets on 10Gbe NIC, one can use :

ethtool -C eth0 rx-usecs 4 rx-frames 44

But this penalizes rpc sessions, with an increase of latencies, up to
50% in some cases, as NICs generally do not force an interrupt when
a packet with TCP Push flag is received.

Some NICs do not have an absolute timer, only a timer rearmed for every
incoming packet.

This patch uses a different strategy : Let GRO stack decides what do do,
based on traffic pattern.

Packets with Push flag wont be delayed.
Packets without Push flag might be held in GRO engine, if we keep
receiving data.

This new mechanism is off by default, and shall be enabled by setting
/sys/class/net/ethX/gro_flush_timeout to a value in nanosecond.

To fully enable this mechanism, drivers should use napi_complete_done()
instead of napi_complete().

Tested:
 Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)

Without this feature, we send back about 305,000 ACK per second.

GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)

Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce number of ACK packets. (811/19.2 = 42)

Receiver performs less calls to upper stacks, less wakes up.
This also reduces cpu usage on the sender, as it receives less ACK
packets.

Note that reducing number of wakes up increases cpu efficiency, but can
decrease QPS, as applications wont have the chance to warmup cpu caches
doing a partial read of RPC requests/answers if they fit in one skb.

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00
0.00      0.50

B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00
0.00      0.50

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 12:05:59 -05:00
Daniel Borkmann
6b96686ecf netfilter: nft_masq: fix uninitialized range in nft_masq_{ipv4, ipv6}_eval
When transferring from the original range in nf_nat_masquerade_{ipv4,ipv6}()
we copy over values from stack in from min_proto/max_proto due to uninitialized
range variable in both, nft_masq_{ipv4,ipv6}_eval. As we only initialize
flags at this time from nft_masq struct, just zero out the rest.

Fixes: 9ba1f726be ("netfilter: nf_tables: add new nft_masq expression")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-10 17:56:28 +01:00
Arik Nemtsov
a6d4a534e1 cfg80211: introduce regulatory flags controlling bw
Allow setting bandwidth related regulatory flags. These flags are mapped
to the corresponding channel flags in the specified range.
Make sure the new flags are consulted when calculating the maximum
bandwidth allowed by a regulatory-rule.

Also allow propagating the GO_CONCURRENT modifier from a reg-rule to a
channel.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-10 10:36:21 +01:00
Johannes Berg
1f7bba79af mac80211: add back support for radiotap vendor namespace data
Radiotap vendor namespace data might still be useful, but we
reverted it because it used too much space in the RX status.
Put it back, but address the space problem by using a single
bit only and putting everything else into the skb->data.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-10 10:30:43 +01:00
Luciano Coelho
d04b5ac9e7 cfg80211/mac80211: allow any interface to send channel switch notifications
For multi-vif channel switches, we want to send
NL80211_CMD_CH_SWITCH_NOTIFY to the userspace to let it decide whether
other interfaces need to be moved as well.  This is needed when we
want a P2P GO interface to follow the channel of a station, for
example.

Modify the code so that all interfaces can send CSA notifications.
Additionally, send notifications for STA CSA as well.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-10 10:20:18 +01:00
Luciano Coelho
2f4572930d mac80211: send channel switch started notifications
Send a channel switch notification to userspace when a channel switch
is requested or when we react to a remote CSA.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-10 10:20:17 +01:00
Luciano Coelho
f8d7552e94 cfg80211: add channel switch started notification
Add a new NL80211_CH_SWITCH_STARTED_NOTIFY message that can be sent to
the userspace when a channel switch process has started.  This allows
userspace to take action, for instance, by requesting other interfaces
to switch channel as necessary.

This patch introduces a function that allows the drivers to send this
notification.  It should be used when the driver starts processing a
channel switch initiated by a remote device (eg. when a STA receives a
CSA from the AP) and when it successfully starts a userspace-triggered
channel switch (eg. when hostapd triggers a channel swith in the AP).

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-10 10:20:14 +01:00
Luciano Coelho
127f10ec60 mac80211: add device_timestamp to the drv_pre_channel_switch trace
The device_timestamp value was left out of the event trace for
drv_pre_channel_switch by mistake.  Add it.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-10 10:18:04 +01:00
Luciano Coelho
000baa5dfd mac80211: fix order of setting ch_switch and drv_pre_channel_switch call
There was a mistake when merging commit 6d027bcc (mac80211: add
pre_channel_switch driver operation) for upstream.  The assignment of
the values in the ch_switch structure came below the call to
drv_pre_channel_switch.  Fix the order.

Fixes: 6d027bcc (mac80211: add pre_channel_switch driver operation)
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-10 10:17:42 +01:00
Jarno Rajahalme
05da5898a9 openvswitch: Add support for OVS_FLOW_ATTR_PROBE.
This new flag is useful for suppressing error logging while probing
for datapath features using flow commands.  For backwards
compatibility reasons the commands are executed normally, but error
logging is suppressed.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-09 18:58:44 -08:00
Thomas Graf
12eb18f711 openvswitch: Constify various function arguments
Help produce better optimized code.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-09 18:58:44 -08:00
Pravin B Shelar
e8eedb85bd openvswitch: Remove redundant key ref from upcall_info.
struct dp_upcall_info has pointer to pkt_key which is already
available in OVS_CB.  This also simplifies upcall handling
for gso packet.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-11-09 18:58:44 -08:00
Pravin B Shelar
fff06c36a2 openvswitch: Optimize recirc action.
OVS need to flow key for flow lookup in recic action. OVS
does key extract in recic action. Most of cases we could
use OVS_CB packet key directly and can avoid packet flow key
extract. SET action we can update flow-key along with packet
to keep it consistent. But there are some action like MPLS
pop which forces OVS to do flow-extract. In such cases we
can mark flow key as invalid so that subsequent recirc
action can do full flow extract.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-11-09 18:58:44 -08:00
Wenyu Zhang
8f0aad6f35 openvswitch: Extend packet attribute for egress tunnel info
OVS vswitch has extended IPFIX exporter to export tunnel headers
to improve network visibility.
To export this information userspace needs to know egress tunnel
for given packet. By extending packet attributes datapath can
export egress tunnel info for given packet. So that userspace
can ask for egress tunnel info in userspace action. This
information is used to build IPFIX data for given flow.

Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Acked-by: Romain Lenglet <rlenglet@vmware.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-09 18:58:44 -08:00
Pravin B Shelar
9ba559d9ca openvswitch: Export symbols as GPL symbols.
vport can be compiled as modules, therefore openvswitch needs
to export few symbols. Export them as GPL symbols.

CC: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-09 18:58:44 -08:00
Alexander Aring
f7cb96f105 mac802154: protect address changes via ioctl
This patch adds a netif_running check while trying to change the address
attributes via ioctl. While netif_running is true these attributes
should be only readable.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:29 +01:00
Alexander Aring
87023e1058 ieee802154: fix iface dump with lowpan
This patch adds a hacked solution for an interface dump with a running
lowpan interface. This will crash because lowpan and wpan interface use
the same arphdr. To change the arphdr will change the UAPI, this patch
checks on mtu which should on lowpan interface always different than
IEEE802154_MTU.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:29 +01:00
Alexander Aring
7bea1ea7b4 ieee802154: netlink add rtnl lock
This patch adds rtnl lock hold mechanism while accessing wpan_dev
attributes. Furthermore these attributes should be protected by rtnl
lock and netif_running only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:29 +01:00
Alexander Aring
b03c9cccfa mac820154: don't set monitor dev_addr
This patch removes the setting of dev_addr on a monitor device. This
address should be zero. A monitor should only sniff and send raw frames
out. The address should be never used by upper layers and receiving
frame parsing.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:29 +01:00
Alexander Aring
4b96aea0fc ieee802154: add wpan_dev dump support
This patch adds support for wpan_dev dump via nl802154 framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:29 +01:00
Alexander Aring
ca20ce201c ieee802154: add wpan_phy dump support
This patch adds support for dumping wpan_phy attributes via nl802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:29 +01:00
Alexander Aring
79fe1a2aa7 ieee802154: add nl802154 framework
This patch adds a basic nl802154 framework. Most of this code was
grabbed from nl80211 framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:29 +01:00
Alexander Aring
a6fd693f6b ieee802154: sysfs add wpan_phy index and name
This patch adds new sysfs entries for wpan_phy index and name. This
needed for the new 802.15.4 userspace tool.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:28 +01:00
Alexander Aring
fcf39e6e88 ieee802154: add wpan_dev_list
This patch adds a wpan_dev_list list into cfg802154_registered_device
struct. Also adding new wpan_dev into this list while
cfg802154_netdev_notifier_call. This behaviour is mostly grab from
wireless core.c implementation and is needed for preparing nl802154
framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:28 +01:00
Alexander Aring
190ac1ca33 ieee802154: add iftype to wpan_dev
This patch adds an iftype argument to the wpan_dev. This is needed to
get the interface type from netdev ieee802154_ptr. The subif data struct
can only accessible in mac802154 branch.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:28 +01:00
Alexander Aring
f3ada640c2 ieee802154: add cfg802154_registered_device list
This patch adds a new cfg802154_rdev_list to remember all registered
cfg802154_registered_device structs. This is needed to prepare the
upcomming nl802154 framework.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:28 +01:00
Alexander Aring
f601379fa1 ieee802154: rename wpan_phy_alloc
This patch renames the wpan_phy_alloc function to wpan_phy_new. This
naming convention is like wireless and "wiphy_new" function.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:28 +01:00
Alexander Aring
5fb3f026ae mac802154: remove mac_params in sdata
This patch removes the mac_params from subif data struct. Instead we
manipulate the wpan attributes directly.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:28 +01:00
Alexander Aring
863e88f255 mac802154: move mac pib attributes into wpan_dev
This patch moves all mac pib attributes into the wpan_dev struct.
Furthermore we can easier access these attributes over the netdev
802154_ptr pointer. Currently this is only possible over a complicated
callback structure in mac802154 because subif data structure is
accessable inside mac802154 only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-09 19:50:27 +01:00
Ana Rey
ce674173e9 netfilter: nft_meta: add cgroup support
This allows you to filter traffic by process control group (cgroup).

Signed-off-by: Ana Rey <anarey@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-09 16:21:22 +01:00
Joe Perches
c0560b9c52 dccp: Convert DCCP_WARN to net_warn_ratelimited
Remove the dependency on the "warning" sysctl (net_msg_warn)
which is only used by the LIMIT_NETDEBUG macro.

Convert the LIMIT_NETDEBUG use in DCCP_WARN to the more
common net_warn_ratelimited mechanism.

This still ratelimits based on the net_ratelimit()
function, but removes the check for the sysctl.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-08 21:22:54 -05:00
Alexander Aring
b0c42cd7b2 Bluetooth: 6lowpan: fix skb_unshare behaviour
This patch reverts commit:

a7807d73 ("Bluetooth: 6lowpan: Avoid memory leak if memory allocation
fails")

which was wrong suggested by Alexander Aring. The function skb_unshare
run also kfree_skb on failure.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.18.x
2014-11-08 20:29:35 +01:00
Rick Jones
36cbb2452c udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts
As NIC multicast filtering isn't perfect, and some platforms are
quite content to spew broadcasts, we should not trigger an event
for skb:kfree_skb when we do not have a match for such an incoming
datagram.  We do though want to avoid sweeping the matter under the
rug entirely, so increment a suitable statistic.

This incorporates feedback from David L. Stevens, Karl Neiss and Eric
Dumazet.

V3 - use bool per David Miller

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-07 15:45:50 -05:00
Herbert Xu
bfe1be38fc net: Kill skb_copy_datagram_const_iovec
Now that both macvtap and tun are using skb_copy_datagram_iter, we
can kill the abomination that is skb_copy_datagram_const_iovec.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-07 12:13:34 -05:00
Herbert Xu
a8f820aa40 inet: Add skb_copy_datagram_iter
This patch adds skb_copy_datagram_iter, which is identical to
skb_copy_datagram_iovec except that it operates on iov_iter
instead of iovec.

Eventually all users of skb_copy_datagram_iovec should switch
over to iov_iter and then we can remove skb_copy_datagram_iovec.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-07 12:13:34 -05:00
Jaganath Kanakkassery
cb77c3ec07 Bluetooth: Send mgmt_connected only if state is BT_CONFIG
If a remote name request is initiated while acl connection is going on,
and if it fails then mgmt_connected will be sent. Evetually after acl
connection, authentication will not be initiated and userspace will
never get pairing reply.

< HCI Command: Create Connection (0x01|0x0005) plen 13
    bdaddr AA:BB:CC:DD:EE:FF ptype 0xcc18 rswitch 0x01 clkoffset 0x2306 (valid)
    Packet type: DM1 DM3 DM5 DH1 DH3 DH5
> HCI Event: Command Status (0x0f) plen 4
    Create Connection (0x01|0x0005) status 0x00 ncmd 1
> HCI Event: Inquiry Complete (0x01) plen 1
    status 0x00
< HCI Command: Remote Name Request (0x01|0x0019) plen 10
    bdaddr AA:BB:CC:DD:EE:FF mode 1 clkoffset 0x2306
> HCI Event: Command Status (0x0f) plen 4
    Remote Name Request (0x01|0x0019) status 0x0c ncmd 1
    Error: Command Disallowed
> HCI Event: Connect Complete (0x03) plen 11
    status 0x00 handle 50 bdaddr 00:0D:FD:47:53:B2 type ACL encrypt 0x00
< HCI Command: Read Remote Supported Features (0x01|0x001b) plen 2
    handle 50
> HCI Event: Command Status (0x0f) plen 4
    Read Remote Supported Features (0x01|0x001b) status 0x00 ncmd 1
> HCI Event: Max Slots Change (0x1b) plen 3
    handle 50 slots 5
> HCI Event: Read Remote Supported Features (0x0b) plen 11
    status 0x00 handle 50
    Features: 0xff 0xff 0x8f 0xfe 0x9b 0xff 0x59 0x83
< HCI Command: Read Remote Extended Features (0x01|0x001c) plen 3
    handle 50 page 1
> HCI Event: Command Status (0x0f) plen 4
    Read Remote Extended Features (0x01|0x001c) status 0x00 ncmd 1
> HCI Event: Read Remote Extended Features (0x23) plen 13
    status 0x00 handle 50 page 1 max 1
    Features: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00

This patch sends mgmt_connected in remote name command status only if
conn->state is BT_CONFIG

Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-07 15:43:51 +02:00
David S. Miller
1f5623106f Merge tag 'master-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-11-06

Please pull this batch of fixes intended for the 3.18 stream...

For the mac80211 bits, Johannes says:

"This contains another small set of fixes for 3.18, these are all
over the place and most of the bugs are old, one even dates back
to the original mac80211 we merged into the kernel."

For the iwlwifi bits, Emmanuel says:

"I fix here two issues that are related to the firmware
loading flow. A user reported that he couldn't load the
driver because the rfkill line was pulled up while we
were running the calibrations. This was happening while
booting the system: systemd was restoring the "disable
wifi settings" and that raised an RFKILL interrupt during
the calibration. Our driver didn't handle that properly
and this is now fixed."

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 22:15:20 -05:00
David S. Miller
4e84b496fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-06 22:01:18 -05:00
David S. Miller
6b798d70d0 Merge branch 'net_next_ovs' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch
Pravin B Shelar says:

====================
Open vSwitch

First two patches are related to OVS MPLS support. Rest of patches
are mostly refactoring and minor improvements to openvswitch.

v1-v2:
 - Fix conflicts due to "gue: Remote checksum offload"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 16:33:13 -05:00
Martin Townsend
56b2c3eea3 6lowpan: move skb_free from error paths in decompression
Currently we ensure that the skb is freed on every error path in IPHC
decompression which makes it easy to introduce skb leaks.  By centralising
the skb_free into the receive function it makes future decompression routines
easier to maintain.  It does come at the expense of ensuring that the skb
passed into the decompression routine must not be copied.

Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-06 22:09:48 +01:00
Joe Perches
4508349777 net: esp: Convert NETDEBUG to pr_info
Commit 64ce207306 ("[NET]: Make NETDEBUG pure printk wrappers")
originally had these NETDEBUG printks as always emitting.

Commit a2a316fd06 ("[NET]: Replace CONFIG_NET_DEBUG with sysctl")
added a net_msg_warn sysctl to these NETDEBUG uses.

Convert these NETDEBUG uses to normal pr_info calls.

This changes the output prefix from "ESP: " to include
"IPSec: " for the ipv4 case and "IPv6: " for the ipv6 case.

These output lines are now like the other messages in the files.

Other miscellanea:

Neaten the arithmetic spacing to be consistent with other
arithmetic spacing in the files.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 15:11:10 -05:00
Joe Perches
cbffccc970 net; ipv[46] - Remove 2 unnecessary NETDEBUG OOM messages
These messages aren't useful as there's a generic dump_stack()
on OOM.

Neaten the comment and if test above the OOM by separating the
assign in if into an allocation then if test.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 15:11:10 -05:00
Andrew Lunn
b31f65fb43 net: dsa: slave: Fix autoneg for phys on switch MDIO bus
When the ports phys are connected to the switches internal MDIO bus,
we need to connect the phy to the slave netdev, otherwise
auto-negotiation etc, does not work.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 15:06:28 -05:00
Jiri Pirko
0c6965dd31 sched: fix act file names in header comment
Fixes: 4bba3925 ("[PKT_SCHED]: Prefix tc actions with act_")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 15:04:41 -05:00
Steffen Klassert
ea3dc9601b ip6_tunnel: Add support for wildcard tunnel endpoints.
This patch adds support for tunnels with local or
remote wildcard endpoints. With this we get a
NBMA tunnel mode like we have it for ipv4 and
sit tunnels.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 14:19:20 -05:00
Steffen Klassert
d50051407f ipv6: Allow sending packets through tunnels with wildcard endpoints
Currently we need the IP6_TNL_F_CAP_XMIT capabiltiy to transmit
packets through an ipv6 tunnel. This capability is set when the
tunnel gets configured, based on the tunnel endpoint addresses.

On tunnels with wildcard tunnel endpoints, we need to do the
capabiltiy checking on a per packet basis like it is done in
the receive path.

This patch extends ip6_tnl_xmit_ctl() to take local and remote
addresses as parameters to allow for per packet capabiltiy
checking.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 14:19:19 -05:00
Kuba Pawlak
9645c76c7c Bluetooth: Sort switch cases by opcode's numeric value
Opcodes in switch/case in hci_cmd_status_evt are not sorted
by value. This patch restores proper ordering.

Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-06 19:38:42 +01:00
Kuba Pawlak
50fc85f1b0 Bluetooth: Clear role switch pending flag
If role switch was rejected by the controller and HCI Event: Command Status
returned with status "Command Disallowed" (0x0C) the flag
HCI_CONN_RSWITCH_PEND remains set. No further role switches are
possible as this flag prevents us from sending any new HCI Switch Role
requests and the only way to clear it is to receive a valid
HCI Event Switch Role.

This patch clears the flag if command was rejected.

2013-01-01 00:03:44.209913 < HCI Command: Switch Role (0x02|0x000b) plen 7
    bdaddr BC:C6:DB:C4:6F:79 role 0x00
    Role: Master
2013-01-01 00:03:44.210867 > HCI Event: Command Status (0x0f) plen 4
    Switch Role (0x02|0x000b) status 0x0c ncmd 1
    Error: Command Disallowed

Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-06 19:38:42 +01:00
Ronald Wahl
4f031fa9f1 mac80211: Fix regression that triggers a kernel BUG with CCMP
Commit 7ec7c4a9a6 (mac80211: port CCMP to
cryptoapi's CCM driver) introduced a regression when decrypting empty
packets (data_len == 0). This will lead to backtraces like:

(scatterwalk_start) from [<c01312f4>] (scatterwalk_map_and_copy+0x2c/0xa8)
(scatterwalk_map_and_copy) from [<c013a5a0>] (crypto_ccm_decrypt+0x7c/0x25c)
(crypto_ccm_decrypt) from [<c032886c>] (ieee80211_aes_ccm_decrypt+0x160/0x170)
(ieee80211_aes_ccm_decrypt) from [<c031c628>] (ieee80211_crypto_ccmp_decrypt+0x1ac/0x238)
(ieee80211_crypto_ccmp_decrypt) from [<c032ef28>] (ieee80211_rx_handlers+0x870/0x1d24)
(ieee80211_rx_handlers) from [<c0330c7c>] (ieee80211_prepare_and_rx_handle+0x8a0/0x91c)
(ieee80211_prepare_and_rx_handle) from [<c0331260>] (ieee80211_rx+0x568/0x730)
(ieee80211_rx) from [<c01d3054>] (__carl9170_rx+0x94c/0xa20)
(__carl9170_rx) from [<c01d3324>] (carl9170_rx_stream+0x1fc/0x320)
(carl9170_rx_stream) from [<c01cbccc>] (carl9170_usb_tasklet+0x80/0xc8)
(carl9170_usb_tasklet) from [<c00199dc>] (tasklet_hi_action+0x88/0xcc)
(tasklet_hi_action) from [<c00193c8>] (__do_softirq+0xcc/0x200)
(__do_softirq) from [<c0019734>] (irq_exit+0x80/0xe0)
(irq_exit) from [<c0009c10>] (handle_IRQ+0x64/0x80)
(handle_IRQ) from [<c000c3a0>] (__irq_svc+0x40/0x4c)
(__irq_svc) from [<c0009d44>] (arch_cpu_idle+0x2c/0x34)

Such packets can appear for example when using the carl9170 wireless driver
because hardware sometimes generates garbage when the internal FIFO overruns.

This patch adds an additional length check.

Cc: stable@vger.kernel.org
Fixes: 7ec7c4a9a6 ("mac80211: port CCMP to cryptoapi's CCM driver")
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-06 12:42:22 +01:00
Pravin B Shelar
a85311bf1f openvswitch: Avoid NULL mask check while building mask
OVS does mask validation even if it does not need to convert
netlink mask attributes to mask structure.  ovs_nla_get_match()
caller can pass NULL mask structure pointer if the caller does
not need mask.  Therefore NULL check is required in SW_FLOW_KEY*
macros.  Following patch does not convert mask netlink attributes
if mask pointer is NULL, so we do not need these checks in
SW_FLOW_KEY* macro.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-11-05 23:52:35 -08:00
Pravin B Shelar
2fdb957d63 openvswitch: Refactor action alloc and copy api.
There are two separate API to allocate and copy actions list. Anytime
OVS needs to copy action list, it needs to call both functions.
Following patch moves action allocation to copy function to avoid
code duplication.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-11-05 23:52:35 -08:00
Joe Stringer
41af73e9c1 openvswitch: Move key_attr_size() to flow_netlink.h.
flow-netlink has netlink related code.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-05 23:52:35 -08:00
Lorand Jakab
d98612b8c1 openvswitch: Remove flow member from struct ovs_skb_cb
The 'flow' memeber was chosen for removal because it's only used
in ovs_execute_actions() we can pass it as argument to this
function.

Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-05 23:52:35 -08:00
Chunhe Li
e1f9c356d2 openvswitch: Drop packets when interdev is not up
If the internal device is not up, it should drop received
packets. Sometimes it receive the broadcast or multicast
packets, and the ip protocol stack will casue more cpu
usage wasted.

Signed-off-by: Chunhe Li <lichunhe@huawei.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-05 23:52:35 -08:00
Andy Zhou
cc3a5ae6f2 openvswitch: Refactor get_dp() function into multiple access APIs.
Avoid recursive read_rcu_lock() by using the lighter weight
get_dp_rcu() API. Add proper locking assertions to get_dp().

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-05 23:52:34 -08:00
Joe Stringer
ca7105f278 openvswitch: Refactor ovs_flow_cmd_fill_info().
Split up ovs_flow_cmd_fill_info() to make it easier to cache parts of a
dump reply. This will be used to streamline flow_dump in a future patch.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-05 23:52:34 -08:00
Andy Zhou
738967b8bf openvswitch: refactor do_output() to move NULL check out of fast path
skb_clone() NULL check is implemented in do_output(), as past of the
common (fast) path. Refactoring so that NULL check is done in the
slow path, immediately after skb_clone() is called.

Besides optimization, this change also improves code readability by
making the skb_clone() NULL check consistent within OVS datapath
module.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-05 23:52:34 -08:00
Jesse Gross
426cda5cc1 openvswitch: Additional logging for -EINVAL on flow setups.
There are many possible ways that a flow can be invalid so we've
added logging for most of them. This adds logs for the remaining
possible cases so there isn't any ambiguity while debugging.

CC: Federico Iezzi <fiezzi@enter.it>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-05 23:52:34 -08:00
Joe Stringer
1b760fb9a8 openvswitch: Remove redundant tcp_flags code.
These two cases used to be treated differently for IPv4/IPv6,
but they are now identical.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-05 23:52:34 -08:00
Pravin B Shelar
9b996e544a openvswitch: Move table destroy to dp-rcu callback.
Ths simplifies flow-table-destroy API. No need to pass explicit
parameter about context.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
2014-11-05 23:52:34 -08:00
Simon Horman
25cd9ba0ab openvswitch: Add basic MPLS support to kernel
Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.

Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.

Cc: Ravi K <rkerur@gmail.com>
Cc: Leo Alterman <lalterman@nicira.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-05 23:52:33 -08:00
Pravin B Shelar
59b93b41e7 net: Remove MPLS GSO feature.
Device can export MPLS GSO support in dev->mpls_features same way
it export vlan features in dev->vlan_features. So it is safe to
remove NETIF_F_GSO_MPLS redundant flag.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-05 23:52:33 -08:00
Tom Herbert
e1b2cb6550 fou: Fix typo in returning flags in netlink
When filling netlink info, dport is being returned as flags. Fix
instances to return correct value.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 22:18:20 -05:00
Daniel Borkmann
4c672e4b42 ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():

skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:
 <IRQ>
 [<ffffffff80578226>] ? skb_put+0x3a/0x3b
 [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
 [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
 [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
 [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
 [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
 [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
 [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
 [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
 [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
 [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70

mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.

However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.

The IGMP case doesn't have this issue as commit 57e1ab6ead
("igmp: refine skb allocations") stores the allocation size in
the cb[].

Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().

Reported-by: Wei Liu <lw1a2.jing@gmail.com>
Fixes: 72e09ad107 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David L Stevens <david.stevens@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 22:12:30 -05:00
Joe Perches
1744bea1fa net: Convert SEQ_START_TOKEN/seq_printf to seq_puts
Using a single fixed string is smaller code size than using
a format and many string arguments.

Reduces overall code size a little.

$ size net/ipv4/igmp.o* net/ipv6/mcast.o* net/ipv6/ip6_flowlabel.o*
   text	   data	    bss	    dec	    hex	filename
  34269	   7012	  14824	  56105	   db29	net/ipv4/igmp.o.new
  34315	   7012	  14824	  56151	   db57	net/ipv4/igmp.o.old
  30078	   7869	  13200	  51147	   c7cb	net/ipv6/mcast.o.new
  30105	   7869	  13200	  51174	   c7e6	net/ipv6/mcast.o.old
  11434	   3748	   8580	  23762	   5cd2	net/ipv6/ip6_flowlabel.o.new
  11491	   3748	   8580	  23819	   5d0b	net/ipv6/ip6_flowlabel.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 22:04:55 -05:00
Marcelo Leitner
1f37bf87aa tcp: zero retrans_stamp if all retrans were acked
Ueki Kohei reported that when we are using NewReno with connections that
have a very low traffic, we may timeout the connection too early if a
second loss occurs after the first one was successfully acked but no
data was transfered later. Below is his description of it:

When SACK is disabled, and a socket suffers multiple separate TCP
retransmissions, that socket's ETIMEDOUT value is calculated from the
time of the *first* retransmission instead of the *latest*
retransmission.

This happens because the tcp_sock's retrans_stamp is set once then never
cleared.

Take the following connection:

                      Linux                    remote-machine
                        |                           |
         send#1---->(*1)|--------> data#1 --------->|
                  |     |                           |
                 RTO    :                           :
                  |     |                           |
                 ---(*2)|----> data#1(retrans) ---->|
                  | (*3)|<---------- ACK <----------|
                  |     |                           |
                  |     :                           :
                  |     :                           :
                  |     :                           :
                16 minutes (or more)                :
                  |     :                           :
                  |     :                           :
                  |     :                           :
                  |     |                           |
         send#2---->(*4)|--------> data#2 --------->|
                  |     |                           |
                 RTO    :                           :
                  |     |                           |
                 ---(*5)|----> data#2(retrans) ---->|
                  |     |                           |
                  |     |                           |
                RTO*2   :                           :
                  |     |                           |
                  |     |                           |
      ETIMEDOUT<----(*6)|                           |

(*1) One data packet sent.
(*2) Because no ACK packet is received, the packet is retransmitted.
(*3) The ACK packet is received. The transmitted packet is acknowledged.

At this point the first "retransmission event" has passed and been
recovered from. Any future retransmission is a completely new "event".

(*4) After 16 minutes (to correspond with retries2=15), a new data
packet is sent. Note: No data is transmitted between (*3) and (*4).

The socket's timeout SHOULD be calculated from this point in time, but
instead it's calculated from the prior "event" 16 minutes ago.

(*5) Because no ACK packet is received, the packet is retransmitted.
(*6) At the time of the 2nd retransmission, the socket returns
ETIMEDOUT.

Therefore, now we clear retrans_stamp as soon as all data during the
loss window is fully acked.

Reported-by: Ueki Kohei
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:59:49 -05:00
David S. Miller
51f3d02b98 net: Add and use skb_copy_datagram_msg() helper.
This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".

When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.

Having a helper like this means there will be less places to touch
during that transformation.

Based upon descriptions and patch from Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:46:40 -05:00
Tom Herbert
a8d31c128b gue: Receive side of remote checksum offload
Add processing of the remote checksum offload option in both the normal
path as well as the GRO path. The implements patching the affected
checksum to derive the offloaded checksum.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:04 -05:00
Tom Herbert
b17f709a24 gue: TX support for using remote checksum offload option
Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure
remote checksum offload on an IP tunnel. Add logic in gue_build_header
to insert remote checksum offload option.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert
e585f23636 udp: Changes to udp_offload to support remote checksum offload
Add a new GSO type, SKB_GSO_TUNNEL_REMCSUM, which indicates remote
checksum offload being done (in this case inner checksum must not
be offloaded to the NIC).

Added logic in __skb_udp_tunnel_segment to handle remote checksum
offload case.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert
5024c33ac3 gue: Add infrastructure for flags and options
Add functions and basic definitions for processing standard flags,
private flags, and control messages. This includes definitions
to compute length of optional fields corresponding to a set of flags.
Flag validation is in validate_gue_flags function. This checks for
unknown flags, and that length of optional fields is <= length
in guehdr hlen.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert
4bcb877d25 udp: Offload outer UDP tunnel csum if available
In __skb_udp_tunnel_segment if outer UDP checksums are enabled and
ip_summed is not already CHECKSUM_PARTIAL, set up checksum offload
if device features allow it.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert
63487babf0 net: Move fou_build_header into fou.c and refactor
Move fou_build_header out of ip_tunnel.c and into fou.c splitting
it up into fou_build_header, gue_build_header, and fou_build_udp.
This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap
to call fou_build_header or gue_build_header based on the tunnel
encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen
functions which are called by ip_encap_hlen. New net/fou.h has
prototypes and defines for this.

Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
can use FOU/GUE and fou module is also selected.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:02 -05:00
Alexander Aring
0916c02205 mac802154: fix typo promisuous to promiscuous
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:06 +01:00
Alexander Aring
e57a894684 mac802154: use IEEE802154_EXTENDED_ADDR_LEN
This patch removes the af_ieee802154 defines and use the
IEEE802154_EXTENDED_ADDR_LEN. We should do this everywhere in the
802.15.4 subsystem because af_ieee802154 should be normally an uapi
header.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:06 +01:00
Alexander Aring
dee56d1477 mac802154: add support for perm_extended_addr
This patch adding support for a perm extended address. This is useful
when a device supports an eeprom with a programmed static extended address.
If a device doesn't support such eeprom or serial registers then the
driver should generate a random extended address.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:05 +01:00
Alexander Aring
705cbbbe9c mac802154: cleanup ieee802154_netdev_to_extended_addr
This patch cleanups the ieee802154_be64_to_le64 to have a similar
function like ieee802154_le64_to_be64 only with switched source and
destionation types.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:05 +01:00
Alexander Aring
7c118c1a86 mac802154: add ieee802154_vif struct
This patch adds an ieee802154_vif similar like the ieee80211_vif which
holds the interface type and maybe further more attributes like the
ieee80211_vif structure.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:04 +01:00
Alexander Aring
e4962a1443 mac802154: add default interface registration
This patch adds a default interface registration for a wpan interface
type. Currently the 802.15.4 subsystem need to call userspace tools to
add an interface. This patch is like mac80211 handling for registration
a station interface type by default.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:04 +01:00
Alexander Aring
bd28a11f25 ieee802154: remove mlme get_phy callback
This patch removes the get_phy callback from mlme ops structure. Instead
we doing a dereference via ieee802154_ptr dev pointer. For backwards
compatibility we need to run get_device after dereference wpan_phy via
ieee802154_ptr.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:04 +01:00
Alexander Aring
d5ae67bacd ieee802154: rework interface registration
This patch meld mac802154_netdev_register into ieee802154_if_add
function. Also we have now only one alloc_netdev call with one interface
setup routine "ieee802154_if_setup" instead two different one for each
interface type. This patch checks via runtime the interface type and do
different handling now. Additional we add the wpan_dev struct in
ieee802154_sub_if_data and set the new ieee802154_ptr while netdev
registration. This behaviour is very similar the mac80211 netdev
registration functionality.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:04 +01:00
Alexander Aring
12cb56c237 mac802154: move dev_hold out of ieee802154_if_add
This patch moves the dev_hold call inside of nl-phy ieee802154_add_iface
function. The ieee802154_add_iface is the only one function which use the
ieee802154_if_add function and contains the corresponding dev_put call.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:03 +01:00
Alexander Aring
986a8abfc5 mac802154: move interface add handling in iface
This patch moves and renames the mac802154_add_iface and
mac802154_netdev_register functions into iface.c. The function
mac802154_add_iface is renamed to ieee802154_if_add which is a similar naming
convention like mac80211.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:03 +01:00
Alexander Aring
b210b18747 mac802154: move interface del handling in iface
This patch moves and rename the mac802154_del_iface function into
iface.c and rename the function to ieee802154_if_remove which is a similar
naming convention like mac80211.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:03 +01:00
Alexander Aring
9f3295b9ea ieee802154: remove nl802154 unused functions
The include/net/nl802154.h file contains a lot of prototypes which are
not used inside of ieee802154 subsystem. This patch removes this file
and make the only one used prototype "ieee802154_nl_start_confirm" as
static declaration in ieee802154/nl-mac.c

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:03 +01:00
Alexander Aring
53f9ee61b4 ieee802154: rework wpan_phy index assignment
This patch reworks the wpan_phy index incrementation. It's now similar
like wireless wiphy index incrementation. We move the wpan_phy index
attribute inside of cfg802154_registered_device and use atomic
operations instead locking mechanism via wpan_phy_mutex.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:03 +01:00
Jesse Gross
d3ca9eafc0 geneve: Unregister pernet subsys on module unload.
The pernet ops aren't ever unregistered, which causes a memory
leak and an OOPs if the module is ever reinserted.

Fixes: 0b5e8b8eea ("net: Add Geneve tunneling protocol driver")
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 15:00:51 -05:00
Jesse Gross
45cac46e51 geneve: Set GSO type on transmit.
Geneve does not currently set the inner protocol type when
transmitting packets. This causes GSO segmentation to fail on NICs
that do not support Geneve offloading.

CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 15:00:51 -05:00
Steven Rostedt (Red Hat)
e71456ae98 netfilter: Remove checks of seq_printf() return values
The return value of seq_printf() is soon to be removed. Remove the
checks from seq_printf() in favor of seq_has_overflowed().

Link: http://lkml.kernel.org/r/20141104142236.GA10239@salvia
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05 14:11:02 -05:00
Joe Perches
824f1fbee7 netfilter: Convert print_tuple functions to return void
Since adding a new function to seq_file (seq_has_overflowed())
there isn't any value for functions called from seq_show to
return anything.   Remove the int returns of the various
print_tuple/<foo>_print_tuple functions.

Link: http://lkml.kernel.org/p/f2e8cf8df433a197daa62cbaf124c900c708edc7.1412031505.git.joe@perches.com

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05 14:10:33 -05:00
Steven Rostedt (Red Hat)
37246a5837 netfilter: Remove return values for print_conntrack callbacks
The seq_printf() and friends are having their return values removed.
The print_conntrack() returns the result of seq_printf(), which is
meaningless when seq_printf() returns void. Might as well remove the
return values of print_conntrack() as well.

Link: http://lkml.kernel.org/r/20141029220107.465008329@goodmis.org
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05 14:09:47 -05:00
Fabian Frederick
6cf1093e58 udp: remove blank line between set and test
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 17:12:10 -05:00
Florent Fourcot
869ba988fe ipv6: trivial, add bracket for the if block
The "else" block is on several lines and use bracket.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 17:10:19 -05:00
Fabian Frederick
05006e8c59 esp4: remove assignment in if condition
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 16:57:49 -05:00
John W. Linville
bf515fb11a This relatively large batch of changes is comprised of the
following:
  * large mac80211-hwsim changes from Ben, Jukka and a bit myself
  * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
    University in Prague and Volkswagen Group Research
  * minstrel VHT work from Karl
  * more CSA work from Luca
  * WMM admission control support in mac80211 (myself)
  * various smaller fixes, spelling corrections, and minor API additions
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUWMs/AAoJEDBSmw7B7bqrmBQQAIbfAe7wH1WifRtOnhw3zWQQ
 K36+Edf3HlQ+EIkSs63QousRj2e7pGDOyhzMWLaqsmeTLteUtlGbr7qwiJO1QZdf
 Ml2V5O2s+b8hUIClDBVQF2L6+GGUmRUdQqvDDhkN1guoxD/Nk8cNtsRkSdiXWJWy
 R48NzvYDflBhc8uqPtR8jDb10eM3c00YP9HB+w9hYAfizD+FRue7UNp4MQIqwp9V
 HdKRT6L2n/6QA+Mzse0rMDes5qI7nIUNgj+hjqgJSnhITPMgGR5j/pitnVHrr81M
 ngOipBFG3svsQrwZh8nM4Llp0cM4Gs+GlgCieu9+TJpr2sY00Z3kYcp0pxtDoSxz
 Wblqz9n/bnW9mrkEfl12XqwwT5vguchwHoZ9cXhejDxSawWXoTRx20uW4ahO8ArA
 kWwwjTBVsQ5WMCtOBiqggzNKghwCc2ILmcZnjGdg9aNXcWsmQ4vyeCfG2QxBz/UB
 Grv/f9NSy6mzKQ34yv+lyR7rFZ8XcT03EVAnZSYz8X0ZZGxwtFupRp1RrBh1KPtD
 TJoe6Q71FfHKYRJ2xgygYkQFo+r9d0BKBeerq+Vu2hBeaqyi4aUwSj7d1sUaaq6N
 tL8fmAUqFjVOOUFeH1g07Xke5QD+yrEC7sJKkeRMfcRGB+dEa+2m3I5p4WDz9bWM
 AEvFSsYr/I9KI4d1huXD
 =6GIj
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-john-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg <johannes@sipsolutions.net> says:

"This relatively large batch of changes is comprised of the
following:
 * large mac80211-hwsim changes from Ben, Jukka and a bit myself
 * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
   University in Prague and Volkswagen Group Research
 * minstrel VHT work from Karl
 * more CSA work from Luca
 * WMM admission control support in mac80211 (myself)
 * various smaller fixes, spelling corrections, and minor API additions"

Conflicts:
	drivers/net/wireless/ath/wil6210/cfg80211.c

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-04 16:18:12 -05:00
Florian Westphal
f7b3bec6f5 net: allow setting ecn via routing table
This patch allows to set ECN on a per-route basis in case the sysctl
tcp_ecn is not set to 1. In other words, when ECN is set for specific
routes, it provides a tcp_ecn=1 behaviour for that route while the rest
of the stack acts according to the global settings.

One can use 'ip route change dev $dev $net features ecn' to toggle this.

Having a more fine-grained per-route setting can be beneficial for various
reasons, for example, 1) within data centers, or 2) local ISPs may deploy
ECN support for their own video/streaming services [1], etc.

There was a recent measurement study/paper [2] which scanned the Alexa's
publicly available top million websites list from a vantage point in US,
Europe and Asia:

Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely
blamed to commit 255cac91c3 ("tcp: extend ECN sysctl to allow server-side
only ECN") ;)); the break in connectivity on-path was found is about
1 in 10,000 cases. Timeouts rather than receiving back RSTs were much
more common in the negotiation phase (and mostly seen in the Alexa
middle band, ranks around 50k-150k): from 12-thousand hosts on which
there _may_ be ECN-linked connection failures, only 79 failed with RST
when _not_ failing with RST when ECN is not requested.

It's unclear though, how much equipment in the wild actually marks CE
when buffers start to fill up.

We thought about a fallback to non-ECN for retransmitted SYNs as another
global option (which could perhaps one day be made default), but as Eric
points out, there's much more work needed to detect broken middleboxes.

Two examples Eric mentioned are buggy firewalls that accept only a single
SYN per flow, and middleboxes that successfully let an ECN flow establish,
but later mark CE for all packets (so cwnd converges to 1).

 [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
 [2] http://ecn.ethz.ch/

Joint work with Daniel Borkmann.

Reference: http://thread.gmane.org/gmane.linux.network/335797
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 16:06:09 -05:00
Florian Westphal
f1673381b1 syncookies: split cookie_check_timestamp() into two functions
The function cookie_check_timestamp(), both called from IPv4/6 context,
is being used to decode the echoed timestamp from the SYN/ACK into TCP
options used for follow-up communication with the peer.

We can remove ECN handling from that function, split it into a separate
one, and simply rename the original function into cookie_decode_options().
cookie_decode_options() just fills in tcp_option struct based on the
echoed timestamp received from the peer. Anything that fails in this
function will actually discard the request socket.

While this is the natural place for decoding options such as ECN which
commit 172d69e63c ("syncookies: add support for ECN") added, we argue
that in particular for ECN handling, it can be checked at a later point
in time as the request sock would actually not need to be dropped from
this, but just ECN support turned off.

Therefore, we split this functionality into cookie_ecn_ok(), which tells
us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled.

This prepares for per-route ECN support: just looking at the tcp_ecn sysctl
won't be enough anymore at that point; if the timestamp indicates ECN
and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric.

This would mean adding a route lookup to cookie_check_timestamp(), which
we definitely want to avoid. As we already do a route lookup at a later
point in cookie_{v4,v6}_check(), we can simply make use of that as well
for the new cookie_ecn_ok() function w/o any additional cost.

Joint work with Daniel Borkmann.

Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 16:06:09 -05:00
Florian Westphal
274e2da0ec syncookies: avoid magic values and document which-bit-is-what-option
Was a bit more difficult to read than needed due to magic shifts;
add defines and document the used encoding scheme.

Joint work with Daniel Borkmann.

Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 16:06:08 -05:00
Mika Westerberg
72daceb9a1 net: rfkill: gpio: Add default GPIO driver mappings for ACPI
The driver uses devm_gpiod_get_index(..., index) so that the index refers
directly to the GpioIo resource under the ACPI device. The problem with
this is that if the ordering changes we get wrong GPIOs.

With ACPI 5.1 _DSD we can now use names instead to reference GPIOs
analogous to Device Tree. However, we still have systems out there that do
not provide _DSD at all. These systems must be supported as well.

Luckily we now have acpi_dev_add_driver_gpios() that can be used to provide
mappings for systems where _DSD is not provided and still take advantage of
_DSD if it exists.

This patch changes the driver to create default GPIO mappings if we are
running on ACPI system.

While there we can drop the indices completely and use devm_gpiod_get()
with name instead.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: John W. Linville <linville@tuxdriver.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-04 21:58:24 +01:00
John W. Linville
0c9a67c8f1 This contains another small set of fixes for 3.18, these are all
over the place and most of the bugs are old, one even dates back
 to the original mac80211 we merged into the kernel.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUWJSYAAoJEDBSmw7B7bqr17IP/3RFbqI4S/ceZzrVNLtvGDUd
 MIlkGhngDdhpFhSxTdOH4opFM/j9bkwndk0F35d4r94mCeB5eJQKrtUfeon/7aft
 AKaRa3CNsEVQgCempCYOKGwlJZQQ86IL6IvU4CW5CTNHENUBLA83KHqX+6Aoumhm
 mdJxhSzmB53Qn1bteIJXyJmOjgxQvBZggBIF/25Xnosb3FBH3hvPsH0qbIKZaicy
 PlD5JWk9UseySjLNwk1/jriQ4koF5Dy/BVRyQ/0fRYswdmS3o2EiC4JOWjsOfIUi
 NE9Ax+DAKvHHGYNcsX/hXsPJTc6fYgq3INEZBvnK04GHVFVGLq1WoEIfOeLugK7o
 j7OIEJbkKAQjJSnEpB9Y6YHO/jPXEokJjUNT7VuZJqLElp4Hd8K9jnhKD9jkZBA6
 TGjNO5NJqgGdlxnq3nu4+XFh9StAam6J1Ey1TWarc6Kxd8Gtg3Ymkj3cO46rHcQU
 JX3i3RGlYqibEQ0NVtZ4EfnGjtcGx0Vbf+yAc9ZpWzKFvX9YKS1wuOd5i/eZI8bb
 hxMjHFwmViV3Ifk9GjBNKioXkCpEfk9Q3pKzRllHQn56ueTu1mBvAfIe93PRm9kR
 y/giIZvHEhs8VH2PHVuHzT16YMVnNfQniAi+BK73QWC3zAhj1ss3xN33+Q8FfpMM
 xw/prlY9IAH2A9zis1Vz
 =8uwd
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-john-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg <johannes@sipsolutions.net> says:

"This contains another small set of fixes for 3.18, these are all
over the place and most of the bugs are old, one even dates back
to the original mac80211 we merged into the kernel."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-04 15:56:33 -05:00
Fabian Frederick
436f7c2068 igmp: remove camel case definitions
use standard uppercase for definitions

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:13:18 -05:00
Fabian Frederick
c18450a52a udp: remove else after return
else is unnecessary after return 0 in __udp4_lib_rcv()

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:13:18 -05:00
Fabian Frederick
aa1f731e52 inet: frags: remove inline on static in c file
remove __inline__ / inline and let compiler decide what to do
with static functions

Inspired-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:13:18 -05:00
Fabian Frederick
0d3979b9c7 ipv4: remove 0/NULL assignment on static
static values are automatically initialized to 0

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:09:52 -05:00
Fabian Frederick
c9f503b006 ipv4: use seq_puts instead of seq_printf where possible
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:09:52 -05:00
Fabian Frederick
b92022f3e5 tcp: spelling s/plugable/pluggable
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:09:52 -05:00
Fabian Frederick
988b13438c cipso: remove NULL assignment on static
Also add blank line after structure declarations

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:09:52 -05:00
Fabian Frederick
4c787b1626 ipv4: include linux/bug.h instead of asm/bug.h
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:09:20 -05:00
Fabian Frederick
4973404f81 cipso: kerneldoc warning fix
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 15:09:20 -05:00
Pablo Neira Ayuso
c5a589cc30 netfilter: nf_log: fix sparse warning in nf_logger_find_get()
net/netfilter/nf_log.c:157:16: warning: incorrect type in assignment (different address spaces)
net/netfilter/nf_log.c:157:16:    expected struct nf_logger *logger
net/netfilter/nf_log.c:157:16:    got struct nf_logger [noderef] <asn:4>*<noident>

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-04 17:56:31 +01:00
Simon Vincent
980edbd503 6lowpan: fix udp header compression when using raw sockets
If you use RAW sockets the transport header offset is not set by the
ipv6 stack so when we get to the udp header compression it does not
compress the right part of the packet.

This patch adds a check for this scenario and sets the transport
header offset.

Signed-off-by: Simon Vincent <simon.vincent@xsilon.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-04 17:31:01 +01:00
Henning Rogge
1ef4c85049 cfg80211: fix nl80211 cmd id in nl80211_send_mpath()
Netlink command for nl80211_send_mpath() should be NL80211_CMD_NEW_MPATH.

Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 16:37:22 +01:00
Eliad Peller
cf2c92d840 mac80211: replace restart_complete() with reconfig_complete()
Drivers might want to know also when mac80211 has
completed reconfiguring after resume (e.g. in order
to know when frames can be passed to mac80211).

Rename restart_complete() to a more-generic reconfig_complete(),
and add a new enum to indicate the reconfiguration type.

Update the current users with the new prototype.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 13:49:00 +01:00
Andrei Otcheretianski
13a8098af9 mac80211: increase U-APSD max service period length
Deliver up to 128 frames during service period instead of 8 if
unlimited is specified by the client during association.
8 was just an arbitrary value; so is 128 since unlimited can
be any number.

However for large traffic bursts, increasing this value looks
reasonable. Also, it seems that a few certification tests
expect more frames to be delivered during SP.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 13:18:22 +01:00
Johannes Berg
8ed2874715 mac80211: handle RIC data element in reassociation request
When the RIC data element (RDE) is included in the IEs coming
from userspace for an association request, its handling is
currently broken as any IEs that are contained within it would
be split off from it and inserted again after all the IEs that
mac80211 generates (e.g. HT, VHT.)

To fix this, treat the RIC element specially, and stop after
it only when we find something that doesn't actually belong to
it. This assumes userspace is actually correctly building it,
directly after the fast BSS transition IE and before all the
others like extended capabilities.

This leaves as a potential problem the case where userspace is
building the following IEs:

[RDE] [vendor resource description] [vendor non-resource IE]

In this case, we'd erroneously consider all three IEs to be
part of the RIC data together, and not split them between the
two vendor IEs. Unfortunately, it isn't easily possible to
distinguish vendor IEs, so this isn't easy to fix. Luckily,
this case is rare as normally wpa_supplicant will include an
extended capabilities IE in the IEs, and that certainly will
break the two vendor IEs apart correctly.

Reviewed-by: Eliad Peller <eliad@wizery.com>
Reviewed-by: Beni Lev <beni.lev@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 13:18:21 +01:00
Rostislav Lisovy
239281f803 mac80211: 802.11p OCB mode support
This patch adds 802.11p OCB (Outside the Context of a BSS) mode
support.

When communicating in OCB mode a mandatory wildcard BSSID
(48 '1' bits) is used.

The EDCA parameters handling function was changed to support
802.11p specific values.

The insertion of a newly discovered STAs is done in the similar way
as in the IBSS mode -- through the deferred insertion.

The OCB mode uses a periodic 'housekeeping task' for expiration of
disconnected STAs (in the similar manner as in the MESH mode).

New Kconfig option for verbose OCB debugging outputs is added.

Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 13:18:21 +01:00
Rostislav Lisovy
6e0bd6c35b cfg80211: 802.11p OCB mode handling
This patch adds new iface type (NL80211_IFTYPE_OCB) representing
the OCB (Outside the Context of a BSS) mode.
When establishing a connection to the network a cfg80211_join_ocb
function is called (particular nl80211_command is added as well).
A mandatory parameters during the ocb_join operation are 'center
frequency' and 'channel width (5/10 MHz)'.

Changes done in mac80211 are minimal possible required to avoid
many warnings (warning: enumeration value 'NL80211_IFTYPE_OCB'
not handled in switch) during compilation. Full functionality
(where needed) is added in the following patch.

Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 13:18:17 +01:00
Felix Fietkau
5b3dc42b1b mac80211: add support for driver tx power reporting
The configured tx power is often limited by hardware capabilities,
channel settings, antenna configuration, etc.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[fix tracing compilation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 10:15:09 +01:00
Johan Hedberg
2a68c89724 Bluetooth: Fix sparse warnings in RFCOMM
This patch fixes the following sparse warnings in rfcomm/core.c:

net/bluetooth/rfcomm/core.c:391:16: warning: dubious: x | !y
net/bluetooth/rfcomm/core.c:546:24: warning: dubious: x | !y

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-04 08:01:46 +01:00
Peter Zijlstra
ff960a7317 netdev, sched/wait: Fix sleeping inside wait event
rtnl_lock_unregistering*() take rtnl_lock() -- a mutex -- inside a
wait loop. The wait loop relies on current->state to function, but so
does mutex_lock(), nesting them makes for the inner to destroy the
outer state.

Fix this using the new wait_woken() bits.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com>
Cc: stephen hemminger <stephen@networkplumber.org>
Cc: Tom Gundersen <teg@jklm.no>
Cc: Tom Herbert <therbert@google.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20141029173110.GE15602@worktop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:48 +01:00
Peter Zijlstra
eedf7e47da rfcomm, sched/wait: Fix broken wait construct
rfcomm_run() is a tad broken in that is has a nested wait loop. One
cannot rely on p->state for the outer wait because the inner wait will
overwrite it.

Fix this using the new wait_woken() facility.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Alexander Holler <holler@ahsoftware.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Joe Perches <joe@perches.com>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Libor Pechacek <lpechacek@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vignesh Raman <Vignesh_Raman@mentor.com>
Cc: linux-bluetooth@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:47 +01:00
Greg Kroah-Hartman
a8a93c6f99 Merge branch 'platform/remove_owner' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux into driver-core-next
Remove all .owner fields from platform drivers
2014-11-03 19:53:56 -08:00
Linus Torvalds
ce1928da84 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
 "There is a GFP flag fix from Mike Christie, an error code fix from
  Jan, and fixes for two unnecessary allocations (kmalloc and workqueue)
  from Ilya.  All are well tested.

  Ilya has one other fix on the way but it didn't get tested in time"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: eliminate unnecessary allocation in process_one_ticket()
  rbd: Fix error recovery in rbd_obj_read_sync()
  libceph: use memalloc flags for net IO
  rbd: use a single workqueue for all devices
2014-11-03 15:04:26 -08:00
Eric Dumazet
56b174256b net: add rbnode to struct sk_buff
Yaogong replaces TCP out of order receive queue by an RB tree.

As netem already does a private skb->{next/prev/tstamp} union
with a 'struct rb_node', lets do this in a cleaner way.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 16:13:03 -05:00
Steffen Klassert
f03eb128e3 gre6: Move the setting of dev->iflink into the ndo_init functions.
Otherwise it gets overwritten by register_netdev().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 15:42:24 -05:00
Steffen Klassert
ebe084aafb sit: Use ipip6_tunnel_init as the ndo_init function.
ipip6_tunnel_init() sets the dev->iflink via a call to
ipip6_tunnel_bind_dev(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for ipv6 tunnels. Fix this by using ipip6_tunnel_init() as the
ndo_init function. Then ipip6_tunnel_init() is called after
dev->iflink is set to -1 from register_netdevice().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 15:42:24 -05:00
Steffen Klassert
16a0231bf7 vti6: Use vti6_dev_init as the ndo_init function.
vti6_dev_init() sets the dev->iflink via a call to
vti6_link_config(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for vti6 tunnels. Fix this by using vti6_dev_init() as the
ndo_init function. Then vti6_dev_init() is called after
dev->iflink is set to -1 from register_netdevice().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 15:42:24 -05:00
Steffen Klassert
6c6151daaf ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.
ip6_tnl_dev_init() sets the dev->iflink via a call to
ip6_tnl_link_config(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the
ndo_init function. Then ip6_tnl_dev_init() is called after
dev->iflink is set to -1 from register_netdevice().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 15:42:24 -05:00
Eric Dumazet
d75b1ade56 net: less interrupt masking in NAPI
net_rx_action() can mask irqs a single time to transfert sd->poll_list
into a private list, for a very short duration.

Then, napi_complete() can avoid masking irqs again,
and net_rx_action() only needs to mask irq again in slow path.

This patch removes 2 couples of irq mask/unmask per typical NAPI run,
more if multiple napi were triggered.

Note this also allows to give control back to caller (do_softirq())
more often, so that other softirq handlers can be called a bit earlier,
or ksoftirqd can be wakeup earlier under pressure.

This was developed while testing an alternative to RX interrupt
mitigation to reduce latencies while keeping or improving GRO
aggregation on fast NIC.

Idea is to test napi->gro_list at the end of a napi->poll() and
reschedule one NAPI poll, but after servicing a full round of
softirqs (timers, TX, rcu, ...). This will be allowed only if softirq
is currently serviced by idle task or ksoftirqd, and resched not needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 12:25:09 -05:00
Guenter Roeck
c1207c049b netfilter: nft_reject_bridge: Fix powerpc build error
Fix:
net/bridge/netfilter/nft_reject_bridge.c:
In function 'nft_reject_br_send_v6_unreach':
net/bridge/netfilter/nft_reject_bridge.c:240:3:
	error: implicit declaration of function 'csum_ipv6_magic'
   csum_ipv6_magic(&nip6h->saddr, &nip6h->daddr,
   ^
make[3]: *** [net/bridge/netfilter/nft_reject_bridge.o] Error 1

Seen with powerpc:allmodconfig.

Fixes: 523b929d54 ("netfilter: nft_reject_bridge: don't use IP stack to reject traffic")
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 12:12:34 -05:00
Szymon Janc
a736abc1ac Bluetooth: Fix invalid response for 'Start Discovery' command
According to Management Interface API 'Start Discovery' command should
generate a Command Complete event on failure. Currently kernel is
sending Command Status on early errors. This results in userspace
ignoring such event due to invalid size.

bluetoothd[28499]: src/adapter.c:trigger_start_discovery()
bluetoothd[28499]: src/adapter.c:cancel_passive_scanning()
bluetoothd[28499]: src/adapter.c:start_discovery_timeout()
bluetoothd[28499]: src/adapter.c:start_discovery_complete() status 0x0a
bluetoothd[28499]: Wrong size of start discovery return parameters

Reported-by: Jukka Taimisto <jtt@codenomicon.com>
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-03 15:43:05 +02:00
Johannes Berg
b8fff407a1 mac80211: fix use-after-free in defragmentation
Upon receiving the last fragment, all but the first fragment
are freed, but the multicast check for statistics at the end
of the function refers to the current skb (the last fragment)
causing a use-after-free bug.

Since multicast frames cannot be fragmented and we check for
this early in the function, just modify that check to also
do the accounting to fix the issue.

Cc: stable@vger.kernel.org
Reported-by: Yosef Khyal <yosefx.khyal@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-03 14:28:50 +01:00
Marcel Holtmann
40f4938aa6 Bluetooth: Consolidate whitelist debugfs entry into device_list
The debufs entry for the BR/EDR whitelist is confusing since there is
a controller debugfs entry with the name white_list and both are two
different things.

With the BR/EDR whitelist, the actual interface in use is the device
list and thus just include all values from the internal BR/EDR whitelist
in the device_list debugfs entry.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-03 10:13:42 +02:00
dingzhi
f293a5e33e xfrm: add XFRMA_REPLAY_VAL attribute to SA messages
After this commit, the attribute XFRMA_REPLAY_VAL is added when no ESN replay
value is defined. Thus sequence number values are always notified to userspace.

Signed-off-by: dingzhi <zhi.ding@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-11-03 08:54:52 +01:00
Alexander Aring
868ed8e06a ieee802154: remove unnecessary functions
This patch fixes commit c7420c367d
("mac802154: move mac_params functions into mac_cmd"). The mac_params
functions wasn't deleted by this commit.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 21:52:03 +01:00
Alexander Aring
fdd2068ab7 mac802154: cfg: add missing include
Running make C=2 occurs warning:

symbol 'mac802154_config_ops' was not declared. Should it be static?

This patch adds a missing include in cfg.c to solve this warning.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 21:52:02 +01:00
Alexander Aring
c5fbbc4683 ieee802154: sysfs: add missing include
Running make C=2 occurs in warnings:

symbol 'wpan_phy_class' was not declared. Should it be static?
symbol 'wpan_phy_sysfs_init' was not declared. Should it be static?
symbol 'wpan_phy_sysfs_exit' wasnot declared. Should it be static?

This patch adds a missing include "sysfs.h" to solve these warnings.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 21:52:02 +01:00
Linus Torvalds
4cb8c3593b irda: stop calling sk_prot->disconnect() on connection failure
The sk_prot is irda's own set of protocol handlers, so irda should
statically know what that function is anyway, without using an indirect
pointer.  And as it happens, we know *exactly* what that pointer is
statically: it's NULL, because irda doesn't define a disconnect
operation.

So calling that function is doubly wrong, and will just cause an oops.

Reported-by: Martin Lang <mlg.hessigheim@gmail.com>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-02 10:20:26 -08:00
Marcel Holtmann
75e0569f7f Bluetooth: Add hci_reset_dev() for driver triggerd stack reset
Some Bluetooth drivers require to reset the upper stack. To avoid having
all drivers send HCI Hardware Error events, provide a generic function
to wrap the reset functionality.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-02 10:03:45 +02:00
Marcel Holtmann
65efd2bf48 Bluetooth: Introduce BT_BREDR and BT_LE config options
The current kernel options do not make it clear which modules are for
Bluetooth Classic (BR/EDR) and which are for Bluetooth Low Energy (LE).

To make it really clear, introduce BT_BREDR and BT_LE options with
proper dependencies into the different modules. Both new options
default to y to not create a regression with previous kernel config
files.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-02 10:01:53 +02:00
Marcel Holtmann
24dfa34371 Bluetooth: Print error message for HCI_Hardware_Error event
When the HCI_Hardware_Error event is send by the controller or
injected by the driver, then at least print an error message.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-02 09:59:42 +02:00
Marcel Holtmann
8761f9d662 Bluetooth: Check status of command complete for HCI_Reset
When the HCI_Reset command returns, the status needs to be checked. It
is unlikely that HCI_Reset actually fails, but when it fails, it is a
bad idea to reset all values since the controller will have not reset
its values in that case.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-02 09:58:50 +02:00
Alexander Aring
6290671018 ieee802154: 6lowpan: remove set of mac address
Currently the ieee802154 6lowpan interface operates on wpan interfaces
only. Setting the wpan mac address over 6lowpan interface is complex and
maybe we can't never do this. This patch removes the set of mac address
handling in ieee802154 6lowpan interface for now.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:08 +01:00
Alexander Aring
ea7053c1df mac802154: iface: add validation for extended address
This patch use the validation function to check if an extended address
is valid or not while set the extended address.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:08 +01:00
Alexander Aring
f59f419d31 mac802154: move phy settings into netlink receive
All PHY attributes should be directly set to the transceiver after netlink.
MAC attributes should be set by interface up. Currently the macparams
netlink cmd contains mixed attributes of phy and mac settings. This patch
moves all phy settings to the netlink receive function for setting macparams.
This is the only way which doesn't change the userspace API and keep the
deprecated netlink interface alive.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:07 +01:00
Alexander Aring
50c7907501 mac802154: set panid address filter on ifup
This patch moves the setting of hardware panid address filtering
inside of interface up instead doing it it directly inside of netlink
interface. The netlink call which can only be called when netif isn't
running sets only the necessary panid value in sdata. After an
interface up the address filter will be set with this value.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:07 +01:00
Alexander Aring
78b4bad16e mac802154: set short address filter on ifup
This patch moves the setting of hardware short address filtering
inside of interface up instead doing it it directly inside of netlink
interface. The netlink call which can only be called when netif isn't
running sets only the necessary short_addr value in sdata. After an
interface up the address filter will be set with this value.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:07 +01:00
Alexander Aring
776e59de46 mac802154: set extended address filter on ifup
This patch moves the setting of hardware extended address filtering
inside of interface up instead doing it directly inside of netlink interface.
Also we don't need to set the sdata extended attribute in netlink. This
is already done by ndo_set_mac_address of net_device_ops.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:07 +01:00
Alexander Aring
8f499f991c ieee802154: don't allow to change addr while netif_running
This patch changes the actual behaviour for setting address attributes.
We should not change addresses while netif_running is true. Furthermore
when netif_running is running the address attributes becomes read only
and we can remove locking mechanism in receive and transmit hothpaths
of 802.15.4 subsystem.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:07 +01:00
Alexander Aring
4a9a816a4f cfg802154: convert deprecated iface add and del
This patch removes the wpan_phy callbacks for add and del an interface
on a phy. Instead we introduce deprecated cfg802154 callbacks for this.
Furthermore we introduce a new netlink interface nl802154 which use
different callbacks. The deprecated function is to have a backwards
compatibility with the current netlink interface.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:06 +01:00
Alexander Aring
ea4dcd32a4 ieee802154: add helper wpan_phy_to_rdev function
This patch introduce a function to get the cfg802154_registered_device
from a wpan_phy.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:06 +01:00
Alexander Aring
1201cd22fd mac802154: introduce mac802154_config_ops
This patch introduces mac802154_config_ops struct. Like wireless this
struct should be the only one interface between ieee802154 to mac802154
or possible HardMAC drivers.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:06 +01:00
Alexander Aring
a5dd1d72d8 cfg802154: introduce cfg802154_registered_device
This patch introduce the cfg802154_registered_device struct. Like
cfg80211_registered_device in wireless this should contain similar
functionality for cfg802154. This patch should not change any behaviour.
We just adds cfg802154_registered_device as container for wpan_phy struct.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:06 +01:00
Alexander Aring
ff4e65581e ieee802154: remove default channel settings
This patch removes the default channel setting. A channel is always set
and there is no default channel setting according 802.15.4.

Drivers should set the default channel and page in probing routine. This
behaviour is currently a lack of all 802.15.4 drivers.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:06 +01:00
Chan-yeol Park
039fada5cd Bluetooth: Fix hci_sync missing wakeup interrupt
__hci_cmd_sync_ev(), __hci_req_sync() could miss wake_up_interrupt from
hci_req_sync_complete() because hci_cmd_work() workqueue and its response
could be completed before they are ready to get the signal through
add_wait_queue(), set_current_state(TASK_INTERRUPTIBLE).

Signed-off-by: Chan-yeol Park <chanyeol.park@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-01 23:20:21 +02:00
David S. Miller
55b42b5ca2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/marvell.c

Simple overlapping changes in drivers/net/phy/marvell.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-01 14:53:27 -04:00
Ilya Dryomov
e9226d7c9f libceph: eliminate unnecessary allocation in process_one_ticket()
Commit c27a3e4d66 ("libceph: do not hard code max auth ticket len")
while fixing a buffer overlow tried to keep the same as much of the
surrounding code as possible and introduced an unnecessary kmalloc() in
the unencrypted ticket path.  It is likely to fail on huge tickets, so
get rid of it.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-31 23:43:08 +03:00
Guenter Roeck
e0fb6fb6d5 net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0
If a driver supports reading EEPROM but no EEPROM is installed in the system,
the driver's get_eeprom_len function returns 0. ethtool will subsequently
try to read that zero-length EEPROM anyway. If the driver does not support
EEPROM access at all, this operation will return -EOPNOTSUPP. If the driver
does support EEPROM access but no EEPROM is installed, the operation will
return -EINVAL. Return -EOPNOTSUPP in both cases for consistency.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31 16:12:34 -04:00
Pravin B Shelar
de05c400f7 mpls: Allow mpls_gso to be built as module
Kconfig already allows mpls to be built as module. Following patch
fixes Makefile to do same.

CC: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31 15:47:21 -04:00
Pravin B Shelar
f7065f4bd3 mpls: Fix mpls_gso handler.
mpls gso handler needs to pull skb after segmenting skb.

CC: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31 15:47:21 -04:00
David S. Miller
e3a88f9c4f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter/ipvs fixes for net

The following patchset contains fixes for netfilter/ipvs. This round of
fixes is larger than usual at this stage, specifically because of the
nf_tables bridge reject fixes that I would like to see in 3.18. The
patches are:

1) Fix a null-pointer dereference that may occur when logging
   errors. This problem was introduced by 4a4739d56b ("ipvs: Pull
   out crosses_local_route_boundary logic") in v3.17-rc5.

2) Update hook mask in nft_reject_bridge so we can also filter out
   packets from there. This fixes 36d2af5 ("netfilter: nf_tables: allow
   to filter from prerouting and postrouting"), which needs this chunk
   to work.

3) Two patches to refactor common code to forge the IPv4 and IPv6
   reject packets from the bridge. These are required by the nf_tables
   reject bridge fix.

4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject
   packets from the bridge. The idea is to forge the reject packets and
   inject them to the original port via br_deliver() which is now
   exported for that purpose.

5) Restrict nft_reject_bridge to bridge prerouting and input hooks.
   the original skbuff may cloned after prerouting when the bridge stack
   needs to flood it to several bridge ports, it is too late to reject
   the traffic.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31 12:29:42 -04:00
Johannes Berg
de4fcbadde cfg80211: avoid using default in interface type switch
Most code avoids having a default case in interface type switch
statements already, to make it easier to find places that need
to be extended. Change the code in the __cfg80211_leave() and
nl80211_key_allowed() functions to not have a default case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-31 14:19:19 +01:00
Pablo Neira Ayuso
127917c29a netfilter: nft_reject_bridge: restrict reject to prerouting and input
Restrict the reject expression to the prerouting and input bridge
hooks. If we allow this to be used from forward or any other later
bridge hook, if the frame is flooded to several ports, we'll end up
sending several reject packets, one per cloned packet.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:50:09 +01:00
Pablo Neira Ayuso
523b929d54 netfilter: nft_reject_bridge: don't use IP stack to reject traffic
If the packet is received via the bridge stack, this cannot reject
packets from the IP stack.

This adds functions to build the reject packet and send it from the
bridge stack. Comments and assumptions on this patch:

1) Validate the IPv4 and IPv6 headers before further processing,
   given that the packet comes from the bridge stack, we cannot assume
   they are clean. Truncated packets are dropped, we follow similar
   approach in the existing iptables match/target extensions that need
   to inspect layer 4 headers that is not available. This also includes
   packets that are directed to multicast and broadcast ethernet
   addresses.

2) br_deliver() is exported to inject the reject packet via
   bridge localout -> postrouting. So the approach is similar to what
   we already do in the iptables reject target. The reject packet is
   sent to the bridge port from which we have received the original
   packet.

3) The reject packet is forged based on the original packet. The TTL
   is set based on sysctl_ip_default_ttl for IPv4 and per-net
   ipv6.devconf_all hoplimit for IPv6.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:50:08 +01:00
Pablo Neira Ayuso
8bfcdf6671 netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:

* nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_ip6hdr_put(): to build the IPv6 header.
* nf_reject_ip6_tcphdr_put(): to build the TCP header.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:49:57 +01:00
Pablo Neira Ayuso
052b9498ee netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:

* nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_iphdr_put(): to build the IPv4 header.
* nf_reject_ip_tcphdr_put(): to build the TCP header.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:49:05 +01:00
Pablo Neira Ayuso
4d87716cd0 netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing
Fixes: 36d2af5 ("netfilter: nf_tables: allow to filter from prerouting and postrouting")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:44:56 +01:00
Stephen Hemminger
d070f9137a mac80211: fix spelling errors
Use codespell to find spelling errors.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-31 08:29:56 +01:00
Ben Hutchings
5188cd44c5 drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets
UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway.  Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 20:01:18 -04:00
Eric Dumazet
39bb5e6286 net: skb_fclone_busy() needs to detect orphaned skb
Some drivers are unable to perform TX completions in a bound time.
They instead call skb_orphan()

Problem is skb_fclone_busy() has to detect this case, otherwise
we block TCP retransmits and can freeze unlucky tcp sessions on
mostly idle hosts.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1f3279ae0c ("tcp: avoid retransmits of TCP packets hanging in host queues")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 19:58:30 -04:00
Sowmini Varadhan
cd2145358e tcp: Correction to RFC number in comment
Challenge ACK is described in RFC 5961, fix typo.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 19:53:34 -04:00
Tom Herbert
14051f0452 gre: Use inner mac length when computing tunnel length
Currently, skb_inner_network_header is used but this does not account
for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
handles TEB and also should work with IP encapsulation in which case
inner mac and inner network headers are the same.

Tested: Ran TCP_STREAM over GRE, worked as expected.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 19:51:56 -04:00
Michele Baldessari
afb6befce6 sctp: replace seq_printf with seq_puts
Fixes checkpatch warning:
"WARNING: Prefer seq_puts to seq_printf"

Signed-off-by: Michele Baldessari <michele@acksyn.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 19:40:16 -04:00
Michele Baldessari
891310d53d sctp: add transport state in /proc/net/sctp/remaddr
It is often quite helpful to be able to know the state of a transport
outside of the application itself (for troubleshooting purposes or for
monitoring purposes). Add it under /proc/net/sctp/remaddr.

Signed-off-by: Michele Baldessari <michele@acksyn.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 19:40:16 -04:00
Nicolas Cavallari
fa19c2b050 ipv4: Do not cache routing failures due to disabled forwarding.
If we cache them, the kernel will reuse them, independently of
whether forwarding is enabled or not.  Which means that if forwarding is
disabled on the input interface where the first routing request comes
from, then that unreachable result will be cached and reused for
other interfaces, even if forwarding is enabled on them.  The opposite
is also true.

This can be verified with two interfaces A and B and an output interface
C, where B has forwarding enabled, but not A and trying
ip route get $dst iif A from $src && ip route get $dst iif B from $src

Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 19:20:40 -04:00
stephen hemminger
b2ad5e5fcc tipc: spelling errors
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 16:56:41 -04:00
Florian Westphal
646697b9e3 syncookies: only increment SYNCOOKIESFAILED on validation error
Only count packets that failed cookie-authentication.
We can get SYNCOOKIESFAILED > 0 while we never even sent a single cookie.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 16:53:39 -04:00
stephen hemminger
f4e715c325 ipv4: minor spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 16:14:43 -04:00
Alexey Andriyanov
acf722f734 ip6_tunnel: allow to change mode for the ip6tnl0
The fallback device is in ipv6 mode by default.
The mode can not be changed in runtime, so there
is no way to decapsulate ip4in6 packets coming from
various sources without creating the specific tunnel
ifaces for each peer.

This allows to update the fallback tunnel device, but only
the mode could be changed. Usual command should work for the
fallback device: `ip -6 tun change ip6tnl0 mode any`

The fallback device can not be hidden from the packet receiver
as a regular tunnel, but there is no need for synchronization
as long as we do single assignment.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexey Andriyanov <alan@al-an.info>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 16:09:20 -04:00
Fabian Frederick
43728fa5c5 ipv6: remove assignment in if condition
Do assignment before if condition and test !skb like in rawv6_recvmsg()

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 15:51:43 -04:00
Fabian Frederick
fc08c25819 ipv6: remove inline on static in c file
remove __inline__ / inline and let compiler decide what to do
with static functions

Inspired-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 15:51:43 -04:00
Fabian Frederick
40dc2ca3cb ipv6: spelling s/incomming/incoming
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 15:51:43 -04:00
Fabian Frederick
e7b6658ea8 ipx: remove all unnecessary castings on ntohl
Apply commit e0f36310f7
("ipx: remove unnecessary casting on ntohl")
to all seq_printf/08lX

Inspired-by: "David S. Miller" <davem@davemloft.net>
Inspired-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 15:51:43 -04:00
Guenter Roeck
3d762a0f0a net: dsa: Add support for reading switch registers with ethtool
Add support for reading switch registers with 'ethtool -d'.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 14:54:11 -04:00
Guenter Roeck
6793abb4e8 net: dsa: Add support for switch EEPROM access
On some chips it is possible to access the switch eeprom.
Add infrastructure support for it.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 14:54:11 -04:00
Guenter Roeck
51579c3f1a net: dsa: Add support for reporting switch chip temperatures
Some switches provide chip temperature data.
Add support for reporting it through the hwmon subsystem.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 14:54:11 -04:00
Guenter Roeck
734cbb5b6b net: dsa: Don't set skb->protocol on outgoing tagged packets
Setting skb->protocol to a private protocol type may result in warning
messages such as
	e1000e 0000:00:19.0 em1: checksum_partial proto=dada!

This happens if the L3 protocol is IP or IPv6 and skb->ip_summed is set
to CHECKSUM_PARTIAL. Looking through the code, it appears that changing
skb->protocol for transmitted packets is not necessary and may actually
be harmful. For example, it prevents purposely unmodified (from a DSA
perspective) network drivers from properly setting up their transmit
checksum offload pointers since they inspect skb->protocol to set up the
IPv4 header or IPv6 header pointers. So don't unnecessarily change the
protocol field.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 14:54:10 -04:00
Marcel Holtmann
a4d5504d5c Bluetooth: Clear LE white list when resetting controller
The internal representation of the LE white list needs to be cleared
when receiving a successful HCI_Reset command. A reset of the controller
is expected to start with an empty LE white list.

When the LE white list is not cleared on controller reset, the passive
background scanning might skip programming the remote devices. Only
changes to the LE white list are programmed when passive background
is started.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org # 3.17.x
2014-10-30 17:41:08 +01:00
stephen hemminger
01cfa0a4ed netfilter: fix spelling errors
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-30 17:35:30 +01:00
Dan Carpenter
daac197ca9 Bluetooth: 6lowpan: use after free in disconnect_devices()
This was accidentally changed from list_for_each_entry_safe() to
list_for_each_entry() so now it has a use after free bug.  I've changed
it back.

Fixes: 9030582963 ('Bluetooth: 6lowpan: Converting rwlocks to use RCU')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-30 17:23:25 +01:00
Pablo Neira Ayuso
c3ac759ea6 Merge branch 'ipvs-next'
Simon Horman says:

====================
The single patch in this series fixes some minor fallout from adding
support IPv6 real servers in IPv4 virtual-services and vice versa.

It should not have any run-time affect other than perhaps saving a few cycles.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-30 16:52:30 +01:00
Marcelo Leitner
8ac2bde2a4 netfilter: log: protect nf_log_register against double registering
Currently, despite the comment right before the function,
nf_log_register allows registering two loggers on with the same type and
end up overwriting the previous register.

Not a real issue today as current tree doesn't have two loggers for the
same type but it's better to get this protected.

Also make sure that all of its callers do error checking.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-30 16:41:48 +01:00
Marcelo Leitner
0c26ed1c07 netfilter: nf_log: Introduce nft_log_dereference() macro
Wrap up a common call pattern in an easier to handle call.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-30 16:39:40 +01:00
Johannes Berg
46238845bd mac80211: properly flush delayed scan work on interface removal
When an interface is deleted, an ongoing hardware scan is canceled and
the driver must abort the scan, at the very least reporting completion
while the interface is removed.

However, if it scheduled the work that might only run after everything
is said and done, which leads to cfg80211 warning that the scan isn't
reported as finished yet; this is no fault of the driver, it already
did, but mac80211 hasn't processed it.

To fix this situation, flush the delayed work when the interface being
removed is the one that was executing the scan.

Cc: stable@vger.kernel.org
Reported-by: Sujith Manoharan <sujith@msujith.org>
Tested-by: Sujith Manoharan <sujith@msujith.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-30 15:48:32 +01:00
Mike Christie
89baaa570a libceph: use memalloc flags for net IO
This patch has ceph's lib code use the memalloc flags.

If the VM layer needs to write data out to free up memory to handle new
allocation requests, the block layer must be able to make forward progress.
To handle that requirement we use structs like mempools to reserve memory for
objects like bios and requests.

The problem is when we send/receive block layer requests over the network
layer, net skb allocations can fail and the system can lock up.
To solve this, the memalloc related flags were added. NBD, iSCSI
and NFS uses these flags to tell the network/vm layer that it should
use memory reserves to fullfill allcation requests for structs like
skbs.

I am running ceph in a bunch of VMs in my laptop, so this patch was
not tested very harshly.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-10-30 13:11:50 +03:00
Alexander Aring
38130c31ef mac802154: add basic support for monitor
This patch adds basic support for monitor mode. Also change the open
call that we set the transceiver mac setting on an interface up. Futher
patches will add a better handling while interface up an interface.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:46 +01:00
Alexander Aring
05f7de6792 mac802154: rx: add error handling after skb_clone
This patch adds error handling after skb_clone and deliver only if
skb_clone was successful.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:46 +01:00
Alexander Aring
20b48120c1 mac802154: rx: monitor receive cleanup
This patch replace the !netif_running(sdata->dev) instead we doing a
!ieee802154_sdata_running(sdata). Also move this in two separate if
branches to compare with mac80211 code.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:46 +01:00
Alexander Aring
18460672e0 mac802154: rx: add rx stats incrementation
This patch adds rx stats incrementation when the monitor interface
recevied a frame.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:46 +01:00
Alexander Aring
21fdf0a1c1 mac802154: rx: use netif_receive_skb
This patch removes netif_rx_ni call. Instead we call netif_receive_skb,
we can do that since commit c5c47e67bc
("mac802154: rx: use tasklet instead workqueue").

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:46 +01:00
Alexander Aring
1054ed81c4 mac802154: rx: remove override pkt_type set to PACKET_HOST
This patch removes pkt_type set to PACKET_HOST while monitor receiving.
This should be PACKET_OTHERHOST on monitor mode which already set
before.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:45 +01:00
Alexander Aring
ec718f3db9 mac802154: rx: add software checksum filtering check
This patch adds a new hardware flag which indicate that the transceiver
doesn't support check for bad checksum via hardware. Also add a handling of
this while receive.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:45 +01:00
Alexander Aring
b7889497d3 mac802154: rx: simplify crc receive handling
This patch change the actual crc handling while receive. Currently the
IEEE802154_HW_RX_OMIT_CKSUM flag is used to filter a frame with a bad crc.
This patch changes the behaviour of IEEE802154_HW_RX_OMIT_CKSUM to add a
crc while receiving for the monitor interface. After monitor receiving
we remove the crc for frame parsing. This affect the driver layer
because all drivers sets IEEE802154_HW_RX_OMIT_CKSUM and deliver without
checksum.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:45 +01:00
Alexander Aring
08c511a733 mac802154: rx: remove unnecessary parameter
This patch removes a not used parameter in ieee802154_deliver_skb.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:45 +01:00
Alexander Aring
90386a7e3b mac802154: separate omit tx/rx flags
This patch splits the IEEE802154_HW_OMIT_CKSUM hardware flag into
IEEE802154_HW_TX_OMIT_CKSUM and IEEE802154_HW_RX_OMIT_CKSUM. This is
useful to deliver the received crc from the driver layer to the monitor
interface. At the moment we can't do that without change the xmit
handling.

The received checksum should be visible in monitor mode only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:45 +01:00
Alexander Aring
94b792220c mac802154: add support for promiscuous mode
This patch adds a new driver operation to bring the transceiver into
promiscuous mode.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:45 +01:00
Alexander Aring
55a2d06517 mac802154: main: remove unnecessary include
This patch removes an unnecessary include of driver-ops header file.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:44 +01:00
Nicolas Dichtel
75fbfd3323 neigh: optimize neigh_parms_release()
In neigh_parms_release() we loop over all entries to find the entry given in
argument and being able to remove it from the list. By using a double linked
list, we can avoid this loop.

Here are some numbers with 30 000 dummy interfaces configured:

Before the patch:
$ time rmmod dummy
real	2m0.118s
user	0m0.000s
sys	1m50.048s

After the patch:
$ time rmmod dummy
real	1m9.970s
user	0m0.000s
sys	0m47.976s

Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 16:11:50 -04:00
Eric Dumazet
bc9ad166e3 net: introduce napi_schedule_irqoff()
napi_schedule() can be called from any context and has to mask hard
irqs.

Add a variant that can only be called from hard interrupts handlers
or when irqs are already masked.

Many NIC drivers can use it from their hard IRQ handler instead of
generic variant.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 16:07:27 -04:00
Nikolay Aleksandrov
d70127e8a9 inet: frags: remove the WARN_ON from inet_evict_bucket
The WARN_ON in inet_evict_bucket can be triggered by a valid case:
inet_frag_kill and inet_evict_bucket can be running in parallel on the
same queue which means that there has been at least one more ref added
by a previous inet_frag_find call, but inet_frag_kill can delete the
timer before inet_evict_bucket which will cause the WARN_ON() there to
trigger since we'll have refcnt!=1. Now, this case is valid because the
queue is being "killed" for some reason (removed from the chain list and
its timer deleted) so it will get destroyed in the end by one of the
inet_frag_put() calls which reaches 0 i.e. refcnt is still valid.

CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>

Fixes: b13d3cbfb8 ("inet: frag: move eviction of queues to work queue")
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 15:21:30 -04:00
Nikolay Aleksandrov
65ba1f1ec0 inet: frags: fix a race between inet_evict_bucket and inet_frag_kill
When the evictor is running it adds some chosen frags to a local list to
be evicted once the chain lock has been released but at the same time
the *frag_queue can be running for some of the same queues and it
may call inet_frag_kill which will wait on the chain lock and
will then delete the queue from the wrong list since it was added in the
eviction one. The fix is simple - check if the queue has the evict flag
set under the chain lock before deleting it, this is safe because the
evict flag is set only under that lock and having the flag set also means
that the queue has been detached from the chain list, so no need to delete
it again.
An important note to make is that we're safe w.r.t refcnt because
inet_frag_kill and inet_evict_bucket will sync on the del_timer operation
where only one of the two can succeed (or if the timer is executing -
none of them), the cases are:
1. inet_frag_kill succeeds in del_timer
 - then the timer ref is removed, but inet_evict_bucket will not add
   this queue to its expire list but will restart eviction in that chain
2. inet_evict_bucket succeeds in del_timer
 - then the timer ref is kept until the evictor "expires" the queue, but
   inet_frag_kill will remove the initial ref and will set
   INET_FRAG_COMPLETE which will make the frag_expire fn just to remove
   its ref.
In the end all of the queue users will do an inet_frag_put and the one
that reaches 0 will free it. The refcount balance should be okay.

CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>

Fixes: b13d3cbfb8 ("inet: frag: move eviction of queues to work queue")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Tested-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 15:21:30 -04:00
Erik Kline
7fd2561e4e net: ipv6: Add a sysctl to make optimistic addresses useful candidates
Add a sysctl that causes an interface's optimistic addresses
to be considered equivalent to other non-deprecated addresses
for source address selection purposes.  Preferred addresses
will still take precedence over optimistic addresses, subject
to other ranking in the source address selection algorithm.

This is useful where different interfaces are connected to
different networks from different ISPs (e.g., a cell network
and a home wifi network).

The current behaviour complies with RFC 3484/6724, and it
makes sense if the host has only one interface, or has
multiple interfaces on the same network (same or cooperating
administrative domain(s), but not in the multiple distinct
networks case.

For example, if a mobile device has an IPv6 address on an LTE
network and then connects to IPv6-enabled wifi, while the wifi
IPv6 address is undergoing DAD, IPv6 connections will try use
the wifi default route with the LTE IPv6 address, and will get
stuck until they time out.

Also, because optimistic nodes can receive frames, issue
an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC
flag appropriately set).  A second RTM_NEWADDR is sent if DAD
completes (the address flags have changed), otherwise an
RTM_DELADDR is sent.

Also: add an entry in ip-sysctl.txt for optimistic_dad.

Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 15:11:36 -04:00
Eric Dumazet
dca145ffaa tcp: allow for bigger reordering level
While testing upcoming Yaogong patch (converting out of order queue
into an RB tree), I hit the max reordering level of linux TCP stack.

Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.

Allow a new max limit of 300, and add a sysctl to allow admins to even
allow bigger (or lower) values if needed.

[1] Aggregation of links, per packet load balancing, fabrics not doing
 deep packet inspections, alternative TCP congestion modules...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 15:05:15 -04:00
Toshiaki Makita
432c856fcf net: skb_segment() should preserve backpressure
This patch generalizes commit d6a4a10411 ("tcp: GSO should be TSQ
friendly") to protocols using skb_set_owner_w()

TCP uses its own destructor (tcp_wfree) and needs a more complex scheme
as explained in commit 6ff50cd555 ("tcp: gso: do not generate out of
order packets")

This allows UDP sockets using UFO to get proper backpressure,
thus avoiding qdisc drops and excessive cpu usage.

Here are performance test results (macvlan on vlan):

- Before
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      144096 1224195    1258.56
212992           60.00          51              0.45

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.23      0.00     25.26      0.08      0.00     74.43

- After
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      109593      0     957.20
212992           60.00      109593            957.20

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.18      0.00      8.38      0.02      0.00     91.43

[edumazet] Rewrote patch and changelog.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 14:47:19 -04:00
Lubomir Rintel
b2ed64a974 ipv6: notify userspace when we added or changed an ipv6 token
NetworkManager might want to know that it changed when the router advertisement
arrives.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 14:35:32 -04:00
WANG Cong
d56109020d sch_pie: schedule the timer after all init succeed
Cc: Vijay Subramanian <vijaynsu@cisco.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
2014-10-29 14:28:01 -04:00
Johannes Berg
fc1f48ffd5 cfg80211: fix integer signedness in chandef_primary_freqs()
The helper function can't ever create negative values, so use
u32 pointers as the function arguments as the caller does.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-29 18:42:51 +01:00
Fabian Frederick
dcc6c2f516 cfg80211: fix set but not used warning in nl80211_channel_switch()
radar_detect_width is unused since commit 97dc94f1d9
("cfg80211: remove channel_switch combination check")

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-29 18:42:51 +01:00
Luciano Coelho
ff1e417c7c mac80211: schedule the actual switch of the station before CSA count 0
Due to the time it takes to process the beacon that started the CSA
process, we may be late for the switch if we try to reach exactly
beacon 0.  To avoid that, use count - 1 when calculating the switch time.

Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-29 16:37:54 +01:00
Luciano Coelho
84469a45a1 mac80211: use secondary channel offset IE also beacons during CSA
If we are switching from an HT40+ to an HT40- channel (or vice-versa),
we need the secondary channel offset IE to specify what is the
post-CSA offset to be used.  This applies both to beacons and to probe
responses.

In ieee80211_parse_ch_switch_ie() we were ignoring this IE from
beacons and using the *current* HT information IE instead.  This was
causing us to use the same offset as before the switch.

Fix that by using the secondary channel offset IE also for beacons and
don't ever use the pre-switch offset.  Additionally, remove the
"beacon" argument from ieee80211_parse_ch_switch_ie(), since it's not
needed anymore.

Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-29 16:37:45 +01:00
Dan Carpenter
eb63192bb8 SUNRPC: off by one in BUG_ON()
The m->pool_to[] array has "maxpools" number of elements.  It's
allocated in svc_pool_map_alloc_arrays() which we called earlier in the
function.  This test should be >= instead of >.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-10-29 11:37:42 -04:00
Felix Fietkau
10b6848786 mac80211: flush keys for AP mode on ieee80211_do_stop
Userspace can add keys to an AP mode interface before start_ap has been
called. If there have been no calls to start_ap/stop_ap in the mean
time, the keys will still be around when the interface is brought down.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[adjust comments, fix AP_VLAN case]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-29 16:33:37 +01:00
Jukka Rissanen
9cfd5a23a4 Bluetooth: Wrong style spin lock used
Use spin_lock_bh() as the code is called from softirq in networking subsystem.
This is needed to prevent deadlocks when 6lowpan link is in use.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 16:20:40 +01:00
Alexander Aring
e23e9ec16b ieee802154: introduce sysfs file
This patch moves the sysfs handling in a own file. This is like wireless
sysfs file handling.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:09 +01:00
Alexander Aring
7445764155 mac802154: cleanup open count handling
This patch cleanups the open_count variable increment in open and close
calls of netdev.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:09 +01:00
Alexander Aring
c7420c367d mac802154: move mac_params functions into mac_cmd
These functions can be static in mac_cmd file.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:08 +01:00
Alexander Aring
12439a5356 mac802154: remove channel attributes from sdata
These channel attributes was part of "channel context switch while xmit"
which was removed by commit dc67c6b30f
("mac802154: tx: remove xmit channel context switch"). This patch
removes these unnecessary variables and use the current_page and
current_channel by wpan_phy struct now.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:08 +01:00
Alexander Aring
33d4189f51 mac802154: iface: remove assign to zero
These variables should already be zero, so we remove the extra assign to
zero.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:08 +01:00
Alexander Aring
538181a879 mac802154: add synchronization handling
This patch adds synchronization handling in start and stop driver ops
calls. This patch is mostly grab from mac80211 which was introduced by
commit ea77f12f2c ("mac80211: remove
tasklet enable/disable"). This is to be sure that we don't run into same
issues.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:08 +01:00
Alexander Aring
e363eca386 mac802154: move local started handling
This patch removes the current handling of started boolean. This is
actually dead code, because mac802154_netdev_register can't never be
called before ieee802154_register_hw. This means that local->started is
always be true when mac802154_netdev_register is called. Instead we
using this now like mac80211 to indicate that an instance of sdata is
running.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:08 +01:00
Alexander Aring
5d65cae4bf mac802154: rename running to started
This variable should be handled like ieee80211_local struct of mac80211.
We rename this variable to started now to have the same name convention.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:08 +01:00
Alexander Aring
0ea3da64fa mac802154: rework sdata state change to running
This patch reworks the handling for setting the state like mac80211. We
use bit's instead a bool variable. The mutex is not needed because it use
test and set bits which are atomic operations.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:07 +01:00
Alexander Aring
a543c5989d mac802154: remove driver ops in wpan-phy
This patch removes the driver ops callbacks inside of wpan_phy struct.
It was used to check if a phy supports this driver ops call. We do this
now via hardware flags.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:07 +01:00
Alexander Aring
59cb300f2b mac802154: use driver-ops function wrappers
This patch replaces all directly called driver ops by previous
introduced driver-ops function wrappers.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:07 +01:00
Alexander Aring
b6eea9ca35 mac802154: introduce driver-ops header
This patch introduce a driver-ops header file with function wrappers to
call the driver ops. These wrappers checking on right context
information and warn if optional driver ops are called when these aren't
implemented. This behaviour is like mac80211 driver-ops header file,
just without function tracing calls.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:07 +01:00
Alexander Aring
1630186100 mac802154: declare struct ieee802154_ops as const
The ieee802154_ops structure should be never changed during runtime.
This patch declare this structure as const to avoid a runtime change.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:07 +01:00
Alexander Aring
19ec690a43 mac802154: main: move open and close into iface
These functions can be static inside the iface file, because it's not
used anywhere else. This patch moves these functions into iface file.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:06 +01:00
Alexander Aring
b9ff77e50c mac802154: monitor: merge into iface implementation
This patch removes the monitor implementation file and put all monitor
stuff into iface file. It's now small enough to put all necessary
handling into iface.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:06 +01:00
Johan Hedberg
0b1db38ca2 Bluetooth: Fix check for direct advertising
These days we allow simultaneous LE scanning and advertising. Checking
for whether advertising is enabled or not is therefore not a reliable
way to determine whether directed advertising was used to trigger the
connection creation. The appropriate place to check (instead of the hdev
context) is the connection role that's stored in the hci_conn. This
patch fixes such a check in le_conn_timeout() which could otherwise lead
to incorrect HCI commands being sent.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.16.x
2014-10-28 22:48:56 +01:00
Johan Hedberg
980ffc0a2c Bluetooth: Fix LE connection timeout deadlock
The le_conn_timeout() may call hci_le_conn_failed() which in turn may
call hci_conn_del(). Trying to use the _sync variant for cancelling the
conn timeout from hci_conn_del() could therefore result in a deadlock.
This patch converts hci_conn_del() to use the non-sync variant so the
deadlock is not possible.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.16.x
2014-10-28 22:48:56 +01:00
David S. Miller
2c6c49ded7 openvswitch: Export lockdep_ovsl_is_held to modules.
ERROR: "lockdep_ovsl_is_held" [net/openvswitch/vport-gre.ko] undefined!

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:27:23 -04:00
Simon Horman
941d8ebcf7 datapath: Rename last_action() as nla_is_last() and move to netlink.h
The original motivation for this change was to allow the helper to be used
in files other than actions.c as part of work on an odp select group
action.

It was as pointed out by Thomas Graf that this helper would be best off
living in netlink.h. Furthermore, I think that the generic nature of this
helper means it is best off in netlink.h regardless of if it is used more
than one .c file or not. Thus, I would like it considered independent of
the work on an odp select group action.

Cc: Thomas Graf <tgraf@suug.ch>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:07:29 -04:00
David S. Miller
25946f20b7 Merge tag 'master-2014-10-27' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-10-28

Please pull this batch of fixes intended for the 3.18 stream!

For the mac80211 bits, Johannes says:

"Here are a few fixes for the wireless stack: one fixes the
RTS rate, one for a debugfs file, one to return the correct
channel to userspace, a sanity check for a userspace value
and the remaining two are just documentation fixes."

For the iwlwifi bits, Emmanuel says:

"I revert here a patch that caused interoperability issues.
dvm gets a fix for a bug that was reported by many users.
Two minor fixes for BT Coex and platform power fix that helps
reducing latency when the PCIe link goes to low power states."

In addition...

Felix Fietkau adds a couple of ath code fixes related to regulatory
rule enforcement.

Hauke Mehrtens fixes a build break with bcma when CONFIG_OF_ADDRESS
is not set.

Karsten Wiese provides a trio of minor fixes for rtl8192cu.

Kees Cook prevents a potential information leak in rtlwifi.

Larry Finger also brings a trio of minor fixes for rtlwifi.

Rafał Miłecki adds a device ID to the bcma bus driver.

Rickard Strandqvist offers some strn* -> strl* changes in brcmfmac
to eliminate non-terminated string issues.

Sujith Manoharan avoids some ath9k stalls by enabling HW queue control
only for MCC.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 15:30:15 -04:00
Andrew Lunn
ae439286a0 net: dsa: Error out on tagging protocol mismatches
If there is a mismatch between enabled tagging protocols and the
protocol the switch supports, error out, rather than continue with a
situation which is unlikely to work.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
cc: alexander.h.duyck@intel.com
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 15:27:54 -04:00
Thomas Graf
62b9c8d037 ovs: Turn vports with dependencies into separate modules
The internal and netdev vport remain part of openvswitch.ko. Encap
vports including vxlan, gre, and geneve can be built as separate
modules and are loaded on demand. Modules can be unloaded after use.
Datapath ports keep a reference to the vport module during their
lifetime.

Allows to remove the error prone maintenance of the global list
vport_ops_list.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 14:43:18 -04:00
Stephen Hemminger
49c922bb1e Bluetooth: spelling fixes
Fix spelling errors in comments.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 17:23:58 +01:00
Jukka Rissanen
df092306d6 Bluetooth: 6lowpan: Fix lockdep splats
When a device ndo_start_xmit() calls again dev_queue_xmit(),
lockdep can complain because dev_queue_xmit() is re-entered and the
spinlocks protecting tx queues share a common lockdep class.

Same issue was fixed for ieee802154 in commit "20e7c4e80dcd"

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 17:04:39 +01:00
Jukka Rissanen
9030582963 Bluetooth: 6lowpan: Converting rwlocks to use RCU
The rwlocks are converted to use RCU. This helps performance as the
irq locks are not needed any more.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 17:04:38 +01:00
Johan Hedberg
da213f8e0c Bluetooth: Revert SMP self-test patches
This reverts commits c6992e9ef2 and
4cd3362da8.

The reason for the revert is that we cannot have more than one module
initialization function and the SMP one breaks the build with modular
kernels. As the proper fix for this is right now looking non-trivial
it's better to simply revert the problematic patches in order to keep
the upstream tree compilable.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 15:32:49 +01:00
Alex Gartrell
d770108911 ipvs: remove unnecessary assignment in __ip_vs_get_out_rt
It is a precondition of the function that daddr be equal to dest->addr.ip
if dest is non-NULL, so this additional assignment is just confusing for
stupid engineers like me.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-10-28 09:50:06 +09:00
Alex Gartrell
3d53666b40 ipvs: Avoid null-pointer deref in debug code
Use daddr instead of reaching into dest.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-10-28 09:48:31 +09:00
Alexei Starovoitov
f89b7755f5 bpf: split eBPF out of NET
introduce two configs:
- hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
  depend on
- visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use

that solves several problems:
- tracing and others that wish to use eBPF don't need to depend on NET.
  They can use BPF_SYSCALL to allow loading from userspace or select BPF
  to use it directly from kernel in NET-less configs.
- in 3.18 programs cannot be attached to events yet, so don't force it on
- when the rest of eBPF infra is there in 3.19+, it's still useful to
  switch it off to minimize kernel size

bloat-o-meter on x64 shows:
add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)

tested with many different config combinations. Hopefully didn't miss anything.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27 19:09:59 -04:00
Kyeyoon Park
958501163d bridge: Add support for IEEE 802.11 Proxy ARP
This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows
the AP devices to keep track of the hardware-address-to-IP-address
mapping of the mobile devices within the WLAN network.

The AP will learn this mapping via observing DHCP, ARP, and NS/NA
frames. When a request for such information is made (i.e. ARP request,
Neighbor Solicitation), the AP will respond on behalf of the
associated mobile device. In the process of doing so, the AP will drop
the multicast request frame that was intended to go out to the wireless
medium.

It was recommended at the LKS workshop to do this implementation in
the bridge layer. vxlan.c is already doing something very similar.
The DHCP snooping code will be added to the userspace application
(hostapd) per the recommendation.

This RFC commit is only for IPv4. A similar approach in the bridge
layer will be taken for IPv6 as well.

Signed-off-by: Kyeyoon Park <kyeyoonp@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27 19:02:04 -04:00
David S. Miller
5d26b1f50a Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Allow to recycle a TCP port in conntrack when the change role from
   server to client, from Marcelo Leitner.

2) Fix possible off by one access in ip_set_nfnl_get_byindex(), patch
   from Dan Carpenter.

3) alloc_percpu returns NULL on error, no need for IS_ERR() in nf_tables
   chain statistic updates. From Sabrina Dubroca.

4) Don't compile ip options in bridge netfilter, this mangles the packet
   and bridge should not alter layer >= 3 headers when forwarding packets.
   Patch from Herbert Xu and tested by Florian Westphal.

5) Account the final NLMSG_DONE message when calculating the size of the
   nflog netlink batches. Patch from Florian Westphal.

6) Fix a possible netlink attribute length overflow with large packets.
   Again from Florian Westphal.

7) Release the skbuff if nfnetlink_log fails to put the final
   NLMSG_DONE message. This fixes a leak on error. This shouldn't ever
   happen though, otherwise this means we miscalculate the netlink batch
   size, so spot a warning if this ever happens so we can track down the
   problem. This patch from Houcheng Lin.

8) Look at the right list when recycling targets in the nft_compat,
   patch from Arturo Borrero.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27 18:47:40 -04:00
Arturo Borrero
e9105f1bea netfilter: nf_tables: add new expression nft_redir
This new expression provides NAT in the redirect flavour, which is to
redirect packets to local machine.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:49:39 +01:00
Arturo Borrero
9de920eddb netfilter: refactor NAT redirect IPv6 code to use it from nf_tables
This patch refactors the IPv6 code so it can be usable both from xt and
nf_tables.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:48:10 +01:00
Arturo Borrero
8b13eddfdf netfilter: refactor NAT redirect IPv4 to use it from nf_tables
This patch refactors the IPv4 code so it can be usable both from xt and
nf_tables.

A similar patch follows-up to handle IPv6.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:47:06 +01:00
Arturo Borrero
7965ee9371 netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops()
The code looks for an already loaded target, and the correct list to search
is nft_target_list, not nft_match_list.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:17:46 +01:00
Fabian Frederick
b8901ac319 ipx: remove __inline__ in c file on static
Let compiler decide what to do with static void __ipxitf_put()

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27 16:25:31 -04:00
Fabian Frederick
e0f36310f7 ipx: remove unnecessary casting on ntohl
use %08X instead of %08lX and remove casting.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27 16:03:53 -04:00
Fabian Frederick
5e96d788d9 ipx: move extern sysctl_ipx_pprop_broadcasting to header file
include ipx.h from sysctl_net_ipx.c

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27 16:03:53 -04:00
Fabian Frederick
ce256981e5 ipv6: include linux/uaccess.h instead of asm/uaccess.h
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27 16:03:52 -04:00
Fabian Frederick
9451a304ce ipv6: replace min/casting by min_t
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27 16:03:52 -04:00
Fabian Frederick
6b436d3381 ipv4: remove set but unused variable sha
unsigned char *sha (source) was already in original git version
 but was never used.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27 16:03:52 -04:00
John W. Linville
99c814066e Here are a few fixes for the wireless stack: one fixes the
RTS rate, one for a debugfs file, one to return the correct
 channel to userspace, a sanity check for a userspace value
 and the remaining two are just documentation fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUSUotAAoJEDBSmw7B7bqr2TwP/2EJiYMOXhLTM5F/sEaGP5aX
 +hN+/Za3hAuLu3GkYXnIEw8uJL0ooDpUkQLoyV7AUoqVKhgqCuMQPWbklU0Ns2Og
 qubAl9BRY1DPgMtdGa3mMMJ3GYkIC8Hbh6kTOPqYASVZWover5NRKFlp2jp1uhbf
 ypv0IfIJVO323s90TWv1ZQsTtnGxMtL9DaLwqBNKN8nSGKxe62cUZsQN+H5KGm4N
 /n7eN62XPkidFUsTmdAXHfcgEpGv82rtSpxWmSrwxDbQEj12xkP66cRTuomZJ5v1
 981OIzcxtV0ngLjfnoSGev6bvgO2TDEbvQScIsZiqfnaJuBfzPEAaghnWozox3op
 dfolKkD3LecLGcxVVGJlKddxm3K4+2q7tuwkfDcxKNx2KFqtOqM6gY8z1uXYX8MW
 Jv7669nwpKgWM0e3hsxz6WJauEsdWRVzarmimK/Ymitu0RgNmXTVbdvvFVSTenuZ
 0HLqfr7Uk0gw5gQgWfj4F0qjNxzmjhnw/pz+c1DRtYs6w6SGToCqcm5yRU6f8pLt
 SHc3LJ67xK5RnOq8+KJ8o92MfE29HH2CTzLzgghNrLnwqcYBTApuCr0OtpQb04Zf
 AgQRMq61IGXwCDiVFE1ElpRgrW4/aekUZh/JB8pGHlhrAvl+HwNYVek66MBDeMyl
 akmLeHhrCkuWstDHHz+o
 =m/XT
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-john-2014-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg <johannes@sipsolutions.net> says:

"Here are a few fixes for the wireless stack: one fixes the
RTS rate, one for a debugfs file, one to return the correct
channel to userspace, a sanity check for a userspace value
and the remaining two are just documentation fixes."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-27 13:38:15 -04:00
Alexander Aring
be9d215fa9 mac802154: rx: change naming convention
This patch changes the naming convention of mac802154 rx file. It should
be more named like mac80211 stack. Furthermore we introduce a new frame
parsing implementation which is much similar the mac80211
implementation.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:50 +01:00
Alexander Aring
e176b681b0 mac802154: rx: move rcu locking
Instead of twice lock and unlock mechanism this patch hold these locks
only once at one position.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:45 +01:00
Alexander Aring
9cf215d073 mac802154: rx: move skb_reset_mac_header
This patch moves the skb_reset_mac_header call before frame parsing
while wpan rx and before monitor deliver functionality.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:45 +01:00
Alexander Aring
c9ca640140 mac802154: rx: add monitor pkt_type information
This patch adds a PACKET_OTHERHOST setting when a monitor interface
receives a skb. All receiving skb's to the monitor interface should
be PACKET_OTHERHOST.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:44 +01:00
Alexander Aring
75a46f0ee7 mac802154: rx: add CHECKSUM_UNNECESSARY
This patch adds CHECKSUM_UNNECESSARY to skb->ip_summed before delivery.
There exist no transceiver with IP checksum functionality.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:44 +01:00
Alexander Aring
702dcf994a mac802154: rx: move skb->protocol setting
This patch moves the skb->protocol setting to the position when it's
needed. It's only needed when frame parsing was successful.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:43 +01:00
Alexander Aring
469100d6c2 mac802154: rx: rename remove mac802154_subif_rx
This patch removes the mac802154_subif_rx function and do the necessary
calls inside of ieee802154_rx function. The ieee802154_rx is small
enough to move the functionality inside this function.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:43 +01:00
Alexander Aring
4ca18be54f mac802154: tx: remove monitor receive while xmit
This removes the call of monitor receive funktion when any interface
type call xmit. There exist no such use case that a monitor interface
should receive the actual sending frame. One use case could be that a
wpan interface and monitor interface could be running at the same time
on one phy. Then the monitor interface receives the wpan frames also.
Furthermore we adding support for promiscous mode setting. With
promiscous mode setting we can't run a wpan and monitor interface at the
same time.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:42 +01:00
Alexander Aring
2a9820c9e2 mac802154: rx: move receive handling into rx.c
This patch removes all relevant receiving functions inclusive frame
parsing into rx file. Like mac80211 we should implement the complete
receive handling and parsing in this file.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:42 +01:00
Alexander Aring
c730c90316 mac802154: rx: document ieee802154_rx() context requirement
This patch is similar like d20ef63d32
("mac80211: document ieee80211_rx() context requirement"). The
netif_receive_skb call requires with softirqs disabled. This patch
adds a warning if softirqs are pending while calling ieee802154_rx.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:42 +01:00
Alexander Aring
c5c47e67bc mac802154: rx: use tasklet instead workqueue
Tasklets have much less overhead than workqueues. This patch also
removes the heap allocation for the worker on receiving path.
Like mac80211 we should prefer use a tasklet here instead a workqueue to
getting fast out of interrupt context when ieee802154_rx_irqsafe is
called by driver. Like wireless inside the tasklet context we should
call netif_receive_skb instead netif_rx_ni anymore.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:40 +01:00
Alexander Aring
061ef8f915 mac802154: tx: use put_unaligned_le16 for copy crc
This patch replaces the memcpy with a put_unaligned_le16. The placement
of crc inside of PSDU can also be unaligned. With memcpy this can fail
on some architectures.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:36 +01:00
Martin Townsend
01141234f2 ieee802154: 6lowpan: rename process_data and lowpan_process_data
As we have decouple decompression from data delivery we can now rename all
occurences of process_data in receive path.

Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 15:51:16 +01:00
Martin Townsend
3c400b843d bluetooth:6lowpan: use consume_skb when packet processed successfully
Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 15:51:16 +01:00
Martin Townsend
04dfd7386a 6lowpan: fix process_data return values
As process_data now returns just error codes fix up the calls to this
function to only drop the skb if an error code is returned.

Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 15:51:15 +01:00
Martin Townsend
f8b361768e 6lowpan: remove skb_deliver from IPHC
Separating skb delivery from decompression ensures that we can support further
decompression schemes and removes the mixed return value of error codes with
NET_RX_FOO.

Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 15:51:15 +01:00
Karl Beldan
9ffe904405 mac80211: minstrel_ht: do not always skip ht rates vht_only is true
When CONFIG_MAC80211_RC_MINSTREL_VHT is set, the module param
minstrel_vht_only tells minstrel_ht whether to allow the mix of ht rates
with vht rates.
ATM, minstrel_ht skips ht rates when minstrel_vht_only is true, but it does
that even if vht is not supported, which makes the sta rates fallback to
legacy as no ht rate gets enabled.

Fixes: 9208247d74 ("mac80211: minstrel_ht: add basic support for VHT rates <= 3SS@80MHz")
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:35 +01:00
Emmanuel Grumbach
cfede0d80d mac80211: don't flush when probing the AP
All the callers of ieee80211_mgd_probe_ap_send return right
after they call the flush() callback. This means that calling
flush() is uneeded since its meaning is to wait until the
queues of the device are empty.

Devices that know how to report status on Tx will do so using
the regular path (ieee80211_tx_status) and this status will
trigger the continuation of the flow of the probe
(ieee80211_sta_tx_notify).

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:34 +01:00
Ben Greear
b5dfae020b mac80211: support creating vifs with specified mac address
This is useful when creating virtual interfaces.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:34 +01:00
Ben Greear
e8f479b112 cfg80211: support configuring vif mac addr on create
This is useful when creating virtual interfaces.
Keeps udev from mucking with things it shouldn't, since
the default MAC is never seen by udev when specified on
the cmd-line during creation.

Signed-off-by: Ben Greear <greearb@candelatech.com>
[check for feature flag in nl80211 to force drivers to set it]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:33 +01:00
Ben Greear
e27513fbd0 mac80211: support creating wiphy w/out creating wlanX
This will be helpful when using the mac80211_hwsim
wiphys and automated testing.  Let user create the
vifs as needed, and named as expected.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:32 +01:00
Ben Greear
ad28757eef mac80211: allow creating wiphy devices with suggested name
Support creating wiphy devices with an optional name.
This will be used by hwsim to have better automated control
over virtual radio creation/deletion.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:31 +01:00
Ben Greear
1998d90ad4 cfg80211: support creating wiphy with suggested name
Kernel will attempt to use the name if it is supplied,
but if name cannot be used for some reason, the default
phyX name will be used instead.

Signed-off-by: Ben Greear <greearb@candelatech.com>
[while at it, use wiphy_name() instead of dev_name(),
 fix format string issue reported by Kees Cook]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:18 +01:00
Fabian Frederick
5c1e9f2c1f xfrm: fix set but not used warning in xfrm_policy_queue_process()
err was set but unused.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-10-27 07:16:46 +01:00
Eric Dumazet
93a35f59f1 net: napi_reuse_skb() should check pfmemalloc
Do not reuse skb if it was pfmemalloc tainted, otherwise
future frame might be dropped anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-26 22:47:23 -04:00
Alexander Aring
f81f466ca5 mac802154: tx: make worker information static
This patch moves the worker information struct out of skb control block.
Instead control block we declare it static inside of tx.c file. We can do
that, because the worker can't be used twice at the same time. It's
protected by stop and wake netdev queue.

This patch fix an issue that the "struct ieee802154_xmit_cb" doesn't fit
into the skb control block on some kernel configuartion reported by
kbuild test robot.

It was introduced by commit fe24371d66
("mac802154: tx: remove kmalloc in xmit hotpath").

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 19:18:35 +01:00
Alexander Aring
e5e584fcc2 mac802154: tx: change naming convention
This patch changes the naming convention of the tx functions like
mac80211. Just with an 802154 instead 80211 inside the name.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:05 +01:00
Alexander Aring
409c3b0c5f mac802154: tx: move stats tx increment
This patch moves the stats increment of successful transmitted packets
in the right place when the skb was really successful transmitted.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:05 +01:00
Alexander Aring
b7eec52bcb mac802154: tx: cleanup crc calculation
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:05 +01:00
Alexander Aring
cfa626cb37 mac802154: tx: use netdev print helpers
This patch replace the pr_foo printout function to netdev_foo printout
function. Inside the xmit handling, the interface is already known.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:04 +01:00
Alexander Aring
6001d5223d mac802154: tx: don't allow if down while sync tx
This patch holds rtnl lock while sync xmit inside of workqueue.
Otherwise we could down the interface while worker xmit handling.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:04 +01:00
Alexander Aring
ed0a5dce0c mac802154: tx: add support for xmit_async callback
This patch renames the existsing xmit callback to xmit_sync and
introduces an asynchronous xmit_async function. If ieee802154_ops
doesn't provide the xmit_async callback, then we have a fallback to
the xmit_sync callback.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:04 +01:00
Alexander Aring
cdb66beaa0 mac802154: tx: fix error handling while xmit
In case of an error we should call kfree_skb instead of consume_skb which
is called by ieee802154_xmit_complete function.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:04 +01:00
Alexander Aring
18d60a0d49 mac802154: tx: use queue helpers in xmit worker
This patch uses the queue utility helpers inside the xmit worker of
mac802154 subsystem.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:03 +01:00
Alexander Aring
c208510351 mac802154: add netdev qeue helpers
This patch adds a new file net/mac802154/util.c which contains utility
functions for drivers, etc. This file contains functions to start and
stop queues for all virtual interfaces, this is useful for asynchronous
handling by driver level.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:03 +01:00
Alexander Aring
dc67c6b30f mac802154: tx: remove xmit channel context switch
This patch removes the channel hopping feature before xmit. There are
several issues to provide a real channel hopping (timing requirements,
etc...).

We don't have any known kernelspace protocol which really use this
feature. And I don't know an real user of this feature.
We simply drop this feature now.

This patch removes also the hold of pib lock which isn't needed by any
real driver xmit callback implementation.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:03 +01:00
Alexander Aring
e89e45f22a mac802154: tx: squash multiple dereferencing
This patch introduce some new stack variables to avoid multiple
dereferencing inside the xmit worker function.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:02 +01:00
Alexander Aring
fe24371d66 mac802154: tx: remove kmalloc in xmit hotpath
This patch removes the kmalloc allocation for workqueue data. This patch
replaces the kmalloc and uses the control block of skb. The control block
has enough space and isn't use by any other layer in this case.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:23:58 +01:00
Alexander Aring
50c6fb9965 mac802154: tx: move xmit callback to tx file
This patch moves the netdev xmit callback functions into the tx.c file.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:23:50 +01:00
Eric Dumazet
349ce993ac tcp: md5: do not use alloc_percpu()
percpu tcp_md5sig_pool contains memory blobs that ultimately
go through sg_set_buf().

-> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));

This requires that whole area is in a physically contiguous portion
of memory. And that @buf is not backed by vmalloc().

Given that alloc_percpu() can use vmalloc() areas, this does not
fit the requirements.

Replace alloc_percpu() by a static DEFINE_PER_CPU() as tcp_md5sig_pool
is small anyway, there is no gain to dynamically allocate it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 765cf9976e ("tcp: md5: remove one indirection level in tcp_md5sig_pool")
Reported-by: Crestez Dan Leonard <cdleonard@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25 16:10:04 -04:00
Alexander Aring
c6f635faf3 mac802154: remove ieee802154_addr from driver_ops
This driver_ops callback function is never used by any driver.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:39 +02:00
Alexander Aring
f773054254 mac802154: rename dev_workqueue to workqueue
Small rename to use the name workqueue than dev_workqueue. To bring the
same naming convention like wireless into 802.15.4.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:38 +02:00
Alexander Aring
59d19cd70c mac802154: introduce IEEE802154_DEV_TO_SUB_IF
This function adds a wrapper to call netdev_priv to getting the sdata
attribute. This is similar like the IEEE80211_DEV_TO_SUB_IF function
inside wireless stack implementation.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:38 +02:00
Alexander Aring
60741361c3 mac802154: introduce hw_to_local function
This patch replace the mac802154_to_priv macro with a static inline
function named hw_to_local. This brings a similar naming convention like
mac80211 stack.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:38 +02:00
Alexander Aring
d98be45b36 mac802154: rename sdata slaves and slaves_mtx
This patch renamens the slaves attribute in sdata to interfaces and
slaves_mtx to iflist_mtx. This is similar like the mac80211 stack naming
convention.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:38 +02:00
Alexander Aring
04e850fe06 mac802154: rename hw subif_data variable to local
This patch renames the hw attribute in struct ieee802154_sub_if_data to
local. This avoid confusing with the struct ieee802154_hw hw; inside of
local struct.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:38 +02:00
Alexander Aring
036562f9c4 mac802154: rename mac802154_sub_if_data
Like wireless this structure should named ieee802154_sub_if_data and not
mac802154_sub_if_data. This patch renames the struct and variables to
sdata instead priv sometimes.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:37 +02:00
Alexander Aring
a5e1ec538f mac802154: rename mac802154_priv to ieee802154_local
This patch rename the mac802154_priv to ieee802154_local. The
mac802154_priv structure is like ieee80211_local and so we name it
ieee802154_local.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:37 +02:00
Alexander Aring
5a50439775 ieee802154: rename ieee802154_dev to ieee802154_hw
The identical struct of the wireless stack implementation is named
ieee80211_hw. This is useful to name the variable hw instead of get
confusing with netdev dev variable.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:37 +02:00
Alexander Aring
4ca24aca55 ieee802154: move ieee802154 header
This patch moves the ieee802154 header into include/linux instead
include/net. Similar like wireless which have the ieee80211 header
inside of include/linux.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:39:57 +02:00
Alexander Aring
86d52cd964 ieee802154: move wpan-class.c to core.c
Like the wireless core.c file this file contains function for phy
allocation and freeing. Move this file to core.c to get similar
behaviour.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:39:56 +02:00
Alexander Aring
5ad60d3699 ieee802154: move wpan-phy.h to cfg802154.h
The wpan-phy header contains the wpan_phy struct information. Later this
header will be have similar function like cfg80211 header. The cfg80211
header contains the wiphy struct which is identically the wpan_phy
struct inside 802.15.4 subsystem.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:39:56 +02:00
Alexander Aring
15859a5e14 mac802154: move wpan.c to iface.c
The wpan.c file contains the interface handling functions now. It's similar
like the mac80211 iface.c file. This patch renames this file to iface.c to
have similar naming convention in mac802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:39:55 +02:00
Alexander Aring
0f1556bc2b mac802154: move mac802154.h to ieee802154_i.h
This patch moves the mac802154.h internal header to ieee802154_i.h like
the wireless stack ieee80211_i.h file. This avoids confusing with the
not internal header include/net/mac802154.h header. Additional we get
the same naming conversion like mac80211 for this file.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:39:55 +02:00
Alexander Aring
62eb01f5c2 mac802154: move ieee802154_dev.c to main.c
The ieee802154_dev functionality contains various function for
allocation and registration of an ieee802154_dev. This is equal to the
net/mac80211/main.c file. This patch rename the ieee802154_dev.c to
main.c to have the same behaviour.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:39:54 +02:00
Johan Hedberg
c6992e9ef2 Bluetooth: Add self-tests for SMP crypto functions
This patch adds self-tests for the c1 and s1 crypto functions used for
SMP pairing. The data used is the sample data from the core
specification.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:33:57 +02:00
Johan Hedberg
4cd3362da8 Bluetooth: Add skeleton for SMP self-tests
This patch adds a basic skeleton for SMP self-tests. The tests are put
behind a new configuration option since running them will slow down the
boot process. For now there are no actual tests defined but those will
come in a subsequent patch.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:33:56 +02:00
Johan Hedberg
e491eaf3c0 Bluetooth: Pass only crypto context to SMP crypto functions
In order to make unit testing possible we need to make the SMP crypto
functions only take the crypto context instead of the full SMP context
(the latter would require having hci_dev, hci_conn, l2cap_chan,
l2cap_conn, etc around). The drawback is that we no-longer get the
involved hdev in the debug logs, but this is really the only way to make
simple unit tests for the code.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:33:56 +02:00
Fabian Frederick
4f639edef7 Bluetooth: fix shadow warning in hci_disconnect()
use clkoff_cp for hci_cp_read_clock_offset instead of cp
(already defined above).

Suggested-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 18:53:39 +02:00
Alexander Aring
ed1da14817 ieee802154: wpan-class: fix trailing semicolon
This patch removes an unnecessary tailing semicolon after macro
define. Otherwise we get a trailing semicolon while using this macro.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 08:07:30 +02:00
Alexander Aring
57205c14ca mac802154: fix typo IEEE802515 to IEEE802154
This patch fixs a typo in address filter defines from IEEE802515 to
IEEE802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 08:07:30 +02:00
Alexander Aring
139f14adab ieee802154: ieee802154_dev: fix align typo
This patch fix a typo and fix align instead allign.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 08:07:30 +02:00
Alexander Aring
b3020f0a35 ieee802154: mac802154: remove FSF address
This patch removes the FSF address in files which belongs to ieee802154
and mac802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 08:07:30 +02:00
Martin Townsend
ee93053d56 Bluetooth: Fix missing channel unlock in l2cap_le_credits
In the error case where credits is greater than max_credits there
is a missing l2cap_chan_unlock before returning.

Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Tested-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 07:56:25 +02:00
Martin Townsend
11e3ff7072 6lowpan: Use skb_cow in IPHC decompression.
Currently there are potentially 2 skb_copy_expand calls in IPHC
decompression.  This patch replaces this with one call to
skb_cow which will check to see if there is enough headroom
first to ensure it's only done if necessary and will handle
alignment issues for cache.
As skb_cow uses pskb_expand_head we ensure the skb isn't shared from
bluetooth and ieee802.15.4 code that use the IPHC decompression.

Signed-off-by: Martin Townsend <martin.townsend@xsilon.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 07:56:25 +02:00
Li RongQing
4456c50d23 Bluetooth: 6lowpan: remove unnecessary codes in give_skb_to_upper
netif_rx() only returns NET_RX_DROP and NET_RX_SUCCESS, not returns
negative value

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 07:56:25 +02:00
Szymon Janc
15346a9c28 Bluetooth: Improve RFCOMM __test_pf macro robustness
Value returned by this macro might be used as bit value so it should
return either 0 or 1 to avoid possible bugs (similar to NSC bug)
when shifting it.

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-25 07:56:24 +02:00
Szymon Janc
ec511545ef Bluetooth: Fix RFCOMM NSC response
rfcomm_send_nsc expects CR to be either 0 or 1 since it is later
passed to __mcc_type macro and shitfed. Unfortunatelly CR extracted
from received frame type was not sanitized and shifted value was passed
resulting in bogus response.

Note: shifted value was also passed to other functions but was used
only in if satements so this bug appears only for NSC case.

The CR bit in the value octet shall be set to the same value
as the CR bit in the type field octet of the not supported command
frame but the CR bit for NCS response should be set to 0 since it is
always a response.

This was affecting TC_RFC_BV_25_C PTS qualification test.

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-25 07:56:24 +02:00
Alfonso Acosta
89cbb0638e Bluetooth: Defer connection-parameter removal when unpairing
Systematically removing the LE connection parameters and autoconnect
action is inconvenient for rebonding without disconnecting from
userland (i.e. unpairing followed by repairing without
disconnecting). The parameters will be lost after unparing and
userland needs to take care of book-keeping them and re-adding them.

This patch allows userland to forget about parameter management when
rebonding without disconnecting. It defers clearing the connection
parameters when unparing without disconnecting, giving a chance of
keeping the parameters if a repairing happens before the connection is
closed.

Signed-off-by: Alfonso Acosta <fons@spotify.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-25 07:56:24 +02:00
Alexander Aring
c37a8106de ieee802154: 6lowpan: add RTNL assertion
This patch ensure that the rtnl lock is hold while newlink callback.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 07:56:24 +02:00
Alexander Aring
1ae2605e55 ieee802154: 6lowpan: improve packet registration
This patch improves the packet registration handling. Instead of
registration with module init we have a open count variable and
registration the lowpan packet handler when it's needed.

The open count variable should be protected by RTNL.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 07:56:24 +02:00
Alfonso Acosta
ddbea5cff7 Bluetooth: Remove redundant check on hci_conn's device class
NULL-checking conn->dev_class is pointless since the variable is
defined as an array, i.e. it will always be non-NULL.

Signed-off-by: Alfonso Acosta <fons@spotify.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-25 07:56:24 +02:00
Alfonso Acosta
fd45ada910 Bluetooth: Include ADV_IND report in Device Connected event
There are scenarios when autoconnecting to a device after the
reception of an ADV_IND report (action 0x02), in which userland
might want to examine the report's contents.

For instance, the Service Data might have changed and it would be
useful to know ahead of time before starting any GATT procedures.
Also, the ADV_IND may contain Manufacturer Specific data which would
be lost if not propagated to userland. In fact, this patch results
from the need to rebond with a device lacking persistent storage which
notifies about losing its LTK in ADV_IND reports.

This patch appends the ADV_IND report which triggered the
autoconnection to the EIR Data in the Device Connected event.

Signed-off-by: Alfonso Acosta <fons@spotify.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-25 07:56:24 +02:00
Alfonso Acosta
48ec92fa4f Bluetooth: Refactor arguments of mgmt_device_connected
The values of a lot of the mgmt_device_connected() parameters come
straight from a hci_conn object. We can simplify the function by passing
the full hci_conn pointer to it.

Signed-off-by: Alfonso Acosta <fons@spotify.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-25 07:56:23 +02:00
Alexander Aring
c0bffc7ddc ieee802154: 6lowpan: fix sign of errno return val
This patch fix ERR_PTR(-rc) to ERR_PTR(rc). The variable rc is already a
negative errno value.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 07:56:22 +02:00
Alexander Aring
f870b8c631 ieee802154: reassembly: fix tag byteorder
This patch fix byte order handling in reassembly code of 802.15.4
6LoWPAN fragmentation handling.

net/ieee802154/reassembly.c:58:43: warning: restricted __be16 degrades to integer

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 07:56:22 +02:00
Alexander Aring
cd97a713ac ieee802154: 6lowpan: fix byteorder for frag tag
This patch fix byteorder issues with fragment tag of generation 802.15.4
6LoWPAN fragment header.

net/ieee802154/6lowpan_rtnl.c:278:54: warning restricted __be16 degrades to integer
net/ieee802154/6lowpan_rtnl.c:278:18: warning: incorrect type in assignment (different base types)
net/ieee802154/6lowpan_rtnl.c:278:18:    expected restricted __be16 [usertype] frag_tag
net/ieee802154/6lowpan_rtnl.c:278:18:    got unsigned short

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 07:56:22 +02:00
Simon Vincent
39f6eb19cf ieee802154: 6lowpan: Drop PACKET_OTHERHOST skbs in 6lowpan
There is no point processing pkts which are PACKET_OTHERHOST
in 6lowpan as they are discarded as soon as they reach the
ipv6 layer. Therefore we should drop them in the 6lowpan layer.

Signed-off-by: Simon Vincent <simon.vincent@xsilon.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 07:56:21 +02:00
Julien Catalano
ee4c148e8a trivial: net/mac802154: Fix Kconfig typo
Signed-off-by: Julien Catalano <julien.catalano@gmail.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 07:56:21 +02:00
Fabian Frederick
74bca138e1 net: llc: include linux/errno.h instead of asm/errno.h
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24 15:51:42 -04:00
Fabian Frederick
75da1469f9 lapb: move EXPORT_SYMBOL after functions.
See Documentation/CodingStyle Chapter 6

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24 15:51:42 -04:00
Houcheng Lin
b51d3fa364 netfilter: nf_log: release skbuff on nlmsg put failure
The kernel should reserve enough room in the skb so that the DONE
message can always be appended.  However, in case of e.g. new attribute
erronously not being size-accounted for, __nfulnl_send() will still
try to put next nlmsg into this full skbuf, causing the skb to be stuck
forever and blocking delivery of further messages.

Fix issue by releasing skb immediately after nlmsg_put error and
WARN() so we can track down the cause of such size mismatch.

[ fw@strlen.de: add tailroom/len info to WARN ]

Signed-off-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-24 14:34:11 +02:00
Florian Westphal
c1e7dc91ee netfilter: nfnetlink_log: fix maximum packet length logged to userspace
don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work.
The nla length includes the size of the nla struct, so anything larger
results in u16 integer overflow.

This patch is similar to
9cefbbc9c8 (netfilter: nfnetlink_queue: cleanup copy_range usage).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-24 14:32:27 +02:00
Florian Westphal
9dfa1dfe4d netfilter: nf_log: account for size of NLMSG_DONE attribute
We currently neither account for the nlattr size, nor do we consider
the size of the trailing NLMSG_DONE when allocating nlmsg skb.

This can result in nflog to stop working, as __nfulnl_send() re-tries
sending forever if it failed to append NLMSG_DONE (which will never
work if buffer is not large enough).

Reported-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-24 14:30:15 +02:00
Herbert Xu
7677e86843 bridge: Do not compile options in br_parse_ip_options
Commit 462fb2af97

	bridge : Sanitize skb before it enters the IP stack

broke when IP options are actually used because it mangles the
skb as if it entered the IP stack which is wrong because the
bridge is supposed to operate below the IP stack.

Since nobody has actually requested for parsing of IP options
this patch fixes it by simply reverting to the previous approach
of ignoring all IP options, i.e., zeroing the IPCB.

If and when somebody who uses IP options and actually needs them
to be parsed by the bridge complains then we can revisit this.

Reported-by: David Newall <davidn@davidnewall.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-24 14:24:03 +02:00
Martin KaFai Lau
367efcb932 ipv6: Avoid redoing fib6_lookup() with reachable = 0 by saving fn
This patch save the fn before doing rt6_backtrack.
Hence, without redo-ing the fib6_lookup(), saved_fn can be used
to redo rt6_select() with RT6_LOOKUP_F_REACHABLE off.

Some minor changes I think make sense to review as a single patch:
* Remove the 'out:' goto label.
* Remove the 'reachable' variable. Only use the 'strict' variable instead.

After this patch, "failing ip6_ins_rt()" should be the only case that
requires a redo of fib6_lookup().

Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24 00:14:39 -04:00
Martin KaFai Lau
94c77bb41d ipv6: Avoid redoing fib6_lookup() for RTF_CACHE hit case
When there is a RTF_CACHE hit, no need to redo fib6_lookup()
with reachable=0.

Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24 00:14:39 -04:00
Martin KaFai Lau
a3c00e46ef ipv6: Remove BACKTRACK macro
It is the prep work to reduce the number of calls to fib6_lookup().

The BACKTRACK macro could be hard-to-read and error-prone due to
its side effects (mainly goto).

This patch is to:
1. Replace BACKTRACK macro with a function (fib6_backtrack) with the following
   return values:
   * If it is backtrack-able, returns next fn for retry.
   * If it reaches the root, returns NULL.
2. The caller needs to decide if a backtrack is needed (by testing
   rt == net->ipv6.ip6_null_entry).
3. Rename the goto labels in ip6_pol_route() to make the next few
   patches easier to read.

Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24 00:14:39 -04:00
Kenjiro Nakayama
105970f608 net: Remove trailing whitespace in tcp.h icmp.c syncookies.c
Remove trailing whitespace in tcp.h icmp.c syncookies.c

Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24 00:13:10 -04:00
Arik Nemtsov
0fc1e0495f mac80211: expose API allowing station iteration
Allow drivers to iterate all stations currently uploaded to them.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-23 20:40:03 +02:00
Arik Nemtsov
2bad7748b3 mac80211: add stations in order to the station list
During reconfig the station list is traversed in order and station are
added back to the driver. Make sure the stations are added to the driver
in the same order they were added to mac80211.

This has a real side effect - some drivers (iwlwifi) require TDLS
stations to be added only after the AP station for the same network.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-23 20:40:02 +02:00
Arik Nemtsov
8b94148cfe mac80211: expose TDLS-initiator value to low level driver
Some drivers need to know which station is the TDLS link initiator.
Expose this value via the mac80211 ieee80211_sta structure.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-23 20:40:02 +02:00
Arik Nemtsov
452218d9fd mac80211: fix network header breakage during encryption
When an IV is generated, only the MAC header is moved back. The network
header location remains the same relative to the skb head, as the new IV
is using headroom space that was reserved in advance.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-23 20:40:01 +02:00
Andrei Otcheretianski
a7f3a76828 mac80211: export IE splitting function
Export ieee80211_ie_split function, so it can be reused by
drivers which need to insert additional elements.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-23 20:39:43 +02:00
Karl Beldan
8ec7886b1c mac80211: minstrel_ht: use group flags instead of index to display rates
When displaying a rate through debugfs minstrel_ht guesses its flags
comparing group indexes.  Since 3ec373c421 ("mac80211: minstrel_ht:
include type (cck/ht) in rates flag"), the rate flags of interest are
present in the mcs_group-s, so use it.
While improving the code, this also fixes a smatch false positive
"error: testing array offset 'i' after use" in minstrel_ht_stats_dump.
This warning only triggers after 9208247d74 ("mac80211: minstrel_ht:
add basic support for VHT rates <= 3SS@80MHz") with
CONFIG_MAC80211_RC_MINSTREL_VHT unset because then MINSTREL_VHT_GROUP_0
is above MINSTREL_GROUPS_NB and smatch only barks when the "testing
array offset" seems to prevent possible out of bonds accesses (which
does not happen here since i < ARRAY_SIZE(mi->groups)).

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-23 20:36:13 +02:00
J. Bruce Fields
280caac078 rpc: change comments to assertions
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-10-23 14:05:11 -04:00
J. Bruce Fields
ed38c06998 RPC: remove unneeded checks from xdr_truncate_encode()
Thanks to Andrea Arcangeli for pointing out these checks are
obviously unnecessary given the preceding calculations.

Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-10-23 14:05:11 -04:00
Sathya Perla
9e7ceb0607 net: fix saving TX flow hash in sock for outgoing connections
The commit "net: Save TX flow hash in sock and set in skbuf on xmit"
introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate
and record flow hash(sk_txhash) in the socket structure. sk_txhash is used
to set skb->hash which is used to spread flows across multiple TXQs.

But, the above routines are invoked before the source port of the connection
is created. Because of this all outgoing connections that just differ in the
source port get hashed into the same TXQ.

This patch fixes this problem for IPv4/6 by invoking the the above routines
after the source port is available for the socket.

Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit")

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-22 16:14:29 -04:00
Li RongQing
789f202326 xfrm6: fix a potential use after free in xfrm6_policy.c
pskb_may_pull() maybe change skb->data and make nh and exthdr pointer
oboslete, so recompute the nd and exthdr

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-22 15:38:48 -04:00
Karl Beldan
a63ba13eec net: tso: fix unaligned access to crafted TCP header in helper API
The crafted header start address is from a driver supplied buffer, which
one can reasonably expect to be aligned on a 4-bytes boundary.
However ATM the TSO helper API is only used by ethernet drivers and
the tcp header will then be aligned to a 2-bytes only boundary from the
header start address.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-22 12:52:55 -04:00
Sabrina Dubroca
c123bb7163 netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocation
alloc_percpu returns NULL on failure, not a negative error code.

Fixes: ff3cd7b3c9 ("netfilter: nf_tables: refactor chain statistic routines")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-22 14:12:51 +02:00
Dan Carpenter
0f9f5e1b83 netfilter: ipset: off by one in ip_set_nfnl_get_byindex()
The ->ip_set_list[] array is initialized in ip_set_net_init() and it
has ->ip_set_max elements so this check should be >= instead of >
otherwise we are off by one.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-22 14:12:50 +02:00
Marcelo Leitner
e37ad9fd63 netfilter: nf_conntrack: allow server to become a client in TW handling
When a port that was used to listen for inbound connections gets closed
and reused for outgoing connections (like rsh ends up doing for stderr
flow), current we may reject the SYN/ACK packet for the new connection
because tcp_conntracks states forbirds a port to become a client while
there is still a TIME_WAIT entry in there for it.

As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout
for it is 120s, there is a ~60s window that the application can end up
opening a port that conntrack will end up blocking.

This patch fixes this by simply allowing such state transition: if we
see a SYN, in TIME_WAIT state, on REPLY direction, move it to sSS. Note
that the rest of the code already handles this situation, more
specificly in tcp_packet(), first switch clause.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-22 14:12:50 +02:00
Johannes Berg
4619194a49 mac80211: don't remove tainted keys after not programming
When a key is tainted during resume, it is no longer programmed
into the device; however, it's uploaded flag may (will) be set.
Clear the flag when not programming it because it's tainted to
avoid attempting to remove it again later.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-22 11:30:30 +02:00
Johannes Berg
02219b3abc mac80211: add WMM admission control support
Use the currently existing APIs between mac80211 and the low
level driver to implement WMM admission control.

The low level driver needs to report the media time used by
each transmitted packet in ieee80211_tx_status. Based on that
information, mac80211 will modify the QoS parameters of the
admission controlled Access Category when the limit is
reached. Once the original QoS parameters can be restored,
mac80211 will do so.

One issue with this approach is that management frames will
also erroneously be downgraded, but the upside is that the
implementation is simple. In the future, it can be extended
to driver- or device-based implementations that are better.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-22 10:42:09 +02:00
Johannes Berg
f409079bb6 mac80211: sanity check CW_min/CW_max towards driver
There's no reason to ever set invalid CW_min/CW_max to the
drivers, we should catch it in higher layers. However, the
consequences of setting it wrong can be quite severe, so
double-check at a low level and error out for invalid data.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-22 10:42:09 +02:00
Johannes Berg
723e73acd1 cfg80211: make WMM TSPEC support flag an nl80211 feature flag
During the review of the corresponding wpa_supplicant patches we
noticed that the only way for it to detect that this functionality
is supported currently is to check for the command support. This
can be misleading though, as the command was also designed to, in
the future, support pure 802.11 TSPECs.

Expose the WMM-TSPEC feature flag to nl80211 so later we can also
expose an 802.11-TSPEC feature flag (if needed) to differentiate
the two cases.

Note: this change isn't needed in 3.18 as there's no driver there
yet that supports the functionality at all.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-22 10:41:49 +02:00
Sabrina Dubroca
7c1c97d54f net: sched: initialize bstats syncp
Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize syncp.

Fixes: 22e0f8b932 "net: sched: make bstats per cpu and estimator RCU safe"
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-21 21:45:21 -04:00
Thomas Graf
78fd1d0ab0 netlink: Re-add locking to netlink_lookup() and seq walker
The synchronize_rcu() in netlink_release() introduces unacceptable
latency. Reintroduce minimal lookup so we can drop the
synchronize_rcu() until socket destruction has been RCUfied.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-21 21:34:49 -04:00
Ying Xue
1a194c2d59 tipc: fix lockdep warning when intra-node messages are delivered
When running tipcTC&tipcTS test suite, below lockdep unsafe locking
scenario is reported:

[ 1109.997854]
[ 1109.997988] =================================
[ 1109.998290] [ INFO: inconsistent lock state ]
[ 1109.998575] 3.17.0-rc1+ #113 Not tainted
[ 1109.998762] ---------------------------------
[ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 1109.998762]  (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762] {SOFTIRQ-ON-W} state was registered at:
[ 1109.998762]   [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80
[ 1109.998762]   [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762]   [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762]   [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]   [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc]
[ 1109.998762]   [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc]
[ 1109.998762]   [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc]
[ 1109.998762]   [<ffffffff817676ee>] SYSC_connect+0xae/0xc0
[ 1109.998762]   [<ffffffff81767b7e>] SyS_connect+0xe/0x10
[ 1109.998762]   [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200
[ 1109.998762]   [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f
[ 1109.998762] irq event stamp: 241060
[ 1109.998762] hardirqs last  enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0
[ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0
[ 1109.998762] softirqs last  enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50
[ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]
[ 1109.998762] other info that might help us debug this:
[ 1109.998762]  Possible unsafe locking scenario:
[ 1109.998762]
[ 1109.998762]        CPU0
[ 1109.998762]        ----
[ 1109.998762]   lock(slock-AF_TIPC);
[ 1109.998762]   <Interrupt>
[ 1109.998762]     lock(slock-AF_TIPC);
[ 1109.998762]
[ 1109.998762]  *** DEADLOCK ***
[ 1109.998762]
[ 1109.998762] 2 locks held by swapper/7/0:
[ 1109.998762]  #0:  (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70
[ 1109.998762]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc]
[ 1109.998762]
[ 1109.998762] stack backtrace:
[ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113
[ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 1109.998762]  ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007
[ 1109.998762]  ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001
[ 1109.998762]  ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000
[ 1109.998762] Call Trace:
[ 1109.998762]  <IRQ>  [<ffffffff81a209eb>] dump_stack+0x4e/0x68
[ 1109.998762]  [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202
[ 1109.998762]  [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50
[ 1109.998762]  [<ffffffff810a406c>] mark_lock+0x28c/0x2f0
[ 1109.998762]  [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0
[ 1109.998762]  [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80
[ 1109.998762]  [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10
[ 1109.998762]  [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0
[ 1109.998762]  [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
[ 1109.998762]  [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0
[ 1109.998762]  [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
[ 1109.998762]  [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
[ 1109.998762]  [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762]  [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0
[ 1109.998762]  [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762]  [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
[ 1109.998762]  [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc]
[ 1109.998762]  [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc]
[ 1109.998762]  [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc]
[ 1109.998762]  [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70
[ 1109.998762]  [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70
[ 1109.998762]  [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0
[ 1109.998762]  [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70
[ 1109.998762]  [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0
[ 1109.998762]  [<ffffffff81785518>] napi_gro_receive+0xd8/0x240
[ 1109.998762]  [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530
[ 1109.998762]  [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0
[ 1109.998762]  [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
[ 1109.998762]  [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
[ 1109.998762]  [<ffffffff817842b1>] net_rx_action+0x141/0x310
[ 1109.998762]  [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150
[ 1109.998762]  [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0
[ 1109.998762]  [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]  [<ffffffff81a30d07>] do_IRQ+0x67/0x110
[ 1109.998762]  [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f
[ 1109.998762]  <EOI>  [<ffffffff8100d2b7>] ? default_idle+0x37/0x250
[ 1109.998762]  [<ffffffff8100d2b5>] ? default_idle+0x35/0x250
[ 1109.998762]  [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20
[ 1109.998762]  [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0
[ 1109.998762]  [<ffffffff81034c78>] start_secondary+0x188/0x1f0

When intra-node messages are delivered from one process to another
process, tipc_link_xmit() doesn't disable BH before it directly calls
tipc_sk_rcv() on process context to forward messages to destination
socket. Meanwhile, if messages delivered by remote node arrive at the
node and their destinations are also the same socket, tipc_sk_rcv()
running on process context might be preempted by tipc_sk_rcv() running
BH context. As a result, the latter cannot obtain the socket lock as
the lock was obtained by the former, however, the former has no chance
to be run as the latter is owning the CPU now, so headlock happens. To
avoid it, BH should be always disabled in tipc_sk_rcv().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-21 15:28:15 -04:00
Ying Xue
7b8613e0a1 tipc: fix a potential deadlock
Locking dependency detected below possible unsafe locking scenario:

           CPU0                          CPU1
T0:  tipc_named_rcv()                tipc_rcv()
T1:  [grab nametble write lock]*     [grab node lock]*
T2:  tipc_update_nametbl()           tipc_node_link_up()
T3:  tipc_nodesub_subscribe()        tipc_nametbl_publish()
T4:  [grab node lock]*               [grab nametble write lock]*

The opposite order of holding nametbl write lock and node lock on
above two different paths may result in a deadlock. If we move the
the updating of the name table after link state named out of node
lock, the reverse order of holding locks will be eliminated, and
as a result, the deadlock risk.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-21 15:28:15 -04:00
Fabian Frederick
5c6761adc7 mac80211: remove unnecessary null test before debugfs_remove()
The debugfs_remove() function can safely take NULL parameters
so the additionally null test isn't required, and there's no
other reason to have it here, so remove it.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
[rewrite commit message, re-introduce blank line after assert]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-21 21:08:10 +02:00
Karl Beldan
9208247d74 mac80211: minstrel_ht: add basic support for VHT rates <= 3SS@80MHz
When the new CONFIG_MAC80211_RC_MINSTREL_VHT is not set (default 'N'),
there is no behavioral change including in sampling and MCS_GROUP_RATES
remains 8.
Otherwise MCS_GROUP_RATES is 10, and a module parameter *vht_only*
(default 'true'), restricts the rates selection to VHT when VHT is
supported.

Regarding the debugfs stats buffer:
It is explicitly increased from 8k to 32k to fit every rates incl. when
both HT and VHT rates are enabled, as for the format, before:
type           rate     tpt eprob *prob ret  *ok(*cum)        ok(      cum)
HT20/LGI ABCDP MCS0     0.0   0.0   0.0   1    0(   0)         0(        0)
after:
 type           rate      tpt eprob *prob ret  *ok(*cum)        ok(      cum)
 HT20/LGI ABCDP MCS0      0.0   0.0   0.0   1    0(   0)         0(        0)
VHT40/LGI       MCS5/2    0.0   0.0   0.0   0    0(   0)         0(        0)

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-21 13:25:26 +02:00
Karl Beldan
3ec373c421 mac80211: minstrel_ht: include type (cck/ht) in rates flag
ATM, we grep cck rates idx with idx / MCS_GROUP_RATES ==
MINSTREL_CCK_GROUP.
Matching neither-cck-non-ht rates could be done by replacing '==' with
'>', however it would be less versatile or explicit.
This will allow to match VHT rates with IEEE80211_TX_RC_VHT_MCS.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-20 21:39:35 +02:00
Karl Beldan
8a0ee4fe19 mac80211: minstrel_ht: macros adjustments for future VHT_GROUPs
No functional change.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-20 21:39:35 +02:00
Karl Beldan
d4d141cae8 mac80211: minstrel_ht: Increase the range of handled rate indexes
Since 5935839ad7 ("mac80211: improve minstrel_ht rate sorting by
throughput & probability"), the rate indexes are manipulated via u8's
and hence allow for a maximum of 256 mcs_group entries in
minstrel_mcs_groups.

ATM, minstrel_ht advertizes support up to 3HTSS@40MHz, consuming:
8(MCS_GROUP_RATES) * (3(SS)*2(GI)*2(BW)+1(CCK)), i.e. 104 entries.

Support for 3VHTSS@80MHz will require:
10(MCS_GROUP_RATES) * (3(SS)*2(GI)*2(BW)+1(CCK)) +
10(MCS_GROUP_RATES) * (3(SS)*2(GI)*3(BW)), i.e. 130 + 180 entries.

This change moves from u8s to u16s where necessary.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-20 21:39:35 +02:00
Johannes Berg
8fa74e3aa6 Merge branch 'mac80211' into mac80211-next
This was needed to avoid conflicts in the minstrel changes.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-20 21:39:29 +02:00
Johannes Berg
b08cc24e0a mac80211: fix change flags variable signedness
This showed up as a sparse warning (with higher verbosity) and is
certainly correct - the change flags should be unsigned. It's not
that important since high flag numbers aren't used and bitwise
operations would still work.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-20 21:36:55 +02:00
Florian Westphal
f993bc25e5 net: core: handle encapsulation offloads when computing segment lengths
if ->encapsulation is set we have to use inner_tcp_hdrlen and add the
size of the inner network headers too.

This is 'mostly harmless'; tbf might send skb that is slightly over
quota or drop skb even if it would have fit.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-20 12:38:13 -04:00
Florian Westphal
330966e501 net: make skb_gso_segment error handling more robust
skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL.  This can happen when GSO is used for header verification.

However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.

Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.

However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.

It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-20 12:38:13 -04:00
Florian Westphal
1e16aa3ddf net: gso: use feature flag argument in all protocol gso handlers
skb_gso_segment() has a 'features' argument representing offload features
available to the output path.

A few handlers, e.g. GRE, instead re-fetch the features of skb->dev and use
those instead of the provided ones when handing encapsulation/tunnels.

Depending on dev->hw_enc_features of the output device skb_gso_segment() can
then return NULL even when the caller has disabled all GSO feature bits,
as segmentation of inner header thinks device will take care of segmentation.

This e.g. affects the tbf scheduler, which will silently drop GRE-encap GSO skbs
that did not fit the remaining token quota as the segmentation does not work
when device supports corresponding hw offload capabilities.

Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-20 12:38:12 -04:00
David S. Miller
ce8ec48967 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter fixes for net

The following patchset contains netfilter fixes for your net tree,
they are:

1) Fix missing MODULE_LICENSE() in the new nf_reject_ipv{4,6} modules.

2) Restrict nat and masq expressions to the nat chain type. Otherwise,
   users may crash their kernel if they attach a nat/masq rule to a non
   nat chain.

3) Fix hook validation in nft_compat when non-base chains are used.
   Basically, initialize hook_mask to zero.

4) Make sure you use match/targets in nft_compat from the right chain
   type. The existing validation relies on the table name which can be
   avoided by

5) Better netlink attribute validation in nft_nat. This expression has
   to reject the configuration when no address and proto configurations
   are specified.

6) Interpret NFTA_NAT_REG_*_MAX if only if NFTA_NAT_REG_*_MIN is set.
   Yet another sanity check to reject incorrect configurations from
   userspace.

7) Conditional NAT attribute dumping depending on the existing
   configuration.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-20 11:57:47 -04:00
Jouni Malinen
988568669d cfg80211: Specify frame and reason code for NL80211_CMD_DEL_STATION
The optional NL80211_ATTR_MGMT_SUBTYPE and NL80211_ATTR_REASON_CODE
attributes can now be included in NL80211_CMD_DEL_STATION to indicate to
the driver which frame (Deauthentication/Disassociation) and reason code
in that frame should be used to indicate removal to the specific
station. This is used by drivers that implement AP SME and generate
those frames internally.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-20 16:39:23 +02:00
Karl Beldan
11b2357d5d mac80211: minstrels: fix buffer overflow in HT debugfs rc_stats
ATM an HT rc_stats line is 106 chars.
Times 8(MCS_GROUP_RATES)*3(SS)*2(GI)*2(BW) + CCK(4), i.e. x100, this is
well above the current 8192 - sizeof(*ms) currently allocated.

Fix this by squeezing the output as follows (not that we're short on
memory but this also improves readability and range, the new format adds
one more digit to *ok/*cum and ok/cum):

- Before (HT) (106 ch):
type           rate     throughput  ewma prob   this prob  retry   this succ/attempt   success    attempts
CCK/LP          5.5M           0.0        0.0         0.0      0              0(  0)         0           0
HT20/LGI ABCDP MCS0            0.0        0.0         0.0      1              0(  0)         0           0
- After (75 ch):
type           rate     tpt eprob *prob ret  *ok(*cum)        ok(      cum)
CCK/LP          5.5M    0.0   0.0   0.0   0    0(   0)         0(        0)
HT20/LGI ABCDP MCS0     0.0   0.0   0.0   1    0(   0)         0(        0)

- Align non-HT format Before (non-HT) (83 ch):
rate      throughput  ewma prob  this prob  this succ/attempt   success    attempts
ABCDP  6         0.0        0.0        0.0             0(  0)         0           0
      54         0.0        0.0        0.0             0(  0)         0           0
- After (61 ch):
rate          tpt eprob *prob  *ok(*cum)        ok(      cum)
ABCDP  1      0.0   0.0   0.0    0(   0)         0(        0)
      54      0.0   0.0   0.0    0(   0)         0(        0)

*This also adds dynamic checks for overflow, lowers the size of the
non-HT request (allowing > 30 entries) and replaces the buddy-rounded
allocations (s/sizeof(*ms) + 8192/8192).

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-20 16:37:01 +02:00
Jouni Malinen
89c771e5a6 cfg80211: Convert del_station() callback to use a param struct
This makes it easier to add new parameters for the del_station calls
without having to modify all drivers that use this.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-20 16:24:21 +02:00
Wolfram Sang
140bbc4af0 net: rfkill: drop owner assignment from platform_drivers
A platform_driver does not need to set an owner, it will be populated by the
driver core.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2014-10-20 16:21:58 +02:00
Wolfram Sang
0dd1153813 net: dsa: drop owner assignment from platform_drivers
A platform_driver does not need to set an owner, it will be populated by the
driver core.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2014-10-20 16:21:58 +02:00
Linus Torvalds
e25b492741 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A quick batch of bug fixes:

  1) Fix build with IPV6 disabled, from Eric Dumazet.

  2) Several more cases of caching SKB data pointers across calls to
     pskb_may_pull(), thus referencing potentially free'd memory.  From
     Li RongQing.

  3) DSA phy code tests operation presence improperly, instead of going:

        if (x->ops->foo)
                r = x->ops->foo(args);

     it was going:

        if (x->ops->foo(args))
                r = x->ops->foo(args);

   Fix from Andew Lunn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  Net: DSA: Fix checking for get_phy_flags function
  ipv6: fix a potential use after free in sit.c
  ipv6: fix a potential use after free in ip6_offload.c
  ipv4: fix a potential use after free in gre_offload.c
  tcp: fix build error if IPv6 is not enabled
2014-10-19 11:41:57 -07:00
Andrew Lunn
228b16cb13 Net: DSA: Fix checking for get_phy_flags function
The check for the presence or not of the optional switch function
get_phy_flags() called the function, rather than checked to see if it
is a NULL pointer. This causes a derefernce of a NULL pointer on all
switch chips except the sf2, the only switch to implement this call.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 6819563e64 ("net: dsa: allow switch drivers to specify phy_device::dev_flags")
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-19 12:46:31 -04:00
Linus Torvalds
0e6e58f941 One cc: stable commit, the rest are a series of minor cleanups which have
been sitting in MST's tree during my vacation.  I changed a function name
 and made one trivial change, then they spent two days in linux-next.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUQFBQAAoJENkgDmzRrbjxJRIP/1yCQRElQewxURSmJelyqCdU
 0mHYB0R9Mf3tfre1xnofqs2lWeSMc/4ptKHsVR6pupoztSwnz7HsLHfEFvFJh4mj
 KsaqYElxkNxTcfyHwLjyJS0/J6tG1tYypXGiimTBS0bvFHL3XZdimVgJ6WvX+gO7
 YSaDEX8/EqCERafslS5+gKJlz3drDOnCZCe9y4BDSmsvl2k7bkpSxIn8vsR6jIC0
 c5JpUy6QVF+3XA/J932M7yRs+xpqxNoUWiyY3ar9o3CtQAaQB0ZAetSxY6hTfvVc
 GlNFzCifdsaQwsl2SVsE2h6tWaRhtMtcGWQuhHThIPyIf8XxhYyBRY2FLo70LMz1
 eqtwy6F/Bg/nzUsdee4PZBMeoKHlAEL12RpsEKgfUoLzj16Aqa8ll+Agbglbkw8G
 f3d2FwzKAlpY5NwHETC1wYy52PJ3efqksRWuhokmYpxNSbHJS/lsiJOE7272/4Qr
 MtXuvRmo22tf34XFd5y7zqWjgZ58eeFOqQWi/K+6ZgpqVOvikjrXXKEuiVdjO0ZD
 kTVR/sQKiR+79rzENk80XBhWaMveECNXF1TiZ/3MmURkmEOBRQMxRQ20BX3exvna
 AJ/WVA5DcfXZc1yyqknE1NLGrvSBMJENH13x2QPwrqNWAryOOKuF1VKKIwWlDw5j
 vtx5nXiJa8YYdxI2TJCN
 =JK6x
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "One cc: stable commit, the rest are a series of minor cleanups which
  have been sitting in MST's tree during my vacation.  I changed a
  function name and made one trivial change, then they spent two days in
  linux-next"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
  virtio-rng: refactor probe error handling
  virtio_scsi: drop scan callback
  virtio_balloon: enable VQs early on restore
  virtio_scsi: fix race on device removal
  virito_scsi: use freezable WQ for events
  virtio_net: enable VQs early on restore
  virtio_console: enable VQs early on restore
  virtio_scsi: enable VQs early on restore
  virtio_blk: enable VQs early on restore
  virtio_scsi: move kick event out from virtscsi_init
  virtio_net: fix use after free on allocation failure
  9p/trans_virtio: enable VQs early
  virtio_console: enable VQs early
  virtio_blk: enable VQs early
  virtio_net: enable VQs early
  virtio: add API to enable VQs early
  virtio_net: minor cleanup
  virtio-net: drop config_mutex
  virtio_net: drop config_enable
  virtio-blk: drop config_mutex
  ...
2014-10-18 10:25:09 -07:00
Li RongQing
a6d4518da3 ipv6: fix a potential use after free in sit.c
pskb_may_pull() maybe change skb->data and make iph pointer oboslete,
fix it by geting ip header length directly.

Fixes: ca15a078 (sit: generate icmpv6 error when receiving icmpv4 error)
Cc: Oussama Ghorbel <ghorbel@pivasoftware.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-18 13:04:09 -04:00
Li RongQing
fc6fb41cd6 ipv6: fix a potential use after free in ip6_offload.c
pskb_may_pull() maybe change skb->data and make opth pointer oboslete,
so set the opth again

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-18 13:04:08 -04:00
Li RongQing
b4e3cef703 ipv4: fix a potential use after free in gre_offload.c
pskb_may_pull() may change skb->data and make greh pointer oboslete;
so need to reassign greh;
but since first calling pskb_may_pull already ensured that skb->data
has enough space for greh, so move the reference of greh before second
calling pskb_may_pull(), to avoid reassign greh.

Fixes: 7a7ffbabf9("ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC")
Cc: Wei-Chun Chao <weichunc@plumgrid.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-18 13:04:08 -04:00
Linus Torvalds
2e923b0251 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Include fixes for netrom and dsa (Fabian Frederick and Florian
    Fainelli)

 2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO.

 3) Several SKB use after free fixes (vxlan, openvswitch, vxlan,
    ip_tunnel, fou), from Li ROngQing.

 4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy.

 5) Use after free in virtio_net, from Michael S Tsirkin.

 6) Fix flow mask handling for megaflows in openvswitch, from Pravin B
    Shelar.

 7) ISDN gigaset and capi bug fixes from Tilman Schmidt.

 8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin.

 9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov.

10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong
    Wang and Eric Dumazet.

11) Don't overwrite end of SKB when parsing malformed sctp ASCONF
    chunks, from Daniel Borkmann.

12) Don't call sock_kfree_s() with NULL pointers, this function also has
    the side effect of adjusting the socket memory usage.  From Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
  bna: fix skb->truesize underestimation
  net: dsa: add includes for ethtool and phy_fixed definitions
  openvswitch: Set flow-key members.
  netrom: use linux/uaccess.h
  dsa: Fix conversion from host device to mii bus
  tipc: fix bug in bundled buffer reception
  ipv6: introduce tcp_v6_iif()
  sfc: add support for skb->xmit_more
  r8152: return -EBUSY for runtime suspend
  ipv4: fix a potential use after free in fou.c
  ipv4: fix a potential use after free in ip_tunnel_core.c
  hyperv: Add handling of IP header with option field in netvsc_set_hash()
  openvswitch: Create right mask with disabled megaflows
  vxlan: fix a free after use
  openvswitch: fix a use after free
  ipv4: dst_entry leak in ip_send_unicast_reply()
  ipv4: clean up cookie_v4_check()
  ipv4: share tcp_v4_save_options() with cookie_v4_check()
  ipv4: call __ip_options_echo() in cookie_v4_check()
  atm: simplify lanai.c by using module_pci_driver
  ...
2014-10-18 09:31:37 -07:00
Pablo Neira Ayuso
1e2d56a5d3 netfilter: nft_nat: dump attributes if they are set
Dump NFTA_NAT_REG_ADDR_MIN if this is non-zero. Same thing with
NFTA_NAT_REG_PROTO_MIN.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-18 14:16:13 +02:00
Pablo Neira Ayuso
61cfac6b42 netfilter: nft_nat: NFTA_NAT_REG_ADDR_MAX depends on NFTA_NAT_REG_ADDR_MIN
Interpret NFTA_NAT_REG_ADDR_MAX if NFTA_NAT_REG_ADDR_MIN is present,
otherwise, skip it. Same thing with NFTA_NAT_REG_PROTO_MAX.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-18 14:16:12 +02:00
Pablo Neira Ayuso
5c819a3975 netfilter: nft_nat: insufficient attribute validation
We have to validate that we at least get an NFTA_NAT_REG_ADDR_MIN or
NFTA_NFT_REG_PROTO_MIN attribute. Reject the configuration if none
of them are present.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-18 14:16:11 +02:00
Pablo Neira Ayuso
f3f5ddeddd netfilter: nft_compat: validate chain type in match/target
We have to validate the real chain type to ensure that matches/targets
are not used out from their scope (eg. MASQUERADE in nat chain type).
The existing validation relies on the table name, but this is not
sufficient since userspace can fool us by using the appropriate table
name with a different chain type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-18 14:14:07 +02:00
Florian Fainelli
a28205437b net: dsa: add includes for ethtool and phy_fixed definitions
net/dsa/slave.c uses functions and structures declared in phy_fixed.h
but does not explicitely include it, while dsa.h needs structure
declarations for 'struct ethtool_wolinfo' and 'struct ethtool_eee', fix
those by including the correct header files.

Fixes: ec9436baed ("net: dsa: allow drivers to do link adjustment")
Fixes: ce31b31c68 ("net: dsa: allow updating fixed PHY link information")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 23:54:46 -04:00
Pravin B Shelar
25ef1328a0 openvswitch: Set flow-key members.
This patch adds missing memset which are required to initialize
flow key member. For example for IP flow we need to initialize
ip.frag for all cases.

Found by inspection.

This bug is introduced by commit 0714812134
("openvswitch: Eliminate memset() from flow_extract").

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 23:54:02 -04:00
Fabian Frederick
dc8e54165f netrom: use linux/uaccess.h
replace asm/uaccess.h by linux/uaccess.h

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 23:52:54 -04:00
Jon Paul Maloy
643566d4b4 tipc: fix bug in bundled buffer reception
In commit ec8a2e5621 ("tipc: same receive
code path for connection protocol and data messages") we omitted the
the possiblilty that an arriving message extracted from a bundle buffer
may be a multicast message. Such messages need to be to be delivered to
the socket via a separate function, tipc_sk_mcast_rcv(). As a result,
small multicast messages arriving as members of a bundle buffer will be
silently dropped.

This commit corrects the error by considering this case in the function
tipc_link_bundle_rcv().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 23:50:53 -04:00
Eric Dumazet
870c315138 ipv6: introduce tcp_v6_iif()
Commit 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line
misses") added a regression for SO_BINDTODEVICE on IPv6.

This is because we still use inet6_iif() which expects that IP6 control
block is still at the beginning of skb->cb[]

This patch adds tcp_v6_iif() helper and uses it where necessary.

Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif
parameter to it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 23:48:07 -04:00
Li RongQing
d8f00d2710 ipv4: fix a potential use after free in fou.c
pskb_may_pull() maybe change skb->data and make uh pointer oboslete,
so reload uh and guehdr

Fixes: 37dd0247 ("gue: Receive side for Generic UDP Encapsulation")
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 23:45:26 -04:00
Li RongQing
1245dfc8ca ipv4: fix a potential use after free in ip_tunnel_core.c
pskb_may_pull() maybe change skb->data and make eth pointer oboslete,
so set eth after pskb_may_pull()

Fixes:3d7b46cd("ip_tunnel: push generic protocol handling to ip_tunnel module")
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 23:45:26 -04:00
Pravin B Shelar
f47de068f6 openvswitch: Create right mask with disabled megaflows
If megaflows are disabled, the userspace does not send the netlink attribute
OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask.

sw_flow_mask_set() sets every bytes (in 'range') of the mask to 0xff, even the
bytes that represent padding for struct sw_flow, or the bytes that represent
fields that may not be set during ovs_flow_extract().
This is a problem, because when we extract a flow from a packet,
we do not memset() anymore the struct sw_flow to 0.

This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(),
which operates on the netlink attributes rather than on the mask key. Using
this approach we are sure that only the bytes that the user provided in the
flow are matched.

Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we
now return with an error.

This bug is introduced by commit 0714812134
("openvswitch: Eliminate memset() from flow_extract").

Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 16:49:34 -04:00
Li RongQing
389f48947a openvswitch: fix a use after free
pskb_may_pull() called by arphdr_ok can change skb->data, so put the arp
setting after arphdr_ok to avoid the use the freed memory

Fixes: 0714812134 ("openvswitch: Eliminate memset() from flow_extract.")
Cc: Jesse Gross <jesse@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 16:21:53 -04:00
Vasily Averin
4062090e3e ipv4: dst_entry leak in ip_send_unicast_reply()
ip_setup_cork() called inside ip_append_data() steals dst entry from rt to cork
and in case errors in __ip_append_data() nobody frees stolen dst entry

Fixes: 2e77d89b2f ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
Signed-off-by: Vasily Averin <vvs@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 15:30:12 -04:00
Cong Wang
461b74c391 ipv4: clean up cookie_v4_check()
We can retrieve opt from skb, no need to pass it as a parameter.
And opt should always be non-NULL, no need to check.

Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 12:02:57 -04:00
Cong Wang
e25f866fbc ipv4: share tcp_v4_save_options() with cookie_v4_check()
cookie_v4_check() allocates ip_options_rcu in the same way
with tcp_v4_save_options(), we can just make it a helper function.

Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 12:02:57 -04:00
Cong Wang
2077eebf7d ipv4: call __ip_options_echo() in cookie_v4_check()
commit 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
missed that cookie_v4_check() still calls ip_options_echo() which uses
IPCB(). It should use TCPCB() at TCP layer, so call __ip_options_echo()
instead.

Fixes: commit 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 12:02:57 -04:00
Fabian Frederick
4e8febd0a7 openvswitch: use vport instead of p
All functions used struct vport *vport except
ovs_vport_find_upcall_portid.

This fixes 1 kerneldoc warning

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15 23:25:33 -04:00
Fabian Frederick
7e78cc46b7 openvswitch: kerneldoc warning fix
s/sock/gs

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15 23:25:33 -04:00
Tom Herbert
04ffcb255f net: Add ndo_gso_check
Add ndo_gso_check which a device can define to indicate whether is
is capable of doing GSO on a packet. This funciton would be called from
the stack to determine whether software GSO is needed to be done. A
driver should populate this function if it advertises GSO types for
which there are combinations that it wouldn't be able to handle. For
instance a device that performs UDP tunneling might only implement
support for transparent Ethernet bridging type of inner packets
or might have limitations on lengths of inner headers.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15 12:11:00 -04:00
Linus Torvalds
0429fbc0bd Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
 "Way back, before the current percpu allocator was implemented, static
  and dynamic percpu memory areas were allocated and handled separately
  and had their own accessors.  The distinction has been gone for many
  years now; however, the now duplicate two sets of accessors remained
  with the pointer based ones - this_cpu_*() - evolving various other
  operations over time.  During the process, we also accumulated other
  inconsistent operations.

  This pull request contains Christoph's patches to clean up the
  duplicate accessor situation.  __get_cpu_var() uses are replaced with
  with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

  Unfortunately, the former sometimes is tricky thanks to C being a bit
  messy with the distinction between lvalues and pointers, which led to
  a rather ugly solution for cpumask_var_t involving the introduction of
  this_cpu_cpumask_var_ptr().

  This converts most of the uses but not all.  Christoph will follow up
  with the remaining conversions in this merge window and hopefully
  remove the obsolete accessors"

* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
  irqchip: Properly fetch the per cpu offset
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
  ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
  Revert "powerpc: Replace __get_cpu_var uses"
  percpu: Remove __this_cpu_ptr
  clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
  sparc: Replace __get_cpu_var uses
  avr32: Replace __get_cpu_var with __this_cpu_write
  blackfin: Replace __get_cpu_var uses
  tile: Use this_cpu_ptr() for hardware counters
  tile: Replace __get_cpu_var uses
  powerpc: Replace __get_cpu_var uses
  alpha: Replace __get_cpu_var
  ia64: Replace __get_cpu_var uses
  s390: cio driver &__get_cpu_var replacements
  s390: Replace __get_cpu_var uses
  mips: Replace __get_cpu_var uses
  MIPS: Replace __get_cpu_var uses in FPU emulator.
  arm: Replace __this_cpu_ptr with raw_cpu_ptr
  ...
2014-10-15 07:48:18 +02:00
Linus Torvalds
6b04908166 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "There is the long-awaited discard support for RBD (Guangliang Zhao,
  Josh Durgin), a pile of RBD bug fixes that didn't belong in late -rc's
  (Ilya Dryomov, Li RongQing), a pile of fs/ceph bug fixes and
  performance and debugging improvements (Yan, Zheng, John Spray), and a
  smattering of cleanups (Chao Yu, Fabian Frederick, Joe Perches)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
  ceph: fix divide-by-zero in __validate_layout()
  rbd: rbd workqueues need a resque worker
  libceph: ceph-msgr workqueue needs a resque worker
  ceph: fix bool assignments
  libceph: separate multiple ops with commas in debugfs output
  libceph: sync osd op definitions in rados.h
  libceph: remove redundant declaration
  ceph: additional debugfs output
  ceph: export ceph_session_state_name function
  ceph: include the initial ACL in create/mkdir/mknod MDS requests
  ceph: use pagelist to present MDS request data
  libceph: reference counting pagelist
  ceph: fix llistxattr on symlink
  ceph: send client metadata to MDS
  ceph: remove redundant code for max file size verification
  ceph: remove redundant io_iter_advance()
  ceph: move ceph_find_inode() outside the s_mutex
  ceph: request xattrs if xattr_version is zero
  rbd: set the remaining discard properties to enable support
  rbd: use helpers to handle discard for layered images correctly
  ...
2014-10-15 06:46:01 +02:00
Michael S. Tsirkin
64b4cc3911 9p/trans_virtio: enable VQs early
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, but virtio 9p device
adds self to channel list within probe, at which point VQ can be
used in violation of the spec.

To fix, call virtio_device_ready before using VQs.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:25:04 +10:30
Eric Dumazet
9b462d02d6 tcp: TCP Small Queues and strange attractors
TCP Small queues tries to keep number of packets in qdisc
as small as possible, and depends on a tasklet to feed following
packets at TX completion time.
Choice of tasklet was driven by latencies requirements.

Then, TCP stack tries to avoid reorders, by locking flows with
outstanding packets in qdisc in a given TX queue.

What can happen is that many flows get attracted by a low performing
TX queue, and cpu servicing TX completion has to feed packets for all of
them, making this cpu 100% busy in softirq mode.

This became particularly visible with latest skb->xmit_more support

Strategy adopted in this patch is to detect when tcp_wfree() is called
from ksoftirqd and let the outstanding queue for this flow being drained
before feeding additional packets, so that skb->ooo_okay can be set
to allow select_queue() to select the optimal queue :

Incoming ACKS are normally handled by different cpus, so this patch
gives more chance for these cpus to take over the burden of feeding
qdisc with future packets.

Tested:

lpaa23:~# ./super_netperf 1400 --google-pacing-rate 3028000 -H lpaa24 -l 3600 &

lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:18 AM      eth1 595448.00 1190564.00  38381.09 1760253.12      0.00      0.00      1.00
06:16:19 AM      eth1 594858.00 1189686.00  38340.76 1758952.72      0.00      0.00      0.00
06:16:20 AM      eth1 597017.00 1194019.00  38480.79 1765370.29      0.00      0.00      1.00
06:16:21 AM      eth1 595450.00 1190936.00  38380.19 1760805.05      0.00      0.00      0.00
06:16:22 AM      eth1 596385.00 1193096.00  38442.56 1763976.29      0.00      0.00      1.00
06:16:23 AM      eth1 598155.00 1195978.00  38552.97 1768264.60      0.00      0.00      0.00
06:16:24 AM      eth1 594405.00 1188643.00  38312.57 1757414.89      0.00      0.00      1.00
06:16:25 AM      eth1 593366.00 1187154.00  38252.16 1755195.83      0.00      0.00      0.00
06:16:26 AM      eth1 593188.00 1186118.00  38232.88 1753682.57      0.00      0.00      1.00
06:16:27 AM      eth1 596301.00 1192241.00  38440.94 1762733.09      0.00      0.00      0.00
Average:         eth1 595457.30 1190843.50  38381.69 1760664.84      0.00      0.00      0.50
lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog
 backlog 7606336b 2513p requeues 167982
 backlog 224072b 74p requeues 566
 backlog 581376b 192p requeues 5598
 backlog 181680b 60p requeues 1070
 backlog 5305056b 1753p requeues 110166    // Here, this TX queue is attracting flows
 backlog 157456b 52p requeues 1758
 backlog 672216b 222p requeues 3025
 backlog 60560b 20p requeues 24541
 backlog 448144b 148p requeues 21258

lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect

Immediate jump to full bandwidth, and traffic is properly
shard on all tx queues.

lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:46 AM      eth1 1397632.00 2795397.00  90081.87 4133031.26      0.00      0.00      1.00
06:16:47 AM      eth1 1396874.00 2793614.00  90032.99 4130385.46      0.00      0.00      0.00
06:16:48 AM      eth1 1395842.00 2791600.00  89966.46 4127409.67      0.00      0.00      1.00
06:16:49 AM      eth1 1395528.00 2791017.00  89946.17 4126551.24      0.00      0.00      0.00
06:16:50 AM      eth1 1397891.00 2795716.00  90098.74 4133497.39      0.00      0.00      1.00
06:16:51 AM      eth1 1394951.00 2789984.00  89908.96 4125022.51      0.00      0.00      0.00
06:16:52 AM      eth1 1394608.00 2789190.00  89886.90 4123851.36      0.00      0.00      1.00
06:16:53 AM      eth1 1395314.00 2790653.00  89934.33 4125983.09      0.00      0.00      0.00
06:16:54 AM      eth1 1396115.00 2792276.00  89984.25 4128411.21      0.00      0.00      1.00
06:16:55 AM      eth1 1396829.00 2793523.00  90030.19 4130250.28      0.00      0.00      0.00
Average:         eth1 1396158.40 2792297.00  89987.09 4128439.35      0.00      0.00      0.50

lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog
 backlog 7900052b 2609p requeues 173287
 backlog 878120b 290p requeues 589
 backlog 1068884b 354p requeues 5621
 backlog 996212b 329p requeues 1088
 backlog 984100b 325p requeues 115316
 backlog 956848b 316p requeues 1781
 backlog 1080996b 357p requeues 3047
 backlog 975016b 322p requeues 24571
 backlog 990156b 327p requeues 21274

(All 8 TX queues get a fair share of the traffic)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 17:16:26 -04:00
David S. Miller
e53da5fbfc net: Trap attempts to call sock_kfree_s() with a NULL pointer.
Unlike normal kfree() it is never right to call sock_kfree_s() with
a NULL pointer, because sock_kfree_s() also has the side effect of
discharging the memory from the sockets quota.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 17:02:37 -04:00
Cong Wang
dee49f203a rds: avoid calling sock_kfree_s() on allocation failure
It is okay to free a NULL pointer but not okay to mischarge the socket optmem
accounting. Compile test only.

Reported-by: rucsoftsec@gmail.com
Cc: Chien Yen <chien.yen@oracle.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 17:00:19 -04:00
Fabian Frederick
91c4467e3c caif_usb: use target structure member in memset
parent cfusbl was used instead of first structure member 'layer'

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 16:05:45 -04:00
Fabian Frederick
7970f1918f caif_usb: remove redundant memory message
Let MM subsystem display out of memory messages.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 16:05:45 -04:00
Fabian Frederick
6ff1e1e3c8 caif: replace kmalloc/memset 0 by kzalloc
Also add blank line after declaration

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 16:04:07 -04:00
Jiri Pirko
f76936d07c ipv4: fix nexthop attlen check in fib_nh_match
fib_nh_match does not match nexthops correctly. Example:

ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
                          nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
                          nexthop via 192.168.122.15 dev eth0

Del command is successful and route is removed. After this patch
applied, the route is correctly matched and result is:
RTNETLINK answers: No such process

Please consider this for stable trees as well.

Fixes: 4e902c5741 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 15:59:37 -04:00
Eric Dumazet
ad971f616a tcp: fix tcp_ack() performance problem
We worked hard to improve tcp_ack() performance, by not accessing
skb_shinfo() in fast path (cd7d8498c9 tcp: change tcp_skb_pcount()
location)

We still have one spurious access because of ACK timestamping,
added in commit e1c8a607b2 ("net-timestamp: ACK timestamp for
bytestreams")

By checking if sk_tsflags has SOF_TIMESTAMPING_TX_ACK set,
we can avoid two cache line misses for the common case.

While we are at it, add two prefetchw() :

One in tcp_ack() to bring skb at the head of write queue.

One in tcp_clean_rtx_queue() loop to bring following skb,
as we will delete skb from the write queue and dirty skb->next->prev.

Add a couple of [un]likely() clauses.

After this patch, tcp_ack() is no longer the most consuming
function in tcp stack.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 15:59:37 -04:00
Ilya Dryomov
f9865f06f7 libceph: ceph-msgr workqueue needs a resque worker
Commit f363e45fd1 ("net/ceph: make ceph_msgr_wq non-reentrant")
effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq.  This is
wrong - libceph is very much a memory reclaim path, so restore it.

Cc: stable@vger.kernel.org # needs backporting for < 3.12
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Tested-by: Micha Krause <micha@krausam.de>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14 12:57:04 -07:00
Ilya Dryomov
25f897773b libceph: separate multiple ops with commas in debugfs output
For requests with multiple ops, separate ops with commas instead of \t,
which is a field separator here.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14 12:57:03 -07:00
Ilya Dryomov
70b5bfa360 libceph: sync osd op definitions in rados.h
Bring in missing osd ops and strings, use macros to eliminate multiple
points of maintenance.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14 12:57:02 -07:00
Yan, Zheng
e4339d28f6 libceph: reference counting pagelist
this allow pagelist to present data that may be sent multiple times.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14 12:56:48 -07:00
Li RongQing
02ea80741a ipv6: remove aca_lock spinlock from struct ifacaddr6
no user uses this lock.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 13:15:15 -04:00
Eric Dumazet
b2532eb9ab tcp: fix ooo_okay setting vs Small Queues
TCP Small Queues (tcp_tsq_handler()) can hold one reference on
sk->sk_wmem_alloc, preventing skb->ooo_okay being set.

We should relax test done to set skb->ooo_okay to take care
of this extra reference.

Minimal truesize of skb containing one byte of payload is
SKB_TRUESIZE(1)

Without this fix, we have more chance locking flows into the wrong
transmit queue.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 13:12:00 -04:00
Ilya Dryomov
91883cd27c libceph: don't try checking queue_work() return value
queue_work() doesn't "fail to queue", it returns false if work was
already on a queue, which can't happen here since we allocate
event_work right before we queue it.  So don't bother at all.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14 21:03:25 +04:00
Joe Perches
b9a678994b libceph: Convert pr_warning to pr_warn
Use the more common pr_warn.

Other miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-10-14 21:03:23 +04:00
Li RongQing
589506f1e7 libceph: fix a use after free issue in osdmap_set_max_osd
If the state variable is krealloced successfully, map->osd_state will be
freed, once following two reallocation failed, and exit the function
without resetting map->osd_state, map->osd_state become a wild pointer.

fix it by resetting them after krealloc successfully.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-10-14 21:03:21 +04:00
Ilya Dryomov
dc220db03f libceph: select CRYPTO_CBC in addition to CRYPTO_AES
We want "cbc(aes)" algorithm, so select CRYPTO_CBC too, not just
CRYPTO_AES.  Otherwise on !CRYPTO_CBC kernels we fail rbd map/mount
with

    libceph: error -2 building auth method x request

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-10-14 21:03:20 +04:00
Ilya Dryomov
2cc6128ab2 libceph: resend lingering requests with a new tid
Both not yet registered (r_linger && list_empty(&r_linger_item)) and
registered linger requests should use the new tid on resend to avoid
the dup op detection logic on the OSDs, yet we were doing this only for
"registered" case.  Factor out and simplify the "registered" logic and
use the new helper for "not registered" case as well.

Fixes: http://tracker.ceph.com/issues/8806

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14 21:03:19 +04:00
Ilya Dryomov
f671b581f1 libceph: abstract out ceph_osd_request enqueue logic
Introduce __enqueue_request() and switch to it.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14 21:03:18 +04:00
Daniel Borkmann
26b87c7881 net: sctp: fix remote memory pressure from excessive queueing
This scenario is not limited to ASCONF, just taken as one
example triggering the issue. When receiving ASCONF probes
in the form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
  [...]
  ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>

... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed
ASCONFs and have increasing serial numbers, we process such
ASCONF chunk(s) marked with !end_of_packet and !singleton,
since we have not yet reached the SCTP packet end. SCTP does
only do verification on a chunk by chunk basis, as an SCTP
packet is nothing more than just a container of a stream of
chunks which it eats up one by one.

We could run into the case that we receive a packet with a
malformed tail, above marked as trailing JUNK. All previous
chunks are here goodformed, so the stack will eat up all
previous chunks up to this point. In case JUNK does not fit
into a chunk header and there are no more other chunks in
the input queue, or in case JUNK contains a garbage chunk
header, but the encoded chunk length would exceed the skb
tail, or we came here from an entirely different scenario
and the chunk has pdiscard=1 mark (without having had a flush
point), it will happen, that we will excessively queue up
the association's output queue (a correct final chunk may
then turn it into a response flood when flushing the
queue ;)): I ran a simple script with incremental ASCONF
serial numbers and could see the server side consuming
excessive amount of RAM [before/after: up to 2GB and more].

The issue at heart is that the chunk train basically ends
with !end_of_packet and !singleton markers and since commit
2e3216cd54 ("sctp: Follow security requirement of responding
with 1 packet") therefore preventing an output queue flush
point in sctp_do_sm() -> sctp_cmd_interpreter() on the input
chunk (chunk = event_arg) even though local_cork is set,
but its precedence has changed since then. In the normal
case, the last chunk with end_of_packet=1 would trigger the
queue flush to accommodate possible outgoing bundling.

In the input queue, sctp_inq_pop() seems to do the right thing
in terms of discarding invalid chunks. So, above JUNK will
not enter the state machine and instead be released and exit
the sctp_assoc_bh_rcv() chunk processing loop. It's simply
the flush point being missing at loop exit. Adding a try-flush
approach on the output queue might not work as the underlying
infrastructure might be long gone at this point due to the
side-effect interpreter run.

One possibility, albeit a bit of a kludge, would be to defer
invalid chunk freeing into the state machine in order to
possibly trigger packet discards and thus indirectly a queue
flush on error. It would surely be better to discard chunks
as in the current, perhaps better controlled environment, but
going back and forth, it's simply architecturally not possible.
I tried various trailing JUNK attack cases and it seems to
look good now.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 12:46:22 -04:00
Daniel Borkmann
b69040d8e3 net: sctp: fix panic on duplicate ASCONF chunks
When receiving a e.g. semi-good formed connection scan in the
form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---------------- ASCONF_a; ASCONF_b ----------------->

... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!

The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.

Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():

While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54
changed the precedence, so that as long as we get bundled, incoming
chunks we try possible bundling on outgoing queue as well. Before
this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 12:46:22 -04:00
Daniel Borkmann
9de7922bc7 net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
Commit 6f4c618ddb ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
 end:0x440 dev:<NULL>
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
 <IRQ>
 [<ffffffff8144fb1c>] skb_put+0x5c/0x70
 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
 [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
 [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
 [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0
 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
 [<ffffffff81496ac5>] ip_rcv+0x275/0x350
 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
 [<ffffffff81460588>] netif_receive_skb+0x58/0x60

This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ------------------ ASCONF; UNKNOWN ------------------>

... where ASCONF chunk of length 280 contains 2 parameters ...

  1) Add IP address parameter (param length: 16)
  2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.

In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.

When walking to the next parameter this time, we proceed
with ...

  length = ntohs(asconf_param->param_hdr.length);
  asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.

Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 12:46:22 -04:00
Pablo Neira Ayuso
493618a92c netfilter: nft_compat: fix hook validation for non-base chains
Set hook_mask to zero for non-base chains, otherwise people may hit
bogus errors from the xt_check_target() and xt_check_match() when
validating the uninitialized hook_mask.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-14 12:52:40 +02:00
Karl Beldan
c7abf25af0 mac80211: fix typo in starting baserate for rts_cts_rate_idx
It affects non-(V)HT rates and can lead to selecting an rts_cts rate
that is not a basic rate or way superior to the reference rate (ATM
rates[0] used for the 1st attempt of the protected frame data).

E.g, assuming drivers register growing (bitrate) sorted tables of
ieee80211_rate-s, having :
- rates[0].idx == d'2 and basic_rates == b'10100
will select rts_cts idx b'10011 & ~d'(BIT(2)-1), i.e. 1, likewise
- rates[0].idx == d'2 and basic_rates == b'10001
will select rts_cts idx b'10000
The first is not a basic rate and the second is > rates[0].

Also, wrt severity of the addressed misbehavior, ATM we only have one
rts_cts_rate_idx rather than one per rate table entry, so this idx might
still point to bitrates > rates[1..MAX_RATES].

Fixes: 5253ffb8c9 ("mac80211: always pick a basic rate to tx RTS/CTS for pre-HT rates")
Cc: stable@vger.kernel.org
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-14 11:16:16 +02:00
Andy Shevchenko
5df1415aee lib80211: remove unused print_ssid()
In kernel we have %*pE specifier to print an escaped buffer.  All users
now switched to that approach.

This fixes a bug as well.  The current implementation wrongly prints
octal numbers: only two first digits are used in case when 3 are
required and the rest of the string ends up cut off.

Additionally by default the \f, \v, \a, and \e are escaped to their
alphabetic representation.  It's safe to do since it is currently used
for messaging only.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "John W . Linville" <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:27 +02:00
Rasmus Villemoes
b7a8d756fb batman-adv: replace strnicmp with strncasecmp
The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics and
a slightly buggy strncasecmp.  The latter is the POSIX name, so strnicmp
was renamed to strncasecmp, and strnicmp made into a wrapper for the new
strncasecmp to avoid breaking existing users.

To allow the compat wrapper strnicmp to be removed at some point in the
future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:24 +02:00
Rasmus Villemoes
18082746a2 netfilter: replace strnicmp with strncasecmp
The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics and
a slightly buggy strncasecmp.  The latter is the POSIX name, so strnicmp
was renamed to strncasecmp, and strnicmp made into a wrapper for the new
strncasecmp to avoid breaking existing users.

To allow the compat wrapper strnicmp to be removed at some point in the
future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:24 +02:00
Pablo Neira Ayuso
7210e4e38f netfilter: nf_tables: restrict nat/masq expressions to nat chain type
This adds the missing validation code to avoid the use of nat/masq from
non-nat chains. The validation assumes two possible configuration
scenarios:

1) Use of nat from base chain that is not of nat type. Reject this
   configuration from the nft_*_init() path of the expression.

2) Use of nat from non-base chain. In this case, we have to wait until
   the non-base chain is referenced by at least one base chain via
   jump/goto. This is resolved from the nft_*_validate() path which is
   called from nf_tables_check_loops().

The user gets an -EOPNOTSUPP in both cases.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-13 20:42:00 +02:00
Linus Torvalds
77c688ac87 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "The big thing in this pile is Eric's unmount-on-rmdir series; we
  finally have everything we need for that.  The final piece of prereqs
  is delayed mntput() - now filesystem shutdown always happens on
  shallow stack.

  Other than that, we have several new primitives for iov_iter (Matt
  Wilcox, culled from his XIP-related series) pushing the conversion to
  ->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c
  cleanups and fixes (including the external name refcounting, which
  gives consistent behaviour of d_move() wrt procfs symlinks for long
  and short names alike) and assorted cleanups and fixes all over the
  place.

  This is just the first pile; there's a lot of stuff from various
  people that ought to go in this window.  Starting with
  unionmount/overlayfs mess...  ;-/"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits)
  fs/file_table.c: Update alloc_file() comment
  vfs: Deduplicate code shared by xattr system calls operating on paths
  reiserfs: remove pointless forward declaration of struct nameidata
  don't need that forward declaration of struct nameidata in dcache.h anymore
  take dname_external() into fs/dcache.c
  let path_init() failures treated the same way as subsequent link_path_walk()
  fix misuses of f_count() in ppp and netlink
  ncpfs: use list_for_each_entry() for d_subdirs walk
  vfs: move getname() from callers to do_mount()
  gfs2_atomic_open(): skip lookups on hashed dentry
  [infiniband] remove pointless assignments
  gadgetfs: saner API for gadgetfs_create_file()
  f_fs: saner API for ffs_sb_create_file()
  jfs: don't hash direct inode
  [s390] remove pointless assignment of ->f_op in vmlogrdr ->open()
  ecryptfs: ->f_op is never NULL
  android: ->f_op is never NULL
  nouveau: __iomem misannotations
  missing annotation in fs/file.c
  fs: namespace: suppress 'may be used uninitialized' warnings
  ...
2014-10-13 11:28:42 +02:00
Linus Torvalds
5e40d331bd Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris.

Mostly ima, selinux, smack and key handling updates.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
  integrity: do zero padding of the key id
  KEYS: output last portion of fingerprint in /proc/keys
  KEYS: strip 'id:' from ca_keyid
  KEYS: use swapped SKID for performing partial matching
  KEYS: Restore partial ID matching functionality for asymmetric keys
  X.509: If available, use the raw subjKeyId to form the key description
  KEYS: handle error code encoded in pointer
  selinux: normalize audit log formatting
  selinux: cleanup error reporting in selinux_nlmsg_perm()
  KEYS: Check hex2bin()'s return when generating an asymmetric key ID
  ima: detect violations for mmaped files
  ima: fix race condition on ima_rdwr_violation_check and process_measurement
  ima: added ima_policy_flag variable
  ima: return an error code from ima_add_boot_aggregate()
  ima: provide 'ima_appraise=log' kernel option
  ima: move keyring initialization to ima_init()
  PKCS#7: Handle PKCS#7 messages that contain no X.509 certs
  PKCS#7: Better handling of unsupported crypto
  KEYS: Overhaul key identification when searching for asymmetric keys
  KEYS: Implement binary asymmetric key ID handling
  ...
2014-10-12 10:13:55 -04:00
Linus Torvalds
ca321885b0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "This set fixes a bunch of fallout from the changes that went in during
  this merge window, particularly:

   - Fix fsl_pq_mdio (Claudiu Manoil) and fm10k (Pranith Kumar) build
     failures.

   - Several networking drivers do atomic_set() on page counts where
     that's not exactly legal.  From Eric Dumazet.

   - Make __skb_flow_get_ports() work cleanly with unaligned data, from
     Alexander Duyck.

   - Fix some kernel-doc buglets in rfkill and netlabel, from Fabian
     Frederick.

   - Unbalanced enable_irq_wake usage in bcmgenet and systemport
     drivers, from Florian Fainelli.

   - pxa168_eth needs to depend on HAS_DMA, from Geert Uytterhoeven.

   - Multi-dequeue in the qdisc layer severely bypasses the fairness
     limits the previous code used to enforce, reintroduce in a way that
     at the same time doesn't compromise bulk dequeue opportunities.
     From Jesper Dangaard Brouer.

   - macvlan receive path unnecessarily hops through a softirq by using
     netif_rx() instead of netif_receive_skb().  From Jason Baron"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (51 commits)
  net: systemport: avoid unbalanced enable_irq_wake calls
  net: bcmgenet: avoid unbalanced enable_irq_wake calls
  net: bcmgenet: fix off-by-one in incrementing read pointer
  net: fix races in page->_count manipulation
  mlx4: fix race accessing page->_count
  ixgbe: fix race accessing page->_count
  igb: fix race accessing page->_count
  fm10k: fix race accessing page->_count
  net/phy: micrel: Add clock support for KSZ8021/KSZ8031
  flow-dissector: Fix alignment issue in __skb_flow_get_ports
  net: filter: fix the comments
  Documentation: replace __sk_run_filter with __bpf_prog_run
  macvlan: optimize the receive path
  macvlan: pass 'bool' type to macvlan_count_rx()
  drivers: net: xgene: Add 10GbE ethtool support
  drivers: net: xgene: Add 10GbE support
  drivers: net: xgene: Preparing for adding 10GbE support
  dtb: Add 10GbE node to APM X-Gene SoC device tree
  Documentation: dts: Update section header for APM X-Gene
  MAINTAINERS: Update APM X-Gene section
  ...
2014-10-11 21:19:00 -04:00
Linus Torvalds
ef4a48c513 File locking related changes for v3.18 (pile #1)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUNZK4AAoJEAAOaEEZVoIVI08P/iM7eaIVRnqaqtWw/JBzxiba
 EMDlJYUBSlv6lYk9s8RJT4bMmcmGAKSYzVAHSoPahzNcqTDdFLeDTLGxJ8uKBbjf
 d1qRRdH1yZHGUzCvJq3mEendjfXn435Y3YburUxjLfmzrzW7EbMvndiQsS5dhAm9
 PEZ+wrKF/zFL7LuXa1YznYrbqOD/GRsJAXGEWc3kNwfS9avephVG/RI3GtpI2PJj
 RY1mf8P7+WOlrShYoEuUo5aqs01MnU70LbqGHzY8/QKH+Cb0SOkCHZPZyClpiA+G
 MMJ+o2XWcif3BZYz+dobwz/FpNZ0Bar102xvm2E8fqByr/T20JFjzooTKsQ+PtCk
 DetQptrU2gtyZDKtInJUQSDPrs4cvA13TW+OEB1tT8rKBnmyEbY3/TxBpBTB9E6j
 eb/V3iuWnywR3iE+yyvx24Qe7Pov6deM31s46+Vj+GQDuWmAUJXemhfzPtZiYpMT
 exMXTyDS3j+W+kKqHblfU5f+Bh1eYGpG2m43wJVMLXKV7NwDf8nVV+Wea962ga+w
 BAM3ia4JRVgRWJBPsnre3lvGT5kKPyfTZsoG+kOfRxiorus2OABoK+SIZBZ+c65V
 Xh8VH5p3qyCUBOynXlHJWFqYWe2wH0LfbPrwe9dQwTwON51WF082EMG5zxTG0Ymf
 J2z9Shz68zu0ok8cuSlo
 =Hhee
 -----END PGP SIGNATURE-----

Merge tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux

Pull file locking related changes from Jeff Layton:
 "This release is a little more busy for file locking changes than the
  last:

   - a set of patches from Kinglong Mee to fix the lockowner handling in
     knfsd
   - a pile of cleanups to the internal file lease API.  This should get
     us a bit closer to allowing for setlease methods that can block.

  There are some dependencies between mine and Bruce's trees this cycle,
  and I based my tree on top of the requisite patches in Bruce's tree"

* tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
  locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
  locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
  locks: set fl_owner for leases to filp instead of current->files
  locks: give lm_break a return value
  locks: __break_lease cleanup in preparation of allowing direct removal of leases
  locks: remove i_have_this_lease check from __break_lease
  locks: move freeing of leases outside of i_lock
  locks: move i_lock acquisition into generic_*_lease handlers
  locks: define a lm_setup handler for leases
  locks: plumb a "priv" pointer into the setlease routines
  nfsd: don't keep a pointer to the lease in nfs4_file
  locks: clean up vfs_setlease kerneldoc comments
  locks: generic_delete_lease doesn't need a file_lock at all
  nfsd: fix potential lease memory leak in nfs4_setlease
  locks: close potential race in lease_get_mtime
  security: make security_file_set_fowner, f_setown and __f_setown void return
  locks: consolidate "nolease" routines
  locks: remove lock_may_read and lock_may_write
  lockd: rip out deferred lock handling from testlock codepath
  NFSD: Get reference of lockowner when coping file_lock
  ...
2014-10-11 13:21:34 -04:00
Pablo Neira Ayuso
ab2d7251d6 netfilter: missing module license in the nf_reject_ipvX modules
[   23.545204] nf_reject_ipv4: module license 'unspecified' taints kernel.

Fixes: c8d7b98 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
Reported-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-11 14:59:41 +02:00
Eric Dumazet
4c450583d9 net: fix races in page->_count manipulation
This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.

The only case it is valid is when page->_count is 0

Fixes: 540eb7bf0b ("net: Update alloc frag to reduce get/put page usage and recycle pages")
Signed-off-by: Eric Dumaze <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10 15:37:29 -04:00
Alexander Duyck
5af7fb6e3e flow-dissector: Fix alignment issue in __skb_flow_get_ports
This patch addresses a kernel unaligned access bug seen on a sparc64 system
with an igb adapter.  Specifically the __skb_flow_get_ports was returning a
be32 pointer which was then having the value directly returned.

In order to prevent this it is actually easier to simply not populate the
ports or address values when an skb is not present.  In this case the
assumption is that the data isn't needed and rather than slow down the
faster aligned accesses by making them have to assume the unaligned path on
architectures that don't support efficent unaligned access it makes more
sense to simply switch off the bits that were copying the source and
destination address/port for the case where we only care about the protocol
types and lengths which are normally 16 bit fields anyway.

Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10 15:33:47 -04:00
Li RongQing
8ea6e345a6 net: filter: fix the comments
1. sk_run_filter has been renamed, sk_filter() is using SK_RUN_FILTER.
2. Remove wrong comments about storing intermediate value.
3. replace sk_run_filter with __bpf_prog_run for check_load_and_stores's
comments

Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10 15:11:51 -04:00
Alexei Starovoitov
38b3629adb net: bpf: fix bpf syscall dependence on anon_inodes
minimal configurations where EPOLL, PERF_EVENTS, etc are disabled,
but NET is enabled, are failing to build with link error:
kernel/built-in.o: In function `bpf_prog_load':
syscall.c:(.text+0x3b728): undefined reference to `anon_inode_getfd'

fix it by selecting ANON_INODES when NET is enabled

Reported-by: Michal Sojka <sojkam1@fel.cvut.cz>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10 15:02:23 -04:00
David S. Miller
7b6fa1eef6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter fixes for net-next

This batch contains two fixes for what you have in your net-next,
they are:

1) Remove nf_send_reset6() from header file. This function now resides
   in the nf_reject_ipv6 module. Reported by Eric Dumazet.

2) Fix wrong NFT_REJECT_ICMPX_MAX definition and adjust code to fix
   errors reported by Dan Carpenter's static analysis tools.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10 15:01:09 -04:00
David S. Miller
4511a4a50e Merge tag 'master-2014-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-10-09

Please pull this batch of fixes intended for the 3.18 stream!

Andrea Merello makes rtl818x_pci use a more reasonable transmission
rate for HW generated frames.

Fabian Frederick tweaks some kernel-doc bits to avoid warnings.

Larry Finger corrects a possible unaligned access in the rtlwifi code.

Marek Puzyniak avoids a kernel panic in ath9k_hw_reset.

Sujith Manoharan goes for the hat trick -- he fixes a smatch warning
in the shared ath code, he fixes a crash in ath9k, and he corrects
a sequence number assignment problem in ath9k too.

For ease of merging, I pulled the last bits of the wireless tree as well...

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10 14:49:55 -04:00
Karl Beldan
2a84ee8625 cfg80211: set the rates mask in connection probes over specified freq
ATM, specifying the frequency when connecting sends a void 'supported
rates' EID.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
[fix memory leak in error path]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-10 17:11:13 +02:00
Michal Kazior
486cf4c08f mac80211: enable DFS with channel contexts
It is okay to enable DFS for channel contexts
based drivers as long as no combination advertises
radar detection and multi-channel operation at the
same time.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-10 17:08:33 +02:00
Linus Torvalds
c798360cd1 Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu updates from Tejun Heo:
 "A lot of activities on percpu front.  Notable changes are...

   - percpu allocator now can take @gfp.  If @gfp doesn't contain
     GFP_KERNEL, it tries to allocate from what's already available to
     the allocator and a work item tries to keep the reserve around
     certain level so that these atomic allocations usually succeed.

     This will replace the ad-hoc percpu memory pool used by
     blk-throttle and also be used by the planned blkcg support for
     writeback IOs.

     Please note that I noticed a bug in how @gfp is interpreted while
     preparing this pull request and applied the fix 6ae833c7fe
     ("percpu: fix how @gfp is interpreted by the percpu allocator")
     just now.

   - percpu_ref now uses longs for percpu and global counters instead of
     ints.  It leads to more sparse packing of the percpu counters on
     64bit machines but the overhead should be negligible and this
     allows using percpu_ref for refcnting pages and in-memory objects
     directly.

   - The switching between percpu and single counter modes of a
     percpu_ref is made independent of putting the base ref and a
     percpu_ref can now optionally be initialized in single or killed
     mode.  This allows avoiding percpu shutdown latency for cases where
     the refcounted objects may be synchronously created and destroyed
     in rapid succession with only a fraction of them reaching fully
     operational status (SCSI probing does this when combined with
     blk-mq support).  It's also planned to be used to implement forced
     single mode to detect underflow more timely for debugging.

  There's a separate branch percpu/for-3.18-consistent-ops which cleans
  up the duplicate percpu accessors.  That branch causes a number of
  conflicts with s390 and other trees.  I'll send a separate pull
  request w/ resolutions once other branches are merged"

* 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
  percpu: fix how @gfp is interpreted by the percpu allocator
  blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
  percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
  percpu_ref: add PERCPU_REF_INIT_* flags
  percpu_ref: decouple switching to percpu mode and reinit
  percpu_ref: decouple switching to atomic mode and killing
  percpu_ref: add PCPU_REF_DEAD
  percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
  percpu_ref: replace pcpu_ prefix with percpu_
  percpu_ref: minor code and comment updates
  percpu_ref: relocate percpu_ref_reinit()
  Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
  Revert "percpu: free percpu allocation info for uniprocessor system"
  percpu-refcount: make percpu_ref based on longs instead of ints
  percpu-refcount: improve WARN messages
  percpu: fix locking regression in the failure path of pcpu_alloc()
  percpu-refcount: add @gfp to percpu_ref_init()
  proportions: add @gfp to init functions
  percpu_counter: add @gfp to percpu_counter_init()
  percpu_counter: make percpu_counters_lock irq-safe
  ...
2014-10-10 07:26:02 -04:00
Jesper Dangaard Brouer
b8358d70ce net_sched: restore qdisc quota fairness limits after bulk dequeue
Restore the quota fairness between qdisc's, that we broke with commit
5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE").

Before that commit, the quota in __qdisc_run() were in packets as
dequeue_skb() would only dequeue a single packet, that assumption
broke with bulk dequeue.

We choose not to account for the number of packets inside the TSO/GSO
packets (accessable via "skb_gso_segs").  As the previous fairness
also had this "defect". Thus, GSO/TSO packets counts as a single
packet.

Further more, we choose to slack on accuracy, by allowing a bulk
dequeue try_bulk_dequeue_skb() to exceed the "packets" limit, only
limited by the BQL bytelimit.  This is done because BQL prefers to get
its full budget for appropriate feedback from TX completion.

In future, we might consider reworking this further and, if it allows,
switch to a time-based model, as suggested by Eric. Right now, we only
restore old semantics.

Joint work with Eric, Hannes, Daniel and Jesper.  Hannes wrote the
first patch in cooperation with Daniel and Jesper.  Eric rewrote the
patch.

Fixes: 5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09 19:12:26 -04:00
Masanari Iida
de3f0d0eff net: Missing @ before descriptions cause make xmldocs warning
This patch fix following warning.
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'header_len'
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'data_len'
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'max_page_order'
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'errcode'
Warning(.//net/core/skbuff.c:4142): No description found for parameter 'gfp_mask'

Acutually the descriptions exist, but missing "@" in front.

This problem start to happen when following commit was merged
into Linus's tree during 3.18-rc1 merge period.
commit 2e4e441071
net: add alloc_skb_with_frags() helper

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09 18:57:14 -04:00
Fabian Frederick
408b18abf6 mac80211: directly return ieee80211_vif_use_reserved_context()
No need to store ieee80211_vif_use_reserved_context()
result and test it before returning.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 20:54:04 +02:00
Luciano Coelho
0f791eb47f mac80211: allow channel switch with multiple channel contexts
Channel switch with multiple channel contexts should now work fine.
Remove check that disallows switches when multiple contexts are in
use.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:09 +02:00
Luciano Coelho
0c21e6320f mac80211: wait for the first beacon on the new channel after CSA
Instead of immediately reopening the queues (in case of block_tx),
calling the post_channel_switch operation and sending the
notification, wait for the first beacon on the new channel.  This
makes sure that we don't lose packets if the AP/GO is not on the new
channel yet.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:09 +02:00
Luciano Coelho
f1d65583bc mac80211: add post_channel_switch driver operation
As a counterpart to the pre_channel_switch operation, add a
post_channel_switch operation.  This allows the drivers to go back to
a normal configuration after the channel switch is completed.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:09 +02:00
Luciano Coelho
6d027bcc8a mac80211: add pre_channel_switch driver operation
Some drivers may need to prepare for a channel switch also when it is
initiated from the remote side (eg. station, P2P client).  To make
this possible, add a generic callback that can be called for all
interface types.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:08 +02:00
Luciano Coelho
e9a21949b7 mac80211: add extended channel switching capability if the driver supports CSA
The Extended Channel Switching capability bit in the extended
capabilities element must be set if the driver supports CSA on
non-beaconing interfaces.

Since this capability needs to be set during driver registration, the
extended_capabiliities global variable needs to be moved to the local
structure so that it can be modified.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:08 +02:00
Luciano Coelho
2ba45384e5 mac80211: add device_timestamp to the ieee80211_channel_switch struct
Some devices may need the device timestamp in order to synchronize the
channel switch.  To pass this value back to the driver, add it to the
channel switch structure and copy the device_timestamp value received
in the rx info structure into it.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:08 +02:00
Luciano Coelho
252e07ca5f nl80211: sanity check the channel switch counter value
The nl80211 channel switch count attribute
(NL80211_ATTR_CH_SWITCH_COUNT) is specified as u32, but the
specification uses u8 for the counter.  To make sure strange things
don't happen without informing the user, sanity check the value and
return -EINVAL if it doesn't fit in u8.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:25:11 +02:00
Henning Rogge
a2db2ed3fb mac80211: implement cfg80211_ops to query mesh proxy path table
Implement get_mpp and dump_mpp cfg80211_ops to export the content of the
802.11s mesh proxy path table to userspace.

Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:19:07 +02:00
Henning Rogge
66be7d2bcd cfg80211: add ops to query mesh proxy path table
Add two new cfg80211 operations for querying a table with proxied mesh
paths.

Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:19:07 +02:00
Fabian Frederick
bc37b16870 net: rfkill: kernel-doc warning fixes
Correct the kernel-doc, the parameter is called "blocked" not "state".

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:16:15 +02:00
Luciano Coelho
c12bc4885f mac80211: return the vif's chandef in ieee80211_cfg_get_channel()
The chandef of the channel context a vif is using may be different
than the chandef of the vif itself.  For instance, the bandwidth used
by the vif may be narrower than the one configured in the channel
context.  To avoid confusion, return the vif's chandef in
ieee80211_cfg_get_channel() instead of the chandef of the channel
context.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:01:58 +02:00
Karl Beldan
cc61d8df0a mac80211: minstrel_ht: fix MCS_GROUP_RATES usage
Commit 5935839ad7 ("mac80211: improve minstrel_ht rate sorting by
throughput & probability") replaced the constant 8 with MCS_GROUP_RATES
when getting the number of streams of an HT MCS. See commit 7a5e3fa2c8
("mac80211: minstrel_ht: replace some occurences of MCS_GROUP_RATES").

Fixes: 5935839ad7 ("mac80211: improve minstrel_ht rate sorting by throughput & probability")
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 10:51:19 +02:00
Liad Kaufman
8975ae88e1 mac80211: fix warning on htmldocs for last_tdls_pkt_time
Forgot to add an entry to the struct description of sta_info.

Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 10:33:29 +02:00
Al Viro
24dff96a37 fix misuses of f_count() in ppp and netlink
we used to check for "nobody else could start doing anything with
that opened file" by checking that refcount was 2 or less - one
for descriptor table and one we'd acquired in fget() on the way to
wherever we are.  That was race-prone (somebody else might have
had a reference to descriptor table and do fget() just as we'd
been checking) and it had become flat-out incorrect back when
we switched to fget_light() on those codepaths - unlike fget(),
it doesn't grab an extra reference unless the descriptor table
is shared.  The same change allowed a race-free check, though -
we are safe exactly when refcount is less than 2.

It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading
to ppp one) and 2.6.17 for sendmsg() (netlink one).  OTOH,
netlink hadn't grown that check until 3.9 and ppp used to live
in drivers/net, not drivers/net/ppp until 3.1.  The bug existed
well before that, though, and the same fix used to apply in old
location of file.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:17 -04:00
Fabian Frederick
59f35b810e netlabel: kernel-doc warning fix
no secid argument in netlbl_cfg_unlbl_static_del

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09 01:40:05 -04:00
Linus Torvalds
35a9ad8af0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Most notable changes in here:

   1) By far the biggest accomplishment, thanks to a large range of
      contributors, is the addition of multi-send for transmit.  This is
      the result of discussions back in Chicago, and the hard work of
      several individuals.

      Now, when the ->ndo_start_xmit() method of a driver sees
      skb->xmit_more as true, it can choose to defer the doorbell
      telling the driver to start processing the new TX queue entires.

      skb->xmit_more means that the generic networking is guaranteed to
      call the driver immediately with another SKB to send.

      There is logic added to the qdisc layer to dequeue multiple
      packets at a time, and the handling mis-predicted offloads in
      software is now done with no locks held.

      Finally, pktgen is extended to have a "burst" parameter that can
      be used to test a multi-send implementation.

      Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
      virtio_net

      Adding support is almost trivial, so export more drivers to
      support this optimization soon.

      I want to thank, in no particular or implied order, Jesper
      Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
      Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
      David Tat, Hannes Frederic Sowa, and Rusty Russell.

   2) PTP and timestamping support in bnx2x, from Michal Kalderon.

   3) Allow adjusting the rx_copybreak threshold for a driver via
      ethtool, and add rx_copybreak support to enic driver.  From
      Govindarajulu Varadarajan.

   4) Significant enhancements to the generic PHY layer and the bcm7xxx
      driver in particular (EEE support, auto power down, etc.) from
      Florian Fainelli.

   5) Allow raw buffers to be used for flow dissection, allowing drivers
      to determine the optimal "linear pull" size for devices that DMA
      into pools of pages.  The objective is to get exactly the
      necessary amount of headers into the linear SKB area pre-pulled,
      but no more.  The new interface drivers use is eth_get_headlen().
      From WANG Cong, with driver conversions (several had their own
      by-hand duplicated implementations) by Alexander Duyck and Eric
      Dumazet.

   6) Support checksumming more smoothly and efficiently for
      encapsulations, and add "foo over UDP" facility.  From Tom
      Herbert.

   7) Add Broadcom SF2 switch driver to DSA layer, from Florian
      Fainelli.

   8) eBPF now can load programs via a system call and has an extensive
      testsuite.  Alexei Starovoitov and Daniel Borkmann.

   9) Major overhaul of the packet scheduler to use RCU in several major
      areas such as the classifiers and rate estimators.  From John
      Fastabend.

  10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
      Duyck.

  11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
      Dumazet.

  12) Add Datacenter TCP congestion control algorithm support, From
      Florian Westphal.

  13) Reorganize sk_buff so that __copy_skb_header() is significantly
      faster.  From Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
  netlabel: directly return netlbl_unlabel_genl_init()
  net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
  net: description of dma_cookie cause make xmldocs warning
  cxgb4: clean up a type issue
  cxgb4: potential shift wrapping bug
  i40e: skb->xmit_more support
  net: fs_enet: Add NAPI TX
  net: fs_enet: Remove non NAPI RX
  r8169:add support for RTL8168EP
  net_sched: copy exts->type in tcf_exts_change()
  wimax: convert printk to pr_foo()
  af_unix: remove 0 assignment on static
  ipv6: Do not warn for informational ICMP messages, regardless of type.
  Update Intel Ethernet Driver maintainers list
  bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
  tipc: fix bug in multicast congestion handling
  net: better IFF_XMIT_DST_RELEASE support
  net/mlx4_en: remove NETDEV_TX_BUSY
  3c59x: fix bad split of cpu_to_le32(pci_map_single())
  net: bcmgenet: fix Tx ring priority programming
  ...
2014-10-08 21:40:54 -04:00
David S. Miller
64b1f00a08 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-10-08 16:22:22 -04:00
Fabian Frederick
16b99a4f66 netlabel: directly return netlbl_unlabel_genl_init()
No need to store netlbl_unlabel_genl_init result and test it before returning.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-08 16:08:04 -04:00
WANG Cong
5301e3e117 net_sched: copy exts->type in tcf_exts_change()
We need to copy exts->type when committing the change, otherwise
it would be always 0. This is a quick fix for -net and -stable,
for net-next tcf_exts will be removed.

Fixes: commit 33be627159 ("net_sched: act: use standard struct list_head")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-08 15:41:27 -04:00
Fabian Frederick
2f29fed3f8 net: rfkill: kernel-doc warning fixes
s/state/blocked

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-08 15:24:15 -04:00
Linus Torvalds
6dea0737bc Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "Highlights:

   - support the NFSv4.2 SEEK operation (allowing clients to support
     SEEK_HOLE/SEEK_DATA), thanks to Anna.
   - end the grace period early in a number of cases, mitigating a
     long-standing annoyance, thanks to Jeff
   - improve SMP scalability, thanks to Trond"

* 'for-3.18' of git://linux-nfs.org/~bfields/linux: (55 commits)
  nfsd: eliminate "to_delegation" define
  NFSD: Implement SEEK
  NFSD: Add generic v4.2 infrastructure
  svcrdma: advertise the correct max payload
  nfsd: introduce nfsd4_callback_ops
  nfsd: split nfsd4_callback initialization and use
  nfsd: introduce a generic nfsd4_cb
  nfsd: remove nfsd4_callback.cb_op
  nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence
  nfsd: fix nfsd4_cb_recall_done error handling
  nfsd4: clarify how grace period ends
  nfsd4: stop grace_time update at end of grace period
  nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients
  nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls
  nfsd: serialize nfsdcltrack upcalls for a particular client
  nfsd: pass extra info in env vars to upcalls to allow for early grace period end
  nfsd: add a v4_end_grace file to /proc/fs/nfsd
  lockd: add a /proc/fs/lockd/nlm_end_grace file
  nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE
  nfsd: remove redundant boot_time parm from grace_done client tracking op
  ...
2014-10-08 12:51:44 -04:00
Linus Torvalds
25641c0c8d NFS client updates for Linux 3.18
Highlights include:
 
 Stable fixes:
 - fix an NFSv4.1 state renewal regression
 - fix open/lock state recovery error handling
 - fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
 - fix statd when reconnection fails
 - Don't wake tasks during connection abort
 - Don't start reboot recovery if lease check fails
 - fix duplicate proc entries
 
 Features:
 - pNFS block driver fixes and clean ups from Christoph
 - More code cleanups from Anna
 - Improve mmap() writeback performance
 - Replace use of PF_TRANS with a more generic mechanism for avoiding
   deadlocks in nfs_release_page
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUMpFYAAoJEGcL54qWCgDywHYP/A7XNykwOGhoHVP1Cgr3xqoz
 gVhAw97AEMZE8xSNVEGS++pJTe59JVzsIsYAwdHMwePV33l3zyzYorae6N9p7zWF
 0xVaNQ4qNLVhbrNLAoB5KA/c3/jMnNjF5t15+8akZad5pt4kXLlhSKjyVpdEEtJE
 A0eneXShMYEeLZoOJhpQt5bsw0OZ8YbWWEMjGlDqyeelvV3K1+zfivQOoyX6hS4w
 XFkPEDmU7zunE/xFP9ZoUaVdLO0TvOWfEZ7STWoHm7NuWfPQiDb9w1mTnuZbZyka
 ssezoGcitzwsjCcQ5e1iKTOoFRIsm/zYXFQgFQL7VFMBU1Tss9Of8047EyDkqcPF
 GxctsGg0gQ2FkG7yx7JH7AKpyibOIuByQrQQ916coWSf7K0L4H4Rcky3vryroylP
 1e1RI49xu215OTm+dLvlvYCv55bqCrTmaUGImZac18+ixD2eh6MNfW2ubSdxk89L
 U2rTFV09Bd52N7IQOGQx1FBEI2ZnIFUV4UaFz7v+rGFxOnk6+WYe+iWyb4wC70Yc
 8Jh/gTIQDd5aghql3FTieMOyfEvO6Re4pLMXmqEWMAevicx2t8DwkJriRu6X8Iy2
 rlDlBPwu5QmRWC20Dc897f0VajwDtwdeB8puod7nobOWzOfx4FrNqLJ+jR3pmHUk
 0otvJytqemXt+zkqqHKK
 =/OQi
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - fix an NFSv4.1 state renewal regression
   - fix open/lock state recovery error handling
   - fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
   - fix statd when reconnection fails
   - don't wake tasks during connection abort
   - don't start reboot recovery if lease check fails
   - fix duplicate proc entries

  Features:
  - pNFS block driver fixes and clean ups from Christoph
  - More code cleanups from Anna
  - Improve mmap() writeback performance
  - Replace use of PF_TRANS with a more generic mechanism for avoiding
    deadlocks in nfs_release_page"

* tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (66 commits)
  NFSv4.1: Fix an NFSv4.1 state renewal regression
  NFSv4: fix open/lock state recovery error handling
  NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
  NFS: Fabricate fscache server index key correctly
  SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
  NFSv3: Fix missing includes of nfs3_fs.h
  NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page()
  NFS: avoid waiting at all in nfs_release_page when congested.
  NFS: avoid deadlocks with loop-back mounted NFS filesystems.
  MM: export page_wakeup functions
  SCHED: add some "wait..on_bit...timeout()" interfaces.
  NFS: don't use STABLE writes during writeback.
  NFSv4: use exponential retry on NFS4ERR_DELAY for async requests.
  rpc: Add -EPERM processing for xs_udp_send_request()
  rpc: return sent and err from xs_sendpages()
  lockd: Try to reconnect if statd has moved
  SUNRPC: Don't wake tasks during connection abort
  Fixing lease renewal
  nfs: fix duplicate proc entries
  pnfs/blocklayout: Fix a 64-bit division/remainder issue in bl_map_stripe
  ...
2014-10-08 12:49:23 -04:00
Linus Torvalds
28596c9722 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull "trivial tree" updates from Jiri Kosina:
 "Usual pile from trivial tree everyone is so eagerly waiting for"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  Remove MN10300_PROC_MN2WS0038
  mei: fix comments
  treewide: Fix typos in Kconfig
  kprobes: update jprobe_example.c for do_fork() change
  Documentation: change "&" to "and" in Documentation/applying-patches.txt
  Documentation: remove obsolete pcmcia-cs from Changes
  Documentation: update links in Changes
  Documentation: Docbook: Fix generated DocBook/kernel-api.xml
  score: Remove GENERIC_HAS_IOMAP
  gpio: fix 'CONFIG_GPIO_IRQCHIP' comments
  tty: doc: Fix grammar in serial/tty
  dma-debug: modify check_for_stack output
  treewide: fix errors in printk
  genirq: fix reference in devm_request_threaded_irq comment
  treewide: fix synchronize_rcu() in comments
  checkstack.pl: port to AArch64
  doc: queue-sysfs: minor fixes
  init/do_mounts: better syntax description
  MIPS: fix comment spelling
  powerpc/simpleboot: fix comment
  ...
2014-10-07 21:16:26 -04:00
Linus Torvalds
d0cd84817c dmaengine-3.17
1/ Step down as dmaengine maintainer see commit 08223d80df "dmaengine
    maintainer update"
 
 2/ Removal of net_dma, as it has been marked 'broken' since 3.13 (commit
    7787380336 "net_dma: mark broken"), without reports of performance
    regression.
 
 3/ Miscellaneous fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUKDLKAAoJEB7SkWpmfYgC7wwP/iNHqRjf1suMUTBIF3P6Hgbe
 VCUwh0IkuujMPDG46WRn6cYzarRxVPLoGaLHLPszgjI6pmGPVv19wqeDOlUxtcmr
 0iQWEWv/zqseaAIW+4gj/WYCyMgKil49EUBJKCZCfNmIaad+e0pr8f0uE5yOkHPM
 tqWoZERu9A4dlXGr1TjeOZVzdnPrCt92MrLDN6ZZ6tMuJaEc5PauaLxKTeGy5fYj
 UB+k1xJQzECbsYfpB+uCVYl5/qPO1rNyuBYS8THCsW+JYmrbbfH2kkF2lo2FaUpO
 8Yd50FtzXHKWwAt7BzfIwU2M7x0wRmryrC/xsQi6M+WmVeHYvvHUIpzaA66xRZ5x
 fCy3Fu8sEnmnmboAbh2v2c5uTycqRl2xPzbpLAuxglloXIxzi3ckp6ESF/Z4SldH
 oxIoEievN7lah3vKgvlHZYcWDzrYr8EKf/EzFe9RqDBQDKtzDzre1H9Uivr387Vm
 uFUcGHYG/GXuX47C7EUsMtaSW2UEoR2ytw/HR6CKFPTVXwAzEO6kA9vg0EqL0iIq
 2wVLgavlZuwegmaUBgnr+bgVZMvVN7OU7fAIRVe5xNO6itrPKvheSlQthmRiiq9C
 uzOu4PS6PexqzHUNPCcJpCsj+lawmCSrE0bxtPzTA/CQInVgWs219V9+W5Gn/0YA
 EARN9k6ueX9PZPQrPQLm
 =BBBv
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine

Pull dmaengine updates from Dan Williams:
 "Even though this has fixes marked for -stable, given the size and the
  needed conflict resolutions this is 3.18-rc1/merge-window material.

  These patches have been languishing in my tree for a long while.  The
  fact that I do not have the time to do proper/prompt maintenance of
  this tree is a primary factor in the decision to step down as
  dmaengine maintainer.  That and the fact that the bulk of drivers/dma/
  activity is going through Vinod these days.

  The net_dma removal has not been in -next.  It has developed simple
  conflicts against mainline and net-next (for-3.18).

  Continuing thanks to Vinod for staying on top of drivers/dma/.

  Summary:

   1/ Step down as dmaengine maintainer see commit 08223d80df
      "dmaengine maintainer update"

   2/ Removal of net_dma, as it has been marked 'broken' since 3.13
      (commit 7787380336 "net_dma: mark broken"), without reports of
      performance regression.

   3/ Miscellaneous fixes"

* tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
  net: make tcp_cleanup_rbuf private
  net_dma: revert 'copied_early'
  net_dma: simple removal
  dmaengine maintainer update
  dmatest: prevent memory leakage on error path in thread
  ioat: Use time_before_jiffies()
  dmaengine: fix xor sources continuation
  dma: mv_xor: Rename __mv_xor_slot_cleanup() to mv_xor_slot_cleanup()
  dma: mv_xor: Remove all callers of mv_xor_slot_cleanup()
  dma: mv_xor: Remove unneeded mv_xor_clean_completed_slots() call
  ioat: Use pci_enable_msix_exact() instead of pci_enable_msix()
  drivers: dma: Include appropriate header file in dca.c
  drivers: dma: Mark functions as static in dma_v3.c
  dma: mv_xor: Add DMA API error checks
  ioat/dca: Use dev_is_pci() to check whether it is pci device
2014-10-07 20:39:25 -04:00
Fabian Frederick
28b7deae75 wimax: convert printk to pr_foo()
Use current logging functions and add module name prefix.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 20:28:44 -04:00
Fabian Frederick
505e907db3 af_unix: remove 0 assignment on static
static values are automatically initialized to 0

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 17:03:14 -04:00
David S. Miller
ea85a0a2dc ipv6: Do not warn for informational ICMP messages, regardless of type.
There is no reason to emit a log message for these.

Based upon a suggestion from Hannes Frederic Sowa.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2014-10-07 16:33:53 -04:00
Herbert Xu
93fdd47e52 bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
As we may defragment the packet in IPv4 PRE_ROUTING and refragment
it after POST_ROUTING we should save the value of frag_max_size.

This is still very wrong as the bridge is supposed to leave the
packets intact, meaning that the right thing to do is to use the
original frag_list for fragmentation.

Unfortunately we don't currently guarantee that the frag_list is
left untouched throughout netfilter so until this changes this is
the best we can do.

There is also a spot in FORWARD where it appears that we can
forward a packet without going through fragmentation, mark it
so that we can fix it later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 15:12:44 -04:00
Jon Maloy
908344cdda tipc: fix bug in multicast congestion handling
One aim of commit 50100a5e39 ("tipc:
use pseudo message to wake up sockets after link congestion") was
to handle link congestion abatement in a uniform way for both unicast
and multicast transmit. However, the latter doesn't work correctly,
and has been broken since the referenced commit was applied.

If a user now sends a burst of multicast messages that is big
enough to cause broadcast link congestion, it will be put to sleep,
and not be waked up when the congestion abates as it should be.

This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS,
is set correctly, but in the wrong field. Instead of setting it in the
'action_flags' field of the arrival node struct, it is by mistake set
in the dummy node struct that is owned by the broadcast link, where it
will never tested for. Second, we cannot use the same flag for waking
up unicast and multicast users, since the function tipc_node_unlock()
needs to pick the wakeup pseudo messages to deliver from different
queues. It must hence be able to distinguish between the two cases.

This commit solves this problem by adding a new flag
TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user().
The latter is to be called by tipc_node_unlock() when the named flag,
now set in the correct field, is encountered.

v2: using explicit 'unsigned int' declaration instead of 'uint', as
per comment from David Miller.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 14:50:15 -04:00
John W. Linville
d7ffd588f0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2014-10-07 14:48:29 -04:00
Pablo Neira Ayuso
f0d1f04f0a netfilter: fix wrong arithmetics regarding NFT_REJECT_ICMPX_MAX
NFT_REJECT_ICMPX_MAX should be __NFT_REJECT_ICMPX_MAX - 1.

nft_reject_icmp_code() and nft_reject_icmpv6_code() are called from the
packet path, so BUG_ON in case we try to access an unknown abstracted
ICMP code. This should not happen since we already validate this from
nft_reject_{inet,bridge}_init().

Fixes: 51b0a5d ("netfilter: nft_reject: introduce icmp code abstraction for inet and bridge")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-07 20:16:31 +02:00
Eric Dumazet
0287587884 net: better IFF_XMIT_DST_RELEASE support
Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.

As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :

	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
	netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.

The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 13:22:11 -04:00
WANG Cong
02c0fc1b8f net_sched: fix unused variables in __gnet_stats_copy_basic_cpu()
Probably not a big deal, but we'd better just use the
one we get in retry loop.

Fixes: commit 22e0f8b932 ("net: sched: make bstats per cpu and estimator RCU safe")
Reported-by: Joe Perches <joe@perches.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:10:49 -04:00
Andy Zhou
7c5df8fa19 openvswitch: fix a compilation error when CONFIG_INET is not setW!
Fix a openvswitch compilation error when CONFIG_INET is not set:

=====================================================
   In file included from include/net/geneve.h:4:0,
                       from net/openvswitch/flow_netlink.c:45:
		          include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads':
			  >> include/net/udp_tunnel.h💯2: error: implicit declaration of function 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration]
			  >>      return iptunnel_handle_offloads(skb, udp_csum, type);
			  >>           ^
			  >>           >> include/net/udp_tunnel.h💯2: warning: return makes pointer from integer without a cast
			  >>           >>    cc1: some warnings being treated as errors

=====================================================

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:10:49 -04:00
Andy Zhou
0a5d1c55fa openvswitch: fix a sparse warning
Fix a sparse warning introduced by commit:
f579668406 (openvswitch: Add support for
Geneve tunneling.) caught by kbuild test robot:

reproduce:
  # apt-get install sparse
  #   git checkout f579668406
  #     make ARCH=x86_64 allmodconfig
  #       make C=1 CF=-D__CHECK_ENDIAN__
  #
  #
  #       sparse warnings: (new ones prefixed by >>)
  #
  #       >> net/openvswitch/vport-geneve.c:109:15: sparse: incorrect type in assignment (different base types)
  #          net/openvswitch/vport-geneve.c:109:15:    expected restricted __be16 [usertype] sport
  #             net/openvswitch/vport-geneve.c:109:15:    got int
  #             >> net/openvswitch/vport-geneve.c:110:56: sparse: incorrect type in argument 3 (different base types)
  #                net/openvswitch/vport-geneve.c:110:56:    expected unsigned short [unsigned] [usertype] value
  #                   net/openvswitch/vport-geneve.c:110:56:    got restricted __be16 [usertype] sport

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:10:48 -04:00
Andy Zhou
42350dcaaf net: fix a sparse warning
Fix a sparse warning introduced by Commit
0b5e8b8eea (net: Add Geneve tunneling
protocol driver) caught by kbuild test robot:

  # apt-get install sparse
  #   git checkout 0b5e8b8eea
  #     make ARCH=x86_64 allmodconfig
  #       make C=1 CF=-D__CHECK_ENDIAN__
  #
  #
  #       sparse warnings: (new ones prefixed by >>)
  #
  #       >> net/ipv4/geneve.c:230:42: sparse: incorrect type in assignment (different base types)
  #          net/ipv4/geneve.c:230:42:    expected restricted __be32 [addressable] [assigned] [usertype] s_addr
  #             net/ipv4/geneve.c:230:42:    got unsigned long [unsigned] <noident>
  #

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:10:47 -04:00
Hannes Frederic Sowa
327571cb10 ipv6: don't walk node's leaf during serial number update
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:02:30 -04:00
Hannes Frederic Sowa
812918c464 ipv6: make fib6 serial number per namespace
Try to reduce number of possible fn_sernum mutation by constraining them
to their namespace.

Also remove rt_genid which I forgot to remove in 705f1c869d ("ipv6:
remove rt6i_genid").

Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:02:30 -04:00
Hannes Frederic Sowa
c8c4d42a6b ipv6: only generate one new serial number per fib mutation
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:02:30 -04:00
Hannes Frederic Sowa
42b1870646 ipv6: make rt_sernum atomic and serial number fields ordinary ints
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:02:30 -04:00
Hannes Frederic Sowa
94b2cfe02b ipv6: minor fib6 cleanups like type safety, bool conversion, inline removal
Also renamed struct fib6_walker_t to fib6_walker and enum fib_walk_state_t
to fib6_walk_state as recommended by Cong Wang.

Cc: Cong Wang <cwang@twopensource.com>
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:02:30 -04:00
Eric Dumazet
1ff0dc9499 net: validate_xmit_vlan() is static
Marking this as static allows compiler to inline it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 18:17:17 -04:00
Fabian Frederick
79952bca86 net: fix rcu access on phonet_routes
-Add __rcu annotation on table to fix sparse warnings:
net/phonet/pn_dev.c:279:25: warning: incorrect type in assignment (different address spaces)
net/phonet/pn_dev.c:279:25:    expected struct net_device *<noident>
net/phonet/pn_dev.c:279:25:    got void [noderef] <asn:4>*<noident>
net/phonet/pn_dev.c:376:17: warning: incorrect type in assignment (different address spaces)
net/phonet/pn_dev.c:376:17:    expected struct net_device *volatile <noident>
net/phonet/pn_dev.c:376:17:    got struct net_device [noderef] <asn:4>*<noident>
net/phonet/pn_dev.c:392:17: warning: incorrect type in assignment (different address spaces)
net/phonet/pn_dev.c:392:17:    expected struct net_device *<noident>
net/phonet/pn_dev.c:392:17:    got void [noderef] <asn:4>*<noident>

-Access table with rcu_access_pointer (fixes the following sparse errors):
net/phonet/pn_dev.c:278:25: error: incompatible types in comparison expression (different address spaces)
net/phonet/pn_dev.c:391:17: error: incompatible types in comparison expression (different address spaces)

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 18:16:30 -04:00
John Fastabend
18cdb37ebf net: sched: do not use tcf_proto 'tp' argument from call_rcu
Using the tcf_proto pointer 'tp' from inside the classifiers callback
is not valid because it may have been cleaned up by another call_rcu
occuring on another CPU.

'tp' is currently being used by tcf_unbind_filter() in this patch we
move instances of tcf_unbind_filter outside of the call_rcu() context.
This is safe to do because any running schedulers will either read the
valid class field or it will be zeroed.

And all schedulers today when the class is 0 do a lookup using the
same call used by the tcf_exts_bind(). So even if we have a running
classifier hit the null class pointer it will do a lookup and get
to the same result. This is particularly fragile at the moment because
the only way to verify this is to audit the schedulers call sites.

Reported-by: Cong Wang <xiyou.wangconf@gmail.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 18:02:33 -04:00
John Fastabend
13990f8156 net: sched: cls_cgroup tear down exts and ematch from rcu callback
It is not RCU safe to destroy the action chain while there
is a possibility of readers accessing it. Move this code
into the rcu callback using the same rcu callback used in the
code patch to make a change to head.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 18:02:32 -04:00
John Fastabend
82a470f111 net: sched: remove tcf_proto from ematch calls
This removes the tcf_proto argument from the ematch code paths that
only need it to reference the net namespace. This allows simplifying
qdisc code paths especially when we need to tear down the ematch
from an RCU callback. In this case we can not guarentee that the
tcf_proto structure is still valid.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 18:02:32 -04:00
Eric Dumazet
fcbeb976d7 net: introduce netdevice gso_min_segs attribute
Some TSO engines might have a too heavy setup cost, that impacts
performance on hosts sending small bursts (2 MSS per packet).

This patch adds a device gso_min_segs, allowing drivers to set
a minimum segment size for TSO packets, according to the NIC
performance.

Tested on a mlx4 NIC, this allows to get a ~110% increase of
throughput when sending 2 MSS per packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 17:56:28 -04:00
Daniel Borkmann
b47bd8d279 ipv4: igmp: fix v3 general query drop monitor false positive
In case we find a general query with non-zero number of sources, we
are dropping the skb as it's malformed.

RFC3376, section 4.1.8. Number of Sources (N):

  This number is zero in a General Query or a Group-Specific Query,
  and non-zero in a Group-and-Source-Specific Query.

Therefore, reflect that by using kfree_skb() instead of consume_skb().

Fixes: d679c5324d ("igmp: avoid drop_monitor false positives")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 17:14:54 -04:00
Eric Dumazet
1255a50554 ethtool: Ethtool parameter to dynamically change tx_copybreak
Use new ethtool [sg]et_tunable() to set tx_copybread (inline threshold)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 01:04:16 -04:00
Eric Dumazet
f2600cf02b net: sched: avoid costly atomic operation in fq_dequeue()
Standard qdisc API to setup a timer implies an atomic operation on every
packet dequeue : qdisc_unthrottled()

It turns out this is not really needed for FQ, as FQ has no concept of
global qdisc throttling, being a qdisc handling many different flows,
some of them can be throttled, while others are not.

Fix is straightforward : add a 'bool throttle' to
qdisc_watchdog_schedule_ns(), and remove calls to qdisc_unthrottled()
in sch_fq.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:55:10 -04:00
Eric Dumazet
bec3cfdca3 net: skb_segment() provides list head and tail
Its unfortunate we have to walk again skb list to find the tail
after segmentation, even if data is probably hot in cpu caches.

skb_segment() can store the tail of the list into segs->prev,
and validate_xmit_skb_list() can immediately get the tail.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:37:30 -04:00
Jesse Gross
f579668406 openvswitch: Add support for Geneve tunneling.
The Openvswitch implementation is completely agnostic to the options
that are in use and can handle newly defined options without
further work. It does this by simply matching on a byte array
of options and allowing userspace to setup flows on this array.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Singed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:32:21 -04:00
Jesse Gross
6b205b2ca1 openvswitch: Factor out allocation and verification of actions.
As the size of the flow key grows, it can put some pressure on the
stack. This is particularly true in ovs_flow_cmd_set(), which needs several
copies of the key on the stack. One of those uses is logically separate,
so this factors it out to reduce stack pressure and improve readibility.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:32:20 -04:00
Jesse Gross
f0b128c1e2 openvswitch: Wrap struct ovs_key_ipv4_tunnel in a new structure.
Currently, the flow information that is matched for tunnels and
the tunnel data passed around with packets is the same. However,
as additional information is added this is not necessarily desirable,
as in the case of pointers.

This adds a new structure for tunnel metadata which currently contains
only the existing struct. This change is purely internal to the kernel
since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version
of OVS_KEY_ATTR_TUNNEL that is translated at flow setup.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:32:20 -04:00
Jesse Gross
67fa034194 openvswitch: Add support for matching on OAM packets.
Some tunnel formats have mechanisms for indicating that packets are
OAM frames that should be handled specially (either as high priority or
not forwarded beyond an endpoint). This provides support for allowing
those types of packets to be matched.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:32:20 -04:00
Jesse Gross
0714812134 openvswitch: Eliminate memset() from flow_extract.
As new protocols are added, the size of the flow key tends to
increase although few protocols care about all of the fields. In
order to optimize this for hashing and matching, OVS uses a variable
length portion of the key. However, when fields are extracted from
the packet we must still zero out the entire key.

This is no longer necessary now that OVS implements masking. Any
fields (or holes in the structure) which are not part of a given
protocol will be by definition not part of the mask and zeroed out
during lookup. Furthermore, since masking already uses variable
length keys this zeroing operation automatically benefits as well.

In principle, the only thing that needs to be done at this point
is remove the memset() at the beginning of flow. However, some
fields assume that they are initialized to zero, which now must be
done explicitly. In addition, in the event of an error we must also
zero out corresponding fields to signal that there is no valid data
present. These increase the total amount of code but very little of
it is executed in non-error situations.

Removing the memset() reduces the profile of ovs_flow_extract()
from 0.64% to 0.56% when tested with large packets on a 10G link.

Suggested-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:32:20 -04:00
Andy Zhou
0b5e8b8eea net: Add Geneve tunneling protocol driver
This adds a device level support for Geneve -- Generic Network
Virtualization Encapsulation. The protocol is documented at
http://tools.ietf.org/html/draft-gross-geneve-01

Only protocol layer Geneve support is provided by this driver.
Openvswitch can be used for configuring, set up and tear down
functional Geneve tunnels.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:32:20 -04:00
Vlad Yasevich
bdf6fa52f0 sctp: handle association restarts when the socket is closed.
Currently association restarts do not take into consideration the
state of the socket.  When a restart happens, the current assocation
simply transitions into established state.  This creates a condition
where a remote system, through a the restart procedure, may create a
local association that is no way reachable by user.  The conditions
to trigger this are as follows:
  1) Remote does not acknoledge some data causing data to remain
     outstanding.
  2) Local application calls close() on the socket.  Since data
     is still outstanding, the association is placed in SHUTDOWN_PENDING
     state.  However, the socket is closed.
  3) The remote tries to create a new association, triggering a restart
     on the local system.  The association moves from SHUTDOWN_PENDING
     to ESTABLISHED.  At this point, it is no longer reachable by
     any socket on the local system.

This patch addresses the above situation by moving the newly ESTABLISHED
association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
the COOKIE-ACK chunk.  This way, the restarted associate immidiately
enters the shutdown procedure and forces the termination of the
unreachable association.

Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:21:45 -04:00
David S. Miller
a4b4a2b7f9 Merge tag 'master-2014-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-10-03

Please pull tihs batch of updates intended for the 3.18 stream!

For the iwlwifi bits, Emmanuel says:

"I have here a few things that depend on the latest mac80211's changes:
RRM, TPC, Quiet Period etc...  Eyal keeps improving our rate control
and we have a new device ID. This last patch should probably have
gone to wireless.git, but at that stage, I preferred to send it to
-next and CC stable."

For (most of) the Atheros bits, Kalle says:

"The only new feature is testmode support from me. Ben added a new method
to crash the firmware with an assert for debug purposes. As usual, we
have lots of smaller fixes from Michal. Matteo fixed a Kconfig
dependency with debugfs. I fixed some warnings recently added to
checkpatch."

For the NFC bits, Samuel says:

"We've had major updates for TI and ST Microelectronics drivers, and a
few NCI related changes.

For TI's trf7970a driver:

- Target mode support for trf7970a
- Suspend/resume support for trf7970a
- DT properties additions to handle different quirks
- A bunch of fixes for smartphone IOP related issues

For ST Microelectronics' ST21NFCA and ST21NFCB drivers:

- ISO15693 support for st21nfcb
- checkpatch and sparse related warning fixes
- Code cleanups and a few minor fixes

Finally, Marvell added ISO15693 support to the NCI stack, together with a
couple of NCI fixes."

For the Bluetooth bits, Johan says:

"This 3.18 pull request replaces the one I did on Monday ("bluetooth-next
2014-09-22", which hasn't been pulled yet). The additions since the last
request are:

 - SCO connection fix for devices not supporting eSCO
 - Cleanups regarding the SCO establishment logic
 - Remove unnecessary return value from logging functions
 - Header compression fix for 6lowpan
 - Cleanups to the ieee802154/mrf24j40 driver

Here's a copy from previous request that this one replaces:

'
Here are some more patches for 3.18. They include various fixes to the
btusb HCI driver, a fix for LE SMP, as well as adding Jukka to the
MAINTAINERS file for generic 6LoWPAN (as requested by Alexander Aring).

I've held on to this pull request a bit since we were waiting for a SCO
related fix to get sorted out first. However, since the merge window is
getting closer I decided not to wait for it. If we do get the fix sorted
out there'll probably be a second small pull request later this week.
'"

And,

"Unless 3.17 gets delayed this will probably be our last -next pull request for
3.18. We've got:

  - New Marvell hardware supportr
  - Multicast support for 6lowpan
  - Several of 6lowpan fixes & cleanups
  - Fix for a (false-positive) lockdep warning in L2CAP
  - Minor btusb cleanup"

On top of all that comes the usual sort of updates to ath5k, ath9k,
ath10k, brcmfmac, mwifiex, and wil6210.  This time around there are
also a number of rtlwifi updates to enable some new hardware and
to reconcile the in-kernel drivers with some newer releases of the
Realtek vendor drivers.  Also of note is some device tree work for
the bcma bus.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05 21:34:39 -04:00
David S. Miller
61b37d2f54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains another batch with Netfilter/IPVS updates
for net-next, they are:

1) Add abstracted ICMP codes to the nf_tables reject expression. We
   introduce four reasons to reject using ICMP that overlap in IPv4
   and IPv6 from the semantic point of view. This should simplify the
   maintainance of dual stack rule-sets through the inet table.

2) Move nf_send_reset() functions from header files to per-family
   nf_reject modules, suggested by Patrick McHardy.

3) We have to use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) everywhere in the
   code now that br_netfilter can be modularized. Convert remaining spots
   in the network stack code.

4) Use rcu_barrier() in the nf_tables module removal path to ensure that
   we don't leave object that are still pending to be released via
   call_rcu (that may likely result in a crash).

5) Remove incomplete arch 32/64 compat from nft_compat. The original (bad)
   idea was to probe the word size based on the xtables match/target info
   size, but this assumption is wrong when you have to dump the information
   back to userspace.

6) Allow to filter from prerouting and postrouting in the nf_tables bridge.
   In order to emulate the ebtables NAT chains (which are actually simple
   filter chains with no special semantics), we have support filtering from
   this hooks too.

7) Add explicit module dependency between xt_physdev and br_netfilter.
   This provides a way to detect if the user needs br_netfilter from
   the configuration path. This should reduce the breakage of the
   br_netfilter modularization.

8) Cleanup coding style in ip_vs.h, from Simon Horman.

9) Fix crash in the recently added nf_tables masq expression. We have
   to register/unregister the notifiers to clean up the conntrack table
   entries from the module init/exit path, not from the rule addition /
   deletion path. From Arturo Borrero.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05 21:32:37 -04:00
Vlad Yasevich
5be5a2df40 bridge: Add filtering support for default_pvid
Currently when vlan filtering is turned on on the bridge, the bridge
will drop all traffic untill the user configures the filter.  This
isn't very nice for ports that don't care about vlans and just
want untagged traffic.

A concept of a default_pvid was recently introduced.  This patch
adds filtering support for default_pvid.   Now, ports that don't
care about vlans and don't define there own filter will belong
to the VLAN of the default_pvid and continue to receive untagged
traffic.

This filtering can be disabled by setting default_pvid to 0.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05 21:21:37 -04:00
Vlad Yasevich
3df6bf45ec bridge: Simplify pvid checks.
Currently, if the pvid is not set, we return an illegal vlan value
even though the pvid value is set to 0.  Since pvid of 0 is currently
invalid, just return 0 instead.  This makes the current and future
checks simpler.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05 21:21:36 -04:00
Vlad Yasevich
96a20d9d7f bridge: Add a default_pvid sysfs attribute
This patch allows the user to set and retrieve default_pvid
value.  A new value can only be stored when vlan filtering
is disabled.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05 21:21:36 -04:00
Ignacy Gawędzki
34a419d4e2 ematch: Fix early ending of inverted containers.
The result of a negated container has to be inverted before checking for
early ending.

This fixes my previous attempt (17c9c82326) to
make inverted containers work correctly.

Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-04 20:49:46 -04:00
John Fastabend
1e203c1a2c net: sched: suspicious RCU usage in qdisc_watchdog
Suspicious RCU usage in qdisc_watchdog call needs to be done inside
rcu_read_lock/rcu_read_unlock. And then Qdisc destroy operations
need to ensure timer is cancelled before removing qdisc structure.

[ 3992.191339] ===============================
[ 3992.191340] [ INFO: suspicious RCU usage. ]
[ 3992.191343] 3.17.0-rc6net-next+ #72 Not tainted
[ 3992.191345] -------------------------------
[ 3992.191347] include/net/sch_generic.h:272 suspicious rcu_dereference_check() usage!
[ 3992.191348]
[ 3992.191348] other info that might help us debug this:
[ 3992.191348]
[ 3992.191351]
[ 3992.191351] rcu_scheduler_active = 1, debug_locks = 1
[ 3992.191353] no locks held by swapper/1/0.
[ 3992.191355]
[ 3992.191355] stack backtrace:
[ 3992.191358] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.17.0-rc6net-next+ #72
[ 3992.191360] Hardware name:                  /DZ77RE-75K, BIOS GAZ7711H.86A.0060.2012.1115.1750 11/15/2012
[ 3992.191362]  0000000000000001 ffff880235803e48 ffffffff8178f92c 0000000000000000
[ 3992.191366]  ffff8802322224a0 ffff880235803e78 ffffffff810c9966 ffff8800a5fe3000
[ 3992.191370]  ffff880235803f30 ffff8802359cd768 ffff8802359cd6e0 ffff880235803e98
[ 3992.191374] Call Trace:
[ 3992.191376]  <IRQ>  [<ffffffff8178f92c>] dump_stack+0x4e/0x68
[ 3992.191387]  [<ffffffff810c9966>] lockdep_rcu_suspicious+0xe6/0x130
[ 3992.191392]  [<ffffffff8167213a>] qdisc_watchdog+0x8a/0xb0
[ 3992.191396]  [<ffffffff810f93f2>] __run_hrtimer+0x72/0x420
[ 3992.191399]  [<ffffffff810f9bcd>] ? hrtimer_interrupt+0x7d/0x240
[ 3992.191403]  [<ffffffff816720b0>] ? tc_classify+0xc0/0xc0
[ 3992.191406]  [<ffffffff810f9c4f>] hrtimer_interrupt+0xff/0x240
[ 3992.191410]  [<ffffffff8109e4a5>] ? __atomic_notifier_call_chain+0x5/0x140
[ 3992.191415]  [<ffffffff8103577b>] local_apic_timer_interrupt+0x3b/0x60
[ 3992.191419]  [<ffffffff8179c2b5>] smp_apic_timer_interrupt+0x45/0x60
[ 3992.191422]  [<ffffffff8179a6bf>] apic_timer_interrupt+0x6f/0x80
[ 3992.191424]  <EOI>  [<ffffffff815ed233>] ? cpuidle_enter_state+0x73/0x2e0
[ 3992.191432]  [<ffffffff815ed22e>] ? cpuidle_enter_state+0x6e/0x2e0
[ 3992.191437]  [<ffffffff815ed567>] cpuidle_enter+0x17/0x20
[ 3992.191441]  [<ffffffff810c0741>] cpu_startup_entry+0x3d1/0x4a0
[ 3992.191445]  [<ffffffff81106fc6>] ? clockevents_config_and_register+0x26/0x30
[ 3992.191448]  [<ffffffff81033c16>] start_secondary+0x1b6/0x260

Fixes: b26b0d1e8b ("net: qdisc: use rcu prefix and silence sparse warnings")
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-04 20:45:54 -04:00
Florian Fainelli
f7d6b96f34 net: dsa: do not call phy_start_aneg
Commit f7f1de51ed ("net: dsa: start and stop the PHY state machine")
add calls to phy_start() in dsa_slave_open() respectively phy_stop() in
dsa_slave_close().

We also call phy_start_aneg() in dsa_slave_create(), and this call is
messing up with the PHY state machine, since we basically start the
auto-negotiation, and later on restart it when calling phy_start().
phy_start() does not currently handle the PHY_FORCING or PHY_AN states
properly, but such a fix would be too invasive for this window.

Fixes: f7f1de51ed ("net: dsa: start and stop the PHY state machine")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-04 20:44:44 -04:00
Vijay Subramanian
c8753d55af net: Cleanup skb cloning by adding SKB_FCLONE_FREE
SKB_FCLONE_UNAVAILABLE has overloaded meaning depending on type of skb.
1: If skb is allocated from head_cache, it indicates fclone is not available.
2: If skb is a companion fclone skb (allocated from fclone_cache), it indicates
it is available to be used.

To avoid confusion for case 2 above, this patch  replaces
SKB_FCLONE_UNAVAILABLE with SKB_FCLONE_FREE where appropriate. For fclone
companion skbs, this indicates it is free for use.

SKB_FCLONE_UNAVAILABLE will now simply indicate skb is from head_cache and
cannot / will not have a companion fclone.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-04 20:34:25 -04:00
Nicolas Dichtel
3be07244b7 ip6_gre: fix flowi6_proto value in xmit path
In xmit path, we build a flowi6 which will be used for the output route lookup.
We are sending a GRE packet, neither IPv4 nor IPv6 encapsulated packet, thus the
protocol should be IPPROTO_GRE.

Fixes: c12b395a46 ("gre: Support GRE over IPv6")
Reported-by: Matthieu Ternisien d'Ouville <matthieu.tdo@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-04 20:08:24 -04:00
Tom Herbert
bc1fc390e1 ip_tunnel: Add GUE support
This patch allows configuring IPIP, sit, and GRE tunnels to use GUE.
This is very similar to fou excpet that we need to insert the GUE header
in addition to the UDP header on transmit.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 16:53:33 -07:00
Tom Herbert
37dd024779 gue: Receive side for Generic UDP Encapsulation
This patch adds support receiving for GUE packets in the fou module. The
fou module now supports direct foo-over-udp (no encapsulation header)
and GUE. To support this a type parameter is added to the fou netlink
parameters.

For a GUE socket we define gue_udp_recv, gue_gro_receive, and
gue_gro_complete to handle the specifics of the GUE protocol. Most
of the code to manage and configure sockets is common with the fou.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 16:53:33 -07:00
Tom Herbert
efc98d08e1 fou: eliminate IPv4,v6 specific GRO functions
This patch removes fou[46]_gro_receive and fou[46]_gro_complete
functions. The v4 or v6 variants were chosen for the UDP offloads
based on the address family of the socket this is not necessary
or correct. Alternatively, this patch adds is_ipv6 to napi_gro_skb.
This is set in udp6_gro_receive and unset in udp4_gro_receive. In
fou_gro_receive the value is used to select the correct inet_offloads
for the protocol of the outer IP header.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 16:53:32 -07:00
Tom Herbert
7371e0221c ip_tunnel: Account for secondary encapsulation header in max_headroom
When adjusting max_header for the tunnel interface based on egress
device we need to account for any extra bytes in secondary encapsulation
(e.g. FOU).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 16:53:32 -07:00
Eric Dumazet
01291202ed net: do not export skb_gro_receive()
skb_gro_receive() is only called from tcp_gro_receive() which is
not in a module.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:54:30 -07:00
Eric Dumazet
55a93b3ea7 qdisc: validate skb without holding lock
Validation of skb can be pretty expensive :

GSO segmentation and/or checksum computations.

We can do this without holding qdisc lock, so that other cpus
can queue additional packets.

Trick is that requeued packets were already validated, so we carry
a boolean so that sch_direct_xmit() can validate a fresh skb list,
or directly use an old one.

Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads
host.

Turning TSO on or off had no effect on throughput, only few more cpu
cycles. Lock contention on qdisc lock disappeared.

Same if disabling TX checksum offload.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:36:11 -07:00
Herton R. Krzesinski
593cbb3ec6 net/rds: fix possible double free on sock tear down
I got a report of a double free happening at RDS slab cache. One
suspicion was that may be somewhere we were doing a sock_hold/sock_put
on an already freed sock. Thus after providing a kernel with the
following change:

 static inline void sock_hold(struct sock *sk)
 {
-       atomic_inc(&sk->sk_refcnt);
+       if (!atomic_inc_not_zero(&sk->sk_refcnt))
+               WARN(1, "Trying to hold sock already gone: %p (family: %hd)\n",
+                       sk, sk->sk_family);
 }

The warning successfuly triggered:

Trying to hold sock already gone: ffff81f6dda61280 (family: 21)
WARNING: at include/net/sock.h:350 sock_hold()
Call Trace:
<IRQ>  [<ffffffff8adac135>] :rds:rds_send_remove_from_sock+0xf0/0x21b
[<ffffffff8adad35c>] :rds:rds_send_drop_acked+0xbf/0xcf
[<ffffffff8addf546>] :rds_rdma:rds_ib_recv_tasklet_fn+0x256/0x2dc
[<ffffffff8009899a>] tasklet_action+0x8f/0x12b
[<ffffffff800125a2>] __do_softirq+0x89/0x133
[<ffffffff8005f30c>] call_softirq+0x1c/0x28
[<ffffffff8006e644>] do_softirq+0x2c/0x7d
[<ffffffff8006e4d4>] do_IRQ+0xee/0xf7
[<ffffffff8005e625>] ret_from_intr+0x0/0xa
<EOI>

Looking at the call chain above, the only way I think this would be
possible is if somewhere we already released the same socket->sock which
is assigned to the rds_message at rds_send_remove_from_sock. Which seems
only possible to happen after the tear down done on rds_release.

rds_release properly calls rds_send_drop_to to drop the socket from any
rds_message, and some proper synchronization is in place to avoid race
with rds_send_drop_acked/rds_send_remove_from_sock. However, I still see
a very narrow window where it may be possible we touch a sock already
released: when rds_release races with rds_send_drop_acked, we check
RDS_MSG_ON_CONN to avoid cleanup on the same rds_message, but in this
specific case we don't clear rm->m_rs. In this case, it seems we could
then go on at rds_send_drop_to and after it returns, the sock is freed
by last sock_put on rds_release, with concurrently we being at
rds_send_remove_from_sock; then at some point in the loop at
rds_send_remove_from_sock we process an rds_message which didn't have
rm->m_rs unset for a freed sock, and a possible sock_hold on an sock
already gone at rds_release happens.

This hopefully address the described condition above and avoids a double
free on "second last" sock_put. In addition, I removed the comment about
socket destruction on top of rds_send_drop_acked: we call rds_send_drop_to
in rds_release and we should have things properly serialized there, thus
I can't see the comment being accurate there.

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:52:00 -07:00
Herton R. Krzesinski
eb74cc97b8 net/rds: do proper house keeping if connection fails in rds_tcp_conn_connect
I see two problems if we consider the sock->ops->connect attempt to fail in
rds_tcp_conn_connect. The first issue is that for example we don't remove the
previously added rds_tcp_connection item to rds_tcp_tc_list at
rds_tcp_set_callbacks, which means that on a next reconnect attempt for the
same rds_connection, when rds_tcp_conn_connect is called we can again call
rds_tcp_set_callbacks, resulting in duplicated items on rds_tcp_tc_list,
leading to list corruption: to avoid this just make sure we call
properly rds_tcp_restore_callbacks before we exit. The second issue
is that we should also release the sock properly, by setting sock = NULL
only if we are returning without error.

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:51:59 -07:00
Herton R. Krzesinski
310886dd5f net/rds: call rds_conn_drop instead of open code it at rds_connect_complete
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:51:59 -07:00
Jesper Dangaard Brouer
808e7ac0bd qdisc: dequeue bulking also pickup GSO/TSO packets
The TSO and GSO segmented packets already benefit from bulking
on their own.

The TSO packets have always taken advantage of the only updating
the tailptr once for a large packet.

The GSO segmented packets have recently taken advantage of
bulking xmit_more API, via merge commit 53fda7f7f9 ("Merge
branch 'xmit_list'"), specifically via commit 7f2e870f2a ("net:
Move main gso loop out of dev_hard_start_xmit() into helper.")
allowing qdisc requeue of remaining list.  And via commit
ce93718fb7 ("net: Don't keep around original SKB when we
software segment GSO frames.").

This patch allow further bulking of TSO/GSO packets together,
when dequeueing from the qdisc.

Testing:
 Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
netperf-wrapper. Bulking several TSO show no performance regressions
(requeues were in the area 32 requeues/sec).

Bulking several GSOs does show small regression or very small
improvement (requeues were in the area 8000 requeues/sec).

 Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
latency. Base-case, which is "normal" GSO bulking, sees varying
high-prio queue delay between 0.38ms to 0.47ms.  Bulking several GSOs
together, result in a stable high-prio queue delay of 0.50ms.

 Using igb at 100Mbit/s with GSO bulking, shows an improvement.
Base-case sees varying high-prio queue delay between 2.23ms to 2.35ms

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:37:06 -07:00
Jesper Dangaard Brouer
5772e9a346 qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
sending/processing an entire skb list.

This patch implements qdisc bulk dequeue, by allowing multiple packets
to be dequeued in dequeue_skb().

The optimization principle for this is two fold, (1) to amortize
locking cost and (2) avoid expensive tailptr update for notifying HW.
 (1) Several packets are dequeued while holding the qdisc root_lock,
amortizing locking cost over several packet.  The dequeued SKB list is
processed under the TXQ lock in dev_hard_start_xmit(), thus also
amortizing the cost of the TXQ lock.
 (2) Further more, dev_hard_start_xmit() will utilize the skb->xmit_more
API to delay HW tailptr update, which also reduces the cost per
packet.

One restriction of the new API is that every SKB must belong to the
same TXQ.  This patch takes the easy way out, by restricting bulk
dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the
qdisc only have attached a single TXQ.

Some detail about the flow; dev_hard_start_xmit() will process the skb
list, and transmit packets individually towards the driver (see
xmit_one()).  In case the driver stops midway in the list, the
remaining skb list is returned by dev_hard_start_xmit().  In
sch_direct_xmit() this returned list is requeued by dev_requeue_skb().

To avoid overshooting the HW limits, which results in requeuing, the
patch limits the amount of bytes dequeued, based on the drivers BQL
limits.  In-effect bulking will only happen for BQL enabled drivers.

Small amounts for extra HoL blocking (2x MTU/0.24ms) were
measured at 100Mbit/s, with bulking 8 packets, but the
oscillating nature of the measurement indicate something, like
sched latency might be causing this effect. More comparisons
show, that this oscillation goes away occationally. Thus, we
disregard this artifact completely and remove any "magic" bulking
limit.

For now, as a conservative approach, stop bulking when seeing TSO and
segmented GSO packets.  They already benefit from bulking on their own.
A followup patch add this, to allow easier bisect-ability for finding
regressions.

Jointed work with Hannes, Daniel and Florian.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:37:06 -07:00
Arturo Borrero
8da4cc1b10 netfilter: nft_masq: register/unregister notifiers on module init/exit
We have to register the notifiers in the masquerade expression from
the the module _init and _exit path.

This fixes crashes when removing the masquerade rule with no
ipt_MASQUERADE support in place (which was masking the problem).

Fixes: 9ba1f72 ("netfilter: nf_tables: add new nft_masq expression")
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-03 14:24:35 +02:00
David S. Miller
739e4a758e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	net/netfilter/nfnetlink.c

Both r8152 and nfnetlink conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-02 11:25:43 -07:00
John W. Linville
f6cd071891 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-10-02 13:56:19 -04:00
Pablo Neira Ayuso
4b7fd5d97e netfilter: explicit module dependency between br_netfilter and physdev
You can use physdev to match the physical interface enslaved to the
bridge device. This information is stored in skb->nf_bridge and it is
set up by br_netfilter. So, this is only available when iptables is
used from the bridge netfilter path.

Since 34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the br_netfilter code is modular. To reduce the impact of this change,
we can autoload the br_netfilter if the physdev match is used since
we assume that the users need br_netfilter in place.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:57 +02:00
Pablo Neira Ayuso
36d2af5998 netfilter: nf_tables: allow to filter from prerouting and postrouting
This allows us to emulate the NAT table in ebtables, which is actually
a plain filter chain that hooks at prerouting, output and postrouting.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:56 +02:00
Pablo Neira Ayuso
756c1b1a7f netfilter: nft_compat: remove incomplete 32/64 bits arch compat code
This code was based on the wrong asumption that you can probe based
on the match/target private size that we get from userspace. This
doesn't work at all when you have to dump the info back to userspace
since you don't know what word size the userspace utility is using.

Currently, the extensions that require arch compat are limit match
and the ebt_mark match/target. The standard targets are not used by
the nft-xt compat layer, so they are not affected. We can work around
this limitation with a new revision that uses arch agnostic types.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:55 +02:00
Pablo Neira Ayuso
1b1bc49c0f netfilter: nf_tables: wait for call_rcu completion on module removal
Make sure the objects have been released before the nf_tables modules
is removed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:54 +02:00
Pablo Neira Ayuso
1109a90c01 netfilter: use IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
In 34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the bridge netfilter code has been modularized.

Use IS_ENABLED instead of ifdef to cover the module case.

Fixes: 34666d4 ("netfilter: bridge: move br_netfilter out of the core")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:54 +02:00
Pablo Neira Ayuso
c8d7b98bec netfilter: move nf_send_resetX() code to nf_reject_ipvX modules
Move nf_send_reset() and nf_send_reset6() to nf_reject_ipv4 and
nf_reject_ipv6 respectively. This code is shared by x_tables and
nf_tables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:49 +02:00
Pablo Neira Ayuso
51b0a5d8c2 netfilter: nft_reject: introduce icmp code abstraction for inet and bridge
This patch introduces the NFT_REJECT_ICMPX_UNREACH type which provides
an abstraction to the ICMP and ICMPv6 codes that you can use from the
inet and bridge tables, they are:

* NFT_REJECT_ICMPX_NO_ROUTE: no route to host - network unreachable
* NFT_REJECT_ICMPX_PORT_UNREACH: port unreachable
* NFT_REJECT_ICMPX_HOST_UNREACH: host unreachable
* NFT_REJECT_ICMPX_ADMIN_PROHIBITED: administratevely prohibited

You can still use the specific codes when restricting the rule to match
the corresponding layer 3 protocol.

I decided to not overload the existing NFT_REJECT_ICMP_UNREACH to have
different semantics depending on the table family and to allow the user
to specify ICMP family specific codes if they restrict it to the
corresponding family.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:29:57 +02:00
Jukka Rissanen
9c238ca8ec Bluetooth: 6lowpan: Check transmit errors for multicast packets
We did not return error if multicast packet transmit failed.
This might not be desired so return error also in this case.
If there are multiple 6lowpan devices where the multicast packet
is sent, then return error even if sending to only one of them fails.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-02 13:41:57 +03:00
Jukka Rissanen
d7b6b0a532 Bluetooth: 6lowpan: Return EAGAIN error also for multicast packets
Make sure that we are able to return EAGAIN from l2cap_chan_send()
even for multicast packets. The error code was ignored unncessarily.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-02 13:41:39 +03:00
Jukka Rissanen
a7807d73a0 Bluetooth: 6lowpan: Avoid memory leak if memory allocation fails
If skb_unshare() returns NULL, then we leak the original skb.
Solution is to use temp variable to hold the new skb.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-02 13:41:32 +03:00
Jukka Rissanen
fc12518a4b Bluetooth: 6lowpan: Memory leak as the skb is not freed
The earlier multicast commit 36b3dd250d ("Bluetooth: 6lowpan:
Ensure header compression does not corrupt IPv6 header") lost one
skb free which then caused memory leak.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-02 13:41:30 +03:00
Johan Hedberg
02e246aee8 Bluetooth: Fix lockdep warning with l2cap_chan_connect
The L2CAP connection's channel list lock (conn->chan_lock) must never be
taken while already holding a channel lock (chan->lock) in order to
avoid lock-inversion and lockdep warnings. So far the l2cap_chan_connect
function has acquired the chan->lock early in the function and then
later called l2cap_chan_add(conn, chan) which will try to take the
conn->chan_lock. This violates the correct order of taking the locks and
may lead to the following type of lockdep warnings:

-> #1 (&conn->chan_lock){+.+...}:
       [<c109324d>] lock_acquire+0x9d/0x140
       [<c188459c>] mutex_lock_nested+0x6c/0x420
       [<d0aab48e>] l2cap_chan_add+0x1e/0x40 [bluetooth]
       [<d0aac618>] l2cap_chan_connect+0x348/0x8f0 [bluetooth]
       [<d0cc9a91>] lowpan_control_write+0x221/0x2d0 [bluetooth_6lowpan]
-> #0 (&chan->lock){+.+.+.}:
       [<c10928d8>] __lock_acquire+0x1a18/0x1d20
       [<c109324d>] lock_acquire+0x9d/0x140
       [<c188459c>] mutex_lock_nested+0x6c/0x420
       [<d0ab05fd>] l2cap_connect_cfm+0x1dd/0x3f0 [bluetooth]
       [<d0a909c4>] hci_le_meta_evt+0x11a4/0x1260 [bluetooth]
       [<d0a910eb>] hci_event_packet+0x3ab/0x3120 [bluetooth]
       [<d0a7cb08>] hci_rx_work+0x208/0x4a0 [bluetooth]

       CPU0                    CPU1
       ----                    ----
  lock(&conn->chan_lock);
                               lock(&chan->lock);
                               lock(&conn->chan_lock);
  lock(&chan->lock);

Before calling l2cap_chan_add() the channel is not part of the
conn->chan_l list, and can therefore only be accessed by the L2CAP user
(such as l2cap_sock.c). We can therefore assume that it is the
responsibility of the user to handle mutual exclusion until this point
(which we can see is already true in l2cap_sock.c by it in many places
touching chan members without holding chan->lock).

Since the hci_conn and by exctension l2cap_conn creation in the
l2cap_chan_connect() function depend on chan details we cannot simply
add a mutex_lock(&conn->chan_lock) in the beginning of the function
(since the conn object doesn't yet exist there). What we can do however
is move the chan->lock taking later into the function where we already
have the conn object and can that way take conn->chan_lock first.

This patch implements the above strategy and does some other necessary
changes such as using __l2cap_chan_add() which assumes conn->chan_lock
is held, as well as adding a second needed label so the unlocking
happens as it should.

Reported-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Tested-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-02 10:37:07 +02:00
Alexei Starovoitov
38b2cf2982 net: pktgen: packet bursting via skb->xmit_more
This patch demonstrates the effect of delaying update of HW tailptr.
(based on earlier patch by Jesper)

burst=1 is the default. It sends one packet with xmit_more=false
burst=2 sends one packet with xmit_more=true and
        2nd copy of the same packet with xmit_more=false
burst=3 sends two copies of the same packet with xmit_more=true and
        3rd copy with xmit_more=false

Performance with ixgbe (usec 30):
burst=1  tx:9.2 Mpps
burst=2  tx:13.5 Mpps
burst=3  tx:14.5 Mpps full 10G line rate

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 22:08:12 -04:00
Florian Fainelli
775dd692bd net: bridge: add a br_set_state helper function
In preparation for being able to propagate port states to e.g: notifiers
or other kernel parts, do not manipulate the port state directly, but
instead use a helper function which will allow us to do a bit more than
just setting the state.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 22:03:50 -04:00
WANG Cong
a0efb80ce3 net_sched: avoid calling tcf_unbind_filter() in call_rcu callback
This fixes the following crash:

[   63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648
[   63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000
[   63.980094] RIP: 0010:[<ffffffff817e6d07>]  [<ffffffff817e6d07>] u32_destroy_key+0x27/0x6d
[   63.980094] RSP: 0018:ffff880117dffcc0  EFLAGS: 00010202
[   63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000
[   63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b
[   63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000
[   63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001
[   63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30
[   63.980094] FS:  0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[   63.980094] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0
[   63.980094] Stack:
[   63.980094]  ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68
[   63.980094]  ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8
[   63.980094]  ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a
[   63.980094] Call Trace:
[   63.980094]  [<ffffffff817e6d68>] u32_delete_key_freepf_rcu+0x1b/0x1d
[   63.980094]  [<ffffffff810cb4c7>] rcu_process_callbacks+0x3bb/0x691
[   63.980094]  [<ffffffff810cb3cd>] ? rcu_process_callbacks+0x2c1/0x691
[   63.980094]  [<ffffffff817e6d4d>] ? u32_destroy_key+0x6d/0x6d
[   63.980094]  [<ffffffff810780a4>] __do_softirq+0x142/0x323
[   63.980094]  [<ffffffff810782a8>] run_ksoftirqd+0x23/0x53
[   63.980094]  [<ffffffff81092126>] smpboot_thread_fn+0x203/0x221
[   63.980094]  [<ffffffff81091f23>] ? smpboot_unpark_thread+0x33/0x33
[   63.980094]  [<ffffffff8108e44d>] kthread+0xc9/0xd1
[   63.980094]  [<ffffffff819e00ea>] ? do_wait_for_common+0xf8/0x125
[   63.980094]  [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61
[   63.980094]  [<ffffffff819e43ec>] ret_from_fork+0x7c/0xb0
[   63.980094]  [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61

tp could be freed in call_rcu callback too, the order is not guaranteed.

John Fastabend says:

====================
Its worth noting why this is safe. Any running schedulers will either
read the valid class field or it will be zeroed.

All schedulers today when the class is 0 do a lookup using the
same call used by the tcf_exts_bind(). So even if we have a running
classifier hit the null class pointer it will do a lookup and get
to the same result. This is particularly fragile at the moment because
the only way to verify this is to audit the schedulers call sites.
====================

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 22:00:42 -04:00
WANG Cong
6e0565697a net_sched: fix another crash in cls_tcindex
This patch fixes the following crash:

[  166.670795] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  166.674230] IP: [<ffffffff814b739f>] __list_del_entry+0x5c/0x98
[  166.674230] PGD d0ea5067 PUD ce7fc067 PMD 0
[  166.674230] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  166.674230] CPU: 1 PID: 775 Comm: tc Not tainted 3.17.0-rc6+ #642
[  166.674230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  166.674230] task: ffff8800d03c4d20 ti: ffff8800cae7c000 task.ti: ffff8800cae7c000
[  166.674230] RIP: 0010:[<ffffffff814b739f>]  [<ffffffff814b739f>] __list_del_entry+0x5c/0x98
[  166.674230] RSP: 0018:ffff8800cae7f7d0  EFLAGS: 00010207
[  166.674230] RAX: 0000000000000000 RBX: ffff8800cba8d700 RCX: ffff8800cba8d700
[  166.674230] RDX: 0000000000000000 RSI: dead000000200200 RDI: ffff8800cba8d700
[  166.674230] RBP: ffff8800cae7f7d0 R08: 0000000000000001 R09: 0000000000000001
[  166.674230] R10: 0000000000000000 R11: 000000000000859a R12: ffffffffffffffe8
[  166.674230] R13: ffff8800cba8c5b8 R14: 0000000000000001 R15: ffff8800cba8d700
[  166.674230] FS:  00007fdb5f04a740(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[  166.674230] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  166.674230] CR2: 0000000000000000 CR3: 00000000cf929000 CR4: 00000000000006e0
[  166.674230] Stack:
[  166.674230]  ffff8800cae7f7e8 ffffffff814b73e8 ffff8800cba8d6e8 ffff8800cae7f828
[  166.674230]  ffffffff817caeec 0000000000000046 ffff8800cba8c5b0 ffff8800cba8c5b8
[  166.674230]  0000000000000000 0000000000000001 ffff8800cf8e33e8 ffff8800cae7f848
[  166.674230] Call Trace:
[  166.674230]  [<ffffffff814b73e8>] list_del+0xd/0x2b
[  166.674230]  [<ffffffff817caeec>] tcf_action_destroy+0x4c/0x71
[  166.674230]  [<ffffffff817ca0ce>] tcf_exts_destroy+0x20/0x2d
[  166.674230]  [<ffffffff817ec2b5>] tcindex_delete+0x196/0x1b7

struct list_head can not be simply copied and we should always init it.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 22:00:42 -04:00
Tom Herbert
54bc9bac30 gre: Set inner protocol in v4 and v6 GRE transmit
Call skb_set_inner_protocol to set inner Ethernet protocol to
protocol being encapsulation by GRE before tunnel_xmit. This is
needed for GSO if UDP encapsulation (fou) is being done.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 21:35:51 -04:00
Tom Herbert
077c5a0948 ipip: Set inner IP protocol in ipip
Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPV4
before tunnel_xmit. This is needed if UDP encapsulation (fou) is
being done.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 21:35:51 -04:00
Tom Herbert
469471cdfc sit: Set inner IP protocol in sit
Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPV6
before tunnel_xmit. This is needed if UDP encapsulation (fou) is
being done.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 21:35:51 -04:00
Tom Herbert
8bce6d7d0d udp: Generalize skb_udp_segment
skb_udp_segment is the function called from udp4_ufo_fragment to
segment a UDP tunnel packet. This function currently assumes
segmentation is transparent Ethernet bridging (i.e. VXLAN
encapsulation). This patch generalizes the function to
operate on either Ethertype or IP protocol.

The inner_protocol field must be set to the protocol of the inner
header. This can now be either an Ethertype or an IP protocol
(in a union). A new flag in the skbuff indicates which type is
effective. skb_set_inner_protocol and skb_set_inner_ipproto
helper functions were added to set the inner_protocol. These
functions are called from the point where the tunnel encapsulation
is occuring.

When skb_udp_tunnel_segment is called, the function to segment the
inner packet is selected based on the inner IP or Ethertype. In the
case of an IP protocol encapsulation, the function is derived from
inet[6]_offloads. In the case of Ethertype, skb->protocol is
set to the inner_protocol and skb_mac_gso_segment is called. (GRE
currently does this, but it might be possible to lookup the protocol
in offload_base and call the appropriate segmenation function
directly).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 21:35:51 -04:00
Eric Dumazet
ce1a4ea3f1 net: avoid one atomic operation in skb_clone()
Fast clone cloning can actually avoid an atomic_inc(), if we
guarantee prior clone_ref value is 1.

This requires a change kfree_skbmem(), to perform the
atomic_dec_and_test() on clone_ref before setting fclone to
SKB_FCLONE_UNAVAILABLE.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 21:27:23 -04:00
Fabian Frederick
e500f488c2 net/dccp/ccid.c: add __init to ccid_activate
ccid_activate is only called by __init ccid_initialize_builtins in same module.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 18:33:13 -04:00
Fabian Frederick
0c5b8a4629 net/dccp/proto.c: add __init to dccp_mib_init
dccp_mib_init is only called by __init dccp_init in same module.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 18:33:13 -04:00
Eric Dumazet
d0bf4a9e92 net: cleanup and document skb fclone layout
Lets use a proper structure to clearly document and implement
skb fast clones.

Then, we might experiment more easily alternative layouts.

This patch adds a new skb_fclone_busy() helper, used by tcp and xfrm,
to stop leaking of implementation details.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 16:34:25 -04:00
Yuchung Cheng
b248230c34 tcp: abort orphan sockets stalling on zero window probes
Currently we have two different policies for orphan sockets
that repeatedly stall on zero window ACKs. If a socket gets
a zero window ACK when it is transmitting data, the RTO is
used to probe the window. The socket is aborted after roughly
tcp_orphan_retries() retries (as in tcp_write_timeout()).

But if the socket was idle when it received the zero window ACK,
and later wants to send more data, we use the probe timer to
probe the window. If the receiver always returns zero window ACKs,
icsk_probes keeps getting reset in tcp_ack() and the orphan socket
can stall forever until the system reaches the orphan limit (as
commented in tcp_probe_timer()). This opens up a simple attack
to create lots of hanging orphan sockets to burn the memory
and the CPU, as demonstrated in the recent netdev post "TCP
connection will hang in FIN_WAIT1 after closing if zero window is
advertised." http://www.spinics.net/lists/netdev/msg296539.html

This patch follows the design in RTO-based probe: we abort an orphan
socket stalling on zero window when the probe timer reaches both
the maximum backoff and the maximum RTO. For example, an 100ms RTT
connection will timeout after roughly 153 seconds (0.3 + 0.6 +
.... + 76.8) if the receiver keeps the window shut. If the orphan
socket passes this check, but the system already has too many orphans
(as in tcp_out_of_resources()), we still abort it but we'll also
send an RST packet as the connection may still be active.

In addition, we change TCP_USER_TIMEOUT to cover (life or dead)
sockets stalled on zero-window probes. This changes the semantics
of TCP_USER_TIMEOUT slightly because it previously only applies
when the socket has pending transmission.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reported-by: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 16:27:52 -04:00
Fabian Frederick
cb57659a15 cipso: add __init to cipso_v4_cache_init
cipso_v4_cache_init is only called by __init cipso_v4_init

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 15:46:20 -04:00
Fabian Frederick
57a02c39c1 inet: frags: add __init to ip4_frags_ctl_register
ip4_frags_ctl_register is only called by __init ipfrag_init

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 15:46:19 -04:00
Fabian Frederick
47d7a88c18 tcp: add __init to tcp_init_mem
tcp_init_mem is only called by __init tcp_init.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 15:41:14 -04:00
Thierry Reding
e506d405ac net: dsa: Fix build warning for !PM_SLEEP
The dsa_switch_suspend() and dsa_switch_resume() functions are only used
when PM_SLEEP is enabled, so they need #ifdef CONFIG_PM_SLEEP protection
to avoid a compiler warning.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 15:24:00 -04:00
Eric Dumazet
2c804d0f8f ipv4: mentions skb_gro_postpull_rcsum() in inet_gro_receive()
Proper CHECKSUM_COMPLETE support needs to adjust skb->csum
when we remove one header. Its done using skb_gro_postpull_rcsum()

In the case of IPv4, we know that the adjustment is not really needed,
because the checksum over IPv4 header is 0. Lets add a comment to
ease code comprehension and avoid copy/paste errors.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 13:44:05 -04:00
Fabian Frederick
f0a0c1cedf ieee802154: fix __init functions
Commit 3243acd37f
("ieee802154: add __init to lowpan_frags_sysctl_register")

added __init to lowpan_frags_ns_sysctl_register instead of
lowpan_frags_sysctl_register

Suggested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 02:03:13 -04:00
Trond Myklebust
72c23f0819 Merge branch 'bugfixes' into linux-next
* bugfixes:
  NFSv4.1: Fix an NFSv4.1 state renewal regression
  NFSv4: fix open/lock state recovery error handling
  NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
  NFS: Fabricate fscache server index key correctly
  SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
  nfs: fix duplicate proc entries
2014-09-30 17:21:41 -04:00
Li RongQing
a12a601ed1 tcp: Change tcp_slow_start function to return void
No caller uses the return value, so make this function return void.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 17:09:16 -04:00
Fabian Frederick
3243acd37f ieee802154: add __init to lowpan_frags_sysctl_register
lowpan_frags_sysctl_register is only called by __init lowpan_net_frag_init
(part of the lowpan module).

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 17:08:06 -04:00
Fabian Frederick
0d4a2f9a33 irda: add __init to irlan_open
irlan_open is only called by __init irlan_init in same module.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 17:08:06 -04:00
Florian Westphal
57f5877c11 netfilter: bridge: build br_nf_core only if required
Eric reports build failure with
CONFIG_BRIDGE_NETFILTER=n

We insist to build br_nf_core.o unconditionally, but we must only do so
if br_netfilter was enabled, else it fails to build due to
functions being defined to empty stubs (and some structure members
being defined out).

Also, BRIDGE_NETFILTER=y|m makes no sense when BRIDGE=n.

Fixes: 34666d467 (netfilter: bridge: move br_netfilter out of the core)
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 14:07:51 -04:00
Hannes Frederic Sowa
705f1c869d ipv6: remove rt6i_genid
Eric Dumazet noticed that all no-nonexthop or no-gateway routes which
are already marked DST_HOST (e.g. input routes routes) will always be
invalidated during sk_dst_check. Thus per-socket dst caching absolutely
had no effect and early demuxing had no effect.

Thus this patch removes rt6i_genid: fn_sernum already gets modified during
add operations, so we only must ensure we mutate fn_sernum during ipv6
address remove operations. This is a fairly cost extensive operations,
but address removal should not happen that often. Also our mtu update
functions do the same and we heard no complains so far. xfrm policy
changes also cause a call into fib6_flush_trees. Also plug a hole in
rt6_info (no cacheline changes).

I verified via tracing that this change has effect.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 14:00:48 -04:00
James Morris
6c8ff877cd Merge commit 'v3.16' into next 2014-10-01 00:44:04 +10:00
John Fastabend
b0ab6f9275 net: sched: enable per cpu qstats
After previous patches to simplify qstats the qstats can be
made per cpu with a packed union in Qdisc struct.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
John Fastabend
6401585366 net: sched: restrict use of qstats qlen
This removes the use of qstats->qlen variable from the classifiers
and makes it an explicit argument to gnet_stats_copy_queue().

The qlen represents the qdisc queue length and is packed into
the qstats at the last moment before passnig to user space. By
handling it explicitely we avoid, in the percpu stats case, having
to figure out which per_cpu variable to put it in.

It would probably be best to remove it from qstats completely
but qstats is a user space ABI and can't be broken. A future
patch could make an internal only qstats structure that would
avoid having to allocate an additional u32 variable on the
Qdisc struct. This would make the qstats struct 128bits instead
of 128+32.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
John Fastabend
25331d6ce4 net: sched: implement qstat helper routines
This adds helpers to manipulate qstats logic and replaces locations
that touch the counters directly. This simplifies future patches
to push qstats onto per cpu counters.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
John Fastabend
22e0f8b932 net: sched: make bstats per cpu and estimator RCU safe
In order to run qdisc's without locking statistics and estimators
need to be handled correctly.

To resolve bstats make the statistics per cpu. And because this is
only needed for qdiscs that are running without locks which is not
the case for most qdiscs in the near future only create percpu
stats when qdiscs set the TCQ_F_CPUSTATS flag.

Next because estimators use the bstats to calculate packets per
second and bytes per second the estimator code paths are updated
to use the per cpu statistics.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
Ignacy Gawędzki
17c9c82326 ematch: Fix matching of inverted containers.
Negated expressions and sub-expressions need to have their flags checked for
TCF_EM_INVERT and their result negated accordingly.

Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 15:31:29 -04:00
Eric Dumazet
73d3fe6d1c gro: fix aggregation for skb using frag_list
In commit 8a29111c7c ("net: gro: allow to build full sized skb")
I added a regression for linear skb that traditionally force GRO
to use the frag_list fallback.

Erez Shitrit found that at most two segments were aggregated and
the "if (skb_gro_len(p) != pinfo->gso_size)" test was failing.

This is because pinfo at this spot still points to the last skb in the
chain, instead of the first one, where we find the correct gso_size
information.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 8a29111c7c ("net: gro: allow to build full sized skb")
Reported-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 15:17:59 -04:00
David S. Miller
852248449c Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
pull request: netfilter/ipvs updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next,
most relevantly they are:

1) Four patches to make the new nf_tables masquerading support
   independent of the x_tables infrastructure. This also resolves a
   compilation breakage if the masquerade target is disabled but the
   nf_tables masq expression is enabled.

2) ipset updates via Jozsef Kadlecsik. This includes the addition of the
   skbinfo extension that allows you to store packet metainformation in the
   elements. This can be used to fetch and restore this to the packets through
   the iptables SET target, patches from Anton Danilov.

3) Add the hash:mac set type to ipset, from Jozsef Kadlecsick.

4) Add simple weighted fail-over scheduler via Simon Horman. This provides
   a fail-over IPVS scheduler (unlike existing load balancing schedulers).
   Connections are directed to the appropriate server based solely on
   highest weight value and server availability, patch from Kenny Mathis.

5) Support IPv6 real servers in IPv4 virtual-services and vice versa.
   Simon Horman informs that the motivation for this is to allow more
   flexibility in the choice of IP version offered by both virtual-servers
   and real-servers as they no longer need to match: An IPv4 connection
   from an end-user may be forwarded to a real-server using IPv6 and
   vice versa. No ip_vs_sync support yet though. Patches from Alex Gartrell
   and Julian Anastasov.

6) Add global generation ID to the nf_tables ruleset. When dumping from
   several different object lists, we need a way to identify that an update
   has ocurred so userspace knows that it needs to refresh its lists. This
   also includes a new command to obtain the 32-bits generation ID. The
   less significant 16-bits of this ID is also exposed through res_id field
   in the nfnetlink header to quickly detect the interference and retry when
   there is no risk of ID wraparound.

7) Move br_netfilter out of the bridge core. The br_netfilter code is
   built in the bridge core by default. This causes problems of different
   kind to people that don't want this: Jesper reported performance drop due
   to the inconditional hook registration and I remember to have read complains
   on netdev from people regarding the unexpected behaviour of our bridging
   stack when br_netfilter is enabled (fragmentation handling, layer 3 and
   upper inspection). People that still need this should easily undo the
   damage by modprobing the new br_netfilter module.

8) Dump the set policy nf_tables that allows set parameterization. So
   userspace can keep user-defined preferences when saving the ruleset.
   From Arturo Borrero.

9) Use __seq_open_private() helper function to reduce boiler plate code
   in x_tables, From Rob Jones.

10) Safer default behaviour in case that you forget to load the protocol
   tracker. Daniel Borkmann and Florian Westphal detected that if your
   ruleset is stateful, you allow traffic to at least one single SCTP port
   and the SCTP protocol tracker is not loaded, then any SCTP traffic may
   be pass through unfiltered. After this patch, the connection tracking
   classifies SCTP/DCCP/UDPlite/GRE packets as invalid if your kernel has
   been compiled with support for these modules.
====================

Trivially resolved conflict in include/linux/skbuff.h, Eric moved some
netfilter skbuff members around, and the netfilter tree adjusted the
ifdef guards for the bridging info pointer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 14:46:53 -04:00
Florian Westphal
735d383117 tcp: change TCP_ECN prefixes to lower case
Suggested by Stephen. Also drop inline keyword and let compiler decide.

gcc 4.7.3 decides to no longer inline tcp_ecn_check_ce, so split it up.
The actual evaluation is not inlined anymore while the ECN_OK test is.

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 14:41:22 -04:00
Florian Westphal
d82bd12298 tcp: move TCP_ECN_create_request out of header
After Octavian Purdilas tcp ipv4/ipv6 unification work this helper only
has a single callsite.

While at it, convert name to lowercase, suggested by Stephen.

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 14:41:22 -04:00
Steve Wise
7e5be28827 svcrdma: advertise the correct max payload
Svcrdma currently advertises 1MB, which is too large.  The correct value
is the minimum of RPCSVC_MAXPAYLOAD and the max scatter-gather allowed
in an NFSRDMA IO chunk * the host page size. This bug is usually benign
because the Linux X64 NFSRDMA client correctly limits the payload size to
the correct value (64*4096 = 256KB).  But if the Linux client is PPC64
with a 64KB page size, then the client will indeed use a payload size
that will overflow the server.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-09-29 14:35:18 -04:00
Li RongQing
41c91996d9 tcp: remove unnecessary assignment.
This variable i is overwritten to 0 by following code

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 12:31:12 -04:00
Eric Dumazet
b193722731 net: reorganize sk_buff for faster __copy_skb_header()
With proliferation of bit fields in sk_buff, __copy_skb_header() became
quite expensive, showing as the most expensive function in a GSO
workload.

__copy_skb_header() performance is also critical for non GSO TCP
operations, as it is used from skb_clone()

This patch carefully moves all the fields that were not copied in a
separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more

Then I moved all other fields and all other copied fields in a section
delimited by headers_start[0]/headers_end[0] section so that we
can use a single memcpy() call, inlined by compiler using long
word load/stores.

I also tried to make all copies in the natural orders of sk_buff,
to help hardware prefetching.

I made sure sk_buff size did not change.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 12:27:20 -04:00
Jukka Rissanen
156395c998 Bluetooth: 6lowpan: Enable multicast support
Set multicast support for 6lowpan network interface.
This is needed in every network interface that supports IPv6.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-29 17:06:38 +02:00
Jukka Rissanen
36b3dd250d Bluetooth: 6lowpan: Ensure header compression does not corrupt IPv6 header
If skb is going to multiple destinations, then make sure that we
do not overwrite the common IPv6 headers. So before compressing
the IPv6 headers, we copy the skb and that is then sent to 6LoWPAN
Bluetooth devices.

This is a similar patch as what was done for IEEE 802.154 6LoWPAN
in commit f19f4f9525 ("ieee802154: 6lowpan: ensure header compression
does not corrupt ipv6 header")

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-29 17:06:38 +02:00
Florian Westphal
db29a9508a netfilter: conntrack: disable generic tracking for known protocols
Given following iptables ruleset:

-P FORWARD DROP
-A FORWARD -m sctp --dport 9 -j ACCEPT
-A FORWARD -p tcp --dport 80 -j ACCEPT
-A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT

One would assume that this allows SCTP on port 9 and TCP on port 80.
Unfortunately, if the SCTP conntrack module is not loaded, this allows
*all* SCTP communication, to pass though, i.e. -p sctp -j ACCEPT,
which we think is a security issue.

This is because on the first SCTP packet on port 9, we create a dummy
"generic l4" conntrack entry without any port information (since
conntrack doesn't know how to extract this information).

All subsequent packets that are unknown will then be in established
state since they will fallback to proto_generic and will match the
'generic' entry.

Our originally proposed version [1] completely disabled generic protocol
tracking, but Jozsef suggests to not track protocols for which a more
suitable helper is available, hence we now mitigate the issue for in
tree known ct protocol helpers only, so that at least NAT and direction
information will still be preserved for others.

 [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html

Joint work with Daniel Borkmann.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-29 12:17:49 +02:00
Arturo Borrero
9363dc4b59 netfilter: nf_tables: store and dump set policy
We want to know in which cases the user explicitly sets the policy
options. In that case, we also want to dump back the info.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-29 11:28:03 +02:00
Jukka Rissanen
59790aa287 Bluetooth: 6lowpan: Make sure skb exists before accessing it
We need to make sure that the saved skb exists when
resuming or suspending a CoC channel. This can happen if
initial credits is 0 when channel is connected.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-29 10:10:02 +02:00
Daniel Borkmann
e3118e8359 net: tcp: add DCTCP congestion control algorithm
This work adds the DataCenter TCP (DCTCP) congestion control
algorithm [1], which has been first published at SIGCOMM 2010 [2],
resp. follow-up analysis at SIGMETRICS 2011 [3] (and also, more
recently as an informational IETF draft available at [4]).

DCTCP is an enhancement to the TCP congestion control algorithm for
data center networks. Typical data center workloads are i.e.
i) partition/aggregate (queries; bursty, delay sensitive), ii) short
messages e.g. 50KB-1MB (for coordination and control state; delay
sensitive), and iii) large flows e.g. 1MB-100MB (data update;
throughput sensitive). DCTCP has therefore been designed for such
environments to provide/achieve the following three requirements:

  * High burst tolerance (incast due to partition/aggregate)
  * Low latency (short flows, queries)
  * High throughput (continuous data updates, large file
    transfers) with commodity, shallow buffered switches

The basic idea of its design consists of two fundamentals: i) on the
switch side, packets are being marked when its internal queue
length > threshold K (K is chosen so that a large enough headroom
for marked traffic is still available in the switch queue); ii) the
sender/host side maintains a moving average of the fraction of marked
packets, so each RTT, F is being updated as follows:

 F := X / Y, where X is # of marked ACKs, Y is total # of ACKs
 alpha := (1 - g) * alpha + g * F, where g is a smoothing constant

The resulting alpha (iow: probability that switch queue is congested)
is then being used in order to adaptively decrease the congestion
window W:

 W := (1 - (alpha / 2)) * W

The means for receiving marked packets resp. marking them on switch
side in DCTCP is the use of ECN.

RFC3168 describes a mechanism for using Explicit Congestion Notification
from the switch for early detection of congestion, rather than waiting
for segment loss to occur.

However, this method only detects the presence of congestion, not
the *extent*. In the presence of mild congestion, it reduces the TCP
congestion window too aggressively and unnecessarily affects the
throughput of long flows [4].

DCTCP, as mentioned, enhances Explicit Congestion Notification (ECN)
processing to estimate the fraction of bytes that encounter congestion,
rather than simply detecting that some congestion has occurred. DCTCP
then scales the TCP congestion window based on this estimate [4],
thus it can derive multibit feedback from the information present in
the single-bit sequence of marks in its control law. And thus act in
*proportion* to the extent of congestion, not its *presence*.

Switches therefore set the Congestion Experienced (CE) codepoint in
packets when internal queue lengths exceed threshold K. Resulting,
DCTCP delivers the same or better throughput than normal TCP, while
using 90% less buffer space.

It was found in [2] that DCTCP enables the applications to handle 10x
the current background traffic, without impacting foreground traffic.
Moreover, a 10x increase in foreground traffic did not cause any
timeouts, and thus largely eliminates TCP incast collapse problems.

The algorithm itself has already seen deployments in large production
data centers since then.

We did a long-term stress-test and analysis in a data center, short
summary of our TCP incast tests with iperf compared to cubic:

This test measured DCTCP throughput and latency and compared it with
CUBIC throughput and latency for an incast scenario. In this test, 19
senders sent at maximum rate to a single receiver. The receiver simply
ran iperf -s.

The senders ran iperf -c <receiver> -t 30. All senders started
simultaneously (using local clocks synchronized by ntp).

This test was repeated multiple times. Below shows the results from a
single test. Other tests are similar. (DCTCP results were extremely
consistent, CUBIC results show some variance induced by the TCP timeouts
that CUBIC encountered.)

For this test, we report statistics on the number of TCP timeouts,
flow throughput, and traffic latency.

1) Timeouts (total over all flows, and per flow summaries):

            CUBIC            DCTCP
  Total     3227             25
  Mean       169.842          1.316
  Median     183              1
  Max        207              5
  Min        123              0
  Stddev      28.991          1.600

Timeout data is taken by measuring the net change in netstat -s
"other TCP timeouts" reported. As a result, the timeout measurements
above are not restricted to the test traffic, and we believe that it
is likely that all of the "DCTCP timeouts" are actually timeouts for
non-test traffic. We report them nevertheless. CUBIC will also include
some non-test timeouts, but they are drawfed by bona fide test traffic
timeouts for CUBIC. Clearly DCTCP does an excellent job of preventing
TCP timeouts. DCTCP reduces timeouts by at least two orders of
magnitude and may well have eliminated them in this scenario.

2) Throughput (per flow in Mbps):

            CUBIC            DCTCP
  Mean      521.684          521.895
  Median    464              523
  Max       776              527
  Min       403              519
  Stddev    105.891            2.601
  Fairness    0.962            0.999

Throughput data was simply the average throughput for each flow
reported by iperf. By avoiding TCP timeouts, DCTCP is able to
achieve much better per-flow results. In CUBIC, many flows
experience TCP timeouts which makes flow throughput unpredictable and
unfair. DCTCP, on the other hand, provides very clean predictable
throughput without incurring TCP timeouts. Thus, the standard deviation
of CUBIC throughput is dramatically higher than the standard deviation
of DCTCP throughput.

Mean throughput is nearly identical because even though cubic flows
suffer TCP timeouts, other flows will step in and fill the unused
bandwidth. Note that this test is something of a best case scenario
for incast under CUBIC: it allows other flows to fill in for flows
experiencing a timeout. Under situations where the receiver is issuing
requests and then waiting for all flows to complete, flows cannot fill
in for timed out flows and throughput will drop dramatically.

3) Latency (in ms):

            CUBIC            DCTCP
  Mean      4.0088           0.04219
  Median    4.055            0.0395
  Max       4.2              0.085
  Min       3.32             0.028
  Stddev    0.1666           0.01064

Latency for each protocol was computed by running "ping -i 0.2
<receiver>" from a single sender to the receiver during the incast
test. For DCTCP, "ping -Q 0x6 -i 0.2 <receiver>" was used to ensure
that traffic traversed the DCTCP queue and was not dropped when the
queue size was greater than the marking threshold. The summary
statistics above are over all ping metrics measured between the single
sender, receiver pair.

The latency results for this test show a dramatic difference between
CUBIC and DCTCP. CUBIC intentionally overflows the switch buffer
which incurs the maximum queue latency (more buffer memory will lead
to high latency.) DCTCP, on the other hand, deliberately attempts to
keep queue occupancy low. The result is a two orders of magnitude
reduction of latency with DCTCP - even with a switch with relatively
little RAM. Switches with larger amounts of RAM will incur increasing
amounts of latency for CUBIC, but not for DCTCP.

4) Convergence and stability test:

This test measured the time that DCTCP took to fairly redistribute
bandwidth when a new flow commences. It also measured DCTCP's ability
to remain stable at a fair bandwidth distribution. DCTCP is compared
with CUBIC for this test.

At the commencement of this test, a single flow is sending at maximum
rate (near 10 Gbps) to a single receiver. One second after that first
flow commences, a new flow from a distinct server begins sending to
the same receiver as the first flow. After the second flow has sent
data for 10 seconds, the second flow is terminated. The first flow
sends for an additional second. Ideally, the bandwidth would be evenly
shared as soon as the second flow starts, and recover as soon as it
stops.

The results of this test are shown below. Note that the flow bandwidth
for the two flows was measured near the same time, but not
simultaneously.

DCTCP performs nearly perfectly within the measurement limitations
of this test: bandwidth is quickly distributed fairly between the two
flows, remains stable throughout the duration of the test, and
recovers quickly. CUBIC, in contrast, is slow to divide the bandwidth
fairly, and has trouble remaining stable.

  CUBIC                      DCTCP

  Seconds  Flow 1  Flow 2    Seconds  Flow 1  Flow 2
   0       9.93    0          0       9.92    0
   0.5     9.87    0          0.5     9.86    0
   1       8.73    2.25       1       6.46    4.88
   1.5     7.29    2.8        1.5     4.9     4.99
   2       6.96    3.1        2       4.92    4.94
   2.5     6.67    3.34       2.5     4.93    5
   3       6.39    3.57       3       4.92    4.99
   3.5     6.24    3.75       3.5     4.94    4.74
   4       6       3.94       4       5.34    4.71
   4.5     5.88    4.09       4.5     4.99    4.97
   5       5.27    4.98       5       4.83    5.01
   5.5     4.93    5.04       5.5     4.89    4.99
   6       4.9     4.99       6       4.92    5.04
   6.5     4.93    5.1        6.5     4.91    4.97
   7       4.28    5.8        7       4.97    4.97
   7.5     4.62    4.91       7.5     4.99    4.82
   8       5.05    4.45       8       5.16    4.76
   8.5     5.93    4.09       8.5     4.94    4.98
   9       5.73    4.2        9       4.92    5.02
   9.5     5.62    4.32       9.5     4.87    5.03
  10       6.12    3.2       10       4.91    5.01
  10.5     6.91    3.11      10.5     4.87    5.04
  11       8.48    0         11       8.49    4.94
  11.5     9.87    0         11.5     9.9     0

SYN/ACK ECT test:

This test demonstrates the importance of ECT on SYN and SYN-ACK packets
by measuring the connection probability in the presence of competing
flows for a DCTCP connection attempt *without* ECT in the SYN packet.
The test was repeated five times for each number of competing flows.

              Competing Flows  1 |    2 |    4 |    8 |   16
                               ------------------------------
Mean Connection Probability    1 | 0.67 | 0.45 | 0.28 |    0
Median Connection Probability  1 | 0.65 | 0.45 | 0.25 |    0

As the number of competing flows moves beyond 1, the connection
probability drops rapidly.

Enabling DCTCP with this patch requires the following steps:

DCTCP must be running both on the sender and receiver side in your
data center, i.e.:

  sysctl -w net.ipv4.tcp_congestion_control=dctcp

Also, ECN functionality must be enabled on all switches in your
data center for DCTCP to work. The default ECN marking threshold (K)
heuristic on the switch for DCTCP is e.g., 20 packets (30KB) at
1Gbps, and 65 packets (~100KB) at 10Gbps (K > 1/7 * C * RTT, [4]).

In above tests, for each switch port, traffic was segregated into two
queues. For any packet with a DSCP of 0x01 - or equivalently a TOS of
0x04 - the packet was placed into the DCTCP queue. All other packets
were placed into the default drop-tail queue. For the DCTCP queue,
RED/ECN marking was enabled, here, with a marking threshold of 75 KB.
More details however, we refer you to the paper [2] under section 3).

There are no code changes required to applications running in user
space. DCTCP has been implemented in full *isolation* of the rest of
the TCP code as its own congestion control module, so that it can run
without a need to expose code to the core of the TCP stack, and thus
nothing changes for non-DCTCP users.

Changes in the CA framework code are minimal, and DCTCP algorithm
operates on mechanisms that are already available in most Silicon.
The gain (dctcp_shift_g) is currently a fixed constant (1/16) from
the paper, but we leave the option that it can be chosen carefully
to a different value by the user.

In case DCTCP is being used and ECN support on peer site is off,
DCTCP falls back after 3WHS to operate in normal TCP Reno mode.

ss {-4,-6} -t -i diag interface:

  ... dctcp wscale:7,7 rto:203 rtt:2.349/0.026 mss:1448 cwnd:2054
  ssthresh:1102 ce_state 0 alpha 15 ab_ecn 0 ab_tot 735584
  send 10129.2Mbps pacing_rate 20254.1Mbps unacked:1822 retrans:0/15
  reordering:101 rcv_space:29200

  ... dctcp-reno wscale:7,7 rto:201 rtt:0.711/1.327 ato:40 mss:1448
  cwnd:10 ssthresh:1102 fallback_mode send 162.9Mbps pacing_rate
  325.5Mbps rcv_rtt:1.5 rcv_space:29200

More information about DCTCP can be found in [1-4].

  [1] http://simula.stanford.edu/~alizade/Site/DCTCP.html
  [2] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf
  [3] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf
  [4] http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00

Joint work with Florian Westphal and Glenn Judd.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Florian Westphal
9890092e46 net: tcp: more detailed ACK events and events for CE marked packets
DataCenter TCP (DCTCP) determines cwnd growth based on ECN information
and ACK properties, e.g. ACK that updates window is treated differently
than DUPACK.

Also DCTCP needs information whether ACK was delayed ACK. Furthermore,
DCTCP also implements a CE state machine that keeps track of CE markings
of incoming packets.

Therefore, extend the congestion control framework to provide these
event types, so that DCTCP can be properly implemented as a normal
congestion algorithm module outside of the core stack.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Florian Westphal
7354c8c389 net: tcp: split ack slow/fast events from cwnd_event
The congestion control ops "cwnd_event" currently supports
CA_EVENT_FAST_ACK and CA_EVENT_SLOW_ACK events (among others).
Both FAST and SLOW_ACK are only used by Westwood congestion
control algorithm.

This removes both flags from cwnd_event and adds a new
in_ack_event callback for this. The goal is to be able to
provide more detailed information about ACKs, such as whether
ECE flag was set, or whether the ACK resulted in a window
update.

It is required for DataCenter TCP (DCTCP) congestion control
algorithm as it makes a different choice depending on ECE being
set or not.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Daniel Borkmann
30e502a34b net: tcp: add flag for ca to indicate that ECN is required
This patch adds a flag to TCP congestion algorithms that allows
for requesting to mark IPv4/IPv6 sockets with transport as ECN
capable, that is, ECT(0), when required by a congestion algorithm.

It is currently used and needed in DataCenter TCP (DCTCP), as it
requires both peers to assert ECT on all IP packets sent - it
uses ECN feedback (i.e. CE, Congestion Encountered information)
from switches inside the data center to derive feedback to the
end hosts.

Therefore, simply add a new flag to icsk_ca_ops. Note that DCTCP's
algorithm/behaviour slightly diverges from RFC3168, therefore this
is only (!) enabled iff the assigned congestion control ops module
has requested this. By that, we can tightly couple this logic really
only to the provided congestion control ops.

Joint work with Florian Westphal and Glenn Judd.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Florian Westphal
55d8694fa8 net: tcp: assign tcp cong_ops when tcp sk is created
Split assignment and initialization from one into two functions.

This is required by followup patches that add Datacenter TCP
(DCTCP) congestion control algorithm - we need to be able to
determine if the connection is moderated by DCTCP before the
3WHS has finished.

As we walk the available congestion control list during the
assignment, we are always guaranteed to have Reno present as
it's fixed compiled-in. Therefore, since we're doing the
early assignment, we don't have a real use for the Reno alias
tcp_init_congestion_ops anymore and can thus remove it.

Actual usage of the congestion control operations are being
made after the 3WHS has finished, in some cases however we
can access get_info() via diag if implemented, therefore we
need to zero out the private area for those modules.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
John Fastabend
53dfd50181 net: sched: cls_rcvp, complete rcu conversion
This completes the cls_rsvp conversion to RCU safe
copy, update semantics.

As a result all cases of tcf_exts_change occur on
empty lists now.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:04:55 -04:00
WANG Cong
68f6a7c6c9 net_sched: fix another regression in cls_tcindex
Clearly the following change is not expected:

	-       if (!cp.perfect && !cp.h)
	-               cp.alloc_hash = cp.hash;
	+       if (!cp->perfect && cp->h)
	+               cp->alloc_hash = cp->hash;

Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:34:35 -04:00
WANG Cong
02c5e84413 net_sched: fix errno in tcindex_set_parms()
When kmemdup() fails, we should return -ENOMEM.

Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:34:22 -04:00
Rick Jones
825bae5d97 arp: Do not perturb drop profiles with ignored ARP packets
We do not wish to disturb dropwatch or perf drop profiles with an ARP
we will ignore.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:30:35 -04:00
WANG Cong
18d0264f63 net_sched: remove the first parameter from tcf_exts_destroy()
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:29:01 -04:00
David S. Miller
f5c7e1a47a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-09-25

1) Remove useless hash_resize_mutex in xfrm_hash_resize().
   This mutex is used only there, but xfrm_hash_resize()
   can't be called concurrently at all. From Ying Xue.

2) Extend policy hashing to prefixed policies based on
   prefix lenght thresholds. From Christophe Gouault.

3) Make the policy hash table thresholds configurable
   via netlink. From Christophe Gouault.

4) Remove the maximum authentication length for AH.
   This was needed to limit stack usage. We switched
   already to allocate space, so no need to keep the
   limit. From Herbert Xu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:19:15 -04:00
WANG Cong
2c1a4311b6 neigh: check error pointer instead of NULL for ipv4_neigh_lookup()
Fixes: commit f187bc6efb ("ipv4: No need to set generic neighbour pointer")
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:16:04 -04:00
Florian Fainelli
7905288f09 net: dsa: allow switches driver to implement get/set EEE
Allow switches driver to query and enable/disable EEE on a per-port
basis by implementing the ethtool_{get,set}_eee settings and delegating
these operations to the switch driver.

set_eee() will need to coordinate with the PHY driver to make sure that
EEE is enabled, the link-partner supports it and the auto-negotiation
result is satisfactory.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:14:09 -04:00
Florian Fainelli
b2f2af21e3 net: dsa: allow enabling and disable switch ports
Whenever a per-port network device is used/unused, invoke the switch
driver port_enable/port_disable callbacks to allow saving as much power
as possible by disabling unused parts of the switch (RX/TX logic, memory
arrays, PHYs...). We supply a PHY device argument to make sure the
switch driver can act on the PHY device if needed (like putting/taking
the PHY out of deep low power mode).

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:14:08 -04:00
Florian Fainelli
f7f1de51ed net: dsa: start and stop the PHY state machine
dsa_slave_open() should start the PHY library state machine for its PHY
interface, and dsa_slave_close() should stop the PHY library state
machine accordingly.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:14:08 -04:00
Peter Pan(潘卫平)
155c6e1ad4 tcp: use tcp_flags in tcp_data_queue()
This patch is a cleanup which follows the idea in commit e11ecddf51 (tcp: use
TCP_SKB_CB(skb)->tcp_flags in input path),
and it may reduce register pressure since skb->cb[] access is fast,
bacause skb is probably in a register.

v2: remove variable th
v3: reword the changelog

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:37:57 -04:00
Eric Dumazet
cd7d8498c9 tcp: change tcp_skb_pcount() location
Our goal is to access no more than one cache line access per skb in
a write or receive queue when doing the various walks.

After recent TCP_SKB_CB() reorganizations, it is almost done.

Last part is tcp_skb_pcount() which currently uses
skb_shinfo(skb)->gso_segs, which is a terrible choice, because it needs
3 cache lines in current kernel (skb->head, skb->end, and
shinfo->gso_segs are all in 3 different cache lines, far from skb->cb)

This very simple patch reuses space currently taken by tcp_tw_isn
only in input path, as tcp_skb_pcount is only needed for skb stored in
write queue.

This considerably speeds up tcp_ack(), granted we avoid shinfo->tx_flags
to get SKBTX_ACK_TSTAMP, which seems possible.

This also speeds up all sack processing in general.

This speeds up tcp_sendmsg() because it no longer has to access/dirty
shinfo.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:36:48 -04:00
Eric Dumazet
971f10eca1 tcp: better TCP_SKB_CB layout to reduce cache line misses
TCP maintains lists of skb in write queue, and in receive queues
(in order and out of order queues)

Scanning these lists both in input and output path usually requires
access to skb->next, TCP_SKB_CB(skb)->seq, and TCP_SKB_CB(skb)->end_seq

These fields are currently in two different cache lines, meaning we
waste lot of memory bandwidth when these queues are big and flows
have either packet drops or packet reorders.

We can move TCP_SKB_CB(skb)->header at the end of TCP_SKB_CB, because
this header is not used in fast path. This allows TCP to search much faster
in the skb lists.

Even with regular flows, we save one cache line miss in fast path.

Thanks to Christoph Paasch for noticing we need to cleanup
skb->cb[] (IPCB/IP6CB) before entering IP stack in tx path,
and that I forgot IPCB use in tcp_v4_hnd_req() and tcp_v4_save_options().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:35:43 -04:00
Eric Dumazet
a224772db8 ipv6: add a struct inet6_skb_parm param to ipv6_opt_accepted()
ipv6_opt_accepted() assumes IP6CB(skb) holds the struct inet6_skb_parm
that it needs. Lets not assume this, as TCP stack might use a different
place.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:35:43 -04:00
Eric Dumazet
24a2d43d88 ipv4: rename ip_options_echo to __ip_options_echo()
ip_options_echo() assumes struct ip_options is provided in &IPCB(skb)->opt
Lets break this assumption, but provide a helper to not change all call points.

ip_send_unicast_reply() gets a new struct ip_options pointer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:35:42 -04:00
Steffen Klassert
cd0a0bd9b8 ip6_gre: Return an error when adding an existing tunnel.
ip6gre_tunnel_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.

So return NULL if the tunnel that should be created already
exists.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:19:46 -04:00
Steffen Klassert
d814b847be ip6_vti: Return an error when adding an existing tunnel.
vti6_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.

So return NULL if the tunnel that should be created already
exists.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:19:46 -04:00
Steffen Klassert
2b0bb01b6e ip6_tunnel: Return an error when adding an existing tunnel.
ip6_tnl_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.

So return NULL if the tunnel that should be created already
exists.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:19:46 -04:00
Dan Williams
3f33407856 net: make tcp_cleanup_rbuf private
net_dma was the only external user so this can become local to tcp.c
again.

Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-09-28 07:22:21 -07:00
Dan Williams
d27f9bc104 net_dma: revert 'copied_early'
Now that tcp_dma_try_early_copy() is gone nothing ever sets
copied_early.

Also reverts "53240c208776 tcp: Fix possible double-ack w/ user dma"
since it is no longer necessary.

Cc: Ali Saidi <saidi@engin.umich.edu>
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Neal Cardwell <ncardwell@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-09-28 07:22:21 -07:00
Dan Williams
7bced39751 net_dma: simple removal
Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
and there is no plan to fix it.

This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
Reverting the remainder of the net_dma induced changes is deferred to
subsequent patches.

Marked for stable due to Roman's report of a memory leak in
dma_pin_iovec_pages():

    https://lkml.org/lkml/2014/9/3/177

Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: David Whipple <whipple@securedatainnovations.ch>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: <stable@vger.kernel.org>
Reported-by: Roman Gushchin <klamm@yandex-team.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-09-28 07:05:16 -07:00
Nicolas Dichtel
5a4ee9a9a0 ip6gre: add a rtnl link alias for ip6gretap
With this alias, we don't need to load manually the module before adding an
ip6gretap interface with iproute2.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 17:15:57 -04:00
Eric Dumazet
ff04a771ad net : optimize skb_release_data()
Cache skb_shinfo(skb) in a variable to avoid computing it multiple
times.

Reorganize the tests to remove one indentation level.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 16:53:49 -04:00
Wang Sheng-Hui
8280bf00fd net/openvswitch: remove dup comment in vport.h
Remove the duplicated comment
"/* The following definitions are for users of the vport subsytem: */"
in vport.h

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 16:42:33 -04:00
David S. Miller
e7af85db54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
nf pull request for net

This series contains netfilter fixes for net, they are:

1) Fix lockdep splat in nft_hash when releasing sets from the
   rcu_callback context. We don't the mutex there anymore.

2) Remove unnecessary spinlock_bh in the destroy path of the nf_tables
   rbtree set type from rcu_callback context.

3) Fix another lockdep splat in rhashtable. None of the callers hold
   a mutex when calling rhashtable_destroy.

4) Fix duplicated error reporting from nfnetlink when aborting and
   replaying a batch.

5) Fix a Kconfig issue reported by kbuild robot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 16:21:29 -04:00
LEROY Christophe
58e3cac561 net: optimise inet_proto_csum_replace4()
csum_partial() is a generic function which is not optimised for small fixed
length calculations, and its use requires to store "from" and "to" values in
memory while we already have them available in registers. This also has impact,
especially on RISC processors. In the same spirit as the change done by
Eric Dumazet on csum_replace2(), this patch rewrites inet_proto_csum_replace4()
taking into account RFC1624.

I spotted during a NATted tcp transfert that csum_partial() is one of top 5
consuming functions (around 8%), and the second user of csum_partial() is
inet_proto_csum_replace4().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 16:14:17 -04:00
Eric Dumazet
f4a775d144 net: introduce __skb_header_release()
While profiling TCP stack, I noticed one useless atomic operation
in tcp_sendmsg(), caused by skb_header_release().

It turns out all current skb_header_release() users have a fresh skb,
that no other user can see, so we can avoid one atomic operation.

Introduce __skb_header_release() to clearly document this.

This gave me a 1.5 % improvement on TCP_RR workload.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 15:40:06 -04:00
David S. Miller
57219dc7bf Merge tag 'master-2014-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-09-22

Please pull this batch of updates intended for the 3.18 stream...

For the mac80211 bits, Johannes says:

"This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
timeout support, a number of changes for TDLS, early support for radio
resource measurement and many fixes. Also, I'm changing a number of
places to clear key memory when it's freed and Intel claims copyright
for code they developed."

For the bluetooth bits, Johan says:

"Here are some more patches intended for 3.18. Most of them are cleanups
or fixes for SMP. The only exception is a fix for BR/EDR L2CAP fixed
channels which should now work better together with the L2CAP
information request procedure."

For the iwlwifi bits, Emmanuel says:

"I fix here dvm which was broken by my last pull request. Arik
continues to work on TDLS and Luca solved a few issues in CT-Kill. Eyal
keeps digging into rate scaling code, more to come soon. Besides this,
nothing really special here."

Beyond that, there are the usual big batches of updates to ath9k, b43,
mwifiex, and wil6210 as well as a handful of other bits here and there.
Also, rtlwifi gets some btcoexist attention from Larry.

Please let me know if there are problems!
====================

Had to adjust the wil6210 code to comply with Joe Perches's recent
change in net-next to make the netdev_*() routines return void instead
of 'int'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 15:39:24 -04:00