Since we no-longer do special handling of SMP within l2cap_core.c we
don't have any code for calling l2cap_conn_del() when smp.c doesn't like
the data it gets. At the same time we cannot simply export
l2cap_conn_del() since it will try to lock the channels it calls into
whereas we already hold the lock in the smp.c l2cap_chan callbacks (i.e.
it'd lead to a deadlock).
This patch adds a new l2cap_conn_shutdown() API which is very similar to
l2cap_conn_del() except that it defers the call to l2cap_conn_del()
through a workqueue, thereby making it safe to use it from an L2CAP
channel callback.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that we have all the necessary pieces in place we can fully convert
SMP to use the L2CAP channel infrastructure. This patch adds the
necessary callbacks and removes the now unneeded conn->smp_chan pointer.
One notable behavioral change in this patch comes from the following
code snippet:
- case L2CAP_CID_SMP:
- if (smp_sig_channel(conn, skb))
- l2cap_conn_del(conn->hcon, EACCES);
This piece of code was essentially forcing a disconnection if garbage
SMP data was received. The l2cap_conn_del() function is private to
l2cap_conn.c so we don't have access to it anymore when using the L2CAP
channel callbacks. Therefore, the behavior of the new code is simply to
return errors in the recv() callback (which is simply the old
smp_sig_channel()), but no disconnection will occur.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that we have per-adapter SMP data thanks to the root SMP L2CAP
channel we can take advantage of it and attach the AES crypto context
(only used for SMP) to it. This means that the smp_irk_matches() and
smp_generate_rpa() function can be converted to internally handle the
AES context.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch creates the initial SMP L2CAP channels and a skeleton for
their callbacks. There is one per-adapter channel created upon adapter
registration, and then one channel per-connection created through the
new_connection callback. The channels are registered with the reserved
CID 0x1f for now in order to not conflict with existing SMP
functionality. Once everything is in place the value can be changed to
what it should be, i.e. L2CAP_CID_SMP.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In preparation for converting SMP to use l2cap_chan it's useful to add a
few more callback helpers so that smp.c won't need to define all of its
own.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The LE ATT socket uses a special trick where it temporarily sets
BT_CONFIG state for the duration of a security level elevation. In order
to not require special hacks for going back to BT_CONNECTED state in the
l2cap_core.c code the most reasonable place to resume the state is the
resume callback. This patch adds a new flag to track the pending
security level change and ensures that the state is set back to
BT_CONNECTED in the resume callback in case the flag is set.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Similar to our hci_update_background_scan() function we can simplify a
lot of code by creating a unified helper function for doing page scan
updates. This patch adds such a function to hci_core.c and updates all
the relevant places to use it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are several situations where we're interested in knowing whether
we're currently in the process of powering off an adapter. This patch
adds a convenience function for the purpose and makes it public since
we'll soon need to access it from hci_event.c as well.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Pull SElinux fixes from Paul Moore:
"Two small patches to fix a couple of build warnings in SELinux and
NetLabel. The patches are obvious enough that I don't think any
additional explanation is necessary, but it basically boils down to
the usual: I was stupid, and these patches fix some of the stupid.
Both patches were posted earlier this week to the SELinux list, and
that is where they sat as I didn't think there were noteworthy enough
to go upstream at this point in time, but DaveM would rather see them
upstream now so who am I to argue. As the patches are both very
small"
* 'stable-3.17' of git://git.infradead.org/users/pcmoore/selinux:
selinux: remove unused variabled in the netport, netnode, and netif caches
netlabel: fix the netlbl_catmap_setlong() dummy function
When I added the netlbl_catmap_setlong() function I mistakenly forgot
to mark the associated dummy function as an inline.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Moore <pmoore@redhat.com>
sock_tx_timestamp() should not ignore initial *tx_flags value, as TCP
stack can store SKBTX_SHARED_FRAG in it.
Also first argument (struct sock *) can be const.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 4ed2d765df ("net-timestamp: TCP timestamping")
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
"Highlights:
1) Steady transitioning of the BPF instructure to a generic spot so
all kernel subsystems can make use of it, from Alexei Starovoitov.
2) SFC driver supports busy polling, from Alexandre Rames.
3) Take advantage of hash table in UDP multicast delivery, from David
Held.
4) Lighten locking, in particular by getting rid of the LRU lists, in
inet frag handling. From Florian Westphal.
5) Add support for various RFC6458 control messages in SCTP, from
Geir Ola Vaagland.
6) Allow to filter bridge forwarding database dumps by device, from
Jamal Hadi Salim.
7) virtio-net also now supports busy polling, from Jason Wang.
8) Some low level optimization tweaks in pktgen from Jesper Dangaard
Brouer.
9) Add support for ipv6 address generation modes, so that userland
can have some input into the process. From Jiri Pirko.
10) Consolidate common TCP connection request code in ipv4 and ipv6,
from Octavian Purdila.
11) New ARP packet logger in netfilter, from Pablo Neira Ayuso.
12) Generic resizable RCU hash table, with intial users in netlink and
nftables. From Thomas Graf.
13) Maintain a name assignment type so that userspace can see where a
network device name came from (enumerated by kernel, assigned
explicitly by userspace, etc.) From Tom Gundersen.
14) Automatic flow label generation on transmit in ipv6, from Tom
Herbert.
15) New packet timestamping facilities from Willem de Bruijn, meant to
assist in measuring latencies going into/out-of the packet
scheduler, latency from TCP data transmission to ACK, etc"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits)
cxgb4 : Disable recursive mailbox commands when enabling vi
net: reduce USB network driver config options.
tg3: Modify tg3_tso_bug() to handle multiple TX rings
amd-xgbe: Perform phy connect/disconnect at dev open/stop
amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask
net: sun4i-emac: fix memory leak on bad packet
sctp: fix possible seqlock seadlock in sctp_packet_transmit()
Revert "net: phy: Set the driver when registering an MDIO bus device"
cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine
team: Simplify return path of team_newlink
bridge: Update outdated comment on promiscuous mode
net-timestamp: ACK timestamp for bytestreams
net-timestamp: TCP timestamping
net-timestamp: SCHED timestamp on entering packet scheduler
net-timestamp: add key to disambiguate concurrent datagrams
net-timestamp: move timestamp flags out of sk_flags
net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
cxgb4i : Move stray CPL definitions to cxgb4 driver
tcp: reduce spurious retransmits due to transient SACK reneging
qlcnic: Initialize dcbnl_ops before register_netdev
...
Pull security subsystem updates from James Morris:
"In this release:
- PKCS#7 parser for the key management subsystem from David Howells
- appoint Kees Cook as seccomp maintainer
- bugfixes and general maintenance across the subsystem"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (94 commits)
X.509: Need to export x509_request_asymmetric_key()
netlabel: shorter names for the NetLabel catmap funcs/structs
netlabel: fix the catmap walking functions
netlabel: fix the horribly broken catmap functions
netlabel: fix a problem when setting bits below the previously lowest bit
PKCS#7: X.509 certificate issuer and subject are mandatory fields in the ASN.1
tpm: simplify code by using %*phN specifier
tpm: Provide a generic means to override the chip returned timeouts
tpm: missing tpm_chip_put in tpm_get_random()
tpm: Properly clean sysfs entries in error path
tpm: Add missing tpm_do_selftest to ST33 I2C driver
PKCS#7: Use x509_request_asymmetric_key()
Revert "selinux: fix the default socket labeling in sock_graft()"
X.509: x509_request_asymmetric_keys() doesn't need string length arguments
PKCS#7: fix sparse non static symbol warning
KEYS: revert encrypted key change
ima: add support for measuring and appraising firmware
firmware_class: perform new LSM checks
security: introduce kernel_fw_from_file hook
PKCS#7: Missing inclusion of linux/err.h
...
Conflicts:
drivers/net/Makefile
net/ipv6/sysctl_net_ipv6.c
Two ipv6_table_template[] additions overlap, so the index
of the ipv6_table[x] assignments needed to be adjusted.
In the drivers/net/Makefile case, we've gotten rid of the
garbage whereby we had to list every single USB networking
driver in the top-level Makefile, there is just one
"USB_NETWORKING" that guards everything.
Signed-off-by: David S. Miller <davem@davemloft.net>
Datagrams timestamped on transmission can coexist in the kernel stack
and be reordered in packet scheduling. When reading looped datagrams
from the socket error queue it is not always possible to unique
correlate looped data with original send() call (for application
level retransmits). Even if possible, it may be expensive and complex,
requiring packet inspection.
Introduce a data-independent ID mechanism to associate timestamps with
send calls. Pass an ID alongside the timestamp in field ee_data of
sock_extended_err.
The ID is a simple 32 bit unsigned int that is associated with the
socket and incremented on each send() call for which software tx
timestamp generation is enabled.
The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to
avoid changing ee_data for existing applications that expect it 0.
The counter is reset each time the flag is reenabled. Reenabling
does not change the ID of already submitted data. It is possible
to receive out of order IDs if the timestamp stream is not quiesced
first.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_flags is reaching its limit. New timestamping options will not fit.
Move all of them into a new field sk->sk_tsflags.
Added benefit is that this removes boilerplate code to convert between
SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt.
SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive
timestamp logic (netstamp_needed). That can be simplified and this
last key removed, but will leave that for a separate patch.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
The u16 in sock can be moved into a 16-bit hole below sk_gso_max_segs,
though that scatters tstamp fields throughout the struct.
Signed-off-by: David S. Miller <davem@davemloft.net>
Applications that request kernel tx timestamps with SO_TIMESTAMPING
read timestamps as recvmsg() ancillary data. The response is defined
implicitly as timespec[3].
1) define struct scm_timestamping explicitly and
2) add support for new tstamp types. On tx, scm_timestamping always
accompanies a sock_extended_err. Define previously unused field
ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to
define the existing behavior.
The reception path is not modified. On rx, no struct similar to
sock_extended_err is passed along with SCM_TIMESTAMPING.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit reduces spurious retransmits due to apparent SACK reneging
by only reacting to SACK reneging that persists for a short delay.
When a sequence space hole at snd_una is filled, some TCP receivers
send a series of ACKs as they apparently scan their out-of-order queue
and cumulatively ACK all the packets that have now been consecutiveyly
received. This is essentially misbehavior B in "Misbehaviors in TCP
SACK generation" ACM SIGCOMM Computer Communication Review, April
2011, so we suspect that this is from several common OSes (Windows
2000, Windows Server 2003, Windows XP). However, this issue has also
been seen in other cases, e.g. the netdev thread "TCP being hoodwinked
into spurious retransmissions by lack of timestamps?" from March 2014,
where the receiver was thought to be a BSD box.
Since snd_una would temporarily be adjacent to a previously SACKed
range in these scenarios, this receiver behavior triggered the Linux
SACK reneging code path in the sender. This led the sender to clear
the SACK scoreboard, enter CA_Loss, and spuriously retransmit
(potentially) every packet from the entire write queue at line rate
just a few milliseconds before the ACK for each packet arrives at the
sender.
To avoid such situations, now when a sender sees apparent reneging it
does not yet retransmit, but rather adjusts the RTO timer to give the
receiver a little time (max(RTT/2, 10ms)) to send us some more ACKs
that will restore sanity to the SACK scoreboard. If the reneging
persists until this RTO then, as before, we clear the SACK scoreboard
and enter CA_Loss.
A 10ms delay tolerates a receiver sending such a stream of ACKs at
56Kbit/sec. And to allow for receivers with slower or more congested
paths, we wait for at least RTT/2.
We validated the resulting max(RTT/2, 10ms) delay formula with a mix
of North American and South American Google web server traffic, and
found that for ACKs displaying transient reneging:
(1) 90% of inter-ACK delays were less than 10ms
(2) 99% of inter-ACK delays were less than RTT/2
In tests on Google web servers this commit reduced reneging events by
75%-90% (as measured by the TcpExtTCPSACKReneging counter), without
any measurable impact on latency for user HTTP and SPDY requests.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/6lowpan/iphc.c
Minor conflicts in iphc.c were changes overlapping with some
style cleanups.
John W. Linville says:
====================
Please pull this last(?) batch of wireless change intended for the
3.17 stream...
For the NFC bits, Samuel says:
"This is a rather quiet one, we have:
- A new driver from ST Microelectronics for their NCI ST21NFCB,
including device tree support.
- p2p support for the ST21NFCA driver
- A few fixes an enhancements for the NFC digital laye"
For the Atheros bits, Kalle says:
"Michal and Janusz did some important RX aggregation fixes, basically we
were missing RX reordering altogether. The 10.1 firmware doesn't support
Ad-Hoc mode and Michal fixed ath10k so that it doesn't advertise Ad-Hoc
support with that firmware. Also he implemented a workaround for a KVM
issue."
For the Bluetooth bits, Gustavo and Johan say:
"To quote Gustavo from his previous request:
'Some last minute fixes for -next. We have a fix for a use after free in
RFCOMM, another fix to an issue with ADV_DIRECT_IND and one for ADV_IND with
auto-connection handling. Last, we added support for reading the codec and
MWS setting for controllers that support these features.'
Additionally there are fixes to LE scanning, an update to conform to the 4.1
core specification as well as fixes for tracking the page scan state. All
of these fixes are important for 3.17."
And,
"We've got:
- 6lowpan fixes/cleanups
- A couple crash fixes, one for the Marvell HCI driver and another in LE SMP.
- Fix for an incorrect connected state check
- Fix for the bondable requirement during pairing (an issue which had
crept in because of using "pairable" when in fact the actual meaning
was "bondable" (these have different meanings in Bluetooth)"
Along with those are some late-breaking hardware support patches in
brcmfmac and b43 as well as a stray ath9k patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use kmem_cache to allocate/free inet_frag_queue objects since they're
all the same size per inet_frags user and are alloced/freed in high volumes
thus making it a perfect case for kmem_cache.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the flags to an enum definion, swap FIRST_IN/LAST_IN to be in increasing
order and add comments explaining each flag and the inet_frag_queue struct
members.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The last_in field has been used to store various flags different from
first/last frag in so give it a more descriptive name: flags.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Historically the NetLabel LSM secattr catmap functions and data
structures have had very long names which makes a mess of the NetLabel
code and anyone who uses NetLabel. This patch renames the catmap
functions and structures from "*_secattr_catmap_*" to just "*_catmap_*"
which improves things greatly.
There are no substantial code or logic changes in this patch.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
The NetLabel secattr catmap functions, and the SELinux import/export
glue routines, were broken in many horrible ways and the SELinux glue
code fiddled with the NetLabel catmap structures in ways that we
probably shouldn't allow. At some point this "worked", but that was
likely due to a bit of dumb luck and sub-par testing (both inflicted
by yours truly). This patch corrects these problems by basically
gutting the code in favor of something less obtuse and restoring the
NetLabel abstractions in the SELinux catmap glue code.
Everything is working now, and if it decides to break itself in the
future this code will be much easier to debug than the code it
replaces.
One noteworthy side effect of the changes is that it is no longer
necessary to allocate a NetLabel catmap before calling one of the
NetLabel APIs to set a bit in the catmap. NetLabel will automatically
allocate the catmap nodes when needed, resulting in less allocations
when the lowest bit is greater than 255 and less code in the LSMs.
Cc: stable@vger.kernel.org
Reported-by: Christian Evans <frodox@zoho.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
The NetLabel category (catmap) functions have a problem in that they
assume categories will be set in an increasing manner, e.g. the next
category set will always be larger than the last. Unfortunately, this
is not a valid assumption and could result in problems when attempting
to set categories less than the startbit in the lowest catmap node.
In some cases kernel panics and other nasties can result.
This patch corrects the problem by checking for this and allocating a
new catmap node instance and placing it at the front of the list.
Cc: stable@vger.kernel.org
Reported-by: Christian Evans <frodox@zoho.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
The SCTP socket extensions API document describes the v4mapping option as
follows:
8.1.15. Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)
This socket option is a Boolean flag which turns on or off the
mapping of IPv4 addresses. If this option is turned on, then IPv4
addresses will be mapped to V6 representation. If this option is
turned off, then no mapping will be done of V4 addresses and a user
will receive both PF_INET6 and PF_INET type addresses on the socket.
See [RFC3542] for more details on mapped V6 addresses.
This description isn't really in line with what the code does though.
Introduce addr_to_user (renamed addr_v4map), which should be called
before any sockaddr is passed back to user space. The new function
places the sockaddr into the correct format depending on the
SCTP_I_WANT_MAPPED_V4_ADDR option.
Audit all places that touched v4mapped and either sanely construct
a v4 or v6 address then call addr_to_user, or drop the
unnecessary v4mapped check entirely.
Audit all places that call addr_to_user and verify they are on a sycall
return path.
Add a custom getname that formats the address properly.
Several bugs are addressed:
- SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
addresses to user space
- The addr_len returned from recvmsg was not correct when
returning AF_INET on a v6 socket
- flowlabel and scope_id were not zerod when promoting
a v4 to v6
- Some syscalls like bind and connect behaved differently
depending on v4mapped
Tested bind, getpeername, getsockname, connect, and recvmsg for proper
behaviour in v4mapped = 1 and 0 cases.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains netfilter updates for net-next, they are:
1) Add the reject expression for the nf_tables bridge family, this
allows us to send explicit reject (TCP RST / ICMP dest unrech) to
the packets matching a rule.
2) Simplify and consolidate the nf_tables set dumping logic. This uses
netlink control->data to filter out depending on the request.
3) Perform garbage collection in xt_hashlimit using a workqueue instead
of a timer, which is problematic when many entries are in place in
the tables, from Eric Dumazet.
4) Remove leftover code from the removed ulog target support, from
Paul Bolle.
5) Dump unmodified flags in the netfilter packet accounting when resetting
counters, so userspace knows that a counter was in overquota situation,
from Alexey Perevalov.
6) Fix wrong usage of the bitwise functions in nfnetlink_acct, also from
Alexey.
7) Fix a crash when adding new set element with an empty NFTA_SET_ELEM_LIST
attribute.
This patchset also includes a couple of cleanups for xt_LED from
Duan Jiong and for nf_conntrack_ipv4 (using coccinelle) from
Himangi Saraogi.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ipv4 tunnels created with "local any remote $ip" didn't work properly since
7d442fab0 (ipv4: Cache dst in tunnels). 99% of packets sent via those tunnels
had src addr = 0.0.0.0. That was because only dst_entry was cached, although
fl4.saddr has to be cached too. Every time ip_tunnel_xmit used cached dst_entry
(tunnel_rtable_get returned non-NULL), fl4.saddr was initialized with
tnl_params->saddr (= 0 in our case), and wasn't changed until iptunnel_xmit().
This patch adds saddr to ip_tunnel->dst_cache, fixing this issue.
Reported-by: Sergey Popov <pinkbyte@gentoo.org>
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This setting maps to the HCI_BONDABLE flag which tracks whether we're
bondable or not. Therefore, rename the mgmt setting and respective
command accordingly.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The HCI_PAIRABLE flag isn't actually controlling whether we're pairable
but whether we're bondable. Therefore, rename it accordingly.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This define is unused since commit
96cb3eb7a1 ("6lowpan: fix fragmentation on
sending side"). It is a worst case scenario for payload calculation.
Since commit 96cb3eb7a1 we calculation the
payload to use the optimal size.
This define is also necessary for ieee802154 6lowpan only and the file
include/net/6lowpan.h should contain generic 6lowpan things only.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the own implementation to check of link-layer,
broadcast and any address type and use the IPv6 api for that.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We create a proc dir for each network device, this will cause
conflicts when the devices have name "all" or "default".
Rather than emitting an ugly kernel warning, we could just
fail earlier by checking the device name.
Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SO_TIMESTAMPING API defines three types of timestamps: software,
hardware in raw format (hwtstamp) and hardware converted to system
format (syststamp). The last has been deprecated in favor of combining
hwtstamp with a PTP clock driver. There are no active users in the
kernel.
The option was device driver dependent. If set, but without hardware
support, the correct behavior is to return zero in the relevant field
in the SCM_TIMESTAMPING ancillary message. Without device drivers
implementing the option, this field is effectively always zero.
Remove the internal plumbing to dissuage new drivers from implementing
the feature. Keep the SOF_TIMESTAMPING_SYS_HARDWARE flag, however, to
avoid breaking existing applications that request the timestamp.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the NFC pull request for 3.17.
This is a rather quiet one, we have:
- A new driver from ST Microelectronics for their NCI ST21NFCB,
including device tree support.
- p2p support for the ST21NFCA driver
- A few fixes an enhancements for the NFC digital layer
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT0CJhAAoJEIqAPN1PVmxKtnAP/jOu1cP/EsegdByqmEumjSGQ
mBv7wC+/r91wV6Hny8Ij8uxyN+/QfxF75nwPjTrPSApo7mHOlF8FeY0/EktxJazo
c/NIAntNwREIbpkc68CIQNrYr9YFkKEM+7lCV1ImALUb/CPfiH7Fx7vGdhkCKqkc
B8etkLyeJtl9EqSZM7GI2YrEbPzPEWLk2ydVQ9BccxvN0I8rc29DnD+DNR5sL6Wa
7bEmZF5J/GnVErvnS8uHDGgerpzlFAj7MONrsxADZCHPie0F87T3wXAGwKlaBY6R
nOde0YCS749JxN1c2AdmgIadwo5XqjeSbjQ1g8L8HJTWY8Tl9Vw8GoQFN9qhHkSM
gkOya77n0R8SZWoJzo3BFBMpsncVG2cJIsnwpRBIWMUzg76mGe7Fzl21KCXD8xyy
xvy8Ar3QKPDTu6uvLNEPk9+cfl/JxQAoLNL30eeZGBDBSg/g2ptNiBYNLXvVoEtU
B/xyTdmA1SXQnOKGKNxjFCo+WZDXSoTrWeml/uvBhprVAj6YS3K/imc9EiL4zcD7
72iNGZbIZRfw91x7VbbQ5Nb8PEyYsLef8ztUFM9HgliNgIDMHaQHoXYwmD4264uO
RGTETdHYQb0ltX3HsBiIgTNF+Y1sJUVP3TyEhjSk58TpCy/S6v9YjcT9RYmNpuPk
YftdxfapAKV0EJhcQc1r
=wR0y
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-3.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz <sameo@linux.intel.com> says:
"NFC: 3.17 pull request
This is the NFC pull request for 3.17.
This is a rather quiet one, we have:
- A new driver from ST Microelectronics for their NCI ST21NFCB,
including device tree support.
- p2p support for the ST21NFCA driver
- A few fixes an enhancements for the NFC digital layer"
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways exploiting linux IP identifier generation to
infer whether two machines are exchanging packets.
With commit 73f156a6e8 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.
This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.
Note that prandom_u32_max(1) returns 0, so if generator is used at most
once per jiffy, this patch inserts no hole in the ID suite and do not
increase collision probability.
This is jiffies based, so in the worst case (HZ=1000), the id can
rollover after ~65 seconds of idle time, which should be fine.
We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.
For IPv6, adds saddr into the hash as well, but not nexthdr.
If I ping the patched target, we can see ID are now hard to predict.
21:57:11.008086 IP (...)
A > target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
target > A: ICMP echo reply, seq 1, length 64
21:57:12.013133 IP (...)
A > target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
target > A: ICMP echo reply, seq 2, length 64
21:57:13.016580 IP (...)
A > target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
target > A: ICMP echo reply, seq 3, length 64
[1] TCP sessions uses a per flow ID generator not changed by this patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu>
Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-07-25
Please pull this batch of updates intended for the 3.17 stream!
For the mac80211 bits, Johannes says:
"We have a lot of TDLS patches, among them a fix that should make hwsim
tests happy again. The rest, this time, is mostly small fixes."
For the Bluetooth bits, Gustavo says:
"Some more patches for 3.17. The most important change here is the move of
the 6lowpan code to net/6lowpan. It has been agreed with Davem that this
change will go through the bluetooth tree. The rest are mostly clean up and
fixes."
and,
"Here follows some more patches for 3.17. These are mostly fixes to what
we've sent to you before for next merge window."
For the iwlwifi bits, Emmanuel says:
"I have the usual amount of BT Coex stuff. Arik continues to work
on TDLS and Ariej contributes a few things for HS2.0. I added a few
more things to the firmware debugging infrastructure. Eran fixes a
small bug - pretty normal content."
And for the Atheros bits, Kalle says:
"For ath6kl me and Jessica added support for ar6004 hw3.0, our latest
version of ar6004.
For ath10k Janusz added a printout so that it's easier to check what
ath10k kconfig options are enabled. He also added a debugfs file to
configure maximum amsdu and ampdu values. Also we had few fixes as
usual."
On top of that is the usual large batch of various driver updates --
brcmfmac, mwifiex, the TI drivers, and wil6210 all get some action.
Rafał has also been very busy with b43 and related updates.
Also, I pulled the wireless tree into this in order to resolve a
merge conflict...
P.S. The change to fs/compat_ioctl.c reflects a name change in a
Bluetooth header file...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Change formal parameter name to not shadow the global jiffies.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rehash is rare operation, don't force readers to take
the read-side rwlock.
Instead, we only have to detect the (rare) case where
the secret was altered while we are trying to insert
a new inetfrag queue into the table.
If it was changed, drop the bucket lock and recompute
the hash to get the 'new' chain bucket that we have to
insert into.
Joint work with Nikolay Aleksandrov.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
merge functionality into the eviction workqueue.
Instead of rebuilding every n seconds, take advantage of the upper
hash chain length limit.
If we hit it, mark table for rebuild and schedule workqueue.
To prevent frequent rebuilds when we're completely overloaded,
don't rebuild more than once every 5 seconds.
ipfrag_secret_interval sysctl is now obsolete and has been marked as
deprecated, it still can be changed so scripts won't be broken but it
won't have any effect. A comment is left above each unused secret_timer
variable to avoid confusion.
Joint work with Nikolay Aleksandrov.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'nqueues' counter is protected by the lru list lock,
once thats removed this needs to be converted to atomic
counter. Given this isn't used for anything except for
reporting it to userspace via /proc, just remove it.
We still report the memory currently used by fragment
reassembly queues.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the high_thresh limit is reached we try to toss the 'oldest'
incomplete fragment queues until memory limits are below the low_thresh
value. This happens in softirq/packet processing context.
This has two drawbacks:
1) processors might evict a queue that was about to be completed
by another cpu, because they will compete wrt. resource usage and
resource reclaim.
2) LRU list maintenance is expensive.
But when constantly overloaded, even the 'least recently used' element is
recent, so removing 'lru' queue first is not 'fairer' than removing any
other fragment queue.
This moves eviction out of the fast path:
When the low threshold is reached, a work queue is scheduled
which then iterates over the table and removes the queues that exceed
the memory limits of the namespace. It sets a new flag called
INET_FRAG_EVICTED on the evicted queues so the proper counters will get
incremented when the queue is forcefully expired.
When the high threshold is reached, no more fragment queues are
created until we're below the limit again.
The LRU list is now unused and will be removed in a followup patch.
Joint work with Nikolay Aleksandrov.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
First step to move eviction handling into a work queue.
We lose two spots that accounted evicted fragments in MIB counters.
Accounting will be restored since the upcoming work-queue evictor
invokes the frag queue timer callbacks instead.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Store the default values for minimum and maximum advertising interval
with all the other controller defaults. These vaules are sent to the
adapter whenever advertising is (re)enabled.
Signed-off-by: Georg Lukas <georg@op-co.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The ulog targets were recently killed. A few references to the Kconfig
macros CONFIG_IP_NF_TARGET_ULOG and CONFIG_BRIDGE_EBT_ULOG were left
untouched. Kill these too.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When adding remote devices to the kernel using the Add Device management
command, these devices are explicitly allowed to connect. This kind of
incoming connections are possible even when the controller itself is
not connectable.
For BR/EDR this distinction is pretty simple since there is only one
type of incoming connections. With LE this is not that simple anymore
since there are ADV_IND and ADV_DIRECT_IND advertising events.
The ADV_DIRECT_IND advertising events are send for incoming (slave
initiated) connections only. And this is the only thing the kernel
should allow when adding devices using action 0x01. This meaning
of incoming connections is coming from BR/EDR and needs to be
mapped to LE the same way.
Supporting the auto-connection of devices using ADV_IND advertising
events is an important feature as well. However it does not map to
incoming connections. So introduce a new action 0x02 that allows
the kernel to connect to devices using ADV_DIRECT_IND and in addition
ADV_IND advertising reports.
This difference is represented by the new HCI_AUTO_CONN_DIRECT value
for only connecting to ADV_DIRECT_IND. For connection to ADV_IND and
ADV_DIRECT_IND the old value HCI_AUTO_CONN_ALWAYS is used.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
It hasn't been used since commit 0fd7bac(net: relax rcvbuf limits).
Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the Bluetooth controller supports Get MWS Transport Layer
Configuration command, then issue it during initialization.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
If the Bluetooth controller supports Read Local Supported Codecs
command, then issue it during initialization so that the list of
codecs is known.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The digital layer of the NFC subsystem currently
supports a 'tg_listen_mdaa' driver hook that supports
devices that can do mode detection and automatic
anticollision. However, there are some devices that
can do mode detection but not automatic anitcollision
so add the 'tg_listen_md' hook to support those devices.
In order for the digital layer to get the RF technology
detected by the device from the driver, add the
'tg_get_rf_tech' hook. It is only valid to call this
hook immediately after a successful call to 'tg_listen_md'.
CC: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Remove extra blank line that was inadvertently
added by a recent commit.
CC: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
stop_poll allows to stop CLF reader polling. Some other operations might be
necessary for some CLF to stop polling. For example in card mode.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
MSG_MORE and 'corking' a socket would require that the transmit of
a data chunk be delayed.
Rename the return value to be less specific.
Signed-off-by: David Laight <david.laight@aculab.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/infiniband/hw/cxgb4/device.c
The cxgb4 conflict was simply overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Some drivers may be performing most of Tx/Rx
aggregation on their own (e.g. in firmware)
including AddBa/DelBa negotiations but may
otherwise require Rx reordering assistance.
The patch exports 2 new functions for establishing
Rx aggregation sessions in assumption device
driver has taken care of the necessary
negotiations.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
[fix endian bug]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains updates for your net-next tree,
they are:
1) Use kvfree() helper function from x_tables, from Eric Dumazet.
2) Remove extra timer from the conntrack ecache extension, use a
workqueue instead to redeliver lost events to userspace instead,
from Florian Westphal.
3) Removal of the ulog targets for ebtables and iptables. The nflog
infrastructure superseded this almost 9 years ago, time to get rid
of this code.
4) Replace the list of loggers by an array now that we can only have
two possible non-overlapping logger flavours, ie. kernel ring buffer
and netlink logging.
5) Move Eric Dumazet's log buffer code to nf_log to reuse it from
all of the supported per-family loggers.
6) Consolidate nf_log_packet() as an unified interface for packet logging.
After this patch, if the struct nf_loginfo is available, it explicitly
selects the logger that is used.
7) Move ip and ip6 logging code from xt_LOG to the corresponding
per-family loggers. Thus, x_tables and nf_tables share the same code
for packet logging.
8) Add generic ARP packet logger, which is used by nf_tables. The
format aims to be consistent with the output of xt_LOG.
9) Add generic bridge packet logger. Again, this is used by nf_tables
and it routes the packets to the real family loggers. As a result,
we get consistent logging format for the bridge family. The ebt_log
logging code has been intentionally left in place not to break
backward compatibility since the logging output differs from xt_LOG.
10) Update nft_log to explicitly request the required family logger when
needed.
11) Finish nft_log so it supports arp, ip, ip6, bridge and inet families.
Allowing selection between netlink and kernel buffer ring logging.
12) Several fixes coming after the netfilter core logging changes spotted
by robots.
13) Use IS_ENABLED() macros whenever possible in the netfilter tree,
from Duan Jiong.
14) Removal of a couple of unnecessary branch before kfree, from Fabian
Frederick.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new "NFC_DIGITAL_FRAMING_*" calls to the digital
layer so the driver can make the necessary adjustments
when performing anticollision while in target mode.
The driver must ensure that the effect of these calls
happens after the following response has been sent but
before reception of the next request begins.
Acked-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
v2: fixed issue with checking return of dcbnl_rtnl_ops->getapp()
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even though our side requests authentication, the original action that
caused it may be remotely triggered, such as an incoming L2CAP or RFCOMM
connect request. To track this information introduce a new hci_conn flag
called HCI_CONN_AUTH_INITIATOR.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We're interested in whether an authentication request is because of a
remote or local action. So far hci_conn_security() has been used both
for incoming and outgoing actions (e.g. RFCOMM or L2CAP connect
requests) so without some modifications it cannot know which peer is
responsible for requesting authentication.
This patch adds a new "bool initiator" parameter to hci_conn_security()
to indicate which side is responsible for the request and updates the
current users to pass this information correspondingly.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
sparse is throwing warnings when building sunrpc modules due to some
endianness shenanigans in ipv6.h. Specifically:
CHECK net/sunrpc/addr.c
include/net/ipv6.h:573:17: warning: restricted __be64 degrades to integer
include/net/ipv6.h:577:34: warning: restricted __be32 degrades to integer
include/net/ipv6.h:573:17: warning: restricted __be64 degrades to integer
include/net/ipv6.h:577:34: warning: restricted __be32 degrades to integer
Sprinkle some endianness fixups to silence them. These should all get
fixed up at compile time, so I don't think this will add any extra work
to be done at runtime.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many multicast sources can have the same port which can result in a very
large list when hashing by port only. Hash by address and port instead
if this is the case. This makes multicast more similar to unicast.
On a 24-core machine receiving from 500 multicast sockets on the same
port, before this patch 80% of system CPU was used up by spin locking
and only ~25% of packets were successfully delivered.
With this patch, all packets are delivered and kernel overhead is ~8%
system CPU on spinlocks.
Signed-off-by: David Held <drheld@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/nf_tables fixes
The following patchset contains nf_tables fixes, they are:
1) Fix wrong transaction handling when the table flags are not
modified.
2) Fix missing rcu read_lock section in the netlink dump path, which
is not protected by the nfnl_lock.
3) Set NLM_F_DUMP_INTR in the netlink dump path to indicate
interferences with updates.
4) Fix 64 bits chain counters when they are retrieved from a 32 bits
arch, from Eric Dumazet.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements section 5.3.6. of RFC6458, that is, support
for 'SCTP Next Receive Information Structure' (SCTP_NXTINFO) which
is placed into ancillary data cmsghdr structure for each recvmsg()
call, if this information is already available when delivering the
current message.
This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
level by setting an int value with 1/0 for SCTP_RECVNXTINFO in
user space applications as per RFC6458, section 8.1.30.
The sctp_nxtinfo structure is defined as per RFC as below ...
struct sctp_nxtinfo {
uint16_t nxt_sid;
uint16_t nxt_flags;
uint32_t nxt_ppid;
uint32_t nxt_length;
sctp_assoc_t nxt_assoc_id;
};
... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
SCTP_NXTINFO, while cmsg_data[] contains struct sctp_nxtinfo.
Joint work with Daniel Borkmann.
Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements section 5.3.5. of RFC6458, that is, support
for 'SCTP Receive Information Structure' (SCTP_RCVINFO) which is
placed into ancillary data cmsghdr structure for each recvmsg()
call.
This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
level by setting an int value with 1/0 for SCTP_RECVRCVINFO in user
space applications as per RFC6458, section 8.1.29.
The sctp_rcvinfo structure is defined as per RFC as below ...
struct sctp_rcvinfo {
uint16_t rcv_sid;
uint16_t rcv_ssn;
uint16_t rcv_flags;
<-- 2 bytes hole -->
uint32_t rcv_ppid;
uint32_t rcv_tsn;
uint32_t rcv_cumtsn;
uint32_t rcv_context;
sctp_assoc_t rcv_assoc_id;
};
... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
SCTP_RCVINFO, while cmsg_data[] contains struct sctp_rcvinfo.
An sctp_rcvinfo item always corresponds to the data in msg_iov.
Joint work with Daniel Borkmann.
Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements section 5.3.4. of RFC6458, that is, support
for 'SCTP Send Information Structure' (SCTP_SNDINFO) which can be
placed into ancillary data cmsghdr structure for sendmsg() calls.
The sctp_sndinfo structure is defined as per RFC as below ...
struct sctp_sndinfo {
uint16_t snd_sid;
uint16_t snd_flags;
uint32_t snd_ppid;
uint32_t snd_context;
sctp_assoc_t snd_assoc_id;
};
... and supplied under cmsg_level IPPROTO_SCTP, cmsg_type
SCTP_SNDINFO, while cmsg_data[] contains struct sctp_sndinfo.
An sctp_sndinfo item always corresponds to the data in msg_iov.
Joint work with Daniel Borkmann.
Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most (probably all) controllers can only deal with a single slave LE
connection at a time. This patch adds a counter for such connections so
that the number can be quickly looked up without iterating the
connections list.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We need to be able to track slave vs master LE connections in
hci_conn_hash, and to be able to do that we need to know the role of the
connection by the time hci_conn_add_has() is called. This means in
practice the hci_conn_add() call that creates the hci_conn_object.
This patch adds a new role parameter to hci_conn_add() function to give
the object its initial role value, and updates the callers to pass the
appropriate role to it. Since the function now takes care of
initializing both conn->role and conn->out values we can remove some
other unnecessary assignments.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To make the code more understandable it makes sense to use the new HCI
defines for connection role instead of a "bool master" parameter. This
makes it immediately clear when looking at the function calls what the
last parameter is describing.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Having a dedicated u8 role variable in the hci_conn struct greatly
simplifies tracking of the role, since this is the native way that it's
represented on the HCI level.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
All HCI commands and events, including LE ones, use 0x00 for master role
and 0x01 for slave role. It makes therefore sense to add generic defines
for these instead of the current LE_CONN_ROLE_MASTER. Having clean
defines will also make it possible to provide simpler internal APIs.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since Yuchung's 9b44190dc1 (tcp: refactor F-RTO), tcp_enter_cwr is always
called with set_ssthresh = 1. Thus, we can remove this argument from
tcp_enter_cwr. Further, as we remove this one, tcp_init_cwnd_reduction
is then always called with set_ssthresh = true, and so we can get rid of
this argument as well.
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This passes down NET_NAME_USER (or NET_NAME_ENUM) to alloc_netdev(),
for any device created over rtnetlink.
v9: restore reverse-christmas-tree order of local variables
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added udp_tunnel.c which can contain some common functions for UDP
tunnels. The first function in this is udp_sock_create which is used
to open the listener port for a UDP tunnel.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code in neigh_sysctl_register() relies on a specific layout of
struct neigh_table, namely that the 'gc_*' variables are directly
following the 'parms' member in a specific order. The code, though,
expresses this in the most ugly way.
Get rid of the ugly casts and use the 'tbl' pointer to get a handle to
the table. This way we can refer to the 'gc_*' variables directly.
Similarly seen in the grsecurity patch, written by Brad Spengler.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use generic u64_stats_sync infrastructure to get proper 64bit stats,
even on 32bit arches, at no extra cost for 64bit arches.
Without this fix, 32bit arches can have some wrong counters at the time
the carry is propagated into upper word.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
An updater may interfer with the dumping of any of the object lists.
Fix this by using a per-net generation counter and use the
nl_dump_check_consistent() interface so the NLM_F_DUMP_INTR flag is set
to notify userspace that it has to restart the dump since an updater
has interfered.
This patch also replaces the existing consistency checking code in the
rule dumping path since it is broken. Basically, the value that the
dump callback returns is not propagated to userspace via
netlink_dump_start().
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
John W. Linville says:
====================
Please pull this batch of updates intended for the 3.17 stream...
This is primarily a Bluetooth pull. Gustavo says:
"A lot of patches to 3.17. The bulk of changes here are for LE support.
The 6loWPAN over Bluetooth now has it own module, we also have support for
background auto-connection and passive scanning, Bluetooth device address
provisioning, support for reading Bluetooth clock values and LE connection
parameters plus many many fixes."
The balance is just a pull of the wireless.git tree, to avoid some
pending merge problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The spinlock protecting the L2CAP ident number can be converted into
a mutex since the whole processing is run in a workqueue. So instead
of using a spinlock, just use a mutex here.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The support for LE encryption is optional. When encryption is not
supported then also do not enable the encryption related events.
This moves the event mask setting to the third initialization
stage to ensure that the LE features are available.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This patch introduces a possibility for userspace to set various (so far
two) modes of generating addresses. This is useful for example for
NetworkManager because it can set the mode to NONE and take care of link
local addresses itself. That allow it to have the interface up,
monitoring carrier but still don't have any addresses on it.
One more use-case by Dan Williams:
<quote>
WWAN devices often have their LL address provided by the firmware of the
device, which sometimes refuses to respond to incorrect LL addresses
when doing DHCPv6 or IPv6 ND. The kernel cannot generate the correct LL
address for two reasons:
1) WWAN pseudo-ethernet interfaces often construct a fake MAC address,
or read a meaningless MAC address from the firmware. Thus the EUI64 and
the IPv6LL address the kernel assigns will be wrong. The real LL
address is often retrieved from the firmware with AT or proprietary
commands.
2) WWAN PPP interfaces receive their LL address from IPV6CP, not from
kernel assignments. Only after IPV6CP has completed do we know the LL
address of the PPP interface and its peer. But the kernel has already
assigned an incorrect LL address to the interface.
So being able to suppress the kernel LL address generation and assign
the one retrieved from the firmware is less complicated and more robust.
</quote>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no external user of the SCO timeout constants and thus
move them into net/bluetooth/sco.c where they are actuallu used.
In addition just remove SCO_CONN_IDLE_TIMEOUT since it is unused.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The SCO_DEFAULT_FLUSH_TO constant has been defined, but it is not
used anywhere and so just remove it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
There exists no external user of struct sco_conn and thus move
it into the one place that is actually using it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
There exists no external user of struct sco_pinfo and sco_pi and
thus move it into the one place that is actually using it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The list of L2CAP fixed channels increased with newer versions of the
specification. This just updates the constants for it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The internals of the HCI request framework should not be leaking to
its users. Move them all into net/bluetooth/hci_core.c and provide
a simple hci_req_pending helper function for the one user outside
the framework.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
There exists no external user of struct hci_pinfo and hci_pi and thus
move it into the one place that is actually using it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
There is only single location using struct hci_sec_filter and with
that there is no point in putting this declaration into a global
header file. So move it right next to its user and make the code
a lot more simpler.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
All the HCI sockets and ioctl based definitions have been in a global
header file that also includes all the HCI protocol structures. To
make this a bit cleaner, move them into its own file.
This also adjusts fs/compat_ioctl.c to only include this new file
and not all the protocol structures that are not needed.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The Set Connectable/Discoverable mgmt handlers use a hci_request with a
proper callback to handle the HCI command sending. It makes therefore
little sense to have this extra function to be called from hci_event.c
for command failures.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since the HCISETSCAN ioctl is the only non-mgmt user we care about for
setting the right discoverable state we can simply do the necessary
updates in the ioctl handler function instead. This then allows the
removal of the mgmt_discoverable function and should simplify that state
handling considerably.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The mgmt_connectable function has been used to ensure that the right
actions to HCI_CONNECTABLE are taken when the HCI_Write_Scan_Enable
command is triggered by something else than mgmt. The only other user
that we really care about is the HCISETSCAN ioctl code, so we can
actually more simply perform the needed changes there instead.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When the white list is in use the code would not update the
HCI_CONNECTABLE flag if it gets changed through the ioctl code (e.g.
hciconfig hci0 pscan). Since the flag is important for properly
accepting incoming connections add code to fix it up if necessary and
emit a New Settings mgmt event.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch extends the Add/Remove device commands by letting user space
pass BR/EDR addresses to them. The resulting entries get stored in a new
hdev->whitelist list. The idea is that we can now selectively accept
connections from devices in the list even though HCI_CONNECTABLE is not
set (the actual implementation of this is coming in a subsequent patch).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We already have several lists with struct bdaddr_list entries, and there
will be more in the future. Since the operations for adding, removing,
looking up and clearing entries in these lists are exactly the same it
doesn't make sense to define new functions for every single list. This
patch unifies the functions by passing the list_head to them instead of
a hci_dev pointer.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Authenticated Payload Timeout Expired event is valid for
controllers with BR/EDR Secure Connections support, but also for
LE only controllers supporting LE Ping feature. When either of them
is available enable this event. Previous it was not enabled when
the controller was only supporting LE operation.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Commit cb1ce2ef38 ("ipv6: Implement automatic flow label generation
on transmit") introduced ip6_make_flowlabel, while commit b73c3d0e4f
("net: Save TX flow hash in sock and set in skbuf on xmit") introduced
ip6_set_txhash.
ip6_set_tx_hash() uses sk_v6_daddr which references
__sk_common.skc_v6_daddr from struct sock_common, which is gated with
IS_ENABLED(CONFIG_IPV6).
ip6_make_flowlabel() uses the ipv6 member from struct net which is
also gated with IS_ENABLED(CONFIG_IPV6).
When CONFIG_IPV6 is disabled, we will hit a build failure that looks
like this when the compiler attempts inlining these functions:
CC [M] drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.o
In file included from include/net/inet_sock.h:27:0,
from include/net/ip.h:30,
from drivers/net/ethernet/broadcom/cnic.c:37:
include/net/ipv6.h: In function 'ip6_set_txhash':
include/net/sock.h:327:33: error: 'struct sock_common' has no member named 'skc_v6_daddr'
#define sk_v6_daddr __sk_common.skc_v6_daddr
^
include/net/ipv6.h:696:49: note: in expansion of macro 'sk_v6_daddr'
keys.dst = (__force __be32)ipv6_addr_hash(&sk->sk_v6_daddr);
^
In file included from include/net/inetpeer.h:15:0,
from include/net/route.h:28,
from include/net/ip.h:31,
from drivers/net/ethernet/broadcom/cnic.c:37:
include/net/ipv6.h: In function 'ip6_make_flowlabel':
include/net/ipv6.h:706:37: error: 'struct net' has no member named 'ipv6'
if (!flowlabel && (autolabel || net->ipv6.sysctl.auto_flowlabels)) {
^
Fixes: cb1ce2ef38 ("ipv6: Implement automatic flow label generation on transmit")
Fixes: b73c3d0e4f ("net: Save TX flow hash in sock and set in skbuf on xmit")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using pointers into sctp_cmd_seq_t.cmds[] lets the compiler generate much
better code.
Use the last entry first to optimise the overflow check.
Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even if memset() is inlined (as on x86) using it to zero the union
generates a memory word write of zero, followed by a write of the
smaller field, and then a read of the word.
As well as being a lot of instructions the sequence is unlikely to
be optimised by the store-load forward hardware so will be slow.
Instead allocate a field of the union that is the same size as the
entire union and write a zero value to it. The compiler will then
generate the required value in a register.
Zeroing the union shouldn't be necessary, but this patch series isn't
intended to have a behavioural change.
Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_init_cmd_seq() and sctp_next_cmd() are only called from one place.
The call sequence for sctp_add_cmd_sf() is likely to be longer than
the inlined code.
With sctp_add_cmd_sf() inlined the compiler can optimise repeated calls.
Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-07-03
Please pull this first batch of wireless updates intended for the
3.17 stream...
For the mac80211 bits, Johannes says:
"The biggest thing here is probably Arik's TDLS rework, beyond that we
have smaller improvements and features like David's scanning IE thing,
Luca's queue work, some CSA work, etc. Also your PID rate control
removal, of course."
For the iwlwifi bits, Emmanuel says:
"I have here a whole bunch of various things. Andy contributes
better debug prints for dvm specific flows and a module parameter to
completely disable power save for dvm. Andrei is sharing the premises
of his work on CSA - more to come. Eran and Liad keep on working
on the new devices. I have the regular amount of BT Coex stuff and
I continue to work on the firmware error report system adding more
debug capabilities. More to come on that subject too."
On top of that, there are some cleanups to the new rsi driver, some
continuing improvements to the rtl818x drivers, and the usual bundles
of updates to ath9k, b43, mwifiex, wil6210, and a few other bits here
and there.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the real advertising state is now tracked with its own flag we can
simply set/unset the HCI_ADVERTISING flag in the
set_advertising_complete function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Having a single HCI_ADVERTISING flag is problematic since it tries to
track both the real advertising state and the corresponding mgmt
setting. To make the logic simpler and more reliable add a new flag that
only tracks the actual advertising state that has been written to the
controller.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds new mac802154 hw flags for transmit power, csma and
listen before transmit (lbt). These flags indicates that the transceiver
supports these features. If the flags are set and the driver doesn't
implement the necessary functions, then ieee802154_register_device
returns -ENOSYS "Function not implemented".
This patch merges also all at86rf230 operations into one operations structure
and set the right hw flags for the at86rf230 transceivers.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Automatically generate flow labels for IPv6 packets on transmit.
The flow label is computed based on skb_get_hash. The flow label will
only automatically be set when it is zero otherwise (i.e. flow label
manager hasn't set one). This supports the transmit side functionality
of RFC 6438.
Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
functionality per socket.
By default, auto flowlabels are disabled to avoid possible conflicts
with flow label manager, however if this feature proves useful we
may want to enable it by default.
It should also be noted that FreeBSD has already implemented automatic
flow labels (including the sysctl and socket option). In FreeBSD,
automatic flow labels default to enabled.
Performance impact:
Running super_netperf with 200 flows for TCP_RR and UDP_RR for
IPv6. Note that in UDP case, __skb_get_hash will be called for
every packet with explains slight regression. In the TCP case
the hash is saved in the socket so there is no regression.
Automatic flow labels disabled:
TCP_RR:
86.53% CPU utilization
127/195/322 90/95/99% latencies
1.40498e+06 tps
UDP_RR:
90.70% CPU utilization
118/168/243 90/95/99% latencies
1.50309e+06 tps
Automatic flow labels enabled:
TCP_RR:
85.90% CPU utilization
128/199/337 90/95/99% latencies
1.40051e+06
UDP_RR
92.61% CPU utilization
115/164/236 90/95/99% latencies
1.4687e+06
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In vxlan and OVS vport-vxlan call common function to get source port
for a UDP tunnel. Removed vxlan_src_port since the functionality is
now in udp_flow_src_port.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds udp_flow_src_port function which is intended to be
a common function that UDP tunnel implementations call to set the source
port. The source port is chosen so that a hash over the outer headers
(IP addresses and UDP ports) acts as suitable hash for the flow of the
encapsulated packet. In this manner, UDP encapsulation works with RSS
and ECMP based wrt the inner flow.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For a connected socket we can precompute the flow hash for setting
in skb->hash on output. This is a performance advantage over
calculating the skb->hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where thers is no socket (e.g. time-wait, syn-recv, syn-cookies).
This patch adds sk_txhash to the sock structure. inet_set_txhash and
ip6_set_txhash functions are added which are called from points in
TCP and UDP where socket moves to established state.
skb_set_hash_from_sk is a function which sets skb->hash from the
sock txhash value. This is called in UDP and TCP transmit path when
transmitting within the context of a socket.
Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash called on every TX packet to
create a UDP source port).
Before fix:
95.02% CPU utilization
154/256/505 90/95/99% latencies
1.13042e+06 tps
Time in functions:
0.28% skb_flow_dissect
0.21% __skb_get_hash
After fix:
94.95% CPU utilization
156/254/485 90/95/99% latencies
1.15447e+06
Neither __skb_get_hash nor skb_flow_dissect appear in perf
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the hash computation located in __skb_get_hash to be a separate
function which takes flow_keys as input. This will allow flow hash
computation in other contexts where we only have addresses and ports.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Always store in snt_synack the time at which the server received the
first client SYN and attempted to send the first SYNACK.
Recent commit aa27fc501 ("tcp: tcp_v[46]_conn_request: fix snt_synack
initialization") resolved an inconsistency between IPv4 and IPv6 in
the initialization of snt_synack. This commit brings back the idea
from 843f4a55e (tcp: use tcp_v4_send_synack on first SYN-ACK), which
was going for the original behavior of snt_synack from the commit
where it was added in 9ad7c049f0 ("tcp: RFC2988bis + taking RTT
sample from 3WHS for the passive open side") in v3.1.
In addition to being simpler (and probably a tiny bit faster),
unconditionally storing the time of the first SYNACK attempt has been
useful because it allows calculating a performance metric quantifying
how long it took to establish a passive TCP connection.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Jerry Chu <hkchu@google.com>
Acked-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we have both LE scanning and advertising simultaneously enabled we
need a way to tell hci_connect_le() in which role to initiate a
connection. This patch adds a new parameter to the function to give it
the necessary information. For auto-connect and mgmt_pair_device we
always use master role, whereas for L2CAP users (in practice sockets) we
use slave role whenever HCI_ADVERTISING is set and master role
otherwise.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The auth_type value which gets assigned to hci_conn->auth_type is
something that's only used for BR/EDR connections and is of no value for
LE connections. It makes therefore little sense to pass it to the
hci_connect_le() function. This patch removes the parameter from the
function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When we establish connections as a consequence of receiving an
advertising report it makes no sense to wait the normal 20 second LE
connection timeout. This patch modifies the hci_connect_le function to
take an extra timeout value and uses a lower 2 second timeout for the
auto-connection case. This timeout is intentionally chosen to be just a
bit higher than the 1.28 second timeout that High Duty Cycle Advertising
uses.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This adds support for changing the public device address. This feature
is required by controllers that do not provide a public address and
have HCI_QUIRK_INVALID_BDADDR set.
Even if a controller has a public device address, this is useful when
an embedded system wants to use its own value. As long as the driver
provides the set_bdaddr callback, this allows changing the device
address before powering on the controller.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When the external configuration triggers the switch to a configured
controller, it means the setup needs to be run. Controllers that start
out unconfigured have only run limited set of HCI commands. This is
not enough for complete operation and thus run the setup procedure
before announcing the new controller index.
This introduces HCI_CONFIG flag as companion to HCI_SETUP flag. The
HCI_SETUP flag is only used once for the initial setup procedure. And
during that procedure hdev->setup driver callback is called. With the
new HCI_CONFIG the switch from unconfigured to configured state is
triggering the same setup procedure just without hdev->setup. This
is required since bringing a controller back to unconfigured state
from configured state is possible.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When calling Device Remove with BDADDR_ANY we should in a similar way
emit Device Removed events as we do when removing a single device. Since
we have to iterate the list and call device_removed() the dedicated
hci_conn_params_clear_enabled() is not really useful anymore. This patch
removes the helper function and does the event emission and list item
removal in a single loop.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
During the setup phase of a controller, the Bluetooth address will be
read and to have that original address available for later use, store
it as setup address.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When one or more of the missing configuration options change, then send
this even to all the other management interface clients.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The Set External Configuration management command allows for switching
between configured and unconfigured start if HCI_QURIK_EXTERNAL_CONFIG
is set by the transport driver.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When a controller requires external configuration, then setting this
quirk will allow indicating this.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When a Bluetooth controller does not have a valid public Bluetooth
address, then allow the driver to indicate this. If the quirk is
set, the Bluetooth core will switch to unconfigured state first
and will allow userspace to configure the address before starting
the full initialization of the controller.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The current existing device quirks are not documented. So instead of
spreading bits and pieces somewhere in the code, add proper comments
on where these quirks can be used and what behavior they change.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The public address configuration option is value 0x02 since the generic
external configuration is value 0x01. So adjust this accordingly and
also add the value 0x01 to the list.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
In some circumstances we need to look up entries in pend_le_conns and in
other in pend_le_reports. This patch converts the existing lookup
function for pend_le_conns to something that can be used for both lists.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since there are no more users of this function we can simply go ahead
and remove it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When powering off (hci_dev_do_close) we should clear both the
pend_le_reports and pend_le_conns types of entries. When powering on
respectively we should populate both lists. This patch converts the
hci_pend_le_conns_clear() function into hci_pend_le_actions_clear()
(which can now be static) and converts the restart_le_auto_conns()
function into restart_le_actions().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that there are no-longer any users of the hci_pend_le_conn_del()
function we can simply go ahead and remove it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To simplify manipulation and lookup of hci_conn_params entries of the
type HCI_AUTO_CONN_REPORT it makes sense to store them in their own
list. The new action list_head in hci_conn_params is used for this
purpose.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In preparation to store also HCI_AUTO_CONN_REPORT entries in a list it
makes sense to convert the existing pend_le_conn list head of
hci_conn_params into a more generically named "action". This makes sense
because a parameter entry will never participate in more than one action
list.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Read Controller Configuration Information command allows retrieving
details about possible configurations option. The supported options are
returned and also the missing options (if any).
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Since the connection parameters are always a basis for adding entries to
hdev->pend_le_conns (so far of type bdaddr_list) it's simpler and more
efficient to have the parameters themselves be the entries in the
pend_le_conns list. We do this by adding another list_head to the
hci_conn_params struct.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To be able to make the right choice of whether to start passive scanning
or to send out a mgmt_device_found event we need to know if there are
any devices in the le_conn_params list with the auto_connect value set
to HCI_AUTO_CONN_REPORT. This patch adds a counter for this kind of
devices.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This command allows to get the list of currently known controller that
are in unconfigured state.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When a controller in an unconfigured state gets removed, then send
Unconfigured Index Removed events.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When a controller is in unconfigured state it is currently hidden
from the management interface. This change now announces the new
controller with an Unconfigured Index Added event and allows clients
to easily detect the controller.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
With the new unconfigured controller state it is possible to provide a
fully functional HCI transport, but disable the higher level operations
that would normally happen. This way userspace can try to configure the
controller before releases the unconfigured state.
The internal state is represented by HCI_UNCONFIGURED. This replaces the
HCI_QUIRK_RAW_DEVICE quirk as internal state representation. This is now
a real state and drivers can use the quirk to actually trigger this
state. In the future this will allow a more fine grained switching from
unconfigured state to configured state for controller inititialization.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
There are more places that can take advantage of is_identity_address()
besides hci_core.c. This patch moves the function to hci_core.h and
gives it the appropriate hci_ prefix.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The calling functions of mgmt_new_conn_param have more information about
the parameters, such as whether the kernel is tracking them or not. It
makes therefore sense to have them pass an initial store_hint value to
the mgmt_new_conn_param function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The caller of hci_le_conn_update is directly interested in knowing what
the best value is for the store_hint parameter of the corresponding
mgmt event. Since hci_le_conn_update knows whether there were stored
parameters that were updated or not we can have it return an initial
store_hint value to the caller.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch implements the new Load Connection Parameters mgmt command
that's intended to load the desired connection parameters for LE
devices.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The 0x00 action value of mgmt means "scan and report" but do not
connect. This is different from HCI_AUTO_CONN_DISABLED so we need a new
value for it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In some circumstances we'll need to either clear only the enabled
parameters or only the disabled ones. This patch adds convenience
functions for this purpose.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We'll soon have specific clear functions for clearing enabled or
disabled entries, so rename the function that removes everything to
clear_all().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Some embedded controllers allow the programming of a public address
and this adds vendor support for supporting OEM confguration of such
addresses.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This patch introduces a new Mgmt event called "New Connection Parameter".
This event indicates to userspace the connection parameters values the
remote device requested.
The user may store these values and load them into kernel. This way, next
time a connection is established to that device, the kernel will use those
parameters values instead of the default ones.
This event is sent when the remote device requests new connection
parameters through connection parameter update procedure. This event is
not sent for slave connections.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth 4.1 introduces a new LE meta event called "LE Remote
Connection Parameter Request" event. In order to the controller
sends this event to host, we should enable it during controller
initialization.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for LE Connection Parameters Request Link
Layer control procedure introduced in Core spec 4.1. This procedure
allows a Peripheral or Central to update the Link Layer connection
parameters of an established connection.
Regarding the acceptance of connection parameters, the LL procedure
follows the same approach of L2CAP procedure (see l2cap_conn_param_
update_req function). We accept any connection parameters values as
long as they are within the valid range.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For the Block/Unblock Device mgmt commands we should only emit the
Blocked/Unblocked events on any socket except for the one which received
the command. The code was previously incorrectly trying to look up a
non-existent pending command and thereby ending up not skipping the
command socket for the event.
We can simplify the code a lot by simply sending the event directly from
the command handler functions. We have the reference to the command
socket available there which makes it easy to pass to the mgmt_event
function for skipping.
The only notable side-effect of this is that the old blacklisting
ioctl's no-longer cause mgmt events to be emitted, however as user space
versions using these ioctl's are not mgmt-aware this is acceptable.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are certain subtle differences in behavior when we're encrypted
with the STK, such as allowing re-encryption even though the security
level stays the same. Because of this, add a flag to track whether we're
encrypted with an STK or not.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Device Found events of the management interface should indicate if
it is possible to connect to a remote device or if it is broadcaster
only advertising. To allow this differentation the Not Connectable flag
is introduced that will be set when it is known that a device can not
be connected.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Providing the flags parameter directly to mgmt_device_found function
makes the core simpler and more readable. With this it becomes a lot
easier to add new flags in the future.
This also changes hci_inquiry_cache_update to just return that flags
needed for mgmt_device_found since that is its only use for the two
return parameters anyway.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The connection interval parameter of hci_conn_params_set are always used
with the controller defaults. So just let hci_conn_params_add set the
controller default and not bother resetting them to controller defaults
every time the hci_conn_params_set is called.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When adding new connection parameters, it is useful to return either
the existing struct or the newly created one.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This patch renames l2cap_check_conn_param() to hci_check_conn_params()
and moves it to hci_core.h so it can reused in others files. This helper
will be reused in the next patch.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In some cases it is useful to not overwrite connection parametes and
instead just create default ones if they don't exist. This function
does exactly that. hci_conn_params_add will allow to create new
default connection parameters. hci_conn_params_set will set the
values and also create new parameters if they don't exist.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Store the connection latency and supervision timeout default values
with all the other controller defaults. And when needed use them
for new connections.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When devices are added or removed, then make sure that events are send
out to all other clients so that the list of devices can be easily
tracked. This is especially important when external clients are
adding or removing devices within the auto-connection list.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This allows adding or removing devices from the background scanning
list the kernel maintains. Device flagged for auto-connection will
be automatically connected if they are found.
The passive scanning required for auto-connection will be started
and stopped on demand.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When the slave updates the connection parameters, store also the
connection latency and supervision timeout information in the
internal list of connection parameters for known devices.
Having these values available allowes the auto-connection
procedure to use the correct values from the beginning without
having to request an update on every connection establishment.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This patch implements support for the Get Clock Information mgmt
command. This is done by performing one or two HCI_Read_Clock commands
and creating the response from the stored values in the hci_dev and
hci_conn structs.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for storing the local and piconet clock values
from the HCI_Read_Clock command response to the hci_dev and hci_conn
structs. This will be later used in another patch to implement support
for the Get Clock Info mgmt command.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When the connection is in master role and it is going to be
disconnected based on the disconnection timeout, then send
the HCI_Read_Clock_Offset command in an attempt to update the
clock offset value in the inquiry cache.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The smp_conn member of struct hci_conn was simply a pointer to the
l2cap_conn object. Since we already have hcon->l2cap_data that points to
the same thing there's no need to have this second variable. This patch
removes it and changes the single place that was using it to use
hcon->l2cap_data instead.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch replaces the hard-coded value in hci_bdaddr_is_rpa() helper
by the corresponding macro ADDR_LE_DEV_RANDOM.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since the link_mode member of the hci_conn struct is a bit field and we
already have a flags member as well it makes sense to merge these two
together. This patch moves all used link_mode bits into corresponding
flags. To keep backwards compatibility with user space we still need to
provide a get_link_mode() helper function for the ioctl's that expect a
link_mode style value.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To pave the way for actively using debug keys for pairing this patch
adds a new HCI_USE_DEBUG_KEYS flag for the purpose. When the flag is set
we issue a HCI_Write_SSP_Debug mode whenever HCI_Write_SSP_Mode(0x01)
has been issued as well as before issuing a HCI_Write_SSP_Mode(0x00)
command.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There's no point in having boolean variables in the hci_conn struct
since it already has a flags member. This patch converts the flush_key
member into a proper flag.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We're planning to add a flag to actively use debug keys in addition to
simply just accepting them, which makes the current generically named
DEBUG_KEYS flag a bit confusing. Since the flag in practice affects
whether the kernel keeps debug keys around or not rename it to
HCI_KEEP_DEBUG_KEYS.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are two callers of hci_add_link_key(). The first one is the HCI
Link Key Notification event and the second one the mgmt code that
receives a list of link keys from user space. Previously we've had the
hci_add_link_key() function being responsible for also emitting a mgmt
signal but for the latter use case this should not happen. Because of
this a rather awkward new_key paramter has been passed to the function.
This patch moves the mgmt event sending out from the hci_add_link_key()
function, thereby making the code a bit more understandable.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
By returning the added (or updated) key we pave the way for further
refactoring (in subsequent patches) that allows moving the mgmt event
sending out from this function (and thereby removal of the awkward
new_key parameter).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When the LE controller changes its connection parameters, it will send
a connection parameter update event. Make sure that the new set of
parameters are stored in hci_conn struct and thus will properly update
the previous values retrieved from the connection complete event.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The LE connection parameters are needed later on to be able to decide
if it is required to trigger connection update procedures. So when the
connection has been established successfully, store the current used
parameters in hci_conn struct.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Create a CoC dynamically instead of one fixed channel for communication
to peer devices.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The highly optimized TX path for L2CAP channels and its fragmentation
within the HCI ACL packets requires to copy data from user provided
IO vectors and also kernel provided memory buffers.
This patch allows channel clients to provide a memcpy_fromiovec callback
to keep this optimized behavior, but adapt it to kernel vs user memory
for the TX path. For all kernel internal L2CAP channels, a default
implementation is provided that can be referenced.
In case of A2MP, this fixes a long-standing issue with wrongly accessing
kernel memory as user memory.
This patch originally by Marcel Holtmann.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
All the special settings configured via debugfs are either developer
only options or temporary solutions. To not clutter the standard flags,
move them to their own dbg_flags entry.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
These defines were probably put in to track authenticated vs
unauthenticated LTKs, however since the LTK struct has a separate
boolean authenticated member these were never used.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The LTK type has really nothing to do with HCI so it makes more sense to
have these in smp.h than hci.h. This patch moves the defines to smp.h
and removes the HCI_ prefix in the same go.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We never store the "master" type of STKs since we request encryption
directly with them so we only need one STK type (the one that's
looked-up on the slave side). Simply remove the unnecessary define and
rename the _SLAVE one to the shorter form.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since the whole HCI command, event and data packet processing has been
migrated to use workqueues instead of tasklets, it makes sense to use
struct delayed_work instead of struct timer_list for the timeout
handling. This patch converts the hdev->cmd_timer to use workqueue
as well.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When allocating the L2CAP SKB for transmission, provide the upper layers
with a clear distinction on what is the header and what is the body
portion of the SKB.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The struct l2cap_ctrl fields are wasting an unsigned int when all the
bits can fit into an __u8 field.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The SKB for L2CAP sockets are all allocated in a central callback
in the socket support. Instead of having to pass around the socket
priority all the time, assign it to skb->priority when actually
allocating the SKB.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The struct l2cap_ops field should not allow any modifications and thus
it is better declared as const.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
RFC4960, section 8.3 says:
On an idle destination address that is allowed to heartbeat,
it is recommended that a HEARTBEAT chunk is sent once per RTO
of that destination address plus the protocol parameter
'HB.interval', with jittering of +/- 50% of the RTO value,
and exponential backoff of the RTO if the previous HEARTBEAT
is unanswered.
Currently, we calculate jitter via sctp_jitter() function first,
and then add its result to the current RTO for the new timeout:
TMO = RTO + (RAND() % RTO) - (RTO / 2)
`------------------------^-=> sctp_jitter()
Instead, we can just simplify all this by directly calculating:
TMO = (RTO / 2) + (RAND() % RTO)
With the help of prandom_u32_max(), we don't need to open code
our own global PRNG, but can instead just make use of the per
CPU implementation of prandom with better quality numbers. Also,
we can now spare us the conditional for divide by zero check
since no div or mod operation needs to be used. Note that
prandom_u32_max() won't emit the same result as a mod operation,
but we really don't care here as we only want to have a random
number scaled into RTO interval.
Note, exponential RTO backoff is handeled elsewhere, namely in
sctp_do_8_2_transport_strike().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The max_dsize attribute in ctl_table for lowpan_frags_ns_ctl_table is
configured with integer accessing methods. This patch change the
max_dsize attribute to int to avoid a possible buffer overflow.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_dst_cache has __rcu annotation, so we need a cast to avoid
following sparse error :
include/net/sock.h:1774:19: warning: incorrect type in initializer (different address spaces)
include/net/sock.h:1774:19: expected struct dst_entry [noderef] <asn:4>*__ret
include/net/sock.h:1774:19: got struct dst_entry *dst
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 7f50236153 ("ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix")
Signed-off-by: David S. Miller <davem@davemloft.net>
When an UDP application switches from AF_INET to AF_INET6 sockets, we
have a small performance degradation for IPv4 communications because of
extra cache line misses to access ipv6only information.
This can also be noticed for TCP listeners, as ipv6_only_sock() is also
used from __inet_lookup_listener()->compute_score()
This is magnified when SO_REUSEPORT is used.
Move ipv6only into struct sock_common so that it is available at
no extra cost in lookups.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have two different ways to handle changes to sk->sk_dst
First way (used by TCP) assumes socket lock is owned by caller, and use
no extra lock : __sk_dst_set() & __sk_dst_reset()
Another way (used by UDP) uses sk_dst_lock because socket lock is not
always taken. Note that sk_dst_lock is not softirq safe.
These ways are not inter changeable for a given socket type.
ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used
the socket lock as synchronization, but users might be UDP sockets.
Instead of converting sk_dst_lock to a softirq safe version, use xchg()
as we did for sk_rx_dst in commit e47eb5dfb2 ("udp: ipv4: do not use
sk_dst_lock from softirq context")
In a follow up patch, we probably can remove sk_dst_lock, as it is
only used in IPv6.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: 9cb3a50c5f ("ipv4: Invalidate the socket cached route on pmtu events if possible")
Signed-off-by: David S. Miller <davem@davemloft.net>
Create tcp_conn_request and remove most of the code from
tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add queue_add_hash member to tcp_request_sock_ops so that we can later
unify tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add mss_clamp member to tcp_request_sock_ops so that we can later
unify tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a new tcp_request_sock_ops method to unify the IPv4/IPv6
signature for tcp_v[46]_send_synack. This allows us to later unify
tcp_v4_rtx_synack with tcp_v6_rtx_synack and tcp_v4_conn_request with
tcp_v4_conn_request.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
More work in preparation of unifying tcp_v4_conn_request and
tcp_v6_conn_request: indirect the init sequence calls via the
tcp_request_sock_ops.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create wrappers with same signature for the IPv4/IPv6 request routing
calls and use these wrappers (via route_req method from
tcp_request_sock_ops) in tcp_v4_conn_request and tcp_v6_conn_request
with the purpose of unifying the two functions in a later patch.
We can later drop the wrapper functions and modify inet_csk_route_req
and inet6_cks_route_req to use the same signature.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the specific IPv4/IPv6 cookie sequence initialization to a new
method in tcp_request_sock_ops in preparation for unifying
tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the specific IPv4/IPv6 intializations to a new method in
tcp_request_sock_ops in preparation for unifying tcp_v4_conn_request
and tcp_v6_conn_request.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since pktops is only used for IPv6 only and opts is used for IPv4
only, we can move these fields into a union and this allows us to drop
the inet6_reqsk_alloc function as after this change it becomes
equivalent with inet_reqsk_alloc.
This patch also fixes a kmemcheck issue in the IPv6 stack: the flags
field was not annotated after a request_sock was allocated.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the generic plain text packet loggger for bridged packets.
It routes the logging message to the real protocol packet logger.
I decided not to refactor the ebt_log code for two reasons:
1) The ebt_log output is not consistent with the IPv4 and IPv6
Netfilter packet loggers. The output is different for no good
reason and it adds redundant code to handle packet logging.
2) To avoid breaking backward compatibility for applications
outthere that are parsing the specific ebt_log output, the ebt_log
output has been left as is. So only nftables will use the new
consistent logging format for logged bridged packets.
More decisions coming in this patch:
1) This also removes ebt_log as default logger for bridged packets.
Thus, nf_log_packet() routes packet to this new packet logger
instead. This doesn't break backward compatibility since
nf_log_packet() is not used to log packets in plain text format
from anywhere in the ebtables/netfilter bridge code.
2) The new bridge packet logger also performs a lazy request to
register the real IPv4, ARP and IPv6 netfilter packet loggers.
If the real protocol logger is no available (not compiled or the
module is not available in the system, not packet logging happens.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Before this patch, the nf_loginfo parameter specified the logging
configuration in case the specified default logger was loaded. This
patch updates the semantics of the nf_loginfo parameter in
nf_log_packet() which now indicates the logger that you explicitly
want to use.
Thus, nf_log_packet() is exposed as an unified interface which
internally routes the log message to the corresponding logger type
by family.
The module dependencies are expressed by the new nf_logger_find_get()
and nf_logger_put() functions which bump the logger module refcount.
Thus, you can not remove logger modules that are used by rules anymore.
Another important effect of this change is that the family specific
module is only loaded when required. Therefore, xt_LOG and nft_log
will just trigger the autoload of the nf_log_{ip,ip6} modules
according to the family.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The plain text logging is currently embedded into the xt_LOG target.
In order to be able to use the plain text logging from nft_log, as a
first step, this patch moves the family specific code to the following
files and Kconfig symbols:
1) net/ipv4/netfilter/nf_log_ip.c: CONFIG_NF_LOG_IPV4
2) net/ipv6/netfilter/nf_log_ip6.c: CONFIG_NF_LOG_IPV6
3) net/netfilter/nf_log_common.c: CONFIG_NF_LOG_COMMON
These new modules will be required by xt_LOG and nft_log. This patch
is based on original patch from Arturo Borrero Gonzalez.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When IP route cache had been removed in linux-3.6, we broke assumption
that dst entries were all freed after rcu grace period. DST_NOCACHE
dst were supposed to be freed from dst_release(). But it appears
we want to keep such dst around, either in UDP sockets or tunnels.
In sk_dst_get() we need to make sure dst refcount is not 0
before incrementing it, or else we might end up freeing a dst
twice.
DST_NOCACHE set on a dst does not mean this dst can not be attached
to a socket or a tunnel.
Then, before actual freeing, we need to observe a rcu grace period
to make sure all other cpus can catch the fact the dst is no longer
usable.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dormando <dormando@rydia.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves Eric Dumazet's log buffer implementation from the
xt_log.h header file to the core net/netfilter/nf_log.c. This also
includes the renaming of the structure and functions to avoid possible
undesired namespace clashes.
This change allows us to use it from the arp and bridge packet logging
implementation in follow up patches.
Now that legacy ulog targets are not available anymore in the tree, we
can have up to two possible loggers:
1) The plain text logging via kernel logging ring.
2) The nfnetlink_log infrastructure which delivers log messages
to userspace.
This patch replaces the list of loggers by an array of two pointers
per family for each possible logger and it also introduces a new field
to the nf_logger structure which indicates the position in the logger
array (based on the logger type).
This prepares a follow up patch that consolidates the nf_log_packet()
interface by allowing to specify the logger as parameter.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This brings the (per-conntrack) ecache extension back to 24 bytes in size
(was 152 byte on x86_64 with lockdep on).
When event delivery fails, re-delivery is attempted via work queue.
Redelivery is attempted at least every 0.1 seconds, but can happen
more frequently if userspace is not congested.
The nf_ct_release_dying_list() function is removed.
With this patch, ownership of the to-be-redelivered conntracks
(on-dying-list-with-DYING-bit not yet set) is with the work queue,
which will release the references once event is out.
Joint work with Pablo Neira Ayuso.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Driver is now responsible for veryfing if the
switch is possible.
Since this is inherently tricky driver may decide
to disconnect an interface later with
cfg80211_stop_iface().
This doesn't mean driver can accept everything. It
should do it's best to verify requests and reject
them as soon as possible.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Multi-vif in-place reservations happen when
it is impossible to allocate more channel contexts
as indicated by interface combinations.
Such reservations are not finalized until all
assigned interfaces are ready.
This still doesn't handle all possible cases
(i.e. degradation of number of channels) properly.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Split sched scan IEs to band specific and not band specific
blocks. Common IEs blocks may be sent to the FW once per command,
instead of per band.
This allows optimization of size of the command, which may be
required by some drivers (eg. iwlmvm with newer firmware version).
As this changes the mac80211 API, update all drivers to use the
new version correctly, even if they don't (yet) make use of the
split data.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Reviewed-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some drivers (such as iwlmvm) can handle multiple bands in a single
HW scan request. Add a HW flag to indicate that the driver support
this. To hold the required data, create a separate structure for
HW scan request that holds cfg scan request and data about
different parts of the scan IEs.
As this changes the mac80211 API, update all drivers using it to
use the correct new function type/argument.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
give any information about IPv4 or IPv6.
This patch adds new member, n_proto, to struct flow_keys. Which records the
IP layer type. i.e IPv4 or IPv6.
This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
Adding new member to flow_keys increases the struct size by around 4 bytes.
This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
qdisc_cb_private_validate()
So increase data size by 4
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After sending a TDLS discovery-request, we expect a reply to arrive on
the AP's channel. We must stay on the channel (no PSM, scan, etc.), since
a TDLS setup-response is a direct packet not buffered by the AP.
Add a new mac80211 driver callback to allow discovery session protection.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Write a mac80211 to the cfg80211 API for requesting a userspace TDLS
operation. Define TDLS specific reason codes that can be used here.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The TDLS initiator is set once during link setup. If determines the
address ordering in the link identifier IE.
Fix dependent drivers - mwifiex and mac80211.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently, cfg80211 tries to implement ethtool, but that doesn't
really scale well, with all the different operations. Make the
lower-level driver responsible for it, which currently only has
an effect on mac80211. It will similarly not scale well at that
level though, since mac80211 also has many drivers.
To cleanly implement this in mac80211, introduce a new file and
move some code to appropriate places.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ir_mark initialization is done for both TCP v4 and v6, move it in the
common tcp_openreq_init function.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since 4fefee5 ("netfilter: nf_tables: allow to delete several objects
from a batch"), every new rule bumps the chain use counter. However,
this is limited to 16 bits, which means that it will overrun after
2^16 rules.
Use a u32 chain counter and check for overflows (just like we do for
table objects).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In __udp_lib_checksum_complete check if checksum is being done over all
the data (len is equal to skb->len) and if it is call
__skb_checksum_complete instead of __skb_checksum_complete_head. This
allows checksum to be saved in checksum complete.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.
2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
Benniston.
3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
Mork.
4) BPF now has a "random" opcode, from Chema Gonzalez.
5) Add more BPF documentation and improve test framework, from Daniel
Borkmann.
6) Support TCP fastopen over ipv6, from Daniel Lee.
7) Add software TSO helper functions and use them to support software
TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.
8) Support software TSO in fec driver too, from Nimrod Andy.
9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.
10) Handle broadcasts more gracefully over macvlan when there are large
numbers of interfaces configured, from Herbert Xu.
11) Allow more control over fwmark used for non-socket based responses,
from Lorenzo Colitti.
12) Do TCP congestion window limiting based upon measurements, from Neal
Cardwell.
13) Support busy polling in SCTP, from Neal Horman.
14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.
15) Bridge promisc mode handling improvements from Vlad Yasevich.
16) Don't use inetpeer entries to implement ID generation any more, it
performs poorly, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
tcp: fixing TLP's FIN recovery
net: fec: Add software TSO support
net: fec: Add Scatter/gather support
net: fec: Increase buffer descriptor entry number
net: fec: Factorize feature setting
net: fec: Enable IP header hardware checksum
net: fec: Factorize the .xmit transmit function
bridge: fix compile error when compiling without IPv6 support
bridge: fix smatch warning / potential null pointer dereference
via-rhine: fix full-duplex with autoneg disable
bnx2x: Enlarge the dorq threshold for VFs
bnx2x: Check for UNDI in uncommon branch
bnx2x: Fix 1G-baseT link
bnx2x: Fix link for KR with swapped polarity lane
sctp: Fix sk_ack_backlog wrap-around problem
net/core: Add VF link state control policy
net/fsl: xgmac_mdio is dependent on OF_MDIO
net/fsl: Make xgmac_mdio read error message useful
net_sched: drr: warn when qdisc is not work conserving
...
The DRR scheduler requires that items on the active list are work
conserving, i.e. do not hold on to skbs for throttling purposes, etc.
Attaching e.g. tbf renders DRR useless because all other classes on the
active list are delayed as well.
So, warn users that this configuration won't work as expected; we
already do this in couple of other qdiscs, see e.g.
commit b00355db3f
('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler')
The 'const' change is needed to avoid compiler warning ("discards 'const'
qualifier from pointer target type").
tested with:
drr_hier() {
parent=$1
classes=$2
for i in $(seq 1 $classes); do
classid=$parent$(printf %x $i)
tc class add dev eth0 parent $parent classid $classid drr
tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit
done
}
tc qdisc add dev eth0 root handle 1: drr
drr_hier 1: 32
tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Be more precise in transport path selection and use ktime
helpers instead of jiffies to compare and pick the better
primary and secondary recently used transports. This also
avoids any side-effects during a possible roll-over, and
could lead to better path decision-making.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_fragment can be called from process context (from tso_fragment).
Add a new gfp parameter to allow it to preserve atomic memory if
possible.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that 3.15 is released, this merges the 'next' branch into 'master',
bringing us to the normal situation where my 'master' branch is the
merge window.
* accumulated work in next: (6809 commits)
ufs: sb mutex merge + mutex_destroy
powerpc: update comments for generic idle conversion
cris: update comments for generic idle conversion
idle: remove cpu_idle() forward declarations
nbd: zero from and len fields in NBD_CMD_DISCONNECT.
mm: convert some level-less printks to pr_*
MAINTAINERS: adi-buildroot-devel is moderated
MAINTAINERS: add linux-api for review of API/ABI changes
mm/kmemleak-test.c: use pr_fmt for logging
fs/dlm/debug_fs.c: replace seq_printf by seq_puts
fs/dlm/lockspace.c: convert simple_str to kstr
fs/dlm/config.c: convert simple_str to kstr
mm: mark remap_file_pages() syscall as deprecated
mm: memcontrol: remove unnecessary memcg argument from soft limit functions
mm: memcontrol: clean up memcg zoneinfo lookup
mm/memblock.c: call kmemleak directly from memblock_(alloc|free)
mm/mempool.c: update the kmemleak stack trace for mempool allocations
lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations
mm: introduce kmemleak_update_trace()
mm/kmemleak.c: use %u to print ->checksum
...
Added VXLAN link configuration for sending UDP checksums, and allowing
TX and RX of UDP6 checksums.
Also, call common iptunnel_handle_offloads and added GSO support for
checksums.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call gso_make_checksum. This should have the benefit of using a
checksum that may have been previously computed for the packet.
This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that
offload GRE GSO with and without the GRE checksum offloaed.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added udp_set_csum and udp6_set_csum functions to set UDP checksums
in packets. These are for simple UDP packets such as those that might
be created in UDP tunnels.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull trivial tree changes from Jiri Kosina:
"Usual pile of patches from trivial tree that make the world go round"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
staging: go7007: remove reference to CONFIG_KMOD
aic7xxx: Remove obsolete preprocessor define
of: dma: doc fixes
doc: fix incorrect formula to calculate CommitLimit value
doc: Note need of bc in the kernel build from 3.10 onwards
mm: Fix printk typo in dmapool.c
modpost: Fix comment typo "Modules.symvers"
Kconfig.debug: Grammar s/addition/additional/
wimax: Spelling s/than/that/, wording s/destinatary/recipient/
aic7xxx: Spelling s/termnation/termination/
arm64: mm: Remove superfluous "the" in comment
of: Spelling s/anonymouns/anonymous/
dma: imx-sdma: Spelling s/determnine/determine/
ath10k: Improve grammar in comments
ath6kl: Spelling s/determnine/determine/
of: Improve grammar for of_alias_get_id() documentation
drm/exynos: Spelling s/contro/control/
radio-bcm2048.c: fix wrong overflow check
doc: printk-formats: do not mention casts for u64/s64
doc: spelling error changes
...
Conflicts:
include/net/inetpeer.h
net/ipv6/output_core.c
Changes in net were fixing bugs in code removed in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
is disabled.
Note how GSO/TSO packets do not have monotonically incrementing ID.
06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)
It appears I introduced this bug in linux-3.1.
inet_getid() must return the old value of peer->ip_id_count,
not the new one.
Lets revert this part, and remove the prevention of
a null identification field in IPv6 Fragment Extension Header,
which is dubious and not even done properly.
Fixes: 87c48fa3b4 ("ipv6: make fragment identifications less predictable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-06-02
Please pull this remaining batch of updates intended for the 3.16 stream...
For the mac80211 bits, Johannes says:
"The remainder for -next right now is mostly fixes, and a handful of
small new things like some CSA infrastructure, the regdb script mW/dBm
conversion change and sending wiphy notifications."
For the bluetooth bits, Gustavo says:
"Some more patches for 3.16. There is nothing really special here, just a
bunch of clean ups, fixes plus some small improvements. Please pull."
For the nfc bits, Samuel says:
"We have:
- Felica (Type3) tags support for trf7970a
- Type 4b tags support for port100
- st21nfca DTS typo fix
- A few sparse warning fixes"
For the atheros bits, Kalle says:
"Ben added support for setting antenna configurations. Michal improved
warm reset so that we would not need to fall back to cold reset that
often, an issue where ath10k stripped protected flag while in monitor
mode and made module initialisation asynchronous to fix the problems
with firmware loading when the driver is linked to the kernel.
Luca removed unused channel_switch_beacon callbacks both from ath9k and
ath10k. Marek fixed Protected Management Frames (PMF) when using Action
Frames. Also we had other small fixes everywhere in the driver."
Along with that, there are a handful of updates to a variety
of drivers. This includes updates to at76c50x-usb, ath9k, b43,
brcmfmac, mwifiex, rsi, rtlwifi, and wil6210.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ideally, we would need to generate IP ID using a per destination IP
generator.
linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.
1) each inet_peer struct consumes 192 bytes
2) inetpeer cache uses a binary tree of inet_peer structs,
with a nominal size of ~66000 elements under load.
3) lookups in this tree are hitting a lot of cache lines, as tree depth
is about 20.
4) If server deals with many tcp flows, we have a high probability of
not finding the inet_peer, allocating a fresh one, inserting it in
the tree with same initial ip_id_count, (cf secure_ip_id())
5) We garbage collect inet_peer aggressively.
IP ID generation do not have to be 'perfect'
Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.
We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.
ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)
secure_ip_id() and secure_ipv6_id() no longer are needed.
Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
This small patchset contains three accumulated Netfilter/IPVS updates,
they are:
1) Refactorize common NAT code by encapsulating it into a helper
function, similarly to what we do in other conntrack extensions,
from Florian Westphal.
2) A minor format string mismatch fix for IPVS, from Masanari Iida.
3) Add quota support to the netfilter accounting infrastructure, now
you can add quotas to accounting objects via the nfnetlink interface
and use them from iptables. You can also listen to quota
notifications from userspace. This enhancement from Mathieu Poirier.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases, when the driver is already using all the channel
contexts it can handle at once, we have to do an in-place switch
(ie. we cannot afford using an extra context temporarily for the
transaction). But some drivers may not support switching the channel
context assigned to a vif on the fly (ie. without unassigning and
assigning it) while others may only work if the context is changed on
the fly, without unassigning it first.
To allow these different scenarios, add a new driver operation that
let's the driver decide how to handle an in-place switch.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Conflicts:
drivers/net/bonding/bond_alb.c
drivers/net/ethernet/altera/altera_msgdma.c
drivers/net/ethernet/altera/altera_sgdma.c
net/ipv6/xfrm6_output.c
Several cases of overlapping changes.
The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.
In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.
Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Define separate fields in the sock structure for configuring disabling
checksums in both TX and RX-- sk_no_check_tx and sk_no_check_rx.
The SO_NO_CHECK socket option only affects sk_no_check_tx. Also,
removed UDP_CSUM_* defines since they are no longer necessary.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It doesn't seem like an protocols are setting anything other
than the default, and allowing to arbitrarily disable checksums
for a whole protocol seems dangerous. This can be done on a per
socket basis.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the mgmt level we have a key type parameter which currently accepts
two possible values: 0x00 for unauthenticated and 0x01 for
authenticated. However, in the internal struct smp_ltk representation we
have an explicit "authenticated" boolean value.
To make this distinction clear, add defines for the possible mgmt values
and do conversion to and from the internal authenticated value.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2014-05-22
This is the last ipsec pull request before I leave for
a three weeks vacation tomorrow. David, can you please
take urgent ipsec patches directly into net/net-next
during this time?
I'll continue to run the ipsec/ipsec-next trees as soon
as I'm back.
1) Simplify the xfrm audit handling, from Tetsuo Handa.
2) Codingstyle cleanup for xfrm_output, from abian Frederick.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Although the implementation probably needs a lot of work, this initial API
allows to implement software TSO in mvneta and mv643xx_eth drivers in a not
so intrusive way.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the NFC pull request for 3.16. We have:
- STMicroeectronics st21nfca support. The st21nfca is an HCI chipset and
thus relies on the HCI stack. This submission provides support for tag
redaer/writer mode (including Type 5) and device tree bindings.
- PM runtime support and a bunch of bug fixes for TI's trf7970a.
- Device tree support for NXP's pn544. Legacy platform data support is
obviously kept intact.
- NFC Tag type 4B support to the NFC Digital stack.
- SOCK_RAW type support to the raw NFC socket, and allow NCI
sniffing from that. This can be extended to report HCI frames and also
proprietarry ones like e.g. the pn533 ones.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTepRlAAoJEIqAPN1PVmxKnF0P/RvfrZs6CbGNJC+dkEbk90p1
nsngy4+4MmPwJYVzObnLz4Br0k1kmFKiOKske6drjMpgzDWeuQelw3B7bd3FYfxD
YkQsc5RC984xrDoDH5pn8mA6VJqmn7whrmcibTYAixrDqTvo8gw6uja4ryAnSdZm
n7cRbh/A5F/sa7O4mPA0bCTdp4jAS/vOP9rGFDOth/b5yJVs99XmC+AZp/Ad9BUx
+/osWGmBV5jshtX7aPTSxIQB4BUaP/lP1DW8yF5whKDjsHC9QyJcAtw9HfZ4tv2h
YNteZZ8yjM+rSjnDw/LvDc2Gp8Z8P1GYf8D3QN3cWhw1ZvXi7CnqKjEnm41sbfaH
L5esIfsRBUdmk6Ika7zALqmOQFI3PzH+ag96punl29qb2gyBDRSnXKVLirv3xxFG
h7vYtQL43Rosn/4pSilRbYReRwyKbSCxW3un/tUJy0Faafs6q+9oMC2aWbIfTT6l
40n4H9EmzYy2OaaXSFckiIIYYgVDAji8GLXTf+dPHb+NrH3QQOR3m27WzHc4rmYk
kUrv0lKoFswA+VLlIcJTrSKNF21FDjwuImzIWiPz6Fx/+rWJ0b4GlQyIynD72LpR
2LkUhTrxuRuRtxVCtvTdkPlL6Bdp3HO7t4qZ0EirgnpmGK6NScBgABoqFJSbz9uS
UUvZbHVIjLrDU9zzoyz8
=cSl+
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-3.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz <sameo@linux.intel.com> says:
"NFC: 3.16: First pull request
This is the NFC pull request for 3.16. We have:
- STMicroeectronics st21nfca support. The st21nfca is an HCI chipset and
thus relies on the HCI stack. This submission provides support for tag
redaer/writer mode (including Type 5) and device tree bindings.
- PM runtime support and a bunch of bug fixes for TI's trf7970a.
- Device tree support for NXP's pn544. Legacy platform data support is
obviously kept intact.
- NFC Tag type 4B support to the NFC Digital stack.
- SOCK_RAW type support to the raw NFC socket, and allow NCI
sniffing from that. This can be extended to report HCI frames and also
proprietarry ones like e.g. the pn533 ones."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Pablo Neira Ayuso says:
====================
Netfilter/nftables updates for net-next
The following patchset contains Netfilter/nftables updates for net-next,
most relevantly they are:
1) Add set element update notification via netlink, from Arturo Borrero.
2) Put all object updates in one single message batch that is sent to
kernel-space. Before this patch only rules where included in the batch.
This series also introduces the generic transaction infrastructure so
updates to all objects (tables, chains, rules and sets) are applied in
an all-or-nothing fashion, these series from me.
3) Defer release of objects via call_rcu to reduce the time required to
commit changes. The assumption is that all objects are destroyed in
reverse order to ensure that dependencies betweem them are fulfilled
(ie. rules and sets are destroyed first, then chains, and finally
tables).
4) Allow to match by bridge port name, from Tomasz Bursztyka. This series
include two patches to prepare this new feature.
5) Implement the proper set selection based on the characteristics of the
data. The new infrastructure also allows you to specify your preferences
in terms of memory and computational complexity so the underlying set
type is also selected according to your needs, from Patrick McHardy.
6) Several cleanup patches for nft expressions, including one minor possible
compilation breakage due to missing mark support, also from Patrick.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Experience with the recent e114a710aa ("tcp: fix cwnd limited
checking to improve congestion control") has shown that there are
common cases where that commit can cause cwnd to be much larger than
necessary. This leads to TSO autosizing cooking skbs that are too
large, among other things.
The main problems seemed to be:
(1) That commit attempted to predict the future behavior of the
connection by looking at the write queue (if TSO or TSQ limit
sending). That prediction sometimes overestimated future outstanding
packets.
(2) That commit always allowed cwnd to grow to twice the number of
outstanding packets (even in congestion avoidance, where this is not
needed).
This commit improves both of these, by:
(1) Switching to a measurement-based approach where we explicitly
track the largest number of packets in flight during the past window
("max_packets_out"), and remember whether we were cwnd-limited at the
moment we finished sending that flight.
(2) Only allowing cwnd to grow to twice the number of outstanding
packets ("max_packets_out") in slow start. In congestion avoidance
mode we now only allow cwnd to grow if it was fully utilized.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Channels in 2.4GHz band overlap, this means that if we
send a probe request on channel 1 and then move to channel
2, we will hear the probe response on channel 2. In this
case, the RSSI will be lower than if we had heard it on
the channel on which it was sent (1 in this case).
The firmware / low level driver can parse the channel in
the DS IE or HT IE and compensate the RSSI so that it will
still have a valid value even if we heard the frame on an
adjacent channel. This can be done up to a certain offset.
Add this offset as a configuration for the low level driver.
A low level driver that can compensate the low RSSI in this
case should assign the maximal offset for which the RSSI
value is still valid.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implement and export the new cfg80211_get_station() API.
This utility can be used by other kernel modules to obtain
detailed information about a given wireless station.
It will be in particular useful to batman-adv which will
implement a wireless rate based metric.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add get_expected_throughput() API to mac80211 so that each
driver can implement its own version based on the RC
algorithm they are using (might be using an HW RC algo).
The API returns a value expressed in Kbps.
Also, add the new get_expected_throughput() member
to the rate_control_ops structure in order to be
able to query the RC algorithm (this patch provides an
implementation of this API for both minstrel and
minstrel_ht).
The related member in the station_info object is now
filled accordingly when dumping a station.
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Users may need information about the expected throughput
towards a given peer.
This value is supposed to consider the size overhead
generated by the 802.11 header.
This value is exported in kbps through the get_station() API
by including it into the station_info object.
Moreover, it is sent to user space when replying to the
nl80211 GET_STATION command.
This information will be useful to the batman-adv module
which will use it for its new metric computation.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This allows for a more generic NFC sniffing by using SOCKPROTO_RAW
SOCK_RAW to read RAW NFC frames. This is for sniffing anything but LLCP
(HCI, NCI, etc...).
Signed-off-by: Hiren Tandel <hirent@marvell.com>
Signed-off-by: Rahul Tank <rahult@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This requires changing the nl80211 parsing code a bit to use
intermediate pointers for the allocation, but clarifies the
API towards the drivers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This also propagates through the drivers.
The orinoco driver uses the cfg80211 API structs for internal
bookkeeping, and so needs a (void *) cast that removes the
const - but that's OK because it allocates those pointers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The csa_counter_offs was erroneously described as csa_offs in
the docbook section.
This fixes two warnings when making htmldocs (at least):
Warning(include/net/mac80211.h:3428): No description found for parameter 'csa_counter_offs[IEEE80211_MAX_CSA_COUNTERS_NUM]'
Warning(include/net/mac80211.h:3428): Excess struct/union/enum/typedef member 'csa_offs' description in 'ieee80211_mutable_offsets'
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Move the comment in the structure to a description of the
max_num_csa_counters field in the docbook area.
This fixes a warning when building htmldocs (at least):
Warning(include/net/cfg80211.h:3064): No description found for parameter 'max_num_csa_counters'
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Now that all objects are released in the reverse order via the
transaction infrastructure, we can enqueue the release via
call_rcu to save one synchronize_rcu. For small rule-sets loaded
via nft -f, it now takes around 50ms less here.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Instead of caching the original skbuff that contains the netlink
messages, this stores the netlink message sequence number, the
netlink portID and the report flag. This helps to prepare the
introduction of the object release via call_rcu.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Leave the set content in consistent state if we fail to load the
batch. Use the new generic transaction infrastructure to achieve
this.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch speeds up rule-set updates and it also provides a way
to revert updates and leave things in consistent state in case that
the batch needs to be aborted.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch speeds up rule-set updates and it also introduces a way to
revert chain updates if the batch is aborted. The idea is to store the
changes in the transaction to apply that in the commit step.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch reworks the nf_tables API so set updates are included in
the same batch that contains rule updates. This speeds up rule-set
updates since we skip a dialog of four messages between kernel and
user-space (two on each direction), from:
1) create the set and send netlink message to the kernel
2) process the response from the kernel that contains the allocated name.
3) add the set elements and send netlink message to the kernel.
4) process the response from the kernel (to check for errors).
To:
1) add the set to the batch.
2) add the set elements to the batch.
3) add the rule that points to the set.
4) send batch to the kernel.
This also introduces an internal set ID (NFTA_SET_ID) that is unique
in the batch so set elements and rules can refer to new sets.
Backward compatibility has been only retained in userspace, this
means that new nft versions can talk to the kernel both in the new
and the old fashion.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The patch adds message type to the transaction to simplify the
commit the and abort routines. Yet another step forward in the
generalisation of the transaction infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch generalises the existing rule transaction infrastructure
so it can be used to handle set, table and chain object transactions
as well. The transaction provides a data area that stores private
information depending on the transaction type.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The new transaction infrastructure updates the family, table and chain
objects in the context structure, so let's deconstify them. While at it,
move the context structure initialization routine to the top of the
source file as it will be also used from the table and chain routines.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The 802.15.4-2011 standard states that for each key, a list of devices
that use this key shall be kept. Previous patches have only considered
two options:
* a device "uses" (or may use) all keys, rendering the list useless
* a device is restricted to a certain set of keys
Another option would be that a device *may* use all keys, but need not
do so, and we are interested in the actual set of keys the device uses.
Recording keys used by any given device may have a noticable performance
impact and might not be needed as often. The common case, in which a
device will not switch keys too often, should still perform well.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow datagram sockets to override the security settings of the device
they send from on a per-socket basis. Requires CAP_NET_ADMIN or
CAP_NET_RAW, since raw sockets can send arbitrary packets anyway.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The added structures match 802.15.4-2011 link-layer security PIBs as
closely as is reasonable. Some lists required by the standard were
modeled as bitmaps (frame_types and command_frame_ids in *llsec_key,
802.15.4-2011 7.5/Table 61), since using lists for those seems a bit
excessive and not particularly useful. The DeviceDescriptorHandleList
was inverted and is here a per-device list, since operations on this
list are likely to have both a key and a device at hand, and per-device
lists of keys are shorter than per-key lists of devices.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to store local maximum TX power level for
connection when reply for HCI_Read_Transmit_Power_Level is received.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for Get Connection Information mgmt command
which can be used to query for information about connection, i.e. RSSI
and local TX power level.
In general values cached in hci_conn are returned as long as they are
considered valid, i.e. do not exceed age limit set in hdev. This limit
is calculated as random value between min/max values to avoid client
trying to guess when to poll for updated information.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds conn_info_min_age and conn_info_max_age parameters to
debugfs which determine lifetime of connection information. Actual
lifetime will be random value between min and max age.
Default values for min and max age are 1000ms and 3000ms respectively.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
RFC 4861 states in 7.2.5:
The IsRouter flag in the cache entry MUST be set based on the
Router flag in the received advertisement. In those cases
where the IsRouter flag changes from TRUE to FALSE as a result
of this update, the node MUST remove that router from the
Default Router List and update the Destination Cache entries
for all destinations using that neighbor as a router as
specified in Section 7.3.3. This is needed to detect when a
node that is used as a router stops forwarding packets due to
being configured as a host.
Currently, when dealing with NA Message which IsRouter flag changes from
TRUE to FALSE, the kernel only removes router from the Default Router List,
and don't update the Destination Cache entries.
Now in order to update those Destination Cache entries, i introduce
function rt6_clean_tohost().
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current mac_cb handling of ieee802154 is rather awkward and limited.
Decompose the single flags field into multiple fields with the meanings
of each subfield of the flags field to make future extensions (for
example, link-layer security) easier. Also don't set the frame sequence
number in upper layers, since that's a thing the MAC is supposed to set
on frame transmit - we set it on header creation, but assuming that
upper layers do not blindly duplicate our headers, this is fine.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When dealing with 802.15.4, one often has to know the maximum payload
size for a given packet. This depends on many factors, one of which is
whether or not a security header is present in the frame. These
definitions and functions provide an easy way for any upper layer to
calculate the maximum payload size for a packet. The first obvious user
for this is 6lowpan, which duplicates this calculation and gets it
partially wrong because it ignores security headers.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
> include/net/ip.h:211:5: warning: "CONFIG_SYSCTL" is not defined [-Wundef]
> #if CONFIG_SYSCTL
> ^
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the beacon CSA counters part of ieee80211_mutable_offsets and don't
decrement CSA counters when generating a beacon template. This permits the
driver to offload the CSA counters handling. Since mac80211 updates the probe
responses with the correct counter, the driver should sync the counter's value
with mac80211 using ieee80211_csa_update_counter function.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a new API ieee80211_beacon_get_template, which doesn't
affect DTIM counter and should be used if the device generates beacon
frames, and new beacon template is needed. In addition set the offsets
to TIM IE for MESH interface.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Change the type of NL80211_ATTR_CSA_C_OFF_BEACON and
NL80211_ATTR_CSA_C_OFF_PRESP to be NLA_BINARY which allows
userspace to use beacons and probe responses with
multiple CSA counters.
This isn't breaking the API since userspace can
continue to use nla_put_u16 for this attributes, which
is equivalent to a single element u16 array.
In addition advertise max number of supported CSA counters.
This is needed when using CSA and eCSA IEs together.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add NL80211_ATTR_CSA_C_OFFSETS_TX which holds an array
of offsets to the CSA counters which should be updated
when sending a management frames with NL80211_CMD_FRAME.
This API should be used by the drivers that wish to keep the
CSA counter updated in probe responses, but do not implement
probe response offloading and so, do not use
ieee80211_proberesp_get function.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Missing a colon on definition use is a bit odd so
change the macro for the 32 bit case to declare an
__attribute__((unused)) and __deprecated variable.
The __deprecated attribute will cause gcc to emit
an error if the variable is actually used.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_local_port_range is already per netns, so should ip_local_reserved_ports
be. And since it is none by default we don't actually need it when we don't
enable CONFIG_SYSCTL.
By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port()
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using mark-based routing, sockets returned from accept()
may need to be marked differently depending on the incoming
connection request.
This is the case, for example, if different socket marks identify
different networks: a listening socket may want to accept
connections from all networks, but each connection should be
marked with the network that the request came in on, so that
subsequent packets are sent on the correct network.
This patch adds a sysctl to mark TCP sockets based on the fwmark
of the incoming SYN packet. If enabled, and an unmarked socket
receives a SYN, then the SYN packet's fwmark is written to the
connection's inet_request_sock, and later written back to the
accepted socket when the connection is established. If the
socket already has a nonzero mark, then the behaviour is the same
as it is today, i.e., the listening socket's fwmark is used.
Black-box tested using user-mode linux:
- IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the
mark of the incoming SYN packet.
- The socket returned by accept() is marked with the mark of the
incoming SYN packet.
- Tested with syncookies=1 and syncookies=2.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kernel-originated IP packets that have no user socket associated
with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
are emitted with a mark of zero. Add a sysctl to make them have
the same mark as the packet they are replying to.
This allows an administrator that wishes to do so to use
mark-based routing, firewalling, etc. for these replies by
marking the original packets inbound.
Tested using user-mode linux:
- ICMP/ICMPv6 echo replies and errors.
- TCP RST packets (IPv4 and IPv6).
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid large code duplication in IPv6, we need to first simplify
the complicate SYN-ACK sending code in tcp_v4_conn_request().
To use tcp_v4(6)_send_synack() to send all SYN-ACKs, we need to
initialize the mini socket's receive window before trying to
create the child socket and/or building the SYN-ACK packet. So we move
that initialization from tcp_make_synack() to tcp_v4_conn_request()
as a new function tcp_openreq_init_req_rwin().
After this refactoring the SYN-ACK sending code is simpler and easier
to implement Fast Open for IPv6.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidate various cookie checking and generation code to simplify
the fast open processing. The main goal is to reduce code duplication
in tcp_v4_conn_request() for IPv6 support.
Removes two experimental sysctl flags TFO_SERVER_ALWAYS and
TFO_SERVER_COOKIE_NOT_CHKD used primarily for developmental debugging
purposes.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move common TFO functions that will be used by both v4 and v6
to tcp_fastopen.c. Create a helper tcp_fastopen_queue_check().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the moment, the ath9k/ath10k DFS module only supports detecting ETSI
radar patterns.
Add a bitmap in the interface combinations, indicating which DFS regions
are supported by the detector. If unset, support for all regions is
assumed.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As suggested by several people, rename local_df to ignore_df,
since it means "ignore df bit if it is set".
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/altera/altera_sgdma.c
net/netlink/af_netlink.c
net/sched/cls_api.c
net/sched/sch_api.c
The netlink conflict dealt with moving to netlink_capable() and
netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
in non-init namespaces. These were simple transformations from
netlink_capable to netlink_ns_capable.
The Altera driver conflict was simply code removal overlapping some
void pointer cast cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to store local TX power level for connection
when reply for HCI_Read_Transmit_Power_Level is received.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
John W. Linville says:
====================
pull request: wireless 2014-05-08
This one is all from Johannes:
"Here are a few small fixes for the current cycle: radiotap TX flags were
wrong (fix by Bob), Chun-Yeow fixes an SMPS issue with mesh interfaces,
Eliad fixes a locking bug and a cfg80211 state problem and finally
Henning sent me a fix for IBSS rate information."
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When the driver fails during HW restart or resume, the whole
stack goes into a very confused state with interfaces being
up while the hardware is down etc.
Address this by shutting down everything; we'll run into a
lot of warnings in the process but that's better than having
the whole stack get messed up.
Reviewed-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Similarly, when CONFIG_SYSCTL is not set, ping_group_range should still
work, just that no one can change it. Therefore we should move it out of
sysctl_net_ipv4.c. And, it should not share the same seqlock with
ip_local_port_range.
BTW, rename it to ->ping_group_range instead.
Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: Stefan de Konink <stefan@konink.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_SYSCTL is not set, ip_local_port_range should still work,
just that no one can change it. Therefore we should move it out of sysctl_inet.c.
Also, rename it to ->ip_local_ports instead.
Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: Stefan de Konink <stefan@konink.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to store RSSI for connection when reply for
HCI_Read_RSSI is received.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When trying to generate documentation, at least xmldocs, we get the
following warning:
Warning(include/net/cfg80211.h:461): No description found for parameter 'nl80211_iftype'
Fix it by adding the iftype argument name to the
cfg80211_chandef_dfs_required() function declaration.
Reported-and-tested-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
commit 8f0ea0fe3a (snmp: reduce percpu needs by 50%)
reduced snmp array size to 1, so technically it doesn't have to be
an array any more. What's more, after the following commit:
commit 933393f58f
Date: Thu Dec 22 11:58:51 2011 -0600
percpu: Remove irqsafe_cpu_xxx variants
We simply say that regular this_cpu use must be safe regardless of
preemption and interrupt state. That has no material change for x86
and s390 implementations of this_cpu operations. However, arches that
do not provide their own implementation for this_cpu operations will
now get code generated that disables interrupts instead of preemption.
probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
almost 3 years, no one complains.
So, just convert the array to a single pointer and remove snmp_mib_init()
and snmp_mib_free() as well.
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The quoted text and figure are from RFC 6040 ("Tunnelling of Explicit
Congestion Notification").
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This exports a new cfg80211_stop_iface() function.
This is intended for driver internal interface
combination management and channel switching.
Due to locking issues (it re-enters driver) the
call is asynchronous and uses cfg80211 event
list/worker.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It was possible for tx queues to be stuck stopped
if AP CSA finalization failed. In that case
neither stop_ap nor do_stop woke the queues up.
This means it was impossible to perform tx at all
until driver was reloaded or a successful CSA was
performed later.
It was possible to solve this in a simpler manner
however this is more robust and future proof
(having multi-vif CSA in mind).
New sdata->csa_block_tx is introduced to keep
track of which interfaces requested tx to be
blocked for CSA. This is required because mac80211
stops all tx queues for that purpose. This means
queues must be awoken only when last tx-blocking
CSA interface is finished.
It is still possible to have tx queues stopped
after CSA failure but as soon as offending
interfaces are stopped from userspace (stop_ap or
ifdown) tx queues are woken up properly.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Enabling CONFIG_DEBUG_ATOMIC_SLEEP has shown that some rfcomm functions
acquiring spinlocks call sleeping locks further in the chain. Converting
the offending spinlocks into mutexes makes sleeping safe.
Signed-off-by: Libor Pechacek <lpechacek@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Call skb_checksum_init instead of private functions.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call skb_checksum_init instead of private functions.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
csum_add is really nothing more then add-with-carry which
can be implemented efficiently in some architectures.
Allow architecture to define this protected by HAVE_ARCH_CSUM_ADD.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-05-02
Please pull this batch of updates intended for the 3.16 stream...
For the mac80211 bits, Johannes says:
"In this round we have a large number of small features and
improvements from people too numerous to list here. The only really
bit thing is Michał and Luca's CSA work (including changing how
interface combination verification is done)."
For the Bluetooth bits, Gustavo says:
"Here goes some patches for the -next release. There is nothing
really special for this pull request, just a bunch of refactors,
fixes and clean ups."
For the ath10k/ath6kl bits, Kalle says:
"For ath6kl Kalle fixed a bunch of checkpatch warnings.
In ath10k we had more changes, major ones being:
* fix memory allocation failures after a firmware crash (Michal)
* some rework of DFS configuration to enable it correctly in all cases
(Michal)
* add a new firmware crash option to make it possible to crash 10.1
firmware for testing purposes (Marek P)
* fix RTS/CTS protection in certain cases (Marek K)
* fix wrong RSSI and rate reporting in some cases (Janusz)
* fix firmware stats reporting (Chun, Ben & Bartosz)"
For the iwlwifi bits, Emmanuel says:
"I have here a bunch of unrelated things. I disabled support for
-7.ucode which means that I can removed a lot of code. Eliad has
a brand new feature: we reduce the Tx power when the link allows -
this reduces our power consumption. The regular changes in power and
scan area. One interesting thing though is the patches from Johannes,
we have now GRO which allows to increase our throughput in TCP Rx. The
main advantage is that it reduces the number of TCP Acks - these TCP
Acks are completely useless when we are using A-MPDU since the first
packet of the A-MPDU generates a TCP Ack which is made obsolete by
the next packets."
Along with that, there are a variety of updates to b43, mwifiex,
rtl8180 and wil6210 drivers and a handful of other updates here
and there.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now the core vsock module is the owner of the proto family. This
means there's nothing preventing the transport module from unloading if
there are open sockets, which results in a panic. Fix that by allowing
the transport to be the owner, which will refcount it properly.
Includes version bump to 1.0.1.0-k
Passes checkpatch this time, I swear...
Acked-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose a new tdls flag for the public ieee80211_sta struct.
This can be used in some rate control decisions.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add locked-version for cfg80211_sched_scan_stopped.
This is used for some users that might want to
call it when rtnl is already locked.
Fixes: d43c6b6 ("mac80211: reschedule sched scan after HW restart")
Cc: stable@vger.kernel.org (3.14+)
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit e114a710aa ("tcp: fix cwnd limited checking to improve
congestion control") obsoleted in_flight parameter from
tcp_is_cwnd_limited() and its callers.
This patch does the removal as promised.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung discovered tcp_is_cwnd_limited() was returning false in
slow start phase even if the application filled the socket write queue.
All congestion modules take into account tcp_is_cwnd_limited()
before increasing cwnd, so this behavior limits slow start from
probing the bandwidth at full speed.
The problem is that even if write queue is full (aka we are _not_
application limited), cwnd can be under utilized if TSO should auto
defer or TCP Small queues decided to hold packets.
So the in_flight can be kept to smaller value, and we can get to the
point tcp_is_cwnd_limited() returns false.
With TCP Small Queues and FQ/pacing, this issue is more visible.
We fix this by having tcp_cwnd_validate(), which is supposed to track
such things, take into account unsent_segs, the number of segs that we
are not sending at the moment due to TSO or TSQ, but intend to send
real soon. Then when we are cwnd-limited, remember this fact while we
are processing the window of ACKs that comes back.
For example, suppose we have a brand new connection with cwnd=10; we
are in slow start, and we send a flight of 9 packets. By the time we
have received ACKs for all 9 packets we want our cwnd to be 18.
We implement this by setting tp->lsnd_pending to 9, and
considering ourselves to be cwnd-limited while cwnd is less than
twice tp->lsnd_pending (2*9 -> 18).
This makes tcp_is_cwnd_limited() more understandable, by removing
the GSO/TSO kludge, that tried to work around the issue.
Note the in_flight parameter can be removed in a followup cleanup
patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces 6 identical code snippets with a call to a new
static inline function.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA drivers have a trick which consists in allocating "priv_size" more
bytes to account for the DSA driver private context. Add a helper
function to access that private context instead of open-coding it in
drivers with (void *)(ds + 1).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce copy-past a bit by adding a common helper.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This extends NL80211_CMD_SET_CHANNEL to allow dynamic channel bandwidth
changes in AP mode (including P2P GO) during a lifetime of the BSS. This
can be used to implement, e.g., HT 20/40 MHz co-existence rules on the
2.4 GHz band.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When actions are attached to a filter, they are a part of the filter
itself, so when changing a filter we should allow to overwrite the actions
inside as well.
In my specific case, when I tried to _append_ a new action to an existing
filter which already has an action, I got EEXIST since kernel refused
to overwrite the existing one in kernel.
This patch checks if we are changing the filter checking NLM_F_CREATE flag
(Sigh, filters don't use NLM_F_REPLACE...) and then passes the boolean down
to actions. This fixes the problem above.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since there are frequency bands (e.g. 5.9GHz) allowing channels
with only 10 or 5 MHz bandwidth, this patch adds attributes that
allow keeping track about this information.
When channel attributes are reported to user-space, make sure to
not break old tools, i.e. if the 'split wiphy dump' is enabled,
report the extra attributes (if present) describing the bandwidth
restrictions. If the 'split wiphy dump' is not enabled,
completely omit those channels that have flags set to either
IEEE80211_CHAN_NO_10MHZ or IEEE80211_CHAN_NO_20MHZ.
Add the check for new bandwidth restriction flags in
cfg80211_chandef_usable() to comply with the restrictions.
Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some chips can encrypt managment frames in HW, but
require generated IV in the frame. Add a key flag
that allows us to achieve this.
Signed-off-by: Marek Kwaczynski <marek.kwaczynski@tieto.com>
[use BIT(0) to fill that spot, fix indentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The patch splits cfg80211_check_combinations()
into an iterator function and a simple iteration
user.
This makes it possible for drivers to asses how
many channels can use given iftype setup. This in
turn can be used for future
multi-interface/multi-channel channel switching.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch allows to switch the netns when packet is encapsulated or
decapsulated.
The vxlan socket is openned into the i/o netns, ie into the netns where
encapsulated packets are received. The socket lookup is done into this netns to
find the corresponding vxlan tunnel. After decapsulation, the packet is
injecting into the corresponding interface which may stand to another netns.
When one of the two netns is removed, the tunnel is destroyed.
Configuration example:
ip netns add netns1
ip netns exec netns1 ip link set lo up
ip link add vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
ip link set vxlan10 netns netns1
ip netns exec netns1 ip addr add 192.168.0.249/24 broadcast 192.168.0.255 dev vxlan10
ip netns exec netns1 ip link set vxlan10 up
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_net_capable - The common case, operations that are safe in a network namespace.
sk_capable - Operations that are not known to be safe in a network namespace
sk_ns_capable - The general case for special cases.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes noted this is not needed, all of the fragment
accessors don't need CONFIG_NET_NS. This goes test compiled with
CONFIG_BT_6LOWPAN=y and a disabled CONFIG_NET_NS.
CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: linux-zigbee-devel@lists.sourceforge.net
Cc: David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be useful to create network family dedicated META expression
as for NFPROTO_BRIDGE for instance.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit f1370cc4 "xfrm: Remove useless secid field from xfrm_audit." changed
"struct xfrm_audit" to have either
{ audit_get_loginuid(current) / audit_get_sessionid(current) } or
{ INVALID_UID / -1 } pair.
This means that we can represent "struct xfrm_audit" as "bool".
This patch replaces "struct xfrm_audit" argument with "bool".
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
It seems to me that commit ab5f5e8b "[XFRM]: xfrm audit calls" is doing
something strange at xfrm_audit_helper_usrinfo().
If secid != 0 && security_secid_to_secctx(secid) != 0, the caller calls
audit_log_task_context() which basically does
secid != 0 && security_secid_to_secctx(secid) == 0 case
except that secid is obtained from current thread's context.
Oh, what happens if secid passed to xfrm_audit_helper_usrinfo() was
obtained from other thread's context? It might audit current thread's
context rather than other thread's context if security_secid_to_secctx()
in xfrm_audit_helper_usrinfo() failed for some reason.
Then, are all the caller of xfrm_audit_helper_usrinfo() passing either
secid obtained from current thread's context or secid == 0?
It seems to me that they are.
If I didn't miss something, we don't need to pass secid to
xfrm_audit_helper_usrinfo() because audit_log_task_context() will
obtain secid from current thread's context.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Add RF tech and framing macros for the ISO/IEC 14443-B Protocol.
Cc: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
load_session allows a CLF to restore the gate <-> pipe table from some
proprietary location.
The main advantage to add this function is to reduce the memory wear by
running pipe creation (and storing) only once.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Make tcp_cwnd_application_limited() static and move it from tcp_input.c to
tcp_output.c
Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't rely on driver files or other headers having this file included.
CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: linux-zigbee-devel@lists.sourceforge.net
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will simplify the new reassembly backport
with no code changes being required.
CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: linux-zigbee-devel@lists.sourceforge.net
Cc: David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, it is possible to create an SCTP socket, then switch
auth_enable via sysctl setting to 1 and crash the system on connect:
Oops[#1]:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
[...]
Call Trace:
[<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80
[<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4
[<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c
[<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8
[<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214
[<ffffffff8043af68>] sctp_rcv+0x588/0x630
[<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24
[<ffffffff803acb50>] ip6_input+0x2c0/0x440
[<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564
[<ffffffff80310650>] process_backlog+0xb4/0x18c
[<ffffffff80313cbc>] net_rx_action+0x12c/0x210
[<ffffffff80034254>] __do_softirq+0x17c/0x2ac
[<ffffffff800345e0>] irq_exit+0x54/0xb0
[<ffffffff800075a4>] ret_from_irq+0x0/0x4
[<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48
[<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148
[<ffffffff805a88b0>] start_kernel+0x37c/0x398
Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0
03e00008 00000000
---[ end trace b530b0551467f2fd ]---
Kernel panic - not syncing: Fatal exception in interrupt
What happens while auth_enable=0 in that case is, that
ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
when endpoint is being created.
After that point, if an admin switches over to auth_enable=1,
the machine can crash due to NULL pointer dereference during
reception of an INIT chunk. When we enter sctp_process_init()
via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
the INIT verification succeeds and while we walk and process
all INIT params via sctp_process_param() we find that
net->sctp.auth_enable is set, therefore do not fall through,
but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
dereference what we have set to NULL during endpoint
initialization phase.
The fix is to make auth_enable immutable by caching its value
during endpoint initialization, so that its original value is
being carried along until destruction. The bug seems to originate
from the very first days.
Fix in joint work with Daniel Borkmann.
Reported-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by Julian:
Simply, flowi4_iif must not contain 0, it does not
look logical to ignore all ip rules with specified iif.
because in fib_rule_match() we do:
if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
goto out;
flowi4_iif should be LOOPBACK_IFINDEX by default.
We need to move LOOPBACK_IFINDEX to include/net/flow.h:
1) It is mostly used by flowi_iif
2) Fix the following compile error if we use it in flow.h
by the patches latter:
In file included from include/linux/netfilter.h:277:0,
from include/net/netns/netfilter.h:5,
from include/net/net_namespace.h:21,
from include/linux/netdevice.h:43,
from include/linux/icmpv6.h:12,
from include/linux/ipv6.h:61,
from include/net/ipv6.h:16,
from include/linux/sunrpc/clnt.h:27,
from include/linux/nfs_fs.h:30,
from init/do_mounts.c:32:
include/net/flow.h: In function ‘flowi4_init_output’:
include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the dst->output() path for ipv4, the code assumes the skb it has to
transmit is attached to an inet socket, specifically via
ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
provider of the packet is an AF_PACKET socket.
The dst->output() method gets an additional 'struct sock *sk'
parameter. This needs a cascade of changes so that this parameter can
be propagated from vxlan to final consumer.
Fixes: 8f646c922d ("vxlan: keep original skb ownership")
Reported-by: lucien xin <lucien.xin@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_queue_xmit() assumes the skb it has to transmit is attached to an
inet socket. Commit 31c70d5956 ("l2tp: keep original skb ownership")
changed l2tp to not change skb ownership and thus broke this assumption.
One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
so that we do not assume skb->sk points to the socket used by l2tp
tunnel.
Fixes: 31c70d5956 ("l2tp: keep original skb ownership")
Reported-by: Zhan Jianyu <nasa4836@gmail.com>
Tested-by: Zhan Jianyu <nasa4836@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains three Netfilter fixes for your net tree,
they are:
* Fix missing generation sequence initialization which results in a splat
if lockdep is enabled, it was introduced in the recent works to improve
nf_conntrack scalability, from Andrey Vagin.
* Don't flush the GRE keymap list in nf_conntrack when the pptp helper is
disabled otherwise this crashes due to a double release, from Andrey
Vagin.
* Fix nf_tables cmp fast in big endian, from Patrick McHardy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Francois reported that setting big mtu on loopback device could prevent
tcp sessions making progress.
We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets.
We must limit the IPv6 MTU to (65535 + 40) bytes in theory.
Tested:
ifconfig lo mtu 70000
netperf -H ::1
Before patch : Throughput : 0.05 Mbits
After patch : Throughput : 35484 Mbits
Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
nft_cmp_fast is used for equality comparisions of size <= 4. For
comparisions of size < 4 byte a mask is calculated that is applied to
both the data from userspace (during initialization) and the register
value (during runtime). Both values are stored using (in effect) memcpy
to a memory area that is then interpreted as u32 by nft_cmp_fast.
This works fine on little endian since smaller types have the same base
address, however on big endian this is not true and the smaller types
are interpreted as a big number with trailing zero bytes.
The mask therefore must not include the lower bytes, but the higher bytes
on big endian. Add a helper function that does a cpu_to_le32 to switch
the bytes on big endian. Since we're dealing with a mask of just consequitive
bits, this works out fine.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull yet more networking updates from David Miller:
1) Various fixes to the new Redpine Signals wireless driver, from
Fariya Fatima.
2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
Dmitry Petukhov.
3) UFO and TSO packets differ in whether they include the protocol
header in gso_size, account for that in skb_gso_transport_seglen().
From Florian Westphal.
4) If VLAN untagging fails, we double free the SKB in the bridging
output path. From Toshiaki Makita.
5) Several call sites of sk->sk_data_ready() were referencing an SKB
just added to the socket receive queue in order to calculate the
second argument via skb->len. This is dangerous because the moment
the skb is added to the receive queue it can be consumed in another
context and freed up.
It turns out also that none of the sk->sk_data_ready()
implementations even care about this second argument.
So just kill it off and thus fix all these use-after-free bugs as a
side effect.
6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.
7) pktgen needs to do locking properly for LLTX devices, from Daniel
Borkmann.
8) xen-netfront driver initializes TX array entries in RX loop :-) From
Vincenzo Maffione.
9) After refactoring, some tunnel drivers allow a tunnel to be
configured on top itself. Fix from Nicolas Dichtel.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
vti: don't allow to add the same tunnel twice
gre: don't allow to add the same tunnel twice
drivers: net: xen-netfront: fix array initialization bug
pktgen: be friendly to LLTX devices
r8152: check RTL8152_UNPLUG
net: sun4i-emac: add promiscuous support
net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
net: ipv6: Fix oif in TCP SYN+ACK route lookup.
drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
drivers: net: cpsw: discard all packets received when interface is down
net: Fix use after free by removing length arg from sk_data_ready callbacks.
Drivers: net: hyperv: Address UDP checksum issues
Drivers: net: hyperv: Negotiate suitable ndis version for offload support
Drivers: net: hyperv: Allocate memory for all possible per-pecket information
bridge: Fix double free and memory leak around br_allowed_ingress
bonding: Remove debug_fs files when module init fails
i40evf: program RSS LUT correctly
i40evf: remove open-coded skb_cow_head
ixgb: remove open-coded skb_cow_head
igbvf: remove open-coded skb_cow_head
...
particularly with a focus on RDMA.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: GPGTools - http://gpgtools.org
iQIcBAABAgAGBQJTRXV1AAoJEDZk62b0Tg6xYWAP/0Kfp9/8Z05SQ1fnJveySw1O
nKK//1/GBmquntqFAVHI6yJRLzPJ+Z/Y4u4a0qriwfgpPQvUJQrL77tY/VfEBUPB
VoJG1tj7lpdLrO3p4YyPkPjymyC7YOoFNjGEstWFg7HetwnnqqZL2LB+5yJzyqjx
y9nv1HzsrbAE7j8C4hQ1Nmds5muUb5VhnTtPhjrx4tP1sWWh8XTVJbsVDiEqx6cu
uJXFFTbkONr9jKfv+Ki3H2pZej2yD7w4tU4lkdcGNyij/Q4Xn1iERqroW2/GT6Cl
AXxlIKN24ASjWo4VqW0Wf8gO8vbUtHRChoiZ69DvzNTkbWmAIWSFHyQJ4cinwuyr
UbOQZuccO59QtpNVpBvG/vjnbI54rg+VGLy+xE0vcrBDlyoptc56IAFSg8zJY5UN
ysbyHCGME//9VZ1zeeZvkMjm8z5Enp6x4zmtnUHmufO7DVMTFUePED6U1u9WIyP5
FFy5EboXMSh97yB8REvbIlY2MgBJWYdnyzLKFMeRpzC8fOXJqBoCX8i3Z5SkMYWJ
1FS/pGr7ec/VX1iHXSYi9hhzTJ6o9mEmOIhaO4UcqAuK8Rk2jbCp1Lx0iDhmJtdT
zjofGDe57ro7nOZf8A/TI5z+6BZq8KYVfZrXtYaPqC4rXOzJAox/yHlz9FmAJYgv
ssOIKXX0ujLwqrMBVatj
=yzy/
-----END PGP SIGNATURE-----
Merge tag 'for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p changes from Eric Van Hensbergen:
"A bunch of updates and cleanup within the transport layer,
particularly with a focus on RDMA"
* tag 'for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9pnet_rdma: check token type before int conversion
9pnet: trans_fd : allocate struct p9_trans_fd and struct p9_conn together.
9pnet: p9_client->conn field is unused. Remove it.
9P: Get rid of REQ_STATUS_FLSH
9pnet_rdma: add cancelled()
9pnet_rdma: update request status during send
9P: Add cancelled() to the transport functions.
net: Mark function as static in 9p/client.c
9P: Add memory barriers to protect request fields over cb/rpc threads handoff
Several spots in the kernel perform a sequence like:
skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);
But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.
Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.
And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.
So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.
Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.
Signed-off-by: David S. Miller <davem@davemloft.net>
The channels with 5/10MHz bandwidth are not HT. We have to
reflect this in conf_is_ht() function which returns whether the
particular channel is HT or not.
Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit a2f73b6c5d ("cfg80211: move regulatory flags to their own variable")
renamed WIPHY_FLAG_CUSTOM_REGULATORY to REGULATORY_CUSTOM_REG, but missed to
update one comment.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
With single-channel drivers, we need to be able to change a running
chanctx if we want to use chanctx reservation. Not all drivers may be
able to do this, so add a flag that indicates support for it.
Changing a running chanctx can also be used as an optimization in
multi-channel drivers when the context needs to be reserved for future
usage.
Introduce IEEE80211_CHANCTX_RESERVED chanctx mode to mark a channel as
reserved so nobody else can use it (since we know it's going to
change). In the future, we may allow several vifs to use the same
reservation as long as they plan to use the chanctx on the same
future channel.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Move the counting part of the interface combination check from
cfg80211 to mac80211.
This is needed to simplify locking when the driver has to perform a
combination check by itself (eg. with channel-switch).
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some interface types don't require DFS (such as STATION, P2P_CLIENT
etc). In order to centralize these decisions, make
cfg80211_chandef_dfs_required() take the iftype into consideration.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Separate the code that counts the interface types and channels from
the code that check the interface combinations. The new function that
checks for combinations is exported so it can be called by the
drivers.
This is done in preparation for moving the interface combinations
checks out of cfg80211.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow GO operation on a channel marked with IEEE80211_CHAN_GO_CONCURRENT
iff there is an active station interface that is associated to
an AP operating on the same channel in the 2 GHz band or the same UNII band
(in the 5 GHz band). This relaxation is not allowed if the channel is
marked with IEEE80211_CHAN_RADAR.
Note that this is a permissive approach to the FCC definitions,
that require a clear assessment that the device operating the AP is
an authorized master, i.e., with radar detection and DFS capabilities.
It is assumed that such restrictions are enforced by user space.
Furthermore, it is assumed, that if the conditions that allowed for
the operation of the GO on such a channel change, i.e., the station
interface disconnected from the AP, it is the responsibility of user
space to evacuate the GO from the channel.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The FCC are clarifying some soft configuration requirements,
which among other include the following:
1. Indoor operation, where a device can use channels requiring indoor
operation, subject to that it can guarantee indoor operation,
i.e., the device is connected to AC Power or the device is under
the control of a local master that is acting as an AP and is
connected to AC Power.
2. Concurrent GO operation, where devices may instantiate a P2P GO
while they are under the guidance of an authorized master. For example,
on a channel on which a BSS is connected to an authorized master, i.e.,
with DFS and radar detection capability in the UNII band.
See https://apps.fcc.gov/eas/comments/GetPublishedDocument.html?id=327&tn=528122
Add support for advertising Indoor-only and GO-Concurrent channel
properties.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This will allow the low level driver to make decision based
on the vif such as queues etc...
Since the vif might be NULL, we can't add it to the tracing
functions.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[fix staging rtl8821ae driver]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When dynamically creating interfaces from userspace, e.g. for P2P usage,
such interfaces are usually owned by the process that created them, i.e.
wpa_supplicant. Should wpa_supplicant crash, such interfaces will often
cease operating properly and cause problems on restarting the process.
To avoid this problem, introduce an ownership concept for interfaces. If
an interface is owned by a netlink socket, then it will be destroyed if
the netlink socket is closed for any reason, including if the process it
belongs to crashed. This gives us a race-free way to get rid of any such
interfaces.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull more networking updates from David Miller:
1) If a VXLAN interface is created with no groups, we can crash on
reception of packets. Fix from Mike Rapoport.
2) Missing includes in CPTS driver, from Alexei Starovoitov.
3) Fix string validations in isdnloop driver, from YOSHIFUJI Hideaki
and Dan Carpenter.
4) Missing irq.h include in bnxw2x, enic, and qlcnic drivers. From
Josh Boyer.
5) AF_PACKET transmit doesn't statistically count TX drops, from Daniel
Borkmann.
6) Byte-Queue-Limit enabled drivers aren't handled properly in
AF_PACKET transmit path, also from Daniel Borkmann.
Same problem exists in pktgen, and Daniel fixed it there too.
7) Fix resource leaks in driver probe error paths of new sxgbe driver,
from Francois Romieu.
8) Truesize of SKBs can gradually get more and more corrupted in NAPI
packet recycling path, fix from Eric Dumazet.
9) Fix uniprocessor netfilter build, from Florian Westphal. In the
longer term we should perhaps try to find a way for ARRAY_SIZE() to
work even with zero sized array elements.
10) Fix crash in netfilter conntrack extensions due to mis-estimation of
required extension space. From Andrey Vagin.
11) Since we commit table rule updates before trying to copy the
counters back to userspace (it's the last action we perform), we
really can't signal the user copy with an error as we are beyond the
point from which we can unwind everything. This causes all kinds of
use after free crashes and other mysterious behavior.
From Thomas Graf.
12) Restore previous behvaior of div/mod by zero in BPF filter
processing. From Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
net: sctp: wake up all assocs if sndbuf policy is per socket
isdnloop: several buffer overflows
netdev: remove potentially harmful checks
pktgen: fix xmit test for BQL enabled devices
net/at91_ether: avoid NULL pointer dereference
tipc: Let tipc_release() return 0
at86rf230: fix MAX_CSMA_RETRIES parameter
mac802154: fix duplicate #include headers
sxgbe: fix duplicate #include headers
net: filter: be more defensive on div/mod by X==0
netfilter: Can't fail and free after table replacement
xen-netback: Trivial format string fix
net: bcmgenet: Remove unnecessary version.h inclusion
net: smc911x: Remove unused local variable
bonding: Inactive slaves should keep inactive flag's value
netfilter: nf_tables: fix wrong format in request_module()
netfilter: nf_tables: set names cannot be larger than 15 bytes
netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
netfilter: Add {ipt,ip6t}_osf aliases for xt_osf
netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks
...
"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
case it can contain all extensions. Bellow you can find sizes for all
types of extensions. Their sum is definitely bigger than 256.
nf_ct_ext_types[0]->len = 24
nf_ct_ext_types[1]->len = 32
nf_ct_ext_types[2]->len = 24
nf_ct_ext_types[3]->len = 32
nf_ct_ext_types[4]->len = 152
nf_ct_ext_types[5]->len = 2
nf_ct_ext_types[6]->len = 16
nf_ct_ext_types[7]->len = 8
I have seen "len" up to 280 and my host has crashes w/o this patch.
The right way to fix this problem is reducing the size of the ecache
extension (4) and Florian is going to do this, but these changes will
be quite large to be appropriate for a stable tree.
Fixes: 5b423f6a40 (netfilter: nf_conntrack: fix racy timer handling with reliable)
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull cgroup updates from Tejun Heo:
"A lot updates for cgroup:
- The biggest one is cgroup's conversion to kernfs. cgroup took
after the long abandoned vfs-entangled sysfs implementation and
made it even more convoluted over time. cgroup's internal objects
were fused with vfs objects which also brought in vfs locking and
object lifetime rules. Naturally, there are places where vfs rules
don't fit and nasty hacks, such as credential switching or lock
dance interleaving inode mutex and cgroup_mutex with object serial
number comparison thrown in to decide whether the operation is
actually necessary, needed to be employed.
After conversion to kernfs, internal object lifetime and locking
rules are mostly isolated from vfs interactions allowing shedding
of several nasty hacks and overall simplification. This will also
allow implmentation of operations which may affect multiple cgroups
which weren't possible before as it would have required nesting
i_mutexes.
- Various simplifications including dropping of module support,
easier cgroup name/path handling, simplified cgroup file type
handling and task_cg_lists optimization.
- Prepatory changes for the planned unified hierarchy, which is still
a patchset away from being actually operational. The dummy
hierarchy is updated to serve as the default unified hierarchy.
Controllers which aren't claimed by other hierarchies are
associated with it, which BTW was what the dummy hierarchy was for
anyway.
- Various fixes from Li and others. This pull request includes some
patches to add missing slab.h to various subsystems. This was
triggered xattr.h include removal from cgroup.h. cgroup.h
indirectly got included a lot of files which brought in xattr.h
which brought in slab.h.
There are several merge commits - one to pull in kernfs updates
necessary for converting cgroup (already in upstream through
driver-core), others for interfering changes in the fixes branch"
* 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
cgroup: remove useless argument from cgroup_exit()
cgroup: fix spurious lockdep warning in cgroup_exit()
cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
cgroup: break kernfs active_ref protection in cgroup directory operations
cgroup: fix cgroup_taskset walking order
cgroup: implement CFTYPE_ONLY_ON_DFL
cgroup: make cgrp_dfl_root mountable
cgroup: drop const from @buffer of cftype->write_string()
cgroup: rename cgroup_dummy_root and related names
cgroup: move ->subsys_mask from cgroupfs_root to cgroup
cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
cgroup: reorganize cgroup bootstrapping
cgroup: relocate setting of CGRP_DEAD
cpuset: use rcu_read_lock() to protect task_cs()
cgroup_freezer: document freezer_fork() subtleties
cgroup: update cgroup_transfer_tasks() to either succeed or fail
cgroup: drop task_lock() protection around task->cgroups
cgroup: update how a newly forked task gets associated with css_set
...
Pull networking updates from David Miller:
"Here is my initial pull request for the networking subsystem during
this merge window:
1) Support for ESN in AH (RFC 4302) from Fan Du.
2) Add full kernel doc for ethtool command structures, from Ben
Hutchings.
3) Add BCM7xxx PHY driver, from Florian Fainelli.
4) Export computed TCP rate information in netlink socket dumps, from
Eric Dumazet.
5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
Dichtel.
6) Convert many drivers to pci_enable_msix_range(), from Alexander
Gordeev.
7) Record SKB timestamps more efficiently, from Eric Dumazet.
8) Switch to microsecond resolution for TCP round trip times, also
from Eric Dumazet.
9) Clean up and fix 6lowpan fragmentation handling by making use of
the existing inet_frag api for it's implementation.
10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.
11) Auto size SKB lengths when composing netlink messages based upon
past message sizes used, from Eric Dumazet.
12) qdisc dumps can take a long time, add a cond_resched(), From Eric
Dumazet.
13) Sanitize netpoll core and drivers wrt. SKB handling semantics.
Get rid of never-used-in-tree netpoll RX handling. From Eric W
Biederman.
14) Support inter-address-family and namespace changing in VTI tunnel
driver(s). From Steffen Klassert.
15) Add Altera TSE driver, from Vince Bridgers.
16) Optimizing csum_replace2() so that it doesn't adjust the checksum
by checksumming the entire header, from Eric Dumazet.
17) Expand BPF internal implementation for faster interpreting, more
direct translations into JIT'd code, and much cleaner uses of BPF
filtering in non-socket ocntexts. From Daniel Borkmann and Alexei
Starovoitov"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
net: Add a test to see if a skb is freeable in irq context
qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
net: ptp: move PTP classifier in its own file
net: sxgbe: make "core_ops" static
net: sxgbe: fix logical vs bitwise operation
net: sxgbe: sxgbe_mdio_register() frees the bus
Call efx_set_channels() before efx->type->dimension_resources()
xen-netback: disable rogue vif in kthread context
net/mlx4: Set proper build dependancy with vxlan
be2net: fix build dependency on VxLAN
mac802154: make csma/cca parameters per-wpan
mac802154: allow only one WPAN to be up at any given time
net: filter: minor: fix kdoc in __sk_run_filter
netlink: don't compare the nul-termination in nla_strcmp
can: c_can: Avoid led toggling for every packet.
can: c_can: Simplify TX interrupt cleanup
can: c_can: Store dlc private
can: c_can: Reduce register access
can: c_can: Make the code readable
...
The current set selection simply choses the first set type that provides
the requested features, which always results in the rbtree being chosen
by virtue of being the first set in the list.
What we actually want to do is choose the implementation that can provide
the requested features and is optimal from either a performance or memory
perspective depending on the characteristics of the elements and the
preferences specified by the user.
The elements are not known when creating a set. Even if we would provide
them for anonymous (literal) sets, we'd still have standalone sets where
the elements are not known in advance. We therefore need an abstract
description of the data charcteristics.
The kernel already knows the size of the key, this patch starts by
introducing a nested set description which so far contains only the maximum
amount of elements. Based on this the set implementations are changed to
provide an estimate of the required amount of memory and the lookup
complexity class.
The set ops have a new callback ->estimate() that is invoked during set
selection. It receives a structure containing the attributes known to the
kernel and is supposed to populate a struct nft_set_estimate with the
complexity class and, in case the size is known, the complete amount of
memory required, or the amount of memory required per element otherwise.
Based on the policy specified by the user (performance/memory, defaulting
to performance) the kernel will then select the best suited implementation.
Even if the set implementation would allow to add more than the specified
maximum amount of elements, they are enforced since new implementations
might not be able to add more than maximum based on which they were
selected.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 9b2777d608 (ieee802154: add TX power control to wpan_phy)
and following erroneously added CSMA and CCA parameters for 802.15.4
devices as PHY parameters, while they are actually MAC parameters and
can differ for any two WPAN instances. Since it is now sensible to have
multiple WPAN devices with differing CSMA/CCA parameters, make these
parameters MAC parameters instead.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-03-31
Please accept this one last round of general wireless updates for
the 3.15 merge window!
For the Bluetooth bits, Gustavo says:
"Here follow another set of patches to 3.15. This is mostly a bug fix
pull request with the exception of one commit from Marcel which adds
tracking to the current configured LE scan type parameter."
Beyond that, notable bits include some final refactoring of rtl8180
and the addition of the rtl8187se driver, fixes for a number of
problems identified by Dan Carpenter and his static analysis tools,
and a handful of other bits here and there.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the whole rt6_need_strict as static inline into ip6_route.h,
so that it can be reused
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch basically does two things, i) removes the extern keyword
from the include/linux/filter.h file to be more consistent with the
rest of Joe's changes, and ii) moves filter accounting into the filter
core framework.
Filter accounting mainly done through sk_filter_{un,}charge() take
care of the case when sockets are being cloned through sk_clone_lock()
so that removal of the filter on one socket won't result in eviction
as it's still referenced by the other.
These functions actually belong to net/core/filter.c and not
include/net/sock.h as we want to keep all that in a central place.
It's also not in fast-path so uninlining them is fine and even allows
us to get rd of sk_filter_release_rcu()'s EXPORT_SYMBOL and a forward
declaration.
Joint work with Alexei Starovoitov.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/marvell/mvneta.c
The mvneta.c conflict is a case of overlapping changes,
a conversion to devm_ioremap_resource() vs. a conversion
to netdev_alloc_pcpu_stats.
Signed-off-by: David S. Miller <davem@davemloft.net>
addrconf_join_solict and addrconf_join_anycast may cause actions which
need rtnl locked, especially on first address creation.
A new DAD state is introduced which defers processing of the initial
DAD processing into a workqueue.
To get rtnl lock we need to push the code paths which depend on those
calls up to workqueues, specifically addrconf_verify and the DAD
processing.
(v2)
addrconf_dad_failure needs to be queued up to the workqueue, too. This
patch introduces a new DAD state and stop the DAD processing in the
workqueue (this is because of the possible ipv6_del_addr processing
which removes the solicited multicast address from the device).
addrconf_verify_lock is removed, too. After the transition it is not
needed any more.
As we are not processing in bottom half anymore we need to be a bit more
careful about disabling bottom half out when we lock spin_locks which are also
used in bh.
Relevant backtrace:
[ 541.030090] RTNL: assertion failed at net/core/dev.c (4496)
[ 541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.10.33-1-amd64-vyatta #1
[ 541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 541.031146] ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
[ 541.031148] 0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
[ 541.031150] 0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
[ 541.031152] Call Trace:
[ 541.031153] <IRQ> [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17
[ 541.031180] [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
[ 541.031183] [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
[ 541.031185] [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
[ 541.031189] [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
[ 541.031198] [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
[ 541.031208] [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
[ 541.031212] [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
[ 541.031216] [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
[ 541.031219] [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
[ 541.031223] [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
[ 541.031226] [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
[ 541.031229] [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
[ 541.031233] [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
[ 541.031241] [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50
[ 541.031244] [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
[ 541.031247] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
[ 541.031255] [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
[ 541.031258] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
Hunks and backtrace stolen from a patch by Stephen Hemminger.
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To be consistent, lets use msec for this timeout as well.
Note: This define value is a minimum scan time taken from BT Core spec 4.0,
Vol 3, Part C, chapter 9.2.6
Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
With this patch it is possible to control discovery interleaved
timeout value from debugfs.
It is for fine tuning of this timeout.
Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Keep msec instead of jiffies in this define. This is needed by following
patch where we want this timeout to be exposed in debugfs.
Note: Value of this timeout comes from recommendation in BT Core Spec.4.0,
Vol 3, Part C, chapter 13.2.1.
Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If an IPv6 host route with metrics exists, an attempt to add a
new route for the same target with different metrics fails but
rewrites the metrics anyway:
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1s
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
RTNETLINK answers: File exists
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1.5s
This is caused by all IPv6 host routes using the metrics in
their inetpeer (or the shared default). This also holds for the
new route created in ip6_route_add() which shares the metrics
with the already existing route and thus ip6_route_add()
rewrites the metrics even if the new route ends up not being
used at all.
Another problem is that old metrics in inetpeer can reappear
unexpectedly for a new route, e.g.
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip route del fec0::1
12sp0:~ # ip route add fec0::1 dev eth0
12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s
Resolve the first problem by moving the setting of metrics down
into fib6_add_rt2node() to the point we are sure we are
inserting the new route into the tree. Second problem is
addressed by introducing new flag DST_METRICS_FORCE_OVERWRITE
which is set for a new host route in ip6_route_add() and makes
ipv6_cow_metrics() always overwrite the metrics in inetpeer
(even if they are not "new"); it is reset after that.
v5: use a flag in _metrics member rather than one in flags
v4: fix a typo making a condition always true (thanks to Hannes
Frederic Sowa)
v3: rewritten based on David Miller's idea to move setting the
metrics (and allocation in non-host case) down to the point we
already know the route is to be inserted. Also rebased to
net-next as it is quite late in the cycle.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The packet hash can be considered a property of the packet, not just
on RX path.
This patch changes name of rxhash and l4_rxhash skbuff fields to be
hash and l4_hash respectively. This includes changing uses of the
field in the code which don't call the access functions.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Especially in crowded environments it can become frequent that we have
to send out whatever pending event there is stored. Since user space
has its own filtering of small RSSI changes sending a 0 value will
essentially force user space to wake up the higher layers (e.g. over
D-Bus) even though the RSSI didn't actually change more than the
threshold value.
This patch adds storing also of the RSSI for pending advertising reports
so that we report an as accurate RSSI as possible when we have to send
out the stored information to user space.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The line was incorrectly split between the variable type and its name.
This patch fixes the issue.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To avoid too many events being sent to user space and to help parsing of
all available remote device data it makes sense for us to wait for the
scan response and send a single merged Device Found event to user space.
This patch adds a few new variables to hci_dev to track the last
received ADV_IND/ADV_SCAN_IND, i.e. those which will cause a SCAN_REQ to
be send in the case of active scanning. When the SCAN_RSP is received
the pending data is passed together with the SCAN_RSP to the
mgmt_device_found function which takes care of merging them into a
single Device Found event.
We also need a bit of extra logic to handle situations where we don't
receive a SCAN_RSP after caching some data. In such a scenario we simply
have to send out the pending data as it is and then operate on the new
report as if there was no pending data.
We also need to send out any pending data when scanning stops as
well as ensure that the storage is empty at the start of a new active
scanning session. These both cases are covered by the update to the
hci_cc_le_set_scan_enable function in this patch.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When we're in peripheral mode (HCI_ADVERTISING flag is set) the most
natural mapping of connect() is to perform directed advertising to the
peer device.
This patch does the necessary changes to enable directed advertising and
keeps the hci_conn state as BT_CONNECT in a similar way as is done for
central or BR/EDR connection initiation.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In preparation for being able to merge ADV_IND/ADV_SCAN_IND and SCAN_RSP
together into a single device found event add a second parameter to the
mgmt_device_found function. For now all callers pass NULL as this
parameters since we don't yet have storing of the last received
advertising report.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Conflicts:
Documentation/devicetree/bindings/net/micrel-ks8851.txt
net/core/netpoll.c
The net/core/netpoll.c conflict is a bug fix in 'net' happening
to code which is completely removed in 'net-next'.
In micrel-ks8851.txt we simply have overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Please pull this batch of wireless updates intended for 3.15!
For the mac80211 bits, Johannes says:
"This has a whole bunch of bugfixes for things that went into -next
previously as well as some other bugfixes I didn't want to rush into
3.14 at this point. The rest of it is some cleanups and a few small
features, the biggest of which is probably Janusz's regulatory DFS CAC
time code."
For the Bluetooth bits, Gustavo says:
"One more pull request to 3.15. This is mostly and bug fix pull request, it
contains several fixes and clean up all over the tree, plus some small new
features."
For the NFC bits, Samuel says:
"This is the NFC pull request for 3.15. With this one we have:
- Support for ISO 15693 a.k.a. NFC vicinity a.k.a. Type 5 tags. ISO
15693 are long range (1 - 2 meters) vicinity tags/cards. The kernel
now supports those through the NFC netlink and digital APIs.
- Support for TI's trf7970a chipset. This chipset relies on the NFC
digital layer and the driver currently supports type 2, 4A and 5 tags.
- Support for NXP's pn544 secure firmare download. The pn544 C3 chipsets
relies on a different firmware download protocal than the C2 one. We
now support both and use the right one depending on the version we
detect at runtime.
- Support for 4A tags from the NFC digital layer.
- A bunch of cleanups and minor fixes from Axel Lin and Thierry Escande."
For the iwlwifi bits, Emmanuel says:
"We were sending a host command while the mutex wasn't held. This
led to hard-to-catch races."
And...
"I have a fix for a "merge damage" which is not really a merge
damage: it enables scheduled scan which has been disabled in
wireless.git. Since you merged wireless.git into wireless-next.git,
this can now be fixed in wireless-next.git.
Besides this, Alex made a workaround for a hardware bug. This fix
allows us to consume less power in S3. Arik and Eliad continue to
work on D0i3 which is a run-time power saving feature. Eliad also
contributes a few bits to the rate scaling logic to which Eyal adds his
own contribution. Avri dives deep in the power code - newer firmware
will allow to enable power save in newer scenarios. Johannes made a few
clean-ups. I have the regular amount of BT Coex boring stuff. I disable
uAPSD since we identified firmware bugs that cause packet loss. One
thing that do stand out is the udev event that we now send when the
FW asserts. I hope it will allow us to debug the FW more easily."
Also included is one last iwlwifi pull for a build breakage fix...
For the Atheros bits, Kalle says:
"Michal now did some optimisations and was able to improve throughput by
100 Mbps on our MIPS based AP135 platform. Chun-Yeow added some
workarounds to be able to better use ad-hoc mode. Ben improved log
messages and added support for MSDU chaining. And, as usual, also some
smaller fixes."
Beyond that...
Andrea Merello continues his rtl8180 refactoring, in preparation for
a long-awaited rtl8187 driver. We get a new driver (rsi) for the
RS9113 chip, from Fariya Fatima. And, of course, we get the usual
round of updates for ath9k, brcmfmac, mwifiex, wil6210, etc. as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This request state is mostly useless, and properly implementing it
for RDMA would require an extra lock to be taken in handle_recv()
and in rdma_cancel() to avoid this race:
handle_recv() rdma_cancel()
. .
. if req->state == SENT
req->state = RCVD .
. req->state = FLSH
So just get rid of it.
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
And move transport-specific code out of net/9p/client.c
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
We need barriers to guarantee this pattern works as intended:
[w] req->rc, 1 [r] req->status, 1
wmb rmb
[w] req->status, 1 [r] req->rc
Where the wmb ensures that rc gets written before status,
and the rmb ensures that if you observe status == 1, rc is the new value.
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
ip_rt_dump do nothing after IPv4 route caches removal, so we can remove it.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When changing one 16bit value by another in IP header, we can adjust
the IP checksum by doing a simple operation described in RFC 1624, as
reminded by David.
csum_partial() is a complex function on x86_64, not really suited for
small number of checksummed bytes.
I spotted csum_partial() being in the top 20 most consuming functions
(more than 1 %) in a GRO workload, which was rather unexpected.
The caller was inet_gro_complete() doing a csum_replace2() when
building the new IP header for the GRO packet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The LE scan type paramter defines if active scanning or passive scanning
is in use. Track the currently set value so it can be used for decision
making from other pieces in the core.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
While it is true that getnstimeofday() uses about 40 cycles if TSC
is available, it can use 1600 cycles if hpet is the clocksource.
Switch to get_jiffies_64(), as this is more than enough, and
go back to 60 seconds periods.
Fixes: 8c27bd75f0 ("tcp: syncookies: reduce cookie lifetime to 128 seconds")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The passkey_notify and user_confirm functions in mgmt.c were expecting
different endianess for the passkey, leading to a big endian bug and
sparse warning in recently added SMP code. This patch converts both
functions to expect host endianess and do the conversion to little
endian only when assigning to the mgmt event struct.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Add a bit in rx_status.vht_flags to let the low level driver
notify mac80211 about a beamformed packet. Propagate this
to the radiotap header.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
On 2.4Ghz band, the channels overlap since the delta
between different channels is 5Mhz while the width of the
receiver is 20Mhz (at least).
This means that we can hear beacons or probe responses from
adjacent channels. These frames will have a significant
lower RSSI which will feed all kinds of logic with inaccurate
data. An obvious example is the roaming algorithm that will
think our AP is getting weak and will try to move to another
AP.
In order to avoid this, update the signal only if the frame
has been heard on the same channel as the one advertised by
the AP in its DS / HT IEs.
We refrain from updating the values only if the AP is
already in the BSS list so that we will still have a valid
(but inaccurate) value if the AP was heard on an adjacent
channel only.
To achieve this, stop taking the channel from DS / HT IEs
in mac80211. The DS / HT IEs is taken into account to
discard the frame if it was received on a disabled channel.
This can happen due to the same phenomenon: the frame is
sent on channel 12, but heard on channel 11 while channel
12 can be disabled on certain devices. Since this check
is done in cfg80211, stop even checking this in mac80211.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[remove unused rx_freq variable]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Drivers might want to have private data in addition
to all other ieee80211_tx_info.status fields.
The current ieee80211_tx_info.rate_driver_data overlaps
with some of the non-rate data (e.g. ampdu_ack_len), so
it might not be good enough.
Since we already know how much free bytes remained,
simply use this size to define (void *) array.
While on it, change ack_signal type from int to the more
explicit s32 type.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Steffen Klassert says:
====================
One patch to rename a newly introduced struct. The rest is
the rework of the IPsec virtual tunnel interface for ipv6 to
support inter address family tunneling and namespace crossing.
1) Rename the newly introduced struct xfrm_filter to avoid a
conflict with iproute2. From Nicolas Dichtel.
2) Introduce xfrm_input_afinfo to access the address family
dependent tunnel callback functions properly.
3) Add and use a IPsec protocol multiplexer for ipv6.
4) Remove dst_entry caching. vti can lookup multiple different
dst entries, dependent of the configured xfrm states. Therefore
it does not make to cache a dst_entry.
5) Remove caching of flow informations. vti6 does not use the the
tunnel endpoint addresses to do route and xfrm lookups.
6) Update the vti6 to use its own receive hook.
7) Remove the now unused xfrm_tunnel_notifier. This was used from vti
and is replaced by the IPsec protocol multiplexer hooks.
8) Support inter address family tunneling for vti6.
9) Check if the tunnel endpoints of the xfrm state and the vti interface
are matching and return an error otherwise.
10) Enable namespace crossing for vti devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next,
most relevantly they are:
* cleanup to remove double semicolon from stephen hemminger.
* calm down sparse warning in xt_ipcomp, from Fan Du.
* nf_ct_labels support for nf_tables, from Florian Westphal.
* new macros to simplify rcu dereferences in the scope of nfnetlink
and nf_tables, from Patrick McHardy.
* Accept queue and drop (including reason for drop) to verdict
parsing in nf_tables, also from Patrick.
* Remove unused random seed initialization in nfnetlink_log, from
Florian Westphal.
* Allow to attach user-specific information to nf_tables rules, useful
to attach user comments to rule, from me.
* Return errors in ipset according to the manpage documentation, from
Jozsef Kadlecsik.
* Fix coccinelle warnings related to incorrect bool type usage for ipset,
from Fengguang Wu.
* Add hash:ip,mark set type to ipset, from Vytas Dauksa.
* Fix message for each spotted by ipset for each netns that is created,
from Ilia Mirkin.
* Add forceadd option to ipset, which evicts a random entry from the set
if it becomes full, from Josh Hunt.
* Minor IPVS cleanups and fixes from Andi Kleen and Tingwei Liu.
* Improve conntrack scalability by removing a central spinlock, original
work from Eric Dumazet. Jesper Dangaard Brouer took them over to address
remaining issues. Several patches to prepare this change come in first
place.
* Rework nft_hash to resolve bugs (leaking chain, missing rcu synchronization
on element removal, etc. from Patrick McHardy.
* Restore context in the rule deletion path, as we now release rule objects
synchronously, from Patrick McHardy. This gets back event notification for
anonymous sets.
* Fix NAT family validation in nft_nat, also from Patrick.
* Improve scalability of xt_connlimit by using an array of spinlocks and
by introducing a rb-tree of hashtables for faster lookup of accounted
objects per network. This patch was preceded by several patches and
refactorizations to accomodate this change including the use of kmem_cache,
from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the NFC pull request for 3.15. With this one we have:
- Support for ISO 15693 a.k.a. NFC vicinity a.k.a. Type 5 tags. ISO
15693 are long range (1 - 2 meters) vicinity tags/cards. The kernel
now supports those through the NFC netlink and digital APIs.
- Support for TI's trf7970a chipset. This chipset relies on the NFC
digital layer and the driver currently supports type 2, 4A and 5 tags.
- Support for NXP's pn544 secure firmare download. The pn544 C3 chipsets
relies on a different firmware download protocal than the C2 one. We
now support both and use the right one depending on the version we
detect at runtime.
- Support for 4A tags from the NFC digital layer.
- A bunch of cleanups and minor fixes from Axel Lin and Thierry Escande.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTI1oQAAoJEIqAPN1PVmxKmxgP/iiwhQwsehiUZ6oXYjAc06WT
e+vmJwHuDgbEnr5QDSIdpeaKuxYtCErbFcG4SD5+thTQNEeiRsmTLYdkwZ0FCDF+
WKU3F59EdUZu4F0GyUXvt8lUC3D96bJSIHMCJdg1YDJtyaJcuITeK7lW325CK4hF
4ik0+HXcv9mt/Dq4k6xgqdgCOSz7o9dJSbzw5NTCN1I+G6CNgYrvYqSmvsKut2C7
DQvFu8KhbUkSlXgY61YIA2qpf1hfATkfgLdSsjGJfYdKvWLUAyKbEwoG8dkFA38Y
yqc49SJ6DJ4FGmSt4Qm2VtRa8HLqwjA30jBEeb7Zog80j7zEBS+th0NJxqDWp/82
eaeNdfrA5d17CHiR9ODuGOFt3VJZeN83u+JCli3FKGF727A/2Z0UI8UgnB5ZfcbZ
+5IxV6mQp/h8H4AaQzGQ88x2xibhnrW/XYmTpoxj7CcT3AdbGw/IerZXf+afXBce
iyJZ+Ktr6d5zFucCbm0QyZDSwAhfYDIYGiM6AIM3oum9n3K31dA0Qxy6saMiiK8c
ArYY6raQbRCaOBVCI8JI6PoQFvw5MNug40v7Q/G88dElte9mSLsGslVVhCNBpAVB
Atdwa+I3gwgsea1oqGJo0rSEaWv6PsOeQcFG8boDoQDUvgapzk17bTCz4k2DlXO5
U0Q/GcJnFEdcDhqVKqP5
=xikL
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-3.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz <sameo@linux.intel.com> says:
"NFC: 3.15: First pull request
This is the NFC pull request for 3.15. With this one we have:
- Support for ISO 15693 a.k.a. NFC vicinity a.k.a. Type 5 tags. ISO
15693 are long range (1 - 2 meters) vicinity tags/cards. The kernel
now supports those through the NFC netlink and digital APIs.
- Support for TI's trf7970a chipset. This chipset relies on the NFC
digital layer and the driver currently supports type 2, 4A and 5 tags.
- Support for NXP's pn544 secure firmare download. The pn544 C3 chipsets
relies on a different firmware download protocal than the C2 one. We
now support both and use the right one depending on the version we
detect at runtime.
- Support for 4A tags from the NFC digital layer.
- A bunch of cleanups and minor fixes from Axel Lin and Thierry Escande."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Conflicts:
drivers/net/usb/r8152.c
drivers/net/xen-netback/netback.c
Both the r8152 and netback conflicts were simple overlapping
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fragmentation and reassembly information for 6lowpan is independent from
the 802.15.4 stack and used only by the 6lowpan reassembly process. Move
the ieee802154_frag_info struct to a private are, it needn't be in the
802.15.4 skb control block.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change all internal uses of ieee802154_addr_sa to ieee802154_addr,
except for those instances that communicate directly with userspace.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the operations on 802.15.4 header structs introduced in a previous
patch to create and parse all headers in the mac802154 stack. This patch
reduces code duplication between different parts of the mac802154 stack
that needed information from headers, and also fixes a few bugs that
seem to have gone unnoticed until now:
* 802.15.4 dgram sockets would return a slightly incorrect value for
the SIOCINQ ioctl
* mac802154 would not drop frames with the "security enabled" bit set,
even though it does not support security, in violation of the
standard
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides a set of structures to represent 802.15.4 MAC
headers, and a set of operations to push/pull/peek these structs from
skbs. We cannot simply pointer-cast the skb MAC header pointer to these
structs, because 802.15.4 headers are wildly variable - depending on the
first three bytes, virtually all other fields of the header may be
present or not, and be present with different lengths.
The new header creation/parsing routines also support 802.15.4 security
headers, which are currently not supported by the mac802154
implementation of the protocol.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable sparse warnings about endianness, replace the remaining fields
regarding network operations without explicit endianness annotations
with such that are annotated, and propagate this through the entire
stack.
Uses of ieee802154_addr_sa are not changed yet, this patch is only
concerned with all other fields (such as address filters, operation
parameters and the likes).
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a replacement ieee802154_addr struct with proper endianness on
fields. Short address fields are stored as __le16 as on the network,
extended (EUI64) addresses are __le64 as opposed to the u8[8] format
used previously. This disconnect with the netdev address, which is
stored as big-endian u8[8], is intentional.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The struct as currently defined uses host byte order for some fields,
and most big endian/EUI display byte order for other fields. Inside the
stack, endianness should ideally match network byte order where possible
to minimize the number of byteswaps done in critical paths, but this
patch does not address this; it is only preparatory.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This was used from vti and is replaced by the IPsec protocol
multiplexer hooks. It is now unused, so remove it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This patch adds an IPsec protocol multiplexer for ipv6. With
this it is possible to add alternative protocol handlers, as
needed for IPsec virtual tunnel interfaces.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
IPv6 can be build as a module, so we need mechanism to access
the address family dependent callback functions properly.
Therefore we introduce xfrm_input_afinfo, similar to that
what we have for the address family dependent part of
policies and states.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
We leak an active timer, the hotcpu notifier and all allocated
resources when we exit a namespace. Fix this by introducing a
flow_cache_fini() function where we release the resources before
we exit.
Fixes: ca925cf153 ("flowcache: Make flow cache name space aware")
Reported-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Jakub Kicinski <moorray3@wp.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not legal to create multiple kmem_cache having the same name.
flowcache can use a single kmem_cache, no need for a per netns
one.
Fixes: ca925cf153 ("flowcache: Make flow cache name space aware")
Reported-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the latest draft specification from
the NFC-V committee, ISO/IEC 15693 tags will be
referred to as "Type 5" tags and not "Type V"
tags anymore. Make the code reflect the new
terminology.
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We do not need to switch the net_ns if the target net_ns the same
as the current one, so here we add a pre-check of net_ns to avoid
this as David suggested.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case the pairable option has been disabled, the pairing procedure
does not create keys for bonding. This means that these generated keys
should not be stored persistently.
For LTK and CSRK this is important to tell userspace to not store these
new keys. They will be available for the lifetime of the device, but
after the next power cycle they should not be used anymore.
Inform userspace to actually store the keys persistently only if both
sides request bonding.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The connection signature resolving key (CSRK) is used for attribute
protocol signed write procedures. This change generates a new local
key during pairing and requests the peer key as well.
Newly generated key and received key will be provided to userspace
using the New Signature Resolving Key management event.
The Master CSRK can be used for verification of remote signed write
PDUs and the Slave CSRK can be used for sending signed write PDUs
to the remote device.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
In order to fix set destruction notifications and get rid of unnecessary
members in private data structures, pass the context to expressions'
destructor functions again.
In order to do so, replace various members in the nft_rule_trans structure
by the full context.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_conntrack_lock is a monolithic lock and suffers from huge contention
on current generation servers (8 or more core/threads).
Perf locking congestion is clear on base kernel:
- 72.56% ksoftirqd/6 [kernel.kallsyms] [k] _raw_spin_lock_bh
- _raw_spin_lock_bh
+ 25.33% init_conntrack
+ 24.86% nf_ct_delete_from_lists
+ 24.62% __nf_conntrack_confirm
+ 24.38% destroy_conntrack
+ 0.70% tcp_packet
+ 2.21% ksoftirqd/6 [kernel.kallsyms] [k] fib_table_lookup
+ 1.15% ksoftirqd/6 [kernel.kallsyms] [k] __slab_free
+ 0.77% ksoftirqd/6 [kernel.kallsyms] [k] inet_getpeer
+ 0.70% ksoftirqd/6 [nf_conntrack] [k] nf_ct_delete
+ 0.55% ksoftirqd/6 [ip_tables] [k] ipt_do_table
This patch change conntrack locking and provides a huge performance
improvement. SYN-flood attack tested on a 24-core E5-2695v2(ES) with
10Gbit/s ixgbe (with tool trafgen):
Base kernel: 810.405 new conntrack/sec
After patch: 2.233.876 new conntrack/sec
Notice other floods attack (SYN+ACK or ACK) can easily be deflected using:
# iptables -A INPUT -m state --state INVALID -j DROP
# sysctl -w net/netfilter/nf_conntrack_tcp_loose=0
Use an array of hashed spinlocks to protect insertions/deletions of
conntracks into the hash table. 1024 spinlocks seem to give good
results, at minimal cost (4KB memory). Due to lockdep max depth,
1024 becomes 8 if CONFIG_LOCKDEP=y
The hash resize is a bit tricky, because we need to take all locks in
the array. A seqcount_t is used to synchronize the hash table users
with the resizing process.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Netfilter expectations are protected with the same lock as conntrack
entries (nf_conntrack_lock). This patch split out expectations locking
to use it's own lock (nf_conntrack_expect_lock).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
One spinlock per cpu to protect dying/unconfirmed/template special lists.
(These lists are now per cpu, a bit like the untracked ct)
Add a @cpu field to nf_conn, to make sure we hold the appropriate
spinlock at removal time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Changes while reading through the netfilter code.
Added hint about how conntrack nf_conn refcnt is accessed.
And renamed repl_hash to reply_hash for readability
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
iproute2 already defines a structure with that name, let's use another one to
avoid any conflict.
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This header is used by bluetooth and ieee802154 branch. This patch
move this header to the include/net directory to avoid a use of a
relative path in include.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original documentation was very unclear.
The code fix is presumably related to the formerly unclear
documentation: SOCK_TIMESTAMPING_RX_SOFTWARE has no effect on
__sock_recv_timestamp's behavior, so calling __sock_recv_ts_and_drops
from sock_recv_ts_and_drops if only SOCK_TIMESTAMPING_RX_SOFTWARE is
set is pointless. This should have no user-observable effect.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Can be invoked from non-BH context.
Based upon a patch by Eric Dumazet.
Fixes: f19c29e3e3 ("tcp: snmp stats for Fast Open, SYN rtx, and data pkts")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e688a60480 ("net: introduce DST_NOPEER dst flag") introduced
DST_NOPEER because because of crashes in ipv6_select_ident called from
udp6_ufo_fragment.
Since commit 916e4cf46d ("ipv6: reuse ip6_frag_id from
ip6_ufo_append_data") we don't call ipv6_select_ident any more from
ip6_ufo_append_data, thus this flag lost its purpose and can be removed.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/wireless/ath/ath9k/recv.c
drivers/net/wireless/mwifiex/pcie.c
net/ipv6/sit.c
The SIT driver conflict consists of a bug fix being done by hand
in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
was created (netdev_alloc_pcpu_stats()) which takes care of this.
The two wireless conflicts were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch drops the current way of 6lowpan fragmentation on receiving
side and replace it with a implementation which use the inet_frag api.
The old fragmentation handling has some race conditions and isn't
rfc4944 compatible. Also adding support to match fragments on
destination address, source address, tag value and datagram_size
which is missing in the current implementation.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds necessary ieee802154 6lowpan namespace to provide the
inet_frag information. This is a initial support for handling 6lowpan
fragmentation with the inet_frag api.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a 6lowpan fragmentation struct into cb of skb which
is necessary to hold fragmentation information.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The stop_scan_complete function was used as an intermediate step before
doing the actual connection creation. Since we're using hci_request
there's no reason to have this extra function around, i.e. we can simply
put both HCI commands into the same request.
The single task that the intermediate function had, i.e. indicating
discovery as stopped is now taken care of by a new
HCI_LE_SCAN_INTERRUPTED flag which allows us to do the discovery state
update when the stop scan command completes.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
LE connection attempts do not have a controller side timeout in the same
way as BR/EDR has (in form of the page timeout). Since we always do
scanning before initiating connections the attempts are always expected
to succeed in some reasonable time.
This patch adds a timer which forces a cancellation of the connection
attempt within 20 seconds if it has not been successful by then. This
way we e.g. ensure that mgmt_pair_device times out eventually and gives
an error response.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds defines for the initiator filter policy parameter values
of the HCI_LE_Create_Connection command. They will be used in a
subsequent patch to check whether we should have a timeout for the
connection attempt or not.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For SMP we need the local and remote addresses (and their types) that
were used to establish the connection. These may be different from the
Identity Addresses or even the current RPA. To guarantee that we have
this information available and it is correct track these values
separately from the very beginning of the connection.
For outgoing connections we set the values as soon as we get a
successful command status for HCI_LE_Create_Connection (for which the
patch adds a command status handler function) and for incoming
connections as soon as we get a LE Connection Complete HCI event.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The random numbers in Bluetooth Low Energy are 64-bit numbers and should
also be little endian since the HCI specification is little endian.
Change the whole Low Energy pairing to use __le64 instead of a byte
array.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
If some of the cleanup commands caused by mgmt_set_powered(off) never
complete we should still force the adapter to be powered down. This is
rather easy to do since hdev->power_off is already a delayed work
struct. This patch schedules this delayed work if at least one HCI
command was sent by the cleanup procedure.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The current LE white list entries require storing in the HCI controller
structure. So provide a storage and access functions for it. In addition
export the current list via debugfs.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Add the definitions for clearing the LE white list, adding entries to
the LE white list and removing entries from the LE white list.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The hci_blacklist_clear function is not used outside of hci_core.c and
can be made static.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Avoid the following sparse __CHECK_ENDIAN__ warnings:
include/net/addrconf.h:318:25: warning: restricted __be64 degrades to integer
include/net/addrconf.h:318:70: warning: restricted __be64 degrades to integer
include/net/addrconf.h:330:25: warning: restricted __be64 degrades to integer
include/net/addrconf.h:330:70: warning: restricted __be64 degrades to integer
include/net/addrconf.h:347:25: warning: restricted __be64 degrades to integer
include/net/addrconf.h:348:26: warning: restricted __be64 degrades to integer
include/net/addrconf.h:349:18: warning: restricted __be64 degrades to integer
The warnings are false but they make it harder to spot real
bugs.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
This is the rework of the IPsec virtual tunnel interface
for ipv4 to support inter address family tunneling and
namespace crossing. The only change to the last RFC version
is a compile fix for an odd configuration where CONFIG_XFRM
is set but CONFIG_INET is not set.
1) Add and use a IPsec protocol multiplexer.
2) Add xfrm_tunnel_skb_cb to the skb common buffer
to store a receive callback there.
3) Make vti work with i_key set by not including the i_key
when comupting the hash for the tunnel lookup in case of
vti tunnels.
4) Update ip_vti to use it's own receive hook.
5) Remove xfrm_tunnel_notifier, this is replaced by the IPsec
protocol multiplexer.
6) We need to be protocol family indepenent, so use the on xfrm_lookup
returned dst_entry instead of the ipv4 rtable in vti_tunnel_xmit().
7) Add support for inter address family tunneling.
8) Check if the tunnel endpoints of the xfrm state and the vti interface
are matching and return an error otherwise.
8) Enable namespace crossing tor vti devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) Build fix for ip_vti when NET_IP_TUNNEL is not set.
We need this set to have ip_tunnel_get_stats64()
available.
2) Fix a NULL pointer dereference on sub policy usage.
We try to access a xfrm_state from the wrong array.
3) Take xfrm_state_lock in xfrm_migrate_state_find(),
we need it to traverse through the state lists.
4) Clone states properly on migration, otherwise we crash
when we migrate a state with aead algorithm attached.
5) Fix unlink race when between thread context and timer
when policies are deleted.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of places needing the local Identity Address are starting to
grow so it's better to have a single place for the logic of determining
it. This patch adds a convenience function for getting the Identity
Address and updates the two current places needing this to use it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To know the real source address for incoming connections (needed e.g.
for SMP) we should store the own_address_type parameter that was used
for the last HCI_LE_Write_Advertising_Parameters command. This patch
adds a proper command complete handler for the command and stores the
address type in a new adv_addr_type variable in the hci_dev struct.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This allows us to store user comment strings, but it could be also
used to store any kind of information that the user application needs
to link to the rule.
Scratch 8 bits for the new ulen field that indicates the length the
user data area. 4 bits from the handle (so it's 42 bits long, according
to Patrick, it would last 139 years with 1000 new rules per second)
and 4 bits from dlen (so the expression data area is 4K, which seems
sufficient by now even considering the compatibility layer).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
This patches creates the public hci_req_add_le_passive_scan helper so
it can be re-used outside hci_core.c in the next patch.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We should only accept connection parameters from identity addresses
(public or random static). Thus, we should check the address type
in hci_conn_params_add().
Additionally, since the IRK is removed during unpair, we should also
remove the connection parameters from that device.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduces the LE auto connection options: HCI_AUTO_CONN_
ALWAYS and HCI_AUTO_CONN_LINK_LOSS. Their working mechanism are
described as follows:
The HCI_AUTO_CONN_ALWAYS option configures the kernel to always re-
establish the connection, no matter the reason the connection was
terminated. This feature is required by some LE profiles such as
HID over GATT, Health Thermometer and Blood Pressure. These profiles
require the host autonomously connect to the device as soon as it
enters in connectable mode (start advertising) so the device is able
to delivery notifications or indications.
The BT_AUTO_CONN_LINK_LOSS option configures the kernel to re-
establish the connection in case the connection was terminated due
to a link loss. This feature is required by the majority of LE
profiles such as Proximity, Find Me, Cycling Speed and Cadence and
Time.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduces the LE auto connection infrastructure which
will be used to implement the LE auto connection options.
In summary, the auto connection mechanism works as follows: Once the
first pending LE connection is created, the background scanning is
started. When the target device is found in range, the kernel
autonomously starts the connection attempt. If connection is
established successfully, that pending LE connection is deleted and
the background is stopped.
To achieve that, this patch introduces the hci_update_background_scan()
which controls the background scanning state. This function starts or
stops the background scanning based on the hdev->pend_le_conns list. If
there is no pending LE connection, the background scanning is stopped.
Otherwise, we start the background scanning.
Then, every time a pending LE connection is added we call hci_update_
background_scan() so the background scanning is started (in case it is
not already running). Likewise, every time a pending LE connection is
deleted we call hci_update_background_scan() so the background scanning
is stopped (in case this was the last pending LE connection) or it is
started again (in case we have more pending LE connections). Finally,
we also call hci_update_background_scan() in hci_le_conn_failed() so
the background scan is restarted in case the connection establishment
fails. This way the background scanning keeps running until all pending
LE connection are established.
At this point, resolvable addresses are not support by this
infrastructure. The proper support is added in upcoming patches.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduces the hdev->pend_le_conn list which holds the
device addresses the kernel should autonomously connect. It also
introduces some helper functions to manipulate the list.
The list and helper functions will be used by the next patch which
implements the LE auto connection infrastructure.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
hci_connect() is a very simple and useless wrapper of hci_connect_acl
and hci_connect_le functions. Addtionally, all places where hci_connect
is called the link type value is passed explicitly. This way, we can
safely delete hci_connect, declare hci_connect_acl and hci_connect_le
in hci_core.h and call them directly.
No functionality is changed by this patch.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Some LE controllers don't support scanning and creating a connection
at the same time. So we should always stop scanning in order to
establish the connection.
Since we may prematurely stop the discovery procedure in favor of
the connection establishment, we should also cancel hdev->le_scan_
disable delayed work and set the discovery state to DISCOVERY_STOPPED.
This change does a small improvement since it is not mandatory the
user stops scanning before connecting anymore. Moreover, this change
is required by upcoming LE auto connection mechanism in order to work
properly with controllers that don't support background scanning and
connection establishment at the same time.
In future, we might want to do a small optimization by checking if
controller is able to scan and connect at the same time. For now,
we want the simplest approach so we always stop scanning (even if
the controller is able to carry out both operations).
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds the "hci_" prefix to le_conn_failed() helper and
declares it in hci_core.h so it can be reused in hci_event.c.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves stop LE scanning duplicate code to one single
place and reuses it. This will avoid more duplicate code in
upcoming patches.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Upcoming congestion controls for TCP require usec resolution for RTT
estimations. Millisecond resolution is simply not enough these days.
FQ/pacing in DC environments also require this change for finer control
and removal of bimodal behavior due to the current hack in
tcp_update_pacing_rate() for 'small rtt'
TCP_CONG_RTT_STAMP is no longer needed.
As Julian Anastasov pointed out, we need to keep user compatibility :
tcp_metrics used to export RTT and RTTVAR in msec resolution,
so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
to use the new attributes if provided by the kernel.
In this example ss command displays a srtt of 32 usecs (10Gbit link)
lpk51:~# ./ss -i dst lpk52
Netid State Recv-Q Send-Q Local Address:Port Peer
Address:Port
tcp ESTAB 0 1 10.246.11.51:42959
10.246.11.52:64614
cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448
cwnd:10 send
3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559
Updated iproute2 ip command displays :
lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source
10.246.11.51
Old binary displays :
lpk51:~# ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source
10.246.11.51
With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
This option has the same semantic as IP_PMTUDISC_OMIT for IPv4 which
got recently introduced. It doesn't honor the path mtu discovered by the
host but in contrary to IPV6_PMTUDISC_INTERFACE allows the generation of
fragments if the packet size exceeds the MTU of the outgoing interface
MTU.
Fixes: 93b36cf342 ("ipv6: support IPV6_PMTU_INTERFACE on sockets")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_PMTUDISC_INTERFACE has a design error: because it does not allow the
generation of fragments if the interface mtu is exceeded, it is very
hard to make use of this option in already deployed name server software
for which I introduced this option.
This patch adds yet another new IP_MTU_DISCOVER option to not honor any
path mtu information and not accepting new icmp notifications destined for
the socket this option is enabled on. But we allow outgoing fragmentation
in case the packet size exceeds the outgoing interface mtu.
As such this new option can be used as a drop-in replacement for
IP_PMTUDISC_DONT, which is currently in use by most name server software
making the adoption of this option very smooth and easy.
The original advantage of IP_PMTUDISC_INTERFACE is still maintained:
ignoring incoming path MTU updates and not honoring discovered path MTUs
in the output path.
Fixes: 482fc6094a ("ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send Channel Availability Check time as a parameter
of start_radar_detection() callback.
Get CAC time from regulatory database.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Introduce DFS CAC time as a regd param, configured per REG_RULE and
set per channel in cfg80211. DFS CAC time is close connected with
regulatory database configuration. Instead of using hardcoded values,
get DFS CAC time form regulatory database. Pass DFS CAC time to user
mode (mainly for iw reg get, iw list, iw info). Allow setting DFS CAC
time via CRDA. Add support for internal regulatory database.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
[rewrap commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This was used from vti and is replaced by the IPsec protocol
multiplexer hooks. It is now unused, so remove it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
IPsec vti_rcv needs to remind the tunnel pointer to
check it later at the vti_rcv_cb callback. So add
this pointer to the IPsec common buffer, initialize
it and check it to avoid transport state matching of
a tunneled packet.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This patch add an IPsec protocol multiplexer. With this
it is possible to add alternative protocol handlers as
needed for IPsec virtual tunnel interfaces.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert says:
====================
1) Introduce skb_to_sgvec_nomark function to add further data to the sg list
without calling sg_unmark_end first. Needed to add extended sequence
number informations. From Fan Du.
2) Add IPsec extended sequence numbers support to the Authentication Header
protocol for ipv4 and ipv6. From Fan Du.
3) Make the IPsec flowcache namespace aware, from Fan Du.
4) Avoid creating temporary SA for every packet when no key manager is
registered. From Horia Geanta.
5) Support filtering of SA dumps to show only the SAs that match a
given filter. From Nicolas Dichtel.
6) Remove caching of xfrm_policy_sk_bundles. The cached socket policy bundles
are never used, instead we create a new cache entry whenever xfrm_lookup()
is called on a socket policy. Most protocols cache the used routes to the
socket, so this caching is not needed.
7) Fix a forgotten SADB_X_EXT_FILTER length check in pfkey, from Nicolas
Dichtel.
8) Cleanup error handling of xfrm_state_clone.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Once mgmt_set_powered(off) starts doing disconnections we'll need to
care about any disconnections in mgmt.c and not just those with the
MGMT_CONNECTED flag set. Therefore, move the check into mgmt.c from
hci_event.c.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We'll soon need to make decisions on toggling the HCI_ADVERTISING flag
based on pending mgmt_set_powered commands. Therefore, move the handling
from hci_event.c into mgmt.c.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a convenience function to return the number of
connections in the conn_hash list. This will be useful once we update
the power off procedure to disconnect any open connections.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The RPA needs to be stored to know which is the current one. Otherwise
it is impossible to ensure that always the correct RPA can be programmed
into the controller when it is needed.
Current code checks if the address in the controller is a RPA, but that
can potentially lead to using a RPA that can not be resolved with the
IRK that has been distributed.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When running active scanning during LE discovery, do not reveal the own
identity to the peer devices. In case LE privacy has been enabled, then
a resolvable private address is used. If the LE privacy option is off,
then use an unresolvable private address.
The public address or static random address is never used in active
scanning anymore. This ensures that scan request are send using a
random address.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Now that the identity address type is always looked up for all
successful connections, the hdev->own_addr_type variable has become
completely unnecessary. Simply remove it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a convenience function for updating the local random
address which is needed before advertising, scanning and initiating LE
connections.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a timer for updating the local RPA periodically. The
default timeout is set to 15 minutes.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds basic mgmt defines for enabling privacy. This includes a
new setting flag as well as the Set Privacy command.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This code adds a HCI_PRIVACY flag to track whether Privacy support is
enabled (meaning we have a local IRK) and makes sure the IRK is
distributed during SMP key distribution in case this flag is set.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fixes two bugs in fastopen :
1) The tcp_sendmsg(..., @size) argument was ignored.
Code was relying on user not fooling the kernel with iovec mismatches
2) When MTU is about 64KB, tcp_send_syn_data() attempts order-5
allocations, which are likely to fail when memory gets fragmented.
Fixes: 783237e8da ("net-tcp: Fast Open client - sending SYN-data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Tested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the ieee80211_iface_limit and the ieee80211_iface_combination
structures to docbook. Reformat the examples of combinations
slightly, so it looks a bit better on docbook.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
docbook (or one of its friends) gets confused with semi-colons in the
argument descriptions, causing it to think that the semi-colon is
marking a new section in the description of addr_mask in wiphy struct.
Prevent this by using hyphens instead of semi-colons in the mask
example.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For testing purposes it is useful to provide an option to change the
advertising channel map. So add a debugfs option to allow this.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Bug introduced by commit 7d442fab0a ("ipv4: Cache dst in tunnels").
Because sit code does not call ip_tunnel_init(), the dst_cache was not
initialized.
CC: Tom Herbert <therbert@google.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to solve races with sched_scan_stop, it is necessary
for the driver to be able to return an error to propagate that
to cfg80211 so it doesn't send an event.
Reviewed-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Due to userspace assumptions, the sched_scan_stop operation must
be synchronous, i.e. once it returns a new scheduled scan must be
able to start immediately. Document this in the API.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We loose a lot of information of the original state if we
clone it with xfrm_state_clone(). In particular, there is
no crypto algorithm attached if the original state uses
an aead algorithm. This patch add the missing information
to the clone state.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
While framing the TDLS Setup Confirmation frame, the driver needs to
know if the TDLS peer is VHT/HT/WMM capable and thus shall construct
the VHT/HT operation / WMM parameter elements accordingly. Supplicant
determines if the TDLS peer is VHT/HT/WMM capable based on the
presence of the respective IEs in the received TDLS Setup Response frame.
The host driver should not need to parse the received TDLS Response
frame and thus, should be able to rely on the supplicant to indicate
the capability of the peer through additional flags while transmitting
the TDLS Setup Confirmation frame through tdls_mgmt operations.
Signed-off-by: Sunil Dutt Undekari <usdutt@qti.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For Bluetooth controllers with LE support, track the value of the
currently configured random address. It is important to know what
the current random address is to avoid unneeded attempts to set
a new address. This will become important when introducing the
LE privacy support in the future.
In addition expose the current configured random address via
debugfs for debugging purposes.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The own_address_type debugfs option does not providing enough
flexibity for interacting with the upcoming LE privacy support.
What really is needed is an option to force using the static address
compared to the public address. The new force_static_address debugfs
option does exactly that. In addition it is also only available when
the controller does actually have a public address. For single mode
LE only controllers this option will not be available.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
In case we decide in udp6_sendmsg to send the packet down the ipv4
udp_sendmsg path because the destination is either of family AF_INET or
the destination is an ipv4 mapped ipv6 address, we don't honor the
maybe specified ipv4 mapped ipv6 address in IPV6_PKTINFO.
We simply can check for this option in ip_cmsg_send because no calls to
ipv6 module functions are needed to do so.
Reported-by: Gert Doering <gert@space.net>
Cc: Tore Anderson <tore@fud.no>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary helper function to send the New IRK mgmt
event and makes sure that the function is called at when SMP key
distribution has completed. The event is sent before the New LTK event
so user space knows which remote device to associate with the keys.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the SMP Long Term Key notification over mgmt from the
hci_add_ltk function to smp.c when both sides have completed their key
distribution. This way we are also able to update the identity address
into the mgmt_new_ltk event.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
It's simpler (one less if-statement) to just evaluate the appropriate
value for store_hint in the mgmt_new_ltk function than to pass a boolean
parameter to the function. Furthermore, this simplifies moving the mgmt
event emission out from hci_add_ltk in subsequent patches.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The SMP code will need to postpone the mgmt event emission for the IRK
and LTKs. To avoid extra lookups at the end of the key distribution
simply return the added value from the add_ltk and add_irk functions.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fix spelling typo in Documentation/DocBook.
It is because .html and .xml files are generated by make htmldocs,
I have to fix a typo within the source files.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This also adds NF_CT_LABELS_MAX_SIZE so it can be re-used
as BUILD_BUG_ON in nft_ct.
At this time, nft doesn't yet support writing to the label area;
when this changes the label->words handling needs to be moved
out of xt_connlabel.c into nf_conntrack_labels.c.
Also removes a useless run-time check: words cannot grow beyond
4 (32 bit) or 2 (64bit) since xt_connlabel enforces a maximum of
128 labels.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We currently cache socket policy bundles at xfrm_policy_sk_bundles.
These cached bundles are never used. Instead we create and cache
a new one whenever xfrm_lookup() is called on a socket policy.
Most protocols cache the used routes to the socket, so let's
remove the unused caching of socket policy bundles in xfrm.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Conflicts:
drivers/net/bonding/bond_3ad.h
drivers/net/bonding/bond_main.c
Two minor conflicts in bonding, both of which were overlapping
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
The only place this is used outside rtnetlink.c is veth. So provide
wrapper function for this usage.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we initiate pairing through mgmt_pair_device the code has so far
been waiting for a successful HCI Encrypt Change event in order to
respond to the mgmt command. However, putting privacy into the play we
actually want the key distribution to be complete before replying so
that we can include the Identity Address in the mgmt response.
This patch updates the various hci_conn callbacks for LE in mgmt.c to
only respond in the case of failure, and adds a new mgmt_smp_complete
function that the SMP code will call once key distribution has been
completed.
Since the smp_chan_destroy function that's used to indicate completion
and clean up the SMP context can be called from various places,
including outside of smp.c, the easiest way to track failure vs success
is a new flag that we set once key distribution has been successfully
completed.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When we receive a remote identity address during SMP key distribution we
should ensure that any associated L2CAP channel instances get their
address information correspondingly updated (so that e.g. doing
getpeername on associated sockets returns the correct address).
This patch adds a new L2CAP core function l2cap_conn_update_id_addr()
which is used to iterate through all L2CAP channels associated with a
connection and update their address information.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are many situations where we need to check if an LE address is an
RPA and if so try to look up the IRK for it. To simplify such cases this
patch adds a convenience function for the job.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When mgmt_unpair_device is called we should also remove any associated
IRKs. This patch adds a hci_remove_irk convenience function and ensures
that it's called when mgmt_unpair_device is called.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are many functions that never fail but still declare an integer
return value for no reason. This patch converts these functions to use a
void return value to avoid any confusion of whether they can fail or not.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When removing Long Term Keys we should also be checking that the given
address type (public vs random) matches. This patch updates the
hci_remove_ltk function to take an extra parameter and uses it for
address type matching.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch implements the Load IRKs command for the management
interface. The command is used to load the kernel with the initial set
of IRKs. It also sets a HCI_RPA_RESOLVING flag to indicate that we can
start requesting devices to distribute their IRK to us.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When implementing support for Resolvable Private Addresses (RPAs) we'll
need to in several places be able to identify such addresses. This patch
adds a simple convenience function to do the identification of the
address type.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds the initial IRK storage and management functions to the
HCI core. This includes storing a list of IRKs per HCI device and the
ability to add, remove and lookup entries in that list.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Previously the crypto context has only been available for LE SMP
sessions, but now that we'll need to perform operations also during
discovery it makes sense to have this context part of the hci_dev
struct. Later, the context can be removed from the SMP context.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Introduce new netlink attributes for SET_PHY_ATTRS:
* CSMA minimal backoff exponent
* CSMA maximal backoff exponent
* CSMA retry limit
* frame retransmission limit
The CSMA attributes shall correspond to minBE, maxBE and maxCSMABackoffs of
802.15.4, respectively. The frame retransmission shall correspond to
maxFrameRetries of 802.15.4, unless given as -1: then the old behaviour
of the stack shall apply. For RF2xy, the old behaviour is to not do
channel sensing at all and simply send *right now*, which is not
intended behaviour for most applications and actually prohibited for
some channel/page combinations.
For all values except frame retransmission limit, the defaults of
802.15.4 apply. Frame retransmission limits are set to -1 to indicate
backward-compatible behaviour.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since three of the four clear channel assesment modes make use of energy
detection, provide an API to set the energy detection threshold.
Driver support for this is available in at86rf230 for the RF212 chips.
Since for these chips the minimal energy detection threshold depends on
page and channel used, add a field to struct at86rf230_local that stores
the minimal threshold. Actual ED thresholds are configured as offsets
from this value.
For RF212, setting the ED threshold will not work before a channel/page
has been set due to the dependency of energy detection in the chip and
the actual channel/page selected.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The standard describes four modes of clear channel assesment: "energy
above threshold", "carrier found", and the logical and/or of these two.
Support for CCA mode setting is included in the at86rf230 driver,
predicated for RF212 chips.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Listen-before-talk is an alternative to CSMA in uncoordinated networks
and prescribed by european regulations if one wants to have a device
with radio duty cycles above 10% (or less in some bands). Add a phy
property to enable/disable LBT in the phy, including support in the
at86rf230 driver for RF212 chips.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the current u8 transmit_power in wpan_phy with s8 transmit_power.
The u8 field contained the actual tx power and a tolerance field,
which no physical radio every used. Adjust sysfs entries to keep
compatibility with userspace, give tolerances of +-1dB statically there.
This patch only adds support for this in the at86rf230 driver and the
RF212 chip. Configuration calculation for RF212 is also somewhat basic,
but does the job - the RF212 datasheet gives a large table with
suggested values for combinations of TX power and page/channel, if this
does not work well, we might have to copy the whole table.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The goal of this patch is to allow userland to dump only a part of SA by
specifying a filter during the dump.
The kernel is in charge to filter SA, this avoids to generate useless netlink
traffic (it save also some cpu cycles). This is particularly useful when there
is a big number of SA set on the system.
Note that I removed the union in struct xfrm_state_walk to fix a problem on arm.
struct netlink_callback->args is defined as a array of 6 long and the first long
is used in xfrm code to flag the cb as initialized. Hence, we must have:
sizeof(struct xfrm_state_walk) <= sizeof(long) * 5.
With the union, it was false on arm (sizeof(struct xfrm_state_walk) was
sizeof(long) * 7), due to the padding.
In fact, whatever the arch is, this union seems useless, there will be always
padding after it. Removing it will not increase the size of this struct (and
reduce it on arm).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Implementation of (a)rwnd calculation might lead to severe performance issues
and associations completely stalling. These problems are described and solution
is proposed which improves lksctp's robustness in congestion state.
1) Sudden drop of a_rwnd and incomplete window recovery afterwards
Data accounted in sctp_assoc_rwnd_decrease takes only payload size (sctp data),
but size of sk_buff, which is blamed against receiver buffer, is not accounted
in rwnd. Theoretically, this should not be the problem as actual size of buffer
is double the amount requested on the socket (SO_RECVBUF). Problem here is
that this will have bad scaling for data which is less then sizeof sk_buff.
E.g. in 4G (LTE) networks, link interfacing radio side will have a large portion
of traffic of this size (less then 100B).
An example of sudden drop and incomplete window recovery is given below. Node B
exhibits problematic behavior. Node A initiates association and B is configured
to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp
message in 4G (LTE) network). On B data is left in buffer by not reading socket
in userspace.
Lets examine when we will hit pressure state and declare rwnd to be 0 for
scenario with above stated parameters (rwnd == 10000, chunk size == 43, each
chunk is sent in separate sctp packet)
Logic is implemented in sctp_assoc_rwnd_decrease:
socket_buffer (see below) is maximum size which can be held in socket buffer
(sk_rcvbuf). current_alloced is amount of data currently allocated (rx_count)
A simple expression is given for which it will be examined after how many
packets for above stated parameters we enter pressure state:
We start by condition which has to be met in order to enter pressure state:
socket_buffer < currently_alloced;
currently_alloced is represented as size of sctp packets received so far and not
yet delivered to userspace. x is the number of chunks/packets (since there is no
bundling, and each chunk is delivered in separate packet, we can observe each
chunk also as sctp packet, and what is important here, having its own sk_buff):
socket_buffer < x*each_sctp_packet;
each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is
twice the amount of initially requested size of socket buffer, which is in case
of sctp, twice the a_rwnd requested:
2*rwnd < x*(payload+sizeof(struc sk_buff));
sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000
and each payload size is 43
20000 < x(43+190);
x > 20000/233;
x ~> 84;
After ~84 messages, pressure state is entered and 0 rwnd is advertised while
received 84*43B ~= 3612B sctp data. This is why external observer notices sudden
drop from 6474 to 0, as it will be now shown in example:
IP A.34340 > B.12345: sctp (1) [INIT] [init tag: 1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN: 1096057017]
IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag: 3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN: 902132839]
IP A.34340 > B.12345: sctp (1) [COOKIE ECHO]
IP B.12345 > A.34340: sctp (1) [COOKIE ACK]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057017] [SID: 0] [SSEQ 0] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057018] [SID: 0] [SSEQ 1] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057019] [SID: 0] [SSEQ 2] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0]
<...>
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057098] [SID: 0] [SSEQ 81] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057099] [SID: 0] [SSEQ 82] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057100] [SID: 0] [SSEQ 83] [PPID 0x18]
--> Sudden drop
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
At this point, rwnd_press stores current rwnd value so it can be later restored
in sctp_assoc_rwnd_increase. This however doesn't happen as condition to start
slowly increasing rwnd until rwnd_press is returned to rwnd is never met. This
condition is not met since rwnd, after it hit 0, must first reach rwnd_press by
adding amount which is read from userspace. Let us observe values in above
example. Initial a_rwnd is 10000, pressure was hit when rwnd was ~6500 and the
amount of actual sctp data currently waiting to be delivered to userspace
is ~3500. When userspace starts to read, sctp_assoc_rwnd_increase will be blamed
only for sctp data, which is ~3500. Condition is never met, and when userspace
reads all data, rwnd stays on 3569.
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057101] [SID: 0] [SSEQ 84] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]
--> At this point userspace read everything, rwnd recovered only to 3569
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057102] [SID: 0] [SSEQ 85] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]
Reproduction is straight forward, it is enough for sender to send packets of
size less then sizeof(struct sk_buff) and receiver keeping them in its buffers.
2) Minute size window for associations sharing the same socket buffer
In case multiple associations share the same socket, and same socket buffer
(sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one
of the associations can permanently drop rwnd of other association(s).
Situation will be typically observed as one association suddenly having rwnd
dropped to size of last packet received and never recovering beyond that point.
Different scenarios will lead to it, but all have in common that one of the
associations (let it be association from 1)) nearly depleted socket buffer, and
the other association blames socket buffer just for the amount enough to start
the pressure. This association will enter pressure state, set rwnd_press and
announce 0 rwnd.
When data is read by userspace, similar situation as in 1) will occur, rwnd will
increase just for the size read by userspace but rwnd_press will be high enough
so that association doesn't have enough credit to reach rwnd_press and restore
to previous state. This case is special case of 1), being worse as there is, in
the worst case, only one packet in buffer for which size rwnd will be increased.
Consequence is association which has very low maximum rwnd ('minute size', in
our case down to 43B - size of packet which caused pressure) and as such
unusable.
Scenario happened in the field and labs frequently after congestion state (link
breaks, different probabilities of packet drop, packet reordering) and with
scenario 1) preceding. Here is given a deterministic scenario for reproduction:
>From node A establish two associations on the same socket, with rcvbuf_policy
being set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1
repeat scenario from 1), that is, bring it down to 0 and restore up. Observe
scenario 1). Use small payload size (here we use 43). Once rwnd is 'recovered',
bring it down close to 0, as in just one more packet would close it. This has as
a consequence that association number 2 is able to receive (at least) one more
packet which will bring it in pressure state. E.g. if association 2 had rwnd of
10000, packet received was 43, and we enter at this point into pressure,
rwnd_press will have 9957. Once payload is delivered to userspace, rwnd will
increase for 43, but conditions to restore rwnd to original state, just as in
1), will never be satisfied.
--> Association 1, between A.y and B.12345
IP A.55915 > B.12345: sctp (1) [INIT] [init tag: 836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 4032536569]
IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag: 2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3799315613]
IP A.55915 > B.12345: sctp (1) [COOKIE ECHO]
IP B.12345 > A.55915: sctp (1) [COOKIE ACK]
--> Association 2, between A.z and B.12346
IP A.55915 > B.12346: sctp (1) [INIT] [init tag: 534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 2099285173]
IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag: 516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3676403240]
IP A.55915 > B.12346: sctp (1) [COOKIE ECHO]
IP B.12346 > A.55915: sctp (1) [COOKIE ACK]
--> Deplete socket buffer by sending messages of size 43B over association 1
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315613] [SID: 0] [SSEQ 0] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
<...>
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315697] [SID: 0] [SSEQ 84] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0]
--> Sudden drop on 1
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315698] [SID: 0] [SSEQ 85] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
--> Here userspace read, rwnd 'recovered' to 3698, now deplete again using
association 1 so there is place in buffer for only one more packet
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315799] [SID: 0] [SSEQ 186] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315800] [SID: 0] [SSEQ 187] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]
--> Socket buffer is almost depleted, but there is space for one more packet,
send them over association 2, size 43B
IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403240] [SID: 0] [SSEQ 0] [PPID 0x18]
IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
--> Immediate drop
IP A.60995 > B.12346: sctp (1) [SACK] [cum ack 387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
--> Read everything from the socket, both association recover up to maximum rwnd
they are capable of reaching, note that association 1 recovered up to 3698,
and association 2 recovered only to 43
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315801] [SID: 0] [SSEQ 188] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0]
IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403241] [SID: 0] [SSEQ 1] [PPID 0x18]
IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]
A careful reader might wonder why it is necessary to reproduce 1) prior
reproduction of 2). It is simply easier to observe when to send packet over
association 2 which will push association into the pressure state.
Proposed solution:
Both problems share the same root cause, and that is improper scaling of socket
buffer with rwnd. Solution in which sizeof(sk_buff) is taken into concern while
calculating rwnd is not possible due to fact that there is no linear
relationship between amount of data blamed in increase/decrease with IP packet
in which payload arrived. Even in case such solution would be followed,
complexity of the code would increase. Due to nature of current rwnd handling,
slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure state is
entered is rationale, but it gives false representation to the sender of current
buffer space. Furthermore, it implements additional congestion control mechanism
which is defined on implementation, and not on standard basis.
Proposed solution simplifies whole algorithm having on mind definition from rfc:
o Receiver Window (rwnd): This gives the sender an indication of the space
available in the receiver's inbound buffer.
Core of the proposed solution is given with these lines:
sctp_assoc_rwnd_update:
if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
else
asoc->rwnd = 0;
We advertise to sender (half of) actual space we have. Half is in the braces
depending whether you would like to observe size of socket buffer as SO_RECVBUF
or twice the amount, i.e. size is the one visible from userspace, that is,
from kernelspace.
In this way sender is given with good approximation of our buffer space,
regardless of the buffer policy - we always advertise what we have. Proposed
solution fixes described problems and removes necessity for rwnd restoration
algorithm. Finally, as proposed solution is simplification, some lines of code,
along with some bytes in struct sctp_association are saved.
Version 2 of the patch addressed comments from Vlad. Name of the function is set
to be more descriptive, and two parts of code are changed, in one removing the
superfluous call to sctp_assoc_rwnd_update since call would not result in update
of rwnd, and the other being reordering of the code in a way that call to
sctp_assoc_rwnd_update updates rwnd. Version 3 corrected change introduced in v2
in a way that existing function is not reordered/copied in line, but it is
correctly called. Thanks Vlad for suggesting.
Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for ATS request and response handling for type 4A tag
activation.
Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Add the header definitions required by upcoming
patches that add support for ISO/IEC 15693.
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The tty driver api design prefers no-fail writes if the driver
write_room() method has previously indicated space is available
to accept writes. Since this is trivially possible for the
RFCOMM tty driver, do so.
Introduce rfcomm_dlc_send_noerror(), which queues but does not
schedule the krfcomm thread if the dlc is not yet connected
(and thus does not error based on the connection state).
The mtu size test is also unnecessary since the caller already
chunks the written data into mtu size.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-By: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Only one session/channel combination may be in use at any one
time. However, the failure does not occur until the tty is
opened (in rfcomm_dlc_open()).
Because these settings are actually bound at rfcomm device
creation (via RFCOMMCREATEDEV ioctl), validate and fail before
creating the rfcomm tty device.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-By: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When RFCOMM_RELEASE_ONHUP is set, the rfcomm tty driver 'takes over'
the initial rfcomm_dev reference created by the RFCOMMCREATEDEV ioctl.
The assumption is that the rfcomm tty driver will release the
rfcomm_dev reference when the tty is freed (in rfcomm_tty_cleanup()).
However, if the tty is never opened, the 'take over' never occurs,
so when RFCOMMRELEASEDEV ioctl is called, the reference is not
released.
Track the state of the reference 'take over' so that the release
is guaranteed by either the RFCOMMRELEASEDEV ioctl or the rfcomm tty
driver.
Note that the synchronous hangup in rfcomm_release_dev() ensures
that rfcomm_tty_install() cannot race with the RFCOMMRELEASEDEV ioctl.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-By: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
No logic prevents an rfcomm_dev from being released multiple
times. For example, if the rfcomm_dev ref count is large due
to pending tx, then multiple RFCOMMRELEASEDEV ioctls may
mistakenly release the rfcomm_dev too many times. Note that
concurrent ioctls are not required to create this condition.
Introduce RFCOMM_DEV_RELEASED status bit which guarantees the
rfcomm_dev can only be released once.
NB: Since the flags are exported to userspace, introduce the status
field to track state for which userspace should not be aware.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-By: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that the LE L2CAP Connection Oriented Channel support has undergone a
decent amount of testing we can make it officially supported. This patch
removes the enable_lecoc module parameter which was previously needed to
enable support for LE L2CAP CoC.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Commit 684bad1107 "tcp: use PRR to reduce cwin in CWR state" removed all
calls to min_cwnd, so we can safely remove it.
Also, remove tcp_reno_min_cwnd because it was only used for min_cwnd.
Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no way the driver can pre-build the radiotap header,
so remove the comment stating that it can.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch adds to hdev the connection parameters list (hdev->le_
conn_params). The elements from this list (struct hci_conn_params)
contains the connection parameters (for now, minimum and maximum
connection interval) that should be used during the connection
establishment.
Moreover, this patch adds helper functions to manipulate hdev->le_
conn_params list. Some of these functions are also declared in
hci_core.h since they will be used outside hci_core.c in upcoming
patches.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The LTK key types available right now are unauthenticated and
authenticated ones. Provide two simple constants for it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The struct smp_ltk does not need to be packed and so remove __packed.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The field is not a boolean, it is actually a field for a key type. So
name it properly.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When encryption for LE links has been enabled, it will always be use
AES-CCM encryption. In case of BR/EDR Secure Connections, the link
will also use AES-CCM encryption. In both cases track the AES-CCM
status in the connection flags.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Originally allowing the use of debug keys was done via the Load Link
Keys management command. However this is BR/EDR specific and to be
flexible and allow extending this to LE as well, make this an independent
command.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When the controller has been enabled to allow usage of debug keys, then
clearly identify that in the current settings information.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This patch groups the list_head fields from struct hci_dev together
and removes empty lines between them.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch creates two new fields in struct hci_conn to save the
minimum and maximum connection interval values used to establish
the connection this object represents.
This change is required in order to know what parameters the
connection is currently using.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If LTK distribution happens in both directions we will have two LTKs for
the same remote device: one which is used when we're connecting as
master and another when we're connecting as slave. When looking up LTKs
from the locally stored list we shouldn't blindly return the first match
but also consider which type of key is in question. If we do not do this
we may end up selecting an incorrect encryption key for a connection.
This patch fixes the issue by always specifying to the LTK lookup
functions whether we're looking for a master or a slave key.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There's no reason why A2MP should need or deserve its on channel type.
Instead we should be able to group all fixed CID users under a single
channel type and reuse as much code as possible for them. Where CID
specific exceptions are needed the chan-scid value can be used.
This patch renames the current A2MP channel type to a generic one and
thereby paves the way to allow converting ATT and SMP (and any future
fixed channel protocols) to use the new channel type.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a queue for incoming L2CAP data that's received before
l2cap_connect_cfm is called and processes the data once
l2cap_connect_cfm is called. This way we ensure that we have e.g. all
remote features before processing L2CAP signaling data (which is very
important for making the correct security decisions).
The processing of the pending rx data needs to be done through
queue_work since unlike l2cap_recv_acldata, l2cap_connect_cfm is called
with the hci_dev lock held which could cause potential deadlocks.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For debugging purposes of Secure Connection Only support a simple
debugfs entry is used to indicate if this mode is active or not.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
With the introduction of security level 4, the RFCOMM sockets need to
be made aware of this new level. This change ensures that the pairing
requirements are set correctly for these connections.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
With the introduction of security level 4, the L2CAP sockets need to
be made aware of this new level. This change ensures that the pairing
requirements are set correctly for these connections.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The security level 4 is a new strong security requirement that is based
around 128-bit equivalent strength for link and encryption keys required
using FIPS approved algorithms. Which means that E0, SAFER+ and P-192
are not allowed. Only connections created with P-256 resulting from
using Secure Connections support are allowed.
This security level needs to be enforced when Secure Connection Only
mode is enabled for a controller or a service requires FIPS compliant
strong security. Currently it is not possible to enable either of
these two cases. This patch just puts in the foundation for being
able to handle security level 4 in the future.
It should be noted that devices or services with security level 4
requirement can only communicate using Bluetooth 4.1 controllers
with support for Secure Connections. There is no backward compatibilty
if used with older hardware.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
It is important to know if Secure Connections support has been enabled
for a given remote device. The information is provided in the remote
host features page. So track this information and provide a simple
helper function to extract the status.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The current management interface only allows to provide the remote
OOB input of P-192 data. This extends the command to also accept
P-256 data as well. To make this backwards compatible, the userspace
can decide to only provide P-192 data or the combined P-192 and P-256
data. It is also allowed to leave the P-192 data empty if userspace
only has the remote P-256 data.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Add function to allow adding P-192 and P-256 data to the internal
storage. This also fixes a few coding style issues from the previous
helper functions for the out-of-band credentials storage.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When Secure Connections has been enabled it is possible to provide P-192
and/or P-256 data during the pairing process. The internal out-of-band
credentials storage has been extended to also hold P-256 data.
Initially the P-256 data will be empty and with Secure Connections enabled
no P-256 data will be provided. This is according to the specification
since it might be possible that the remote side did not provide either
of the out-of-band credentials.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The Bluetooth 4.1 specification with Secure Connections support has
just been released and controllers with this feature are still in
an early stage.
A handful of controllers have already support for it, but they do
not always identify this feature correctly. This debugfs entry
allows to tell the kernel that the controller can be treated as
it would fully support Secure Connections.
Using debugfs to force Secure Connections support of course does
not make this feature magically appear in all controllers. This
is a debug functionality for early adopters. Once the majority
of controllers matures this quirk will be removed.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
For Secure Connections support and the usage of out-of-band pairing,
it is needed to read the P-256 hash and randomizer or P-192 hash and
randomizer. This change will read P-192 data when Secure Connections
is disabled and P-192 and P-256 data when it is enabled.
The difference is between using HCI Read Local OOB Data and using the
new HCI Read Local OOB Extended Data command. The first one has been
introduced with Bluetooth 2.1 and returns only the P-192 data.
< HCI Command: Read Local OOB Data (0x03|0x0057) plen 0
> HCI Event: Command Complete (0x0e) plen 36
Read Local OOB Data (0x03|0x0057) ncmd 1
Status: Success (0x00)
Hash C from P-192: 975a59baa1c4eee391477cb410b23e6d
Randomizer R with P-192: 9ee63b7dec411d3b467c5ae446df7f7d
The second command has been introduced with Bluetooth 4.1 and will
return P-192 and P-256 data.
< HCI Command: Read Local OOB Extended Data (0x03|0x007d) plen 0
> HCI Event: Command Complete (0x0e) plen 68
Read Local OOB Extended Data (0x03|0x007d) ncmd 1
Status: Success (0x00)
Hash C from P-192: 6489731804b156fa6355efb8124a1389
Randomizer R with P-192: 4781d5352fb215b2958222b3937b6026
Hash C from P-256: 69ef8a928b9d07fc149e630e74ecb991
Randomizer R with P-256: 4781d5352fb215b2958222b3937b6026
The change for the management interface is transparent and no change
is required for existing userspace. The Secure Connections feature
needs to be manually enabled. When it is disabled, then userspace
only gets the P-192 returned and with Secure Connections enabled,
userspace gets P-192 and P-256 in an extended structure.
It is also acceptable to just ignore the P-256 data since it is not
required to support them. The pairing with out-of-band credentials
will still succeed. However then of course no Secure Connection will
b established.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The support for Secure Connections need to be explicitly enabled by
userspace. This is required since only userspace that can handle the
new link key types should enable support for Secure Connections.
This command handling is similar to how Secure Simple Pairing enabling
is done. It also tracks the case when Secure Connections support is
enabled via raw HCI commands. This makes sure that the host features
page is updated as well.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The MGMT_SETTING_SECURE_CONN setting is used to track the support and
status for Secure Connections from the management interface. For HCI
based tracking HCI_SC_ENABLED flag is used.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
With the introduction of Secure Connections, the list of link key types
got extended by P-256 versions of authenticated and unauthenticated
link keys.
To avoid any confusion the previous authenticated and unauthenticated
link key types got ammended with a P912 postfix. And the two new keys
have a P256 postfix now. Existing code using the previous definitions
has been adjusted.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The Secure Connections feature introduces the support for P-256 strength
pairings (compared to P-192 with Secure Simple Pairing). This however
means that for out-of-band pairing the hash and randomizer needs to be
differentiated. Two new commands are introduced to handle the possible
combinations of P-192 and P-256. This add the HCI command definition
for both.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The Secure Connections feature is optional and host stacks have to
manually enable it. This add the HCI command definiton for reading
and writing this setting.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The support for Secure Connections introduces two new controller
features and one new host feature.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
In the case when KMs have no listeners, km_query() will fail and
temporary SAs are garbage collected immediately after their allocation.
This causes strain on memory allocation, leading even to OOM since
temporary SA alloc/free cycle is performed for every packet
and garbage collection does not keep up the pace.
The sane thing to do is to make sure we have audience before
temporary SA allocation.
Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When an action is bonnd to a filter, there is no point to
remove it outside. Currently we just silently decrease the refcnt,
we should reject this explicitly with EPERM.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For bindcnt and refcnt etc., they are common for all actions,
not need to repeat such operations for their own, they can be unified
now. Actions just need to do its specific cleanup if needed.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we can totally hide it from modules. tcf_hash_*() API's
will operate on struct tc_action, modules don't need to care about
the details.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NAPI was originally added to mac80211 a long time ago (by John in
commit 4e6cbfd09c in July 2010), but then removed years later
(by Stanislaw in commit 30c97120c6 in February 2013). No driver
ever used it, so that was fine.
Now I'm adding support for NAPI to our driver, so add some code
to mac80211 again to support NAPI. John was originally wrapping
some (but not nearly all NAPI-related functions), but that doesn't
scale very well with the number of functions that are there, some
of which are even only inlines. Thus, instead of doing that, let
the drivers manage the NAPI struct, except for napi_add() which is
needed so mac80211 knows how to call napi_gro_receive().
Also remove some no longer needed definitions that were left when
NAPI support was removed.
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Inserting a entry into flowcache, or flushing flowcache should be based
on per net scope. The reason to do so is flushing operation from fat
netns crammed with flow entries will also making the slim netns with only
a few flow cache entries go away in original implementation.
Since flowcache is tightly coupled with IPsec, so it would be easier to
put flow cache global parameters into xfrm namespace part. And one last
thing needs to do is bumping flow cache genid, and flush flow cache should
also be made in per net style.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In case of beacon_loss with IEEE80211_HW_CONNECTION_MONITOR
device, mac80211 probes the ap (and disconnects on timeout)
but ignores the ack.
If we already got an ack, there's no reason to continue
disconnecting. this can help devices that supports
IEEE80211_HW_CONNECTION_MONITOR only partially (e.g. take
care of keep alives, but does not probe the ap.
In case the device wants to disconnect without probing,
it can just call ieee80211_connection_loss.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Move prototype declaration of function to header file
include/net/net_namespace.h from net/ipx/af_ipx.c because they are used
by more than one file.
This eliminates the following warning in net/ipx/sysctl_net_ipx.c:
net/ipx/sysctl_net_ipx.c:33:6: warning: no previous prototype for ‘ipx_register_sysctl’ [-Wmissing-prototypes]
net/ipx/sysctl_net_ipx.c:38:6: warning: no previous prototype for ‘ipx_unregister_sysctl’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototype declarations of function to header file
include/net/datalink.h from net/ipx/af_ipx.c because they are used by
more than one file.
This eliminates the following warning in net/ipx/pe2.c:
net/ipx/pe2.c:20:24: warning: no previous prototype for ‘make_EII_client’ [-Wmissing-prototypes]
net/ipx/pe2.c:32:6: warning: no previous prototype for ‘destroy_EII_client’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototype declaration of functions to header file include/net/ipx.h
from net/ipx/af_ipx.c because they are used by more than one file.
This eliminates the following warning in
net/ipx/ipx_route.c:33:19: warning: no previous prototype for ‘ipxrtr_lookup’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:52:5: warning: no previous prototype for ‘ipxrtr_add_route’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:94:6: warning: no previous prototype for ‘ipxrtr_del_routes’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:149:5: warning: no previous prototype for ‘ipxrtr_route_skb’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:171:5: warning: no previous prototype for ‘ipxrtr_route_packet’ [-Wmissing-prototypes]
net/ipx/ipx_route.c:261:5: warning: no previous prototype for ‘ipxrtr_ioctl’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototype definition of function to header file include/net/ipx.h
from net/ipx/ipx_route.c because they are used by more than one file.
This eliminates the following warning from net/ipx/af_ipx.c:
net/ipx/af_ipx.c:193:23: warning: no previous prototype for ‘ipxitf_find_using_net’ [-Wmissing-prototypes]
net/ipx/af_ipx.c:577:5: warning: no previous prototype for ‘ipxitf_send’ [-Wmissing-prototypes]
net/ipx/af_ipx.c:1219:8: warning: no previous prototype for ‘ipx_cksum’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototype declaration of functions to header file include/net/dn.h
from net/decnet/af_decnet.c because they are used by more than one file.
This eliminates the following warning in net/decnet/af_decnet.c:
net/decnet/sysctl_net_decnet.c:354:6: warning: no previous prototype for ‘dn_register_sysctl’ [-Wmissing-prototypes]
net/decnet/sysctl_net_decnet.c:359:6: warning: no previous prototype for ‘dn_unregister_sysctl’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototype declaration of functions to header file include/net/dn_route.h
from net/decnet/af_decnet.c because it is used by more than one file.
This eliminates the following warning in net/decnet/dn_route.c:
net/decnet/dn_route.c:629:5: warning: no previous prototype for ‘dn_route_rcv’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/nftables/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes, mostly nftables
fixes, most relevantly they are:
* Fix a crash in the h323 conntrack NAT helper due to expectation list
corruption, from Alexey Dobriyan.
* A couple of RCU race fixes for conntrack, one manifests by hitting BUG_ON
in nf_nat_setup_info() and the destroy path, patches from Andrey Vagin and
me.
* Dump direction attribute in nft_ct only if it is set, from Arturo
Borrero.
* Fix IPVS bug in its own connection tracking system that may lead to
copying only 4 bytes of the IPv6 address when initializing the
ip_vs_conn object, from Michal Kubecek.
* Fix -EBUSY errors in nftables when deleting the rules, chain and tables
in a row due mixture of asynchronous and synchronous object releasing,
from me.
* Three fixes for the nf_tables set infrastructure when using intervals and
mappings, from me.
* Four patches to fixing the nf_tables log, reject and ct expressions from
the new inet table, from Patrick McHardy.
* Fix memory overrun in the map that is used to dynamically allocate names
from anonymous sets, also from Patrick.
* Fix a potential oops if you dump a set with NFPROTO_UNSPEC and a table
name, from Patrick McHardy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
cgroup_subsys is a bit messier than it needs to be.
* The name of a subsys can be different from its internal identifier
defined in cgroup_subsys.h. Most subsystems use the matching name
but three - cpu, memory and perf_event - use different ones.
* cgroup_subsys_id enums are postfixed with _subsys_id and each
cgroup_subsys is postfixed with _subsys. cgroup.h is widely
included throughout various subsystems, it doesn't and shouldn't
have claim on such generic names which don't have any qualifier
indicating that they belong to cgroup.
* cgroup_subsys->subsys_id should always equal the matching
cgroup_subsys_id enum; however, we require each controller to
initialize it and then BUG if they don't match, which is a bit
silly.
This patch cleans up cgroup_subsys names and initialization by doing
the followings.
* cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
cgroup_subsys with _cgrp_subsys.
* With the above, renaming subsys identifiers to match the userland
visible names doesn't cause any naming conflicts. All non-matching
identifiers are renamed to match the official names.
cpu_cgroup -> cpu
mem_cgroup -> memory
perf -> perf_event
* controllers no longer need to initialize ->subsys_id and ->name.
They're generated in cgroup core and set automatically during boot.
* Redundant cgroup_subsys declarations removed.
* While updating BUG_ON()s in cgroup_init_early(), convert them to
WARN()s. BUGging that early during boot is stupid - the kernel
can't print anything, even through serial console and the trap
handler doesn't even link stack frame properly for back-tracing.
This patch doesn't introduce any behavior changes.
v2: Rebased on top of fe1217c4f3 ("net: net_cls: move cgroupfs
classid handling into core").
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
net_prio is the only cgroup which is allowed to be built as a module.
The savings from allowing one controller to be built as a module are
tiny especially given that cgroup module support itself adds quite a
bit of complexity.
Given that none of other controllers has much chance of being made a
module and that we're unlikely to add new modular controllers, the
added complexity is simply not justifiable.
As a first step to drop cgroup module support, this patch changes the
config option to bool from tristate and drops module related code from
it.
Also, while an earlier commit fe1217c4f3 ("net: net_cls: move
cgroupfs classid handling into core") dropped module support from
net_cls cgroup, it retained a call to cgroup_load_subsys(), which is
noop for built-in controllers. Drop it along with
init_netclassid_cgroup().
v2: Removed modular version of task_netprioidx() in
include/net/netprio_cgroup.h as suggested by Li Zefan.
v3: Rebased on top of fe1217c4f3 ("net: net_cls: move cgroupfs
classid handling into core"). net_cls cgroup part is mostly
dropped except for removal of init_netclassid_cgroup().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
We may lost race if we flush the rule-set (which happens asynchronously
via call_rcu) and we try to remove the table (that userspace assumes
to be empty).
Fix this by recovering synchronous rule and chain deletion. This was
introduced time ago before we had no batch support, and synchronous
rule deletion performance was not good. Now that we have the batch
support, we can just postpone the purge of old rule in a second step
in the commit phase. All object deletions are synchronous after this
patch.
As a side effect, we save memory as we don't need rcu_head per rule
anymore.
Cc: Patrick McHardy <kaber@trash.net>
Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a reject module for NFPROTO_INET. It does nothing but dispatch
to the AF-specific modules based on the hook family.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently the nft_reject module depends on symbols from ipv6. This is
wrong since no generic module should force IPv6 support to be loaded.
Split up the module into AF-specific and a generic part.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This capabilities weren't propagated to the radiotap header.
We don't set here the VHT_KNOWN / MCS_HAVE flag because not
all the low level drivers will know how to properly flag
the frames, hence the low level driver will be in charge
of setting IEEE80211_RADIOTAP_MCS_HAVE_FEC,
IEEE80211_RADIOTAP_MCS_HAVE_STBC and / or
IEEE80211_RADIOTAP_VHT_KNOWN_STBC according to its
capabilities.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ieee80211_rx_status.flags is full. Define a new vht_flag
variable to be able to set more VHT related flags and make
room in flags.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Acked-by: Kalle Valo <kvalo@qca.qualcomm.com> [ath10k]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The purpose of this housekeeping is to make some room for
VHT flags. The radiotap vendor fields weren't in use.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For the reject module, we need to add AF-specific implementations to
get rid of incorrect module dependencies. Try to load an AF-specific
module first and fall back to generic modules.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
With this patch, the conntrack refcount is initially set to zero and
it is bumped once it is added to any of the list, so we fulfill
Eric's golden rule which is that all released objects always have a
refcount that equals zero.
Andrey Vagin reports that nf_conntrack_free can't be called for a
conntrack with non-zero ref-counter, because it can race with
nf_conntrack_find_get().
A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
ref-counter says that this conntrack is used. So when we release
a conntrack with non-zero counter, we break this assumption.
CPU1 CPU2
____nf_conntrack_find()
nf_ct_put()
destroy_conntrack()
...
init_conntrack
__nf_conntrack_alloc (set use = 1)
atomic_inc_not_zero(&ct->use) (use = 2)
if (!l4proto->new(ct, skb, dataoff, timeouts))
nf_conntrack_free(ct); (use = 2 !!!)
...
__nf_conntrack_alloc (set use = 1)
if (!nf_ct_key_equal(h, tuple, zone))
nf_ct_put(ct); (use = 0)
destroy_conntrack()
/* continue to work with CT */
After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
race in nf_conntrack_find_get" another bug was triggered in
destroy_conntrack():
<4>[67096.759334] ------------[ cut here ]------------
<2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
...
<4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
<4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760255] Call Trace:
<4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
<4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
<4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
<4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
<4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0
<4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
<4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
<4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
<4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
<4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
<4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
<4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
<4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
<4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190
<4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
<4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
<4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
<4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
I have reused the original title for the RFC patch that Andrey posted and
most of the original patch description.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Vagin <avagin@parallels.com>
Cc: Florian Westphal <fw@strlen.de>
Reported-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
MII management bus clock is derived from the MAC clock by dividing it by
MIIMODER register CLKDIV field value. This value may need to be set up
in case it is undefined or its default value is too high (and
communication with PHY is too slow) or too low (and communication with
PHY is impossible). The value of CLKDIV is not specified directly, but
is derived from the MAC clock for the default MII management bus frequency
of 2.5MHz. The MAC clock may be specified in the platform data, or in
the 'clocks' device tree attribute.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was possible to break interface combinations in
the following way:
combo 1: iftype = AP, num_ifaces = 2, num_chans = 2,
combo 2: iftype = AP, num_ifaces = 1, num_chans = 1, radar = HT20
With the above interface combinations it was
possible to:
step 1. start AP on DFS channel by matching combo 2
step 2. start AP on non-DFS channel by matching combo 1
This was possible beacuse (step 2) did not consider
if other interfaces require radar detection.
The patch changes how cfg80211 tracks channels -
instead of channel itself now a complete chandef
is stored.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When receiving an IBSS_JOINED event select the BSS object
based on the {bssid, channel} couple rather than the bssid
only.
With the current approach if another cell having the same
BSSID (but using a different channel) exists then cfg80211
picks up the wrong BSS object.
The result is a mismatching channel configuration between
cfg80211 and the driver, that can lead to any sort of
problem.
The issue can be triggered by having an IBSS sitting on
given channel and then asking the driver to create a new
cell using the same BSSID but with a different frequency.
By passing the channel to cfg80211_get_bss() we can solve
this ambiguity and retrieve/create the correct BSS object.
All the users of cfg80211_ibss_joined() have been changed
accordingly.
Moreover WARN when cfg80211_ibss_joined() gets a NULL
channel as argument and remove a bogus call of the same
function in ath6kl (it does not make sense to call
cfg80211_ibss_joined() with a zero BSSID on ibss-leave).
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Arend van Spriel <arend@broadcom.com>
Cc: Bing Zhao <bzhao@marvell.com>
Cc: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Cc: libertas-dev@lists.infradead.org
Acked-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
[minor code cleanup in ath6kl]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The scheduled scan matchsets were intended to be a list of filters,
with the found BSS having to pass at least one of them to be passed
to the host. When the RSSI attribute was added, however, this was
broken and currently wpa_supplicant adds that attribute in its own
matchset; however, it doesn't intend that to mean that anything
that passes the RSSI filter should be passed to the host, instead
it wants it to mean that everything needs to also have higher RSSI.
This is semantically problematic because we have a list of filters
like [ SSID1, SSID2, SSID3, RSSI ] with no real indication which
one should be OR'ed and which one AND'ed.
To fix this, move the RSSI filter attribute into each matchset. As
we need to stay backward compatible, treat a matchset with only the
RSSI attribute as a "default RSSI filter" for all other matchsets,
but only if there are other matchsets (an RSSI-only matchset by
itself is still desirable.)
To make driver implementation easier, keep a global min_rssi_thold
for the entire request as well. The only affected driver is ath6kl.
I found this when I looked into the code after Raja Mani submitted
a patch fixing the n_match_sets calculation to disregard the RSSI,
but that patch didn't address the semantic issue.
Reported-by: Raja Mani <rmani@qti.qualcomm.com>
Acked-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's not a single rate control algorithm actually in
a separate module where the module refcount would be
required. Similarly, there's no specific rate control
module.
Therefore, all the module handling code in rate control
is really just dead code, so remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Change the code to allow making all the rate control ops
const, nothing ever needs to change them. Also change all
drivers to make use of this and mark the ops const.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow to force SGI, LGI.
Mainly for test purpose.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This required liberally sprinkling 'const' over brcmfmac
and mwifiex but seems like a useful thing to do since the
pointer can't really be written.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Addition of the frequency hints showed up couple of places in cfg80211
where pointers could be marked const and a shared function could be used
to fetch a valid channel.
Signed-off-by: Jouni Malinen <j@w1.fi>
[fix mwifiex]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This allows drivers to advertise the maximum number of associated
stations they support in AP mode (including P2P GO). User space
applications can use this for cleaner way of handling the limit (e.g.,
hostapd rejecting IEEE 802.11 authentication without manual
configuration of the limit) or to figure out what type of use cases can
be executed with multiple devices before trying and failing.
Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This clarifies the expected driver behavior on the older
NL80211_ATTR_MAC and NL80211_ATTR_WIPHY_FREQ attributes and adds a new
set of similar attributes with _HINT postfix to enable use of a
recommendation of the initial BSS to choose. This can be helpful for
some drivers that can avoid an additional full scan on connection
request if the information is provided to them (user space tools like
wpa_supplicant already has that information available based on earlier
scans).
In addition, this can be used to get more expected behavior for cases
where a specific BSS should be picked first based on operations like
Interworking network selection or WPS. These cases were already easily
addressed with drivers that leave BSS selection to user space, but there
was no convenient way to do this with drivers that take care of BSS
selection internally without using the NL80211_ATTR_MAC which is not
really desired since it is needed for other purposes to force the
association to remain with the same BSS.
Signed-off-by: Jouni Malinen <j@w1.fi>
[add const, fix policy]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
A beacon should never have a Channel Switch Announcement information
element with a count of 0, because a count of 1 means switch just
before the next beacon. So, if a count of 0 was valid in a beacon, it
would have been transmitted in the next channel already, which is
useless. A CSA count equal to zero is only meaningful in action
frames or probe_responses.
Fix the ieee80211_csa_is_complete() and ieee80211_update_csa()
functions accordingly.
With a CSA count of 0, we won't transmit any CSA beacons, because the
switch will happen before the next TBTT. To avoid extra work and
potential confusion in the drivers, complete the CSA immediately,
instead of waiting for the driver to call ieee80211_csa_finish().
To keep things simpler, we also switch immediately when the CSA count
is 1, while in theory we should delay the switch until just before the
next TBTT.
Additionally, move the ieee80211_csa_finish() function to cfg.c,
where it makes more sense.
Tested-by: Simon Wunderlich <sw@simonwunderlich.de>
Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull networking updates from David Miller:
1) BPF debugger and asm tool by Daniel Borkmann.
2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.
3) Correct reciprocal_divide and update users, from Hannes Frederic
Sowa and Daniel Borkmann.
4) Currently we only have a "set" operation for the hw timestamp socket
ioctl, add a "get" operation to match. From Ben Hutchings.
5) Add better trace events for debugging driver datapath problems, also
from Ben Hutchings.
6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
have a small send and a previous packet is already in the qdisc or
device queue, defer until TX completion or we get more data.
7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.
8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
Borkmann.
9) Share IP header compression code between Bluetooth and IEEE802154
layers, from Jukka Rissanen.
10) Fix ipv6 router reachability probing, from Jiri Benc.
11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.
12) Support tunneling in GRO layer, from Jerry Chu.
13) Allow bonding to be configured fully using netlink, from Scott
Feldman.
14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
already get the TCI. From Atzm Watanabe.
15) New "Heavy Hitter" qdisc, from Terry Lam.
16) Significantly improve the IPSEC support in pktgen, from Fan Du.
17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
Herbert.
18) Add Proportional Integral Enhanced packet scheduler, from Vijay
Subramanian.
19) Allow openvswitch to mmap'd netlink, from Thomas Graf.
20) Key TCP metrics blobs also by source address, not just destination
address. From Christoph Paasch.
21) Support 10G in generic phylib. From Andy Fleming.
22) Try to short-circuit GRO flow compares using device provided RX
hash, if provided. From Tom Herbert.
The wireless and netfilter folks have been busy little bees too.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
net/cxgb4: Fix referencing freed adapter
ipv6: reallocate addrconf router for ipv6 address when lo device up
fib_frontend: fix possible NULL pointer dereference
rtnetlink: remove IFLA_BOND_SLAVE definition
rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
qlcnic: update version to 5.3.55
qlcnic: Enhance logic to calculate msix vectors.
qlcnic: Refactor interrupt coalescing code for all adapters.
qlcnic: Update poll controller code path
qlcnic: Interrupt code cleanup
qlcnic: Enhance Tx timeout debugging.
qlcnic: Use bool for rx_mac_learn.
bonding: fix u64 division
rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
sfc: Use the correct maximum TX DMA ring size for SFC9100
Add Shradha Shah as the sfc driver maintainer.
net/vxlan: Share RX skb de-marking and checksum checks with ovs
tulip: cleanup by using ARRAY_SIZE()
ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
net/cxgb4: Don't retrieve stats during recovery
...
Pull audit update from Eric Paris:
"Again we stayed pretty well contained inside the audit system.
Venturing out was fixing a couple of function prototypes which were
inconsistent (didn't hurt anything, but we used the same value as an
int, uint, u32, and I think even a long in a couple of places).
We also made a couple of minor changes to when a couple of LSMs called
the audit system. We hoped to add aarch64 audit support this go
round, but it wasn't ready.
I'm disappearing on vacation on Thursday. I should have internet
access, but it'll be spotty. If anything goes wrong please be sure to
cc rgb@redhat.com. He'll make fixing things his top priority"
* git://git.infradead.org/users/eparis/audit: (50 commits)
audit: whitespace fix in kernel-parameters.txt
audit: fix location of __net_initdata for audit_net_ops
audit: remove pr_info for every network namespace
audit: Modify a set of system calls in audit class definitions
audit: Convert int limit uses to u32
audit: Use more current logging style
audit: Use hex_byte_pack_upper
audit: correct a type mismatch in audit_syscall_exit()
audit: reorder AUDIT_TTY_SET arguments
audit: rework AUDIT_TTY_SET to only grab spin_lock once
audit: remove needless switch in AUDIT_SET
audit: use define's for audit version
audit: documentation of audit= kernel parameter
audit: wait_for_auditd rework for readability
audit: update MAINTAINERS
audit: log task info on feature change
audit: fix incorrect set of audit_sock
audit: print error message when fail to create audit socket
audit: fix dangling keywords in audit_log_set_loginuid() output
audit: log on errors from filter user rules
...
Recent patch
bonding: add netlink attributes to slave link dev (1d3ee88ae0)
Introduced yet another device specific way to access slave information
over rtnetlink. There is one already there for bridge.
This patch introduces generic way to do this, for getting and setting
info as well by extending link_ops. Later on, this new interface will
be used for bridge ports as well.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change allows to consider an anycast address valid as source address
when given via an IPV6_PKTINFO or IPV6_2292PKTINFO ancillary data item.
So, when sending a datagram with ancillary data, the unicast and anycast
addresses are handled in the same way.
- Adds ipv6_chk_acast_addr_src() to check if an anycast address is link-local
on given interface or is global.
- Uses it in ip6_datagram_send_ctl().
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.
This has been fixed by Eric Dumazet in commit aee636c480
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using Granlund and Montgomery method, so that also
future use is safe and without any non-obvious side-effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
where very large. This seemed to not be problematic with its
current users, as far as we can tell. Eric Dumazet checked for
the slab usage, we cannot surely say so in the case of flex_array.
Still, in order to fix that, we propose an extension from the
original implementation from commit 6a2d7a955d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d is in
u32 universe:
1) Initialization:
int l = ceil(log_2 d)
uword m' = floor((1<<32)*((1<<l)-d)/d)+1
int sh_1 = min(l,1)
int sh_2 = max(l-1,0)
2) For q = n/d, all uword:
uword t = (n*m')>>32
q = (t+((n-t)>>sh_1))>>sh_2
The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.
Joint work with Daniel Borkmann.
[1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
[2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
[3] https://gmplib.org/~tege/division-paper.pdf
[4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
[5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
[6] http://www.agner.org/optimize/asmlib.zip
Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross <jesse@nicira.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As David Laight suggests, we shouldn't necessarily call this
reciprocal_divide() when users didn't requested a reciprocal_value();
lets keep the basic idea and call it reciprocal_scale(). More
background information on this topic can be found in [1].
Joint work with Hannes Frederic Sowa.
[1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
Suggested-by: David Laight <david.laight@aculab.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Redefined bh_[un]lock_sock to sctp_bh[un]lock_sock for user
space friendly code which we haven't use in years, so removing them.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Redefined {lock|release}_sock to sctp_{lock|release}_sock for user space friendly
code which we haven't use in years, so removing them.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Redefined read_[un]lock to sctp_read_[un]lock for user space
friendly code which we haven't use in years, and the macros
we never used, so removing them.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Redefined write_[un]lock to sctp_write_[un]lock for user space
friendly code which we haven't use in years, so removing them.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Redefined spin_[un]lock to sctp_spin_[un]lock for user space friendly
code which we haven't use in years, so removing them.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Redefined local_bh_{disable|enable} to sctp_local_bh_{disable|enable}
for user space friendly code which we haven't use in years, so removing them.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Redefined spin_[un]lock_irqstore to sctp_spin_[un]lock_irqrestore for user
space friendly code which we haven't use in years, so removing them.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add GRO handlers for protocols that do UDP encapsulation, with the intent of
being able to coalesce packets which encapsulate packets belonging to
the same TCP session.
For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.
The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.
Use a mark on the skb GRO CB data to disallow (flush) running the udp gro receive
code twice on a packet. This solves the problem of udp encapsulated packets whose
inner VM packet is udp and happen to carry a port which has registered offloads.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow
connecting and binding to them. sendmsg logic does already check msg->name
for this but must trust already connected sockets which could be set up
for connection to ipv4 address family.
Per-socket flag ipv6only is of no use here, as it is under users control
by setsockopt.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
So that we will not expose struct tcf_common to modules.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every action ops has a pointer to hash info, so we don't need to
hard-code it in each module.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull scheduler changes from Ingo Molnar:
- Add the initial implementation of SCHED_DEADLINE support: a real-time
scheduling policy where tasks that meet their deadlines and
periodically execute their instances in less than their runtime quota
see real-time scheduling and won't miss any of their deadlines.
Tasks that go over their quota get delayed (Available to privileged
users for now)
- Clean up and fix preempt_enable_no_resched() abuse all around the
tree
- Do sched_clock() performance optimizations on x86 and elsewhere
- Fix and improve auto-NUMA balancing
- Fix and clean up the idle loop
- Apply various cleanups and fixes
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
sched: Fix __sched_setscheduler() nice test
sched: Move SCHED_RESET_ON_FORK into attr::sched_flags
sched: Fix up attr::sched_priority warning
sched: Fix up scheduler syscall LTP fails
sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls
sched/core: Fix htmldocs warnings
sched/deadline: No need to check p if dl_se is valid
sched/deadline: Remove unused variables
sched/deadline: Fix sparse static warnings
m68k: Fix build warning in mac_via.h
sched, thermal: Clean up preempt_enable_no_resched() abuse
sched, net: Fixup busy_loop_us_clock()
sched, net: Clean up preempt_enable_no_resched() abuse
sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
sched/preempt, locking: Rework local_bh_{dis,en}able()
sched/clock, x86: Avoid a runtime condition in native_sched_clock()
sched/clock: Fix up clear_sched_clock_stable()
sched/clock, x86: Use a static_key for sched_clock_stable
sched/clock: Remove local_irq_disable() from the clocks
sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
...
It is not actually implemented.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.
This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.
ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.
I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.
Reported-by: Gert Doering <gert@space.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label unicity. This patch introduces a new sysctl to protect the old
behaviour, enable by default.
Changelog of V3:
* rename ip6_flowlabel_consistency to flowlabel_consistency
* use net_info_ratelimited()
* checkpatch cleanups
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This information is already available via IPV6_FLOWINFO
of IPV6_2292PKTOPTIONS, and them a filtering to get the flow label
information. But it is probably logical and easier for users to add this
here, and to control both sent/received flow label values with the
IPV6_FLOWLABEL_MGR option.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
net/ipv4/tcp_metrics.c
Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.
Minor overlapping changes in bnx2x driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch :
1) Remove a dst leak if DST_NOCACHE was set on dst
Fix this by holding a reference only if dst really cached.
2) Remove a lockdep warning in __tunnel_dst_set()
This was reported by Cong Wang.
3) Remove usage of a spinlock where xchg() is enough
4) Remove some spurious inline keywords.
Let compiler decide for us.
Fixes: 7d442fab0a ("ipv4: Cache dst in tunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 1ec047eb47 ("ipv6: introduce per-interface counter for
dad-completed ipv6 addresses") I build the detection of the first
operational link-local address much to complex. Additionally this code
now has a race condition.
Replace it with a much simpler variant, which just scans the address
list when duplicate address detection completes, to check if this is
the first valid link local address and send RS and MLD reports then.
Fixes: 1ec047eb47 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses")
Reported-by: Jiri Pirko <jiri@resnulli.us>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is following the commit b903d324be (ipv6: tcp: fix TCLASS
value in ACK messages sent from TIME_WAIT).
For the same reason than tclass, we have to store the flow label in the
inet_timewait_sock to provide consistency of flow label on the last ACK.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ndo_neigh_setup is called, the bitfield used by NEIGH_VAR_SET is
not initialized yet. This might cause confusion for the people who use
NEIGH_VAR_SET in ndo_neigh_setup. So rather introduce NEIGH_VAR_INIT for
usage in ndo_neigh_setup.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two places defined IPV6_TCLASS_SHIFT, so we should move it into ipv6.h,
and use this macro as possible. And define ip6_tclass helper to return
tclass
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change move anycast_src_echo_reply sysctl with other ipv6 sysctls.
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.
Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We still need this notifier even when we don't config
PROC_FS.
It should be rare to have a kernel without PROC_FS,
so just for completeness.
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/xfrm/xfrm_policy.c
Steffen Klassert says:
====================
This pull request has a merge conflict between commits be7928d20b
("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and
da7c224b1b ("net: xfrm: xfrm_policy: silence compiler warning") from
the net-next tree and commit 2f3ea9a95c ("xfrm: checkpatch erros with
inline keyword position") from the ipsec-next tree.
The version from net-next can be used, like it is done in linux-next.
1) Checkpatch cleanups, from Weilong Chen.
2) Fix lockdep complaints when pktgen is used with IPsec,
from Fan Du.
3) Update pktgen to allow any combination of IPsec transport/tunnel mode
and AH/ESP/IPcomp type, from Fan Du.
4) Make pktgen_dst_metrics static, Fengguang Wu.
5) Compile fix for pktgen when CONFIG_XFRM is not set,
from Fan Du.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now the sessionid value in the kernel is a combination of u32,
int, and unsigned int. Just use unsigned int throughout.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not necessary at all.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_match_indev() is called in fast path, it is not wise to
search for a netdev by ifindex and then compare by its name,
just compare the ifindex.
Also, dev->name could be changed by user-space, therefore
the match would be always fail, but dev->ifindex could
be consistent.
BTW, this will also save some bytes from the core struct of u32.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>