And redirect sock_prot_inuse_add and _get to use one.
As far as the dereferences are concerned. Before the patch we made
1 dereference to proto->inuse.add call, the call itself and then
called the __get_cpu_var() on a static variable. After the patch we
make a direct call, then one dereference to proto->inuse_idx and
then the same __get_cpu_var() on a still static variable. So this
patch doesn't seem to produce performance penalty on SMP.
This is not per-net yet, but I will deliberately make NET_NS=y case
separated from NET_NS=n one, since it'll cost us one-or-two more
dereferences to get the struct net and the inuse counter.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The inuse counters are going to become a per-cpu array. Introduce an
index for this array on the struct proto.
To handle the case of proto register-unregister-register loop the
bitmap is used. All its bits manipulations are protected with
proto_list_lock and a sanity check for the bitmap being exhausted is
also added.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Fri, 2008-03-28 at 03:24 -0700, Andrew Morton wrote:
> they should all be renamed.
Done for include/net and net
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kill unnecessary llc_station_mac_sa.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
discard llc packet which has bogus packet length.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qdisc_run loop is currently unbounded and runs entirely in a
softirq. This is bad as it may create an unbounded softirq run.
This patch fixes this by calling need_resched and breaking out if
necessary.
It also adds a break out if the jiffies value changes since that would
indicate we've been transmitting for too long which starves other
softirqs.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 9af3912ec9 ("[NET] Move DF check
to ip_forward") added a new check to send ICMP fragmentation needed
for large packets.
Unlike the check in ip_finish_output(), it doesn't check for GSO.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The older RW_LOCK_UNLOCKED macros defeat lockdep state tracing so
replace them with the newer __RW_LOCK_UNLOCKED macros.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our interest is not the whole entry of proxy neighbor but the
NTF_ROUTER flag. Let's test it explicitly.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Extract hash function for pneigh entries from pneigh_lookup(),
__pneigh_lookup() and pneigh_delete() as pneigh_hash().
Extract core of pneigh_lookup() and __pneigh_lookup() as
__pneigh_lookup_1().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
LLC currently allows users to inject raw frames, including IP packets
encapsulated in SNAP. While Linux doesn't handle IP over SNAP, other
systems do. Restrict LLC sockets to root similar to packet sockets.
[ Modified Patrick's patch to use CAP_NEW_RAW --DaveM ]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With a was number of callsites sctp_add_cmd_sf wrapper bloats
kernel by some amount. Due to unlikely tracking allyesconfig,
with the initial result were around ~7kB (thus caught my
attention) while a non-debug config produced only ~2.3kB effect.
I (ij) proposed first a patch to uninline it but Vlad responded
with a patch that removed the only sctp_add_cmd call which is
wrapped by sctp_add_cmd_sf (I wasn't sure if I could do that).
I did minor cleanup to Vlad's patch.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This elliminates infamous race during module loading when one could lookup
proc entry without proc_fops assigned.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ESP does not account for the IV size when calling pskb_may_pull() to
ensure everything it accesses directly is within the linear part of a
potential fragment. This results in a BUG() being triggered when the
both the IPv4 and IPv6 ESP stack is fed with an skb where the first
fragment ends between the end of the esp header and the end of the IV.
This bug was found by Dirk Nehring <dnehring@gmx.net> .
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reorders some fields in various structures to have
less padding within the structures, making them smaller. It
doesn't yet make any type adjustments, but often size_t is used
for example for IE lengths which is total overkill since size_t
will be 8 bytes long on 64-bit yet the length can at most fill
a u8.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch alters the A-MPDU MLME in sta_info to use dynamic allocation,
thus drastically improving memory usage - from a constant ~2 Kbyte in
the previous (static) allocation to a lower limit of ~200 Byte and an upper
limit of ~2 Kbyte.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes ieee80211_get_channel a static inline defined in
cfg80211's header file which simply calls __ieee80211_get_channel
to avoid symbol clashes with the ieee80211 code.
The problem was pointed out by David Miller, thanks!
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch eliminate the use of buf_size as a trigger in favor of a new
flag to control Rx A-MPDU sessions through debugfs
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Otherwise, 'iwconfig wlan0 key off' with no key set results in:
Error for wireless request "Set Encode" (8B2A) :
SET failed on device wlan0 ; No such file or directory.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Make /proc/net/ip6_flowlabel show only flow labels belonging to the
current network namespace.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a new member, fl_net, in struct ip6_flowlabel.
This allows to create labels with the same value in different namespaces.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the network namespace information to have this protocol to
handle several network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPv6 BEET output function is incorrectly including the inner
header in the payload to be protected. This causes a crash as
the packet doesn't actually have that many bytes for a second
header.
The IPv4 BEET output on the other hand is broken when it comes
to handling an inner IPv6 header since it always assumes an
inner IPv4 header.
This patch fixes both by making sure that neither BEET output
function touches the inner header at all. All access is now
done through the protocol-independent cb structure. Two new
attributes are added to make this work, the IP header length
and the IPv4 option length. They're filled in by the inner
mode's output function.
Thanks to Joakim Koskela for finding this problem.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commits f3db4851 ([NETNS][IPV6] ip6_fib - fib6_clean_all handle several
network namespaces) and 69ddb805 ([NETNS][IPV6] route6 - Make proc entry
/proc/net/rt6_stats per namespace) made some proc files per net.
Both of them introduced potential OOPS - get_proc_net can return NULL, but
this check is lost - and a struct net leak - in case single_open() fails the
previously got net is not put.
Kill all these bugs with one patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates an unnecessary poll-related routine
by merging it into TIPC's main polling routine, and updates
the comments associated with this code.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently each vlan_groupd contains 8 pointers on arrays with 512
pointers on struct net_device each :) Such a construction "in many
cases ... wastes memory".
My proposal is to allow for some of these arrays pointers be NULL,
meaning that there are no devices in it. When a new device is added
to the vlan_group, the appropriate array is allocated.
The check in vlan_group_get_device's is safe, since the pointer
vg->vlan_devices_arrays[x] can only switch from NULL to not-NULL.
The vlan_group_prealloc_vid() is guarded with rtnl lock and is
also safe.
I've checked (I hope that) all the places, that use these arrays
and found, that the register_vlan_dev is the only place, that can
put a vlan device on an empty vlan_group.
Rough calculations shows, that after the patch a setup with a
single vlan dev (or up to 512 vlans with sequential vids) will
occupy approximately 8 times less memory.
The question I have is - does this patch makes sense, or a totally
new structures are required to store the vlan_devs?
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The RDMACTXT_F_LAST_CTXT bit was getting set incorrectly
when the last chunk in the read-list spanned multiple pages. This
resulted in a kernel panic when the wrong context was used to
build the RPC iovec page list.
RDMA_READ is used to fetch RPC data from the client for
NFS_WRITE requests. A scatter-gather is used to map the
advertised client side buffer to the server-side iovec and
associated page list.
WR contexts are used to convey which scatter-gather entries are
handled by each WR. When the write data is large, a single RPC may
require multiple RDMA_READ requests so the contexts for a single RPC
are chained together in a linked list. The last context in this list
is marked with a bit RDMACTXT_F_LAST_CTXT so that when this WR completes,
the CQ handler code can enqueue the RPC for processing.
The code in rdma_read_xdr was setting this bit on the last two
contexts on this list when the last read-list chunk spanned multiple
pages. This caused the svc_rdma_recvfrom logic to incorrectly build
the RPC and caused the kernel to crash because the second-to-last
context doesn't contain the iovec page list.
Modified the condition that sets this bit so that it correctly detects
the last context for the RPC.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tested-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 8b7817f3a9 ([IPSEC]: Add ICMP host
relookup support) introduced some dst leaks on error paths: the rt
pointer can be forgotten to be put. Fix it bu going to a proper label.
Found after net namespace's lo refused to unregister :) Many thanks to
Den for valuable help during debugging.
Herbert pointed out, that xfrm_lookup() will put the rtable in case
of error itself, so the first goto fix is redundant.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Given that there are no apparent calls to lock_kernel() or
unlock_kernel() under net/ax25, delete the TODO reference related to
that.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
SIOCADDMULTI/SIOCDELMULTI check whether the driver has a set_multicast_list
method to determine whether it supports multicast. Drivers implementing
secondary unicast support use set_rx_mode however.
Check for both dev->set_multicast_mode and dev->set_rx_mode to determine
multicast capabilities.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This mostly re-uses the net, used in icmp netnsization patches from Denis.
After this ICMP sysctls are completely virtualized.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some flesh to ipv4_sysctl_init_net and ipv4_sysctl_exit_net,
i.e. copy the table, alter .data pointers and register it per-net.
Other ipv4_table's sysctls are now global, but this is going to
change once sysctl permissions patches migrate from -mm tree to
mainline in 2.6.26 merge window :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialization is moved to icmp_sk_init, all the places, that
refer to them use init_net for now.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This includes adding pernet_operations, empty init and exit
hooks and a bit of changes in sysctl_ipv4_init just not to
have this part in next patches.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It should be a "struct ktermios" not a "struct termios".
Based upon a build warning reported by Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing these flags requires to use dev_set_allmulti/dev_set_promiscuity
or dev_change_flags. Setting it directly causes two unwanted effects:
- the next dev_change_flags call will notice a difference between
dev->gflags and the actual flags, enable promisc/allmulti
mode and incorrectly update dev->gflags
- this keeps the underlying device in promisc/allmulti mode until
the VLAN device is deleted
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimize call routing between NATed endpoints: when an external
registrar sends a media description that contains an existing RTP
expectation from a different SNATed connection, the gatekeeper
is trying to route the call directly between the two endpoints.
We assume both endpoints can reach each other directly and
"un-NAT" the addresses, which makes the media stream go between
the two endpoints directly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for multiple media channels and use it to create
expectations for video streams when present.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SDP connection addresses may be contained in the payload multiple
times (in the session description and/or once per media description),
currently only the session description is properly updated. Split up
SDP mangling so the function setting up expectations only updates the
media port, update connection addresses from media descriptions while
parsing them and at the end update the session description when the
final addresses are known.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create expectations for the RTCP connections in addition to RTP connections.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Media streams can come from anywhere, add a module parameter which
controls whether wildcard expectations or expectations between the
two signalling endpoints are created.
Since the same media description sent on multiple connections may
results in multiple identical expections when using a wildcard source,
we need to check whether a similar expectation already exists for a
different connection before attempting to register it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create expectations for incoming signalling connections when seeing
a REGISTER request. This is needed when the registrar uses a
different source port number for signalling messages and for receiving
incoming calls from other endpoints than the registrar.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SIP message may contain multiple Contact: addresses referring to
the NATed endpoint, translate all of them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update maddr=, received= and rport= Via-header parameters refering to
the signalling connection.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce URI and header parameter parsing helpers. These are needed
by the conntrack helper to parse expiration values in Contact: header
parameters and by the NAT helper to properly update the Via-header
rport=, received= and maddr= parameters.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flush the RTP expectations we've created when a call is hung up or
terminated otherwise.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Perform NAT last after parsing the packet. This makes no difference
currently, but is needed when dealing with registrations to make
sure we seen the unNATed addresses.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for per-method request/response handlers and perform SDP
parsing for INVITE/UPDATE requests and for all informational and
successful responses.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the URI parsing helper to get the numerical addresses and get rid of the
text based header translation.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a helper function to parse a SIP-URI in a header value, optionally
iterating through all headers of this kind.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce new function for SIP header parsing that properly deals with
continuation lines and whitespace in headers and use it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The request URI is not a header and needs to be treated differently than
real SIP headers. Add a seperate function for parsing it and get rid of
the POS_REQ_URI/POS_REG_REQ_URI definitions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
SDP and SIP headers are quite different, SIP can have continuation lines,
leading and trailing whitespace after the colon and is mostly case-insensitive
while SDP headers always begin on a new line and are followed by an equal
sign and the value, without any whitespace.
Introduce new SDP header parsing function and convert all users that used
the SIP header parsing function. This will allow to properly deal with the
special SIP cases in the SIP header parsing function later.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace sizeof/memcmp by strlen/strcmp. Use case-insensitive comparison
for SIP methods and the SIP/2.0 string, as specified in RFC 3261.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conntrack reference and ctinfo can be derived from the packet.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
After mangling the packet, the pointer to the data and the length of the data
portion may change and need to be adjusted.
Use double data pointers and a pointer to the length everywhere and add a
helper function to the NAT helper for performing the adjustments.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
"limit" marks the first character outside the bounds.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to set up the destination NAT mapping before the source NAT
mapping, so the NAT core gets to see the final tuple and can decide
whether the source port needs to be remapped.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce expectation classes and policies. An expectation class
is used to distinguish different types of expectations by the
same helper (for example audio/video/t.120). The expectation
policy is used to hold the maximum number of expectations and
the initial timeout for each class.
The individual classes are isolated from each other, which means
that for example an audio expectation will only evict other audio
expectations.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is useful for the SIP helper and signalling expectations.
We don't want to create a full-blown expectation with a wildcard
as source based on a single UDP packet, but need to know the
final port anyways. With inactive expectations we can register
the expectation and reserve the tuple, but wait for confirmation
from the registrar before activating it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With nf_conntrack DUMP_TUPLE got renamed to NF_CT_DUMP_TUPLE, fix
CLUSTERIP to use the proper macro name.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Default WMM params have to be set according to beacon/probe response
information prior to authentication (or IBSS start/join); beacon queue
is configured only in IBSS. This does not affect the use of 'real' WMM
params as reported by AP.
Signed-off-by: Vladimir Koutny <vlado@ksp.sk>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Postpone calling ieee80211_hw_config if hardware scanning is active.
This is similar to solution for software scanning where channel setting
is delayed until scan complete.
Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds a clean tear down for all block ack sessions if interface
goes down or if a deauthentication is done.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch also fixes the Rx timer's comments
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes a wrong debug print when receiving delba
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When you have an AP on channel 13, it will currently often enough
be listed in scan results even when the regulatory domain restricts
to channels 1-11. This is due to channel overlap. To avoid getting
very strange failures, don't show such APs in the scan results. The
failure mode will now go from "I can see the AP but not associate"
to "I can't see the AP although I know it's there" which is easier
to debug.
This problem was first really noticed by Jes Sorensen.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use the new ieee80211_get_channel() function instead of open-coding it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add ieee80211_get_channel() which gets you a channel struct for a
specific wiphy if that channel is present in that wiphy.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes mac80211 able to send a phase1 key for TKIP
decryption.
This is needed for drivers that don't do the rekeying by themselves
(i.e. iwlwifi). Upon IV16 wrap around, the packet is decrypted in SW,
if decryption is ok, mac80211 calls to update_tkip_key with a new
phase 1 RX key.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes mac80211 able to compute a TKIP key from an skb.
The requested key can be a phase 1 or a phase 2 key.
This is useful for drivers who need to provide tkip key to their
HW to enable HW encryption.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Introduce an inline net_eq() to compare two namespaces.
Without CONFIG_NET_NS, since no namespace other than &init_net
exists, it is always 1.
We do not need to convert 1) inline vs inline and
2) inline vs &init_net comparisons.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Introduce neigh_parms/pneigh_entry inlines: neigh_parms_net(), pneigh_net().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Without CONFIG_NET_NS, no namespace other than &init_net exists,
no need to store net in seq_net_private.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Last part of hop-limit determination is always:
hoplimit = dst_metric(dst, RTAX_HOPLIMIT);
if (hoplimit < 0)
hoplimit = ipv6_get_hoplimit(dst->dev).
Let's consolidate it as ip6_dst_hoplimit(dst).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Each MIPv6 XFRM state (DSTOPT/RH2) holds either destination or source
address to be mangled in the IPv6 header (that is "CoA").
On Inter-MN communication after both nodes binds each other,
they use route optimized traffic two MIPv6 states applied, and
both source and destination address in the IPv6 header
are replaced by the states respectively.
The packet format is correct, however, next-hop routing search
are not.
This patch fixes it by remembering address pairs for later states.
Based on patch from Masahide NAKAMURA <nakam@linux-ipv6.org>.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Allow to create sockets in the namespace if the protocol ok with this.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP layer now can handle multiple namespaces normally. So, process such
packets normally and drop them only if the transport layer is not
aware about namespaces.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There were no packets in the namespace other than initial
previously. This will be changed in the neareast future. Netfilters
are not namespace aware and should be processed in the initial
namespace only for now.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace all the reast of the init_net with a proper net on the socket
layer.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace all the rest of the init_net with a proper net on the IP layer.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_options_compile uses inet_addr_type which requires a namespace. The
packet argument is optional, so parameter is the only way to obtain
it. Pass the init_net there for now.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Seqfile operation showing /proc/net/arp are already namespace
aware. All we need is to register this file for each namespace.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get namespace from a device and pass it to the routing engine. Enable
ARP packet processing and device notifiers after that.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This file displays the registered packet types, but some of them
(packet sockets creates such) can be bound to a net device and showing
them in a wrong namespace is not correct.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP-Lite sockets are displayed in another files, rather than
UDP ones, so make the present in namespaces as well.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just introduce a helper to remove ifdefs from inside the
udplite4_register function. This will help to make the next patch
nicer.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the commit f40c8174d3 ([NETNS][IPV4]
tcp - make proc handle the network namespaces) it is now possible to make
this file present in newly created namespaces.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the commit a91275eff4 ([NETNS][IPV6]
udp - make proc handle the network namespace) it is now possible to make
this file present in newly created namespaces.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Proxy neighbors do not have any reference counting, so any caller
of pneigh_lookup (unless it's a netlink triggered add/del routine)
should _not_ perform any actions on the found proxy entry.
There's one exception from this rule - the ipv6's ndisc_recv_ns()
uses found entry to check the flags for NTF_ROUTER.
This creates a race between the ndisc and pneigh_delete - after
the pneigh is returned to the caller, the nd_tbl.lock is dropped
and the deleting procedure may proceed.
One of the fixes would be to add a reference counting, but this
problem exists for ndisc only. Besides such a patch would be too
big for -rc4.
So I propose to introduce a __pneigh_lookup() which is supposed
to be called with the lock held and use it in ndisc code to check
the flags on alive pneigh entry.
Changes from v2:
As David noticed, Exported the __pneigh_lookup() to ipv6 module.
The checkpatch generates a warning on it, since the EXPORT_SYMBOL
does not follow the symbol itself, but in this file all the
exports come at the end, so I decided no to break this harmony.
Changes from v1:
Fixed comments from YOSHIFUJI - indentation of prototype in header
and the pndisc_check_router() name - and a compilation fix, pointed
by Daniel - the is_routed was (falsely) considered as uninitialized
by gcc.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
sch_htb: fix "too many events" situation
connector: convert to single-threaded workqueue
[ATM]: When proc_create() fails, do some error handling work and return -ENOMEM.
[SUNGEM]: Fix NAPI assertion failure.
BNX2X: prevent ethtool from setting port type
[9P] net/9p/trans_fd.c: remove unused variable
[IPV6] net/ipv6/ndisc.c: remove unused variable
[IPV4] fib_trie: fix warning from rcu_assign_poinger
[TCP]: Let skbs grow over a page on fast peers
[DLCI]: Fix tiny race between module unload and sock_ioctl.
[SCTP]: Fix build warnings with IPV6 disabled.
[IPV4]: Fix null dereference in ip_defrag
The iWARP protocol limits RDMA read requests to a single scatter
entry. NFS/RDMA has code in rdma_read_max_sge() that is supposed to
limit the sge_count for RDMA read requests to 1, but the code to do
that is inside an #ifdef RDMA_TRANSPORT_IWARP block. In the mainline
kernel at least, RDMA_TRANSPORT_IWARP is an enum and not a
preprocessor #define, so the #ifdef'ed code is never compiled.
In my test of a kernel build with -j8 on an NFS/RDMA mount, this
problem eventually leads to trouble starting with:
svcrdma: Error posting send = -22
svcrdma : RDMA_READ error = -22
and things go downhill from there.
The trivial fix is to delete the #ifdef guard. The check seems to be
a remnant of when the NFS/RDMA code was not merged and needed to
compile against multiple kernel versions, although I don't think it
ever worked as intended. In any case now that the code is upstream
there's no need to test whether the RDMA_TRANSPORT_IWARP constant is
defined or not.
Without this patch, my kernel build on an NFS/RDMA mount using NetEffect
adapters quickly and 100% reproducibly failed with an error like:
ld: final link failed: Software caused connection abort
With the patch applied I was able to complete a kernel build on the
same setup.
(Tom Tucker says this is "actually an _ancient_ remnant when it had to
compile against iWARP vs. non-iWARP enabled OFA trees.")
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sctp_datamsg_free and sctp_datamsg_track are just aliases for
sctp_datamsg_put and sctp_chunk_hold, respectively.
Saves 32 Bytes on x86.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make /proc/net/fib_trie and /proc/net/fib_triestat display
all routing tables, not just local and main.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the first u32 copied from syncookie_secret is overwritten by the
minute-counter four lines below. After adjusting the destination
address, the size of syncookie_secret can be reduced accordingly.
AFAICS, the only other user of syncookie_secret[] is the ipv6
syncookie support. Because ipv6 syncookies only grab 44 bytes from
syncookie_secret[], this shouldn't affect them in any way.
With fixes from Glenn Griffin.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HTB is event driven algorithm and part of its work is to apply
scheduled events at proper times. It tried to defend itself from
livelock by processing only limited number of events per dequeue.
Because of faster computers some users already hit this hardcoded
limit.
This patch limits processing up to 2 jiffies (why not 1 jiffie ?
because it might stop prematurely when only fraction of jiffie
remains).
Signed-off-by: Martin Devera <devik@cdi.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patches removes unused code in ndisc_send_redirect() method in
net/ipv6/ndisc.c.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable cb is initialized but never used otherwise.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
type T;
identifier i;
constant C;
@@
(
extern T i;
|
- T i;
<+... when != i
- i = C;
...+>
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable hlen is initialized but never used otherwise.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
type T;
identifier i;
constant C;
@@
(
extern T i;
|
- T i;
<+... when != i
- i = C;
...+>
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This gets rid of a warning caused by the test in rcu_assign_pointer.
I tried to fix rcu_assign_pointer, but that devolved into a long set
of discussions about doing it right that came to no real solution.
Since the test in rcu_assign_pointer for constant NULL would never
succeed in fib_trie, just open code instead.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The route table parameters are set based on system memory and sysctl
values that almost never change. Also the genid only changes every
10 minutes.
RTprint is defined by never used.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sorry for the patch sequence confusion :| but I found that the similar
thing can be done for raw sockets easily too late.
Expand the proto.h union with the raw_hashinfo member and use it in
raw_prot and rawv6_prot. This allows to drop the protocol specific
versions of hash and unhash callbacks.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After this we have only udp_lib_get_port to get the port and two
stubs for ipv4 and ipv6. No difference in udp and udplite except
for initialized h.udp_hash member.
I tried to find a graceful way to drop the only difference between
udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison
routine), but adding one more callback on the struct proto didn't
appear such :( Maybe later.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inspired by the commit ab1e0a13 ([SOCK] proto: Add hashinfo member to
struct proto) from Arnaldo, I made similar thing for UDP/-Lite IPv4
and -v6 protocols.
The result is not that exciting, but it removes some levels of
indirection in udpxxx_get_port and saves some space in code and text.
The first step is to union existing hashinfo and new udp_hash on the
struct proto and give a name to this union, since future initialization
of tcpxxx_prot, dccp_vx_protinfo and udpxxx_protinfo will cause gcc
warning about inability to initialize anonymous member this way.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes code a bit more uniform and straigthforward.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_options->is_data is assigned only and never checked. The structure is
not a part of kernel interface to the userspace. So, it is safe to remove
this field.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is the only way to reach ip_options compile with opt != NULL:
ip_options_get_finish
opt->is_data = 1;
ip_options_compile(opt, NULL)
So, checking for is_data inside opt != NULL branch is not needed.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While testing the virtio-net driver on KVM with TSO I noticed
that TSO performance with a 1500 MTU is significantly worse
compared to the performance of non-TSO with a 16436 MTU. The
packet dump shows that most of the packets sent are smaller
than a page.
Looking at the code this actually is quite obvious as it always
stop extending the packet if it's the first packet yet to be
sent and if it's larger than the MSS. Since each extension is
bound by the page size, this means that (given a 1500 MTU) we're
very unlikely to construct packets greater than a page, provided
that the receiver and the path is fast enough so that packets can
always be sent immediately.
The fix is also quite obvious. The push calls inside the loop
is just an optimisation so that we don't end up doing all the
sending at the end of the loop. Therefore there is no specific
reason why it has to do so at MSS boundaries. For TSO, the
most natural extension of this optimisation is to do the pushing
once the skb exceeds the TSO size goal.
This is what the patch does and testing with KVM shows that the
TSO performance with a 1500 MTU easily surpasses that of a 16436
MTU and indeed the packet sizes sent are generally larger than
16436.
I don't see any obvious downsides for slower peers or connections,
but it would be prudent to test this extensively to ensure that
those cases don't regress.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change TCP_DEFER_ACCEPT implementation so that it transitions a
connection to ESTABLISHED after handshake is complete instead of
leaving it in SYN-RECV until some data arrvies. Place connection in
accept queue when first data packet arrives from slow path.
Benefits:
- established connection is now reset if it never makes it
to the accept queue
- diagnostic state of established matches with the packet traces
showing completed handshake
- TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be
enforced with reasonable accuracy instead of rounding up to next
exponential back-off of syn-ack retry.
Signed-off-by: Patrick McManus <mcmanus@ducksong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a socket in LISTEN that had completed its 3 way handshake, but not notified
userspace because of SO_DEFER_ACCEPT, would retransmit the already
acked syn-ack during the time it was waiting for the first data byte
from the peer.
Signed-off-by: Patrick McManus <mcmanus@ducksong.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
timeout associated with SO_DEFER_ACCEPT wasn't being honored if it was
less than the timeout allowed by the maximum syn-recv queue size
algorithm. Fix by using the SO_DEFER_ACCEPT value if the ack has
arrived.
Signed-off-by: Patrick McManus <mcmanus@ducksong.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a narrow pedantry :) but the dlci_ioctl_hook check and call
should not be parted with the mutex lock.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commits f40c81 ([NETNS][IPV4] tcp - make proc handle the network
namespaces) and a91275 ([NETNS][IPV6] udp - make proc handle the
network namespace) both introduced bad checks on sockets and tw
buckets to belong to proper net namespace.
I.e. when checking for socket to belong to given net and family the
do {
sk = sk_next(sk);
} while (sk && sk->sk_net != net && sk->sk_family != family);
constructions were used. This is wrong, since as soon as the
sk->sk_net fits the net the socket is immediately returned, even if it
belongs to other family.
As the result four /proc/net/(udp|tcp)[6] entries show wrong info.
The udp6 entry even oopses when dereferencing inet6_sk(sk) pointer:
static void udp6_sock_seq_show(struct seq_file *seq, struct sock *sp, int bucket)
{
...
struct ipv6_pinfo *np = inet6_sk(sp);
...
dest = &np->daddr; /* will be NULL for AF_INET sockets */
...
seq_printf(...
dest->s6_addr32[0], dest->s6_addr32[1],
dest->s6_addr32[2], dest->s6_addr32[3],
...
Fix it by converting && to ||.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make socket filters work for netlink unicast and notifications.
This is useful for applications like Zebra that get overrun with
messages that are then ignored.
Note: netlink messages are in host byte order, but packet filter
state machine operations are done as network byte order.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Been seeing occasional panics in my testing of 2.6.25-rc in ip_defrag.
Offending line in ip_defrag is here:
net = skb->dev->nd_net
where dev is NULL. Bisected the problem down to commit
ac18e7509e ([NETNS][FRAGS]: Make the
inet_frag_queue lookup work in namespaces).
Below patch (idea from Patrick McHardy) fixes the problem for me.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The proc init/exit functions take a new network namespace parameter in
order to register/unregister /proc/net/udp6 for a namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch, like udp proc, makes the proc functions to take care of
which namespace the socket belongs.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Copy the network namespace from the socket to the timewait socket.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the common udp proc functions to take care of which
socket they should show taking into account the namespace it belongs.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_PROC_FS=no, the out_sock_create label is not used because
the code using it is disabled and that leads to a warning at compile
time.
This patch fix that by making a specific function to initialize proc
for igmp6, and remove the annoying CONFIG_PROC_FS sections in
init/exit function.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update: My mailer ate one of Jarek's feedback mails... Fixed the
parameter in netif_set_gso_max_size() to be u32, not u16. Fixed the
whitespace issue due to a patch import botch. Changed the types from
u32 to unsigned int to be more consistent with other variables in the
area. Also brought the patch up to the latest net-2.6.26 tree.
Update: Made gso_max_size container 32 bits, not 16. Moved the
location of gso_max_size within netdev to be less hotpath. Made more
consistent names between the sock and netdev layers, and added a
define for the max GSO size.
Update: Respun for net-2.6.26 tree.
Update: changed max_gso_frame_size and sk_gso_max_size from signed to
unsigned - thanks Stephen!
This patch adds the ability for device drivers to control the size of
the TSO frames being sent to them, per TCP connection. By setting the
netdevice's gso_max_size value, the socket layer will set the GSO
frame size based on that value. This will propogate into the TCP
layer, and send TSO's of that size to the hardware.
This can be desirable to help tune the bursty nature of TSO on a
per-adapter basis, where one may have 1 GbE and 10 GbE devices
coexisting in a system, one running multiqueue and the other not, etc.
This can also be desirable for devices that cannot support full 64 KB
TSO's, but still want to benefit from some level of segmentation
offloading.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on notice from "Colin" <colins@sjtu.edu.cn>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When selecting a new window, tcp_select_window() tries not to shrink
the offered window by using the maximum of the remaining offered window
size and the newly calculated window size. The newly calculated window
size is always a multiple of the window scaling factor, the remaining
window size however might not be since it depends on rcv_wup/rcv_nxt.
This means we're effectively shrinking the window when scaling it down.
The dump below shows the problem (scaling factor 2^7):
- Window size of 557 (71296) is advertised, up to 3111907257:
IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>
- New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
below the last end:
IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>
The number 40 results from downscaling the remaining window:
3111907257 - 3111841425 = 65832
65832 / 2^7 = 514
65832 % 2^7 = 40
If the sender uses up the entire window before it is shrunk, this can have
chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
will notice that the window has been shrunk since tcp_wnd_end() is before
tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
This will fail the receivers checks in tcp_sequence() however since it
is before it's tp->rcv_wup, making it respond with a dupack.
If both sides are in this condition, this leads to a constant flood of
ACKs until the connection times out.
Make sure the window is never shrunk by aligning the remaining window to
the window scaling factor.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
zap_completion_queue() retrieves skbs from completion_queue where they have
zero skb->users counter. Before dev_kfree_skb_any() it should be non-zero
yet, so it's increased now.
Reported-and-tested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In br_fdb_cleanup() next_timer and this_timer are in jiffies, so they
should be compared using the time_after() macro.
Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it>
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race is SCTP between the loading of the module
and the access by the socket layer to the protocol functions.
In particular, a list of addresss that SCTP maintains is
not initialized prior to the registration with the protosw.
Thus it is possible for a user application to gain access
to SCTP functions before everything has been initialized.
The problem shows up as odd crashes during connection
initializtion when we try to access the SCTP address list.
The solution is to refactor how we do registration and
initialize the lists prior to registering with the protosw.
Care must be taken since the address list initialization
depends on some other pieces of SCTP initialization. Also
the clean-up in case of failure now also needs to be refactored.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a rule using ipt_recent is created with a hit count greater than
ip_pkt_list_tot, the rule will never match as it cannot keep track
of enough timestamps. This patch makes ipt_recent refuse to create such
rules.
With ip_pkt_list_tot's default value of 20, the following can be used
to reproduce the problem.
nc -u -l 0.0.0.0 1234 &
for i in `seq 1 100`; do echo $i | nc -w 1 -u 127.0.0.1 1234; done
This limits it to 20 packets:
iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
--rsource
iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
60 --hitcount 20 --name test --rsource -j DROP
While this is unlimited:
iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
--rsource
iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
60 --hitcount 21 --name test --rsource -j DROP
With the patch the second rule-set will throw an EINVAL.
Reported-by: Sean Kennedy <skennedy@vcn.com>
Signed-off-by: Daniel Hokka Zakrisson <daniel@hozac.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
logical-bitwise & confusion
Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
'sync' wakeups are a hint towards the scheduler that (certain)
networking related wakeups likely create coupling between tasks.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The NFSv4 protocol allows clients to negotiate security protocols on the
fly in the case where an administrator on the server changes the export
settings and/or in the case where we may have a filesystem migration event.
Instead of having the NFS client code cache credentials that are tied to a
particular AUTH method it is therefore preferable to have a generic credential
that can be converted into whatever AUTH is in use by the RPC client when
the read/write/sillyrename/... is put on the wire.
We do this by means of the new "generic" credential, which basically just
caches the minimal information that is needed to look up an RPCSEC_GSS,
AUTH_SYS, or AUTH_NULL credential.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We need the ability to treat 'generic' creds specially, since they want to
bind instances of the auth cred instead of binding themselves.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add an rpc credential that is not tied to any particular auth mechanism,
but that can be cached by NFS, and later used to look up a cred for
whichever auth mechanism that turns out to be valid when the RPC call is
being made.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The current RPCAUTH_LOOKUP_ROOTCREDS flag only works for AUTH_SYS
authentication, and then only as a special case in the code. This patch
removes the auth_sys special casing, and replaces it with generic code.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
[SCTP]: Fix local_addr deletions during list traversals.
net: fix build with CONFIG_NET=n
[TCP]: Prevent sending past receiver window with TSO (at last skb)
rt2x00: Add new D-Link USB ID
rt2x00: never disable multicast because it disables broadcast too
libertas: fix the 'compare command with itself' properly
drivers/net/Kconfig: fix whitespace for GELIC_WIRELESS entry
[NETFILTER]: nf_queue: don't return error when unregistering a non-existant handler
[NETFILTER]: nfnetlink_queue: fix EPERM when binding/unbinding and instance 0 exists
[NETFILTER]: nfnetlink_log: fix EPERM when binding/unbinding and instance 0 exists
[NETFILTER]: nf_conntrack: replace horrible hack with ksize()
[NETFILTER]: nf_conntrack: add \n to "expectation table full" message
[NETFILTER]: xt_time: fix failure to match on Sundays
[NETFILTER]: nfnetlink_log: fix computation of netlink skb size
[NETFILTER]: nfnetlink_queue: fix computation of allocated size for netlink skb.
[NETFILTER]: nfnetlink: fix ifdef in nfnetlink_compat.h
[NET]: include <linux/types.h> into linux/ethtool.h for __u* typedef
[NET]: Make /proc/net a symlink on /proc/self/net (v3)
RxRPC: fix rxrpc_recvmsg()'s returning of msg_name
net/enc28j60: oops fix
...
The assertion that checks for sge context overflow is
incorrectly hard-coded to 32. This causes a kernel bug
check when using big-data mounts. Changed the BUG_ON to
use the computed value RPCSVC_MAXPAGES.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RDMA connection shutdown on an SMP machine can cause a kernel crash due
to the transport close path racing with the I/O tasklet.
Additional transport references were added as follows:
- A reference when on the DTO Q to avoid having the transport
deleted while queued for I/O.
- A reference while there is a QP able to generate events.
- A reference until the DISCONNECTED event is received on the CM ID
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the lists are circular, we need to explicitely tag
the address to be deleted since we might end up freeing
the list head instead. This fixes some interesting SCTP
crashes.
Signed-off-by: Chidambar 'ilLogict' Zinnoury <illogict@online.fr>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With TSO it was possible to send past the receiver window when the skb
to be sent was the last in the write queue while the receiver window
is the limiting factor. One can notice that there's a loophole in the
tcp_mss_split_point that lacked a receiver window check for the
tcp_write_queue_tail() if also cwnd was smaller than the full skb.
Noticed by Thomas Gleixner <tglx@linutronix.de> in form of "Treason
uncloaked! Peer ... shrinks window .... Repaired." messages (the peer
didn't actually shrink its window as the message suggests, we had just
sent something past it without a permission to do so).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit ce7663d84:
[NETFILTER]: nfnetlink_queue: don't unregister handler of other subsystem
changed nf_unregister_queue_handler to return an error when attempting to
unregister a queue handler that is not identical to the one passed in.
This is correct in case we really do have a different queue handler already
registered, but some existing userspace code always does an unbind before
bind and aborts if that fails, so try to be nice and return success in
that case.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to the nfnetlink_log problem, nfnetlink_queue incorrectly
returns -EPERM when binding or unbinding to an address family and
queueing instance 0 exists and is owned by a different process. Unlike
nfnetlink_log it previously completes the operation, but it is still
incorrect.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When binding or unbinding to an address family, the res_id is usually set
to zero. When logging instance 0 already exists and is owned by a different
process, this makes nfunl_recv_config return -EPERM without performing
the bind operation.
Since no operation on the foreign logging instance itself was requested,
this is incorrect. Move bind/unbind commands before the queue instance
permissions checks.
Also remove an incorrect comment.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a horrible slab abuse in net/netfilter/nf_conntrack_extend.c
that can be replaced with a call to ksize().
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Andrew Schulman <andrex@alumni.utexas.net>
xt_time_match() in net/netfilter/xt_time.c in kernel 2.6.24 never
matches on Sundays. On my host I have a rule like
iptables -A OUTPUT -m time --weekdays Sun -j REJECT
and it never matches. The problem is in localtime_2(), which uses
r->weekday = (4 + r->dse) % 7;
to map the epoch day onto a weekday in {0,...,6}. In particular this
gives 0 for Sundays. But 0 has to be wrong; a weekday of 0 can never
match. xt_time_match() has
if (!(info->weekdays_match & (1 << current_time.weekday)))
return false;
and when current_time.weekday = 0, the result of the & is always
zero, even when info->weekdays_match = XT_TIME_ALL_WEEKDAYS = 0xFE.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is similar to nfnetlink_queue fixes. It fixes the computation
of skb size by using NLMSG_SPACE instead of NLMSG_ALIGN.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Size of the netlink skb was wrongly computed because the formula was using
NLMSG_ALIGN instead of NLMSG_SPACE. NLMSG_ALIGN does not add the room for
netlink header as NLMSG_SPACE does. This was causing a failure of message
building in some cases.
On my test system, all messages for packets in range [8*k+41, 8*k+48] where k
is an integer were invalid and the corresponding packets were dropped.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reinette pointed out that with the sta_info RCU-ification
the behaviour here changed and the conf_tx callback is
now invoked under RCU read lock. That is not necessary so
this patch restores the original behaviour
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings
NFS: Fix the fsid revalidation in nfs_update_inode()
SUNRPC: Fix a nfs4 over rdma transport oops
NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c
Prevent an RPC oops when freeing a dynamically allocated RDMA
buffer, used in certain special-case large metadata operations.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch make use of the network namespace information at the right
places to handle the multicast for several network namespaces. It
makes the socket control to be per namespace too.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have the right network namespace at the right place now.
So make use of this information to make tcp6 per network namespace
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having a tcp6_socket global to all the namespace, there is
tcp6 socket control per namespace. That is consistent with which
namespace sent a RST and allows to pass the socket to the underlying
function to retrieve the network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make ndisc socket control per namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make ndisc handle multiple network namespaces:
Remove references to init_net, add network namespace parameters and add
pernet_operations for ndisc
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit db1ed684f6 ("[IPV6]
UDP: Rename IPv6 UDP files."), commit
8be8af8fa4 ("[IPV4] UDP: Move
IPv4-specific bits to other file.") and commit
e898d4db27 ("[UDP]: Allow users to
configure UDP-Lite.").
First, udplite is of such small cost, and it is a core protocol just
like TCP and normal UDP are.
We spent enormous amounts of effort to make udplite share as much code
with core UDP as possible. All of that work is less valuable if we're
just going to slap a config option on udplite support.
It is also causing build failures, as reported on linux-next, showing
that the changeset was not tested very well. In fact, this is the
second build failure resulting from the udplite change.
Finally, the config options provided was a bool, instead of a modular
option. Meaning the udplite code does not even get build tested
by allmodconfig builds, and furthermore the user is not presented
with a reasonable modular build option which is particularly needed
by distribution vendors.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates TIPC's version number to 1.6.3.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes two enhancements to the routine used to
set bit fields within a TIPC message header:
1) It now ignores any bits of the new field value that are not
covered by the mask being used. (Previously, if the new value
exceeded the size of the mask the extra bits could corrupt
other fields in the message header word being updated.)
2) The code has been optimized to minimize the number of run-time
endianness conversion operations by leveraging the fact that the
mask (and, in some cases, the value as well) is constant and the
necessary conversion can be performed by the compiler.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensures that the 3-bit version field of the TIPC
message header is masked correctly when written into a message.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates some unused or duplicate message header
symbols, and fixes up the comments and/or location of a few
other symbols.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates warnings about undeclared symbols.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch validates that the "how" argument to shutdown()
is SHUT_RDWR, since this is the only form that TIPC supports.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes code associated with optional, user-specified
fields of the TIPC message header. Such fields were never
utilized by TIPC, and have now been removed from the protocol
specification.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mac80211 MLME requires restarting timers after a scan
completes but this wasn't done when hardware scan offload
was added, so add it now.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Bill Moss <bmoss@clemson.edu>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Skip properly entries whose dev does not match.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Postponing the deletion is not really useful anymore.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This avoids dereferencing a no longer existing struct mesh_path.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Pointed out by Johannes Berg.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes all the mesh related endianness warnings reported by sparse. As
they were the reason why Johannes marked mesh as BROKEN, that flag has been
removed.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Today I hit one of my new WARN_ONs in the mac80211 code because
a key wasn't being freed correctly. After wondering for a while
I finally tracked it to the fact that STA keys aren't added to
the per-sdata key list correctly, they are supposed to always be
on that list, not just for default keys. This patch fixes that.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
I noticed a bug I introduced when mesh is enabled: sta_info_destroy()
will end up calling cancel_timer() on a timer that has never been
initialized because the timer is only initialized in mesh_plink_alloc(),
not in sta_info_alloc(). This patch moves the initialization of all mesh
related fields into sta_info_alloc(), adds a bit of sanity checking to
the cfg80211 handlers and sta_info_insert() and makes mesh_plink_alloc()
a static helper function that is only used from the mesh plink code.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Quite a while ago I started this book. The required kernel-doc
patches have since gone into the tree so it is now possible to
build the book in mainline.
The actual documentation is still rather incomplete and not all
things are linked into the book, but this enables us to edit
the documentation collaboratively, hopefully driver authors can
add documentation based on their experience with mac80211.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Luis pointed out that this path is going to be freed right
away anyway so there's no point in assigning next_hop.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When we take down an interface, we need to remove the STA info
items that belong to it because otherwise we might invoke a
sta_notify() callback in the driver when we later delete the
STA entries, but in that case the driver will already have
removed its knowledge of the interface they belonged to leading
to confusion. Also, we could invoke the set_tim() callback after
the driver removed its knowledge of the interface, which can
lead to a crash if it requests a beacon with a then-invalid vif
pointer!
A side effect of this patch is that, because it was easier, it
disallows changing the WDS peer while an interface is up. Should
that actually be necessary, it can be added back, but the WDS
peer STA entry may not be added while the interface is UP so for
now I've simplified the WDS peer's STA entry lifetime management.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch cleans up the sta_info struct and documents how
each set of variables is locked. Notably, flags locking is
completely missing. It also adds kernel-doc for some (but
not all yet) members of the struct.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
sta_info_add() has two functions: allocating a station info
structure and inserting it into the hash table/list. Splitting
these two functions allows allocating with GFP_KERNEL in many
places instead of GFP_ATOMIC which is now required by the RCU
protection. Additionally, in many places RCU protection is now
no longer needed at all because between sta_info_alloc() and
sta_info_insert() the caller owns the structure.
This fixes a few race conditions with setting initial flags
and similar, but not all (see comments in ieee80211_sta.c and
cfg.c). More documentation on the existing races will be in
a follow-up patch.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This makes access to the STA hash table/list use RCU to protect
against freeing of items. However, it's not a true RCU, the
copy step is missing: whenever somebody changes a STA item it
is simply updated. This is an existing race condition that is
now somewhat understandable.
This patch also fixes the race key freeing vs. STA destruction
by making sure that sta_info_destroy() is always called under
RTNL and frees the key.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Split it into ieee80211_tx_data and ieee80211_rx_data to clarify
usage/flag usage and remove the stupid union thing.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Three __le16s followed by an enum (int) leave a two-byte hole
of padding which we can use for two of the other fields.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Accidentally copied in a __mesh_plink_deactivate, noticed by Luis
Carlos Cobo.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Moves another ifdef into the sta_info header file in favour of
compiling more code even w/o CONFIG_MAC80211_MESH.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This needs to be exported because rate control algorithms
can be modular.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This clarifies that the mesh networking code is currently
based on Draft 1.08 of the 802.11 Mesh Networking amendment.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This inserts a missing break statement which, if hit, would cause
the code to fall-through and unlock a spinlock twice. Noticed via
sparse's "lock count wrong in basic block" warning and careful
code inspection.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Currently marked BROKEN because of endianness problems.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes missing unlocks noticed by sparse.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Various cleanups, reducing the #ifdef mess and other things.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This completes the mesh interface handling code and a few other
bits about the mac80211 module.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds code to allow adding mesh interfaces and configuring
mesh peers etc. Also, it adds code for station dumping.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch contains the debugfs code for mesh statistics and configuration
parameters. Please note that generic support for r/w debugfs attributes has been
added.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This file implements the on-demand Hybrid Wireless Mesh Protocol, at this moment
using hop-count as the metric. When no mesh path exists for a given destination
or the mesh path is not active, frames addressed to that destination will be
queued and a Path Request frame will be sent. Queued frames will be sent when
the path is resolved (usually after reception of a Path Response) or discarded
if discovery times out. Path Requests will also be sent to refresh paths that
are being used and are close to expiring.
Path Errors are sent when a path discovery process triggered by the attempt to
forward a frame originated in a different mesh point times out. Path Errors are
also sent when a peer link is determined to be unreachable because of high error
rates.
Multiple destination support in Path Requests and Path Errors and precursors
have not been implemented yet.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The mesh path table associates destinations with the next hop to reach them. The
table is a hash of linked lists protected by rcu mechanisms. Every mesh path
contains a lock to protect the mesh path state.
Each outgoing mesh frame requires a look up into this table. Therefore, the
table it has been designed so it is not necessary to hold any lock to find the
appropriate next hop.
If the path is determined to be active within a rcu context we can safely
dereference mpath->next_hop->addr, since it holds a reference to the sta
next_hop. After a mesh path has been set active for the first time it next_hop
must always point to a valid sta. If this is not possible the mpath must be
deleted or replaced in a RCU safe fashion.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This file implements mesh discovery and peer link establishment support using
the mesh peer link table provided in mesh_plinktbl.c.
Secure peer links have not been implemented yet.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This includes support for mesh network scanning. The ugly code in
ieee80211_sta_scan_result() is my approach to work around wext. This has been
tested with wireless tools version 29 and works as expected (the new interface
mode is just not shown).
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Includes integration in struct sta_info of mesh peer link elements, previously
on their own mesh peer link table.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This changes the TX/RX paths in mac80211 to support mesh interfaces.
This code will be cleaned up later again before being enabled.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The two important features coded in mesh.c are:
Recently Multicast Cache: in on-demand HWMP, multicast traffic is retransmitted
by every receiving node. Even though a mesh TTL counter avoids infinite loops,
it is also necessary to avoid traffic explosion by keeping a cache of multicast
mesh frame that have been received recently. With this feature, maximum number
of retransmissions of a multicast frame for the case of N nodes within the range
of each other would be N. Without it, the maximum number of retransmissions
would be in the order of N^(MESH_TTL - 1).
Code to support mesh tables.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Added support for mesh id and mesh path operation as well as
station structure dumping.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix rxrpc_recvmsg() to return msg_name correctly. We shouldn't
overwrite the *msg struct, but should rather write into msg->msg_name
(there's a '&' unary operator that shouldn't be there).
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnep_sock_cleanup() always returns 0 and its return value isn't used
anywhere in the code.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
hci_sock_cleanup() always returns 0 and its return value isn't used
anywhere in the code.
Compile-tested with 'make allyesconfig && make net/bluetooth/bluetooth.ko'
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
(Anonymous) unions can help us to avoid ugly casts.
A common cast it the (struct rtable *)skb->dst one.
Defining an union like :
union {
struct dst_entry *dst;
struct rtable *rtable;
};
permits to use skb->rtable in place.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brings max_burst socket option set/get into line with the latest ietf
socket extensions api draft, while maintaining backwards
compatibility.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an address family is not listed in "Supported Address Types"
parameter(INIT Chunk), but the packet is sent by that family, this
address family should be considered as supported by peer. Otherwise,
an error condition will occur. For instance, if kernel receives an
IPV6 SCTP INIT chunk with "Support Address Types" parameter which
indicates just supporting IPV4 Address family. Kernel will reply an
IPV6 SCTP INIT ACK packet, but the source ipv6 address in ipv6 header
will be vacant. This is not correct.
refer to RFC4460 as following:
IMPLEMENTATION NOTE: If an SCTP endpoint lists in the 'Supported
Address Types' parameter either IPv4 or IPv6, but uses the other
family for sending the packet containing the INIT chunk, or if it
also lists addresses of the other family in the INIT chunk, then
the address family that is not listed in the 'Supported Address
Types' parameter SHOULD also be considered as supported by the
receiver of the INIT chunk. The receiver of the INIT chunk SHOULD
NOT respond with any kind of error indication.
Here is a fix to comply to RFC.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch make the changes necessary to support network namespaces in
ICMPv6.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The different subsystem of ipv6 are ready for namespaces, so let's
activate it for ipv6_rcv.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ip6_dst_lookup receive a socket as parameter. In some part of the code
it is called with a NULL socket parameter. We want to rely on the socket
to retrieve the network namespace, so we always pass a valid socket in all
cases.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an netns parameter to ip6_route_output. That will allow to access
to the right routing table for outgoing traffic.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the infrastructure to propagate the network namespace information
is ready. Make use of it.
There is a special case here between the initial network namespace and
the other namespaces:
* When ipv6 is initialized at boot time (aka in the init_net), it
registers to the notifier callback. So addrconf_notify will be called
as many time as there are network devices setup on the system and the
function will add ipv6 addresses to the network devices. But the first
device which needs to have its ipv6 address setup is the loopback,
unfortunatly this is not the case. So the loopback address is setup
manually in the ipv6 init function.
* With the network namespace, this ordering problem does not appears
because notifier is already setup and active, so as soon as we
register the loopback the ipv6 address is setup and it will be the
first device.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch propagates the network namespace pointer to the address
configuration routines which need it, which means adding a new
parameter to these functions, and make them use it instead of using
the initial network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset avoids creation of the /proc entry for snmp6 when
the call is made from a network namespace different from the init_net.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow creation of IPv6 raw and datagram sockets in network namespaces
other than init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves initialization of IPv6 sysctl stuff at the end of
IPv6 initialization.
This will be helpful for network namespaces where some sysctl entries
depend on per-namespace variables, that need to be allocated and
initialized before they are referenced by sysctl.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Stephen Hemminger <shemminger@linux-foundation.org>
Based upon a patch by Marcel Wappler:
This patch fixes a DHCP issue of the kernel: some DHCP servers
(i.e. in the Linksys WRT54Gv5) are very strict about the contents
of the DHCPDISCOVER packet they receive from clients.
Table 5 in RFC2131 page 36 requests the fields 'ciaddr' and
'siaddr' MUST be set to '0'. These DHCP servers ignore Linux
kernel's DHCP discovery packets with these two fields set to
'255.255.255.255' (in contrast to popular DHCP clients, such as
'dhclient' or 'udhcpc'). This leads to a not booting system.
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge rate_control_pid_shift_adjust() to rate_control_pid_adjust_rate()
in order to make the learning algorithm aware of constraints on rates. Also
add some comments and rename variables.
This fixes a bug which prevented 802.11b/g non-AP STAs from working with
802.11b only AP STAs.
This patch was originally destined for 2.6.26, and is being backported
to fix a user reported problem in post-2.6.24 kernels.
Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Now the ESP uses the AEAD interface even for algorithms which are
not combined mode, we need to select CONFIG_CRYPTO_AUTHENC as
otherwise only combined mode algorithms will work.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patches improves the readibility of the ip6_dst_gc() routine.
It simplifies long lines which grow a lot due to the introduction
of network namespaces support.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the necessary changes to make IPv6 dst_entry garbage
collection work with multiple network namespaces.
In ip6_dst_gc(), static local variables are now declared
per-namespace.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ip6_dst_ops is moved inside the network namespace structure. All
references to this structure are now relative to the initial network
namespace.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_dst_ops is dynamically allocated in init and exit functions. That
provides the ability to do multiple instanciations of this structure.
This will be needed for network namespaces, indeed dst_ops stores data
that are required to be per namespace: entries and gc_thresh.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rt6_info structures are moved inside the network namespace
structure. All references to these structures are now relative to the
initial network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch make mindless changes and prepares the code to use dynamic
allocation for rt6_info structure. The code accesses the rt6_info
structure as a pointer instead of a global static variable.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the routing engine use the network namespaces to
access routing informations: Add a network namespace parameter to
ipv6_route_ioctl and propagate the network namespace value to all the
routing code that have not yet been changed.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a network namespace parameter to rt6_purge_dflt_routers. This is
needed to call fib6_get_table with the appropriate network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a network namespace parameter to rt6_add_route_info() and
rt6_get_route_info to enable them to handle multiple network
namespaces.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the proc entry /proc/net/rt6_stats work in all network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a network namespace parameter to rt6_lookup().
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make /proc/net/ipv6_route and /proc/net/rt6_stats to be per namespace.
These proc files are now created when the network namespace is
initialized.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based upon a report by Andrew Morton and code analysis done
by Jarek Poplawski.
This reverts 33f807ba0d ("[NETPOLL]:
Kill NETPOLL_RX_DROP, set but never tested.") and
c7b6ea24b4 ("[NETPOLL]: Don't need
rx_flags.").
The rx_flags did get tested for zero vs. non-zero and therefore we do
need those tests and that code which sets NETPOLL_RX_DROP et al.
Signed-off-by: David S. Miller <davem@davemloft.net>
Stop dumping of entries when af_key socket receive queue is getting
full and continue it later when there is more room again.
This fixes dumping of large databases. Currently the entries not
fitting into the receive queue are just dropped (including the
end-of-dump message) which can confuse applications.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
The semaphore tsock->sem is used as mutex, convert it to the mutex API
Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rt6_stats is now per namespace with this patch. It is allocated
when a network namespace is created and freed when the network
namespace exits and references are relative to the network namespace.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allocates the rt6_stats struct dynamically when the fib6 is
initialized. That provides the ability to create several instances of
this structure for the network namespaces.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fib6_rules_ops is moved to the network namespace structure. All
references are changed to have it relatively to it.
Each time a network namespace is created a new fib6_rules_ops is
allocated, initialized and stored into the network namespace
structure.
The common part of the fib rules is namespace aware, so it is quite
easy to retrieve the network namespace from the rules and use it in
the different callbacks.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fib6_rules_ops structure is dynamically allocated, so that allows
to make several instances of it per network namespace.
The global static fib6_rules_ops structure is renamed to
fib6_rules_ops_template in order to quickly memcopy it for the
structure initialization.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fib6_clean_node function should have the network namespace it is
working on. The fib6_cleaner_t structure is extended with the network
namespace field to be passed to the fib6_clean_node function.
The different functions calling the fib6_clean_node function are
extended with the netns parameter when needed to propagate the netns
pointer.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the timer initialization at the network namespace creation and
store the network namespace in the timer argument.
That enables multiple timers (one per network namespace) to do garbage
collecting.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ip6_fib_timer gc timer is dynamically allocated and initialized in
the ip6 fib init function. There are no more references to a static
global variable. That will allow to make multiple instance of the
garbage collecting timer and make them per namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fib tables are now relative to the network namespace. When the
garbage collector timer expires, we must have a network namespace
parameter in order to retrieve the tables. For now this is the
init_net, but we should be able to have a timer per namespace and use
the timer callback parameter to pass the network namespace from the
expired timer.
The timer callback, fib6_run_gc, is actually used to be called
synchronously by some functions and asynchronously when the timer
expires.
When the timer expires, the delay specified for fib6_run_gc parameter
is always zero. So, I changed fib6_run_gc to not be a timer callback
but a function called by the timer callback and I added a timer
callback where its work is just to retrieve from the data arg of the
timer the network namespace and call fib6_run_gc with zero expiring
time and the network namespace parameters. That makes the code cleaner
for the fib6_run_gc callers.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function fib6_clean_all takes the network namespace as
parameter. That allows to flush the routes related to a specific
network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fib table for ipv6 are moved to the network namespace structure.
All references to them are made relatively to the network namespace.
All external calls to the ip6_fib functions taking the network
namespace parameter are made using the init_net variable, so the
ip6_fib engine is ready for the namespaces but the callers not yet.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the fib6 tables to be dynamically allocated. That
provides the ability to make several instances of them when a new
network namespace is created.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is changing the paths for sending MLD/MLDv2 messages
from dev_queue_xmit() to standard dst_output().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
For later use, this patch is renaming ndisc_dst_alloc()
(and related function/structures) to icmp6_dst_alloc()
(and so on). This patch also removing unused function-
pointer argument for it.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
For later use, this patch is renaming ndisc_flow_init() to
icmpv6_flow_init() and putting it in common place.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Since most users of ipv6_get_saddr() pass non-NULL as
dst argument, use ipv6_dev_get_saddr() directly.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Updated to incorporate Eric's suggestion of using a per cpu buffer
rather than allocating on the stack. Just a two line change, but will
resend in it's entirety.
Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
400 bytes allocated on stack might be a litle bit too much. Using a
per_cpu var is more friendly.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
There are some place, that calculate the ARP header length. These
calculations are correct, but
a) some operate with "magic" constants,
b) enlarge the code length (sometimes at the cost of coding style),
c) are not informative from the first glance.
The proposal is to introduce a helper, that includes all the good
sides of these calculations.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the l2cap info_timer is active the info_state will be set to
L2CAP_INFO_FEAT_MASK_REQ_SENT, and it will be unset after the timer is
deleted or timeout triggered.
Here in l2cap_conn_del only call del_timer_sync when the info_state is
set to L2CAP_INFO_FEAT_MASK_REQ_SENT.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
neigh_update sends skb from neigh->arp_queue while neigh_timer_handler
has increased skbs refcount and calls solicit with the
skb. neigh_timer_handler should not increase skbs refcount but make a
copy of the skb and do solicit with the copy.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since a5fbb6d106
"KVM: fix !SMP build error" smp_call_function isn't a define anymore
that folds into nothing but a define that calls up_smp_call_function
with all parameters. Hence we cannot #ifdef out the unused code
anymore...
This seems to be the preferred method, so do this for s390 as well.
net/iucv/iucv.c: In function 'iucv_cleanup_queue':
net/iucv/iucv.c:657: error: '__iucv_cleanup_queue' undeclared
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It makes fackets_out to grow too slowly compared with the
real write queue.
This shouldn't cause those BUG_TRAP(packets <= tp->packets_out)
to trigger but how knows how such inconsistent fackets_out
affects here and there around TCP when everything is nowadays
assuming accurate fackets_out. So lets see if this silences
them all.
Reported by Guillaume Chazarain <guichaz@gmail.com>.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_association->hbinterval is unsigned long. Replace %8d with %8lu.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_options_echo is called on the packet input path after the initial
routing. The dst entry on the packet is cleared only in the several
very specific places and immidiately assigned back (may be new).
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Functions from __exit section should not be called from ones in __init
section. Fix this conflict.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even though I thought about it a lot and had also tested it, some
of my recent changes in the key code broke replacing keys, making
the kernel oops because a key is removed from a list while not on
it.
This patch fixes that using the list as an indication whether or
not the key is on it (an empty list means it's not on any list.)
Also, this patch fixes hw accel enabling, the check for not doing
hw accel when the interface is down was lost and is restored by
this.
Additionally, move adding the key to the list into the function
__ieee80211_key_replace() for more consistency.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In order to RCU-ify sta_info, we need to be able to allocate
a key without linking it to an sdata/sta structure (because
allocation cannot be done in an rcu critical section). This
patch splits up ieee80211_key_alloc() and updates all users
appropriately.
While at it, this patch fixes a number of race conditions
such as finally making key replacement atomic, unfortunately
at the expense of more complex code.
Note that this patch documents /existing/ bugs with sta info
and key interaction, there is currently a race condition
when a sta info is freed without holding the RTNL. This will
finally be fixed by a followup patch.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
These things aren't used and the only possible use is within
rate control algorithms, however those can, if they need it,
keep track of it in their private data. last_ack_ms isn't
even updated so completely useless.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If ieee80211_if_reinit() is called from ieee80211_unregister_hw()
then it is possible that the driver will still request a beacon
(it is allowed to until ieee80211_unregister_hw() has returned.)
This means we need to use an RCU-protected write to the beacon
information even in this function.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes two errors introduced by
commit 19d35612f3cd7f60dd9174c0100584e21f5a1025
Author: Bruno Randolf <bruno@thinktube.com>
Date: Mon Feb 18 11:21:36 2008 +0900
mac80211: enable IBSS merging
The first error is an endianness problem that sparse found and
the second is a build failure when CONFIG_MAC80211_IBSS_DEBUG
is not set.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When print_mac() was marked as __pure to avoid emitting a function
call in pr_debug() scenarios, a warning in this code surfaced since
it relies on the fact that the buffer is modified and doesn't use
the return value. This patch makes it use the return value instead.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Disallow having more than one IBSS interface up at any time
because of beacon distribution issues, and for now also disallow
having more than one IBSS/STA interface up at the same time
because we use the master interface's BSS struct which would
be completely corrupted when we have more than one up.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When a STA structure is added, it is often checked whether it
already exists before adding it. This, however, isn't done
atomically so there is a race condition that could lead to two
STA structures being added with the same MAC address. This
patch changes sta_info_add() to return an ERR_PTR in case
of failure and adds the failure mode -EEXIST when the STA
already exists.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reworks the code for TX filtered frames, splitting it out to
a new function to handle those cases, making the clear instruction
a flag and renaming a few things to be easier to understand and
less Atheros hardware specific. Finally, it also makes the comments
explain more.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Configuration variables are only available to the preprocessor
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This consolidates all TIM handling code to avoid re-introducing
errors with the bitmap/set_tim order and to reduce code. While
reading the code I noticed a possible problem so I also added
a comment about that.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The TIM flag that is kept in each station's info is completely
useless, there's no code (aside from the debugfs display code)
checking it, hence it can be removed. While doing that, I noticed
that the TIM handling is broken when buffered frames expire, so
fix that.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Drivers should be allowed to simply get a complete new beacon when
set_tim() is invoked (and set_tim() is required for drivers that
just want a beacon template!), so we need to update our own TIM
bitmap before calling set_tim() so that getting the beacon will
now get an already updated beacon.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds channels to US regulatory domain
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This doesn't really need to be a full int variable since it's
just a flag to indicate a PS-poll is in progress.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
enable IBSS cell merging. if an IBSS beacon with the same channel, same ESSID
and a TSF higher than the local TSF (mactime) is received, we have to join its
BSSID. while this might not be immediately apparent from reading the 802.11
standard it is compliant and necessary to make IBSS mode functional in many
cases. most drivers have a similar behaviour.
* move the relevant code section (previously only containing debug code) down
to the end of the function, so we can reuse the bss structure.
* we have to compare the mactime (TSF at the time of packet receive) rather
than the current TSF. since mactime is defined as the time the first data
symbol arrived we add the time until byte 24 where the timestamp resides, since
this is how the beacon timestamp is defined. as some some drivers are not able
to give a reliable mactime we fall back to use the current TSF, which will be
enough to catch most (but not all) cases where an IBSS merge is necessary.
* in IBSS mode we want to allow beacons to override probe response info so we
can correctly do merges.
* we don't only configure beacons based on scan results, so change that
message.
* to enable this we have to let all beacons thru in IBSS mode, even if they
have a different BSSID.
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
this moves ieee80211_sta_join_ibss() up for the next patch (ibss merge).
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values.
So following patch implements usage of the time_after() macro, defined at linux/jiffies.h, which deals with wrapping correctly
Cc: linux-wireless@vger.kernel.org
Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This brain-damaged code just bothers me, fix it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This changes mac80211 to pass the burst time to conf_tx in txop
units rather than 0.1msec units. 0.1msec units are only required
by atheros hardware (according to current driver support), all
other drivers do other calculations or require the txop value.
Therefore, it results in fewer calculations and more precision
if we just pass the txop value through to the driver.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When we want to go multiqueue, we will need to know the number of
queues the hardware has for registering the master netdev. This
number is only available in ieee80211_register_hw() rather than
ieee80211_alloc_hw(), so defer allocation of the master device to
ieee80211_register_hw().
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fix allows to control the number of bits that qdiscs book keeping
can be done for with respect to the qdisc pool
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds "cooked" monitor mode to mac80211. A monitor interface
in "cooked" mode will see all frames that mac80211 has not used
internally.
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There is some duplicated code that sits in front of each function
call to ieee80211_invoke_rx_handlers() that can very well be part
of that function if it gets slightly different arguments.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It doesn't really make sense to have extra pointers to the RX/TX
handler arrays instead of just using the arrays directly, that
also allows us to make them static.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Uninline ieee80211_invoke_rx_handlers to save .text space,
make the code more readable in some places and remove the
"optimisation" that is hit only very few times and unclear
to start with.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Take advantage of the monitor configuration flags now provided by cfg80211.
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This allows precise control over what a monitor interface shows.
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some instances of RX_DROP mean that the frame was useless,
others mean that the frame should be visible in userspace
on "cooked" monitor interfaces. This patch splits up RX_DROP
and changes each instance appropriately.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The _DROP result will need to be split in the RX path but not
in the TX path, so for preparation split up the type into two
types, one for RX and one for TX. Also make sure (via sparse)
that they cannot be confused.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the driver registers a IEEE80211_BAND_2GHZ band,
it can either be 802.11b or 802.11g. But when 802.11b rates
are registered "want" will be 3 (since 4 rates are being registered,
and each of those 4 rates will decrease "want").
Since this is a correct situation, there is no need to trigger
a WARN_ON() for this.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In the rate API patch I accidentally reverted the test for
ERP rates, this fixes it. All rates except 1, 2, 5.5 and 11
MBit are ERP rates, not those.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Merge rate_control_pid_shift_adjust() to rate_control_pid_adjust_rate()
in order to make the learning algorithm aware of constraints on rates. Also
add some comments and rename variables.
This fixes a bug which prevented 802.11b/g non-AP STAs from working with
802.11b only AP STAs.
Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch removes the 802.1X port acess control enable flag
since it is not required. Instead, set the authorized flag for
each station that we normally communicate with (WDS peers, IBSS
peers and APs we're associated to) and require hostapd to set
the authorized flag for all stations when port control is not
enabled.
Also, since I was working in that area, this documents station
flags and removes the unused "permanent" one.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When checking for the next band to advance to, there
was an off-by-one error that could lead to an access
to an invalid array index. Additionally, the later
check for scan_band >= IEEE80211_NUM_BANDS is not
required since that will never be true.
This also improves the comments related to that code.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This makes nl80211 export the hardware bitrate/channel capabilities
as registered in a wiphy.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch creates new cfg80211 wiphy API for channel and bitrate
registration and converts mac80211 and drivers to the new API. The
old mac80211 API is completely ripped out. All drivers (except ath5k)
are updated to the new API, in many cases I expect that optimisations
can be done.
Along with the regulatory code I've also ripped out the
IEEE80211_HW_DEFAULT_REG_DOMAIN_CONFIGURED flag, I believe it to be
unnecessary if the hardware simply gives us whatever channels it wants
to support and we then enable/disable them as required, which is pretty
much required for travelling.
Additionally, the patch adds proper "basic" rate handling for STA
mode interface, AP mode interface will have to have new API added
to allow userspace to set the basic rate set, currently it'll be
empty... However, the basic rate handling will need to be moved to
the BSS conf stuff.
I do expect there to be bugs in this, especially wrt. transmit
power handling where I'm basically clueless about how it should work.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
These handlers do not really return a status and the compiler
can do a much better job when they're simply static functions
that it can inline if appropriate. Also makes the code shorter.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds the ability to handle delBA from recipient to initiator
during an A-MPDU session
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds A-MPDU status report per STA to the debugfs.
The option to de/activate A-MPDU through debugfs is also present.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch initialize A-MPDU MLME data for Tx sessions.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch allows qdisc support in A-MPDU Tx. a method to
handle QoS <-> TID switches is present in this patch.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds the needed structures to describe the Tx aggregation MLME
per STA
new:
- struct tid_ampdu_tx: TID aggregation information (Tx)
changed:
- struct sta_ampdu_mlme: Tx aggregation information per TID and
dialog token creator were added
- struct sta_info: tid_to_tx_q added for tid<->tx queue mapping
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds the API for 3 stages in A-MPDU Tx session flow:
- request mac80211 to start/stop A-MPDU Tx session for specific TID. such a
request should be issued by a load aware element, either mac80211 itself
or external element.
- requests by mac80211 to low-level driver to start/stop Tx aggregation.
notice that low level driver responds now with Starting Sequence Number.
- async feedback by low-level to mac80211 to inform that HW is ready for
next A-MPDU Tx state.
Changes in API to Rx A-MPDU were also made, reflected in iwlwifi changes as
well.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When reworking the port access control code, I forgot multicast frames
and those are now always rejected because the destination station is
not known. This changes the code to allow through multicast frames and
also avoid the sta hash lookup (which is bound to fail) for them.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Only BSS_CHANGED_ASSOC was set in the 'changed' bitmask. Assignment to
bss_conf.assoc was absent.
This patch assign value to bss_conf.assoc according the association state.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The seq_file_operations' dev_mc_seq_xxx callbacks do the same thing as
the dev_seq_xxx ones do, but skip the SEQ_START_TOKEN.
So use the existing exported dev_seq_xxx calls and handle the
SEQ_START_TOKEN in the dev_mc_seq_show().
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like dst parameter is used in this API due to historical
reasons. Actually, it is really used in the direct call to
tcp_v4_send_synack only. So, create a wrapper for tcp_v4_send_synack
and remove dst from rtx_syn_ack.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 3873 specifies several MIB objects that can't be obtained by the
current data set exported by /proc/sys/net/sctp/assoc. This patch
adds the missing pieces of data that allow us to compute all the
objects in the sctpAssocTable object.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some netfilter code and rxrpc one use seq_open() to open
a proc file, but seq_release_private to release one.
This is harmless, but ambiguous.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These two also perform manual seq_open_private, so patch them both at
once. But unlike ATM code, these already use the seq_release_private,
so I splitted this patch from the previous one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
lec_seq_open/lec_seq_release and __vcc_seq_open/vcc_seq_release
do seq_open/release_private's job.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In addition to commit 160f17 ("[SCTP]: Use proc_create() to setup
->proc_fops first") use proc_create in two more places.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All preparations are done. Now just add a hook to perform an
initialization on namespace startup and replace icmpv6_sk macro with
proper inline call. Actual namespace the packet belongs too will be
passed later along with the one for the routing.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All preparations are done. Now just add a hook to perform an
initialization on namespace startup and replace icmp_sk macro with
proper inline call.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So, change icmp(v6)_sk creation/disposal to the scheme used in the
netlink for rtnl, i.e. create a socket in the context of the init_net
and assign the namespace without getting a referrence later.
Also use sk_release_kernel instead of sock_release to properly destroy
such sockets.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This staff will be needed for non-netlink kernel sockets, which should
also not pin a namespace like tcp_socket and icmp_socket.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge it to netlink_kernel_release.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Own __icmp(v6)_sk should be present in each namespace. So, it should be
allocated dynamically. Though, alloc_percpu does not fit the case as it
implies additional dereferrence for no bonus.
Allocate data for pointers just like __percpu_alloc_mask does and place
pointers to struct sock into this array.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to get socket lock inside icmp(v6)_xmit_lock/unlock. The socket
is get from global variable now. When this code became namespaces, one
should pass a namespace and get socket from it.
Though, above is useless. Socket is available in the caller, just pass
it inside. This saves a bit of code now and saves more later.
add/remove: 0/0 grow/shrink: 1/3 up/down: 1/-169 (-168)
function old new delta
icmp_rcv 718 719 +1
icmpv6_rcv 2343 2303 -40
icmp_send 1566 1518 -48
icmp_reply 549 468 -81
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Basically, there is no difference, what to store: socket or sock. Though,
sock looks better as there will be 1 less dereferrence on the fast path.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use this macro only once in a function to save a bit of space.
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-98 (-98)
function old new delta
icmp_reply 562 561 -1
icmp_push_reply 305 258 -47
icmp_init 273 223 -50
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
icmp_init could fail and this is normal for namespace other than initial.
So, the panic should be triggered only on init_net initialization path.
Additionally create rollback path for icmp_init as a separate function.
It will also be used later during namespace destruction.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct net_proto_family* is not used in icmp[v6]_init, ndisc_init,
igmp_init and tcp_v4_init. Remove it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we've tightened up the locking rules for RPC queue wakeups, we can
remove the RCU-safe kfree calls...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This is designed to replace the timeout timer in the individual rpc_tasks.
By putting the timer function in the wait queue, we will eventually be able
to reduce the total number of timers in use by the RPC subsystem.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Change xfrm_policy and xfrm_state walking algorithm from O(n^2) to O(n).
This is achieved adding the entries to one more list which is used
solely for walking the entries.
This also fixes some races where the dump can have duplicate or missing
entries when the SPD/SADB is modified during an ongoing dump.
Dumping SADB with 20000 entries using "time ip xfrm state" the sys
time dropped from 1.012s to 0.080s.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the no longer used
EXPORT_SYMBOL_GPL(ip6_find_1stfragopt).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Skip the prefix length matching in source address selection for
orchid -> non-orchid addresses.
Overlay Routable Cryptographic Hash IDentifiers (RFC 4843,
2001:10::/28) are currenty not globally reachable. Without this
check a host with an ORCHID address can end up preferring those over
regular addresses when talking to other regular hosts in the 2001::/16
range thus breaking non-orchid connections.
Signed-off-by: Juha-Matti Tapio <jmtapio@verkkotelakka.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new label for Overlay Routable Cryptographic Hash Identifiers
(RFC 4843) prefix 2001:10::/28 to help proper source address
selection.
ORCHID addresses are used by for example Host Identity Protocol. They are
global and routable, but they currently need support from both endpoints
and therefore mixing regular and ORCHID addresses for source and
destination is a bad idea in general case.
Signed-off-by: Juha-Matti Tapio <jmtapio@verkkotelakka.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The context is available from a network device passed in.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add namespace parameter to devinet_ioctl and locate device inside it for
state changes.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Show routing cache for a particular namespace only.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the other case /proc/net/rt_cache will look inconsistent in respect to
genid.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device inside the namespace can be started and downed. So, active routing
cache should be cleaned up on device stop.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After all these preparations it is time to enable main IPv4 device
initialization routine inside namespace. It is safe do this now.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not calls hooks from device notifiers and disallow configuration from
ioctl/netlink layer.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Default ARP parameters should be findable regardless of the context.
Required to make inetdev_event working.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
neigh_sysctl_register should register sysctl entries inside correct namespace
to avoid naming conflict. Typical example is a loopback. Entries for it
present in all namespaces.
Required to make inetdev_event working.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_fib_init is kept enabled. It is already namespace-aware.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
They do exactly the same job.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a callback registered to inet address notifier chain.
The check is useless as:
- ifa->ifa_dev is always != NULL
- similar checks are abscent in all other notifiers.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new SCTP socket api (draft 16) updates the AUTH API structures.
We never exported these since we knew they would change.
Update the rest to match the draft.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The chunks are stored inside a parameter structure in the kernel
and when we copy them to the user, we need to account for
the parameter header.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
I noticed while looking into some odd behavior in sctp, that the variable
name sctp_pf_inet6_specific was used twice to represent two different
pieces of data (its both a structure name and a pointer to that type of
structure), which is confusing to say the least, and potentially dangerous
depending on the variable scope. This patch cleans that up, and makes the
protocol and address family registration names in SCTP more regular,
increasing readability.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
ipv6.c | 12 ++++++------
protocol.c | 12 ++++++------
2 files changed, 12 insertions(+), 12 deletions(-)
As Davem mentioned in his recently patch
(d9595a7b9c)
that the procfs visibility should occur after
the ->proc_fops are setup.
And also, Alexey provide proc_create() to make
sure that ->proc_fops is setup before gluing PDE
to main tree.
We use proc_create().
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because we use shared tfm objects in order to conserve memory,
(each tfm requires 128K of vmalloc memory), BH needs to be turned
off on output as that can occur in process context.
Previously this was done implicitly by the xfrm output code.
That was lost when it became lockless. So we need to add the
BH disabling to IPComp directly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "goto end;" part definitely must not be rate limited.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
sctp_assoc_change notification may contain the data from a received
ABORT chunk. Set the length correctly to account for that.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we're using RCU for the conntrack hash now, we need to avoid
getting preempted or interrupted by BHs while changing the stats.
Fixes warning reported by Tilman Schmidt <tilman@imap.cc> when using
preemptible RCU:
[ 48.180297] BUG: using smp_processor_id() in preemptible [00000000] code: ntpdate/3562
[ 48.180297] caller is __nf_conntrack_find+0x9b/0xeb [nf_conntrack]
[ 48.180297] Pid: 3562, comm: ntpdate Not tainted 2.6.25-rc2-mm1-testing #1
[ 48.180297] [<c02015b9>] debug_smp_processor_id+0x99/0xb0
[ 48.180297] [<fac643a7>] __nf_conntrack_find+0x9b/0xeb [nf_conntrack]
Tested-by: Tilman Schmidt <tilman@imap.cc>
Tested-by: Christian Casteyde <casteyde.christian@free.fr> [Bugzilla #10097]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In error path, we do need to free memory just allocated.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a
pre-defined name for a device from the userspace. Since these drivers
call the register_netdevice() (rtnl_lock, is held), which does _not_
generate the device's name, this name may contain a '%' character.
Not sure how bad is this to have a device with a '%' in its name, but
all the other places either use the register_netdev(), which call the
dev_alloc_name(), or explicitly call the dev_alloc_name() before
registering, i.e. do not allow for such names.
This had to be prior to the commit 34cc7b, but I forgot to number the
patches and this one got lost, sorry.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make sure the procfs visibility occurs after the ->proc_fs ops are
setup, use proc_net_fops_create() and proc_net_remove().
This also fixes an OOPS after module unload in that the name string
for remove was wrong, so it wouldn't actually be removed. That bug
was introduced by commit 61145aa1a1
("[KEY]: Clean up proc files creation a bit.")
Signed-off-by: David S. Miller <davem@davemloft.net>
This bug did bite at least one user, who did have to resort to rebooting
the system after an "ifconfig eth0 127.0.0.1" typo.
Deleting the address and adding a new is a less intrusive workaround.
But I still beleive this is a bug that should be fixed. Some way or
another.
Another possibility would be to remove the scope mangling based on
address. This will always be incomplete (are 127/8 the only address
space with host scope requirements?)
We set the scope to RT_SCOPE_HOST if an IPv4 interface is configured
with a loopback address (127/8). The scope is never reset, and will
remain set to RT_SCOPE_HOST after changing the address. This patch
resets the scope if the address is changed again, to restore normal
functionality.
Signed-off-by: Bjorn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some more missing initializations of the new nl_info.nl_net field
in IPv6 stack. This field will be used when network namespaces are
fully supported.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Delete a possibly armed timer before kfree'ing the connection object.
Solves: http://lkml.org/lkml/2008/2/15/514
Reported-by:Quel Qun <kelk1@comcast.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
An audit of the current RPC timeout functions shows that they don't really
ever need to run in the softirq context. As long as the softirq is
able to signal that the wakeup is due to a timeout (which it can do by
setting task->tk_status to -ETIMEDOUT) then the callback functions can just
run as standard task->tk_callback functions (in the rpciod/process
context).
The only possible border-line case would be xprt_timer() for the case of
UDP, when the callback is used to reduce the size of the transport
congestion window. In testing, however, the effect of moving that update
to a callback would appear to be minor.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In all cases where we currently use rpc_wake_up_task(), we almost always
know on which waitqueue the rpc_task is actually sleeping. This will allows
us to simplify the queue locking in a future patch.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
All RPC timeout callback functions are expected to wake the task up. We can
enforce this by moving the wakeup back into rpc_run_timer.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A lot of the work done by the rpc_release() callback is inappropriate for
rpciod as it will often involve things like starting a new rpc call in
order to clean up state after an interrupted NFSv4 open() call, or
calls to mntput(), etc.
This patch allows the caller of rpc_run_task() to specify that the
rpc_release callback should run on a different workqueue than the default
rpciod_workqueue.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
net/tipc/cluster.c:145:2: warning: Using plain integer as NULL pointer
net/tipc/link.c:3254:36: warning: Using plain integer as NULL pointer
net/tipc/ref.c:151:15: warning: Using plain integer as NULL pointer
net/tipc/zone.c:85:2: warning: Using plain integer as NULL pointer
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
[NETFILTER]: fix ebtable targets return
[IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.
[NET]: Restore sanity wrt. print_mac().
[NEIGH]: Fix race between neighbor lookup and table's hash_rnd update.
[RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK
tg3: ethtool phys_id default
[BNX2]: Update version to 1.7.4.
[BNX2]: Disable parallel detect on an HP blade.
[BNX2]: More 5706S link down workaround.
ssb: Fix support for PCI devices behind a SSB->PCI bridge
zd1211rw: fix sparse warnings
rtl818x: fix sparse warnings
ssb: Fix pcicore cardbus mode
ssb: Make the GPIO API reentrancy safe
ssb: Fix the GPIO API
ssb: Fix watchdog access for devices without a chipcommon
ssb: Fix serial console on new bcm47xx devices
ath5k: Fix build warnings on some 64-bit platforms.
WDEV, ath5k, don't return int from bool function
WDEV: ath5k, fix lock imbalance
...
The function ebt_do_table doesn't take NF_DROP as a verdict from the targets.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the added dev_alloc_name() call to create tunnel device name,
rather than iterate in a hand-made loop with an artificial limit.
Thanks Patrick for noticing this.
[ The way this works is, when the device is actually registered,
the generic code noticed the '%' in the name and invokes
dev_alloc_name() to fully resolve the name. -DaveM ]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
MAC_FMT had only one user and we tried to get rid of
that, but this created more problems than it solved.
As a result, this reverts three commits:
235365f3aa ("net/8021q/vlan_dev.c: Use
print_mac."), fea5fa875e ("[NET]: Remove
MAC_FMT"), and 8f789c4844 ("[NET]:
Elminate spurious print_mac() calls.")
Signed-off-by: David S. Miller <davem@davemloft.net>
The neigh_hash_grow() may update the tbl->hash_rnd value, which
is used in all tbl->hash callbacks to calculate the hashval.
Two lookup routines may race with this, since they call the
->hash callback without the tbl->lock held. Since the hash_rnd
is changed with this lock write-locked moving the calls to ->hash
under this lock read-locked closes this gap.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTM_NEWLINK allows for already existing links to be modified. For this
purpose do_setlink() is called which expects address attributes with a
payload length of at least dev->addr_len. This patch adds the necessary
validation for the RTM_NEWLINK case.
The address length for links to be created is not checked for now as the
actual attribute length is used when copying the address to the netdevice
structure. It might make sense to report an error if less than addr_len
bytes are provided but enforcing this might break drivers trying to be
smart with not transmitting all zero addresses.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
During the last step of hibernation in the "platform" mode (with the
help of ACPI) we use the suspend code, including the devices'
->suspend() methods, to prepare the system for entering the ACPI S4
system sleep state.
But at least for some devices the operations performed by the
->suspend() callback in that case must be different from its operations
during regular suspend.
For this reason, introduce the new PM event type PM_EVENT_HIBERNATE and
pass it to the device drivers' ->suspend() methods during the last phase
of hibernation, so that they can distinguish this case and handle it as
appropriate. Modify the drivers that handle PM_EVENT_SUSPEND in a
special way and need to handle PM_EVENT_HIBERNATE in the same way.
These changes are necessary to fix a hibernation regression related
to the i915 driver (ref. http://lkml.org/lkml/2008/2/22/488).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sorry for the noise, but here's the v3 of this compilation fix :)
There are some places, which declare the char buf[...] on the stack
to push it later into dprintk(). Since the dprintk sometimes (if the
CONFIG_SYSCTL=n) becomes an empty do { } while (0) stub, these buffers
cause gcc to produce appropriate warnings.
Wrap these buffers with RPC_IFDEBUG macro, as Trond proposed, to
compile them out when not needed.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
release_net is missed on the error path in pneigh_lookup.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This counter is currently write-only.
Drawing an analogy with the similar tcp counter, I think
that this one should be pointed by the sockets_allocated
members of sctp_prot and sctpv6_prot.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The host address parts need to be converted to host-endian first
before arithmetic makes any sense on them.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
By allocating ->hinfo, we already have the needed indirection to cope
with the per-cpu xtables struct match_entry.
[Patrick: do this now before the revision 1 struct is used by userspace]
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
http://bugzilla.kernel.org/show_bug.cgi?id=9920
The function skb_make_writable returns true or false.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The int ret variable is used only to trigger the BUG_ON() after
the skb_copy_bits() call, so check the call failure directly
and drop the variable.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported by Tomas Simonaitis <tomas.simonaitis@gmail.com>,
inserting new data in skbs queued over {ip,ip6,nfnetlink}_queue
triggers a SKB_LINEAR_ASSERT in skb_put().
Going back through the git history, it seems this bug is present since
at least 2.6.12-rc2, probably even since the removal of
skb_linearize() for netfilter.
Linearize non-linear skbs through skb_copy_expand() when enlarging
them. Tested by Thomas, fixes bugzilla #9933.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unless I miss a guaranteed relation between between "f" and
"new_fa->fa_info" this patch is required for fixing a NULL dereference
introduced by commit a6501e080c ("[IPV4]
FIB_HASH: Reduce memory needs and speedup lookups") and spotted by the
Coverity checker.
Eric Dumazet says:
Hum, you are right, kmem_cache_free() doesnt allow a NULL
object, like kfree() does.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Coverity checker spotted that less memory than required was
allocated.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IFLA_LINK is no longer a write-only attribute on the kernel side and
must thus be validated. Same goes for the newly introduced
IFLA_LINKINFO.
Fixes undefined behaviour if either of the attributes are not well
formed.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit a0a400d79e ("[NET]: dev_mcast:
add multicast list synchronization helpers") from you introduced a new
field "da_synced" to struct dev_addr_list that is not properly
initialized to 0. So when any of the current users (8021q, macvlan,
mac80211) calls dev_mc_sync/unsync they mess the address list for both
devices.
The attached patch fixed it for me and avoid future problems.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (60 commits)
[NIU]: Bump driver version and release date.
[NIU]: Fix BMAC alternate MAC address indexing.
net: fix kernel-doc warnings in header files
[IPV6]: Use BUG_ON instead of if + BUG in fib6_del_route.
[IPV6]: dst_entry leak in ip4ip6_err. (resend)
bluetooth: do not move child device other than rfcomm
bluetooth: put hci dev after del conn
[NET]: Elminate spurious print_mac() calls.
[BLUETOOTH] hci_sysfs.c: Kill build warning.
[NET]: Remove MAC_FMT
net/8021q/vlan_dev.c: Use print_mac.
[XFRM]: Fix ordering issue in xfrm_dst_hash_transfer().
[BLUETOOTH] net/bluetooth/hci_core.c: Use time_* macros
[IPV6]: Fix hardcoded removing of old module code
[NETLABEL]: Move some initialization code into __init section.
[NETLABEL]: Shrink the genl-ops registration code.
[AX25] ax25_out: check skb for NULL in ax25_kick()
[TCP]: Fix tcp_v4_send_synack() comment
[IPV4]: fix alignment of IP-Config output
Documentation: fix tcp.txt
...
The result of the ip_route_output is not assigned to skb. This means that
- it is leaked
- possible OOPS below dereferrencing skb->dst
- no ICMP message for this case
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
hci conn child devices other than rfcomm tty should not be moved here.
This is my lost, thanks for Barnaby's reporting and testing.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move hci_dev_put to del_conn to avoid hci dev going away before hci conn.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/bluetooth/hci_sysfs.c: In function ‘del_conn’:
net/bluetooth/hci_sysfs.c:339: warning: suggest parentheses around assignment used as truth value
Signed-off-by: David S. Miller <davem@davemloft.net>
Keep ordering of policy entries with same selector in
xfrm_dst_hash_transfer().
Issue should not appear in usual cases because multiple policy entries
with same selector are basically not allowed so far. Bug was pointed
out by Sebastien Decugis <sdecugis@hongo.wide.ad.jp>.
We could convert bydst from hlist to list and use list_add_tail()
instead.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Sebastien Decugis <sdecugis@hongo.wide.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions time_before, time_before_eq, time_after, and
time_after_eq are more robust for comparing jiffies against other
values.
So following patch implements usage of the time_after() macro, defined
at linux/jiffies.h, which deals with wrapping correctly
Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rusty hardcoded the old module code.
We can remove it now.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Everything that is called from netlbl_init() can be marked with
__init. This moves 620 bytes from .text section to .text.init one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Turning them to array and registration in a loop saves
80 lines of code and ~300 bytes from text section.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to some OOPS reports ax25_kick tries to clone NULL skbs
sometimes. It looks like a race with ax25_clear_queues(). Probably
there is no need to add more than a simple check for this yet.
Another report suggested there are probably also cases where ax25
->paclen == 0 can happen in ax25_output(); this wasn't confirmed
during testing but let's leave this debugging check for some time.
Reported-and-tested-by: Jann Traschewski <jann@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the indented lines aligned in the output (not in the code).
Signed-off-by: Uwe Kleine-Koenig <Uwe.Kleine-Koenig@digi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@ disable unlikely @ expression E,f; @@
(
if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)
@@ expression E,f; @@
(
if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@ disable unlikely @ expression E,f; @@
(
if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)
@@ expression E,f; @@
(
if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 69cc64d8d9.
It causes recursive locking in IPV6 because unlike other
neighbour layer clients, it even needs neighbour cache
entries to send neighbour soliciation messages :-(
We'll have to find another way to fix this race.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 45b5035482.
It break locking around dev->link_mode as well as cause
other bootup problems.
Signed-off-by: David S. Miller <davem@davemloft.net>
On the initial device-open we need to defer the hardware reconfiguration
after we incremented the open_count, because the hw_config checks this flag
and won't call the lowlevel driver in case it is zero.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits)
[NET]: Make sure sockets implement splice_read
netconsole: avoid null pointer dereference at show_local_mac()
[IPV6]: Fix reversed local_df test in ip6_fragment
[XFRM]: Avoid bogus BUG() when throwing new policy away.
[AF_KEY]: Fix bug in spdadd
[NETFILTER] nf_conntrack_proto_tcp.c: Mistyped state corrected.
net: xfrm statistics depend on INET
[NETFILTER]: make secmark_tg_destroy() static
[INET]: Unexport inet_listen_wlock
[INET]: Unexport __inet_hash_connect
[NET]: Improve cache line coherency of ingress qdisc
[NET]: Fix race in dev_close(). (Bug 9750)
[IPSEC]: Fix bogus usage of u64 on input sequence number
[RTNETLINK]: Send a single notification on device state changes.
[NETLABLE]: Hide netlbl_unlabel_audit_addr6 under ifdef CONFIG_IPV6.
[NETLABEL]: Don't produce unused variables when IPv6 is off.
[NETLABEL]: Compilation for CONFIG_AUDIT=n case.
[GENETLINK]: Relax dances with genl_lock.
[NETLABEL]: Fix lookup logic of netlbl_domhsh_search_def.
[IPV6]: remove unused method declaration (net/ndisc.h).
...
Fixes a segmentation fault when trying to splice from a non-TCP socket.
Signed-off-by: Rémi Denis-Courmont <rdenis@simphalempin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I managed to reverse the local_df test when forward-porting this
patch so it actually makes things worse by never fragmenting at
all.
Thanks to David Stevens for testing and reporting this bug.
Bill Fink pointed out that the local_df setting is also the wrong
way around.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Add path_put() functions for releasing a reference to the dentry and
vfsmount of a struct path in the right order
* Switch from path_release(nd) to path_put(&nd->path)
* Rename dput_path() to path_put_conditional()
[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.
Together with the other patches of this series
- it enforced the correct order of getting/releasing the reference count on
<dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
struct path in every place where the stack can be traversed
- it reduces the overall code size:
without patch series:
text data bss dec hex filename
5321639 858418 715768 6895825 6938d1 vmlinux
with patch series:
text data bss dec hex filename
5320026 858418 715768 6894212 693284 vmlinux
This patch:
Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
When we destory a new policy entry, we need to tell
xfrm_policy_destroy() explicitly that the entry is not
alive yet.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fix a BUG when adding spds which have same selector.
Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/built-in.o: In function `xfrm_policy_init':
/home/pmundt/devel/git/sh-2.6.25/net/xfrm/xfrm_policy.c:2338: undefined reference to `snmp_mib_init'
snmp_mib_init() is only built in if CONFIG_INET is set.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the needlessly global secmark_tg_destroy() static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the no longer used EXPORT_SYMBOL(inet_listen_wlock).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the unused EXPORT_SYMBOL_GPL(__inet_hash_connect).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move networking (core and drivers) docbook to its own networking book.
Fix a few kernel-doc errors in header and source files.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use updated file list for docbook files and
fix kernel-doc warnings in sunrpc:
Warning(linux-2.6.24-git12//net/sunrpc/rpc_pipe.c:689): No description found for parameter 'rpc_client'
Warning(linux-2.6.24-git12//net/sunrpc/rpc_pipe.c:765): No description found for parameter 'flags'
Warning(linux-2.6.24-git12//net/sunrpc/clnt.c:584): No description found for parameter 'tk_ops'
Warning(linux-2.6.24-git12//net/sunrpc/clnt.c:618): No description found for parameter 'bufsize'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fastcall always expands to empty, remove it.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a race in Linux kernel file net/core/dev.c, function dev_close.
The function calls function dev_deactivate, which calls function
dev_watchdog_down that deletes the watchdog timer. However, after that, a
driver can call netif_carrier_ok, which calls function
__netdev_watchdog_up that can add the watchdog timer again. Function
unregister_netdevice calls function dev_shutdown that traps the bug
!timer_pending(&dev->watchdog_timer). Moving dev_deactivate after
netif_running() has been cleared prevents function netif_carrier_on
from calling __netdev_watchdog_up and adding the watchdog timer again.
Signed-off-by: Matti Linnanvuori <mattilinnanvuori@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro spotted a bogus use of u64 on the input sequence number which
is big-endian. This patch fixes it by giving the input sequence number
its own member in the xfrm_skb_cb structure.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In do_setlink() a single notification is sent at the end of the
function if any modification occured. If the address has been changed,
another notification is sent.
Both of them is required because originally only the NETDEV_CHANGEADDR
notification was sent and although device state change implies address
change, some programs may expect the original notification. It remains
for compatibity.
If set_operstate() is called from do_setlink(), it doesn't send a
notification, only if it is called from rtnl_create_link() as earlier.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one is called from under this config only, so move
it in the same place.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some code declares variables on the stack, but uses them
under #ifdef CONFIG_IPV6, so thay become unused when ipv6
is off. Fortunately, they are used in a switch's case
branches, so the fix is rather simple.
Is it OK from coding style POV to add braces inside "cases",
or should I better avoid such style and rework the patch?
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The audit_log_start() will expand into an empty do { } while (0)
construction and the audit_ctx becomes unused.
The solution: push current->audit_context into audit_log_start()
directly, since it is not required in any other place in the
calling function.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The genl_unregister_family() calls the genl_unregister_mc_groups(),
which takes and releases the genl_lock and then locks and releases
this lock itself.
Relax this behavior, all the more so the genl_unregister_mc_groups()
is called from genl_unregister_family() only.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if the call to netlbl_domhsh_search succeeds the
return result will still be NULL.
Fix that, by returning the found entry (if any).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a long-standing bug in the IPsec IPv6 code that breaks
when we emit a IPsec tunnel-mode datagram packet. The problem
is that the code the emits the packet assumes the IPv6 stack
will fragment it later, but the IPv6 stack assumes that whoever
is emitting the packet is going to pre-fragment the packet.
In the long term we need to fix both sides, e.g., to get the
datagram code to pre-fragment as well as to get the IPv6 stack
to fragment locally generated tunnel-mode packet.
For now this patch does the second part which should make it
work for the IPsec host case.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frank Blaschka provided the bug report and the initial suggested fix
for this bug. He also validated this version of this fix.
The problem is that the access to neigh->arp_queue is inconsistent, we
grab references when dropping the lock lock to call
neigh->ops->solicit() but this does not prevent other threads of
control from trying to send out that packet at the same time causing
corruptions because both code paths believe they have exclusive access
to the skb.
The best option seems to be to hold the write lock on neigh->lock
during the ->solicit() call. I looked at all of the ndisc_ops
implementations and this seems workable. The only case that needs
special care is the IPV4 ARP implementation of arp_solicit(). It
wants to take neigh->lock as a reader to protect the header entry in
neigh->ha during the emission of the soliciation. We can simply
remove the read lock calls to take care of that since holding the lock
as a writer at the caller providers a superset of the protection
afforded by the existing read locking.
The rest of the ->solicit() implementations don't care whether the
neigh is locked or not.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes current use of: init_timer(), add_timer()
and del_timer() to setup_timer() with mod_timer(), which
should be safer anyway.
Reported-by: Jann Traschewski <jann@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to one of Jann's OOPS reports it looks like
BUG_ON(timer_pending(timer)) triggers during add_timer()
in ax25_start_t1timer(). This patch changes current use
of: init_timer(), add_timer() and del_timer() to
setup_timer() with mod_timer(), which should be safer
anyway.
Reported-by: Jann Traschewski <jann@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
> =================================
> [ INFO: inconsistent lock state ]
> 2.6.24-dg8ngn-p02 #1
> ---------------------------------
> inconsistent {softirq-on-W} -> {in-softirq-R} usage.
> linuxnet/3046 [HC0[0]:SC1[2]:HE1:SE0] takes:
> (ax25_route_lock){--.+}, at: [<f8a0cfb7>] ax25_get_route+0x18/0xb7 [ax25]
> {softirq-on-W} state was registered at:
...
This lockdep report shows that ax25_route_lock is taken for reading in
softirq context, and for writing in process context with BHs enabled.
So, to make this safe, all write_locks in ax25_route.c are changed to
_bh versions.
Reported-by: Jann Traschewski <jann@gmx.de>,
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This lockdep warning:
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.24 #3
> -------------------------------------------------------
> swapper/0 is trying to acquire lock:
> (ax25_list_lock){-+..}, at: [<f91dd3b1>] ax25_destroy_socket+0x171/0x1f0 [ax25]
>
> but task is already holding lock:
> (slock-AF_AX25){-+..}, at: [<f91dbabc>] ax25_std_heartbeat_expiry+0x1c/0xe0 [ax25]
>
> which lock already depends on the new lock.
...
shows that ax25_list_lock and slock-AF_AX25 are taken in different
order: ax25_info_show() takes slock (bh_lock_sock(ax25->sk)) while
ax25_list_lock is held, so reversely to other functions. To fix this
the sock lock should be moved to ax25_info_start(), and there would
be still problem with breaking ax25_list_lock (it seems this "proper"
order isn't optimal yet). But, since it's only for reading proc info
it seems this is not necessary (e.g. ax25_send_to_raw() does similar
reading without this lock too).
So, this patch removes sock lock to avoid deadlock possibility; there
is also used sock_i_ino() function, which reads sk_socket under proper
read lock. Additionally printf format of this i_ino is changed to %lu.
Reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use key/offset caching to change /proc/net/route (use by iputils route)
from O(n^2) to O(n). This improves performance from 30sec with 160,000
routes to 1sec.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes possible problems when trie_firstleaf() returns NULL
to trie_leafindex().
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Various RFCs have all sorts of things to say about the CS field of the
DSCP value. In particular they try to make the distinction between
values that should be used by "user applications" and things like
routing daemons.
This seems to have influenced the CAP_NET_ADMIN check which exists for
IP_TOS socket option settings, but in fact it has an off-by-one error
so it wasn't allowing CS5 which is meant for "user applications" as
well.
Further adding to the inconsistency and brokenness here, IPV6 does not
validate the DSCP values specified for the IPV6_TCLASS socket option.
The real actual uses of these TOS values are system specific in the
final analysis, and these RFC recommendations are just that, "a
recommendation". In fact the standards very purposefully use
"SHOULD" and "SHOULD NOT" when describing how these values can be
used.
In the final analysis the only clean way to provide consistency here
is to remove the CAP_NET_ADMIN check. The alternatives just don't
work out:
1) If we add the CAP_NET_ADMIN check to ipv6, this can break existing
setups.
2) If we just fix the off-by-one error in the class comparison in
IPV4, certain DSCP values can be used in IPV6 but not IPV4 by
default. So people will just ask for a sysctl asking to
override that.
I checked several other freely available kernel trees and they
do not make any privilege checks in this area like we do. For
the BSD stacks, this goes back all the way to Stevens Volume 2
and beyond.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://linux-nfs.org/~bfields/linux:
SUNPRC: Fix printk format warning
nfsd: clean up svc_reserve_auth()
NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight
NLM: don't reattempt GRANT_MSG when there is already an RPC in flight
NLM: have server-side RPC clients default to soft RPC tasks
NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clients
net/sunrpc/xprtrdma/svc_rdma_sendto.c:160: warning: format '%llx'
expects type 'long long unsigned int', but argument 3 has type 'u64'
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Commit 954415e33e ("[PKT_SCHED] ematch:
tcf_em_destroy robustness") removed a cast on em->data when
passing it to kfree(), but em->data is an integer type that can
hold pointers as well as other values so the cast is necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
htb_requeue() enqueues skbs for which htb_classify() returns NULL.
This is wrong because such skbs could be handled by NET_CLS_ACT code,
and the decision could be different than earlier in htb_enqueue().
So htb_requeue() is changed to work and look more like htb_enqueue().
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces the explicit usage of the magic constant "1024"
with IP6_RT_PRIO_USER in the IPV6 tree.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the code in tcf_em_tree_destroy more robust and cleaner:
* Don't need to cast pointer to kfree() or avoid passing NULL.
* After freeing the tree, clear the pointer to avoid possible problems
from repeated free.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A couple of functions in meta match don't need to be inline.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes the code use a good proc API and the text ~50 bytes shorter.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCPT already depends in INET, so this doesn't create additional
dependencies.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The seq files API disposes the caller of the difficulty of
checking file position, the length of data to produce and
the size of provided buffer.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mainly this removes ifdef-s from inside the ipsec_pfkey_init.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's already an option controlling the net namespaces cloning code, so make
it work the same way as all the other namespaces' options.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1) We can shrink sizeof(struct flow_cache_entry) by 8 bytes on 64bit arches.
2) No need to align these structures to hardware cache lines, this only waste
ram for very litle gain.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same alignment requirement was removed on IP route cache in the past.
This alignment actually has bad effect on 32 bit arches, uniprocessor,
since sizeof(dn_rt_hash_bucket) is forced to 8 bytes instead of 4.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The below patch allows IPsec to use CTR mode with AES encryption
algorithm. Tested this using setkey in ipsec-tools.
Signed-off-by: Joy Latten <latten@austin.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
All these static inlines are unused:
in_own_zone 1 (net/tipc/addr.h)
msg_dataoctet 1 (net/tipc/msg.h)
msg_direct 1 (include/net/tipc/tipc_msg.h)
msg_options 1 (include/net/tipc/tipc_msg.h)
tipc_nmap_get 1 (net/tipc/bcast.h)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
If userspace passes a unknown match index into em_meta, then
em_meta_change will return an error and the data for the match will
not be set. This then causes an null pointer dereference when the
cleanup is done in the error path via tcf_em_tree_destroy. Since the
tree structure comes kzalloc, it is initialized to NULL.
Discovered when testing a new version of tc command against an
accidental older kernel.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The loop in iucv_callback_txdone presumes existence of an entry
with msg->tag in the send_skb_q list. In error cases this
assumption might be wrong and might cause an endless loop.
Loop is rewritten to guarantee loop end in case of missing
msg->tag entry in send_skb_q.
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A race has been detected in iucv_callback_txdone().
skb_unlink has to be done inside the locked area.
In addition checkings for successful allocations are inserted.
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux may hang when running af_iucv socket programs concurrently
with a load of module netiucv. iucv_register() tries to take the
iucv_table_lock with spin_lock_irq. This conflicts with
iucv_connect() which has a need for an smp_call_function while
holding the iucv_table_lock.
Solution: use bh-disabling locking in iucv_register()
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unneeded variable.
Rename local variable error to err like in all other places.
Some white-space changes.
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The implementation of proto_register() has changed so that it can now
sleep. The call to proto_register() must be moved out of the
spin-locked region.
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove local char array to construct module name.
Don't call request_module() when CONFIG_KMOD is not set.
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
We use a percpu variable named flow_hash_info, which holds 12 bytes.
It is currently marked as ____cacheline_aligned, which makes linker
skip space to properly align this variable.
Before :
c065cc90 D per_cpu__softnet_data
c065cd00 d per_cpu__flow_tables
<Here, hole of 124 bytes>
c065cd80 d per_cpu__flow_hash_info
<Here, hole of 116 bytes>
c065ce00 d per_cpu__flow_flush_tasklets
c065ce14 d per_cpu__rt_cache_stat
This alignement is quite unproductive, and removing it reduces the
size of percpu data (by 240 bytes on my x86 machine), and improves
performance (flow_tables & flow_hash_info can share a single cache
line)
After patch :
c065cc04 D per_cpu__softnet_data
c065cc4c d per_cpu__flow_tables
c065cc50 d per_cpu__flow_hash_info
c065cc5c d per_cpu__flow_flush_tasklets
c065cc70 d per_cpu__rt_cache_stat
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_me_harder() may call ip_route_input() with skbs that don't
have skb->dev set for skbs rerouted in LOCAL_OUT and TCP resets
generated by the REJECT target, resulting in a crash when dereferencing
skb->dev->nd_net. Since ip_route_input() has an input device argument,
it seems correct to use that one anyway.
Bug introduced in b5921910a1 (Routing cache virtualization).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The family for iprange_mt4 should be AF_INET, not AF_INET6.
Noticed by Jiri Moravec <jim.lkml@gmail.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ->move operation has two bugs:
- It is called with the same extension as source and destination,
so it doesn't update the new extension.
- The address of the old extension is calculated incorrectly,
instead of (void *)ct->ext + ct->ext->offset[i] it uses
ct->ext + ct->ext->offset[i].
Fixes a crash on x86_64 reported by Chuck Ebbert <cebbert@redhat.com>
and Thomas Woerner <twoerner@redhat.com>.
Tested-by: Thomas Woerner <twoerner@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP connection tracking in netfilter did not handle TCP reopening
properly: active close was taken into account for one side only and
not for any side, which is fixed now. The patch includes more comments
to explain the logic how the different cases are handled.
The bug was discovered by Jeff Chua.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a few instances, we need to remove the chunk from the transmitted list
prior to freeing it. This is because the free code doesn't do that any
more and so we need to do it manually.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
While recevied ASCONF chunk with serial number less then needed, kernel
will treat this chunk as a retransmitted ASCONF chunk and find cached
ASCONF-ACK chunk used sctp_assoc_lookup_asconf_ack(). But this function
will always return NO-NULL. So response with cached ASCONF-ACKs chunk
will cause kernel panic.
In function sctp_assoc_lookup_asconf_ack(), if the cached ASCONF-ACKs
list asconf_ack_list is empty, or if the serial being requested does not
exists, the function as it currectly stands returns the actuall
list_head asoc->asconf_ack_list, this is not a cache ASCONF-ACK chunk
but a bogus pointer.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Thomas Dreibholz has reported that port numbers are not filled
in the results of sctp_getladdrs() when the socket was bound
to an ephemeral port. This is only true, if the address was
not specified either. So, fill in the port number correctly.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
When we recieve a FORWARD_TSN chunk, we need to reap
all the queued fast-forwarded chunks from the ordering queue
However, if we don't have them queued, we need to see if
the next expected one is there as well. If it is, start
deliver from that point instead of waiting for the next
chunk to arrive.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
ERROR: "p9_printfcall" [net/9p/9pnet_virtio.ko] undefined!
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
This merges the mux.c (including the connection interface) with trans_fd
in preparation for transport API changes. Ultimately, trans_fd will need
to be rewritten to clean it up and simplify the implementation, but this
reorganization is viewed as the first step.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Request from rusty:
Just cleaning up patches for 2.6.25 merge, and noticed that
net/9p/trans_virtio.c doesn't have a remove function. This will crash when
removing the module (console doesn't have one because it can't really be
removed).
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
When booting from v9fs, down_interruptible in p9_idpool_get() triggered a BUG
as it was being called with IRQs disabled. A spinlock seems like the right
thing to be using since the idr functions go out of their way not to sleep.
This patch eliminates the BUG by converting the semaphore to a spinlock.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
This replaces the console-based virto client with a block-based
client using a single request queue.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Add a new transport function which allows a cut-thru directly to
the transport instead of processing request through the mux if the
cut-thru exists.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch fixes a bug in the copying of 9P
stat information where string references
weren't being updated properly.
Signed-off-by: Martin Sava <martin.stava@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
No available servers is more an error message than something informational. It
should also be rate-limited, else we're going to flood our logs on a busy
director, if all real servers are out of order with a weight of zero.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we're using fls(), we need to check whether the value is
non-zero first.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/sched/em_meta.c: In function 'meta_int_vlan_tag':
net/sched/em_meta.c:179: warning: 'tag' may be used uninitialized in this function
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the EXPERIMENTAL dependency, as the existing mac80211
features are stable.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add a new set of configuration functions to the NetLabel/LSM API so that
LSMs can perform their own configuration of the NetLabel subsystem without
relying on assistance from userspace.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a user reads a partial notification message, do not
update rwnd since notifications must not be counted towards
receive window.
Tested-by: Oliver Roll <mail@oliroll.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
I was notified by Randy Stewart that lksctp claims to be
"the reference implementation". First of all, "the
refrence implementation" was the original implementation
of SCTP in usersapce written ty Randy and a few others.
Second, after looking at the definiton of 'reference implementation',
we don't really meet the requirements.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Provide a way to use tc filters on vlan tag even if tag is buried in
skb due to hardware acceleration.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the following no longer used functions:
- rtattr_parse()
- rtattr_strlcpy()
- __rtattr_parse_nested_compat()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Somewhere along the development of my ICMP relookup patch the header
length check went AWOL on the non-IPsec path. This patch restores the
check.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The port offset calculations depend on the protocol family, but, as
Adrian noticed, I broke this logic with the commit
5ee31fc1ec
[INET]: Consolidate inet(6)_hash_connect.
Return this logic back, by passing the port offset directly into the
consolidated function.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Noticed-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
rfcomm dev could be deleted in tty_hangup, so we must not call
rfcomm_dev_del again to prevent from destroying rfcomm dev before tty
close.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove all those inlines which were either a) unneeded or b) increased code
size.
text data bss dec hex filename
before: 6997 74 8 7079 1ba7 net/bluetooth/hidp/core.o
after: 6492 74 8 6574 19ae net/bluetooth/hidp/core.o
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the bluetooth HID spec v1.0 chapter 7.4.2
"This code requests a major state change in a BT-HID device. A HID_CONTROL
request does not generate a HANDSHAKE response."
"A HID_CONTROL packet with a parameter of VIRTUAL_CABLE_UNPLUG is the only
HID_CONTROL packet a device can send to a host. A host will ignore all other
packets."
So in the hidp_precess_hid_control function, we just need to deal with the
UNLUG packet.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If SCTP-AUTH is enabled, received AUTH chunk with BAD shared key
identifier will cause kernel panic.
Test as following:
step1: enabled /proc/sys/net/sctp/auth_enable
step 2: connect to SCTP server with auth capable. Association is
established between endpoints. Then send a AUTH chunk with a bad
shareid, SCTP server will kernel panic after received that AUTH chunk.
SCTP client SCTP server
INIT ---------->
(with auth capable)
<---------- INIT-ACK
(with auth capable)
COOKIE-ECHO ---------->
<---------- COOKIE-ACK
AUTH ---------->
AUTH chunk is like this:
AUTH chunk
Chunk type: AUTH (15)
Chunk flags: 0x00
Chunk length: 28
Shared key identifier: 10
HMAC identifier: SHA-1 (1)
HMAC: 0000000000000000000000000000000000000000
The assignment of NULL to key can safely be removed, since key_for_each
(which is just list_for_each_entry under the covers does an initial
assignment to key anyway).
If the endpoint_shared_keys list is empty, or if the key_id being
requested does not exist, the function as it currently stands returns
the actuall list_head (in this case endpoint_shared_keys. Since that
list_head isn't surrounded by an actuall data structure, the last
iteration through list_for_each_entry will do a container_of on key, and
we wind up returning a bogus pointer, instead of NULL, as we should.
> Neil Horman wrote:
>> On Tue, Jan 22, 2008 at 05:29:20PM +0900, Wei Yongjun wrote:
>>
>> FWIW, Ack from me. The assignment of NULL to key can safely be
>> removed, since
>> key_for_each (which is just list_for_each_entry under the covers does
>> an initial
>> assignment to key anyway).
>> If the endpoint_shared_keys list is empty, or if the key_id being
>> requested does
>> not exist, the function as it currently stands returns the actuall
>> list_head (in
>> this case endpoint_shared_keys. Since that list_head isn't
>> surrounded by an
>> actuall data structure, the last iteration through
>> list_for_each_entry will do a
>> container_of on key, and we wind up returning a bogus pointer,
>> instead of NULL,
>> as we should. Wei's patch corrects that.
>>
>> Regards
>> Neil
>>
>> Acked-by: Neil Horman <nhorman@tuxdriver.com>
>>
>
> Yep, the patch is correct.
>
> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
>
> -vlad
>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If STCP is started while /proc/sys/net/sctp/auth_enable is set 0 and
association is established between endpoints. Then if
/proc/sys/net/sctp/auth_enable is set 1, a received AUTH chunk will
cause kernel panic.
Test as following:
step 1: echo 0> /proc/sys/net/sctp/auth_enable
step 2:
SCTP client SCTP server
INIT --------->
<--------- INIT-ACK
COOKIE-ECHO --------->
<--------- COOKIE-ACK
step 3:
echo 1> /proc/sys/net/sctp/auth_enable
step 4:
SCTP client SCTP server
AUTH -----------> Kernel Panic
This patch fix this probleam to treat AUTH chunk as unknow chunk if peer
has initialized with no auth capable.
> Sorry for the delay. Was on vacation without net access.
>
> Wei Yongjun wrote:
>>
>>
>> This patch fix this probleam to treat AUTH chunk as unknow chunk if
>> peer has initialized with no auth capable.
>>
>> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
>
> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
>
>>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The line in the /proc/net/fib_trie for route with TOS specified
- has extra \n at the end
- does not have a space after route scope
like below.
|-- 1.1.1.1
/32 universe UNICASTtos =1
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 2.6 latest git build was broken when using the following
configuration options:
CONFIG_NET_EMATCH=n
CONFIG_NET_CLS_FLOW=y
with the following error:
net/sched/cls_flow.c: In function 'flow_dump':
net/sched/cls_flow.c:598: error: 'struct tcf_ematch_tree' has no
member named 'hdr'
make[2]: *** [net/sched/cls_flow.o] Error 1
make[1]: *** [net/sched] Error 2
make: *** [net] Error 2
see the recent post by Li Zefan:
http://www.spinics.net/lists/netdev/msg54434.html
The reason for this crash is that struct tcf_ematch_tree
(net/pkt_cls.h) is empty when CONFIG_NET_EMATCH is not defined.
When CONFIG_NET_EMATCH is defined, the tcf_ematch_tree structure
indeed holds a struct tcf_ematch_tree_hdr (hdr) as flow_dump()
expects.
This patch adds #ifdef CONFIG_NET_EMATCH in flow_dump to avoid this.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A bug every C programmer makes at some point in time...
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (25 commits)
virtio: balloon driver
virtio: Use PCI revision field to indicate virtio PCI ABI version
virtio: PCI device
virtio_blk: implement naming for vda-vdz,vdaa-vdzz,vdaaa-vdzzz
virtio_blk: Dont waste major numbers
virtio_blk: provide getgeo
virtio_net: parametrize the napi_weight for virtio receive queue.
virtio: free transmit skbs when notified, not on next xmit.
virtio: flush buffers on open
virtnet: remove double ether_setup
virtio: Allow virtio to be modular and used by modules
virtio: Use the sg_phys convenience function.
virtio: Put the virtio under the virtualization menu
virtio: handle interrupts after callbacks turned off
virtio: reset function
virtio: populate network rings in the probe routine, not open
virtio: Tweak virtio_net defines
virtio: Net header needs hdr_len
virtio: remove unused id field from struct virtio_blk_outhdr
virtio: clarify NO_NOTIFY flag usage
...
It seems that virtio_net wants to disable callbacks (interrupts) before
calling netif_rx_schedule(), so we can't use the return value to do so.
Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
now returns void, rather than a boolean.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Previously we used a type/len pair within the config space, but this
seems overkill. We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.
The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use it in virtio_net (replacing buggy version there), it's also going
to be used by TAP for partial csum support.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
This way we can remove TCP and DCCP specific versions of
sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
sk->sk_prot->hash: inet_hash is directly used, only v6 need
a specific version to deal with mapped sockets
sk->sk_prot->unhash: both v4 and v6 use inet_hash directly
struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
that inet_csk_get_port can find the per family routine.
Now only the lookup routines receive as a parameter a struct inet_hashtable.
With this we further reuse code, reducing the difference among INET transport
protocols.
Eventually work has to be done on UDP and SCTP to make them share this
infrastructure and get as a bonus inet_diag interfaces so that iproute can be
used with these protocols.
net-2.6/net/ipv4/inet_hashtables.c:
struct proto | +8
struct inet_connection_sock_af_ops | +8
2 structs changed
__inet_hash_nolisten | +18
__inet_hash | -210
inet_put_port | +8
inet_bind_bucket_create | +1
__inet_hash_connect | -8
5 functions changed, 27 bytes added, 218 bytes removed, diff: -191
net-2.6/net/core/sock.c:
proto_seq_show | +3
1 function changed, 3 bytes added, diff: +3
net-2.6/net/ipv4/inet_connection_sock.c:
inet_csk_get_port | +15
1 function changed, 15 bytes added, diff: +15
net-2.6/net/ipv4/tcp.c:
tcp_set_state | -7
1 function changed, 7 bytes removed, diff: -7
net-2.6/net/ipv4/tcp_ipv4.c:
tcp_v4_get_port | -31
tcp_v4_hash | -48
tcp_v4_destroy_sock | -7
tcp_v4_syn_recv_sock | -2
tcp_unhash | -179
5 functions changed, 267 bytes removed, diff: -267
net-2.6/net/ipv6/inet6_hashtables.c:
__inet6_hash | +8
1 function changed, 8 bytes added, diff: +8
net-2.6/net/ipv4/inet_hashtables.c:
inet_unhash | +190
inet_hash | +242
2 functions changed, 432 bytes added, diff: +432
vmlinux:
16 functions changed, 485 bytes added, 492 bytes removed, diff: -7
/home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
tcp_v6_get_port | -31
tcp_v6_hash | -7
tcp_v6_syn_recv_sock | -9
3 functions changed, 47 bytes removed, diff: -47
/home/acme/git/net-2.6/net/dccp/proto.c:
dccp_destroy_sock | -7
dccp_unhash | -179
dccp_hash | -49
dccp_set_state | -7
dccp_done | +1
5 functions changed, 1 bytes added, 242 bytes removed, diff: -241
/home/acme/git/net-2.6/net/dccp/ipv4.c:
dccp_v4_get_port | -31
dccp_v4_request_recv_sock | -2
2 functions changed, 33 bytes removed, diff: -33
/home/acme/git/net-2.6/net/dccp/ipv6.c:
dccp_v6_get_port | -31
dccp_v6_hash | -7
dccp_v6_request_recv_sock | +5
3 functions changed, 5 bytes added, 38 bytes removed, diff: -33
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://linux-nfs.org/~bfields/linux: (100 commits)
SUNRPC: RPC program information is stored in unsigned integers
SUNRPC: Move exported symbol definitions after function declaration part 2
NLM: tear down RPC clients in nlm_shutdown_hosts
SUNRPC: spin svc_rqst initialization to its own function
nfsd: more careful input validation in nfsctl write methods
lockd: minor log message fix
knfsd: don't bother mapping putrootfh enoent to eperm
rdma: makefile
rdma: ONCRPC RDMA protocol marshalling
rdma: SVCRDMA sendto
rdma: SVCRDMA recvfrom
rdma: SVCRDMA Core Transport Services
rdma: SVCRDMA Transport Module
rdma: SVCRMDA Header File
svc: Add svc_xprt_names service to replace svc_sock_names
knfsd: Support adding transports by writing portlist file
svc: Add svc API that queries for a transport instance
svc: Add /proc/sys/sunrpc/transport files
svc: Add transport hdr size for defer/revisit
svc: Move the xprt independent code to the svc_xprt.c file
...
Clean up: When looping over RPC version and procedure numbers, use
unsigned index variables.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move the initialzation in __svc_create_thread that happens prior to
thread creation to a new function. Export the function to allow
services to have better control over the svc_rqst structs.
Also rearrange the rqstp initialization to prevent NULL pointer
dereferences in svc_exit_thread in case allocations fail.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add the svcrdma module to the xprtrdma makefile.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This logic parses the ONCRDMA protocol headers that
precede the actual RPC header. It is placed in a separate
file to keep all protocol aware code in a single place.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This file implements the RDMA transport sendto function. A RPC reply
on an RDMA transport consists of some number of RDMA_WRITE requests
followed by an RDMA_SEND request. The sendto function parses the
ONCRPC RDMA reply header to determine how to send the reply back to
the client. The send queue is sized so as to be able to send complete
replies for requests in most cases. In the event that there are not
enough SQ WR slots to reply, e.g. big data, the send will block the
NFSD thread. The I/O callback functions in svc_rdma_transport.c that
reap WR completions wake any waiters blocked on the SQ. In general,
the goal is not to block NFSD threads and the has_wspace method
stall requests when the SQ is nearly full.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This file implements the RDMA transport recvfrom function. The function
dequeues work reqeust completion contexts from an I/O list that it shares
with the I/O tasklet in svc_rdma_transport.c. For ONCRPC RDMA, an RPC may
not be complete when it is received. Instead, the RDMA header that precedes
the RPC message informs the transport where to get the RPC data from on
the client and where to place it in the RPC message before it is delivered
to the server. The svc_rdma_recvfrom function therefore, parses this RDMA
header and issues any necessary RDMA operations to fetch the remainder of
the RPC from the client.
Special handling is required when the request involves an RDMA_READ.
In this case, recvfrom submits the RDMA_READ requests to the underlying
transport driver and then returns 0. When the transport
completes the last RDMA_READ for the request, it enqueues it on a
read completion queue and enqueues the transport. The recvfrom code
favors this queue over the regular DTO queue when satisfying reads.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This file implements the core transport data management and I/O
path. The I/O path for RDMA involves receiving callbacks on interrupt
context. Since all the svc transport locks are _bh locks we enqueue the
transport on a list, schedule a tasklet to dequeue data indications from
the RDMA completion queue. The tasklet in turn takes _bh locks to
enqueue receive data indications on a list for the transport. The
svc_rdma_recvfrom transport function dequeues data from this list in an
NFSD thread context.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This file implements the RDMA transport module initialization and
termination logic and registers the transport sysctl variables.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Create a transport independent version of the svc_sock_names function.
The toclose capability of the svc_sock_names service can be implemented
using the svc_xprt_find and svc_xprt_close services.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Update the write handler for the portlist file to allow creating new
listening endpoints on a transport. The general form of the string is:
<transport_name><space><port number>
For example:
echo "tcp 2049" > /proc/fs/nfsd/portlist
This is intended to support the creation of a listening endpoint for
RDMA transports without adding #ifdef code to the nfssvc.c file.
Transports can also be removed as follows:
'-'<transport_name><space><port number>
For example:
echo "-tcp 2049" > /proc/fs/nfsd/portlist
Attempting to add a listener with an invalid transport string results
in EPROTONOSUPPORT and a perror string of "Protocol not supported".
Attempting to remove an non-existent listener (.e.g. bad proto or port)
results in ENOTCONN and a perror string of
"Transport endpoint is not connected"
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add a new svc function that allows a service to query whether a
transport instance has already been created. This is used in lockd
to determine whether or not a transport needs to be created when
a lockd instance is brought up.
Specifying 0 for the address family or port is effectively a wild-card,
and will result in matching the first transport in the service's list
that has a matching class name.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add a file that when read lists the set of registered svc
transports.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Some transports have a header in front of the RPC header. The current
defer/revisit processing considers only the iov_len and arg_len to
determine how much to back up when saving the original request
to revisit. Add a field to the rqstp structure to save the size
of the transport header so svc_defer can correctly compute
the start of a request.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This functionally trivial patch moves all of the transport independent
functions from the svcsock.c file to the transport independent svc_xprt.c
file.
In addition the following formatting changes were made:
- White space cleanup
- Function signatures on single line
- The inline directive was removed
- Lines over 80 columns were reformatted
- The term 'socket' was changed to 'transport' in comments
- The SMP comment was moved and updated.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_check_conn_limits function only manipulates xprt fields. Change references
to svc_sock->sk_xprt to svc_xprt directly.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This functionally empty patch removes rq_sock and unamed union
from rqstp structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move the svc transport list logic into common transport creation code.
Refactor this code path to make the flow of control easier to read.
Move the setting and clearing of the BUSY_BIT during transport creation
to common code.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This function is transport independent. Change it to use svc_xprt directly
and change it's name to reflect this.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
All of the transport field and functions used by svc_recv are now
transport independent. Change the svc_recv function to use the svc_xprt
structure directly instead of the transport specific svc_sock structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_sock_release function only touches transport independent fields.
Change the function to manipulate svc_xprt directly instead of the transport
dependent svc_sock structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch moves the transport sockaddr to the svc_xprt
structure. Convenience functions are added to set and
get the local and remote addresses of a transport from
the transport provider as well as determine the length
of a sockaddr.
A transport is responsible for setting the xpt_local
and xpt_remote addresses in the svc_xprt structure as
part of transport creation and xpo_accept processing. This
cannot be done in a generic way and in fact varies
between TCP, UDP and RDMA. A set of xpo_ functions
(e.g. getlocalname, getremotename) could have been
added but this would have resulted in additional
caching and copying of the addresses around. Note that
the xpt_local address should also be set on listening
endpoints; for TCP/RDMA this is done as part of
endpoint creation.
For connected transports like TCP and RDMA, the addresses
never change and can be set once and copied into the
rqstp structure for each request. For UDP, however, the
local and remote addresses may change for each request. In
this case, the address information is obtained from the
UDP recvmsg info and copied into the rqstp structure from
there.
A svc_xprt_local_port function was also added that returns
the local port given a transport. This is used by
svc_create_xprt when returning the port associated with
a newly created transport, and later when creating a
generic find transport service to check if a service is
already listening on a given port.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch moves the transport independent sk_deferred list to the svc_xprt
structure and updates the svc_deferred_req structure to keep pointers to
svc_xprt's directly. The deferral processing code is also moved out of the
transport dependent recvfrom functions and into the generic svc_recv path.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move the authinfo cache to svc_xprt. This allows both the TCP and RDMA
transports to share this logic. A flag bit is used to determine if
auth information is to be cached or not. Previously, this code looked
at the transport protocol.
I've also changed the spin_lock/unlock logic so that a lock is not taken for
transports that are not caching auth info.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
With the implementation of the new mark and sweep algorithm for shutting
down old connections, the sk_lastrecv field is no longer needed.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Now that the svc_xprt_received function handles transports, the call
to svc_xprt_received in the xpo_tcp_accept function can be moved to
common code.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
All fields touched by svc_sock_received are now transport independent.
Change it to use svc_xprt directly. This function is called from
transport dependent code, so export it.
Update the comment to clearly state the rules for calling this function.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move the sk_mutex field to the transport independent svc_xprt structure.
Now all the fields that svc_send touches are transport neutral. Change the
svc_send function to use the transport independent svc_xprt directly instead
of the transport dependent svc_sock structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_sock_enqueue function is now transport independent since all of
the fields it touches have been moved to the transport independent svc_xprt
structure. Change the function to use the svc_xprt structure directly
instead of the transport specific svc_sock structure.
Transport specific data-ready handlers need to call this function, so
export it.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This functionally trivial patch moves the sk_reserved field to the
transport independent svc_xprt structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move sk_list and sk_ready to svc_xprt. This involves close because these
lists are walked by svcs when closing all their transports. So I combined
the moving of these lists to svc_xprt with making close transport independent.
The svc_force_sock_close has been changed to svc_close_all and takes a list
as an argument. This removes some svc internals knowledge from the svcs.
This code races with module removal and transport addition.
Thanks to Simon Holm Thøgersen for a compile fix.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Simon Holm Thøgersen <odie@cs.aau.dk>
This is another incremental change that moves transport independent
fields from svc_sock to the svc_xprt structure. The changes
should be functionally null.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This functionally trivial change moves the transport independent sk_flags
field to the transport independent svc_xprt structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Change the atomic_t reference count to a kref and move it to the
transport indepenent svc_xprt structure. Change the reference count
wrapper names to be generic.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Modify the various kernel RPC svcs to use the svc_create_xprt service.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_create_xprt function is a transport independent version
of the svc_makesock function.
Since transport instance creation contains transport dependent and
independent components, add an xpo_create transport function. The
transport implementation of this function allocates the memory for the
endpoint, implements the transport dependent initialization logic, and
calls svc_xprt_init to initialize the transport independent field (svc_xprt)
in it's data structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move the code that poaches connections when the connection limit is hit
to a subroutine to make the accept logic path easier to follow. Since this
is in the new connection path, it should not be a performance issue.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_tcp_accept function calls svc_sock_enqueue after setting the
SK_CONN bit. This doesn't actually do anything because the SK_BUSY bit
is still set. The call is unnecessary anyway because the generic code in
svc_recv calls svc_sock_received after calling the accept function.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Previously, the accept logic looked into the socket state to determine
whether to call accept or recv when data-ready was indicated on an endpoint.
Since some transports don't use sockets, this logic now uses a flag
bit (SK_LISTENER) to identify listening endpoints. A transport function
(xpo_accept) allows each transport to define its own accept processing.
A transport's initialization logic is reponsible for setting the
SK_LISTENER bit. I didn't see any way to do this in transport independent
logic since the passive side of a UDP connection doesn't listen and
always recv's.
In the svc_recv function, if the SK_LISTENER bit is set, the transport
xpo_accept function is called to handle accept processing.
Note that all functions are defined even if they don't make sense
for a given transport. For example, accept doesn't mean anything for
UDP. The function is defined anyway and bug checks if called. The
UDP transport should never set the SK_LISTENER bit.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Close handling was duplicated in the UDP and TCP recvfrom
methods. This code has been moved to the transport independent
svc_recv function.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
In order to avoid blocking a service thread, the receive side checks
to see if there is sufficient write space to reply to the request.
Each transport has a different mechanism for determining if there is
enough write space to reply.
The code that checked for write space was coupled with code that
checked for CLOSE and CONN. These checks have been broken out into
separate statements to make the code easier to read.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Some transports add fields to the RPC header for replies, e.g. the TCP
record length. This function is called when preparing the reply header
to allow each transport to add whatever fields it requires.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add transport specific xpo_detach and xpo_free functions. The xpo_detach
function causes the transport to stop delivering data-ready events
and enqueing the transport for I/O.
The xpo_free function frees all resources associated with the particular
transport instance.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_sock_release function releases pages allocated to a thread. For
UDP this frees the receive skb. For RDMA it will post a receive WR
and bump the client credit count.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The sk_sendto and sk_recvfrom are function pointers that allow svc_sock
to be used for both UDP and TCP. Move these function pointers to the
svc_xprt_ops structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_max_payload function currently looks at the socket type
to determine the max payload. Add a max payload value to svc_xprt_class
so it can be returned directly.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Make TCP and UDP svc_sock transports, and register them
with the svc transport core.
A transport type (svc_sock) has an svc_xprt as its first member,
and calls svc_xprt_init to initialize this field.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The transport class (svc_xprt_class) represents a type of transport, e.g.
udp, tcp, rdma. A transport class has a unique name and a set of transport
operations kept in the svc_xprt_ops structure.
A transport class can be dynamically registered and unregisterd. The
svc_xprt_class represents the module that implements the transport
type and keeps reference counts on the module to avoid unloading while
there are active users.
The endpoint (svc_xprt) is a generic, transport independent endpoint that can
be used to send and receive data for an RPC service. It inherits it's
operations from the transport class.
A transport driver module registers and unregisters itself with svc sunrpc
by calling svc_reg_xprt_class, and svc_unreg_xprt_class respectively.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If we don't do this then we'll end up with a pointless unusable context
sitting in the cache until the time the original context would have
expired.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Make an obvious simplification that removes a few lines and some
unnecessary indentation; no change in behavior.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Newer server features such as nfsv4 and gss depend on proc to work, so a
failure to initialize the proc files they need should be treated as
fatal.
Thanks to Andrew Morton for style fix and compile fix in case where
CONFIG_NFSD_V4 is undefined.
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Just some minor cleanup.
Also I don't see much point in trying to register further proc entries
if initial entries fail; so just stop trying in that case.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
There's really nothing much the caller can do if cache unregistration
fails. And indeed, all any caller does in this case is print an error
and continue. So just return void and move the printk's inside
cache_unregister.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The path here must be left over from some earlier draft; fix it. And do
some more minor cleanup while we're there.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
XDR strings, opaques, and net objects should all use unsigned lengths.
To wit, RFC 4506 says:
4.2. Unsigned Integer
An XDR unsigned integer is a 32-bit datum that encodes a non-negative
integer in the range [0,4294967295].
...
4.11. String
The standard defines a string of n (numbered 0 through n-1) ASCII
bytes to be the number n encoded as an unsigned integer (as described
above), and followed by the n bytes of the string.
After this patch, xdr_decode_string_inplace now matches the other XDR
string and array helpers that take a string length argument. See:
xdr_encode_opaque_fixed, xdr_encode_opaque, xdr_encode_array
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Make sure we compare an unsigned length to an unsigned count in
read_flush().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The error handling in ieee80211_init() is broken when any of
the built-in rate control algorithms fail to initialise, fix
it and rename the error labels.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the rate control algorithms are built-in, their exit
functions can be called from mac80211's init function so
they cannot be marked __exit.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Driver authors should be aware of the alignment requirements, but
not everybody cares about the warning. This patch makes it depend
on a new Kconfig symbol MAC80211_DEBUG_PACKET_ALIGNMENT which can
be enabled regardless of MAC80211_DEBUG and is recommended for
driver authors (only). This also restricts the warning to data
packets so other packets need not be realigned to not trigger the
warning.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Changes the ANOM_PROMISCUOUS message to include uid and gid fields,
making it consistent with other AUDIT_ANOM_ messages and in the
format the userspace is expecting.
Signed-off-by: Klaus Heinrich Kiwi <klausk@br.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
In order to correlate audit records to an individual login add a session
id. This is incremented every time a user logs in and is included in
almost all messages which currently output the auid. The field is
labeled ses= or oses=
Signed-off-by: Eric Paris <eparis@redhat.com>
The namespace is not available in the fib_sync_down_addr, add it as a
parameter.
Looking up a device by the pointer to it is OK. Looking up using a
result from fib_trie/fib_hash table lookup is also safe. No need to
fix that at all. So, just fix lookup by address and insertion to the
hash table path.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is required to make fib_info lookups namespace aware. In the
other case initial namespace devices are marked as dead in the local
routing table during other namespace stop.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_sync_down can be called with an address and with a device. In
reality it is called either with address OR with a device. The
codepath inside is completely different, so lets separate it into two
calls for these two cases.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The namespace is available when required except rtm_to_ifaddr. Add
namespace argument to it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove error code assignment inside brackets on failure. The code
looks better if the error is assigned before condition check. Also,
the compiler treats this better.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
net->ipv4.fib_table_hash is not freed when fib4_rules_init failed.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hashlimit_ipv6_mask() is called from under IP6_NF_IPTABLES config
option, but is not under it by itself.
gcc warns us about it :) :
net/netfilter/xt_hashlimit.c:473: warning: "hashlimit_ipv6_mask" defined but not used
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new "flow" classifier, which is meant to extend the SFQ hashing
capabilities without hard-coding new hash functions and also allows
deterministic mappings of keys to classes, replacing some out of tree
iptables patches like IPCLASSIFY (maps IPs to classes), IPMARK (maps
IPs to marks, with fw filters to classes), ...
Some examples:
- Classic SFQ hash:
tc filter add ... flow hash \
keys src,dst,proto,proto-src,proto-dst divisor 1024
- Classic SFQ hash, but using information from conntrack to work properly in
combination with NAT:
tc filter add ... flow hash \
keys nfct-src,nfct-dst,proto,nfct-proto-src,nfct-proto-dst divisor 1024
- Map destination IPs of 192.168.0.0/24 to classids 1-257:
tc filter add ... flow map \
key dst addend -192.168.0.0 divisor 256
- alternatively:
tc filter add ... flow map \
key dst and 0xff
- similar, but reverse ordered:
tc filter add ... flow map \
key dst and 0xff xor 0xff
Perturbation is currently not supported because we can't reliable kill the
timer on destruction.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for dumping statistics and make internal queues visible as
classes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for external classifiers to allow using different flow
hash functions similar to ESFQ. When no classifier is attached the
built-in hash is used as before.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jens Axboe noticed that we were queueing &conn->work on both btaddconn
and keventd_wq.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the no longer used
EXPORT_SYMBOL(sysctl_tcp_tso_win_divisor).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct ipv4_devconf can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sysctl_tr_rif_timeout can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Outbound sequence number overflow error status
is counted as XfrmOutStateSeqError.
o Additionaly, it changes inbound sequence number replay
error name from XfrmInSeqOutOfWindow to XfrmInStateSeqError
to apply name scheme above.
o Inbound IPv4 UDP encapsuling type mismatch error is wrongly
mapped to XfrmInStateInvalid then this patch fiex the error
to XfrmInStateMismatch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the following no longer used EXPORT_SYMBOL's:
- xfrm_input.c: xfrm_parse_spi
- xfrm_state.c: xfrm_replay_check
- xfrm_state.c: xfrm_replay_advance
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>