Commit Graph

8839 Commits

Author SHA1 Message Date
Jan Engelhardt
6556874dc3 [NETFILTER]: xt_conntrack: fix IPv4 address comparison
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-27 12:20:41 -08:00
Jan Engelhardt
d61f89e941 [NETFILTER]: xt_conntrack: fix missing boolean clamping
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-27 12:09:05 -08:00
Patrick McHardy
4e29e9ec7e [NETFILTER]: nf_conntrack: fix smp_processor_id() in preemptible code warning
Since we're using RCU for the conntrack hash now, we need to avoid
getting preempted or interrupted by BHs while changing the stats.

Fixes warning reported by Tilman Schmidt <tilman@imap.cc> when using
preemptible RCU:

[   48.180297] BUG: using smp_processor_id() in preemptible [00000000] code: ntpdate/3562
[   48.180297] caller is __nf_conntrack_find+0x9b/0xeb [nf_conntrack]
[   48.180297] Pid: 3562, comm: ntpdate Not tainted 2.6.25-rc2-mm1-testing #1
[   48.180297]  [<c02015b9>] debug_smp_processor_id+0x99/0xb0
[   48.180297]  [<fac643a7>] __nf_conntrack_find+0x9b/0xeb [nf_conntrack]

Tested-by: Tilman Schmidt <tilman@imap.cc>
Tested-by: Christian Casteyde <casteyde.christian@free.fr> [Bugzilla #10097]

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-27 12:07:47 -08:00
YOSHIFUJI Hideaki
3bdfe7ec08 [IPV6] SYSCTL: Fix possible memory leakage in error path.
In error path, we do need to free memory just allocated.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-27 12:06:38 -08:00
Pavel Emelyanov
b37d428b24 [INET]: Don't create tunnels with '%' in name.
Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a
pre-defined name for a device from the userspace.  Since these drivers
call the register_netdevice() (rtnl_lock, is held), which does _not_
generate the device's name, this name may contain a '%' character.

Not sure how bad is this to have a device with a '%' in its name, but
all the other places either use the register_netdev(), which call the
dev_alloc_name(), or explicitly call the dev_alloc_name() before
registering, i.e. do not allow for such names.

This had to be prior to the commit 34cc7b, but I forgot to number the
patches and this one got lost, sorry.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-26 23:51:04 -08:00
David S. Miller
d9595a7b9c [AF_KEY]: Fix oops by converting to proc_net_*().
To make sure the procfs visibility occurs after the ->proc_fs ops are
setup, use proc_net_fops_create() and proc_net_remove().

This also fixes an OOPS after module unload in that the name string
for remove was wrong, so it wouldn't actually be removed.  That bug
was introduced by commit 61145aa1a1
("[KEY]: Clean up proc files creation a bit.")

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-26 22:23:31 -08:00
Bjorn Mork
148f97292e [IPV4]: Reset scope when changing address
This bug did bite at least one user, who did have to resort to rebooting
the system after an "ifconfig eth0 127.0.0.1" typo.

Deleting the address and adding a new is a less intrusive workaround.
But I still beleive this is a bug that should be fixed.  Some way or
another.

Another possibility would be to remove the scope mangling based on
address.  This will always be incomplete (are 127/8 the only address
space with host scope requirements?)

We set the scope to RT_SCOPE_HOST if an IPv4 interface is configured
with a loopback address (127/8).  The scope is never reset, and will
remain set to RT_SCOPE_HOST after changing the address. This patch
resets the scope if the address is changed again, to restore normal
functionality.

Signed-off-by: Bjorn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-26 18:42:41 -08:00
Benjamin Thery
f1243c2db6 [IPV6]: Add missing initializations of the new nl_info.nl_net field
Add some more missing initializations of the new nl_info.nl_net field
in IPv6 stack. This field will be used when network namespaces are
fully supported.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-26 18:42:37 -08:00
Thomas Gleixner
3ab2273175 bluetooth: delete timer in l2cap_conn_del()
Delete a possibly armed timer before kfree'ing the connection object.

Solves: http://lkml.org/lkml/2008/2/15/514

Reported-by:Quel Qun <kelk1@comcast.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-26 17:42:56 -08:00
Trond Myklebust
5d00837b90 SUNRPC: Run rpc timeout functions as callbacks instead of in softirqs
An audit of the current RPC timeout functions shows that they don't really
ever need to run in the softirq context. As long as the softirq is
able to signal that the wakeup is due to a timeout (which it can do by
setting task->tk_status to -ETIMEDOUT) then the callback functions can just
run as standard task->tk_callback functions (in the rpciod/process
context).

The only possible border-line case would be xprt_timer() for the case of
UDP, when the callback is used to reduce the size of the transport
congestion window. In testing, however, the effect of moving that update
to a callback would appear to be minor.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:44 -08:00
Trond Myklebust
fda1393938 SUNRPC: Convert users of rpc_wake_up_task to use rpc_wake_up_queued_task
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:42 -08:00
Trond Myklebust
96ef13b283 SUNRPC: Add a new helper rpc_wake_up_queued_task()
In all cases where we currently use rpc_wake_up_task(), we almost always
know on which waitqueue the rpc_task is actually sleeping. This will allows
us to simplify the queue locking in a future patch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:40 -08:00
Trond Myklebust
fde95c7554 SUNRPC: Clean up rpc_run_timer()
All RPC timeout callback functions are expected to wake the task up. We can
enforce this by moving the wakeup back into rpc_run_timer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:39 -08:00
Trond Myklebust
32bfb5c0f4 SUNRPC: Allow the rpc_release() callback to be run on another workqueue
A lot of the work done by the rpc_release() callback is inappropriate for
rpciod as it will often involve things like starting a new rpc call in
order to clean up state after an interrupted NFSv4 open() call, or
calls to mntput(), etc.

This patch allows the caller of rpc_run_task() to specify that the
rpc_release callback should run on a different workqueue than the default
rpciod_workqueue.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:34 -08:00
Harvey Harrison
5f2f40a92e tipc: fix integer as NULL pointer sparse warnings in tipc
net/tipc/cluster.c:145:2: warning: Using plain integer as NULL pointer
net/tipc/link.c:3254:36: warning: Using plain integer as NULL pointer
net/tipc/ref.c:151:15: warning: Using plain integer as NULL pointer
net/tipc/zone.c:85:2: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-24 18:38:31 -08:00
Linus Torvalds
bdc0894289 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
  [NETFILTER]: fix ebtable targets return
  [IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.
  [NET]: Restore sanity wrt. print_mac().
  [NEIGH]: Fix race between neighbor lookup and table's hash_rnd update.
  [RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK
  tg3: ethtool phys_id default
  [BNX2]: Update version to 1.7.4.
  [BNX2]: Disable parallel detect on an HP blade.
  [BNX2]: More 5706S link down workaround.
  ssb: Fix support for PCI devices behind a SSB->PCI bridge
  zd1211rw: fix sparse warnings
  rtl818x: fix sparse warnings
  ssb: Fix pcicore cardbus mode
  ssb: Make the GPIO API reentrancy safe
  ssb: Fix the GPIO API
  ssb: Fix watchdog access for devices without a chipcommon
  ssb: Fix serial console on new bcm47xx devices
  ath5k: Fix build warnings on some 64-bit platforms.
  WDEV, ath5k, don't return int from bool function
  WDEV: ath5k, fix lock imbalance
  ...
2008-02-23 21:07:10 -08:00
Joonwoo Park
1b04ab4597 [NETFILTER]: fix ebtable targets return
The function ebt_do_table doesn't take NF_DROP as a verdict from the targets.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-23 20:22:27 -08:00
Pavel Emelyanov
34cc7ba639 [IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.
Use the added dev_alloc_name() call to create tunnel device name,
rather than iterate in a hand-made loop with an artificial limit.

Thanks Patrick for noticing this.

[ The way this works is, when the device is actually registered,
  the generic code noticed the '%' in the name and invokes
  dev_alloc_name() to fully resolve the name.  -DaveM ]

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-23 20:19:20 -08:00
David S. Miller
55b01e8681 [NET]: Restore sanity wrt. print_mac().
MAC_FMT had only one user and we tried to get rid of
that, but this created more problems than it solved.

As a result, this reverts three commits:

235365f3aa ("net/8021q/vlan_dev.c: Use
print_mac."), fea5fa875e ("[NET]: Remove
MAC_FMT"), and 8f789c4844 ("[NET]:
Elminate spurious print_mac() calls.")

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-23 20:09:11 -08:00
Pavel Emelyanov
bc4bf5f38c [NEIGH]: Fix race between neighbor lookup and table's hash_rnd update.
The neigh_hash_grow() may update the tbl->hash_rnd value, which 
is used in all tbl->hash callbacks to calculate the hashval.

Two lookup routines may race with this, since they call the 
->hash callback without the tbl->lock held. Since the hash_rnd
is changed with this lock write-locked moving the calls to ->hash
under this lock read-locked closes this gap.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-23 19:57:02 -08:00
Thomas Graf
1840bb13c2 [RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK
RTM_NEWLINK allows for already existing links to be modified. For this
purpose do_setlink() is called which expects address attributes with a
payload length of at least dev->addr_len. This patch adds the necessary
validation for the RTM_NEWLINK case.

The address length for links to be created is not checked for now as the
actual attribute length is used when copying the address to the netdevice
structure. It might make sense to report an error if less than addr_len
bytes are provided but enforcing this might break drivers trying to be
smart with not transmitting all zero addresses.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-23 19:54:36 -08:00
Rafael J. Wysocki
3a2d5b7001 PM: Introduce PM_EVENT_HIBERNATE callback state
During the last step of hibernation in the "platform" mode (with the
help of ACPI) we use the suspend code, including the devices'
->suspend() methods, to prepare the system for entering the ACPI S4
system sleep state.

But at least for some devices the operations performed by the
->suspend() callback in that case must be different from its operations
during regular suspend.

For this reason, introduce the new PM event type PM_EVENT_HIBERNATE and
pass it to the device drivers' ->suspend() methods during the last phase
of hibernation, so that they can distinguish this case and handle it as
appropriate.  Modify the drivers that handle PM_EVENT_SUSPEND in a
special way and need to handle PM_EVENT_HIBERNATE in the same way.

These changes are necessary to fix a hibernation regression related
to the i915 driver (ref. http://lkml.org/lkml/2008/2/22/488).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 10:40:04 -08:00
Pavel Emelyanov
5216a8e70e Wrap buffers used for rpc debug printks into RPC_IFDEBUG
Sorry for the noise, but here's the v3 of this compilation fix :)

There are some places, which declare the char buf[...] on the stack
to push it later into dprintk(). Since the dprintk sometimes (if the
CONFIG_SYSCTL=n) becomes an empty do { } while (0) stub, these buffers
cause gcc to produce appropriate warnings.

Wrap these buffers with RPC_IFDEBUG macro, as Trond proposed, to
compile them out when not needed.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-21 18:42:29 -05:00
Denis V. Lunev
da12f7356d [NETNS]: Namespace leak in pneigh_lookup.
release_net is missed on the error path in pneigh_lookup.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-20 00:26:16 -08:00
Pavel Emelyanov
5f31886ff0 [SCTP]: Pick up an orphaned sctp_sockets_allocated counter.
This counter is currently write-only.

Drawing an analogy with the similar tcp counter, I think
that this one should be pointed by the sockets_allocated
members of sctp_prot and sctpv6_prot.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-20 00:23:01 -08:00
Jan Engelhardt
27ecb1ff0a [NETFILTER]: xt_iprange: fix subtraction-based comparison
The host address parts need to be converted to host-endian first
before arithmetic makes any sense on them.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19 17:20:06 -08:00
Jan Engelhardt
7d9904c260 [NETFILTER]: xt_hashlimit: remove unneeded struct member
By allocating ->hinfo, we already have the needed indirection to cope
with the per-cpu xtables struct match_entry.

[Patrick: do this now before the revision 1 struct is used by userspace]

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19 17:19:44 -08:00
Joonwoo Park
eb1197bc0e [NETFILTER]: Fix incorrect use of skb_make_writable
http://bugzilla.kernel.org/show_bug.cgi?id=9920
The function skb_make_writable returns true or false.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19 17:18:47 -08:00
Pavel Emelyanov
f449b3b54d [NETFILTER]: xt_u32: drop the actually unused variable from u32_match_it
The int ret variable is used only to trigger the BUG_ON() after
the skb_copy_bits() call, so check the call failure directly
and drop the variable.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19 17:18:20 -08:00
Patrick McHardy
e2b58a67b9 [NETFILTER]: {ip,ip6,nfnetlink}_queue: fix SKB_LINEAR_ASSERT when mangling packet data
As reported by Tomas Simonaitis <tomas.simonaitis@gmail.com>,
inserting new data in skbs queued over {ip,ip6,nfnetlink}_queue
triggers a SKB_LINEAR_ASSERT in skb_put().

Going back through the git history, it seems this bug is present since
at least 2.6.12-rc2, probably even since the removal of
skb_linearize() for netfilter.

Linearize non-linear skbs through skb_copy_expand() when enlarging
them.  Tested by Thomas, fixes bugzilla #9933.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19 17:17:52 -08:00
Adrian Bunk
94cb1503c7 ipv4/fib_hash.c: fix NULL dereference
Unless I miss a guaranteed relation between between "f" and
"new_fa->fa_info" this patch is required for fixing a NULL dereference
introduced by commit a6501e080c ("[IPV4]
FIB_HASH: Reduce memory needs and speedup lookups") and spotted by the
Coverity checker.

Eric Dumazet says:

	Hum, you are right, kmem_cache_free() doesnt allow a NULL
	object, like kfree() does.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19 16:28:54 -08:00
Adrian Bunk
15e29b8b05 net/9p/trans_virtio.c: kmalloc() enough memory
The Coverity checker spotted that less memory than required was
allocated.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19 16:25:30 -08:00
Thomas Graf
76e87306c2 [RTNL]: Add missing link netlink attribute policy definitions
IFLA_LINK is no longer a write-only attribute on the kernel side and
must thus be validated. Same goes for the newly introduced
IFLA_LINKINFO.

Fixes undefined behaviour if either of the attributes are not well
formed.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19 16:12:08 -08:00
Jorge Boncompte [DTI2]
12aa343add [NET]: Messed multicast lists after dev_mc_sync/unsync
Commit a0a400d79e ("[NET]: dev_mcast:
add multicast list synchronization helpers") from you introduced a new
field "da_synced" to struct dev_addr_list that is not properly
initialized to 0. So when any of the current users (8021q, macvlan,
mac80211) calls dev_mc_sync/unsync they mess the address list for both
devices.

The attached patch fixed it for me and avoid future problems.

Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19 14:17:04 -08:00
Linus Torvalds
07ce198a1e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (60 commits)
  [NIU]: Bump driver version and release date.
  [NIU]: Fix BMAC alternate MAC address indexing.
  net: fix kernel-doc warnings in header files
  [IPV6]: Use BUG_ON instead of if + BUG in fib6_del_route.
  [IPV6]: dst_entry leak in ip4ip6_err. (resend)
  bluetooth: do not move child device other than rfcomm
  bluetooth: put hci dev after del conn
  [NET]: Elminate spurious print_mac() calls.
  [BLUETOOTH] hci_sysfs.c: Kill build warning.
  [NET]: Remove MAC_FMT
  net/8021q/vlan_dev.c: Use print_mac.
  [XFRM]: Fix ordering issue in xfrm_dst_hash_transfer().
  [BLUETOOTH] net/bluetooth/hci_core.c: Use time_* macros
  [IPV6]: Fix hardcoded removing of old module code
  [NETLABEL]: Move some initialization code into __init section.
  [NETLABEL]: Shrink the genl-ops registration code.
  [AX25] ax25_out: check skb for NULL in ax25_kick()
  [TCP]: Fix tcp_v4_send_synack() comment
  [IPV4]: fix alignment of IP-Config output
  Documentation: fix tcp.txt
  ...
2008-02-19 07:52:45 -08:00
Pavel Emelyanov
2df96af03d [IPV6]: Use BUG_ON instead of if + BUG in fib6_del_route.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-18 20:50:42 -08:00
Denis V. Lunev
9937ded8e4 [IPV6]: dst_entry leak in ip4ip6_err. (resend)
The result of the ip_route_output is not assigned to skb. This means that
- it is leaked
- possible OOPS below dereferrencing skb->dst
- no ICMP message for this case

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-18 20:49:36 -08:00
Dave Young
8ac62dc773 bluetooth: do not move child device other than rfcomm
hci conn child devices other than rfcomm tty should not be moved here.
This is my lost, thanks for Barnaby's reporting and testing.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com> 
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-18 20:45:41 -08:00
Dave Young
0cd63c8089 bluetooth: put hci dev after del conn
Move hci_dev_put to del_conn to avoid hci dev going away before hci conn.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-18 20:44:01 -08:00
David S. Miller
988d0093f9 [BLUETOOTH] hci_sysfs.c: Kill build warning.
net/bluetooth/hci_sysfs.c: In function ‘del_conn’:
net/bluetooth/hci_sysfs.c:339: warning: suggest parentheses around assignment used as truth value

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-18 00:20:50 -08:00
Joe Perches
235365f3aa net/8021q/vlan_dev.c: Use print_mac.
Remove direct use of MAC_FMT

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 23:34:54 -08:00
YOSHIFUJI Hideaki
b791160b5a [XFRM]: Fix ordering issue in xfrm_dst_hash_transfer().
Keep ordering of policy entries with same selector in
xfrm_dst_hash_transfer().

Issue should not appear in usual cases because multiple policy entries
with same selector are basically not allowed so far.  Bug was pointed
out by Sebastien Decugis <sdecugis@hongo.wide.ad.jp>.

We could convert bydst from hlist to list and use list_add_tail()
instead.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Sebastien Decugis <sdecugis@hongo.wide.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 23:29:30 -08:00
S.Çağlar Onur
82453021b8 [BLUETOOTH] net/bluetooth/hci_core.c: Use time_* macros
The functions time_before, time_before_eq, time_after, and
time_after_eq are more robust for comparing jiffies against other
values.

So following patch implements usage of the time_after() macro, defined
at linux/jiffies.h, which deals with wrapping correctly

Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 23:25:57 -08:00
Wang Chen
5ee46e562c [IPV6]: Fix hardcoded removing of old module code
Rusty hardcoded the old module code.
We can remove it now.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 22:34:53 -08:00
Pavel Emelyanov
05705e4e11 [NETLABEL]: Move some initialization code into __init section.
Everything that is called from netlbl_init() can be marked with
__init. This moves 620 bytes from .text section to .text.init one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 22:33:57 -08:00
Pavel Emelyanov
227c43c3bc [NETLABEL]: Shrink the genl-ops registration code.
Turning them to array and registration in a loop saves
80 lines of code and ~300 bytes from text section.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 22:33:16 -08:00
Jarek Poplawski
f47b7257c7 [AX25] ax25_out: check skb for NULL in ax25_kick()
According to some OOPS reports ax25_kick tries to clone NULL skbs
sometimes. It looks like a race with ax25_clear_queues(). Probably
there is no need to add more than a simple check for this yet.
Another report suggested there are probably also cases where ax25
->paclen == 0 can happen in ax25_output(); this wasn't confirmed
during testing but let's leave this debugging check for some time.

Reported-and-tested-by: Jann Traschewski <jann@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 22:31:19 -08:00
Kris Katterjohn
9bf1d83e7e [TCP]: Fix tcp_v4_send_synack() comment
Signed-off-by: Kris Katterjohn <katterjohn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 22:29:19 -08:00
Uwe Kleine-Koenig
9c00409a2a [IPV4]: fix alignment of IP-Config output
Make the indented lines aligned in the output (not in the code).

Signed-off-by: Uwe Kleine-Koenig <Uwe.Kleine-Koenig@digi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 22:28:32 -08:00
Julia Lawall
d6584f3a08 net/9p/trans_virtio.c: Use BUG_ON
if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 18:42:53 -08:00
Julia Lawall
163e3cb7da net/rxrpc: Use BUG_ON
if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 18:42:03 -08:00
David S. Miller
9ff5660746 Revert "[NDISC]: Fix race in generic address resolution"
This reverts commit 69cc64d8d9.

It causes recursive locking in IPV6 because unlike other
neighbour layer clients, it even needs neighbour cache
entries to send neighbour soliciation messages :-(

We'll have to find another way to fix this race.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 18:39:54 -08:00
David S. Miller
93b2d4a208 Revert "[RTNETLINK]: Send a single notification on device state changes."
This reverts commit 45b5035482.

It break locking around dev->link_mode as well as cause
other bootup problems.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 18:35:07 -08:00
David S. Miller
42fe95cae5 Merge branch 'fixes' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6 2008-02-15 15:56:47 -08:00
Michael Buesch
ceffefd15a mac80211: Fix initial hardware configuration
On the initial device-open we need to defer the hardware reconfiguration
after we incremented the open_count, because the hw_config checks this flag
and won't call the lowlevel driver in case it is zero.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-15 13:44:19 -05:00
Linus Torvalds
f6866fecd6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits)
  [NET]: Make sure sockets implement splice_read
  netconsole: avoid null pointer dereference at show_local_mac()
  [IPV6]: Fix reversed local_df test in ip6_fragment
  [XFRM]: Avoid bogus BUG() when throwing new policy away.
  [AF_KEY]: Fix bug in spdadd
  [NETFILTER] nf_conntrack_proto_tcp.c: Mistyped state corrected.
  net: xfrm statistics depend on INET
  [NETFILTER]: make secmark_tg_destroy() static
  [INET]: Unexport inet_listen_wlock
  [INET]: Unexport __inet_hash_connect
  [NET]: Improve cache line coherency of ingress qdisc
  [NET]: Fix race in dev_close(). (Bug 9750)
  [IPSEC]: Fix bogus usage of u64 on input sequence number
  [RTNETLINK]: Send a single notification on device state changes.
  [NETLABLE]: Hide netlbl_unlabel_audit_addr6 under ifdef CONFIG_IPV6.
  [NETLABEL]: Don't produce unused variables when IPv6 is off.
  [NETLABEL]: Compilation for CONFIG_AUDIT=n case.
  [GENETLINK]: Relax dances with genl_lock.
  [NETLABEL]: Fix lookup logic of netlbl_domhsh_search_def.
  [IPV6]: remove unused method declaration (net/ndisc.h).
  ...
2008-02-15 07:33:07 -08:00
Rémi Denis-Courmont
997b37da15 [NET]: Make sure sockets implement splice_read
Fixes a segmentation fault when trying to splice from a non-TCP socket.

Signed-off-by: Rémi Denis-Courmont <rdenis@simphalempin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-15 02:35:45 -08:00
Herbert Xu
b5c15fc004 [IPV6]: Fix reversed local_df test in ip6_fragment
I managed to reverse the local_df test when forward-porting this
patch so it actually makes things worse by never fragmenting at
all.

Thanks to David Stevens for testing and reporting this bug.

Bill Fink pointed out that the local_df setting is also the wrong
way around.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-14 23:49:37 -08:00
Jan Blunck
1d957f9bf8 Introduce path_put()
* Add path_put() functions for releasing a reference to the dentry and
  vfsmount of a struct path in the right order

* Switch from path_release(nd) to path_put(&nd->path)

* Rename dput_path() to path_put_conditional()

[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Jan Blunck
4ac9137858 Embed a struct path into struct nameidata instead of nd->{dentry,mnt}
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.

Together with the other patches of this series
- it enforced the correct order of getting/releasing the reference count on
  <dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
  struct path in every place where the stack can be traversed
- it reduces the overall code size:

without patch series:
   text    data     bss     dec     hex filename
5321639  858418  715768 6895825  6938d1 vmlinux

with patch series:
   text    data     bss     dec     hex filename
5320026  858418  715768 6894212  693284 vmlinux

This patch:

Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
YOSHIFUJI Hideaki
073a371987 [XFRM]: Avoid bogus BUG() when throwing new policy away.
From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

When we destory a new policy entry, we need to tell
xfrm_policy_destroy() explicitly that the entry is not
alive yet.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-14 14:52:38 -08:00
Kazunori MIYAZAWA
a4d6b8af1e [AF_KEY]: Fix bug in spdadd
This patch fix a BUG when adding spds which have same selector.

Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-14 14:51:38 -08:00
Jozsef Kadlecsik
d0c1fd7a8f [NETFILTER] nf_conntrack_proto_tcp.c: Mistyped state corrected.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-14 14:50:21 -08:00
Paul Mundt
0f4bda005f net: xfrm statistics depend on INET
net/built-in.o: In function `xfrm_policy_init':
/home/pmundt/devel/git/sh-2.6.25/net/xfrm/xfrm_policy.c:2338: undefined reference to `snmp_mib_init'

snmp_mib_init() is only built in if CONFIG_INET is set.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-14 14:48:45 -08:00
Trond Myklebust
cbc2005925 SUNRPC: Declare as const the rpc_message arguments to rpc_call_sync/async
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-14 11:17:24 -05:00
Adrian Bunk
f51f5ec690 [NETFILTER]: make secmark_tg_destroy() static
This patch makes the needlessly global secmark_tg_destroy() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-13 17:41:39 -08:00
Adrian Bunk
324b57619b [INET]: Unexport inet_listen_wlock
This patch removes the no longer used EXPORT_SYMBOL(inet_listen_wlock).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-13 17:40:25 -08:00
Adrian Bunk
74da4d34e4 [INET]: Unexport __inet_hash_connect
This patch removes the unused EXPORT_SYMBOL_GPL(__inet_hash_connect).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-13 17:39:34 -08:00
Randy Dunlap
bc2cda1ebd docbook: make a networking book and fix a few errors
Move networking (core and drivers) docbook to its own networking book.
Fix a few kernel-doc errors in header and source files.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13 16:21:19 -08:00
Randy Dunlap
65b6e42cdc docbook: sunrpc filenames and notation fixes
Use updated file list for docbook files and
fix kernel-doc warnings in sunrpc:
Warning(linux-2.6.24-git12//net/sunrpc/rpc_pipe.c:689): No description found for parameter 'rpc_client'
Warning(linux-2.6.24-git12//net/sunrpc/rpc_pipe.c:765): No description found for parameter 'flags'
Warning(linux-2.6.24-git12//net/sunrpc/clnt.c:584): No description found for parameter 'tk_ops'
Warning(linux-2.6.24-git12//net/sunrpc/clnt.c:618): No description found for parameter 'bufsize'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13 16:21:19 -08:00
Harvey Harrison
b5606c2d44 remove final fastcall users
fastcall always expands to empty, remove it.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13 16:21:18 -08:00
Matti Linnanvuori
d8b2a4d21e [NET]: Fix race in dev_close(). (Bug 9750)
There is a race in Linux kernel file net/core/dev.c, function dev_close.
The function calls function dev_deactivate, which calls function
dev_watchdog_down that deletes the watchdog timer. However, after that, a
driver can call netif_carrier_ok, which calls function
__netdev_watchdog_up that can add the watchdog timer again. Function
unregister_netdevice calls function dev_shutdown that traps the bug
!timer_pending(&dev->watchdog_timer). Moving dev_deactivate after
netif_running() has been cleared prevents function netif_carrier_on
from calling __netdev_watchdog_up and adding the watchdog timer again.

Signed-off-by: Matti Linnanvuori <mattilinnanvuori@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 23:11:16 -08:00
Herbert Xu
b318e0e4ef [IPSEC]: Fix bogus usage of u64 on input sequence number
Al Viro spotted a bogus use of u64 on the input sequence number which
is big-endian.  This patch fixes it by giving the input sequence number
its own member in the xfrm_skb_cb structure.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:50:35 -08:00
Laszlo Attila Toth
45b5035482 [RTNETLINK]: Send a single notification on device state changes.
In do_setlink() a single notification is sent at the end of the
function if any modification occured. If the address has been changed,
another notification is sent.

Both of them is required because originally only the NETDEV_CHANGEADDR
notification was sent and although device state change implies address
change, some programs may expect the original notification. It remains
for compatibity.

If set_operstate() is called from do_setlink(), it doesn't send a
notification, only if it is called from rtnl_create_link() as earlier.

Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:42:09 -08:00
Pavel Emelyanov
370125f0a4 [NETLABLE]: Hide netlbl_unlabel_audit_addr6 under ifdef CONFIG_IPV6.
This one is called from under this config only, so move
it in the same place.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:38:06 -08:00
Pavel Emelyanov
56628b1d89 [NETLABEL]: Don't produce unused variables when IPv6 is off.
Some code declares variables on the stack, but uses them
under #ifdef CONFIG_IPV6, so thay become unused when ipv6
is off. Fortunately, they are used in a switch's case
branches, so the fix is rather simple.

Is it OK from coding style POV to add braces inside "cases",
or should I better avoid such style and rework the patch?

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:37:19 -08:00
Pavel Emelyanov
94de7feb2d [NETLABEL]: Compilation for CONFIG_AUDIT=n case.
The audit_log_start() will expand into an empty do { } while (0)
construction and the audit_ctx becomes unused.

The solution: push current->audit_context into audit_log_start()
directly, since it is not required in any other place in the 
calling function.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:35:37 -08:00
Pavel Emelyanov
910d6c320c [GENETLINK]: Relax dances with genl_lock.
The genl_unregister_family() calls the genl_unregister_mc_groups(), 
which takes and releases the genl_lock and then locks and releases
this lock itself.

Relax this behavior, all the more so the genl_unregister_mc_groups() 
is called from genl_unregister_family() only.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:16:33 -08:00
Pavel Emelyanov
4c3a0a254e [NETLABEL]: Fix lookup logic of netlbl_domhsh_search_def.
Currently, if the call to netlbl_domhsh_search succeeds the
return result will still be NULL.

Fix that, by returning the found entry (if any).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:15:14 -08:00
Urs Thuermann
fee54fa517 [NET]: Fix comment for skb_pull_rcsum
Fix comment for skb_pull_rcsum

Signed-off-by: Urs Thuermann <urs@isnogud.escape.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:03:25 -08:00
Herbert Xu
28a89453b1 [IPV6]: Fix IPsec datagram fragmentation
This is a long-standing bug in the IPsec IPv6 code that breaks
when we emit a IPsec tunnel-mode datagram packet.  The problem
is that the code the emits the packet assumes the IPv6 stack
will fragment it later, but the IPv6 stack assumes that whoever
is emitting the packet is going to pre-fragment the packet.

In the long term we need to fix both sides, e.g., to get the
datagram code to pre-fragment as well as to get the IPv6 stack
to fragment locally generated tunnel-mode packet.

For now this patch does the second part which should make it
work for the IPsec host case.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 18:07:27 -08:00
David S. Miller
69cc64d8d9 [NDISC]: Fix race in generic address resolution
Frank Blaschka provided the bug report and the initial suggested fix
for this bug.  He also validated this version of this fix.

The problem is that the access to neigh->arp_queue is inconsistent, we
grab references when dropping the lock lock to call
neigh->ops->solicit() but this does not prevent other threads of
control from trying to send out that packet at the same time causing
corruptions because both code paths believe they have exclusive access
to the skb.

The best option seems to be to hold the write lock on neigh->lock
during the ->solicit() call.  I looked at all of the ndisc_ops
implementations and this seems workable.  The only case that needs
special care is the IPV4 ARP implementation of arp_solicit().  It
wants to take neigh->lock as a reader to protect the header entry in
neigh->ha during the emission of the soliciation.  We can simply
remove the read lock calls to take care of that since holding the lock
as a writer at the caller providers a superset of the protection
afforded by the existing read locking.

The rest of the ->solicit() implementations don't care whether the
neigh is locked or not.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 17:54:17 -08:00
Jarek Poplawski
e848b583e0 [AX25] ax25_ds_timer: use mod_timer instead of add_timer
This patch changes current use of: init_timer(), add_timer()
and del_timer() to setup_timer() with mod_timer(), which
should be safer anyway.

Reported-by: Jann Traschewski <jann@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 17:53:34 -08:00
Jarek Poplawski
21fab4a86a [AX25] ax25_timer: use mod_timer instead of add_timer
According to one of Jann's OOPS reports it looks like
BUG_ON(timer_pending(timer)) triggers during add_timer()
in ax25_start_t1timer(). This patch changes current use
of: init_timer(), add_timer() and del_timer() to
setup_timer() with mod_timer(), which should be safer
anyway.

Reported-by: Jann Traschewski <jann@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 17:53:33 -08:00
Jarek Poplawski
4de211f1a2 [AX25] ax25_route: make ax25_route_lock BH safe
> =================================
> [ INFO: inconsistent lock state ]
> 2.6.24-dg8ngn-p02 #1
> ---------------------------------
> inconsistent {softirq-on-W} -> {in-softirq-R} usage.
> linuxnet/3046 [HC0[0]:SC1[2]:HE1:SE0] takes:
>  (ax25_route_lock){--.+}, at: [<f8a0cfb7>] ax25_get_route+0x18/0xb7 [ax25]
> {softirq-on-W} state was registered at:
...

This lockdep report shows that ax25_route_lock is taken for reading in
softirq context, and for writing in process context with BHs enabled.
So, to make this safe, all write_locks in ax25_route.c are changed to
_bh versions.

Reported-by: Jann Traschewski <jann@gmx.de>,
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 17:53:32 -08:00
Jarek Poplawski
1105b5d1d4 [AX25] af_ax25: remove sock lock in ax25_info_show()
This lockdep warning:

> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.24 #3
> -------------------------------------------------------
> swapper/0 is trying to acquire lock:
>  (ax25_list_lock){-+..}, at: [<f91dd3b1>] ax25_destroy_socket+0x171/0x1f0 [ax25]
>
> but task is already holding lock:
>  (slock-AF_AX25){-+..}, at: [<f91dbabc>] ax25_std_heartbeat_expiry+0x1c/0xe0 [ax25]
>
> which lock already depends on the new lock.
...

shows that ax25_list_lock and slock-AF_AX25 are taken in different
order: ax25_info_show() takes slock (bh_lock_sock(ax25->sk)) while
ax25_list_lock is held, so reversely to other functions. To fix this
the sock lock should be moved to ax25_info_start(), and there would
be still problem with breaking ax25_list_lock (it seems this "proper"
order isn't optimal yet). But, since it's only for reading proc info
it seems this is not necessary (e.g.  ax25_send_to_raw() does similar
reading without this lock too).

So, this patch removes sock lock to avoid deadlock possibility; there
is also used sock_i_ino() function, which reads sk_socket under proper
read lock. Additionally printf format of this i_ino is changed to %lu.

Reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 17:53:31 -08:00
Stephen Hemminger
8315f5d80a fib_trie: /proc/net/route performance improvement
Use key/offset caching to change /proc/net/route (use by iputils route)
from O(n^2) to O(n). This improves performance from 30sec with 160,000
routes to 1sec.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 17:53:31 -08:00
Stephen Hemminger
ec28cf738d fib_trie: handle empty tree
This fixes possible problems when trie_firstleaf() returns NULL
to trie_leafindex().

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 17:53:30 -08:00
David S. Miller
e4f8b5d4ed [IPV4]: Remove IP_TOS setting privilege checks.
Various RFCs have all sorts of things to say about the CS field of the
DSCP value.  In particular they try to make the distinction between
values that should be used by "user applications" and things like
routing daemons.

This seems to have influenced the CAP_NET_ADMIN check which exists for
IP_TOS socket option settings, but in fact it has an off-by-one error
so it wasn't allowing CS5 which is meant for "user applications" as
well.

Further adding to the inconsistency and brokenness here, IPV6 does not
validate the DSCP values specified for the IPV6_TCLASS socket option.

The real actual uses of these TOS values are system specific in the
final analysis, and these RFC recommendations are just that, "a
recommendation".  In fact the standards very purposefully use
"SHOULD" and "SHOULD NOT" when describing how these values can be
used.

In the final analysis the only clean way to provide consistency here
is to remove the CAP_NET_ADMIN check.  The alternatives just don't
work out:

1) If we add the CAP_NET_ADMIN check to ipv6, this can break existing
   setups.

2) If we just fix the off-by-one error in the class comparison in
   IPV4, certain DSCP values can be used in IPV6 but not IPV4 by
   default.  So people will just ask for a sysctl asking to
   override that.

I checked several other freely available kernel trees and they
do not make any privilege checks in this area like we do.  For
the BSD stacks, this goes back all the way to Stevens Volume 2
and beyond.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 17:53:29 -08:00
Linus Torvalds
0c0d61ca93 Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux
* 'for-linus' of git://linux-nfs.org/~bfields/linux:
  SUNPRC: Fix printk format warning
  nfsd: clean up svc_reserve_auth()
  NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight
  NLM: don't reattempt GRANT_MSG when there is already an RPC in flight
  NLM: have server-side RPC clients default to soft RPC tasks
  NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clients
2008-02-11 09:19:47 -08:00
Roland Dreier
bb50c8012c SUNPRC: Fix printk format warning
net/sunrpc/xprtrdma/svc_rdma_sendto.c:160: warning: format '%llx'
expects type 'long long unsigned int', but argument 3 has type 'u64'

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-10 18:11:22 -05:00
David S. Miller
30ddb159ff [PKT_SCHED] ematch: Fix build warning.
Commit 954415e33e ("[PKT_SCHED] ematch:
tcf_em_destroy robustness") removed a cast on em->data when
passing it to kfree(), but em->data is an integer type that can
hold pointers as well as other values so the cast is necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-10 03:48:15 -08:00
Jarek Poplawski
21347456ab [NET_SCHED] sch_htb: htb_requeue fix
htb_requeue() enqueues skbs for which htb_classify() returns NULL.
This is wrong because such skbs could be handled by NET_CLS_ACT code,
and the decision could be different than earlier in htb_enqueue().
So htb_requeue() is changed to work and look more like htb_enqueue().

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-09 23:44:00 -08:00
Rami Rosen
238fc7eac8 [IPV6]: Replace using the magic constant "1024" with IP6_RT_PRIO_USER for fc_metric.
This patch replaces the explicit usage of the magic constant "1024"
with IP6_RT_PRIO_USER in the IPV6 tree.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-09 23:43:11 -08:00
Stephen Hemminger
954415e33e [PKT_SCHED] ematch: tcf_em_destroy robustness
Make the code in tcf_em_tree_destroy more robust and cleaner:
 * Don't need to cast pointer to kfree() or avoid passing NULL.
 * After freeing the tree, clear the pointer to avoid possible problems
from repeated free.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-09 23:26:53 -08:00
Stephen Hemminger
ed7af3b350 [PKT_SCHED]: deinline functions in meta match
A couple of functions in meta match don't need to be inline.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-09 23:26:17 -08:00
Pavel Emelyanov
8ff65b4603 [SCTP]: Convert sctp_dbg_objcnt to seq files.
This makes the code use a good proc API and the text ~50 bytes shorter.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-09 23:24:58 -08:00
Pavel Emelyanov
3f5340a67e [SCTP]: Use snmp_fold_field instead of a homebrew analogue.
SCPT already depends in INET, so this doesn't create additional
dependencies.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-09 23:23:44 -08:00
Denis V. Lunev
cd557bc1c1 [IGMP]: Optimize kfree_skb in igmp_rcv.
Merge error paths inside igmp_rcv.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-09 23:22:26 -08:00
Pavel Emelyanov
bd2f747658 [KEY]: Convert net/pfkey to use seq files.
The seq files API disposes the caller of the difficulty of
checking file position, the length of data to produce and
the size of provided buffer.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-09 23:20:06 -08:00
Pavel Emelyanov
61145aa1a1 [KEY]: Clean up proc files creation a bit.
Mainly this removes ifdef-s from inside the ipsec_pfkey_init.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-09 23:19:14 -08:00
Stephen Hemminger
268bcca1e7 [PKT_SCHED] ematch: oops from uninitialized variable (resend)
Setting up a meta match causes a kernel OOPS because of uninitialized
elements in tree.

[   37.322381] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[   37.322381] IP: [<ffffffff883fc717>] :em_meta:em_meta_destroy+0x17/0x80

[   37.322381] Call Trace:
[   37.322381]  [<ffffffff803ec83d>] tcf_em_tree_destroy+0x2d/0xa0
[   37.322381]  [<ffffffff803ecc8c>] tcf_em_tree_validate+0x2dc/0x4a0
[   37.322381]  [<ffffffff803f06d2>] nla_parse+0x92/0xe0
[   37.322381]  [<ffffffff883f9672>] :cls_basic:basic_change+0x202/0x3c0
[   37.322381]  [<ffffffff802a3917>] kmem_cache_alloc+0x67/0xa0
[   37.322381]  [<ffffffff803ea221>] tc_ctl_tfilter+0x3b1/0x580
[   37.322381]  [<ffffffff803dffd0>] rtnetlink_rcv_msg+0x0/0x260
[   37.322381]  [<ffffffff803ee944>] netlink_rcv_skb+0x74/0xa0
[   37.322381]  [<ffffffff803dffc8>] rtnetlink_rcv+0x18/0x20
[   37.322381]  [<ffffffff803ee6c3>] netlink_unicast+0x263/0x290
[   37.322381]  [<ffffffff803cf276>] __alloc_skb+0x96/0x160
[   37.322381]  [<ffffffff803ef014>] netlink_sendmsg+0x274/0x340
[   37.322381]  [<ffffffff803c7c3b>] sock_sendmsg+0x12b/0x140
[   37.322381]  [<ffffffff8024de90>] autoremove_wake_function+0x0/0x30
[   37.322381]  [<ffffffff8024de90>] autoremove_wake_function+0x0/0x30
[   37.322381]  [<ffffffff803c7c3b>] sock_sendmsg+0x12b/0x140
[   37.322381]  [<ffffffff80288611>] zone_statistics+0xb1/0xc0
[   37.322381]  [<ffffffff803c7e5e>] sys_sendmsg+0x20e/0x360
[   37.322381]  [<ffffffff803c7411>] sockfd_lookup_light+0x41/0x80
[   37.322381]  [<ffffffff8028d04b>] handle_mm_fault+0x3eb/0x7f0
[   37.322381]  [<ffffffff8020c2fb>] system_call_after_swapgs+0x7b/0x80

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-09 03:47:19 -08:00
David S. Miller
ab1ecbabb1 Merge branch 'pending' of master.kernel.org:/pub/scm/linux/kernel/git/vxy/lksctp-dev 2008-02-09 03:44:25 -08:00
Linus Torvalds
3668805a54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  [IPSEC] flow: reorder "struct flow_cache_entry" and remove SLAB_HWCACHE_ALIGN
  [DECNET] ROUTE: remove unecessary alignment
  [IPSEC]: Add support for aes-ctr.
  [ISDN]: fix section mismatch warning in enpci_card_msg
  [TIPC]: declare proto_ops structures as 'const'.
  [TIPC]: Kill unused static inline (x5)
  [TC]: oops in em_meta
  [IPV6] Minor cleanup: remove unused definitions in net/ip6_fib.h
  [IPV6] Minor clenup: remove two unused definitions in net/ip6_route.h
  [AF_IUCV]: defensive programming of iucv_callback_txdone
  [AF_IUCV]: broken send_skb_q results in endless loop
  [IUCV]: wrong irq-disabling locking at module load time
  [CAN]: Minor clean-ups
  [CAN]: Move proto_{,un}register() out of spin-locked region
  [CAN]: Clean up module auto loading
  [IPSEC] flow: Remove an unnecessary ____cacheline_aligned
  [IPV4]: route: fix crash ip_route_input
  [NETFILTER]: xt_iprange: add missing #include
  [NETFILTER]: xt_iprange: fix typo in address family
  [NETFILTER]: nf_conntrack: fix ct_extend ->move operation
  ...
2008-02-08 09:27:06 -08:00
Pavel Emelyanov
cbdc738732 namespaces: mark NET_NS with "depends on NAMESPACES"
There's already an option controlling the net namespaces cloning code, so make
it work the same way as all the other namespaces' options.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:23 -08:00
Eric Dumazet
dd5a1843d5 [IPSEC] flow: reorder "struct flow_cache_entry" and remove SLAB_HWCACHE_ALIGN
1) We can shrink sizeof(struct flow_cache_entry) by 8 bytes on 64bit arches.
2) No need to align these structures to hardware cache lines, this only waste 
   ram for very litle gain.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 23:30:42 -08:00
Eric Dumazet
fca09fb732 [DECNET] ROUTE: remove unecessary alignment
Same alignment requirement was removed on IP route cache in the past.

This alignment actually has bad effect on 32 bit arches, uniprocessor,
since sizeof(dn_rt_hash_bucket) is forced to 8 bytes instead of 4.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 23:29:57 -08:00
Joy Latten
405137d16f [IPSEC]: Add support for aes-ctr.
The below patch allows IPsec to use CTR mode with AES encryption
algorithm. Tested this using setkey in ipsec-tools.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 23:11:56 -08:00
Florian Westphal
bca65eae39 [TIPC]: declare proto_ops structures as 'const'.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:18:01 -08:00
Ilpo Järvinen
86121fe5b4 [TIPC]: Kill unused static inline (x5)
All these static inlines are unused:

in_own_zone     1 (net/tipc/addr.h)
msg_dataoctet   1 (net/tipc/msg.h)
msg_direct      1 (include/net/tipc/tipc_msg.h)
msg_options     1 (include/net/tipc/tipc_msg.h)
tipc_nmap_get   1 (net/tipc/bcast.h)

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:17:13 -08:00
Stephen Hemminger
04f217aca4 [TC]: oops in em_meta
If userspace passes a unknown match index into em_meta, then
em_meta_change will return an error and the data for the match will
not be set. This then causes an null pointer dereference when the
cleanup is done in the error path via tcf_em_tree_destroy. Since the
tree structure comes kzalloc, it is initialized to NULL.

Discovered when testing a new version of tc command against an
accidental older kernel.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:13:00 -08:00
Ursula Braun
f2a77991a9 [AF_IUCV]: defensive programming of iucv_callback_txdone
The loop in iucv_callback_txdone presumes existence of an entry
with msg->tag in the send_skb_q list. In error cases this
assumption might be wrong and might cause an endless loop.
Loop is rewritten to guarantee loop end in case of missing
msg->tag entry in send_skb_q.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:07:44 -08:00
Ursula Braun
d44447229e [AF_IUCV]: broken send_skb_q results in endless loop
A race has been detected in iucv_callback_txdone().
skb_unlink has to be done inside the locked area.

In addition checkings for successful allocations are inserted.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:07:19 -08:00
Ursula Braun
435bc9dfc6 [IUCV]: wrong irq-disabling locking at module load time
Linux may hang when running af_iucv socket programs concurrently
with a load of module netiucv. iucv_register() tries to take the
iucv_table_lock with spin_lock_irq. This conflicts with
iucv_connect() which has a need for an smp_call_function while
holding the iucv_table_lock.
Solution: use bh-disabling locking in iucv_register()

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:06:52 -08:00
Urs Thuermann
a219994bf5 [CAN]: Minor clean-ups
Remove unneeded variable.
Rename local variable error to err like in all other places.
Some white-space changes.

Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:05:04 -08:00
Urs Thuermann
a2fea5f19f [CAN]: Move proto_{,un}register() out of spin-locked region
The implementation of proto_register() has changed so that it can now
sleep.  The call to proto_register() must be moved out of the
spin-locked region.

Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:04:45 -08:00
Urs Thuermann
5423dd67bd [CAN]: Clean up module auto loading
Remove local char array to construct module name.
Don't call request_module() when CONFIG_KMOD is not set.

Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:04:21 -08:00
Eric Dumazet
5f58a5c872 [IPSEC] flow: Remove an unnecessary ____cacheline_aligned
We use a percpu variable named flow_hash_info, which holds 12 bytes.

It is currently marked as ____cacheline_aligned, which makes linker
skip space to properly align this variable.

Before :
c065cc90 D per_cpu__softnet_data
c065cd00 d per_cpu__flow_tables
<Here, hole of 124 bytes>
c065cd80 d per_cpu__flow_hash_info
<Here, hole of 116 bytes>
c065ce00 d per_cpu__flow_flush_tasklets
c065ce14 d per_cpu__rt_cache_stat


This alignement is quite unproductive, and removing it reduces the
size of percpu data (by 240 bytes on my x86 machine), and improves
performance (flow_tables & flow_hash_info can share a single cache
line)

After patch :
c065cc04 D per_cpu__softnet_data
c065cc4c d per_cpu__flow_tables
c065cc50 d per_cpu__flow_hash_info
c065cc5c d per_cpu__flow_flush_tasklets
c065cc70 d per_cpu__rt_cache_stat

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 18:03:18 -08:00
Patrick McHardy
4136cd523e [IPV4]: route: fix crash ip_route_input
ip_route_me_harder() may call ip_route_input() with skbs that don't
have skb->dev set for skbs rerouted in LOCAL_OUT and TCP resets
generated by the REJECT target, resulting in a crash when dereferencing
skb->dev->nd_net. Since ip_route_input() has an input device argument,
it seems correct to use that one anyway.

Bug introduced in b5921910a1 (Routing cache virtualization).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 17:58:20 -08:00
Jan Engelhardt
5da621f1c5 [NETFILTER]: xt_iprange: add missing #include
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 17:57:11 -08:00
Patrick McHardy
d9d17578d9 [NETFILTER]: xt_iprange: fix typo in address family
The family for iprange_mt4 should be AF_INET, not AF_INET6.
Noticed by Jiri Moravec <jim.lkml@gmail.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 17:56:49 -08:00
Patrick McHardy
86577c661b [NETFILTER]: nf_conntrack: fix ct_extend ->move operation
The ->move operation has two bugs:

- It is called with the same extension as source and destination,
  so it doesn't update the new extension.

- The address of the old extension is calculated incorrectly,
  instead of (void *)ct->ext + ct->ext->offset[i] it uses
  ct->ext + ct->ext->offset[i].

Fixes a crash on x86_64 reported by Chuck Ebbert <cebbert@redhat.com>
and Thomas Woerner <twoerner@redhat.com>.

Tested-by: Thomas Woerner <twoerner@redhat.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 17:56:34 -08:00
Jozsef Kadlecsik
b2155e7f70 [NETFILTER]: nf_conntrack: TCP conntrack reopening fix
TCP connection tracking in netfilter did not handle TCP reopening
properly: active close was taken into account for one side only and
not for any side, which is fixed now. The patch includes more comments
to explain the logic how the different cases are handled.
The bug was discovered by Jeff Chua.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-07 17:54:56 -08:00
David Howells
e231c2ee64 Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)
Convert instances of ERR_PTR(PTR_ERR(p)) to ERR_CAST(p) using:

perl -spi -e 's/ERR_PTR[(]PTR_ERR[(](.*)[)][)]/ERR_CAST(\1)/' `grep -rl 'ERR_PTR[(]*PTR_ERR' fs crypto net security`

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:26 -08:00
Vlad Yasevich
5f9646c3d9 [SCTP]: Make sure the chunk is off the transmitted list prior to freeing.
In a few instances, we need to remove the chunk from the transmitted list
prior to freeing it.  This is because the free code doesn't do that any
more and so we need to do it manually.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2008-02-06 21:27:39 -05:00
Wei Yongjun
a869981423 [SCTP]: Fix kernel panic while received ASCONF chunk with bad serial number
While recevied ASCONF chunk with serial number less then needed, kernel
will treat this chunk as a retransmitted ASCONF chunk and find cached
ASCONF-ACK chunk used sctp_assoc_lookup_asconf_ack(). But this function
will always return NO-NULL. So response with cached ASCONF-ACKs chunk
will cause kernel panic.
In function sctp_assoc_lookup_asconf_ack(), if the cached ASCONF-ACKs
list asconf_ack_list is empty, or if the serial being requested does not
exists, the function as it currectly stands returns the actuall
list_head asoc->asconf_ack_list, this is not a cache ASCONF-ACK chunk
but a bogus pointer.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2008-02-06 21:27:39 -05:00
Vlad Yasevich
b46ae36de4 [SCTP]: Set ports in every address returned by sctp_getladdrs()
Thomas Dreibholz has reported that port numbers are not filled
in the results of sctp_getladdrs() when the socket was bound
to an ephemeral port.  This is only true, if the address was
not specified either.  So, fill in the port number correctly.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2008-02-06 21:27:39 -05:00
Vlad Yasevich
c068be5491 [SCTP]: Correctly reap SSNs when processing FORWARD_TSN chunk
When we recieve a FORWARD_TSN chunk, we need to reap
all the queued fast-forwarded chunks from the ordering queue
 However, if we don't have them queued, we need to see if
the next expected one is there as well.  If it is, start
deliver from that point instead of waiting for the next
chunk to arrive.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2008-02-06 21:26:26 -05:00
Andrew Morton
7276744354 9p: fix p9_printfcall export
ERROR: "p9_printfcall" [net/9p/9pnet_virtio.ko] undefined!

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-02-06 19:25:01 -06:00
Eric Van Hensbergen
8a0dc95fd9 9p: transport API reorganization
This merges the mux.c (including the connection interface) with trans_fd
in preparation for transport API changes.  Ultimately, trans_fd will need
to be rewritten to clean it up and simplify the implementation, but this
reorganization is viewed as the first step.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-02-06 19:25:03 -06:00
Eric Van Hensbergen
f39335453f 9p: add remove function to trans_virtio
Request from rusty:
Just cleaning up patches for 2.6.25 merge, and noticed that
net/9p/trans_virtio.c doesn't have a remove function.  This will crash when
removing the module (console doesn't have one because it can't really be
removed).

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-02-06 19:25:04 -06:00
Anthony Liguori
dea7bbb603 9p: Convert semaphore to spinlock for p9_idpool
When booting from v9fs, down_interruptible in p9_idpool_get() triggered a BUG
as it was being called with IRQs disabled.  A spinlock seems like the right
thing to be using since the idr functions go out of their way not to sleep.

This patch eliminates the BUG by converting the semaphore to a spinlock.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-02-06 19:25:04 -06:00
Eric Van Hensbergen
7c7d90f2dd 9p: Fix soft lockup in virtio transport
This fixes a poorly placed spinlock which could result in a
soft lockup condition.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-02-06 19:25:07 -06:00
Eric Van Hensbergen
e2735b7720 9p: block-based virtio client
This replaces the console-based virto client with a block-based
client using a single request queue.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-02-06 19:25:58 -06:00
Eric Van Hensbergen
043aba403e 9p: create transport rpc cut-thru
Add a new transport function which allows a cut-thru directly to
the transport instead of processing request through the mux if the
cut-thru exists.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-02-06 19:25:09 -06:00
Martin Stava
afcf0c13ae 9p: fix bug in p9_clone_stat
This patch fixes a bug in the copying of 9P
stat information where string references
weren't being updated properly.

Signed-off-by: Martin Sava <martin.stava@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-02-06 19:20:44 -06:00
Sven Wegener
9c1ca6e68a ipvs: Make wrr "no available servers" error message rate-limited
No available servers is more an error message than something informational. It
should also be rate-limited, else we're going to flood our logs on a busy
director, if all real servers are out of order with a weight of zero.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 20:00:10 -08:00
David S. Miller
a29961b33b Merge branch 'fixes' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6 2008-02-05 19:58:05 -08:00
Patrick McHardy
9ec138101f [NET_SCHED]: cls_flow: support classification based on VLAN tag
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 16:21:04 -08:00
Patrick McHardy
4f25049106 [NET_SCHED]: cls_flow: fix key mask validity check
Since we're using fls(), we need to check whether the value is
non-zero first.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 16:19:59 -08:00
Patrick McHardy
0ea9d70df8 [NET_SCHED]: em_meta: fix compile warning
net/sched/em_meta.c: In function 'meta_int_vlan_tag':
net/sched/em_meta.c:179: warning: 'tag' may be used uninitialized in this function

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 16:19:33 -08:00
Michael Buesch
03ac7a8141 mac80211: Is not EXPERIMENTAL anymore
Remove the EXPERIMENTAL dependency, as the existing mac80211
features are stable.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-05 14:35:47 -05:00
Linus Torvalds
3d412f60b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  [PKT_SCHED]: vlan tag match
  [NET]: Add if_addrlabel.h to sanitized headers.
  [NET] rtnetlink.c: remove no longer used functions
  [ICMP]: Restore pskb_pull calls in receive function
  [INET]: Fix accidentally broken inet(6)_hash_connect's port offset calculations.
  [NET]: Remove further references to net-modules.txt
  bluetooth rfcomm tty: destroy before tty_close()
  bluetooth: blacklist another Broadcom BCM2035 device
  drivers/bluetooth/btsdio.c: fix double-free
  drivers/bluetooth/bpa10x.c: fix memleak
  bluetooth: uninlining
  bluetooth: hidp_process_hid_control remove unnecessary parameter dealing
  tun: impossible to deassert IFF_ONE_QUEUE or IFF_NO_PI
  hamradio: fix dmascc section mismatch
  [SCTP]: Fix kernel panic while received AUTH chunk with BAD shared key identifier
  [SCTP]: Fix kernel panic while received AUTH chunk while enabled auth
  [IPV4]: Formatting fix for /proc/net/fib_trie.
  [IPV6]: Fix sysctl compilation error.
  [NET_SCHED]: Add #ifdef CONFIG_NET_EMATCH in net/sched/cls_flow.c (latest git broken build)
  [IPV4]: Fix compile error building without CONFIG_FS_PROC
  ...
2008-02-05 10:09:07 -08:00
Paul Moore
eda61d32e8 NetLabel: introduce a new kernel configuration API for NetLabel
Add a new set of configuration functions to the NetLabel/LSM API so that
LSMs can perform their own configuration of the NetLabel subsystem without
relying on assistance from userspace.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:20 -08:00
Vlad Yasevich
01f2d38498 [SCTP]: Kill silly inlines in ulpqueue.c
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2008-02-05 10:59:30 -05:00
Vlad Yasevich
0eca8fee0c [SCTP]: Do not increase rwnd when reading partial notification.
When a user reads a partial notification message, do not
update rwnd since notifications must not be counted towards
receive window.

Tested-by: Oliver Roll <mail@oliroll.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2008-02-05 10:59:30 -05:00
Vlad Yasevich
60c778b259 [SCTP]: Stop claiming that this is a "reference implementation"
I was notified by Randy Stewart that lksctp claims to be
"the reference implementation".  First of all, "the
refrence implementation" was the original implementation
of SCTP in usersapce written ty Randy and a few others.
Second, after looking at the definiton of 'reference implementation',
we don't really meet the requirements.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2008-02-05 10:59:07 -05:00
Stephen Hemminger
3113e88c3c [PKT_SCHED]: vlan tag match
Provide a way to use tc filters on vlan tag even if tag is buried in
skb due to hardware acceleration.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 03:20:13 -08:00
Adrian Bunk
03245ce2f0 [NET] rtnetlink.c: remove no longer used functions
This patch removes the following no longer used functions:
- rtattr_parse()
- rtattr_strlcpy()
- __rtattr_parse_nested_compat()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 03:17:22 -08:00
Herbert Xu
8cf229437f [ICMP]: Restore pskb_pull calls in receive function
Somewhere along the development of my ICMP relookup patch the header
length check went AWOL on the non-IPsec path.  This patch restores the
check.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 03:15:50 -08:00
Pavel Emelyanov
5d8c0aa943 [INET]: Fix accidentally broken inet(6)_hash_connect's port offset calculations.
The port offset calculations depend on the protocol family, but, as
Adrian noticed, I broke this logic with the commit

	5ee31fc1ec
	[INET]: Consolidate inet(6)_hash_connect.

Return this logic back, by passing the port offset directly into the
consolidated function.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Noticed-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 03:14:44 -08:00
Dave Young
93d807401c bluetooth rfcomm tty: destroy before tty_close()
rfcomm dev could be deleted in tty_hangup, so we must not call
rfcomm_dev_del again to prevent from destroying rfcomm dev before tty
close.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 03:12:06 -08:00
Andrew Morton
91f5cca3d1 bluetooth: uninlining
Remove all those inlines which were either a) unneeded or b) increased code
size.

          text    data     bss     dec     hex filename
before:   6997      74       8    7079    1ba7 net/bluetooth/hidp/core.o
after:    6492      74       8    6574    19ae net/bluetooth/hidp/core.o

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 03:07:58 -08:00
Dave Young
eff001e35a bluetooth: hidp_process_hid_control remove unnecessary parameter dealing
According to the bluetooth HID spec v1.0 chapter 7.4.2

"This code requests a major state change in a BT-HID device.  A HID_CONTROL
request does not generate a HANDSHAKE response."

"A HID_CONTROL packet with a parameter of VIRTUAL_CABLE_UNPLUG is the only
HID_CONTROL packet a device can send to a host.  A host will ignore all other
packets."

So in the hidp_precess_hid_control function, we just need to deal with the
UNLUG packet.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 03:07:14 -08:00
Wei Yongjun
7cc08b55fc [SCTP]: Fix kernel panic while received AUTH chunk with BAD shared key identifier
If SCTP-AUTH is enabled, received AUTH chunk with BAD shared key 
identifier will cause kernel panic.

Test as following:
step1: enabled /proc/sys/net/sctp/auth_enable
step 2:  connect  to SCTP server with auth capable. Association is 
established between endpoints. Then send a AUTH chunk with a bad 
shareid, SCTP server will kernel panic after received that AUTH chunk.

SCTP client                   SCTP server
  INIT         ---------->  
    (with auth capable)
               <----------    INIT-ACK
                              (with auth capable)
  COOKIE-ECHO  ---------->
               <----------    COOKIE-ACK
  AUTH         ---------->


AUTH chunk is like this:
  AUTH chunk
    Chunk type: AUTH (15)
    Chunk flags: 0x00
    Chunk length: 28
    Shared key identifier: 10
    HMAC identifier: SHA-1 (1)
    HMAC: 0000000000000000000000000000000000000000

The assignment of NULL to key can safely be removed, since key_for_each 
(which is just list_for_each_entry under the covers does an initial 
assignment to key anyway).

If the endpoint_shared_keys list is empty, or if the key_id being 
requested does not exist, the function as it currently stands returns 
the actuall list_head (in this case endpoint_shared_keys.  Since that 
list_head isn't surrounded by an actuall data structure, the last 
iteration through list_for_each_entry will do a container_of on key, and 
we wind up returning a bogus pointer, instead of NULL, as we should.

> Neil Horman wrote:
>> On Tue, Jan 22, 2008 at 05:29:20PM +0900, Wei Yongjun wrote:
>>
>> FWIW, Ack from me.  The assignment of NULL to key can safely be 
>> removed, since
>> key_for_each (which is just list_for_each_entry under the covers does 
>> an initial
>> assignment to key anyway).
>> If the endpoint_shared_keys list is empty, or if the key_id being 
>> requested does
>> not exist, the function as it currently stands returns the actuall 
>> list_head (in
>> this case endpoint_shared_keys.  Since that list_head isn't 
>> surrounded by an
>> actuall data structure, the last iteration through 
>> list_for_each_entry will do a
>> container_of on key, and we wind up returning a bogus pointer, 
>> instead of NULL,
>> as we should.  Wei's patch corrects that.
>>
>> Regards
>> Neil
>>
>> Acked-by: Neil Horman <nhorman@tuxdriver.com>
>>
>
> Yep, the patch is correct.
>
> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
>
> -vlad
>

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 03:03:06 -08:00
Wei Yongjun
d2f19fa13e [SCTP]: Fix kernel panic while received AUTH chunk while enabled auth
If STCP is started while /proc/sys/net/sctp/auth_enable is set 0 and
association is established between endpoints. Then if
/proc/sys/net/sctp/auth_enable is set 1, a received AUTH chunk will
cause kernel panic.

Test as following:
step 1: echo 0> /proc/sys/net/sctp/auth_enable
step 2:

   SCTP client                  SCTP server
      INIT          --------->
                    <---------   INIT-ACK
      COOKIE-ECHO   --------->
                    <---------   COOKIE-ACK
step 3:
    echo 1> /proc/sys/net/sctp/auth_enable
step 4:
   SCTP client                  SCTP server
       AUTH        ----------->  Kernel Panic


This patch fix this probleam to treat AUTH chunk as unknow chunk if peer 
has initialized with no auth capable.

> Sorry for the delay.  Was on vacation without net access.
>
> Wei Yongjun wrote:
>>
>>
>> This patch fix this probleam to treat AUTH chunk as unknow chunk if 
>> peer has initialized with no auth capable.
>>
>> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
>
> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
>
>>

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 03:02:26 -08:00
Denis V. Lunev
b9c4d82a85 [IPV4]: Formatting fix for /proc/net/fib_trie.
The line in the /proc/net/fib_trie for route with TOS specified
- has extra \n at the end
- does not have a space after route scope
like below.
           |-- 1.1.1.1
              /32 universe UNICASTtos =1

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 02:58:45 -08:00
Rami Rosen
0aead54347 [NET_SCHED]: Add #ifdef CONFIG_NET_EMATCH in net/sched/cls_flow.c (latest git broken build)
The 2.6 latest git build was broken when using the following
configuration options:
CONFIG_NET_EMATCH=n
CONFIG_NET_CLS_FLOW=y

with the following error:
net/sched/cls_flow.c: In function 'flow_dump':
net/sched/cls_flow.c:598: error: 'struct tcf_ematch_tree' has no
member named 'hdr'
make[2]: *** [net/sched/cls_flow.o] Error 1
make[1]: *** [net/sched] Error 2
make: *** [net] Error 2


see the recent post by Li Zefan:
  http://www.spinics.net/lists/netdev/msg54434.html

The reason for this crash is that struct tcf_ematch_tree
(net/pkt_cls.h) is empty when CONFIG_NET_EMATCH is not defined.

When CONFIG_NET_EMATCH is defined, the tcf_ematch_tree structure
indeed holds a struct tcf_ematch_tree_hdr (hdr) as flow_dump()
expects.

This patch adds #ifdef CONFIG_NET_EMATCH in flow_dump to avoid this.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 02:56:48 -08:00
Adrian Bunk
322c8a3c36 [IPSEC] xfrm4_beet_input(): fix an if()
A bug every C programmer makes at some point in time...

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 02:51:39 -08:00
Linus Torvalds
93890b71a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (25 commits)
  virtio: balloon driver
  virtio: Use PCI revision field to indicate virtio PCI ABI version
  virtio: PCI device
  virtio_blk: implement naming for vda-vdz,vdaa-vdzz,vdaaa-vdzzz
  virtio_blk: Dont waste major numbers
  virtio_blk: provide getgeo
  virtio_net: parametrize the napi_weight for virtio receive queue.
  virtio: free transmit skbs when notified, not on next xmit.
  virtio: flush buffers on open
  virtnet: remove double ether_setup
  virtio: Allow virtio to be modular and used by modules
  virtio: Use the sg_phys convenience function.
  virtio: Put the virtio under the virtualization menu
  virtio: handle interrupts after callbacks turned off
  virtio: reset function
  virtio: populate network rings in the probe routine, not open
  virtio: Tweak virtio_net defines
  virtio: Net header needs hdr_len
  virtio: remove unused id field from struct virtio_blk_outhdr
  virtio: clarify NO_NOTIFY flag usage
  ...
2008-02-04 08:00:54 -08:00
Linus Torvalds
f5bb3a5e9d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (79 commits)
  Jesper Juhl is the new trivial patches maintainer
  Documentation: mention email-clients.txt in SubmittingPatches
  fs/binfmt_elf.c: spello fix
  do_invalidatepage() comment typo fix
  Documentation/filesystems/porting fixes
  typo fixes in net/core/net_namespace.c
  typo fix in net/rfkill/rfkill.c
  typo fixes in net/sctp/sm_statefuns.c
  lib/: Spelling fixes
  kernel/: Spelling fixes
  include/scsi/: Spelling fixes
  include/linux/: Spelling fixes
  include/asm-m68knommu/: Spelling fixes
  include/asm-frv/: Spelling fixes
  fs/: Spelling fixes
  drivers/watchdog/: Spelling fixes
  drivers/video/: Spelling fixes
  drivers/ssb/: Spelling fixes
  drivers/serial/: Spelling fixes
  drivers/scsi/: Spelling fixes
  ...
2008-02-04 07:58:52 -08:00
Rusty Russell
18445c4d50 virtio: explicit enable_cb/disable_cb rather than callback return.
It seems that virtio_net wants to disable callbacks (interrupts) before
calling netif_rx_schedule(), so we can't use the return value to do so.

Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
now returns void, rather than a boolean.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:58 +11:00
Rusty Russell
a586d4f601 virtio: simplify config mechanism.
Previously we used a type/len pair within the config space, but this
seems overkill.  We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.

The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:57 +11:00
Rusty Russell
f35d9d8aae virtio: Implement skb_partial_csum_set, for setting partial csums on untrusted packets.
Use it in virtio_net (replacing buggy version there), it's also going
to be used by TAP for partial csum support.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
2008-02-04 23:49:56 +11:00
Oliver Pinter
53379e57a7 typo fixes in net/core/net_namespace.c
Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:56:48 +02:00
Oliver Pinter
1dbede8714 typo fix in net/rfkill/rfkill.c
Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:55:45 +02:00
Oliver Pinter
ac461a0330 typo fixes in net/sctp/sm_statefuns.c
Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:52:41 +02:00
David S. Miller
a80f509f4a Merge branch 'fixes' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6 2008-02-03 04:43:34 -08:00
Arnaldo Carvalho de Melo
ab1e0a13d7 [SOCK] proto: Add hashinfo member to struct proto
This way we can remove TCP and DCCP specific versions of

sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
sk->sk_prot->hash:     inet_hash is directly used, only v6 need
                       a specific version to deal with mapped sockets
sk->sk_prot->unhash:   both v4 and v6 use inet_hash directly

struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
that inet_csk_get_port can find the per family routine.

Now only the lookup routines receive as a parameter a struct inet_hashtable.

With this we further reuse code, reducing the difference among INET transport
protocols.

Eventually work has to be done on UDP and SCTP to make them share this
infrastructure and get as a bonus inet_diag interfaces so that iproute can be
used with these protocols.

net-2.6/net/ipv4/inet_hashtables.c:
  struct proto			     |   +8
  struct inet_connection_sock_af_ops |   +8
 2 structs changed
  __inet_hash_nolisten               |  +18
  __inet_hash                        | -210
  inet_put_port                      |   +8
  inet_bind_bucket_create            |   +1
  __inet_hash_connect                |   -8
 5 functions changed, 27 bytes added, 218 bytes removed, diff: -191

net-2.6/net/core/sock.c:
  proto_seq_show                     |   +3
 1 function changed, 3 bytes added, diff: +3

net-2.6/net/ipv4/inet_connection_sock.c:
  inet_csk_get_port                  |  +15
 1 function changed, 15 bytes added, diff: +15

net-2.6/net/ipv4/tcp.c:
  tcp_set_state                      |   -7
 1 function changed, 7 bytes removed, diff: -7

net-2.6/net/ipv4/tcp_ipv4.c:
  tcp_v4_get_port                    |  -31
  tcp_v4_hash                        |  -48
  tcp_v4_destroy_sock                |   -7
  tcp_v4_syn_recv_sock               |   -2
  tcp_unhash                         | -179
 5 functions changed, 267 bytes removed, diff: -267

net-2.6/net/ipv6/inet6_hashtables.c:
  __inet6_hash |   +8
 1 function changed, 8 bytes added, diff: +8

net-2.6/net/ipv4/inet_hashtables.c:
  inet_unhash                        | +190
  inet_hash                          | +242
 2 functions changed, 432 bytes added, diff: +432

vmlinux:
 16 functions changed, 485 bytes added, 492 bytes removed, diff: -7

/home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
  tcp_v6_get_port                    |  -31
  tcp_v6_hash                        |   -7
  tcp_v6_syn_recv_sock               |   -9
 3 functions changed, 47 bytes removed, diff: -47

/home/acme/git/net-2.6/net/dccp/proto.c:
  dccp_destroy_sock                  |   -7
  dccp_unhash                        | -179
  dccp_hash                          |  -49
  dccp_set_state                     |   -7
  dccp_done                          |   +1
 5 functions changed, 1 bytes added, 242 bytes removed, diff: -241

/home/acme/git/net-2.6/net/dccp/ipv4.c:
  dccp_v4_get_port                   |  -31
  dccp_v4_request_recv_sock          |   -2
 2 functions changed, 33 bytes removed, diff: -33

/home/acme/git/net-2.6/net/dccp/ipv6.c:
  dccp_v6_get_port                   |  -31
  dccp_v6_hash                       |   -7
  dccp_v6_request_recv_sock          |   +5
 3 functions changed, 5 bytes added, 38 bytes removed, diff: -33

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:52 -08:00
Linus Torvalds
63e9b66e29 Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux
* 'for-linus' of git://linux-nfs.org/~bfields/linux: (100 commits)
  SUNRPC: RPC program information is stored in unsigned integers
  SUNRPC: Move exported symbol definitions after function declaration part 2
  NLM: tear down RPC clients in nlm_shutdown_hosts
  SUNRPC: spin svc_rqst initialization to its own function
  nfsd: more careful input validation in nfsctl write methods
  lockd: minor log message fix
  knfsd: don't bother mapping putrootfh enoent to eperm
  rdma: makefile
  rdma: ONCRPC RDMA protocol marshalling
  rdma: SVCRDMA sendto
  rdma: SVCRDMA recvfrom
  rdma: SVCRDMA Core Transport Services
  rdma: SVCRDMA Transport Module
  rdma: SVCRMDA Header File
  svc: Add svc_xprt_names service to replace svc_sock_names
  knfsd: Support adding transports by writing portlist file
  svc: Add svc API that queries for a transport instance
  svc: Add /proc/sys/sunrpc/transport files
  svc: Add transport hdr size for defer/revisit
  svc: Move the xprt independent code to the svc_xprt.c file
  ...
2008-02-02 14:31:28 +11:00
Chuck Lever
ea339d46b9 SUNRPC: RPC program information is stored in unsigned integers
Clean up: When looping over RPC version and procedure numbers, use
unsigned index variables.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 17:01:31 -05:00
Trond Myklebust
d2f7e79e3b SUNRPC: Move exported symbol definitions after function declaration part 2
Do it for the server code...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 17:01:24 -05:00
Jeff Layton
0113ab3464 SUNRPC: spin svc_rqst initialization to its own function
Move the initialzation in __svc_create_thread that happens prior to
thread creation to a new function. Export the function to allow
services to have better control over the svc_rqst structs.

Also rearrange the rqstp initialization to prevent NULL pointer
dereferences in svc_exit_thread in case allocations fail.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:15 -05:00
Tom Tucker
4b8449af75 rdma: makefile
Add the svcrdma module to the xprtrdma makefile.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:14 -05:00
Tom Tucker
ef1eac0a3f rdma: ONCRPC RDMA protocol marshalling
This logic parses the ONCRDMA protocol headers that
precede the actual RPC header. It is placed in a separate
file to keep all protocol aware code in a single place.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:14 -05:00
Tom Tucker
c06b540a54 rdma: SVCRDMA sendto
This file implements the RDMA transport sendto function. A RPC reply
on an RDMA transport consists of some number of RDMA_WRITE requests
followed by an RDMA_SEND request. The sendto function parses the
ONCRPC RDMA reply header to determine how to send the reply back to
the client. The send queue is sized so as to be able to send complete
replies for requests in most cases.  In the event that there are not
enough SQ WR slots to reply, e.g.  big data, the send will block the
NFSD thread. The I/O callback functions in svc_rdma_transport.c that
reap WR completions wake any waiters blocked on the SQ. In general,
the goal is not to block NFSD threads and the has_wspace method
stall requests when the SQ is nearly full.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:14 -05:00
Tom Tucker
d5b31be682 rdma: SVCRDMA recvfrom
This file implements the RDMA transport recvfrom function. The function
dequeues work reqeust completion contexts from an I/O list that it shares
with the I/O tasklet in svc_rdma_transport.c. For ONCRPC RDMA, an RPC may
not be complete when it is received. Instead, the RDMA header that precedes
the RPC message informs the transport where to get the RPC data from on
the client and where to place it in the RPC message before it is delivered
to the server. The svc_rdma_recvfrom function therefore, parses this RDMA
header and issues any necessary RDMA operations to fetch the remainder of
the RPC from the client.

Special handling is required when the request involves an RDMA_READ.
In this case, recvfrom submits the RDMA_READ requests to the underlying
transport driver and then returns 0. When the transport
completes the last RDMA_READ for the request, it enqueues it on a
read completion queue and enqueues the transport. The recvfrom code
favors this queue over the regular DTO queue when satisfying reads.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:14 -05:00
Tom Tucker
377f9b2f45 rdma: SVCRDMA Core Transport Services
This file implements the core transport data management and I/O
path. The I/O path for RDMA involves receiving callbacks on interrupt
context. Since all the svc transport locks are _bh locks we enqueue the
transport on a list, schedule a tasklet to dequeue data indications from
the RDMA completion queue. The tasklet in turn takes _bh locks to
enqueue receive data indications on a list for the transport. The
svc_rdma_recvfrom transport function dequeues data from this list in an
NFSD thread context.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:14 -05:00
Tom Tucker
ef7fbf59e6 rdma: SVCRDMA Transport Module
This file implements the RDMA transport module initialization and
termination logic and registers the transport sysctl variables.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:14 -05:00
Tom Tucker
9571af18fa svc: Add svc_xprt_names service to replace svc_sock_names
Create a transport independent version of the svc_sock_names function.

The toclose capability of the svc_sock_names service can be implemented
using the svc_xprt_find and svc_xprt_close services.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:14 -05:00
Tom Tucker
a217813f90 knfsd: Support adding transports by writing portlist file
Update the write handler for the portlist file to allow creating new
listening endpoints on a transport. The general form of the string is:

<transport_name><space><port number>

For example:

echo "tcp 2049" > /proc/fs/nfsd/portlist

This is intended to support the creation of a listening endpoint for
RDMA transports without adding #ifdef code to the nfssvc.c file.

Transports can also be removed as follows:

'-'<transport_name><space><port number>

For example:

echo "-tcp 2049" > /proc/fs/nfsd/portlist

Attempting to add a listener with an invalid transport string results
in EPROTONOSUPPORT and a perror string of "Protocol not supported".

Attempting to remove an non-existent listener (.e.g. bad proto or port)
results in ENOTCONN and a perror string of
"Transport endpoint is not connected"

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
7fcb98d58c svc: Add svc API that queries for a transport instance
Add a new svc function that allows a service to query whether a
transport instance has already been created. This is used in lockd
to determine whether or not a transport needs to be created when
a lockd instance is brought up.

Specifying 0 for the address family or port is effectively a wild-card,
and will result in matching the first transport in the service's list
that has a matching class name.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
dc9a16e49d svc: Add /proc/sys/sunrpc/transport files
Add a file that when read lists the set of registered svc
transports.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
260c1d1298 svc: Add transport hdr size for defer/revisit
Some transports have a header in front of the RPC header. The current
defer/revisit processing considers only the iov_len and arg_len to
determine how much to back up when saving the original request
to revisit. Add a field to the rqstp structure to save the size
of the transport header so svc_defer can correctly compute
the start of a request.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
0f0257eaa5 svc: Move the xprt independent code to the svc_xprt.c file
This functionally trivial patch moves all of the transport independent
functions from the svcsock.c file to the transport independent svc_xprt.c
file.

In addition the following formatting changes were made:
- White space cleanup
- Function signatures on single line
- The inline directive was removed
- Lines over 80 columns were reformatted
- The term 'socket' was changed to 'transport' in comments
- The SMP comment was moved and updated.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
18d19f949d svc: Make svc_check_conn_limits xprt independent
The svc_check_conn_limits function only manipulates xprt fields. Change references
to svc_sock->sk_xprt to svc_xprt directly.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
57b1d3baba svc: Removing remaining references to rq_sock in rqstp
This functionally empty patch removes rq_sock and unamed union
from rqstp structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
4e5caaa5f2 svc: Move create logic to common code
Move the svc transport list logic into common transport creation code.
Refactor this code path to make the flow of control easier to read.

Move the setting and clearing of the BUSY_BIT during transport creation
to common code.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
9f8bfae693 svc: Make svc_age_temp_sockets svc_age_temp_transports
This function is transport independent. Change it to use svc_xprt directly
and change it's name to reflect this.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker
c36adb2a7f svc: Make svc_recv transport neutral
All of the transport field and functions used by svc_recv are now
transport independent. Change the svc_recv function to use the svc_xprt
structure directly instead of the transport specific svc_sock structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
eab996d4ac svc: Make svc_sock_release svc_xprt_release
The svc_sock_release function only touches transport independent fields.
Change the function to manipulate svc_xprt directly instead of the transport
dependent svc_sock structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
9dbc240f19 svc: Move the sockaddr information to svc_xprt
This patch moves the transport sockaddr to the svc_xprt
structure.  Convenience functions are added to set and
get the local and remote addresses of a transport from
the transport provider as well as determine the length
of a sockaddr.

A transport is responsible for setting the xpt_local
and xpt_remote addresses in the svc_xprt structure as
part of transport creation and xpo_accept processing. This
cannot be done in a generic way and in fact varies
between TCP, UDP and RDMA. A set of xpo_ functions
(e.g. getlocalname, getremotename) could have been
added but this would have resulted in additional
caching and copying of the addresses around.  Note that
the xpt_local address should also be set on listening
endpoints; for TCP/RDMA this is done as part of
endpoint creation.

For connected transports like TCP and RDMA, the addresses
never change and can be set once and copied into the
rqstp structure for each request. For UDP, however, the
local and remote addresses may change for each request. In
this case, the address information is obtained from the
UDP recvmsg info and copied into the rqstp structure from
there.

A svc_xprt_local_port function was also added that returns
the local port given a transport. This is used by
svc_create_xprt when returning the port associated with
a newly created transport, and later when creating a
generic find transport service to check if a service is
already listening on a given port.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
8c7b0172a1 svc: Make deferral processing xprt independent
This patch moves the transport independent sk_deferred list to the svc_xprt
structure and updates the svc_deferred_req structure to keep pointers to
svc_xprt's directly. The deferral processing code is also moved out of the
transport dependent recvfrom functions and into the generic svc_recv path.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
def13d7401 svc: Move the authinfo cache to svc_xprt.
Move the authinfo cache to svc_xprt. This allows both the TCP and RDMA
transports to share this logic. A flag bit is used to determine if
auth information is to be cached or not. Previously, this code looked
at the transport protocol.

I've also changed the spin_lock/unlock logic so that a lock is not taken for
transports that are not caching auth info.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
4bc6c497b2 svc: Remove sk_lastrecv
With the implementation of the new mark and sweep algorithm for shutting
down old connections, the sk_lastrecv field is no longer needed.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
6bc5ab1367 svc: Move accept call to svc_xprt_received to common code
Now that the svc_xprt_received function handles transports, the call
to svc_xprt_received in the xpo_tcp_accept function can be moved to
common code.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
a6046f71f2 svc: Change svc_sock_received to svc_xprt_received and export it
All fields touched by svc_sock_received are now transport independent.
Change it to use svc_xprt directly. This function is called from
transport dependent code, so export it.

Update the comment to clearly state the rules for calling this function.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:12 -05:00
Tom Tucker
a50fea26b9 svc: Make svc_send transport neutral
Move the sk_mutex field to the transport independent svc_xprt structure.
Now all the fields that svc_send touches are transport neutral. Change the
svc_send function to use the transport independent svc_xprt directly instead
of the transport dependent svc_sock structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:11 -05:00
Tom Tucker
f6150c3cab svc: Make the enqueue service transport neutral and export it.
The svc_sock_enqueue function is now transport independent since all of
the fields it touches have been moved to the transport independent svc_xprt
structure. Change the function to use the svc_xprt structure directly
instead of the transport specific svc_sock structure.

Transport specific data-ready handlers need to call this function, so
export it.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:11 -05:00
Tom Tucker
7a90e8cc21 svc: Move sk_reserved to svc_xprt
This functionally trivial patch moves the sk_reserved field to the
transport independent svc_xprt structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:11 -05:00
Tom Tucker
7a18208383 svc: Make close transport independent
Move sk_list and sk_ready to svc_xprt. This involves close because these
lists are walked by svcs when closing all their transports. So I combined
the moving of these lists to svc_xprt with making close transport independent.

The svc_force_sock_close has been changed to svc_close_all and takes a list
as an argument. This removes some svc internals knowledge from the svcs.

This code races with module removal and transport addition.

Thanks to Simon Holm Thøgersen for a compile fix.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Simon Holm Thøgersen <odie@cs.aau.dk>
2008-02-01 16:42:11 -05:00
Tom Tucker
bb5cf160b2 svc: Move sk_server and sk_pool to svc_xprt
This is another incremental change that moves transport independent
fields from svc_sock to the svc_xprt structure. The changes
should be functionally null.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:11 -05:00
Tom Tucker
02fc6c3618 svc: Move sk_flags to the svc_xprt structure
This functionally trivial change moves the transport independent sk_flags
field to the transport independent svc_xprt structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:11 -05:00
Tom Tucker
e1b3157f97 svc: Change sk_inuse to a kref
Change the atomic_t reference count to a kref and move it to the
transport indepenent svc_xprt structure. Change the reference count
wrapper names to be generic.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:11 -05:00
Tom Tucker
d7c9f1ed97 svc: Change services to use new svc_create_xprt service
Modify the various kernel RPC svcs to use the svc_create_xprt service.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:09 -05:00
Tom Tucker
b700cbb11f svc: Add a generic transport svc_create_xprt function
The svc_create_xprt function is a transport independent version
of the svc_makesock function.

Since transport instance creation contains transport dependent and
independent components, add an xpo_create transport function. The
transport implementation of this function allocates the memory for the
endpoint, implements the transport dependent initialization logic, and
calls svc_xprt_init to initialize the transport independent field (svc_xprt)
in it's data structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:09 -05:00
Tom Tucker
f9f3cc4fae svc: Move connection limit checking to its own function
Move the code that poaches connections when the connection limit is hit
to a subroutine to make the accept logic path easier to follow. Since this
is in the new connection path, it should not be a performance issue.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:09 -05:00
Tom Tucker
44a6995b32 svc: Remove unnecessary call to svc_sock_enqueue
The svc_tcp_accept function calls svc_sock_enqueue after setting the
SK_CONN bit. This doesn't actually do anything because the SK_BUSY bit
is still set. The call is unnecessary anyway because the generic code in
svc_recv calls svc_sock_received after calling the accept function.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:09 -05:00
Tom Tucker
38a417cc99 svc: Add xpo_accept transport function
Previously, the accept logic looked into the socket state to determine
whether to call accept or recv when data-ready was indicated on an endpoint.
Since some transports don't use sockets, this logic now uses a flag
bit (SK_LISTENER) to identify listening endpoints. A transport function
(xpo_accept) allows each transport to define its own accept processing.
A transport's initialization logic is reponsible for setting the
SK_LISTENER bit. I didn't see any way to do this in transport independent
logic since the passive side of a UDP connection doesn't listen and
always recv's.

In the svc_recv function, if the SK_LISTENER bit is set, the transport
xpo_accept function is called to handle accept processing.

Note that all functions are defined even if they don't make sense
for a given transport. For example, accept doesn't mean anything for
UDP. The function is defined anyway and bug checks if called. The
UDP transport should never set the SK_LISTENER bit.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
d7979ae4a0 svc: Move close processing to a single place
Close handling was duplicated in the UDP and TCP recvfrom
methods. This code has been moved to the transport independent
svc_recv function.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
323bee32e9 svc: Add a transport function that checks for write space
In order to avoid blocking a service thread, the receive side checks
to see if there is sufficient write space to reply to the request.
Each transport has a different mechanism for determining if there is
enough write space to reply.

The code that checked for write space was coupled with code that
checked for CLOSE and CONN. These checks have been broken out into
separate statements to make the code easier to read.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
e831fe65b1 svc: Add xpo_prep_reply_hdr
Some transports add fields to the RPC header for replies, e.g. the TCP
record length. This function is called when preparing the reply header
to allow each transport to add whatever fields it requires.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
755cceaba7 svc: Add per-transport delete functions
Add transport specific xpo_detach and xpo_free functions. The xpo_detach
function causes the transport to stop delivering data-ready events
and enqueing the transport for I/O.

The xpo_free function frees all resources associated with the particular
transport instance.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
5148bf4ebc svc: Add transport specific xpo_release function
The svc_sock_release function releases pages allocated to a thread. For
UDP this frees the receive skb. For RDMA it will post a receive WR
and bump the client credit count.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
5d137990f5 svc: Move sk_sendto and sk_recvfrom to svc_xprt_class
The sk_sendto and sk_recvfrom are function pointers that allow svc_sock
to be used for both UDP and TCP. Move these function pointers to the
svc_xprt_ops structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
490231558e svc: Add a max payload value to the transport
The svc_max_payload function currently looks at the socket type
to determine the max payload. Add a max payload value to svc_xprt_class
so it can be returned directly.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
360d873864 svc: Make svc_sock the tcp/udp transport
Make TCP and UDP svc_sock transports, and register them
with the svc transport core.

A transport type (svc_sock) has an svc_xprt as its first member,
and calls svc_xprt_init to initialize this field.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
Tom Tucker
1d8206b97a svc: Add an svc transport class
The transport class (svc_xprt_class) represents a type of transport, e.g.
udp, tcp, rdma.  A transport class has a unique name and a set of transport
operations kept in the svc_xprt_ops structure.

A transport class can be dynamically registered and unregisterd. The
svc_xprt_class represents the module that implements the transport
type and keeps reference counts on the module to avoid unloading while
there are active users.

The endpoint (svc_xprt) is a generic, transport independent endpoint that can
be used to send and receive data for an RPC service. It inherits it's
operations from the transport class.

A transport driver module registers and unregisters itself with svc sunrpc
by calling svc_reg_xprt_class, and svc_unreg_xprt_class respectively.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
J. Bruce Fields
cb5c7d668e svcrpc: ensure gss DESTROY tokens free contexts from cache
If we don't do this then we'll end up with a pointless unusable context
sitting in the cache until the time the original context would have
expired.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
J. Bruce Fields
b39c18fce0 sunrpc: gss: simplify rsi_parse logic
Make an obvious simplification that removes a few lines and some
unnecessary indentation; no change in behavior.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
J. Bruce Fields
980e5a40a4 nfsd: fix rsi_cache reference count leak
For some reason we haven't been put()'ing the reference count here.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
J. Bruce Fields
dbf847ecb6 knfsd: allow cache_register to return error on failure
Newer server features such as nfsv4 and gss depend on proc to work, so a
failure to initialize the proc files they need should be treated as
fatal.

Thanks to Andrew Morton for style fix and compile fix in case where
CONFIG_NFSD_V4 is undefined.

Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:05 -05:00
J. Bruce Fields
ffe9386b6e nfsd: move cache proc (un)registration to separate function
Just some minor cleanup.

Also I don't see much point in trying to register further proc entries
if initial entries fail; so just stop trying in that case.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:04 -05:00
J. Bruce Fields
df95a9d4fb knfsd: cache unregistration needn't return error
There's really nothing much the caller can do if cache unregistration
fails.  And indeed, all any caller does in this case is print an error
and continue.  So just return void and move the printk's inside
cache_unregister.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:04 -05:00
J. Bruce Fields
a490c681cb knfsd: fix cache.c comment
The path here must be left over from some earlier draft; fix it.  And do
some more minor cleanup while we're there.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:03 -05:00
Chuck Lever
e5cff482c7 SUNRPC: Use unsigned string lengths in xdr_decode_string_inplace
XDR strings, opaques, and net objects should all use unsigned lengths.
To wit, RFC 4506 says:

4.2.  Unsigned Integer

   An XDR unsigned integer is a 32-bit datum that encodes a non-negative
   integer in the range [0,4294967295].

 ...

4.11.  String

   The standard defines a string of n (numbered 0 through n-1) ASCII
   bytes to be the number n encoded as an unsigned integer (as described
   above), and followed by the n bytes of the string.

After this patch, xdr_decode_string_inplace now matches the other XDR
string and array helpers that take a string length argument.  See:

xdr_encode_opaque_fixed, xdr_encode_opaque, xdr_encode_array

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:02 -05:00
Chuck Lever
01b2969a85 SUNRPC: Prevent length underflow in read_flush()
Make sure we compare an unsigned length to an unsigned count in
read_flush().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:02 -05:00
Johannes Berg
3eadf5f4f6 mac80211: fix initialisation error path
The error handling in ieee80211_init() is broken when any of
the built-in rate control algorithms fail to initialise, fix
it and rename the error labels.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-01 16:15:40 -05:00
Johannes Berg
f0b9205cfb mac80211 rate control: fix section mismatch
When the rate control algorithms are built-in, their exit
functions can be called from mac80211's init function so
they cannot be marked __exit.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-01 16:14:30 -05:00
Johannes Berg
6feeb8aad7 mac80211: make alignment warning optional
Driver authors should be aware of the alignment requirements, but
not everybody cares about the warning. This patch makes it depend
on a new Kconfig symbol MAC80211_DEBUG_PACKET_ALIGNMENT which can
be enabled regardless of MAC80211_DEBUG and is recommended for
driver authors (only). This also restricts the warning to data
packets so other packets need not be realigned to not trigger the
warning.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-01 16:12:24 -05:00
Klaus Heinrich Kiwi
7759db8277 [AUDIT] Add uid, gid fields to ANOM_PROMISCUOUS message
Changes the ANOM_PROMISCUOUS message to include uid and gid fields,
making it consistent with other AUDIT_ANOM_ messages and in the
format the userspace is expecting.

Signed-off-by: Klaus Heinrich Kiwi <klausk@br.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
2008-02-01 14:25:10 -05:00
Eric Paris
4746ec5b01 [AUDIT] add session id to audit messages
In order to correlate audit records to an individual login add a session
id.  This is incremented every time a user logs in and is included in
almost all messages which currently output the auid.  The field is
labeled ses=  or oses=

Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01 14:06:51 -05:00
Al Viro
0c11b9428f [PATCH] switch audit_get_loginuid() to task_struct *
all callers pass something->audit_context

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-02-01 14:04:59 -05:00
Denis V. Lunev
4814bdbd59 [NETNS]: Lookup in FIB semantic hashes taking into account the namespace.
The namespace is not available in the fib_sync_down_addr, add it as a
parameter.

Looking up a device by the pointer to it is OK. Looking up using a
result from fib_trie/fib_hash table lookup is also safe. No need to
fix that at all.  So, just fix lookup by address and insertion to the
hash table path.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:41 -08:00
Denis V. Lunev
7462bd744e [NETNS]: Add a namespace mark to fib_info.
This is required to make fib_info lookups namespace aware. In the
other case initial namespace devices are marked as dead in the local
routing table during other namespace stop.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:40 -08:00
Denis V. Lunev
85326fa54b [IPV4]: fib_sync_down rework.
fib_sync_down can be called with an address and with a device. In
reality it is called either with address OR with a device. The
codepath inside is completely different, so lets separate it into two
calls for these two cases.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:39 -08:00
Denis V. Lunev
4b8aa9abee [NETNS]: Process interface address manipulation routines in the namespace.
The namespace is available when required except rtm_to_ifaddr. Add
namespace argument to it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:39 -08:00
Denis V. Lunev
7b2185747c [IPV4]: Small style cleanup of the error path in rtm_to_ifaddr.
Remove error code assignment inside brackets on failure. The code
looks better if the error is assigned before condition check. Also,
the compiler treats this better.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:38 -08:00
Denis V. Lunev
dce5cbeec3 [IPV4]: Fix memory leak on error path during FIB initialization.
net->ipv4.fib_table_hash is not freed when fib4_rules_init failed.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:37 -08:00
Pavel Emelyanov
3ed5df445e [NETFILTER]: Ipv6-related xt_hashlimit compilation fix.
The hashlimit_ipv6_mask() is called from under IP6_NF_IPTABLES config
option, but is not under it by itself.

gcc warns us about it :) :
net/netfilter/xt_hashlimit.c:473: warning: "hashlimit_ipv6_mask" defined but not used

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:36 -08:00
Patrick McHardy
e5dfb81518 [NET_SCHED]: Add flow classifier
Add new "flow" classifier, which is meant to extend the SFQ hashing
capabilities without hard-coding new hash functions and also allows
deterministic mappings of keys to classes, replacing some out of tree
iptables patches like IPCLASSIFY (maps IPs to classes), IPMARK (maps
IPs to marks, with fw filters to classes), ...

Some examples:

- Classic SFQ hash:

  tc filter add ... flow hash \
  	keys src,dst,proto,proto-src,proto-dst divisor 1024

- Classic SFQ hash, but using information from conntrack to work properly in
  combination with NAT:

  tc filter add ... flow hash \
  	keys nfct-src,nfct-dst,proto,nfct-proto-src,nfct-proto-dst divisor 1024

- Map destination IPs of 192.168.0.0/24 to classids 1-257:

  tc filter add ... flow map \
  	key dst addend -192.168.0.0 divisor 256

- alternatively:

  tc filter add ... flow map \
  	key dst and 0xff

- similar, but reverse ordered:

  tc filter add ... flow map \
  	key dst and 0xff xor 0xff

Perturbation is currently not supported because we can't reliable kill the
timer on destruction.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:36 -08:00
Patrick McHardy
94de78d195 [NET_SCHED]: sch_sfq: make internal queues visible as classes
Add support for dumping statistics and make internal queues visible as
classes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:35 -08:00
Patrick McHardy
7d2681a6ff [NET_SCHED]: sch_sfq: add support for external classifiers
Add support for external classifiers to allow using different flow
hash functions similar to ESFQ. When no classifier is attached the
built-in hash is used as before.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:34 -08:00
Patrick McHardy
5239008b0d [NET_SCHED]: Constify struct tcf_ext_map
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:34 -08:00
Dave Young
5396c9356e [BLUETOOTH]: Fix bugs in previous conn add/del workqueue changes.
Jens Axboe noticed that we were queueing &conn->work on both btaddconn
and keventd_wq.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:33 -08:00
Adrian Bunk
30a50cc566 [TCP]: Unexport sysctl_tcp_tso_win_divisor
This patch removes the no longer used
EXPORT_SYMBOL(sysctl_tcp_tso_win_divisor).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:32 -08:00
Adrian Bunk
0027ba8434 [IPV4]: Make struct ipv4_devconf static.
struct ipv4_devconf can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:31 -08:00
Adrian Bunk
81a21eb4ec [TR] net/802/tr.c: sysctl_tr_rif_timeout static
sysctl_tr_rif_timeout can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:31 -08:00
Masahide NAKAMURA
9472c9ef64 [XFRM]: Fix statistics.
o Outbound sequence number overflow error status
  is counted as XfrmOutStateSeqError.
o Additionaly, it changes inbound sequence number replay
  error name from XfrmInSeqOutOfWindow to XfrmInStateSeqError
  to apply name scheme above.
o Inbound IPv4 UDP encapsuling type mismatch error is wrongly
  mapped to XfrmInStateInvalid then this patch fiex the error
  to XfrmInStateMismatch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:30 -08:00
Adrian Bunk
5255dc6e14 [XFRM]: Remove unused exports.
This patch removes the following no longer used EXPORT_SYMBOL's:
- xfrm_input.c: xfrm_parse_spi
- xfrm_state.c: xfrm_replay_check
- xfrm_state.c: xfrm_replay_advance

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:29 -08:00
Roel Kluin
cc8fd14dca [PKT_SCHED] sch_teql.c: Duplicate IFF_BROADCAST in FMASK, remove 2nd.
Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:29 -08:00
Eric Dumazet
29e75252da [IPV4] route cache: Introduce rt_genid for smooth cache invalidation
Current ip route cache implementation is not suited to large caches.

We can consume a lot of CPU when cache must be invalidated, since we
currently need to evict all cache entries, and this eviction is
sometimes asynchronous. min_delay & max_delay can somewhat control this
asynchronism behavior, but whole thing is a kludge, regularly triggering
infamous soft lockup messages. When entries are still in use, this also
consumes a lot of ram, filling dst_garbage.list.

A better scheme is to use a generation identifier on each entry,
so that cache invalidation can be performed by changing the table
identifier, without having to scan all entries.
No more delayed flushing, no more stalling when secret_interval expires.

Invalidated entries will then be freed at GC time (controled by
ip_rt_gc_timeout or stress), or when an invalidated entry is found
in a chain when an insert is done.
Thus we keep a normal equilibrium.

This patch :
- renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
- Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
- Checks entry->rt_genid at appropriate places :
2008-01-31 19:28:27 -08:00
Jesse Brandeburg
174ce04831 [PKTGEN]: pktgen should not print info that it is spinning
when using pktgen to send delay packets the module prints repeatedly
to the kernel log:

sleeping for X
sleeping for X
...

This is probably just a debugging item left in and should not be
enabled for regular use of the module.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:26 -08:00
Patrick McHardy
72eb7bd269 [NET_SCHED]: sch_ingress: remove netfilter support
Since the old policer code is gone, TC actions are needed for policing.
The ingress qdisc can get packets directly from netif_receive_skb()
in case TC actions are enabled or through netfilter otherwise, but
since without TC actions there is no policer the only thing it actually
does is count packets.

Remove the netfilter support and always require TC actions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:25 -08:00
Chris Leech
e83a2ea850 [VLAN]: set_rx_mode support for unicast address list
Reuse the existing logic for multicast list synchronization for the
unicast address list. The core of dev_mc_sync/unsync are split out as
__dev_addr_sync/unsync and moved from dev_mcast.c to dev.c.  These are
then used to implement dev_unicast_sync/unsync as well.

I'm working on cleaning up Intel's FCoE stack, which generates new MAC
addresses from the fibre channel device id assigned by the fabric as
per the current draft specification in T11.  When using such a
protocol in a VLAN environment it would be nice to not always be
forced into promiscuous mode, assuming the underlying Ethernet driver
supports multiple unicast addresses as well.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-01-31 19:28:24 -08:00
Shan Wei
16ca3f9130 [TCP]: Fix a bug in strategy_allowed_congestion_control
In strategy_allowed_congestion_control of the 2.6.24 kernel, when
sysctl_string return 1 on success,it should call
tcp_set_allowed_congestion_control to set the allowed congestion
control.But, it don't.  the sysctl_string return 1 on success,
otherwise return negative, never return 0.The patch fix the problem.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:23 -08:00
Stephen Hemminger
71d67e666e [IPV4] fib_trie: rescan if key is lost during dump
Normally during a dump the key of the last dumped entry is used for
continuation, but since lock is dropped it might be lost. In that case
fallback to the old counter based N^2 behaviour.  This means the dump
will end up skipping some routes which matches what FIB_HASH does.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:23 -08:00
Rami Rosen
9fe7c712fc [PKTGEN]: Remove an unused definition in pktgen.c.
- Remove an unused definition (LAT_BUCKETS_MAX) in net/core/pktgen.c.
- Remove the corresponding comment.
- The LAT_BUCKETS_MAX seems to have to do with a patch from a long
time ago which was not applied (Ben Greear), which dealt with latency
counters.

See, for example : http://oss.sgi.com/archives/netdev/2002-09/msg00184.html

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:22 -08:00
Jim Paris
23717795be [IPV6]: Update MSS even if MTU is unchanged.
This is needed because in ndisc.c, we have:

  static void ndisc_router_discovery(struct sk_buff *skb)
  {
  // ...
  	if (ndopts.nd_opts_mtu) {
  // ...
  			if (rt)
  				rt->u.dst.metrics[RTAX_MTU-1] = mtu;

  			rt6_mtu_change(skb->dev, mtu);
  // ...
  }

Since the mtu is set directly here, rt6_mtu_change_route thinks that
it is unchanged, and so it fails to update the MSS accordingly.  This
patch lets rt6_mtu_change_route still update MSS if old_mtu == new_mtu.

Signed-off-by: Jim Paris <jim@jtan.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:21 -08:00
Pavel Emelyanov
fa4d3c6210 [NETNS]: Udp sockets per-net lookup.
Add the net parameter to udp_get_port family of calls and
udp_lookup one and use it to filter sockets.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:21 -08:00
Pavel Emelyanov
d86e0dac2c [NETNS]: Tcp-v6 sockets per-net lookup.
Add a net argument to inet6_lookup and propagate it further.
Actually, this is tcp-v6 implementation of what was done for
tcp-v4 sockets in a previous patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:20 -08:00
Pavel Emelyanov
c67499c0e7 [NETNS]: Tcp-v4 sockets per-net lookup.
Add a net argument to inet_lookup and propagate it further
into lookup calls. Plus tune the __inet_check_established.

The dccp and inet_diag, which use that lookup functions
pass the init_net into them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:19 -08:00
Pavel Emelyanov
941b1d22cc [NETNS]: Make bind buckets live in net namespaces.
This tags the inet_bind_bucket struct with net pointer,
initializes it during creation and makes a filtering
during lookup.

A better hashfn, that takes the net into account is to
be done in the future, but currently all bind buckets
with similar port will be in one hash chain.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:18 -08:00
Pavel Emelyanov
5ee31fc1ec [INET]: Consolidate inet(6)_hash_connect.
These two functions are the same except for what they call
to "check_established" and "hash" for a socket.

This saves half-a-kilo for ipv4 and ipv6.

 add/remove: 1/0 grow/shrink: 1/4 up/down: 582/-1128 (-546)
 function                                     old     new   delta
 __inet_hash_connect                            -     577    +577
 arp_ignore                                   108     113      +5
 static.hint                                    8       4      -4
 rt_worker_func                               376     372      -4
 inet6_hash_connect                           584      25    -559
 inet_hash_connect                            586      25    -561

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:17 -08:00
Pavel Emelyanov
535174efbe [IPV6]: Introduce the INET6_TW_MATCH macro.
We have INET_MATCH, INET_TW_MATCH and INET6_MATCH to test sockets and
twbuckets for matching, but ipv6 twbuckets are tested manually.

Here's the INET6_TW_MATCH to help with it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:17 -08:00
Patrick McHardy
22e0e62cd0 [NETFILTER]: xt_iprange: fix sparse warnings
CHECK   net/netfilter/xt_iprange.c
net/netfilter/xt_iprange.c:104:19: warning: restricted degrades to integer
net/netfilter/xt_iprange.c:104:37: warning: restricted degrades to integer
net/netfilter/xt_iprange.c:104:19: warning: restricted degrades to integer
net/netfilter/xt_iprange.c:104:37: warning: restricted degrades to integer
net/netfilter/xt_iprange.c:104:19: warning: restricted degrades to integer
net/netfilter/xt_iprange.c:104:37: warning: restricted degrades to integer
net/netfilter/xt_iprange.c:104:19: warning: restricted degrades to integer
net/netfilter/xt_iprange.c:104:37: warning: restricted degrades to integer

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:16 -08:00
Patrick McHardy
969d71089f [NETFILTER]: nf_nat: fix sparse warning
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:15 -08:00
Patrick McHardy
9e232495de [NETFILTER]: nf_conntrack: fix sparse warning
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:15 -08:00
Patrick McHardy
c392a74018 [NETFILTER]: {ip,ip6}_queue: fix build error
Reported by Ingo Molnar:

 net/built-in.o: In function `ip_queue_init':
 ip_queue.c:(.init.text+0x322c): undefined reference to `net_ipv4_ctl_path'

Fix the build error and also handle CONFIG_PROC_FS=n properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:14 -08:00
Jan Engelhardt
32948588ac [NETFILTER]: nf_conntrack: annotate l3protos with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:13 -08:00
Jan Engelhardt
7cc3864d39 [NETFILTER]: nf_{conntrack,nat}_icmp: constify and annotate
Constify a few data tables use const qualifiers on variables where
possible in the nf_conntrack_icmp* sources.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:12 -08:00
Jan Engelhardt
dc35dc5a4c [NETFILTER]: nf_{conntrack,nat}_proto_gre: annotate with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:12 -08:00
Jan Engelhardt
da3f13c95a [NETFILTER]: nf_{conntrack,nat}_proto_udp{,lite}: annotate with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:11 -08:00
Jan Engelhardt
82f568fc2f [NETFILTER]: nf_{conntrack,nat}_proto_tcp: constify and annotate TCP modules
Constify a few data tables use const qualifiers on variables where
possible in the nf_*_proto_tcp sources.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:10 -08:00
Jan Engelhardt
02e23f4057 [NETFILTER]: nf_conntrack_sane: annotate SANE helper with const
Annotate nf_conntrack_sane variables with const qualifier and remove
a few casts.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:10 -08:00
Jan Engelhardt
9ddd0ed050 [NETFILTER]: nf_{conntrack,nat}_pptp: annotate PPtP helper with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:09 -08:00
Jan Engelhardt
de24b4ebb8 [NETFILTER]: nf_{conntrack,nat}_tftp: annotate TFTP helper with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:08 -08:00
Jan Engelhardt
13f7d63c29 [NETFILTER]: nf_{conntrack,nat}_sip: annotate SIP helper with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:08 -08:00
Jan Engelhardt
905e3e8ec5 [NETFILTER]: nf_conntrack_h323: constify and annotate H.323 helper
Constify data tables (predominantly in nf_conntrack_h323_types.c, but
also a few in nf_conntrack_h323_asn1.c) and use const qualifiers on
variables where possible in the h323 sources.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:07 -08:00
Alexey Dobriyan
3cb609d57c [NETFILTER]: x_tables: create per-netns /proc/net/*_tables_*
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:06 -08:00
Alexey Dobriyan
715cf35ac9 [NETFILTER]: x_tables: netns propagation for /proc/net/*_tables_names
Propagate netns together with AF down to ->start/->next/->stop
iterators. Choose table based on netns and AF for showing.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:05 -08:00
Alexey Dobriyan
025d93d148 [NETFILTER]: x_tables: semi-rewrite of /proc/net/foo_tables_*
There are many small but still wrong things with /proc/net/*_tables_*
so I decided to do overhaul simultaneously making it more suitable for
per-netns /proc/net/*_tables_* implementation.

Fix
a) xt_get_idx() duplicating now standard seq_list_start/seq_list_next
   iterators
b) tables/matches/targets list was chosen again and again on every ->next
c) multiple useless "af >= NPROTO" checks -- we simple don't supply invalid
   AFs there and registration function should BUG_ON instead.

   Regardless, the one in ->next() is the most useless -- ->next doesn't
   run at all if ->start fails.
d) Don't use mutex_lock_interruptible() -- it can fail and ->stop is
   executed even if ->start failed, so unlock without lock is possible.

As side effect, streamline code by splitting xt_tgt_ops into xt_target_ops,
xt_matches_ops, xt_tables_ops.

xt_tables_ops hooks will be changed by per-netns code. Code of
xt_matches_ops, xt_target_ops is identical except the list chosen for
iterating, but I think consolidating code for two files not worth it
given "<< 16" hacks needed for it.

[Patrick: removed unused enum in x_tables.c]

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:05 -08:00
Jan Engelhardt
09e410def6 [NETFILTER]: xt_hashlimit match, revision 1
Introduces the xt_hashlimit match revision 1. It adds support for
kernel-level inversion and grouping source and/or destination IP
addresses, allowing to limit on a per-subnet basis. While this would
technically obsolete xt_limit, xt_hashlimit is a more expensive due
to the hashbucketing.

Kernel-level inversion: Previously you had to do user-level inversion:

	iptables -N foo
	iptables -A foo -m hashlimit --hashlimit(-upto) 5/s -j RETURN
	iptables -A foo -j DROP
	iptables -A INPUT -j foo

now it is simpler:

	iptables -A INPUT -m hashlimit --hashlimit-over 5/s -j DROP

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:04 -08:00
Ilpo Järvinen
d33b7c06bd [NETFILTER]: nf_conntrack: kill unused static inline (do_iter)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:03 -08:00
Ilpo Järvinen
a38201e3c9 [NETFILTER]: ipt_CLUSTERIP: kill clusterip_config_entry_get
It's unused static inline.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:02 -08:00
Eric Leblond
a83099a60f [NETFILTER]: nf_conntrack_netlink: transmit mark during all events
The following feature was submitted some months ago. It forces the dump
of mark during the connection destruction event. The induced load is
quiet small and the patch is usefull to provide an easy way to filter
event on user side without having to keep an hash in userspace.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:02 -08:00
Jan Engelhardt
1f807d6eb3 [NETFILTER]: nf_conntrack_h323: clean up code a bit
-total: 81 errors, 3 warnings, 876 lines checked
+total: 44 errors, 3 warnings, 876 lines checked

There is still work to be done, but that's for another patch.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:01 -08:00
Patrick McHardy
02502f6224 [NETFILTER]: nf_nat: switch rwlock to spinlock
Since we're using RCU, all users of nf_nat_lock take a write_lock.
Switch it to a spinlock.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:00 -08:00
Patrick McHardy
4d354c5782 [NETFILTER]: nf_nat: use RCU for bysource hash
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:00 -08:00
Patrick McHardy
c88130bcd5 [NETFILTER]: nf_conntrack: naming unification
Rename all "conntrack" variables to "ct" for more consistency and
avoiding some overly long lines.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:59 -08:00
Patrick McHardy
76eb946040 [NETFILTER]: nf_conntrack: don't inline early_drop()
early_drop() is only called *very* rarely, unfortunately gcc inlines it
into the hotpath because there is only a single caller. Explicitly mark
it noinline.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:58 -08:00
Patrick McHardy
0794935e21 [NETFILTER]: nf_conntrack: optimize hash_conntrack()
Avoid calling jhash three times and hash the entire tuple in one go.

  __hash_conntrack | -485 # 760 -> 275, # inlines: 3 -> 1, size inlines: 717 -> 252
 1 function changed, 485 bytes removed

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:57 -08:00
Patrick McHardy
ba419aff2c [NETFILTER]: nf_conntrack: optimize __nf_conntrack_find()
Ignoring specific entries in __nf_conntrack_find() is only needed by NAT
for nf_conntrack_tuple_taken(). Remove it from __nf_conntrack_find()
and make nf_conntrack_tuple_taken() search the hash itself.

Saves 54 bytes of text in the hotpath on x86_64:

  __nf_conntrack_find      |  -54 # 321 -> 267, # inlines: 3 -> 2, size inlines: 181 -> 127
  nf_conntrack_tuple_taken | +305 # 15 -> 320, lexblocks: 0 -> 3, # inlines: 0 -> 3, size inlines: 0 -> 181
  nf_conntrack_find_get    |   -2 # 90 -> 88
 3 functions changed, 305 bytes added, 56 bytes removed, diff: +249

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:55 -08:00
Patrick McHardy
f8ba1affa1 [NETFILTER]: nf_conntrack: switch rwlock to spinlock
With the RCU conversion only write_lock usages of nf_conntrack_lock are
left (except one read_lock that should actually use write_lock in the
H.323 helper). Switch to a spinlock.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:54 -08:00
Patrick McHardy
76507f69c4 [NETFILTER]: nf_conntrack: use RCU for conntrack hash
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:54 -08:00
Patrick McHardy
7d0742da1c [NETFILTER]: nf_conntrack_expect: use RCU for expectation hash
Use RCU for expectation hash. This doesn't buy much for conntrack
runtime performance, but allows to reduce the use of nf_conntrack_lock
for /proc and nf_netlink_conntrack.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:53 -08:00
Patrick McHardy
c52fbb410b [NETFILTER]: nf_conntrack_core: avoid taking nf_conntrack_lock in nf_conntrack_alter_reply
The conntrack is unconfirmed, so we have an exclusive reference, which
means that the write_lock is definitely unneeded. A read_lock used to
be needed for the helper lookup, but since we're using RCU for helpers
now rcu_read_lock is enough.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:52 -08:00
Patrick McHardy
58a3c9bb0c [NETFILTER]: nf_conntrack: use RCU for conntrack helpers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:51 -08:00
Patrick McHardy
47d9504543 [NETFILTER]: nf_conntrack: fix accounting with fixed timeouts
Don't skip accounting for conntracks with the FIXED_TIMEOUT bit.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:51 -08:00
Patrick McHardy
1d670fdc8c [NETFILTER]: nf_conntrack_netlink: fix unbalanced locking
Properly drop nf_conntrack_lock on tuple parsing error.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:50 -08:00
Patrick McHardy
99fa5f5397 [NETFILTER]: nf_conntrack_ipv6: fix sparse warnings
CHECK   net/ipv6/netfilter/nf_conntrack_reasm.c
  net/ipv6/netfilter/nf_conntrack_reasm.c:77:18: warning: symbol 'nf_ct_ipv6_sysctl_table' was not declared. Should it be static?
  net/ipv6/netfilter/nf_conntrack_reasm.c:586:16: warning: symbol 'nf_ct_frag6_gather' was not declared. Should it be static?
  net/ipv6/netfilter/nf_conntrack_reasm.c:662:6: warning: symbol 'nf_ct_frag6_output' was not declared. Should it be static?
  net/ipv6/netfilter/nf_conntrack_reasm.c:683:5: warning: symbol 'nf_ct_frag6_kfree_frags' was not declared. Should it be static?
  net/ipv6/netfilter/nf_conntrack_reasm.c:698:5: warning: symbol 'nf_ct_frag6_init' was not declared. Should it be static?
  net/ipv6/netfilter/nf_conntrack_reasm.c:717:6: warning: symbol 'nf_ct_frag6_cleanup' was not declared. Should it be static?

Based on patch by Stephen Hemminger with suggestions by Yasuyuki KOZAKAI.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:49 -08:00
Patrick McHardy
b0a6363c24 [NETFILTER]: {ip,arp,ip6}_tables: fix sparse warnings in compat code
CHECK   net/ipv4/netfilter/ip_tables.c
net/ipv4/netfilter/ip_tables.c:1453:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1453:8:    expected int *size
net/ipv4/netfilter/ip_tables.c:1453:8:    got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1458:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1458:44:    expected int *size
net/ipv4/netfilter/ip_tables.c:1458:44:    got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1603:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1603:2:    expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1603:2:    got int *<noident>
net/ipv4/netfilter/ip_tables.c:1627:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1627:8:    expected int *size
net/ipv4/netfilter/ip_tables.c:1627:8:    got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1634:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1634:40:    expected int *size
net/ipv4/netfilter/ip_tables.c:1634:40:    got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1653:8: warning: incorrect type in argument 5 (different signedness)
net/ipv4/netfilter/ip_tables.c:1653:8:    expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1653:8:    got int *<noident>
net/ipv4/netfilter/ip_tables.c:1666:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1666:2:    expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1666:2:    got int *<noident>
  CHECK   net/ipv4/netfilter/arp_tables.c
net/ipv4/netfilter/arp_tables.c:1285:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1285:40:    expected int *size
net/ipv4/netfilter/arp_tables.c:1285:40:    got unsigned int *size
net/ipv4/netfilter/arp_tables.c:1543:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1543:44:    expected int *size
net/ipv4/netfilter/arp_tables.c:1543:44:    got unsigned int [usertype] *size
  CHECK   net/ipv6/netfilter/ip6_tables.c
net/ipv6/netfilter/ip6_tables.c:1481:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1481:8:    expected int *size
net/ipv6/netfilter/ip6_tables.c:1481:8:    got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1486:44: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1486:44:    expected int *size
net/ipv6/netfilter/ip6_tables.c:1486:44:    got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1631:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1631:2:    expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1631:2:    got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1655:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1655:8:    expected int *size
net/ipv6/netfilter/ip6_tables.c:1655:8:    got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1662:40: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1662:40:    expected int *size
net/ipv6/netfilter/ip6_tables.c:1662:40:    got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1680:8: warning: incorrect type in argument 5 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1680:8:    expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1680:8:    got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1693:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1693:2:    expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1693:2:    got int *<noident>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:49 -08:00
Patrick McHardy
855304af29 [NETFILTER]: ipt_recent: fix sparse warnings
net/ipv4/netfilter/ipt_recent.c:215:17: warning: symbol 't' shadows an earlier one
net/ipv4/netfilter/ipt_recent.c:179:22: originally declared here
net/ipv4/netfilter/ipt_recent.c:322:13: warning: context imbalance in 'recent_seq_start' - wrong count at exit
net/ipv4/netfilter/ipt_recent.c:354:13: warning: context imbalance in 'recent_seq_stop' - unexpected unlock

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:48 -08:00
Stephen Hemminger
dc64d02ba8 [NETFILTER]: nf_conntrack_h3223: sparse fixes
Sparse complains when a function is not really static. Putting static
on the function prototype is not enough.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:47 -08:00
Stephen Hemminger
f4f6fb714f [NETFILTER]: more sparse fixes
Some lock annotations, and make initializers static.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:46 -08:00
Stephen Hemminger
2f0d2f1039 [NETFILTER]: conntrack: get rid of sparse warnings
Teach sparse about locking here, and fix signed/unsigned warnings.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:46 -08:00
Stephen Hemminger
4e26fe2681 [NETFILTER]: nfnetlink_log: sparse warning fixes
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:45 -08:00
Stephen Hemminger
96eb24d770 [NETFILTER]: nf_conntrack: sparse warnings
The hashtable size is really unsigned so sparse complains when you pass
a signed integer.  Change all uses to make it consistent.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:44 -08:00
Stephen Hemminger
06aa10728e [NETFILTER]: nf_nat_snmp: sparse warning
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:44 -08:00
Jan Engelhardt
edc26f7aaa [NETFILTER]: xt_owner: allow matching UID/GID ranges
Add support for ranges to the new revision. This doesn't affect
compatibility since the new revision was not released yet.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:43 -08:00
Jan Engelhardt
37c08387fc [NETFILTER]: xt_TCPMSS: consider reverse route's MTU in clamp-to-pmtu
The TCPMSS target in Xtables should consider the MTU of the reverse
route on forwarded packets as part of the path MTU.

Point in case: IN=ppp0, OUT=eth0. MSS set to 1460 in spite of MTU of
ppp0 being 1392.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:42 -08:00
Alexey Dobriyan
df200969b1 [NETFILTER]: netns: put table module on netns stop
When number of entries exceeds number of initial entries, foo-tables code
will pin table module. But during table unregister on netns stop,
that additional pin was forgotten.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:41 -08:00
Alexey Dobriyan
9ea0cb2601 [NETFILTER]: arp_tables: per-netns arp_tables FILTER
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:41 -08:00
Alexey Dobriyan
79df341ab6 [NETFILTER]: arp_tables: netns preparation
* Propagate netns from userspace.
* arpt_register_table() registers table in supplied netns.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:40 -08:00
Alexey Dobriyan
8280aa6182 [NETFILTER]: ip6_tables: per-netns IPv6 FILTER, MANGLE, RAW
Now it's possible to list and manipulate per-netns ip6tables rules.
Filtering decisions are based on init_net's table so far.

P.S.: remove init_net check in inet6_create() to see the effect

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:39 -08:00
Alexey Dobriyan
336b517fdc [NETFILTER]: ip6_tables: netns preparation
* Propagate netns from userspace down to xt_find_table_lock()
* Register ip6 tables in netns (modules still use init_net)

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:39 -08:00
Alexey Dobriyan
9335f047fe [NETFILTER]: ip_tables: per-netns FILTER, MANGLE, RAW
Now, iptables show and configure different set of rules in different
netnss'. Filtering decisions are still made by consulting only
init_net's set.

Changes are identical except naming so no splitting.

P.S.: one need to remove init_net checks in nf_sockopt.c and inet_create()
      to see the effect.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:38 -08:00
Alexey Dobriyan
34bd137ba7 [NETFILTER]: ip_tables: propagate netns from userspace
.. all the way down to table searching functions.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:37 -08:00
Alexey Dobriyan
44d34e721e [NETFILTER]: x_tables: return new table from {arp,ip,ip6}t_register_table()
Typical table module registers xt_table structure (i.e. packet_filter)
and link it to list during it. We can't use one template for it because
corresponding list_head will become corrupted. We also can't unregister
with template because it wasn't changed at all and thus doesn't know in
which list it is.

So, we duplicate template at the very first step of table registration.
Table modules will save it for use during unregistration time and actual
filtering.

Do it at once to not screw bisection.

P.S.: renaming i.e. packet_filter => __packet_filter is temporary until
      full netnsization of table modules is done.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:36 -08:00
Alexey Dobriyan
8d87005207 [NETFILTER]: x_tables: per-netns xt_tables
In fact all we want is per-netns set of rules, however doing that will
unnecessary complicate routines such as ipt_hook()/ipt_do_table, so
make full xt_table array per-netns.

Every user stubbed with init_net for a while.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:35 -08:00
Alexey Dobriyan
a98da11d88 [NETFILTER]: x_tables: change xt_table_register() return value convention
Switch from 0/-E to ptr/PTR_ERR convention.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:35 -08:00
Jan Engelhardt
30083c9500 [NETFILTER]: ebtables: mark matches, targets and watchers __read_mostly
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:34 -08:00
Jan Engelhardt
f776c4cda4 [NETFILTER]: ebtables: Update modules' descriptions
Update the MODULES_DESCRIPTION() tags for all Ebtables modules.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:33 -08:00
Jan Engelhardt
abfdf1c489 [NETFILTER]: ebtables: remove casts, use consts
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:33 -08:00
Jan Engelhardt
b41649989c [NETFILTER]: xt_conntrack: add port and direction matching
Extend the xt_conntrack match revision 1 by port matching (all four
{orig,repl}{src,dst}) and by packet direction matching.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:31 -08:00
Patrick McHardy
41d0cdedd5 [NETFILTER]: nfnetlink_log: fix typo
It should use htonl for the GID, not htons.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:30 -08:00
Patrick McHardy
2fd8e526f4 [NETFILTER]: bridge netfilter: remove nf_bridge_info read-only netoutdev member
Before the removal of the deferred output hooks, netoutdev was used in
case of VLANs on top of a bridge to store the VLAN device, so the
deferred hooks would see the correct output device. This isn't
necessary anymore since we're calling the output hooks for the correct
device directly in the IP stack.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:29 -08:00
Patrick McHardy
d44caf88e8 [NETFILTER]: nf_nat: remove double bysource hash initialization
The hash table is already initialized by nf_ct_alloc_hashtable().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:28 -08:00
Jan Engelhardt
ecb6f85e11 [NETFILTER]: Use const in struct xt_match, xt_target, xt_table
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:28 -08:00
Eric Dumazet
ca7c48ca97 [NETFILTER]: Supress some sparse warnings
CHECK   net/netfilter/nf_conntrack_expect.c
net/netfilter/nf_conntrack_expect.c:429:13: warning: context imbalance in 'exp_seq_start' - wrong count at exit
net/netfilter/nf_conntrack_expect.c:441:13: warning: context imbalance in 'exp_seq_stop' - unexpected unlock
  CHECK   net/netfilter/nf_log.c
net/netfilter/nf_log.c:105:13: warning: context imbalance in 'seq_start' - wrong count at exit
net/netfilter/nf_log.c:125:13: warning: context imbalance in 'seq_stop' - unexpected unlock
  CHECK   net/netfilter/nfnetlink_queue.c
net/netfilter/nfnetlink_queue.c:363:7: warning: symbol 'size' shadows an earlier one
net/netfilter/nfnetlink_queue.c:217:9: originally declared here
net/netfilter/nfnetlink_queue.c:847:13: warning: context imbalance in 'seq_start' - wrong count at exit
net/netfilter/nfnetlink_queue.c:859:13: warning: context imbalance in 'seq_stop' - unexpected unlock

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:27 -08:00
Denis V. Lunev
3046d76746 [RAW]: Wrong content of the /proc/net/raw6.
The address of IPv6 raw sockets was shown in the wrong format, from
IPv4 ones.  The problem has been introduced by the commit
42a73808ed ("[RAW]: Consolidate proc
interface.")

Thanks to Adrian Bunk who originally noticed the problem.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:26 -08:00
Denis V. Lunev
8cd850efa4 [RAW]: Cleanup IPv4 raw_seq_show.
There is no need to use 128 bytes on the stack at all. Clean the code
in the IPv6 style.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:25 -08:00
Denis V. Lunev
377cf82d66 [RAW]: Family check in the /proc/net/raw[6] is extra.
Different hashtables are used for IPv6 and IPv4 raw sockets, so no
need to check the socket family in the iterator over hashtables. Clean
this out.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:24 -08:00
Herbert Xu
b1641064a3 [IPCOMP]: Fix reception of incompressible packets
I made a silly typo by entering IPPROTO_IP (== 0) instead of
IPPROTO_IPIP (== 4).  This broke the reception of incompressible
packets.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:24 -08:00
Eric Dumazet
e242297055 [NET]: should explicitely initialize atomic_t field in struct dst_ops
All but one struct dst_ops static initializations miss explicit
initialization of entries field.

As this field is atomic_t, we should use ATOMIC_INIT(0), and not
rely on atomic_t implementation.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:23 -08:00
Ilpo Järvinen
ad1984e844 [TCP]: NewReno must count every skb while marking losses
NewReno should add cnt per skb (as with FACK) instead of depending on
SACKED_ACKED bits which won't be set with it at all.  Effectively,
NewReno should always exists after the first iteration anyway (or
immediately if there's already head in lost_out.

This was fixed earlier in net-2.6.25 but got reverted among other
stuff and I didn't notice that this is still necessary (actually
wasn't even considering this case while trying to figure out the
reports because I lived with different kind of code than it in reality
was).

This should solve the WARN_ONs in TCP code that as a result of this
triggered multiple times in every place we check for this invariant.

Special thanks to Dave Young <hidave.darkstar@gmail.com> and Krishna
Kumar2 <krkumar2@in.ibm.com> for trying with my debug patches.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Dave Young <hidave.darkstar@gmail.com>
Tested-by: Krishna Kumar2 <krkumar2@in.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:22 -08:00
Pavel Emelyanov
23fe18669e [NETNS]: Fix race between put_net() and netlink_kernel_create().
The comment about "race free view of the set of network
namespaces" was a bit hasty. Look (there even can be only
one CPU, as discovered by Alexey Dobriyan and Denis Lunev):

put_net()
  if (atomic_dec_and_test(&net->refcnt))
    /* true */
      __put_net(net);
        queue_work(...);

/*
 * note: the net now has refcnt 0, but still in
 * the global list of net namespaces
 */

== re-schedule ==

register_pernet_subsys(&some_ops);
  register_pernet_operations(&some_ops);
    (*some_ops)->init(net);
      /*
       * we call netlink_kernel_create() here
       * in some places
       */
      netlink_kernel_create();
         sk_alloc();
            get_net(net); /* refcnt = 1 */
         /*
          * now we drop the net refcount not to
          * block the net namespace exit in the
          * future (or this can be done on the
          * error path)
          */
         put_net(sk->sk_net);
             if (atomic_dec_and_test(&...))
                   /*
                    * true. BOOOM! The net is
                    * scheduled for release twice
                    */

When thinking on this problem, I decided, that getting and
putting the net in init callback is wrong. If some init
callback needs to have a refcount-less reference on the struct
net, _it_ has to be careful himself, rather than relying on
the infrastructure to handle this correctly.

In case of netlink_kernel_create(), the problem is that the
sk_alloc() gets the given namespace, but passing the info
that we don't want to get it inside this call is too heavy.

Instead, I propose to crate the socket inside an init_net
namespace and then re-attach it to the desired one right
after the socket is created.

After doing this, we also have to be careful on error paths
not to drop the reference on the namespace, we didn't get
the one on.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:22 -08:00
Eric Dumazet
533cb5b0a6 [XFRM]: constify 'struct xfrm_type'
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:20 -08:00
Benjamin Thery
2216b48376 [NETNS]: Add missing initialization of nl_info.nl_net in rtm_to_fib6_config()
Add missing initialization of the new nl_info.nl_net field in
rtm_to_fib6_config(). This will be needed the store network namespace
associated to the fib6_config struct.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:20 -08:00
Laszlo Attila Toth
4a19ec5800 [NET]: Introducing socket mark socket option.
A userspace program may wish to set the mark for each packets its send
without using the netfilter MARK target. Changing the mark can be used
for mark based routing without netfilter or for packet filtering.

It requires CAP_NET_ADMIN capability.

Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:19 -08:00
Jan Engelhardt
036c2e27bc [AF_RXRPC]: constify function pointer tables
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:18 -08:00
Dave Young
b6c0632105 [BLUETOOTH]: Add conn add/del workqueues to avoid connection fail.
The bluetooth hci_conn sysfs add/del executed in the default
workqueue.  If the del_conn is executed after the new add_conn with
same target, add_conn will failed with warning of "same kobject name".

Here add btaddconn & btdelconn workqueues, flush the btdelconn
workqueue in the add_conn function to avoid the issue.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:12 -08:00
Herbert Xu
2614fa59fa [IPCOMP]: Fetch nexthdr before ipch is destroyed
When I moved the nexthdr setting out of IPComp I accidently moved
the reading of ipch->nexthdr after the decompression.  Unfortunately
this means that we'd be reading from a stale ipch pointer which
doesn't work very well.

This patch moves the reading up so that we get the correct nexthdr
value.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:11 -08:00
Julian Anastasov
936f6f8e1b [IPV4] fib_trie: apply fixes from fib_hash
Update fib_trie with some fib_hash fixes:
- check for duplicate alternative routes for prefix+tos+priority when
replacing route
- properly insert by matching tos together with priority
- fix alias walking to use list_for_each_entry_continue for insertion
and deletion when fa_head is not NULL
- copy state from fa to new_fa on replace (not a problem for now)
- additionally, avoid replacement without error if new route is same,
as Joonwoo Park suggests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:10 -08:00
Julian Anastasov
c18865f392 [IPV4] fib: fix route replacement, fib_info is shared
fib_info can be shared by many route prefixes but we don't want
duplicate alternative routes for a prefix+tos+priority. Last change
was not correct to check fib_treeref because it accounts usage from
other prefixes. Additionally, avoid replacement without error if new
route is same, as Joonwoo Park suggests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:10 -08:00
Wei Yongjun
ec9dbb1c3e [SCTP]: Fix miss of report unrecognized HMAC Algorithm parameter
This patch fix miss of check for report unrecognized HMAC Algorithm
parameter.  When AUTH is disabled, goto fall through path to report
unrecognized parameter, else, just break

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:09 -08:00
Arnaldo Carvalho de Melo
8cf8e5a67f [INET_DIAG]: Fix inet_diag_lock_handler error path.
Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=9825

The inet_diag_lock_handler function uses ERR_PTR to encode errors but
its callers were testing against NULL.

This only happens when the only inet_diag modular user, DCCP, is not
built into the kernel or available as a module.

Also there was a problem with not dropping the mutex lock when a handler
was not found, also fixed in this patch.

This caused an OOPS and ss would then hang on subsequent calls, as
&inet_diag_table_mutex was being left locked.

Thanks to spike at ml.yaroslavl.ru for report it after trying 'ss -d'
on a kernel that doesn't have DCCP available.

This bug was introduced in cset
d523a328fb ("Fix inet_diag dead-lock
regression"), after 2.6.24-rc3, so just 2.6.24 seems to be affected.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:08 -08:00
Herbert Xu
29ffe1a5c5 [INET]: Prevent out-of-sync truesize on ip_fragment slow path
When ip_fragment has to hit the slow path the value of skb->truesize
may go out of sync because we would have updated it without changing
the packet length.  This violates the constraints on truesize.

This patch postpones the update of skb->truesize to prevent this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:07 -08:00
maximilian attems
1987e7b485 [AX25]: Kill ax25_bind() user triggable printk.
on the last run overlooked that sfuzz triggable message.
move the message to the corresponding comment.

Signed-off-by: maximilian attems <max@stro.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:06 -08:00
maximilian attems
a44562e4d3 [AX25]: Beautify x25_init() version printk.
kill ref to old version and dup Linux.

Signed-off-by: maximilian attems <max@stro.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:06 -08:00
Ilpo Järvinen
8674204a49 [NET] 9p: kill dead static inline buf_put_string
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:05 -08:00
Herbert Xu
1a6509d991 [IPSEC]: Add support for combined mode algorithms
This patch adds support for combined mode algorithms with GCM being
the first algorithm supported.

Combined mode algorithms can be added through the xfrm_user interface
using the new algorithm payload type XFRMA_ALG_AEAD.  Each algorithms
is identified by its name and the ICV length.

For the purposes of matching algorithms in xfrm_tmpl structures,
combined mode algorithms occupy the same name space as encryption
algorithms.  This is in line with how they are negotiated using IKE.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:03 -08:00
Herbert Xu
6fbf2cb774 [IPSEC]: Allow async algorithms
Now that ESP uses authenc we can turn on the support for async
algorithms in IPsec.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:02 -08:00
Herbert Xu
38320c70d2 [IPSEC]: Use crypto_aead and authenc in ESP
This patch converts ESP to use the crypto_aead interface and in particular
the authenc algorithm.  This lays the foundations for future support of
combined mode algorithms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:02 -08:00
Iñaky Pérez-González
303d9bf6bb rfkill: add the WiMAX radio type
Teach rfkill about wimax radios.

Had to define a KEY_WIMAX as a 'key for disabling only wimax radios',
as other radio technologies have. This makes sense as hardware has
specific keys for disabling specific radios.

The RFKILL enabling part is, otherwise, a copy and paste of any other
radio technology.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:26:46 -08:00
Ron Rindjunsky
8b6bbe7538 mac80211: fixing null qos data frames check for reordering buffer
This patch fixes a wrong condition for null qos data frames, causing us to
drop data frames needed for reordering as well.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:26:38 -08:00
Johannes Berg
691ba2346d mac80211: fix alignment warning
When I introduced the alignment warning I forgot the A-MSDU case which
has a different requirement because each frame contains 14-byte 802.3
headers in front of the IP payload. This patch moves the alignment
warning to a place where we know whether we're dealing with an A-MSDU
frame and adjusts it accordingly.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:26:34 -08:00
Linus Torvalds
75659ca0c1 Merge branch 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
  Remove commented-out code copied from NFS
  NFS: Switch from intr mount option to TASK_KILLABLE
  Add wait_for_completion_killable
  Add wait_event_killable
  Add schedule_timeout_killable
  Use mutex_lock_killable in vfs_readdir
  Add mutex_lock_killable
  Use lock_page_killable
  Add lock_page_killable
  Add fatal_signal_pending
  Add TASK_WAKEKILL
  exit: Use task_is_*
  signal: Use task_is_*
  sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
  ptrace: Use task_is_*
  power: Use task_is_*
  wait: Use TASK_NORMAL
  proc/base.c: Use task_is_*
  proc/array.c: Use TASK_REPORT
  perfmon: Use task_is_*
  ...

Fixed up conflicts in NFS/sunrpc manually..
2008-02-01 11:45:47 +11:00
Linus Torvalds
44c3b59102 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  security: compile capabilities by default
  selinux: make selinux_set_mnt_opts() static
  SELinux: Add warning messages on network denial due to error
  SELinux: Add network ingress and egress control permission checks
  NetLabel: Add auditing to the static labeling mechanism
  NetLabel: Introduce static network labels for unlabeled connections
  SELinux: Allow NetLabel to directly cache SIDs
  SELinux: Enable dynamic enable/disable of the network access checks
  SELinux: Better integration between peer labeling subsystems
  SELinux: Add a new peer class and permissions to the Flask definitions
  SELinux: Add a capabilities bitmap to SELinux policy version 22
  SELinux: Add a network node caching mechanism similar to the sel_netif_*() functions
  SELinux: Only store the network interface's ifindex
  SELinux: Convert the netif code to use ifindex values
  NetLabel: Add IP address family information to the netlbl_skbuff_getattr() function
  NetLabel: Add secid token support to the NetLabel secattr struct
  NetLabel: Consolidate the LSM domain mapping/hashing locks
  NetLabel: Cleanup the LSM domain hash functions
  NetLabel: Remove unneeded RCU read locks
2008-01-31 09:32:24 +11:00
Linus Torvalds
dd430ca20c Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (890 commits)
  x86: fix nodemap_size according to nodeid bits
  x86: fix overlap between pagetable with bss section
  x86: add PCI IDs to k8topology_64.c
  x86: fix early_ioremap pagetable ops
  x86: use the same pgd_list for PAE and 64-bit
  x86: defer cr3 reload when doing pud_clear()
  x86: early boot debugging via FireWire (ohci1394_dma=early)
  x86: don't special-case pmd allocations as much
  x86: shrink some ifdefs in fault.c
  x86: ignore spurious faults
  x86: remove nx_enabled from fault.c
  x86: unify fault_32|64.c
  x86: unify fault_32|64.c with ifdefs
  x86: unify fault_32|64.c by ifdef'd function bodies
  x86: arch/x86/mm/init_32.c printk fixes
  x86: arch/x86/mm/init_32.c cleanup
  x86: arch/x86/mm/init_64.c printk fixes
  x86: unify ioremap
  x86: fixes some bugs about EFI memory map handling
  x86: use reboot_type on EFI 32
  ...
2008-01-31 00:40:09 +11:00
Linus Torvalds
44c45eb911 Make !NETFILTER_ADVANCED enable IP6_NF_MATCH_IPV6HEADER
We want IPV6HEADER matching for the non-advanced default netfilter
configuration, since it's part of the standard netfilter setup of at
least some distributions (eg Fedora).

Otherwise NETFILTER_ADVANCED loses much of its point, since even
non-advanced users would have to enable all the advanced options just to
get a working IPv6 netfilter setup.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-31 00:26:10 +11:00
travis@sgi.com
df3825c56d x86: change NR_CPUS arrays in numa_64
Change the following static arrays sized by NR_CPUS to
per_cpu data variables:

	char cpu_to_node_map[NR_CPUS];

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:11 +01:00
Trond Myklebust
34f5b4662b SUNRPC: Don't bother changing the sigmask for asynchronous RPC calls
The caller will never sleep in rpc_execute, so don't bother setting the
sigmask.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:10 -05:00
Chuck Lever
afc881124b SUNRPC: rpcb_getport_sync() passes incorrect address size to rpc_create()
The variable "sin" is a pointer, so sizeof(sin) is the size of a pointer,
not the size of thing that sin points to.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:09 -05:00
Chuck Lever
67d6021362 SUNRPC: Clean up block comment preceding rpcb_getport_sync()
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:09 -05:00
Chuck Lever
f1ec08cb94 SUNRPC: Use appropriate argument types in rpcb client
Clean up: Follow recommendations of Chapter 5 of Documentation/CodingStyle
and use "u32" instead of "__u32" for types in definitions that are not
shared with user space.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:09 -05:00
Chuck Lever
b91e101fca SUNRPC: rpcb_getport_sync() should use built-in hostname generator
rpc_create() can already fill in the hostname with a string representation
of the server's IP address, so remove redundant logic in in
rpcb_getport_sync() that does that.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:09 -05:00
Chuck Lever
33e01dc7f5 SUNRPC: Clean up functions that free address_strings array
Clean up: document the rule (kfree) and the exceptions
(RPC_DISPLAY_PROTO and RPC_DISPLAY_NETID) when freeing the objects in
a transport's address_strings array.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:08 -05:00
Trond Myklebust
86d61d8638 SUNRPC: Fix up constant string declarations in struct rpcbind_args
...and eliminate an unnecessary cast.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:06 -05:00
Chuck Lever
b454ae9060 SUNRPC: fewer conditionals in the format_ip_address routines
Clean up: have the set up routines explicitly pass the strings to be used
for the transport name and NETID.  This removes a number of conditionals
and dependencies on rpc_xprt.prot, which is overloaded.

Tighten up type checking on the address_strings array while we're at it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:04 -05:00
Chuck Lever
7df089952f SUNRPC: Fix use of copy_to_user() in gss_pipe_upcall()
The gss_pipe_upcall() function expects the copy_to_user() function to
return a negative error value if the call fails, but copy_to_user()
returns an unsigned long number of bytes that couldn't be copied.

Can rpc_pipefs actually retry a partially completed upcall read?  If
not, then gss_pipe_upcall() should punt any partial read, just like the
upcall logic in net/sunrpc/cache.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:00 -05:00
Trond Myklebust
ba7392bb37 SUNRPC: Add support for per-client timeout values
In order to be able to support setting the timeo and retrans parameters on
a per-mountpoint basis, we move the rpc_timeout structure into the
rpc_clnt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:59 -05:00
Trond Myklebust
2881ae74e6 SUNRPC: Clean up the transport timeout initialisation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:58 -05:00
Trond Myklebust
698b6d088e SUNRPC: cleanup for rpc_new_client()
There is no reason why we shouldn't just pass the rpc_create_args.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:58 -05:00
Chuck Lever
0fb2b7e945 SUNRPC: Move universal address definitions to global header
Universal addresses are defined in RFC 1833 and clarified in RFC 3530.  We
need to use them in several places in the NFS and RPC clients, so move the
relevant definition and block comment to an appropriate global include
file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:50 -05:00
Chuck Lever
0a48f5d70f SUNRPC: RPC version numbers are u32
Clean up: use correct type for RPC version numbers in rpcbind client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:50 -05:00
Chuck Lever
9f6ad26d2a SUNRPC: Fix socket address handling in rpcb_clnt
Make sure rpcb_clnt passes the correct address length to rpc_create().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:50 -05:00
Chuck Lever
510deb0d70 SUNRPC: rpc_create() default hostname should support AF_INET6 addresses
If the ULP doesn't pass a hostname string to rpc_create(), it manufactures
one based on the passed-in address.  Be smart enough to handle an AF_INET6
address properly in this case.

Move the default servername logic before the xprt_create_transport() call
to simplify error handling in rpc_create().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:49 -05:00
Chuck Lever
9b45b74ce2 SUNRPC: Remove an unneeded implicit type cast when calling rpc_depopulate()
The two arguments of rpc_depopulate() that pass in inode numbers should use
the same type as inode->i_ino: unsigned long.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:43 -05:00
Chuck Lever
322e2efe62 SUNRPC: temp var should match return type of xdr_skb_read_actor
The return type of xdr_skb_read_actor functions is size_t.  This fixes a
nit I unwittingly overlooked in commit dd456471.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:43 -05:00
Chuck Lever
5d40a8a525 SUNRPC: Check a return result
Minor: Replace an empty if statement with a debugging dprintk.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Thomas Talpey <Thomas.Talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:42 -05:00
Chuck Lever
d4b37ff735 SUNRPC: Fix an unnecessary implicit type cast in rpcrdma_count_chunks()
Nit: rl_nchunks is an unsigned integer, so pass it into
rpcrdma_count_chunks() via an unsigned integer argument.  This eliminates
a harmless mixed sign comparison in rpcrdma_count_chunks()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Thomas Talpey <Thomas.Talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:42 -05:00
Chuck Lever
2a428b2b8f SUNRPC: Prevent mixed sign comparisons in rpcrdma_convert_iovs()
Keep the type of the buffer position the same during iovec conversion to
reduce the likelihood of unexpected results from comparisons and length
computations.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Thomas Talpey <Thomas.Talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:41 -05:00
Trond Myklebust
a4a874990c SUNRPC: Cleanup to remove the last users of the RPC_WAITQ declaration
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:41 -05:00
Trond Myklebust
47fe064831 SUNRPC: Unexport rpc_init_task() and rpc_execute()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:40 -05:00
Trond Myklebust
e8f5d77c80 SUNRPC: allow the caller of rpc_run_task to preallocate the struct rpc_task
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:38 -05:00
Trond Myklebust
b5627943ab SUNRPC: Remove the now unused function rpc_call_setup()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:35 -05:00
Trond Myklebust
5138fde011 NFS/SUNRPC: Convert all users of rpc_call_setup()
Replace use of rpc_call_setup() with rpc_init_task(), and in cases where we
need to initialise task->tk_action, with rpc_call_start().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:32 -05:00
Trond Myklebust
b3ef8b3bb9 SUNRPC: Allow rpc_init_task() to initialise the rpc_task->tk_msg
In preparation for the removal of rpc_call_setup().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:31 -05:00
Trond Myklebust
77de2c590e SUNRPC: Add a helper rpc_call_start() that initialises task->tk_action
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:31 -05:00
Trond Myklebust
5085925902 SUNRPC: Mask signals across the call to rpc_call_setup() in rpc_run_task
To ensure that the RPCSEC_GSS upcall is performed with the correct sigmask.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:31 -05:00
Trond Myklebust
3ff7576dda SUNRPC: Clean up the initialisation of priority queue scheduling info.
We want the default scheduling priority (priority == 0) to remain
RPC_PRIORITY_NORMAL.

Also ensure that the priority wait queue scheduling is per process id
instead of sometimes being per thread, and sometimes being per inode.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:30 -05:00
Trond Myklebust
c970aa85e7 SUNRPC: Clean up rpc_run_task
Make it use the new task initialiser structure instead of acting as a
wrapper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:30 -05:00
Trond Myklebust
84115e1cd4 SUNRPC: Cleanup of rpc_task initialisation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:30 -05:00
Trond Myklebust
e8914c65f7 SUNRPC: Restrict sunrpc client exports
The sunrpc client exports are not meant to be part of any official kernel
API: they can change at the drop of a hat. Mark them as internal functions
using EXPORT_SYMBOL_GPL.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:28 -05:00
Trond Myklebust
a6eaf8bdf9 SUNRPC: Move exported declarations to the function declarations
Do this for all RPC client related functions and XDR functions.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:28 -05:00
J. Bruce Fields
93a44a75b9 sunrpc: document the rpc_pipefs kernel api
Add kerneldoc comments for the rpc_pipefs.c functions that are exported.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:27 -05:00
Trond Myklebust
663b8858dd SUNRPC: Reconnect immediately whenever the server isn't refusing it.
If we've disconnected from the server, rather than the other way round,
then it makes little sense to wait 3 seconds before reconnecting.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:27 -05:00
Trond Myklebust
62da3b2488 SUNRPC: Rename xprt_disconnect()
xprt_disconnect() should really only be called when the transport shutdown
is completed, and it is time to wake up any pending tasks. Rename it to
xprt_disconnect_done() in order to reflect the semantical change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:27 -05:00
Trond Myklebust
3ebb067d92 SUNRPC: Make call_status()/call_decode() call xprt_force_disconnect()
Move the calls to xprt_disconnect() over to xprt_force_disconnect() in
order to enable the transport layer to manage the state of the
XPRT_CONNECTED flag.
Ditto in xs_tcp_read_fraghdr().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:26 -05:00
Trond Myklebust
7272dcd31d SUNRPC: xprt_autoclose() should not call xprt_disconnect()
The transport layer should do that itself whenever appropriate.

Note that the RDMA transport already assumes that it needs to call
xprt_disconnect in xprt_rdma_close().
For TCP sockets, we want to call xprt_disconnect() only after the
connection has been closed by both ends.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:26 -05:00
Trond Myklebust
e06799f958 SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
By using shutdown() rather than close() we allow the RPC client to wait
for the TCP close handshake to complete before we start trying to reconnect
using the same port.
We use shutdown(SHUT_WR) only instead of shutting down both directions,
however we wait until the server has closed the connection on its side.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:26 -05:00
Trond Myklebust
ef80367071 SUNRPC: TCP clear XPRT_CLOSE_WAIT when the socket is closed for writes
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:25 -05:00
Trond Myklebust
3b948ae5be SUNRPC: Allow the client to detect if the TCP connection is closed
Add an xprt->state bit to enable the TCP ->state_change() method to signal
whether or not the TCP connection is in the process of closing down.
This will to be used by the reconnection logic in a separate patch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:25 -05:00
Trond Myklebust
67a391d72c SUNRPC: Fix TCP rebinding logic
Currently the TCP rebinding logic assumes that if we're not using a
reserved port, then we don't need to reconnect on the same port if a
disconnection event occurs. This breaks most RPC duplicate reply cache
implementations.

Also take into account the fact that xprt_min_resvport and
xprt_max_resvport may change while we're reconnecting, since the user may
change them at any time via the sysctls. Ensure that we check the port
boundaries every time we loop in xs_bind4/xs_bind6. Also ensure that if the
boundaries change, we only scan the ports a maximum of 2 times.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:25 -05:00
Trond Myklebust
66af1e5585 SUNRPC: Fix a race in xs_tcp_state_change()
When scheduling the autoclose RPC call, we want to ensure that we don't
race against the test_bit() call in xprt_clear_locked().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:24 -05:00
Paul Moore
13541b3ada NetLabel: Add auditing to the static labeling mechanism
This patch adds auditing support to the NetLabel static labeling mechanism.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:29 +11:00
Paul Moore
8cc44579d1 NetLabel: Introduce static network labels for unlabeled connections
Most trusted OSs, with the exception of Linux, have the ability to specify
static security labels for unlabeled networks.  This patch adds this ability to
the NetLabel packet labeling framework.

If the NetLabel subsystem is called to determine the security attributes of an
incoming packet it first checks to see if any recognized NetLabel packet
labeling protocols are in-use on the packet.  If none can be found then the
unlabled connection table is queried and based on the packets incoming
interface and address it is matched with a security label as configured by the
administrator using the netlabel_tools package.  The matching security label is
returned to the caller just as if the packet was explicitly labeled using a
labeling protocol.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:28 +11:00
Paul Moore
d621d35e57 SELinux: Enable dynamic enable/disable of the network access checks
This patch introduces a mechanism for checking when labeled IPsec or SECMARK
are in use by keeping introducing a configuration reference counter for each
subsystem.  In the case of labeled IPsec, whenever a labeled SA or SPD entry
is created the labeled IPsec/XFRM reference count is increased and when the
entry is removed it is decreased.  In the case of SECMARK, when a SECMARK
target is created the reference count is increased and later decreased when the
target is removed.  These reference counters allow SELinux to quickly determine
if either of these subsystems are enabled.

NetLabel already has a similar mechanism which provides the netlbl_enabled()
function.

This patch also renames the selinux_relabel_packet_permission() function to
selinux_secmark_relabel_packet_permission() as the original name and
description were misleading in that they referenced a single packet label which
is not the case.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:26 +11:00
Paul Moore
75e22910cf NetLabel: Add IP address family information to the netlbl_skbuff_getattr() function
In order to do any sort of IP header inspection of incoming packets we need to
know which address family, AF_INET/AF_INET6/etc., it belongs to and since the
sk_buff structure does not store this information we need to pass along the
address family separate from the packet itself.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:20 +11:00
Paul Moore
16efd45435 NetLabel: Add secid token support to the NetLabel secattr struct
This patch adds support to the NetLabel LSM secattr struct for a secid token
and a type field, paving the way for full LSM/SELinux context support and
"static" or "fallback" labels.  In addition, this patch adds a fair amount
of documentation to the core NetLabel structures used as part of the
NetLabel kernel API.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:19 +11:00
Paul Moore
1c3fad936a NetLabel: Consolidate the LSM domain mapping/hashing locks
Currently we use two separate spinlocks to protect both the hash/mapping table
and the default entry.  This could be considered a bit foolish because it adds
complexity without offering any real performance advantage.  This patch
removes the dedicated default spinlock and protects the default entry with the
hash/mapping table spinlock.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:18 +11:00
Paul Moore
b64397e0b4 NetLabel: Cleanup the LSM domain hash functions
The NetLabel/LSM domain hash table search function used an argument to specify
if the default entry should be returned if an exact match couldn't be found in
the hash table.  This is a bit against the kernel's style so make two separate
functions to represent the separate behaviors.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:17 +11:00
Paul Moore
c783f1ce57 NetLabel: Remove unneeded RCU read locks
This patch removes some unneeded RCU read locks as we can treat the reads as
"safe" even without RCU.  It also converts the NetLabel configuration refcount
from a spinlock protected u32 into atomic_t to be more consistent with the rest
of the kernel.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-30 08:17:16 +11:00
YOSHIFUJI Hideaki
85040bcb46 [IPV6] ADDRLABEL: Fix double free on label deletion.
If an entry is being deleted because it has only one reference, 
we immediately delete it and blindly register the rcu handler for it,
This results in oops by double freeing that object.

This patch fixes it by consolidating the code paths for the deletion;
let its rcu handler delete the object if it has no more reference.

Bug was found by Mitsuru Chinen <mitch@linux.vnet.ibm.com>

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:46:02 -08:00
Stephen Hemminger
ac97f75faa [IPV4] fib_trie: remove unneeded NULL check
Since fib_route_seq_show now uses hlist_for_each_entry(), the leaf
info can not be NULL.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:26 -08:00
Stephen Hemminger
f638a2f057 [IPV4] fib_trie: More whitespace cleanup.
Remove extra blank lines.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:25 -08:00
Patrick McHardy
7a9c1bd409 [NET_SCHED]: Use nla_policy for attribute validation in ematches
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:24 -08:00
Patrick McHardy
53b2bf3f8a [NET_SCHED]: Use nla_policy for attribute validation in actions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:23 -08:00
Patrick McHardy
6fa8c0144b [NET_SCHED]: Use nla_policy for attribute validation in classifiers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:23 -08:00
Patrick McHardy
27a3421e48 [NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:22 -08:00
Patrick McHardy
5feb5e1aaa [NET_SCHED]: sch_api: introduce constant for rate table size
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:21 -08:00
Patrick McHardy
1587bac49f [NET_SCHED]: Use typeful attribute parsing helpers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:21 -08:00
Patrick McHardy
24beeab539 [NET_SCHED]: Use typeful attribute construction helpers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:20 -08:00
Patrick McHardy
57e1c487a4 [NET_SCHED]: Use NLA_PUT_STRING for string dumping
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:19 -08:00
Patrick McHardy
4b3550ef53 [NET_SCHED]: Use nla_nest_start/nla_nest_end
Use nla_nest_start/nla_nest_end for dumping nested attributes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:18 -08:00
Patrick McHardy
cee63723b3 [NET_SCHED]: Propagate nla_parse return value
nla_parse() returns more detailed errno codes, propagate them back on
error.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:18 -08:00
Patrick McHardy
ab27cfb85c [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:17 -08:00
Patrick McHardy
c96c9471dd [NET_SCHED]: act_api: use nlmsg_parse
Convert open-coded nlmsg_parse to use the real function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:16 -08:00
Patrick McHardy
6d834e04e5 [NET_SCHED]: act_api: fix netlink API conversion bug
Fix two invalid attribute accesses, indices start at 1 with the new
netlink API.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:15 -08:00
Patrick McHardy
b03f467200 [NET_SCHED]: sch_netem: use nla_parse_nested_compat
Replace open coded equivalent of nla_parse_nested_compat().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:15 -08:00
Patrick McHardy
f5e5cb7553 [NET_SCHED]: sch_atm: fix format string warning
Fix format string warning introduces by the netlink API conversion:

net/sched/sch_atm.c:250: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:14 -08:00
Denis V. Lunev
dde1bc0e6f [NETNS]: Add namespace for ICMP replying code.
All needed API is done, the namespace is available when required from
the device on the DST entry from the incoming packet. So, just replace
init_net with proper namespace.

Other protocols will follow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:13 -08:00
Denis V. Lunev
b5921910a1 [NETNS]: Routing cache virtualization.
Basically, this piece looks relatively easy. Namespace is already
available on the dst entry via device and the device is safe to
dereferrence. Compare it with one of a searcher and skip entry if
appropriate.

The only exception is ip_rt_frag_needed. So, add namespace parameter to it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:13 -08:00
Patrick McHardy
7ba699c604 [NET_SCHED]: Convert actions from rtnetlink to new netlink API
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:11 -08:00
Patrick McHardy
add93b610a [NET_SCHED]: Convert classifiers from rtnetlink to new netlink API
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:11 -08:00
Patrick McHardy
1e90474c37 [NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink API
Convert packet schedulers to use the netlink API. Unfortunately a gradual
conversion is not possible without breaking compilation in the middle or
adding lots of casts, so this patch converts them all in one step. The
patch has been mostly generated automatically with some minor edits to
at least allow seperate conversion of classifiers and actions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:10 -08:00
Patrick McHardy
01480e1cf5 [NETLINK]: Add nla_append()
Used to append data to a message without a header or padding.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:09 -08:00
Patrick McHardy
2eb9d75c72 [NET_SCHED]: mark classifier ops __read_mostly
Additionally remove unnecessary NULL initilizations of the next pointer.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:08 -08:00
Patrick McHardy
62e3ba1b55 [NET_SCHED]: Move EXPORT_SYMBOL next to exported symbol
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:07 -08:00
Denis V. Lunev
f206351a50 [NETNS]: Add namespace parameter to ip_route_output_key.
Needed to propagate it down to the ip_route_output_flow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:07 -08:00
Denis V. Lunev
f1b050bf7a [NETNS]: Add namespace parameter to ip_route_output_flow.
Needed to propagate it down to the __ip_route_output_key.

Signed_off_by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:06 -08:00
Denis V. Lunev
611c183ebc [NETNS]: Add namespace parameter to __ip_route_output_key.
This is only required to propagate it down to the
ip_route_output_slow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:05 -08:00
Denis V. Lunev
b40afd0e5c [NETNS]: Add namespace parameter to ip_route_output_slow.
This function needs a net namespace to lookup devices, fib tables,
etc. in, so pass it there.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:05 -08:00
Denis V. Lunev
1ab352768f [NETNS]: Add namespace parameter to ip_dev_find.
in_dev_find() need a namespace to pass it to fib_get_table(), so add
an argument.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:04 -08:00
Denis V. Lunev
010278ec4c [NETNS]: Add netns parameter to fib_select_default.
Currently fib_select_default calls fib_get_table() with the
init_net. Prepare it to provide a correct namespace to lookup default
route.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:03 -08:00
Denis V. Lunev
64c2d53829 [IPV4]: Consolidate fib_select_default.
The difference in the implementation of the fib_select_default when
CONFIG_IP_MULTIPLE_TABLES is (not) defined looks
negligible. Consolidate it and place into fib_frontend.c.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:02 -08:00
Stephen Hemminger
d5ce8a0e97 [IPV4] fib_trie: avoid rescan on dump
This converts dumping (and flushing) of large route tables form O(N^2)
to O(N). If the route dump took multiple pages then the dump routine
gets called again. The old code kept track of location by counter, the
new code instead uses the last key.

This is a really big win ( 0.3 sec vs 12 sec) for big route tables.

One side effect is that if the table changes during the dump, then the
last key will not be found, and we will return -EBUSY.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:01 -08:00
Stephen Hemminger
9195bef7fb [IPV4] fib_trie: avoid extra search on delete
Get rid of extra search that made route deletion O(n).

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:00 -08:00
Stephen Hemminger
a88ee22925 [IPV4] fib_trie: dump table in sorted order
It is easier with TRIE to dump the data traversal rather than
interating over every possible prefix. This saves some time and makes
the dump come out in sorted order.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:00 -08:00
Stephen Hemminger
82cfbb0085 [IPV4] fib_trie: iterator recode
Remove the complex loop structure of nextleaf() and replace it with a
simpler tree walker. This improves the performance and is much
cleaner.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:59 -08:00
Stephen Hemminger
64347f786d [IPV4] fib_trie: dump message multiple part flag
Match fib_hash, and set NLM_F_MULTI to handle multiple part messages.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:58 -08:00
Stephen Hemminger
1328042e26 [IPV4] fib_trie: use hash list
The code to dump can use the existing hash chain rather than doing
repeated lookup.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:58 -08:00
Stephen Hemminger
936722922f [IPV4] fib_trie: compute size when needed
Compute the number of prefixes when needed, rather than doing bookeeping.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:57 -08:00
Stephen Hemminger
a07f5f508a [IPV4] fib_trie: style cleanup
Style cleanups:
      * make check_leaf return -1 or plen, rather than by reference
      * Get rid of #ifdef that is always set
      * split out embedded function calls in if statements.
      * checkpatch warnings

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:56 -08:00
Stephen Hemminger
bc3c8c1e02 [IPV4] fib_trie: put leaf nodes in a slab cache
This improves locality for operations that touch all the leaves.  Save
space since these entries don't need to be hardware cache aligned.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:56 -08:00
Jan Engelhardt
a1f6f5a768 [AF_X25]: constify function pointer tables
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:55 -08:00
Ross Burton
91cde6f7d2 [IrDA]: LMP discovery timer not started by default
By default, LMP sets up a 3 seconds timer for discovery.
We don't need it until discovery is set to 1.

Signed-off-by: Ross Burton <ross@openedhand.com>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:54 -08:00
Masakazu Mokuno
4248d2f811 WEXT: remove unused variable
As event_type_pk_size[] is not used,  Remove it.

Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:10:48 -08:00
Ron Rindjunsky
71ebb4aac8 mac80211: fix rx flow sparse errors, make functions static
This patch adds static declarations to functions in the Rx flow in order to
eliminate sparse errors

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:10:46 -08:00
Stefano Brivio
c09c7237ea rc80211-pid: fix last_sample initialization
Fix last_sample initialization. kzalloc'ing the per-STA data wasn't enough,
as jiffies can assume negative values as well. This fixes a bug which
prevented correct PID operation for a while after booting.

Thanks to Michael Buesch for reporting this.

Reported-and-tested-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:10:43 -08:00
Johannes Berg
f99b751fca mac80211: fix RCU locking in __ieee80211_rx_handle_packet
Commit c7a51bda ("mac80211: restructure __ieee80211_rx") extracted
__ieee80211_rx_handle_packet out of __ieee80211_rx and hence changed
the locking rules for __ieee80211_rx_handle_packet(), it is now
invoked under RCU lock. There is, however, one instance left where
it contains an rcu_read_unlock() in an error path, which is a bug.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:10:43 -08:00
Guy Cohen
a8bdf29c6c mac80211: Assign correct TID for local bridged packets
This patch assigns correct TID to frames transmitted between
two stations in the same BSS in AP mode.
The problem is that skb->protocol is not set to ETH_P_IP and it is wrong
to use that field at this stage.
The fix compares the LLC/Protocol headers explicitly to check if the
encapsulated frame is IP frame

Signed-off-by: Guy Cohen <guy.cohen@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:10:41 -08:00
Eric Dumazet
69a73829db [DST]: shrinks sizeof(struct rtable) by 64 bytes on x86_64
On x86_64, sizeof(struct rtable) is 0x148, which is rounded up to
0x180 bytes by SLAB allocator.

We can reduce this to exactly 0x140 bytes, without alignment overhead,
and store 12 struct rtable per PAGE instead of 10.

rate_tokens is currently defined as an "unsigned long", while its
content should not exceed 6*HZ. It can safely be converted to an
unsigned int.

Moving tclassid right after rate_tokens to fill the 4 bytes hole
permits to save 8 bytes on 'struct dst_entry', which finally permits
to save 8 bytes on 'struct rtable'

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:41 -08:00
Pavel Emelyanov
81566e8322 [NETNS][FRAGS]: Make the pernet subsystem for fragments.
On namespace start we mainly prepare the ctl variables.

When the namespace is stopped we have to kill all the fragments that
point to this namespace.  The inet_frags_exit_net() handles it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:40 -08:00
Pavel Emelyanov
3140c25c82 [NETNS][FRAGS]: Make the LRU list per namespace.
The inet_frags.lru_list is used for evicting only, so we have
to make it per-namespace, to evict only those fragments, who's
namespace exceeded its high threshold, but not the whole hash.
Besides, this helps to avoid long loops  in evictor.

The spinlock is not per-namespace because it protects the
hash table as well, which is global.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:39 -08:00
Pavel Emelyanov
3b4bc4a2bf [NETNS][FRAGS]: Isolate the secret interval from namespaces.
Since we have one hashtable to lookup the fragment, having
different secret_interval-s for hash rebuild doesn't make
sense, so move this one to inet_frags.

The inet_frags_ctl becomes empty after this, so remove it.
The appropriate ctl table is kept read-only in namespaces.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:39 -08:00
Pavel Emelyanov
e31e0bdc7e [NETNS][FRAGS]: Make thresholds work in namespaces.
This is the same as with the timeout variable.

Currently, after exceeding the high threshold _all_
the fragments are evicted, but it will be fixed in
later patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:38 -08:00
Pavel Emelyanov
b2fd5321dd [NETNS][FRAGS]: Make the net.ipv4.ipfrag_timeout work in namespaces.
Move it to the netns_frags, adjust the usage and
make the appropriate ctl table writable.

Now fragment, that live in different namespaces can
live for different times.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:37 -08:00
Pavel Emelyanov
e4a2d5c2bc [NETNS][FRAGS]: Duplicate sysctl tables for new namespaces.
Each namespace has to have own tables to tune their
different parameters, so duplicate the tables and
register them.

All the tables in sub-namespaces are temporarily made
read-only.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:37 -08:00
Pavel Emelyanov
6ddc082223 [NETNS][FRAGS]: Make the mem counter per-namespace.
This is also simple, but introduces more changes, since
then mem counter is altered in more places.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:36 -08:00
Pavel Emelyanov
e5a2bb842c [NETNS][FRAGS]: Make the nqueues counter per-namespace.
This is simple - just move the variable from struct inet_frags
to struct netns_frags and adjust the usage appropriately.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:35 -08:00
Pavel Emelyanov
ac18e7509e [NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces.
Since fragment management code is consolidated, we cannot have the
pointer from inet_frag_queue to struct net, since we must know what
king of fragment this is.

So, I introduce the netns_frags structure. This one is currently
empty, but will be eventually filled with per-namespace
attributes. Each inet_frag_queue is tagged with this one.

The conntrack_reasm is not "netns-izated", so it has one static
netns_frags instance to keep working in init namespace.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:34 -08:00
Pavel Emelyanov
8d8354d2fb [NETNS][FRAGS]: Move ctl tables around.
This is a preparation for sysctl netns-ization.
Move the ctl tables to the files, where the tuning
variables reside. Plus make the helpers to register
the tables.

This will simplify the later patches and will keep
similar things closer to each other.

ipv4, ipv6 and conntrack_reasm are patched differently,
but the result is all the tables are in appropriate files.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:34 -08:00
YOSHIFUJI Hideaki
61cf46ad58 [IPV6] NDISC: Sparse: Use different variable name for local use.
Fix the following sparse warnings:
| net/ipv6/ndisc.c:1300:21: warning: symbol 'opt' shadows an earlier one
| net/ipv6/ndisc.c:1078:7: originally declared here

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:28 -08:00
YOSHIFUJI Hideaki
5d5619b40c [IPV6] ADDRCONF: Sparse: Make inet6_dump_addr() code paths more straight-forward.
Fix the following sparse warning:
| net/ipv6/addrconf.c:3384:2: warning: context imbalance in 'inet6_dump_addr' - different lock contexts for basic block

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:28 -08:00
YOSHIFUJI Hideaki
2334ecbdb2 [IPV6]: Sparse: Declare non-static ipv6_{route,icmp,frag}_sysctl_init() in header.
Fix the following sparse warnings:
| net/ipv6/route.c:2491:18: warning: symbol 'ipv6_route_sysctl_init' was not declared. Should it be static?
| net/ipv6/icmp.c:922:18: warning: symbol 'ipv6_icmp_sysctl_init' was not declared. Should it be static?
| net/ipv6/reassembly.c:628:6: warning: symbol 'ipv6_frag_sysctl_init' was not declared. Should it be static?

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:27 -08:00
YOSHIFUJI Hideaki
40fee36e11 [IPV6] ADDRLABEL: Sparse: Make several functions static.
Fix following sparse warnings:
| net/ipv6/addrlabel.c:172:25: warning: symbol 'ip6addrlbl_alloc' was not declared. Should it be static?
| net/ipv6/addrlabel.c:219:5: warning: symbol '__ip6addrlbl_add' was not declared. Should it be static?
| net/ipv6/addrlabel.c:260:5: warning: symbol 'ip6addrlbl_add' was not declared. Should it be static?
| net/ipv6/addrlabel.c:285:5: warning: symbol '__ip6addrlbl_del' was not declared. Should it be static?
| net/ipv6/addrlabel.c:311:5: warning: symbol 'ip6addrlbl_del' was not declared. Should it be static?

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:26 -08:00
YOSHIFUJI Hideaki
5e8b9df6e8 [IPV6] UDPLITE: Sparse: Declare non-static symbols in header.
Fix the following sparse warnings:
| net/ipv6/udplite.c:45:14: warning: symbol 'udplitev6_prot' was not declared. Should it be static?
| net/ipv6/udplite.c:80:12: warning: symbol 'udplitev6_init' was not declared. Should it be static?
| net/ipv6/udplite.c:99:6: warning: symbol 'udplitev6_exit' was not declared. Should it be static?

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:26 -08:00
YOSHIFUJI Hideaki
77d0d350e9 [IPV6] UDP,UDPLITE: Sparse: {__udp6_lib,udp,udplite}_err() are of void.
Fix following sparse warnings:
| net/ipv6/udp.c:262:2: warning: returning void-valued expression
| net/ipv6/udplite.c:29:2: warning: returning void-valued expression

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:25 -08:00
YOSHIFUJI Hideaki
fc80be87dc [IPV4] UDP,UDPLITE: Sparse: {__udp4_lib,udp,udplite}_err() are of void.
Fix following sparse warnings:
| net/ipv4/udp.c:421:2: warning: returning void-valued expression
| net/ipv4/udplite.c:38:2: warning: returning void-valued expression

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:24 -08:00
Denis V. Lunev
ecfdc8c542 [NETNS]: Pass correct namespace in ip_rt_get_source.
ip_rt_get_source is the infamous place for which dst_ifdown kludges
have been implemented. This means that rt->u.dst.dev can be safely
dereferrenced obtain nd_net.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:23 -08:00
Denis V. Lunev
84a885f449 [NETNS]: Pass correct namespace in ip_route_input_slow.
The packet on the input path always has a referrence to an input
network device it is passed from. Extract network namespace from it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:22 -08:00
Denis V. Lunev
86167a377f [NETNS]: Pass correct namespace in context fib_check_nh.
Correct network namespace is already used in fib_check_nh. Re-work its
usage for better readability and pass into fib_lookup &
inetdev_by_index.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:21 -08:00
Denis V. Lunev
5b707aaae4 [NETNS]: Pass correct namespace in fib_validate_source.
Correct network namespace is available inside fib_validate_source. It
can be obtained from the device passed in. The device is not NULL as
in_device is obtained from it just above.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:21 -08:00
Denis V. Lunev
7fee0ca237 [NETNS]: Add netns parameter to inetdev_by_index.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:20 -08:00
Denis V. Lunev
da0e28cb68 [NETNS]: Add netns parameter to fib_lookup.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:19 -08:00
Stephen Hemminger
ba93ef7465 [IPV4]: ipmr sparse warnings
Get rid of some of the sparse warnings.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:18 -08:00
Stephen Hemminger
dd329bfa96 [IPV4]: igmp sparse warnings
Partial sparse warning fix.  The other conditional locking
is too much for sparse to handle.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:18 -08:00
Stephen Hemminger
1402c8519a [VLAN]: sparse warning fix
Minor sparse warning fix.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:17 -08:00
Ron Rindjunsky
66dcb6bdc5 mac80211: A-MPDU Rx remove stop_rx_ba_session warning print
This patch removes a warning print from ieee80211_sta_stop_rx_ba_session
in case the tid is inactive when interface goes down.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:10:10 -08:00
Ron Rindjunsky
5b3d71d90e mac80211: A-MPDU Rx stop aggregation on proper dev
This patch adds a check to insure that Rx A-MPDU will be stopped only
for the proper device.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:10:09 -08:00
Ivo van Doorn
0f7054e32f mac80211: Initialize vif pointer
Before calling update_beacon() mac80211 must
initialize the control.vif pointer so it can
be used by the driver to determine which
interface is trying to send the beacon.

v2: ieee80211_beacon_get() should also initialize the
vif pointer since it can be called by mac80211 internally
before calling config_interface().

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:10:08 -08:00
Johannes Berg
471b3efdfc mac80211: add unified BSS configuration
This patch (based on Ron Rindjunsky's) creates a framework for
a unified way to pass BSS configuration to drivers that require
the information, e.g. for implementing power save mode.

This patch introduces new ieee80211_bss_conf structure that is
passed to the driver via the new bss_info_changed() callback
when the BSS configuration changes.

This new BSS configuration infrastructure adds the following
new features:
 * drivers are notified of their association AID
 * drivers are notified of association status

and replaces the erp_ie_changed() callback. The patch also does
the relevant driver updates for the latter change.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:09:43 -08:00
Michael Wu
2bc454b0b3 mac80211: Fix rate reporting regression
Mattias Nissler's "clean up rate selection" patch incorrectly changes
the behavior of txrate setting in sta_info. This patch backs out parts
of the rate selection consolidation in order to fix this issue for now.

Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:09:42 -08:00
Johannes Berg
4fd6931ebe mac80211: implement cfg80211 station handling
This implements station handling from userspace via cfg80211
in mac80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:09:39 -08:00
Johannes Berg
5dfdaf58d6 mac80211: add beacon configuration via cfg80211
This patch implements the cfg80211 hooks for configuring beaconing
on an access point interface in mac80211. While doing so, it fixes
a number of races that could badly crash the machine when the
beacon is changed while being requested by the driver.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:09:38 -08:00
Johannes Berg
51fb61e76d mac80211: move interface type to vif structure
Drivers that support mixed AP/STA operation may well need to
know the type of a virtual interface when iterating over them.
The easiest way to support that is to move the interface type
variable into the vif structure.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:09:37 -08:00
Johannes Berg
32bfd35d4b mac80211: dont use interface indices in drivers
This patch gets rid of the if_id stuff where possible in favour of
a new per-virtual-interface structure "struct ieee80211_vif". This
structure is located at the end of the per-interface structure and
contains a variable length driver-use data area.

This has two advantages:
 * removes the need to look up interfaces by if_id, this is better
   for working with network namespaces and performance
 * allows drivers to store and retrieve per-interface data without
   having to allocate own lists/hash tables

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:09:36 -08:00
Al Viro
8524f59d47 ieee80211: beacon->capability is little-endian
It's only a debugging printk, so it went unnoticed; still, the
fix is trivial, so...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:08:48 -08:00
Al Viro
d9e94d5647 ieee80211: fix misannotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:08:48 -08:00
Al Viro
c414e84b22 ieee80211softmac_auth_resp() fix
The struct ieee8021_auth * passed to it comes straight from skb->data
without any conversions; members of the struct are little-endian, so
we'd better take that into account when doing switch by auth->algorithm,
etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:08:47 -08:00
Al Viro
b16f13d00c several missing cpu_to_le16() in ieee80211softmac_capabilities()
on some codepaths we forgot to convert to little-endian as we do on the
rest of them and as the caller expects from us.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:08:46 -08:00
Al Viro
8fffc15dc7 eliminate byteswapping in struct ieee80211_qos_parameters
Make it match the on-the-wire endianness, eliminate byteswapping.
The only driver that used this sucker (ipw2200) updated.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:08:45 -08:00
Jan Engelhardt
1e637c74b0 [IPV4]: Enable use of 240/4 address space.
This short patch modifies the IPv4 networking to enable use of the
240.0.0.0/4 (aka "class-E") address space as propsed in the internet
draft draft-fuller-240space-00.txt.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:44 -08:00
Jarek Poplawski
96750162b5 [NET] gen_estimator: gen_replace_estimator() cosmetic changes
White spaces etc. are changed in gen_replace_estimator() to make it
similar to others in a file.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:43 -08:00
Stephen Hemminger
72348a424f [PKT_SCHED] net: add sparse annotation to ptype_seq_start/stop
Get rid of some more sparse warnings.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:42 -08:00
Stephen Hemminger
aa767bfea4 [PKT_SCHED] net classifier: style cleanup's
Classifier code cleanup. Get rid of printk wrapper, and fix whitespace
and other style stuff reported by checkpatch

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:42 -08:00
Stephen Hemminger
786a90366f [PKT_SCHED] sch_atm: style cleanup
ATM scheduler clean house:
  * get rid of printk and qdisc_priv() wrapper
  * split some assignment in if() statements
  * whitespace and line breaks.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:41 -08:00
Stephen Hemminger
9d127fbdd2 [PKT_SCHED] dsmark: checkpatch warning cleanup
Get rid of all style things checkpatch warns about, indentation and
whitespace.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:40 -08:00
Stephen Hemminger
4c30719f4f [PKT_SCHED] dsmark: handle cloned and non-linear skb's
Make dsmark work properly with non-linear and cloned skb's
Before modifying the header, it needs to check that skb header is
writeable.

Note: this makes the assumption, that if it queues a good skb
then a good skb will come out of the embedded qdisc.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:40 -08:00
David S. Miller
5b0ac72bc5 [PKT_SCHED] dsmark: Use hweight32() instead of convoluted loop.
Based upon a patch by Stephen Hemminger and suggestions
from Patrick McHardy.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:39 -08:00
Stephen Hemminger
81da99ed71 [PKT_SCHED] dsmark: get rid of wrappers
Remove extraneous macro wrappers for printk and qdisc_priv.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:38 -08:00
Stephen Hemminger
d20b3109e9 [IPV6]: addrconf sparse warnings
Get rid of a couple of sparse warnings in IPV6 addrconf code.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:37 -08:00
Patrick McHardy
13a0a096e5 [NET_SCHED]: kill obsolete NET_CLS_POLICE option
The code is already gone for about half a year, the config option
has been kept around to select the replacement options for easier
upgrades. This seems long enough, people upgrading from older
kernels will have to reconfigure a lot anyway.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:37 -08:00
Pavel Emelyanov
91b4f95475 [VLAN]: Move protocol determination to seperate function
I think, that we can make this code flow easier to understand
by introducing the vlan_set_encap_proto() function (I hope the
name is good) to setup the skb proto and merge the paths calling
netif_rx() together.

[Patrick: Modified to apply on top of my previous patches]

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:35 -08:00
Patrick McHardy
31ffdbcb59 [VLAN]: Clean up vlan_skb_recv()
- remove three instances of identical code
- remove unnecessary NULL initialization
- remove obvious and unnecessary comments

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:35 -08:00
Patrick McHardy
ad712087f7 [VLAN]: Update list address
VLAN related mail should go to netdev.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:34 -08:00
Patrick McHardy
2029cc2c84 [VLAN]: checkpatch cleanups
Checkpatch cleanups, consisting mainly of overly long lines and
missing spaces.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:33 -08:00
Patrick McHardy
9dfebcc647 [VLAN]: Turn VLAN_DEV_INFO into inline function
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:32 -08:00
Patrick McHardy
af30151709 [VLAN]: Simplify vlan unregistration
Keep track of the number of VLAN devices in a vlan group. This allows
to have the caller sense when the group is going to be destroyed and
stop using it, which in turn allows to remove the wrapper around
unregister_vlan_dev for the NETDEV_UNREGISTER notifier and avoid
iterating over all possible VLAN ids whenever a device in unregistered.

Also fix what looks like a use-after-free (but is actually safe since
we're holding the RTNL), the real_dev reference should not be dropped
while we still use it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:31 -08:00
Patrick McHardy
acc5efbcd2 [VLAN]: Clean up unregister_vlan_dev
Save two levels of indentation by aborting on error conditions,
remove unnecessary initialization to NULL and remove two obvious
comments.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:30 -08:00
Patrick McHardy
69ab4b7d6d [VLAN]: Clean up initialization code
- move module init/exit functions to end of file, remove some now unnecessary
  forward declarations
- remove some obvious comments
- clean up proc init function and move a proc-related printk there

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:30 -08:00
Patrick McHardy
198a291ce3 [VLAN]: Remove non-implemented ioctls
The GET_VLAN_INGRESS_PRIORITY_CMD/GET_VLAN_EGRESS_PRIORITY_CMD ioctls are
not implemented and won't be, new functionality will be added to the netlink
interface. Remove the code and make the ioctl handler return -EOPNOTSUPP
for unknown commands instead of -EINVAL.

Also remove a comment about passing unknown commands to the underlying
device, that doesn't make any sense since its a VLAN specific ioctl and
if its not implemented here, its implemented nowhere.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:29 -08:00
Patrick McHardy
40f98e1af4 [VLAN]: Clean up debugging and printks
- use pr_* functions and common prefix for non-device related messages

- remove VLAN_ printk levels

- kill lots of useless debugging statements

- remove a few unnecessary printks like for double VID registration (already
  returns -EEXIST) and kill of a number of unnecessary checks in
  vlan_proc_{add,rem}_dev() that are already performed by the caller

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:28 -08:00
Patrick McHardy
62f99efce6 [VLAN]: Kill useless check
vlan->real_dev is always equal to the device since thats what we used
for the lookup. It doesn't even seem worth a WARN_ON or BUG_ON.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:27 -08:00
Patrick McHardy
ef3eb3e59b [VLAN]: Move device setup to vlan_dev.c
Move device setup to vlan_dev.c and make all the VLAN device methods
static.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:26 -08:00
Patrick McHardy
7bd38d778e [VLAN]: Use dev->stats
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:25 -08:00
Patrick McHardy
b7a4a83629 [VLAN]: Kill useless VLAN_NAME define
The only user already includes __FUNCTION__ (vlan_proto_init) in the
output, which is enough to identify what the message is about.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:25 -08:00
Patrick McHardy
891687649a [NET_SCHED]: sch_ingress: remove useless printk
The printk about ingress qdisc registration error can't be triggered
under normal circumstances. Since register_qdisc only fails for two
identical registrations, the only way to trigger it is by loading the
sch_ingress modules multiple times under different names, in which
case we already return -EEXIST to userspace.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:22 -08:00
Patrick McHardy
1389356735 [NET_SCHED]: sch_ingress: avoid a few #ifdefs
Move the repeating "ifndef CONFIG_NET_CLS_ACT/ifdef CONFIG_NETFILTER"
ifdefs into a single condition.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:22 -08:00
Patrick McHardy
645a1e39e4 [NET_SCHED]: sch_ingress: move dependencies to Kconfig
Instead of complaining at scheduler initialization time, check the
dependencies in Kconfig.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:21 -08:00
Patrick McHardy
c6ee877f2e [NET_SCHED]: sch_ingress: remove unnecessary ops
- ->reset is optional
- sch_api provides identical defaults for ->dequeue/->requeue
- ->drop can't happen since ingress never has a parent qdisc

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:20 -08:00
Patrick McHardy
e037834758 [NET_SCHED]: sch_ingress: return proper error code in ingress_graft()
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:20 -08:00
Patrick McHardy
c21d4d5dd2 [NET_SCHED]: sch_ingress: remove unused inner qdisc
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:19 -08:00
Patrick McHardy
cb53c04891 [NET_SCHED]: sch_ingress: remove qdisc_priv() wrapper
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:18 -08:00
Patrick McHardy
a47812211b [NET_SCHED]: sch_ingress: remove excessive debugging
Remove excessive debugging statements and some "future use" stuff.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:18 -08:00
Patrick McHardy
58f4df423e [NET_SCHED]: sch_ingress: formatting fixes
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:17 -08:00
Stephen Hemminger
6f9e98f7a9 [PKT_SCHED] SFQ: whitespace cleanup
Add whitespace around operators, and add a few blank lines to improve
readability.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:16 -08:00
Stephen Hemminger
d46f8dd87d [PKT_SCHED] SFQ: use net_random
SFQ doesn't need true random numbers, it is only using them to salt a
hash. Therefore it is better to use net_random() and avoid any
possible problems with depleting the entropy pool.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:15 -08:00
Stephen Hemminger
d3e994830d [PKT_SCHED] SFQ: timer is deferrable
The perturbation timer used for re-keying can be deferred, it doesn't
need to be deterministic.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:15 -08:00
Denis V. Lunev
51314a17ba [NETNS]: Process FIB rule action in the context of the namespace.
Save namespace context on the fib rule at the rule creation time and
call routing lookup in the correct namespace.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:14 -08:00
Denis V. Lunev
9e3a548781 [NETNS]: FIB rules API cleanup.
Remove struct net from fib_rules_register(unregister)/notify_change
paths and diet code size a bit.

add/remove: 0/0 grow/shrink: 10/12 up/down: 35/-100 (-65)
function                                     old     new   delta
notify_rule_change                           273     280      +7
trie_show_stats                              471     475      +4
fn_trie_delete                               473     477      +4
fib_rules_unregister                         144     148      +4
fib4_rule_compare                            119     123      +4
resize                                      2842    2845      +3
fn_trie_select_default                       515     518      +3
inet_sk_rebuild_header                       836     838      +2
fib_trie_seq_show                            764     766      +2
__devinet_sysctl_register                    276     278      +2
fn_trie_lookup                              1124    1123      -1
ip_fib_check_default                         133     131      -2
devinet_conf_sysctl                          223     221      -2
snmp_fold_field                              126     123      -3
fn_trie_insert                              2091    2086      -5
inet_create                                  876     870      -6
fib4_rules_init                              197     191      -6
fib_sync_down                                452     444      -8
inet_gso_send_check                          334     325      -9
fib_create_info                             3003    2991     -12
fib_nl_delrule                               568     553     -15
fib_nl_newrule                               883     852     -31

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:13 -08:00
Denis V. Lunev
0359238333 [FIB]: Add netns to fib_rules_ops.
The backward link from FIB rules operations to the network namespace
will allow to simplify the API a bit.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:13 -08:00
Vlad Yasevich
853f4b5055 [SCTP]: Correctly initialize error when parameter validation failed.
When parameter validation fails, there should be error causes that
specify what type of failure we've encountered.  If the causes are not
there, we lacked memory to allocated them.  Thus make that the default
value for the error.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:11 -08:00
Adrian Bunk
e9888f5498 [IrDA]: Irport removal - part 1
This patch removes IrPORT and the old dongle drivers (all off them
have replacement drivers).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:10 -08:00
Robie Basak
5d780cd658 [IrDA]: Frame length validation.
When using a stir4200-based USB adaptor to talk to a device that uses
an mcp2150, the stir4200 sometimes drops an incoming frame causing the
mcp2150 to try and retransmit the lost frame. In this combination, the
next frame received from the mcp2150 is often invalid - either an
empty i:rsp or an IrCOMM i:rsp with an invalid clen. These corner
cases are now checked.

Signed-off-by: Robie Basak <rb-oss-1@justgohome.co.uk>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:09 -08:00
Robie Basak
6d97b53e92 [IrDA]: Resend frames on timeout.
When final timer expires, it might also mean that the i:cmd wasn't
received properly. If we have rejected frames, we can try to resend them.

Signed-off-by: Robie Basak <rb-oss-1@justgohome.co.uk>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:08 -08:00
Denis V. Lunev
775516bfa2 [NETNS]: Namespace stop vs 'ip r l' race.
During network namespace stop process kernel side netlink sockets
belonging to a namespace should be closed. They should not prevent
namespace to stop, so they do not increment namespace usage
counter. Though this counter will be put during last sock_put.

The raplacement of the correct netns for init_ns solves the problem
only partial as socket to be stoped until proper stop is a valid
netlink kernel socket and can be looked up by the user processes. This
is not a problem until it resides in initial namespace (no processes
inside this net), but this is not true for init_net.

So, hold the referrence for a socket, remove it from lookup tables and
only after that change namespace and perform a last put.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:08 -08:00
Denis V. Lunev
b7c6ba6eb1 [NETNS]: Consolidate kernel netlink socket destruction.
Create a specific helper for netlink kernel socket disposal. This just
let the code look better and provides a ground for proper disposal
inside a namespace.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:07 -08:00
Denis V. Lunev
4f84d82f7a [NETNS]: Memory leak on network namespace stop.
Network namespace allocates 2 kernel netlink sockets, fibnl &
rtnl. These sockets should be disposed properly, i.e. by
sock_release. Plain sock_put is not enough.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:06 -08:00
Denis V. Lunev
869e58f870 [NETNS]: Double free in netlink_release.
Netlink protocol table is global for all namespaces. Some netlink
protocols have been virtualized, i.e. they have per/namespace netlink
socket. This difference can easily lead to double free if more than 1
namespace is started. Count the number of kernel netlink sockets to
track that this table is not used any more.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:05 -08:00
Daniel Lezcano
7d460db953 [IPV6]: Fix ip6_frag ctl
Alexey Dobriyan reported an oops when unsharing the network
indefinitely inside a loop. This is because the ip6_frag is not per
namespace while the ctls are.

That happens at the fragment timer expiration:
inet_frag_secret_rebuild function is called and this one restarts the
timer using the value stored inside the sysctl field.

        "mod_timer(&f->secret_timer, now + f->ctl->secret_interval);"

When the network is unshared, ip6_frag.ctl is initialized with the new
sysctl instances, but ip6_frag has only one instance. A race in this
case will appear because f->ctl can be modified during the read access
in the timer callback.

Until the ip6_frag is not per namespace, I discard the assignation to
the ctl field of ip6_frags in ip6_frag_sysctl_init when the network
namespace is not the init net.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:04 -08:00
Roel Kluin
f59d978275 wireless: fix '!x & y' typo's
Fix priority mistakes similar to '!x & y'

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-01-28 15:03:35 -08:00
Rami Rosen
191df5737e [BRIDGE]: Remove unused include of a header file in ebtables.c
In net/bridge/netfilter/ebtables.c,
- remove unused include of a header file (linux/tty.h) and remove the
  corresponding comment above it.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:48 -08:00
David S. Miller
66688ea7c8 [SCTP]: Fix build warning in sctp_sf_do_5_1C_ack().
Reported by Andrew Morton.

net/sctp/sm_statefuns.c: In function 'sctp_sf_do_5_1C_ack':
net/sctp/sm_statefuns.c:484: warning: 'error' may be used uninitialized in this function

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:48 -08:00
Daniel Lezcano
569d36452e [NETNS][DST] dst: pass the dst_ops as parameter to the gc functions
The garbage collection function receive the dst_ops structure as
parameter. This is useful for the next incoming patchset because it
will need the dst_ops (there will be several instances) and the
network namespace pointer (contained in the dst_ops).

The protocols which do not take care of the namespaces will not be
impacted by this change (expect for the function signature), they do
just ignore the parameter.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:46 -08:00
Eric Dumazet
a6501e080c [IPV4] FIB_HASH: Reduce memory needs and speedup lookups
Currently, sizeof(struct fib_alias) is 24 or 48 bytes on 32/64 bits
arches.

Because of SLAB_HWCACHE_ALIGN requirement, these are rounded to 32 and
64 bytes respectively.

This patch moves rcu to the end of fib_alias, and conditionally
defines it only for CONFIG_IP_FIB_TRIE.

We also remove SLAB_HWCACHE_ALIGN requirement for fib_alias and
fib_node objects because it is not necessary.

(BTW SLUB currently denies it for objects smaller than
cache_line_size() / 2, but not SLAB)

Finally, sizeof(fib_alias) go back to 16 and 32 bytes.

Then, we can embed one fib_alias on each fib_node, to favor locality.
Most of the time access to the fib_alias will be free because one
cache line contains both the list head (fn_alias) and (one of) the
list element.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:46 -08:00
Eric Dumazet
b59cfbf77d [FIB]: Fix rcu_dereference() abuses in fib_trie.c
node_parent() and tnode_get_child() currently use rcu_dereference().

These functions are called from both
- readers only paths (where rcu_dereference() is needed), and
- writer path (where rcu_dereference() is not needed)

To make explicit where rcu_dereference() is really needed, I
introduced new node_parent_rcu() and tnode_get_child_rcu() functions
which use rcu_dereference(), while node_parent() and tnode_get_child()
dont use it.

Then I changed calling sites where rcu_dereference() was really needed
to call the _rcu() variants.

This should have no impact but for alpha architecture, and may help
future sparse checks.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:45 -08:00
Eric Dumazet
95b7d924a5 [ROSE]: Supress sparse warnings
CHECK   net/rose/af_rose.c
net/rose/af_rose.c:125:11: warning: expensive signed divide
net/rose/af_rose.c:976:46: warning: expensive signed divide
net/rose/af_rose.c:1379:13: warning: context imbalance in 'rose_info_start' - wrong count at exit
net/rose/af_rose.c:1406:13: warning: context imbalance in 'rose_info_stop' - unexpected unlock
  CHECK   net/rose/rose_in.c
net/rose/rose_in.c:185:25: warning: expensive signed divide
  CHECK   net/rose/rose_route.c
net/rose/rose_route.c:997:46: warning: expensive signed divide
net/rose/rose_route.c:1070:13: warning: context imbalance in 'rose_node_start' - wrong count at exit
net/rose/rose_route.c:1093:13: warning: context imbalance in 'rose_node_stop' - unexpected unlock
net/rose/rose_route.c:1146:13: warning: context imbalance in 'rose_neigh_start' - wrong count at exit
net/rose/rose_route.c:1169:13: warning: context imbalance in 'rose_neigh_stop' - unexpected unlock
net/rose/rose_route.c:1229:13: warning: context imbalance in 'rose_route_start' - wrong count at exit
net/rose/rose_route.c:1252:13: warning: context imbalance in 'rose_route_stop' - unexpected unlock

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:44 -08:00
Eric Dumazet
5c17d5f112 [ATM]: Suppress some sparse warnings
CHECK   net/atm/br2684.c
net/atm/br2684.c:665:13: warning: context imbalance in 'br2684_seq_start' - wrong count at exit
net/atm/br2684.c:676:13: warning: context imbalance in 'br2684_seq_stop' - unexpected unlock
  CHECK   net/atm/lec.c
net/atm/lec.c:196:23: warning: expensive signed divide
  CHECK   net/atm/proc.c
net/atm/proc.c:151:14: warning: context imbalance in 'vcc_seq_start' - wrong count at exit
net/atm/proc.c:154:13: warning: context imbalance in 'vcc_seq_stop' - unexpected unlock

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:43 -08:00
Eric Dumazet
ca629f2472 [APPLETALK]: Annotations to clear sparse warnings
CHECK   net/appletalk/aarp.c
net/appletalk/aarp.c:951:14: warning: context imbalance in 'aarp_seq_start' - wrong count at exit
net/appletalk/aarp.c:977:13: warning: context imbalance in 'aarp_seq_stop' - unexpected unlock
  CHECK   net/appletalk/atalk_proc.c
net/appletalk/atalk_proc.c:34:11: warning: context imbalance in 'atalk_seq_interface_start' - wrong count at exit
net/appletalk/atalk_proc.c:54:13: warning: context imbalance in 'atalk_seq_interface_stop' - unexpected unlock
net/appletalk/atalk_proc.c:93:11: warning: context imbalance in 'atalk_seq_route_start' - wrong count at exit
net/appletalk/atalk_proc.c:113:13: warning: context imbalance in 'atalk_seq_route_stop' - unexpected unlock
net/appletalk/atalk_proc.c:161:11: warning: context imbalance in 'atalk_seq_socket_start' - wrong count at exit
net/appletalk/atalk_proc.c:178:13: warning: context imbalance in 'atalk_seq_socket_stop' - unexpected unlock

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:43 -08:00
Patrick McHardy
c71e916708 [NETFILTER]: nf_conntrack: make print_conntrack function optional for l4protos
Allows to remove five empty implementations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:42 -08:00
Patrick McHardy
c56cc9c07b [NETFILTER]: nf_conntrack: remove print_conntrack function from l3protos
Its unused and unlikely to ever be used.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:41 -08:00
Patrick McHardy
b334aadc3c [NETFILTER]: nf_conntrack: clean up a few header files
- Remove declarations of non-existing variables and functions
- Move helper init/cleanup function declarations to nf_conntrack_helper.h
- Remove unneeded __nf_conntrack_attach declaration and make it static

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:41 -08:00
Patrick McHardy
4f536522da [NETFILTER]: kill nf_sysctl.c
Since there now is generic support for shared sysctl paths, the only
remains are the net/netfilter and net/ipv4/netfilter paths. Move them
to net/netfilter/core.c and net/ipv4/netfilter.c and kill nf_sysctl.c.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:40 -08:00
Patrick McHardy
86c0bf4095 [NETFILTER]: nf_conntrack_sctp: remove timeout indirection
Instead of keeping pointers to the timeout values in a table, simply
put the timeout values in the table directly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:39 -08:00
Patrick McHardy
9b1c2cfd7a [NETFILTER]: nf_conntrack_sctp: replace magic value by symbolic constant
Use SCTP_CHUNK_FLAG_T instead of 0x1.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:39 -08:00
Patrick McHardy
4a64830af0 [NETFILTER]: nf_conntrack_sctp: don't take sctp_lock once per chunk
Don't take and release the lock once per SCTP chunk, simply hold it
the entire time while iterating through the chunks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:37 -08:00
Patrick McHardy
efe9f68afe [NETFILTER]: nf_conntrack_sctp: rename "newconntrack" variable
The name is misleading, it holds the new connection state, so rename it
to "newstate". Also rename "oldsctpstate" to "oldstate" for consistency.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:36 -08:00
Patrick McHardy
b37e933ac7 [NETFILTER]: nf_conntrack_sctp: consolidate sctp_packet() error paths
Consolidate error paths and use proper symbolic return value instead
of magic values.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:36 -08:00
Patrick McHardy
8528819adc [NETFILTER]: nf_conntrack_sctp: reduce line length further
Eliminate a few lines over 80 characters by using a local variable to
hold the conntrack direction instead of using CTINFO2DIR everywhere.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:35 -08:00
Patrick McHardy
112f35c9c1 [NETFILTER]: nf_conntrack_sctp: reduce line length
Reduce the length of some overly long lines by renaming all
"conntrack" variables to "ct".

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:34 -08:00
Patrick McHardy
35c6d3cbe1 [NETFILTER]: nf_conntrack_sctp: use proper types for bitops
Use unsigned long instead of char for the bitmap and removed lots
of casts.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:34 -08:00
Patrick McHardy
5447d4777c [NETFILTER]: nf_conntrack_sctp: basic cleanups
Reindent switch cases properly, get rid of weird constructs like "!(x == y)",
put logical operations on the end of the line instead of the next line, get
rid of superfluous braces.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:33 -08:00
Patrick McHardy
2d6462869f [NETFILTER]: nf_conntrack_tcp: remove timeout indirection
Instead of keeping pointers to the timeout values in a table, simply
put the timeout values in the table directly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:32 -08:00
Patrick McHardy
a5e73c29d9 [NETFILTER]: nf_conntrack_{tcp,sctp}: shrink state table
The TCP and SCTP conntrack state transition tables only holds
small numbers, but gcc uses 4 byte per entry for the enum. Switching
to an u8 reduces the size from 480 to 120 bytes for TCP and from
576 to 144 bytes for SCTP.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:32 -08:00
Patrick McHardy
77e2420b85 [NETFILTER]: nf_conntrack_{tcp,sctp}: mark state table const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:31 -08:00
Denys Vlasenko
9ba99b0d3f [NETFILTER]: ipt_REJECT: properly handle IP options
The current TCP RST construction reuses the old packet and can't
deal with IP options as a consequence of that. Construct the
RST from scratch instead.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:30 -08:00
Denys Vlasenko
022748a935 [NETFILTER]: {ip,ip6}_tables: remove some inlines
This patch removes inlines except those which are used
by packet matching code and thus are performance-critical.

Before:

$ size */*/*/ip*tables*.o
   text    data     bss     dec     hex filename
   6402     500      16    6918    1b06 net/ipv4/netfilter/ip_tables.o
   7130     500      16    7646    1dde net/ipv6/netfilter/ip6_tables.o

After:

$ size */*/*/ip*tables*.o
   text    data     bss     dec     hex filename
   6307     500      16    6823    1aa7 net/ipv4/netfilter/ip_tables.o
   7010     500      16    7526    1d66 net/ipv6/netfilter/ip6_tables.o

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:29 -08:00
Jan Engelhardt
1a50c5a1fe [NETFILTER]: xt_iprange match, revision 1
Adds IPv6 support to xt_iprange, making it possible to match on IPv6
address ranges with ip6tables.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:28 -08:00
Jan Engelhardt
f72e25a897 [NETFILTER]: Rename ipt_iprange to xt_iprange
This patch moves ipt_iprange to xt_iprange, in preparation for adding
IPv6 support to xt_iprange.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:27 -08:00
Jan Engelhardt
2ae15b64e6 [NETFILTER]: Update modules' descriptions
Updates the MODULE_DESCRIPTION() tags for all Netfilter modules,
actually describing what the module does and not just
"netfilter XYZ target".

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:26 -08:00
Jan Engelhardt
917b6fbd6e [NETFILTER]: xt_policy: use the new union nf_inet_addr
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:25 -08:00
Jan Engelhardt
57de0abbff [NETFILTER]: xt_pkttype: IPv6 multicast address recognition
Signed-off-by: Jan Engelhart <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:25 -08:00
Jan Engelhardt
13b0e83b5b [NETFILTER]: xt_pkttype: Add explicit check for IPv4
In the PACKET_LOOPBACK case, the skb data was always interpreted as
IPv4, but that is not valid for IPv6, obviously. Fix this by adding an
extra condition to check for AF_INET.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:24 -08:00
Jan Engelhardt
17b0d7ef65 [NETFILTER]: xt_mark match, revision 1
Introduces the xt_mark match revision 1. It uses fixed types,
eventually obsoleting revision 0 some day (uses nonfixed types).

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:23 -08:00
Jan Engelhardt
64eb12f997 [NETFILTER]: xt_conntrack match, revision 1
Introduces the xt_conntrack match revision 1. It uses fixed types, the
new nf_inet_addr and comes with IPv6 support, thereby completely
superseding xt_state.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:23 -08:00
Jan Engelhardt
96e3227265 [NETFILTER]: xt_connmark match, revision 1
Introduces the xt_connmark match revision 1. It uses fixed types,
eventually obsoleting revision 0 some day (uses nonfixed types).
(Unfixed types like "unsigned long" do not play well with mixed
user-/kernelspace "bitness", e.g. 32/64, as is common on SPARC64,
and need extra compat code.)

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:21 -08:00
Jan Engelhardt
e0a812aea5 [NETFILTER]: xt_MARK target, revision 2
Introduces the xt_MARK target revision 2. It uses fixed types, and
also uses the more expressive XOR logic.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:21 -08:00
Jan Engelhardt
0dc8c76029 [NETFILTER]: xt_CONNMARK target, revision 1
Introduces the xt_CONNMARK target revision 1. It uses fixed types, and
also uses the more expressive XOR logic. Futhermore, it allows to
selectively pick bits from both the ctmark and the nfmark in the SAVE
and RESTORE operations.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:20 -08:00
Jan Engelhardt
cdfe8b9797 [NETFILTER]: xt_TOS: Properly set the TOS field
Fix incorrect mask value passed to ipv4_change_dsfield/ipv6_change_dsfield.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:18 -08:00
Jan Engelhardt
9bb268ed7c [NETFILTER]: xt_TOS: Change semantic of mask value
This patch changes the behavior of xt_TOS v1 so that the mask value
the user supplies means "zero out these bits" rather than "keep these
bits". This is more easy on the user, as (I would assume) people keep
more bits than zeroing, so, an example:

	Action:     Set bit 0x01.
    	before (&): iptables -j TOS --set-tos 0x01/0xFE
    	after (&~): iptables -j TOS --set-tos 0x01/0x01

This is not too "tragic" with xt_TOS, but where larger fields are used
(e.g. proposed xt_MARK v2), `--set-xmar 0x01/0x01` vs. `--set-xmark
0x01/0xFFFFFFFE` really makes a difference. Other target(!) modules,
such as xt_TPROXY also use &~ rather than &, so let's get to a common
ground.

(Since xt_TOS has not yet left the development tree en direction to
mainline, the semantic can be changed as proposed without breaking
iptables.)

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:18 -08:00
Jan Engelhardt
11fa2aa362 [NETFILTER]: remove ipt_TOS.c
Commit 88c85d81f74f92371745158aebc5cbf490412002 forgot to remove the
old ipt_TOS file (whose code has been merged into xt_DSCP). Remove
it now.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:17 -08:00
Patrick McHardy
8ce22fcab4 [NETFILTER]: Remove some EXPERIMENTAL dependencies
Most of the netfilter modules are not considered experimental anymore,
the only ones I want to keep marked as EXPERIMENTAL are:

- TCPOPTSTRIP target, which is brand new.

- SANE helper, which is quite new.

- CLUSTERIP target, which I believe hasn't had much testing despite
  being in the kernel for quite a long time.

- SCTP match and conntrack protocol, which are a mess and need to
  be reviewed and cleaned up before I would trust them.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:16 -08:00
Patrick McHardy
b26e76b7ce [NETFILTER]: Hide a few more options under NETFILTER_ADVANCED
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:16 -08:00
Stephen Hemminger
7f9b80529b [IPV4]: fib hash|trie initialization
Initialization of the slab cache's should be done when IP is
initialized to make sure of available memory, and that code can be
marked __init.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:15 -08:00
Stephen Hemminger
d717a9a620 [IPV4] fib_trie: size and statistics
Show number of entries in trie, the size field was being set but never used,
but it only counted leaves, not all entries. Refactor the two cases in
fib_triestat_seq_show into a single routine.

Note: the stat structure was being malloc'd but the stack usage isn't so
high (288 bytes) that it is worth the additional complexity.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:14 -08:00
Eric Dumazet
28d36e3702 [FIB]: Avoid using static variables without proper locking
fib_trie_seq_show() uses two helper functions, rtn_scope() and
rtn_type() that can write to static storage without locking.

Just pass to them a temporary buffer to avoid potential corruption
(probably not triggerable but still...)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:13 -08:00
Denis V. Lunev
39a6d06300 [NETNS]: Process inet_confirm_addr in the correct namespace.
inet_confirm_addr can be called with NULL in_dev from arp_ignore iff
scope is RT_SCOPE_LINK.

Lets always pass the device and check for RT_SCOPE_LINK scope inside
inet_confirm_addr. This let us take network namespace from in_device a
need for an additional argument.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:13 -08:00
Denis V. Lunev
9bd85e3264 [IPV4]: Remove extra argument from arp_ignore.
arp_ignore has two arguments: dev & in_dev. dev is used for
inet_confirm_addr calling only.

inet_confirm_addr, in turn, either gets in_dev from the device passed
or iterates over all network devices if the device passed is NULL. It
seems logical to directly pass in_dev into inet_confirm_addr.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:12 -08:00
Denis V. Lunev
06f0511df1 [ARP]: neigh_parms_put(destroy) are essentially local to core/neighbour.c.
Make them static.

[ Moved the inline before, instead of after, call sites. -DaveM ]

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:11 -08:00
Denis V. Lunev
14db4133d5 [ARP]: Remove forward declaration of neigh_changeaddr.
No need for this. It is declared in the neighbour.h

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:11 -08:00
Denis V. Lunev
486b51d370 [ARP]: Remove overkill checks from neigh_param_alloc.
Valid network device is always passed into neigh_param_alloc, so
remove extra checking for dev == NULL. Additionally, cleanup bogus
netns assignment.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:10 -08:00
Denis V. Lunev
72132c1b6c [IPV4]: fib_rules_unregister is essentially void.
fib_rules_unregister is called only after successful register and the
return code is never checked.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:09 -08:00
Denis V. Lunev
2db82b534b [NETNS]: Make arp code network namespace consistent.
Some calls in the arp.c have network namespace as an argument. Getting
init_net inside these functions is simply inconsistent. Fix this.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:08 -08:00
Denis V. Lunev
a79878f00d [ARP]: Move inet_addr_type call after simple error checks in arp_contructor.
The neighbour entry will be destroyed in the case of error, so it is
pointless to perform constly routing table lookup in this case.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:08 -08:00
Pavel Emelyanov
a308da1627 [NETNS][RAW]: Create the /proc/net/raw(6) in each namespace.
To do so, just register the proper subsystem and create files in
->init callbacks.

No other special per-namespace handling for raw sockets is required.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:07 -08:00
Pavel Emelyanov
e5ba31f11f [NETNS][RAW]: Eliminate explicit init_net references.
Happily, in all the rest places (->bind callbacks only), that require the
struct net, we have a socket, so get the net from it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:06 -08:00
Pavel Emelyanov
f51d599fbe [NETNS][RAW]: Make /proc/net/raw(6) show per-namespace socket list.
Pull the struct net pointer up to the showing functions
to filter the sockets depending on their namespaces.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:06 -08:00
Pavel Emelyanov
be185884b3 [NETNS][RAW]: Make ipv[46] raw sockets lookup namespaces aware.
This requires just to pass the appropriate struct net pointer
into __raw_v[46]_lookup and skip sockets that do not belong
to a needed namespace.

The proper net is get from skb->dev in all the cases.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:05 -08:00
Eric Dumazet
8d96544475 [FIB]: full_children & empty_children should be uint, not ushort
If declared as unsigned short, these fields can overflow, and whole
trie logic is broken. I could not make the machine crash, but some
tnode can never be freed.

Note for 64 bit arches : By reordering t_key and parent in [node,
leaf, tnode] structures, we can use 32 bits hole after t_key so that
sizeof(struct tnode) doesnt change after this patch.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:04 -08:00
Eric Dumazet
f16f3026db [AX25]: sparse cleanups
net/ax25/ax25_route.c:251:13: warning: context imbalance in
'ax25_rt_seq_start' - wrong count at exit
net/ax25/ax25_route.c:276:13: warning: context imbalance in 'ax25_rt_seq_stop'
- unexpected unlock
net/ax25/ax25_std_timer.c:65:25: warning: expensive signed divide
net/ax25/ax25_uid.c:46:1: warning: symbol 'ax25_uid_list' was not declared.
Should it be static?
net/ax25/ax25_uid.c:146:13: warning: context imbalance in 'ax25_uid_seq_start'
- wrong count at exit
net/ax25/ax25_uid.c:169:13: warning: context imbalance in 'ax25_uid_seq_stop'
- unexpected unlock
net/ax25/af_ax25.c:573:28: warning: expensive signed divide
net/ax25/af_ax25.c:1865:13: warning: context imbalance in 'ax25_info_start' -
wrong count at exit
net/ax25/af_ax25.c:1888:13: warning: context imbalance in 'ax25_info_stop' -
unexpected unlock
net/ax25/ax25_ds_timer.c:133:25: warning: expensive signed divide

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:03 -08:00
Eric Dumazet
6bf1574ee3 [X25]: Avoid divides and sparse warnings
CHECK   net/x25/af_x25.c
net/x25/af_x25.c:117:46: warning: expensive signed divide
   CHECK   net/x25/x25_facilities.c
net/x25/x25_facilities.c:209:30: warning: expensive signed divide
   CHECK   net/x25/x25_in.c
net/x25/x25_in.c:250:26: warning: expensive signed divide
   CHECK   net/x25/x25_proc.c
net/x25/x25_proc.c:48:11: warning: context imbalance in 'x25_seq_route_start'
- wrong count at exit
net/x25/x25_proc.c:72:13: warning: context imbalance in 'x25_seq_route_stop' -
unexpected unlock
net/x25/x25_proc.c:112:11: warning: context imbalance in
'x25_seq_socket_start' - wrong count at exit
net/x25/x25_proc.c:129:13: warning: context imbalance in 'x25_seq_socket_stop'
- unexpected unlock
net/x25/x25_proc.c:190:11: warning: context imbalance in
'x25_seq_forward_start' - wrong count at exit
net/x25/x25_proc.c:215:13: warning: context imbalance in
'x25_seq_forward_stop' - unexpected unlock
   CHECK   net/x25/x25_subr.c
net/x25/x25_subr.c:362:57: warning: expensive signed divide

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:03 -08:00
Eric Dumazet
4dde4610c4 [IPV4] fib_trie: removes a memset() call in tnode_new()
tnode_alloc() already clears allocated memory, using kcalloc() or
alloc_pages(GFP_KERNEL|__GFP_ZERO, ...)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:02 -08:00
David S. Miller
88ebc72f68 [IPV4] FIB: Include nexthop device indexes in fib_info hashfn.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:01 -08:00
Ilpo Järvinen
d88c305a03 [PKT_SCHED] HTB: htb_classid is dead static inline
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:59 -08:00
Ilpo Järvinen
8519660b98 [NET] core/utils.c: digit2bin is dead static inline
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:58 -08:00
Eric Dumazet
112d8cfcbf [FIB]: Reduce text size of net/ipv4/fib_trie.o
In struct tnode, we use two fields of 5 bits for 'pos' and 'bits'.
Switching to plain 'unsigned char' (8 bits) take the same space
because of compiler alignments, and reduce text size by 435 bytes
on i386.

On i386 :
$ size net/ipv4/fib_trie.o.before_patch net/ipv4/fib_trie.o
    text    data     bss     dec     hex filename
   13714       4      64   13782    35d6 net/ipv4/fib_trie.o.before
   13279       4      64   13347    3423 net/ipv4/fib_trie.o

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:58 -08:00
Ilpo Järvinen
b9aed45507 [NETFILTER] xt_policy.c: kill some bloat
net/netfilter/xt_policy.c:
  policy_mt | -906
 1 function changed, 906 bytes removed, diff: -906

net/netfilter/xt_policy.c:
  match_xfrm_state | +427
 1 function changed, 427 bytes added, diff: +427

net/netfilter/xt_policy.o:
 2 functions changed, 427 bytes added, 906 bytes removed, diff: -479

Alternatively, this could be done by combining identical
parts of the match_policy_in/out()

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:57 -08:00
Stephen Hemminger
c95aaf9af5 [IPV4] fib_trie: Fix sparse warnings.
Make FIB TRIE go through sparse checker without warnings.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:56 -08:00
Stephen Hemminger
66a2f7fd2f [IPV4] fib_trie: Add statistics.
The FIB TRIE code has a bunch of statistics, but the code is hidden
behind an ifdef that was never implemented. Since it was dead code, it
was broken as well.

This patch fixes that by making it a config option.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:56 -08:00
Stephen Hemminger
a6db901092 [IPV4] FIB: printk related cleanups
printk related cleanups:
 * Get rid of unused printk wrappers.
 * Make bug checks into KERN_WARNING because KERN_DEBUG gets ignored
 * Turn one cryptic old message into something real
 * Make sure all messages have KERN_XXX

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:55 -08:00
Stephen Hemminger
fea86ad812 [IPV4] fib_trie: fib_insert_node cleanup
The only error from fib_insert_node is if memory allocation fails, so
instead of passing by reference, just use the convention of returning
NULL.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:54 -08:00
Stephen Hemminger
187b5188a7 [IPV4] fib_trie: Use %u for unsigned printfs.
Use %u instead of %d when printing unsigned values.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:53 -08:00
Stephen Hemminger
93e4308b3b [IPV4] fib_trie: Get rid of unused revision element.
The revision element must of been part of an earlier design, because
currently it is set but never used.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:53 -08:00
Stephen Hemminger
c28a1cf448 [IPV4] fib_trie: Get rid of trie_init().
trie_init is worthless it is just zeroing stuff that is already zero!
Move the memset() down to make it obvious.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:52 -08:00
Ilpo Järvinen
6db105db95 [PKTGEN]: uninline getCurUs
net/core/pktgen.c:
  pktgen_stop_device   |  -50
  pktgen_run           | -105
  pktgen_if_show       |  -37
  pktgen_thread_worker | -702
 4 functions changed, 894 bytes removed, diff: -894

net/core/pktgen.c:
  getCurUs |  +36
 1 function changed, 36 bytes added, diff: +36

net/core/pktgen.o:
 5 functions changed, 36 bytes added, 894 bytes removed, diff: -858

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:51 -08:00
Ilpo Järvinen
56e252c748 [PKTGEN]: Kill dead static inlines
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:51 -08:00
Ilpo Järvinen
3f25252675 [NETLINK] af_netlink: kill some bloat
net/netlink/af_netlink.c:
  netlink_realloc_groups        |  -46
  netlink_insert                |  -49
  netlink_autobind              |  -94
  netlink_clear_multicast_users |  -48
  netlink_bind                  |  -55
  netlink_setsockopt            |  -54
  netlink_release               |  -86
  netlink_kernel_create         |  -47
  netlink_change_ngroups        |  -56
 9 functions changed, 535 bytes removed, diff: -535

net/netlink/af_netlink.c:
  netlink_table_ungrab |  +53
 1 function changed, 53 bytes added, diff: +53

net/netlink/af_netlink.o:
 10 functions changed, 53 bytes added, 535 bytes removed, diff: -482

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:50 -08:00
Ilpo Järvinen
50eb431d6e [IPV6] route: kill some bloat
net/ipv6/route.c:
  ip6_pkt_prohibit_out | -130
  ip6_pkt_discard      | -261
  ip6_pkt_discard_out  | -130
  ip6_pkt_prohibit     | -261
 4 functions changed, 782 bytes removed, diff: -782

net/ipv6/route.c:
  ip6_pkt_drop | +300
 1 function changed, 300 bytes added, diff: +300

net/ipv6/route.o:
 5 functions changed, 300 bytes added, 782 bytes removed, diff: -482

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:49 -08:00
Ilpo Järvinen
1486cbd777 [XFRM] xfrm_policy: kill some bloat
net/xfrm/xfrm_policy.c:
  xfrm_audit_policy_delete | -692
  xfrm_audit_policy_add    | -692
 2 functions changed, 1384 bytes removed, diff: -1384

net/xfrm/xfrm_policy.c:
  xfrm_audit_common_policyinfo | +704
 1 function changed, 704 bytes added, diff: +704

net/xfrm/xfrm_policy.o:
 3 functions changed, 704 bytes added, 1384 bytes removed, diff: -680

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:49 -08:00
Ilpo Järvinen
cea14e0ed6 [TCP]: Uninline tcp_is_cwnd_limited
net/ipv4/tcp_cong.c:
  tcp_reno_cong_avoid |  -65
 1 function changed, 65 bytes removed, diff: -65

net/ipv4/arp.c:
  arp_ignore |   -5
 1 function changed, 5 bytes removed, diff: -5

net/ipv4/tcp_bic.c:
  bictcp_cong_avoid |  -57
 1 function changed, 57 bytes removed, diff: -57

net/ipv4/tcp_cubic.c:
  bictcp_cong_avoid |  -61
 1 function changed, 61 bytes removed, diff: -61

net/ipv4/tcp_highspeed.c:
  hstcp_cong_avoid |  -63
 1 function changed, 63 bytes removed, diff: -63

net/ipv4/tcp_hybla.c:
  hybla_cong_avoid |  -85
 1 function changed, 85 bytes removed, diff: -85

net/ipv4/tcp_htcp.c:
  htcp_cong_avoid |  -57
 1 function changed, 57 bytes removed, diff: -57

net/ipv4/tcp_veno.c:
  tcp_veno_cong_avoid |  -52
 1 function changed, 52 bytes removed, diff: -52

net/ipv4/tcp_scalable.c:
  tcp_scalable_cong_avoid |  -61
 1 function changed, 61 bytes removed, diff: -61

net/ipv4/tcp_yeah.c:
  tcp_yeah_cong_avoid |  -75
 1 function changed, 75 bytes removed, diff: -75

net/ipv4/tcp_illinois.c:
  tcp_illinois_cong_avoid |  -54
 1 function changed, 54 bytes removed, diff: -54

net/dccp/ccids/ccid3.c:
  ccid3_update_send_interval |   -7
  ccid3_hc_tx_packet_recv    |   +7
 2 functions changed, 7 bytes added, 7 bytes removed, diff: +0

net/ipv4/tcp_cong.c:
  tcp_is_cwnd_limited |  +88
 1 function changed, 88 bytes added, diff: +88

built-in.o:
 14 functions changed, 95 bytes added, 642 bytes removed, diff: -547

...Again some gcc artifacts visible as well.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:48 -08:00
Ilpo Järvinen
490d504693 [TCP]: Uninline tcp_set_state
net/ipv4/tcp.c:
  tcp_close_state | -226
  tcp_done        | -145
  tcp_close       | -564
  tcp_disconnect  | -141
 4 functions changed, 1076 bytes removed, diff: -1076

net/ipv4/tcp_input.c:
  tcp_fin               |  -86
  tcp_rcv_state_process | -164
 2 functions changed, 250 bytes removed, diff: -250

net/ipv4/tcp_ipv4.c:
  tcp_v4_connect | -209
 1 function changed, 209 bytes removed, diff: -209

net/ipv4/arp.c:
  arp_ignore |   +5
 1 function changed, 5 bytes added, diff: +5

net/ipv6/tcp_ipv6.c:
  tcp_v6_connect | -158
 1 function changed, 158 bytes removed, diff: -158

net/sunrpc/xprtsock.c:
  xs_sendpages |   -2
 1 function changed, 2 bytes removed, diff: -2

net/dccp/ccids/ccid3.c:
  ccid3_update_send_interval |   +7
 1 function changed, 7 bytes added, diff: +7

net/ipv4/tcp.c:
  tcp_set_state | +238
 1 function changed, 238 bytes added, diff: +238

built-in.o:
 12 functions changed, 250 bytes added, 1695 bytes removed, diff: -1445

I've no explanation why some unrelated changes seem to occur
consistently as well (arp_ignore, ccid3_update_send_interval;
I checked the arp_ignore asm and it seems to be due to some
reordered of operation order causing some extra opcodes to be
generated). Still, the benefits are pretty obvious from the
codiff's results.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:47 -08:00
Daniel Lezcano
389f661224 [NETNS][IPV6]: inet6_addr - make ipv6_chk_home_addr namespace aware
Looks if the address is belonging to the network namespace, otherwise
discard the address for the check.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:46 -08:00
Daniel Lezcano
1cab3da6be [NETNS][IPV6]: inet6_addr - ipv6_get_ifaddr namespace aware
The inet6_addr_lst is browsed taking into account the network
namespace specified as parameter. If an address does not belong
to the specified namespace, it is ignored.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:45 -08:00
Daniel Lezcano
06bfe655e7 [NETNS][IPV6]: inet6_addr - ipv6_chk_same_addr namespace aware
This patch makes ipv6_chk_same_addr function to be aware of the
network namespace. The addresses not belonging to the network
namespace are discarded.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:45 -08:00
Daniel Lezcano
bfeade0870 [NETNS][IPV6]: inet6_addr - check ipv6 address per namespace
When a new address is added, we must check if the new address does not
already exists.  This patch makes this check to be aware of a network
namespace, so the check will look if the address already exists for
the specified network namespace. While the addresses are browsed, the
addresses which do not belong to the namespace are discarded.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:44 -08:00
Daniel Lezcano
3c40090a0f [NETNS][IPV6]: inet6_addr - isolate inet6 addresses from proc file
Make /proc/net/if_inet6 show only inet6 addresses belonging to the
namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:43 -08:00
David S. Miller
9993e7d313 [TCP]: Do not purge sk_forward_alloc entirely in tcp_delack_timer().
Otherwise we beat heavily on the global tcp_memory atomics
when all of the sockets in the system are slowly sending
perioding packet clumps.

Noticed and suggested by Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:42 -08:00
Pavel Emelyanov
e186932b3d [NETNS]: Use the per-net ipv6_devconf(_all) in sysctl handlers
Actually the net->ipv6.devconf_all can be used in a few places,
but to keep the /proc/sys/net/ipv6/conf/ sysctls work consistently
in the namespace we should use the per-net devconf_all in the
sysctl "forwarding" handler.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:41 -08:00
Pavel Emelyanov
441fc2a239 [NETNS]: Use the per-net ipv6_devconf_dflt
All its users are in net/ipv6/addrconf.c's sysctl handlers.
Since they already have the struct net to get from, the
per-net ipv6_devconf_dflt can already be used.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:40 -08:00
Pavel Emelyanov
e0da5a480c [NETNS]: Create ipv6 devconf-s for namespaces
This is the core. Declare and register the pernet subsys for
addrconf. The init callback the will create the devconf-s.

The init_net will reuse the existing statically declared confs,
so that accessing them from inside the ipv6 code will still
work.

The register_pernet_subsys() is moved above the ipv6_add_dev()
call for loopback, because this function will need the
net->devconf_dflt pointer to be already set.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:40 -08:00
Pavel Emelyanov
bff16c2f99 [NETNS]: Make the ctl-tables per-namespace
This includes passing the net to __addrconf_sysctl_register
and saving this on the ctl_table->extra2 to be used in
handlers (those, needing it).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:39 -08:00
Pavel Emelyanov
9589731220 [NETNS]: Make the __addrconf_sysctl_register return an error
This error code will be needed to abort the namespace
creation if needed.

Probably, this is to be checked when a new device is
created (currently it is ignored).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:38 -08:00
Pavel Emelyanov
408c4768cd [NETNS]: Clean out the ipv6-related sysctls creation/destruction
The addrconf sysctls and neigh sysctls are registered and
unregistered always in pairs, so they can be joined into
one (well, two) functions, that accept the struct inet6_dev
and do all the job.

This also get rids of unneeded ifdefs inside the code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:38 -08:00
Denis V. Lunev
4250846146 [NEIGH]: Make /proc/net/arp opening consistent with seq_net_open semantics
seq_open_net requires that first field of the seq->private data to be
struct seq_net_private. In reality this is a single pointer to a
struct net for now. The patch makes code consistent.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:37 -08:00
Denis V. Lunev
ae22120ad8 [ATM]: Simplify /proc/net/atm/arp opening
The iterator state->ns.neigh_sub_iter initialization is moved from
arp_seq_open to clip_seq_start for convinience. This should not be a
problem as the iterator will be used only after the seq_start
callback.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:36 -08:00
Denis V. Lunev
e5d69b9f4a [ATM]: Oops reading net/atm/arp
cat /proc/net/atm/arp causes the NULL pointer dereference in the
get_proc_net+0xc/0x3a. This happens as proc_get_net believes that the
parent proc dir entry contains struct net.

Fix this assumption for "net/atm" case.

The problem is introduced by the commit c0097b07abf5f92ab135d024dd41bd2aada1512f
from Eric W. Biederman/Daniel Lezcano.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:36 -08:00
Denis V. Lunev
8cced9eff1 [NETNS]: Enable routing configuration in non-initial namespace.
I.e. remove the net != &init_net checks from the places, that now can
handle other-than-init net namespace.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:35 -08:00
Denis V. Lunev
226b0b4a51 [NETNS]: Replace init_net with the correct context in fib_frontend.c
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:34 -08:00
Denis V. Lunev
1bad118a33 [NETNS]: Pass namespace through ip_rt_ioctl.
... up to rtentry_to_fib_config

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:34 -08:00
Denis V. Lunev
4b5d47d4d3 [NETNS]: Correctly fill fib_config data.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:33 -08:00
Denis V. Lunev
6bd48fcf73 [NETNS]: Provide correct namespace for fibnl netlink socket.
This patch makes the netlink socket to be per namespace. That allows
to have each namespace its own socket for routing queries.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:32 -08:00
Denis V. Lunev
e4aef8aea3 [NETNS]: Place fib tables into netns.
The preparatory work has been done. All we need is to substitute
fib_table_hash with net->ipv4.fib_table_hash. Netns context is
available when required.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:31 -08:00
Denis V. Lunev
e4e4971c5f [NETNS]: Namespacing IPv4 fib rules.
The final trick for rules: place fib4_rules_ops into struct net and
modify initialization path for this.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:31 -08:00
Denis V. Lunev
1c340b2fd7 [NETNS]: Show routing information from correct namespace (fib_trie.c)
This is the second part (for the CONFIG_IP_FIB_TRIE case) of the patch
#4, where we have created proc files in namespaces.

Now we can dump correct info in them.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:30 -08:00
Denis V. Lunev
6e04d01dfa [NETNS]: Show routing information from correct namespace (fib_hash.c)
This is the second part (for the CONFIG_IP_FIB_HASH case) of the patch
#4, where we have created proc files in namespaces.

Now we can dump correct info in them.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:29 -08:00
Denis V. Lunev
4d1169c1e7 [NETNS]: Add netns to nl_info structure.
nl_info is used to track the end-user destination of routing change
notification. This is a natural object to hold a namespace on. Place
it there and utilize the context in the appropriate places.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:29 -08:00
Eric W. Biederman
6b175b26c1 [NETNS]: Add netns parameter to inet_(dev_)add_type.
The patch extends the inet_addr_type and inet_dev_addr_type with the
network namespace pointer. That allows to access the different tables
relatively to the network namespace.

The modification of the signature function is reported in all the
callers of the inet_addr_type using the pointer to the well known
init_net.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:27 -08:00
Denis V. Lunev
8ad4942cd5 [NETNS]: Add netns parameter to fib_get_table/fib_new_table.
This patch extends the fib_get_table and the fib_new_table functions
with the network namespace pointer. That will allow to access the
table relatively from the network namespace.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:27 -08:00
Denis V. Lunev
93456b6d77 [IPV4]: Unify access to the routing tables.
Replace the direct pointers to local and main tables with
calls to fib_get_table() with appropriate argument.

This doesn't introduce additional dereferences, but makes the access to fib
tables uniform in any (CONFIG_IP_MULTIPLE_TABLES) case.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:26 -08:00
Denis V. Lunev
7b1a74fdbb [NETNS]: Refactor fib initialization so it can handle multiple namespaces.
This patch makes the fib to be initialized as a subsystem for the
network namespaces. The code does not handle several namespaces yet,
so in case of a creation of a network namespace, the
creation/initialization will not occur.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:25 -08:00
Denis V. Lunev
dbb50165b5 [IPV4]: Check fib4_rules_init failure.
This adds error paths into both versions of fib4_rules_init
(with/without CONFIG_IP_MULTIPLE_TABLES) and returns error code to the
caller.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:25 -08:00
Denis V. Lunev
61a0265344 [NETNS]: Add namespace to API for routing /proc entries creation.
This adds netns parameter to fib_proc_init/exit and replaces __init
specifier with __net_init. After this, we will not yet have these proc
files show info from the specific namespace - this will be done when
these tables become namespaced.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:24 -08:00
Denis V. Lunev
5fd30ee7c4 [NETNS]: Namespacing in the generic fib rules code.
Move static rules_ops & rules_mod_lock to the struct net, register the
pernet subsys to init them and enjoy the fact that the core rules
infrastructure works in the namespace.

Real IPv4 fib rules virtualization requires fib tables support in the
namespace and will be done seriously later in the patchset.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:23 -08:00
Denis V. Lunev
868d13ac81 [NETNS]: Pass fib_rules_ops into default_pref method.
fib_rules_ops contains operations and the list of configured rules. ops will
become per/namespace soon, so we need them to be known in the default_pref
callback.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:22 -08:00
Denis V. Lunev
f8c26b8d58 [NETNS]: Add netns parameter to fib_rules_(un)register.
The patch extends the different fib rules API in order to pass the
network namespace pointer. That will allow to access the different
tables from a namespace relative object. As usual, the pointer to the
init_net variable is passed as parameter so we don't break the
network.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:21 -08:00
Daniel Lezcano
41a76906b3 [NETNS][IPV6]: Make icmpv6_time sysctl per namespace.
This patch moves the icmpv6_time sysctl to the network namespace
structure.

Because the ipv6 protocol is not yet per namespace, the variable is
accessed relatively to the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:20 -08:00
Daniel Lezcano
4990509f19 [NETNS][IPV6]: Make sysctls route per namespace.
All the sysctl concerning the routes are moved to the network
namespace structure. A helper function is called to initialize the
variables.

Because the ipv6 protocol is not yet per namespace, the variables are
accessed relatively from the network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:20 -08:00
Daniel Lezcano
7c76509d0d [NETNS][IPV6]: Make mld_max_msf readonly in other namespaces.
The mld_max_msf protects the system with a maximum allowed multicast
source filters. Making this variable per namespace can be potentially
an problem if someone inside a namespace set it to a big value, that
will impact the whole system including other namespaces.

I don't see any benefits to have it per namespace for now, so in order
to keep a directory entry in a newly created namespace, I make it
read-only when we are not in the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:19 -08:00
Daniel Lezcano
e71e0349eb [NETNS][IPV6]: Make ip6_frags per namespace.
The ip6_frags is moved to the network namespace structure.  Because
there can be multiple instances of the network namespaces, and the
ip6_frags is no longer a global static variable, a helper function has
been added to facilitate the initialization of the variables.

Until the ipv6 protocol is not per namespace, the variables are
accessed relatively from the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:18 -08:00
Daniel Lezcano
99bc9c4e45 [NETNS][IPV6]: Make bindv6only sysctl per namespace.
This patch moves the bindv6only sysctl to the network namespace
structure. Until the ipv6 protocol is not per namespace, the sysctl
variable is always from the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:18 -08:00
Daniel Lezcano
760f2d0186 [NETNS][IPV6]: Make multiple instance of sysctl tables.
Each network namespace wants its own set of sysctl value, eg. we
should not be able from a namespace to set a sysctl value for another
namespace , especially for the initial network namespace.

This patch duplicates the sysctl table when we register a new network
namespace for ipv6. The duplicated table are postfixed with the
"template" word to notify the developper the table is cloned.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:17 -08:00
Daniel Lezcano
89918fc270 [NETNS][IPV6]: Make the ipv6 sysctl to be a netns subsystem.
The initialization of the sysctl for the ipv6 protocol is changed to a
network namespace subsystem. That means when a new network namespace
is created the initialization function for the sysctl will be called.

That do not change the behavior of the sysctl in case of the kernel
with the network namespace disabled.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:16 -08:00
Daniel Lezcano
81c1c17804 [NETNS][IPV6]: Make a subsystem for af_inet6.
This patch add a network namespace subsystem for the af_inet6 module.
It does nothing right now, but one of its purpose is to receive the
different variables for sysctl in order to initialize them.

When the sysctl variable will be moved to the network namespace
structure, they will be no longer initialized as global static
variables, so we must find a place to initialize them. Because the
sysctl can be disabled, it has no sense to store them in the
sysctl_net_ipv6 file.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:15 -08:00
Daniel Lezcano
291480c09a [NETNS][IPV6]: Make ipv6_sysctl_register to return a value.
This patch makes the function ipv6_sysctl_register to return a
value. The af_inet6 init function is now able to handle an error and
catch it from the initialization of the sysctl.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:14 -08:00
Sebastian Siewior
50dd79653e [XFRM]: Remove ifdef crypto.
and select the crypto subsystem if neccessary

Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:12 -08:00
Rami Rosen
06eaa1a01d [BRIDGE]: Remove unused macros from ebt_vlan.c
Remove two unused macros, INV_FLAG and SET_BITMASK
from net/bridge/netfilter/ebt_vlan.c.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:11 -08:00
Pavel Emelyanov
b3fd3ffe39 [NETFILTER]: Use the ctl paths instead of hand-made analogue
The conntracks subsystem has a similar infrastructure
to maintain ctl_paths, but since we already have it
on the generic level, I think it's OK to switch to
using it.

So, basically, this patch just replaces the ctl_table-s
with ctl_path-s, nf_register_sysctl_table with
register_sysctl_paths() and removes no longer needed code.

After this the net/netfilter/nf_sysctl.c file contains
the paths only.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:11 -08:00
Pavel Emelyanov
3d7cc2ba62 [NETFILTER]: Switch to using ctl_paths in nf_queue and conntrack modules
This includes the most simple cases for netfilter.

The first part is tne queue modules for ipv4 and ipv6,
on which the net/ipv4/ and net/ipv6/ paths are reused
from the appropriate ipv4 and ipv6 code.

The conntrack module is also patched, but this hunk is
very small and simple.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:10 -08:00
Pavel Emelyanov
c6995bdff0 [AX25]: Switch to using ctl_paths.
This one is almost the same as the hunks in the
first patch, but ax25 tables are created dynamically.

So this patch differs a bit to handle this case.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:09 -08:00
Pavel Emelyanov
3151a9ab04 [DECNET]: Switch to using ctl_paths.
The decnet includes two places to patch. The first one is
the net/decnet table itself, and it is patched just like
other subsystems in the first patch in this series.

The second place is a bit more complex - it is the
net/decnet/conf/xxx entries,. similar to those in
ipv4/devinet.c and ipv6/addrconf.c. This code is made similar
to those in ipv[46].

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:09 -08:00
Pavel Emelyanov
90754f8ec0 [IPVS]: Switch to using ctl_paths.
The feature of ipvs ctls is that the net/ipv4/vs path
is common for core ipvs ctls and for two schedulers,
so I make it exported and re-use it in modules.

Two other .c files required linux/sysctl.h to make the
extern declaration of this path compile well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:08 -08:00
Pavel Emelyanov
b5ccd792fa [NET]: Simple ctl_table to ctl_path conversions.
This patch includes many places, that only required
replacing the ctl_table-s with appropriate ctl_paths
and call register_sysctl_paths().

Nothing special was done with them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:07 -08:00
Rami Rosen
cb7928a528 [IPV4]: Remove unsupported DNAT (RTCF_NAT and RTCF_NAT) in IPV4
- The DNAT (Destination NAT) is not implemented in IPV4.

- This patch remove the code which checks these flags
in net/ipv4/arp.c and net/ipv4/route.c.

The RTCF_NAT and RTCF_NAT should stay in the header (linux/in_route.h)
because they are used in DECnet.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:07 -08:00
Julia Lawall
4cec72c890 [TIPC]: Use tipc_port_unlock
The file net/tipc/port.c takes a lock using the function tipc_port_lock and
then releases the lock sometimes using tipc_port_unlock and sometimes using
spin_unlock_bh(p_ptr->publ.lock).  tipc_port_unlock simply does the
spin_unlock_bh, but it seems cleaner to use it everywhere.

The problem was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
struct port *p_ptr;
@@

   p_ptr = tipc_port_lock(...)
   ...
(
   p_ptr = tipc_port_lock(...);
|
?- spin_unlock_bh(p_ptr->publ.lock);
+  tipc_port_unlock(p_ptr);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Jon Paul Maloy <maloy@donjonn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:05 -08:00
Ivo van Doorn
cdcb006fbe mac80211: Add radio led trigger
Some devices have a seperate LED which indicates if the radio is
enabled or not. This adds a LED trigger to mac80211 where drivers
can hook into when they are interested in radio status changes.

v2: Check hw.conf.radio_enabled when calling start().

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:04 -08:00
Andrew Lutomirski
df26e7ea04 rc80211_pid should respect fixed rates.
I would argue that mac80211 should handle fixed rates outside the rate
control code, which would also allow them to take effect immediately
instead of during the rate control callback, but this is pretty close
to correct.

Signed-Off-By: Andy Lutomirski <luto@myrealbox.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:04 -08:00
Johannes Berg
4b475898ec mac80211: better rate control algorithm selection
This patch changes mac80211's Kconfig/Makefile to:
 * select between the PID and the SIMPLE rate control
   algorithm as default
 * always allow tri-state for the rate control algorithms,
   building those that are selected 'y' into the mac80211
   module (if that is a module, otherwise all into the kernel)
 * force the default rate control algorithm to be built into
   mac80211

It also makes both rate control algorithms proper modules again
with MODULE_LICENSE etc.

Only if EMBEDDED is the user allowed to select "NONE" as default
which will cause no algorithm to be selected, this will work
only when the driver brings one itself (e.g. iwlwifi drivers).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:02 -08:00
Ron Rindjunsky
688b88a488 mac80211: A-MPDU Rx handling DELBA requests
This patch opens the flow to DELBA management frames, and handles end
of A-MPDU session produced by this event.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:02 -08:00
Ron Rindjunsky
713647169e mac80211: A-MPDU Rx adding BAR handling capability
This patch adds the ability to handle Block Ack Request

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:01 -08:00
Ron Rindjunsky
b580781e03 mac80211: A-MPDU Rx handling aggregation reordering
This patch handles the reordering of the Rx A-MPDU.
This issue occurs when the sequence of the internal MPDUs is not in the
right order. such a case can be encountered for example when some MPDUs from
previous aggregations were recieved, while others failed, so current A-MPDU
will contain a mix of re-transmited MPDUs and new ones.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:00 -08:00
Ron Rindjunsky
16c5f15c73 mac80211: A-MPDU Rx MLME data initialization
This patch initialize A-MPDU MLME data for Rx sessions.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:00 -08:00
Ron Rindjunsky
07db218396 mac80211: A-MPDU Rx adding basic functionality
This patch adds the basic needed abilities and functions for A-MPDU Rx session
changed functions:
 - ieee80211_sta_process_addba_request - Rx A-MPDU initialization enabled
 - ieee80211_stop - stops all A-MPDU Rx in case interface goes down
added functions:
 - ieee80211_send_delba - used for sending out Del BA in A-MPDU sessions
 - ieee80211_sta_stop_rx_BA_session - stopping Rx A-MPDU session
 - sta_rx_agg_session_timer_expired - stops A-MPDU Rx use if load is too
low

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:59 -08:00
Ron Rindjunsky
5aae288061 mac80211: A-MPDU Rx add MLME structures
This patch adds the needed structures to describe the Rx aggregation MLME per STA
new:
 - struct tid_ampdu_rx: TID aggregation information (Rx)
 - struct sta_ampdu_mlme: MLME aggregation information for STA
changed:
 - struct sta_info: ampdu_mlme added to describe A-MPDU MLME per STA,
		    and timer_to_tid added to map timer id into TID

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:58 -08:00
Ron Rindjunsky
6368e4b18d mac80211: restructure __ieee80211_rx
This patch makes a separation between Rx frame pre-handling which stays in
__ieee80211_rx and Rx frame handlers, moving to __ieee80211_rx_handle_packet.
Although this separation has no affect in regular mode of operation, this kind
of mechanism will be used in A-MPDU frames reordering as it allows accumulation
of frames during pre-handling, dispatching them to later handling when necessary.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:57 -08:00
Johannes Berg
f704662fb7 mac80211: make rc_pid_fop_events static
No need to not be.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:56 -08:00
Stefano Brivio
1b507e7e53 rc80211-pid: fix definition of rate control interval
Fix the rate control interval definition. Thanks to Mattias Nissler for
spotting this out.

Cc: Mattias Nissler <mattias.nissler@gmx.de>
Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:55 -08:00
Johannes Berg
6d65f5db2f mac80211: remove misleading 'res' variable
When this function returns != CONTINUE, it needs to put the
station struct it has acquired. Hence, having this unused
variable is not just superfluous but also misleading.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:55 -08:00
Stefano Brivio
d439810bda rc80211-pid: pf_target tuning
Set a better value for percentage target for failed frames. The previous value
slowed down too much rate increases in case of permanently low activity. While
at it, increase readability.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:53 -08:00
Stefano Brivio
13e05aa631 rc80211-pid: fix sta_info refcounting
Fix a bug which caused uncorrect refcounting of PHYs in mac80211. Thanks to
Johannes Berg for spotting this out.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:53 -08:00
Stefano Brivio
fa44327c06 rc80211-pid: simplify and fix shift_adjust
Simplify and fix rate_control_pid_shift_adjust(). A bug prevented correct
mapping of sorted rates, and readability was seriously flawed.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:52 -08:00
Stefano Brivio
ca5fbca924 rc80211-pid: add kerneldoc for tunable parameters
Add a kerneldoc description for parameters which are tunable through debugfs.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:51 -08:00
Stefano Brivio
426706c079 rc80211-pid: export human-readable target_pf value to debugfs
Export the non-shifted target_pf value to debugfs, so that it's human-readable.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:50 -08:00
Helmut Schaa
69f817b654 mac80211: Restore rx.fc before every invocation of ieee80211_invoke_rx_handlers
This patch fixes a problem with rx handling on multiple interfaces. Especially
when using hardware-scanning and a wireless driver (i.e. iwlwifi) which is
able to receive data while scanning.

The rx handlers can modify the skb and the frame control field (see
ieee80211_rx_h_remove_qos_control) but since every interface gets its own
copy of the skb each should get its own copy of rx.fc too.

In my case the wlan0-interface did not remove the qos-control from the frame
because the corresponding flag in rx.fc was already removed while processing
the frame on the master interface. Therefore somehow corrupted frames were
passed to the userspace.

Signed-off-by: Helmut Schaa <hschaa@suse.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:50 -08:00
Eric Dumazet
6666351df9 [XFRM]: xfrm_state_clone() should be static, not exported
xfrm_state_clone() is not used outside of net/xfrm/xfrm_state.c
There is no need to export it.

Spoted by sparse checker.
   CHECK   net/xfrm/xfrm_state.c
net/xfrm/xfrm_state.c:1103:19: warning: symbol 'xfrm_state_clone' was not
declared. Should it be static?

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:49 -08:00
Eric Dumazet
40ccbf525e [PACKET]: Fix sparse warnings in af_packet.c
CHECK   net/packet/af_packet.c
net/packet/af_packet.c:1876:14: warning: context imbalance in 'packet_seq_start' - wrong count at exit
net/packet/af_packet.c:1888:13: warning: context imbalance in 'packet_seq_stop' - unexpected unlock

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:48 -08:00
Julia Lawall
67b23219ce [BLUETOOTH]: Use sockfd_put()
The function sockfd_lookup uses fget on the value that is stored in
the file field of the returned structure, so fput should ultimately be
applied to this value.  This can be done directly, but it seems better
to use the specific macro sockfd_put, which does the same thing.

The problem was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
expression s;
@@

   s = sockfd_lookup(...)
   ...
+  sockfd_put(s);
?- fput(s->file);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:48 -08:00
WANG Cong
64c31b3f76 [XFRM] xfrm_policy_destroy: Rename and relative fixes.
Since __xfrm_policy_destroy is used to destory the resources
allocated by xfrm_policy_alloc. So using the name
__xfrm_policy_destroy is not correspond with xfrm_policy_alloc.
Rename it to xfrm_policy_destroy.

And along with some instances that call xfrm_policy_alloc
but not using xfrm_policy_destroy to destroy the resource,
fix them.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:46 -08:00
Masahide NAKAMURA
d66e37a99d [XFRM] Statistics: Add outbound-dropping error.
o Increment PolError counter when flow_cache_lookup() returns
  errored pointer.

o Increment NoStates counter at larval-drop.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:45 -08:00
Ilpo Järvinen
a067d9ac39 [NET]: Remove obsolete comment
It seems that ip_build_xmit is no longer used in here and
ip_append_data is used.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:45 -08:00
Ilpo Järvinen
c4e18dade1 [CCID3]: Kill some bloat
Without a number of CONFIG.*DEBUG:

net/dccp/ccids/ccid3.c:
  ccid3_hc_tx_update_x          | -170
  ccid3_hc_tx_packet_sent       | -175
  ccid3_hc_tx_packet_recv       | -169
  ccid3_hc_tx_no_feedback_timer | -192
  ccid3_hc_tx_send_packet       | -144
 5 functions changed, 850 bytes removed, diff: -850

net/dccp/ccids/ccid3.c:
  ccid3_update_send_interval | +191
 1 function changed, 191 bytes added, diff: +191

net/dccp/ccids/ccid3.o:
 6 functions changed, 191 bytes added, 850 bytes removed, diff: -659

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:44 -08:00
Ilpo Järvinen
cf35f43e6e [XFRM]: Kill some bloat
net/xfrm/xfrm_state.c:
  xfrm_audit_state_delete          | -589
  xfrm_replay_check                | -542
  xfrm_audit_state_icvfail         | -520
  xfrm_audit_state_add             | -589
  xfrm_audit_state_replay_overflow | -523
  xfrm_audit_state_notfound_simple | -509
  xfrm_audit_state_notfound        | -521
 7 functions changed, 3793 bytes removed, diff: -3793

net/xfrm/xfrm_state.c:
  xfrm_audit_helper_pktinfo | +522
  xfrm_audit_helper_sainfo  | +598
 2 functions changed, 1120 bytes added, diff: +1120

net/xfrm/xfrm_state.o:
 9 functions changed, 1120 bytes added, 3793 bytes removed, diff: -2673

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:43 -08:00
Ilpo Järvinen
ad1b30b1c2 [IPVS]: Kill some bloat
net/ipv4/ipvs/ip_vs_xmit.c:
  ip_vs_icmp_xmit   | -638
  ip_vs_tunnel_xmit | -674
  ip_vs_nat_xmit    | -716
  ip_vs_dr_xmit     | -682
 4 functions changed, 2710 bytes removed, diff: -2710

net/ipv4/ipvs/ip_vs_xmit.c:
  __ip_vs_get_out_rt | +595
 1 function changed, 595 bytes added, diff: +595

net/ipv4/ipvs/ip_vs_xmit.o:
 5 functions changed, 595 bytes added, 2710 bytes removed, diff: -2115

Without some CONFIG.*DEBUGs:

net/ipv4/ipvs/ip_vs_xmit.o:
 5 functions changed, 383 bytes added, 1513 bytes removed, diff: -1130

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:43 -08:00
Ilpo Järvinen
bb5cf80e94 [NETFILTER]: Kill some supper dupper bloatry
/me awards the bloatiest-of-all-net/-.c-code award to
nf_conntrack_netlink.c, congratulations to all the authors :-/!

Hall of (unquestionable) fame (measured per inline, top 10 under
net/):
  -4496 ctnetlink_parse_tuple        netfilter/nf_conntrack_netlink.c
  -2165 ctnetlink_dump_tuples        netfilter/nf_conntrack_netlink.c
  -2115 __ip_vs_get_out_rt           ipv4/ipvs/ip_vs_xmit.c
  -1924 xfrm_audit_helper_pktinfo    xfrm/xfrm_state.c
  -1799 ctnetlink_parse_tuple_proto  netfilter/nf_conntrack_netlink.c
  -1268 ctnetlink_parse_tuple_ip     netfilter/nf_conntrack_netlink.c
  -1093 ctnetlink_exp_dump_expect    netfilter/nf_conntrack_netlink.c
  -1060 void ccid3_update_send_interval  dccp/ccids/ccid3.c
  -983  ctnetlink_dump_tuples_proto  netfilter/nf_conntrack_netlink.c
  -827  ctnetlink_exp_dump_tuple     netfilter/nf_conntrack_netlink.c

  (i386 / gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13) /
   allyesconfig except CONFIG_FORCED_INLINING)

...and I left < 200 byte gains as future work item.

After iterative inline removal, I finally have this:

net/netfilter/nf_conntrack_netlink.c:
  ctnetlink_exp_fill_info   | -1104
  ctnetlink_new_expect      | -1572
  ctnetlink_fill_info       | -1303
  ctnetlink_new_conntrack   | -2230
  ctnetlink_get_expect      | -341
  ctnetlink_del_expect      | -352
  ctnetlink_expect_event    | -1110
  ctnetlink_conntrack_event | -1548
  ctnetlink_del_conntrack   | -729
  ctnetlink_get_conntrack   | -728
 10 functions changed, 11017 bytes removed, diff: -11017

net/netfilter/nf_conntrack_netlink.c:
  ctnetlink_parse_tuple     | +419
  dump_nat_seq_adj          | +183
  ctnetlink_dump_counters   | +166
  ctnetlink_dump_tuples     | +261
  ctnetlink_exp_dump_expect | +633
  ctnetlink_change_status   | +460
 6 functions changed, 2122 bytes added, diff: +2122

net/netfilter/nf_conntrack_netlink.o:
 16 functions changed, 2122 bytes added, 11017 bytes removed, diff: -8895

Without a number of CONFIG.*DEBUGs, I got this:
net/netfilter/nf_conntrack_netlink.o:
 16 functions changed, 2122 bytes added, 11029 bytes removed, diff: -8907

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:41 -08:00
Eric Dumazet
2a75de0c1d [NETNS]: Should build with CONFIG_SYSCTL=n
Previous NETNS patches broke CONFIG_SYSCTL=n case

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:40 -08:00
Eric Dumazet
74feb6e84e [ICMP]: Avoid sparse warnings in net/ipv4/icmp.c
CHECK   net/ipv4/icmp.c
net/ipv4/icmp.c:249:13: warning: context imbalance in 'icmp_xmit_unlock' -
unexpected unlock
net/ipv4/icmp.c:376:13: warning: context imbalance in 'icmp_reply' - different
lock contexts for basic block
net/ipv4/icmp.c:430:6: warning: context imbalance in 'icmp_send' - different
lock contexts for basic block

Solution is to declare both icmp_xmit_lock() and icmp_xmit_unlock() as inline

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:37 -08:00
Eric Dumazet
65f7651788 [NET]: prot_inuse cleanups and optimizations
1) Cleanups (all functions are prefixed by sock_prot_inuse)

sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot,-1)
sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot,-1)
sock_prot_inuse()       -> sock_prot_inuse_get()

New functions :

sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use.

2) if CONFIG_PROC_FS=n, we can zap 'inuse' member from "struct proto",
since nobody wants to read the inuse value.

This saves 1372 bytes on i386/SMP and some cpu cycles.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:36 -08:00
Eric Dumazet
789675e216 [NET]: Avoid divides in net/core/gen_estimator.c
We can void divides (as seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y on x86)
changing ((HZ<<idx)/4) to ((HZ/4) << idx)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:35 -08:00
Ilpo Järvinen
e870a8efcd [TCP]: Perform setting of common control fields in one place
In case of segments which are purely for control without any
data (SYN/ACK/FIN/RST), many fields are set to common values
in multiple places.

i386 results:

$ gcc --version
gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13)

$ codiff tcp_output.o.old tcp_output.o.new
net/ipv4/tcp_output.c:
  tcp_xmit_probe_skb    |  -48
  tcp_send_ack          |  -56
  tcp_retransmit_skb    |  -79
  tcp_connect           |  -43
  tcp_send_active_reset |  -35
  tcp_make_synack       |  -42
  tcp_send_fin          |  -48
 7 functions changed, 351 bytes removed

net/ipv4/tcp_output.c:
  tcp_init_nondata_skb |  +90
 1 function changed, 90 bytes added

tcp_output.o.mid:
 8 functions changed, 90 bytes added, 351 bytes removed, diff: -261

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:34 -08:00
Ilpo Järvinen
19773b4923 [TCP]: Urgent parameter effect can be simplified.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:33 -08:00
Ilpo Järvinen
f038ac8f9b [TCP]: cleanup tcp_parse_options deep indented switch
Removed case indentation level & combined some nested ifs, mostly
within 80 lines now. This is a leftover from indent patch, it
just had to be done manually to avoid messing it up completely.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:33 -08:00
Herbert Xu
dbb1db8b59 [IPSEC]: Return EOVERFLOW when output sequence number overflows
Previously we made it an error on the output path if the sequence number
overflowed.  However we did not set the err variable accordingly.  This
patch sets err to -EOVERFLOW in that case.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:32 -08:00
Eric Dumazet
9a429c4983 [NET]: Add some acquires/releases sparse annotations.
Add __acquires() and __releases() annotations to suppress some sparse
warnings.

example of warnings :

net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong
count at exit
net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' -
unexpected unlock

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:31 -08:00
Eric Dumazet
680a5a5086 [PATCH] use SK_MEM_QUANTUM_SHIFT in __sk_mem_reclaim()
Avoid an expensive divide (as done in commit
18030477e70a826b91608aee40a987bbd368fec6 but lost in commit
23821d2653111d20e75472c8c5003df1a55309a8)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:27 -08:00
Ilpo Järvinen
d436d68630 [TCP]: Remove unnecessary local variable
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:26 -08:00
Ilpo Järvinen
409d22b470 [TCP]: Code duplication removal, added tcp_bound_to_half_wnd()
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:26 -08:00
Ilpo Järvinen
056834d9f6 [TCP]: cleanup tcp_{in,out}put.c style
These were manually selected from indent's results which as is
are too noisy to be of any use without human reason. In addition,
some extra newlines between function and its comment were removed
too.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:25 -08:00
Ilpo Järvinen
058dc3342b [TCP]: reduce tcp_output's indentation levels a bit
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:24 -08:00
Ilpo Järvinen
4828e7f49a [TCP]: Remove TCPCB_URG & TCPCB_AT_TAIL as unnecessary
The snd_up check should be enough. I suspect this has been
there to provide a minor optimization in clean_rtx_queue which
used to have a small if (!->sacked) block which could skip
snd_up check among the other work.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:23 -08:00
Ilpo Järvinen
cadbd0313b [TCP]: Dropped unnecessary skb/sacked accessing in reneging
SACK reneging can be precalculated to a FLAG in clean_rtx_queue
which has the right skb looked up. This will help a bit in
future because skb->sacked access will be changed eventually,
changing it already won't hurt any.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:23 -08:00
Ilpo Järvinen
90840defab [TCP]: Introduce tcp_wnd_end() to reduce line lengths
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:22 -08:00
Ilpo Järvinen
66f5fe624f [TCP]: Rename update_send_head & include related increment to it
There's very little need to have the packets_out incrementing in
a separate function. Also name the combined function
appropriately.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:21 -08:00
Ilpo Järvinen
3ccd3130b3 [TCP]: Make invariant check complain about invalid sacked_out
Earlier resolution for NewReno's sacked_out should now keep
it small enough for this to become invariant-like check.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:20 -08:00
Hideo Aoki
95766fff6b [UDP]: Add memory accounting.
Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:19 -08:00
Hideo Aoki
3ab224be6d [NET] CORE: Introducing new memory accounting interface.
This patch introduces new memory accounting functions for each network
protocol. Most of them are renamed from memory accounting functions
for stream protocols. At the same time, some stream memory accounting
functions are removed since other functions do same thing.

Renaming:
	sk_stream_free_skb()		->	sk_wmem_free_skb()
	__sk_stream_mem_reclaim()	->	__sk_mem_reclaim()
	sk_stream_mem_reclaim()		->	sk_mem_reclaim()
	sk_stream_mem_schedule 		->    	__sk_mem_schedule()
	sk_stream_pages()      		->	sk_mem_pages()
	sk_stream_rmem_schedule()	->	sk_rmem_schedule()
	sk_stream_wmem_schedule()	->	sk_wmem_schedule()
	sk_charge_skb()			->	sk_mem_charge()

Removeing
	sk_stream_rfree():	consolidates into sock_rfree()
	sk_stream_set_owner_r(): consolidates into skb_set_owner_r()
	sk_stream_mem_schedule()

The following functions are added.
    	sk_has_account(): check if the protocol supports accounting
	sk_mem_uncharge(): do the opposite of sk_mem_charge()

In addition, to achieve consolidation, updating sk_wmem_queued is
removed from sk_mem_charge().

Next, to consolidate memory accounting functions, this patch adds
memory accounting calls to network core functions. Moreover, present
memory accounting call is renamed to new accounting call.

Finally we replace present memory accounting calls with new interface
in TCP and SCTP.

Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:18 -08:00
Gui Jianfeng
a06b494b61 [IPV6]: Remove useless code from fib6_del_route().
There are useless codes in fib6_del_route(). The following patch has
been tested, every thing looks fine, as usual.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:17 -08:00
Chas Williams
fb64c735a5 [ATM]: [br2864] whitespace cleanup
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:14 -08:00
Eric Kinzie
097b19a998 [ATM]: [br2864] routed support
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:13 -08:00
Kay Sievers
ef39592f78 [ATM]: Convert struct class_device to struct device
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
2008-01-28 15:00:12 -08:00
Robert P. J. Day
6fe5452b3b [ATM]: atm is no longer experimental
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:11 -08:00
Herbert Xu
9dd3245a2a [IPSEC]: Move all calls to xfrm_audit_state_icvfail to xfrm_input
Let's nip the code duplication in the bud :)

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:10 -08:00
Herbert Xu
0883ae0e55 [IPSEC]: Fix transport-mode async resume on intput without netfilter
When netfilter is off the transport-mode async resumption doesn't work
because we don't push back the IP header.  This patch fixes that by
moving most of the code outside of ifdef NETFILTER since the only part
that's not common is the short-circuit in the protocol handler.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:10 -08:00
Herbert Xu
fcb8c156c8 [IPSEC]: Fix double free on skb on async output
When the output transform returns EINPROGRESS due to async operation we'll
free the skb the straight away as if it were an error.  This patch fixes
that so that the skb is freed when the async operation completes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:09 -08:00
Ilpo Järvinen
c776ee01bd [TCP]: Remove seq_rtt ptr from clean_rtx_queue args
While checking Gavin's patch I noticed that the returned seq_rtt
is not used by the caller.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:07 -08:00
Ilpo Järvinen
0e3a4803aa [TCP]: Force TSO splits to MSS boundaries
If snd_wnd - snd_nxt wasn't multiple of MSS, skb was split on
odd boundary by the callers of tcp_window_allows.

We try really hard to avoid unnecessary modulos. Therefore the
old caller side check "if (skb->len < limit)" was too wide as
well because limit is not bound in any way to skb->len and can
cause spurious testing for trimming in the middle of the queue
while we only wanted that to happen at the tail of the queue.
A simple additional caller side check for tcp_write_queue_tail
would likely have resulted 2 x modulos because the limit would
have to be first calculated from window, however, doing that
unnecessary modulo is not mandatory. After a minor change to
the algorithm, simply determine first if the modulo is needed
at all and at that point immediately decide also from which
value it should be calculated from.

This approach also kills some duplicated code.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:06 -08:00
Michael Chan
7ffc49a6ee [ETH]: Combine format_addr() with print_mac().
print_mac() used many most net drivers and format_addr() used by
net-sysfs.c are very similar and they can be intergrated.

format_addr() is also identically redefined in the qla4xxx iscsi
driver.

Export a new function sysfs_format_mac() to be used by net-sysfs,
qla4xxx and others in the future.  Both print_mac() and
sysfs_format_mac() call _format_mac_addr() to do the formatting.

Changed print_mac() to use unsigned char * to be consistent with
net_device struct's dev_addr.  Added buffer length overrun checking
as suggested by Joe Perches.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:05 -08:00
Eric Dumazet
21371f768b [SOCK] Avoid divides in sk_stream_pages() and __sk_stream_mem_reclaim()
sk_forward_alloc being signed, we should take care of divides by
SK_STREAM_MEM_QUANTUM we do in sk_stream_pages() and
__sk_stream_mem_reclaim()

This patchs introduces SK_STREAM_MEM_QUANTUM_SHIFT, defined
as ilog2(SK_STREAM_MEM_QUANTUM), to be able to use right
shifts instead of plain divides.

This should help compiler to choose right shifts instead of
expensive divides (as seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y on x86)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:05 -08:00
Masahide NAKAMURA
b15c4bcd15 [XFRM]: Fix outbound statistics.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:04 -08:00
Eric W. Biederman
426b5303eb [NETNS]: Modify the neighbour table code so it handles multiple network namespaces
I'm actually surprised at how much was involved.  At first glance it
appears that the neighbour table data structures are already split by
network device so all that should be needed is to modify the user
interface commands to filter the set of neighbours by the network
namespace of their devices.

However a couple things turned up while I was reading through the
code.  The proxy neighbour table allows entries with no network
device, and the neighbour parms are per network device (except for the
defaults) so they now need a per network namespace default.

So I updated the two structures (which surprised me) with their very
own network namespace parameter.  Updated the relevant lookup and
destroy routines with a network namespace parameter and modified the
code that interacts with users to filter out neighbour table entries
for devices of other namespaces.

I'm a little concerned that we can modify and display the global table
configuration and from all network namespaces.  But this appears good
enough for now.

I keep thinking modifying the neighbour table to have per network
namespace instances of each table type would should be cleaner.  The
hash table is already dynamically sized so there are it is not a
limiter.  The default parameter would be straight forward to take care
of.  However when I look at the how the network table is built and
used I still find some assumptions that there is only a single
neighbour table for each type of table in the kernel.  The netlink
operations, neigh_seq_start, the non-core network users that call
neigh_lookup.  So while it might be doable it would require more
refactoring than my current approach of just doing a little extra
filtering in the code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:03 -08:00
Paul Moore
e1af9f270b [XFRM]: Drop packets when replay counter would overflow
According to RFC4303, section 3.3.3 we need to drop outgoing packets which
cause the replay counter to overflow:

   3.3.3.  Sequence Number Generation

   The sender's counter is initialized to 0 when an SA is established.
   The sender increments the sequence number (or ESN) counter for this
   SA and inserts the low-order 32 bits of the value into the Sequence
   Number field.  Thus, the first packet sent using a given SA will
   contain a sequence number of 1.

   If anti-replay is enabled (the default), the sender checks to ensure
   that the counter has not cycled before inserting the new value in the
   Sequence Number field.  In other words, the sender MUST NOT send a
   packet on an SA if doing so would cause the sequence number to cycle.
   An attempt to transmit a packet that would result in sequence number
   overflow is an auditable event.  The audit log entry for this event
   SHOULD include the SPI value, current date/time, Source Address,
   Destination Address, and (in IPv6) the cleartext Flow ID.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:02 -08:00
Paul Moore
afeb14b490 [XFRM]: RFC4303 compliant auditing
This patch adds a number of new IPsec audit events to meet the auditing
requirements of RFC4303.  This includes audit hooks for the following events:

 * Could not find a valid SA [sections 2.1, 3.4.2]
   . xfrm_audit_state_notfound()
   . xfrm_audit_state_notfound_simple()

 * Sequence number overflow [section 3.3.3]
   . xfrm_audit_state_replay_overflow()

 * Replayed packet [section 3.4.3]
   . xfrm_audit_state_replay()

 * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
   . xfrm_audit_state_icvfail()

While RFC4304 deals only with ESP most of the changes in this patch apply to
IPsec in general, i.e. both AH and ESP.  The one case, integrity check
failure, where ESP specific code had to be modified the same was done to the
AH code for the sake of consistency.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:01 -08:00
Eric Dumazet
dfd4f0ae2e [TCP]: Avoid two divides in __tcp_grow_window()
tcp_win_from_space() being signed, compiler might emit an integer divide
to compute tcp_win_from_space()/2 .

Using right shifts is OK here and less expensive.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:01 -08:00
Eric Dumazet
8beb5c5f12 [TCP]: Avoid a divide in tcp_mtu_probing()
tcp_mtu_to_mss() being signed, compiler might emit an integer divide
to compute tcp_mtu_to_mss()/2 .

Using a right shift is OK here and less expensive.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:00 -08:00
David S. Miller
829942c187 [TCP]: Move mss variable in tcp_mtu_probing()
Down into the only scope where it is used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:59 -08:00
Eric Dumazet
ce55dd3610 [TCP]: tcp_write_timeout.c cleanup
Before submiting a patch to change a divide to a right shift, I felt
necessary to create a helper function tcp_mtu_probing() to reduce length of
lines exceeding 100 chars in tcp_write_timeout().

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:58 -08:00
Eric Dumazet
b790cedd24 [INET]: Avoid an integer divide in rt_garbage_collect()
Since 'goal' is a signed int, compiler may emit an integer divide
to compute goal/2.

Using a right shift is OK here and less expensive.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:57 -08:00
YOSHIFUJI Hideaki
9cb5734e5b [TCP]: Convert several length variable to unsigned.
Several length variables cannot be negative, so convert int to
unsigned int.  This also allows us to do sane shift operations
on those variables.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:56 -08:00
John W. Linville
c40896de50 net/mac80211/Kconfig: whitespace corrections
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:55 -08:00
John W. Linville
d6084cb61d net/wireless/Kconfig: whitespace corrections
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:55 -08:00
Johannes Berg
c27f9830f3 mac80211: don't read ERP information from (re)association response
According to the standard, the field cannot be present, so don't
try to interpret it either.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Daniel Drake <dsd@gentoo.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:54 -08:00
Johannes Berg
176e4f8442 mac80211: move tx crypto decision
This patch moves the decision making about whether a frame is encrypted
with a certain algorithm up into the TX handlers rather than having it
in the crypto algorithm implementation.

This fixes a problem with the radiotap injection code where injecting
a non-data packet and requesting encryption could end up asking the
driver to encrypt a packet without giving it a key.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:53 -08:00
Johannes Berg
7bbdd2d987 mac80211: implement station stats retrieval
This implements the required cfg80211 callback in mac80211
to allow userspace to get station statistics.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:53 -08:00
Johannes Berg
fd5b74dcb8 cfg80211/nl80211: implement station attribute retrieval
After a station is added to the kernel's structures, userspace
has to be able to retrieve statistics about that station, especially
whether the station was idle and how much bytes were transferred
to and from it. This adds the necessary code to nl80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:52 -08:00
Johannes Berg
5727ef1b2e cfg80211/nl80211: station handling
This patch adds station handling to cfg80211/nl80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:51 -08:00
Johannes Berg
ed1b6cc7f8 cfg80211/nl80211: add beacon settings
This adds the necessary API to cfg80211/nl80211 to allow
changing beaconing settings.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:50 -08:00
Johannes Berg
62da92fb75 mac80211: support getting key sequence counters via cfg80211
This implements cfg80211's get_key() to allow retrieving the sequence
counter for a TKIP or CCMP key from userspace. It also cleans up and
documents the associated low-level driver interface.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:50 -08:00
Johannes Berg
e8cbb4cbeb mac80211: support adding/removing keys via cfg80211
This adds the necessary hooks to mac80211 to allow userspace
to edit keys with cfg80211 (through nl80211.)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:49 -08:00
Johannes Berg
41ade00f21 cfg80211/nl80211: introduce key handling
This introduces key handling to cfg80211/nl80211. Default
and group keys can be added, changed and removed; sequence
counters for each key can be retrieved.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:48 -08:00
Johannes Berg
7d54d0ddd6 mac80211: allow easier multicast/broadcast buffering in hardware
There are various decisions influencing the decision whether to buffer
a frame for after the next DTIM beacon. The "do we have stations in PS
mode" condition cannot be tested by the driver so mac80211 has to do
that. To ease driver writing for hardware that can buffer frames until
after the next DTIM beacon, introduce a new txctl flag telling the
driver to buffer a specific frame.

While at it, restructure and comment the code for multicast buffering
and remove spurious "inline" directives.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:47 -08:00
Johannes Berg
4e20cb293c mac80211: make ieee80211_rx_mgmt_action static
The function is only used locally.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:46 -08:00
Johannes Berg
678f5f7117 mac80211: clean up eapol handling in TX path
The previous patch left only one user of the ieee80211_is_eapol()
function and that user can be eliminated easily by introducing
a new "frame is EAPOL" flag to handle the frame specially (we
already have this information) instead of doing the (expensive)
ieee80211_is_eapol() all the time.

Also, allow unencrypted frames to be sent when they are injected.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:46 -08:00
Johannes Berg
ce3edf6d0b mac80211: clean up eapol frame handling/port control
This cleans up the eapol frame handling and some related code in the
receive and transmit paths. After this patch
 * EAPOL frames addressed to us or the EAPOL group address are
   always accepted regardless of whether they are encrypted or not
 * other frames from a station are dropped if PAE is enabled and
   the station is not authorized
 * unencrypted frames (except the EAPOL frames above) are dropped if
   drop_unencrypted is enabled
 * some superfluous code that eth_type_trans handles anyway is gone
 * port control is done for transmitted packets

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:45 -08:00
Mattias Nissler
1946b74ce0 rc80211-pid: export tuning parameters through debugfs
This adds all the tunable parameters used by rc80211_pid to debugfs for easy
testing and tuning.

Signed-off-by: Mattias Nissler <mattias.nissler@gmx.de>
Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:44 -08:00
Mattias Nissler
12446c67fe rc80211-pid: add debugging
This adds a new debugfs file from which rate control relevant events can be
read one event per line. The output includes the current time, so graphs can be
created showing the rate control parameters. This helps in evaluating and
tuning rate control parameters. While at it, we split headers and code for
better readability.

Signed-off-by: Mattias Nissler <mattias.nissler@gmx.de>
Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:44 -08:00
Stefano Brivio
1dc4d1e6a1 rc80211-pid: add sharpening factor
This patch introduces a PID sharpening factor for faster response after
association and low activity events.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: Mattias Nissler <mattias.nissler@gmx.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:43 -08:00
Stefano Brivio
90d501d610 rc80211-pid: add rate behaviour learning algorithm
This patch introduces a learning algorithm in order for the PID controller
to learn how to map adjustment values to rates. This is better described in
code comments.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:42 -08:00
Stefano Brivio
c21b39aca4 mac80211: make PID rate control algorithm the default
This makes the new PID TX rate control algorithm the default instead of the
rc80211_simple rate control algorithm. The simple algorithm was flawed in
several ways: it wasn't responsive at all and didn't age the information it was
relying on properly. The PID algorithm allows us to tune characteristics such
as responsiveness by adjusting parameters and was found to generally behave
better.

The default algorithm can be overridden to select simple instead. Which
ever algorithm is the default is included as part of the mac80211
module automatically. The other algorithm (simple vs. pid) can
be selected for inclusion as well. If EMBEDDED is selected then
the choice is available to have no default specified and neither
algorithm included in mac80211. The default algorithm can be set
through a modparam.

While at it, mark rc80211-simple as deprecated, and schedule it
for removal.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:41 -08:00
Eric Dumazet
b92edbe0b8 [TCP] Avoid two divides in tcp_output.c
Because 'free_space' variable in __tcp_select_window() is signed,
expression (free_space / 2) forces compiler to emit an integer divide.

This can be changed to a plain right shift, less expensive.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:41 -08:00
Paul Moore
68277accb3 [XFRM]: Assorted IPsec fixups
This patch fixes a number of small but potentially troublesome things in the
XFRM/IPsec code:

 * Use the 'audit_enabled' variable already in include/linux/audit.h
   Removed the need for extern declarations local to each XFRM audit fuction

 * Convert 'sid' to 'secid' everywhere we can
   The 'sid' name is specific to SELinux, 'secid' is the common naming
   convention used by the kernel when refering to tokenized LSM labels,
   unfortunately we have to leave 'ctx_sid' in 'struct xfrm_sec_ctx' otherwise
   we risk breaking userspace

 * Convert address display to use standard NIP* macros
   Similar to what was recently done with the SPD audit code, this also also
   includes the removal of some unnecessary memcpy() calls

 * Move common code to xfrm_audit_common_stateinfo()
   Code consolidation from the "less is more" book on software development

 * Proper spacing around commas in function arguments
   Minor style tweak since I was already touching the code

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:40 -08:00
Masahide NAKAMURA
8ea843495d [XFRM]: Add packet processing statistics option.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:39 -08:00
Masahide NAKAMURA
0aa647746e [XFRM]: Support to increment packet dropping statistics.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:39 -08:00
Masahide NAKAMURA
558f82ef6e [XFRM]: Define packet dropping statistics.
This statistics is shown factor dropped by transformation
at /proc/net/xfrm_stat for developer.
It is a counter designed from current transformation source code
and defined as linux private MIB.

See Documentation/networking/xfrm_proc.txt for the detail.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:38 -08:00
Masahide NAKAMURA
9473e1f631 [XFRM] MIPv6: Fix to input RO state correctly.
Disable spin_lock during xfrm_type.input() function.
Follow design as IPsec inbound does.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:37 -08:00
Masahide NAKAMURA
a1b051405b [XFRM] IPv6: Fix dst/routing check at transformation.
IPv6 specific thing is wrongly removed from transformation at net-2.6.25.
This patch recovers it with current design.

o Update "path" of xfrm_dst since IPv6 transformation should
  care about routing changes. It is required by MIPv6 and
  off-link destined IPsec.
o Rename nfheader_len which is for non-fragment transformation used by
  MIPv6 to rt6i_nfheader_len as IPv6 name space.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:36 -08:00
Ilpo Järvinen
bd515c3e48 [TCP]: Fix TSO deferring
I'd say that most of what tcp_tso_should_defer had in between
there was dead code because of this.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:36 -08:00
Pavel Emelyanov
a43d8994b9 [NEIGH]: Make neigh_add_timer symmetrical to neigh_del_timer.
The neigh_del_timer() looks sane - it removes the timer and
(conditionally) puts the neighbor. I expected, that the
neigh_add_timer() is symmetrical to the del one - i.e. it
holds the neighbor and arms the timer - but it turned out
that it was not so.

I think, that making them look symmetrical makes the code
more readable.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:28 -08:00
Pavel Emelyanov
7054fb9376 [INET]: Uninline the inet_twsk_put function.
This one is not that big, but is widely used: saves 1200 bytes
from net/ipv4/built-in.o

add/remove: 1/0 grow/shrink: 1/12 up/down: 97/-1300 (-1203)
function                                     old     new   delta
inet_twsk_put                                  -      87     +87
__inet_lookup_listener                       274     284     +10
tcp_sacktag_write_queue                     2255    2254      -1
tcp_time_wait                                482     411     -71
__inet_check_established                     796     722     -74
tcp_v4_err                                   973     898     -75
__inet_twsk_kill                             230     154     -76
inet_twsk_deschedule                         180     103     -77
tcp_v4_do_rcv                                462     384     -78
inet_hash_connect                            686     607     -79
inet_twdr_do_twkill_work                     236     150     -86
inet_twdr_twcal_tick                         395     307     -88
tcp_v4_rcv                                  1744    1480    -264
tcp_timewait_state_process                   975     644    -331

Export it for ipv6 module.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:28 -08:00
Pavel Emelyanov
77a5ba55da [INET]: Uninline the __inet_lookup_established function.
This is -700 bytes from the net/ipv4/built-in.o

add/remove: 1/0 grow/shrink: 1/3 up/down: 340/-1040 (-700)
function                                     old     new   delta
__inet_lookup_established                      -     339    +339
tcp_sacktag_write_queue                     2254    2255      +1
tcp_v4_err                                  1304     973    -331
tcp_v4_rcv                                  2089    1744    -345
tcp_v4_do_rcv                                826     462    -364

Exporting is for dccp module (used via e.g. inet_lookup).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:27 -08:00
Pavel Emelyanov
152da81deb [INET]: Uninline the __inet_hash function.
This one is used in quite many places in the networking code and
seems to big to be inline.

After the patch net/ipv4/build-in.o loses ~650 bytes:
add/remove: 2/0 grow/shrink: 0/5 up/down: 461/-1114 (-653)
function                                     old     new   delta
__inet_hash_nolisten                           -     282    +282
__inet_hash                                    -     179    +179
tcp_sacktag_write_queue                     2255    2254      -1
__inet_lookup_listener                       284     274     -10
tcp_v4_syn_recv_sock                         755     493    -262
tcp_v4_hash                                  389      35    -354
inet_hash_connect                           1086     599    -487

This version addresses the issue pointed by Eric, that
while being inline this function was optimized by gcc
in respect to the 'listen_possible' argument.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:26 -08:00
Vlad Yasevich
d670119132 [SCTP]: Follow Add-IP security consideratiosn wrt INIT/INIT-ACK
The Security Considerations section of RFC 5061 has the following
text:

   If an SCTP endpoint that supports this extension receives an INIT
   that indicates that the peer supports the ASCONF extension but does
   NOT support the [RFC4895] extension, the receiver of such an INIT
   MUST send an ABORT in response.  Note that an implementation is
   allowed to silently discard such an INIT as an option as well, but
   under NO circumstance is an implementation allowed to proceed with
   the association setup by sending an INIT-ACK in response.

   An implementation that receives an INIT-ACK that indicates that the
   peer does not support the [RFC4895] extension MUST NOT send the
   COOKIE-ECHO to establish the association.  Instead, the
   implementation MUST discard the INIT-ACK and report to the upper-
   layer user that an association cannot be established destroying the
   Transmission Control Block (TCB).

Follow the recomendations.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:25 -08:00
Vlad Yasevich
75205f4783 [SCTP]: Implement ADD-IP special case processing for ABORT chunk
ADD-IP spec has a special case for processing ABORTs:
    F4) ... One special consideration is that ABORT
        Chunks arriving destined to the IP address being deleted MUST be
        ignored (see Section 5.3.1 for further details).

Check if the address we received on is in the DEL state, and if
so, ignore the ABORT.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:24 -08:00
Vlad Yasevich
f57d96b2e9 [SCTP]: Change use_as_src into a full address state
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:24 -08:00
Vlad Yasevich
a08de64d07 [SCTP]: Update ASCONF processing to conform to spec.
The processing of the ASCONF chunks has changed a lot in the
spec.  New items are:
    1. A list of ASCONF-ACK chunks is now cached
    2. The source of the packet is used in response.
    3. New handling for unexpect ASCONF chunks.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:23 -08:00
Vlad Yasevich
ba8a06daed [SCTP]: ADD-IP updates the states where ASCONFs can be sent
C4)  Both ASCONF and ASCONF-ACK Chunks MUST NOT be sent in any SCTP
        state except ESTABLISHED, SHUTDOWN-PENDING, SHUTDOWN-RECEIVED,
        and SHUTDOWN-SENT.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:22 -08:00
Vlad Yasevich
df21857714 [SCTP]: Update association lookup to look at ASCONF chunks as well
ADD-IP draft section 5.2 specifies that if an association can not
be found using the source and destination of the IP packet,
then, if the packet contains ASCONF chunks, the Address Parameter
TLV should be used to lookup an association.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:22 -08:00
Vlad Yasevich
d6de309759 [SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT
The ADD-IP "Set Primary IP Address" parameter is allowed in the
INIT/INIT-ACK exchange.  Allow processing of this parameter during
the INIT/INIT-ACK.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:21 -08:00
Vlad Yasevich
42e30bf346 [SCTP]: Handle the wildcard ADD-IP Address parameter
The Address Parameter in the parameter list of the ASCONF chunk
may be a wildcard address.  In this case special processing
is required.  For the 'add' case, the source IP of the packet is
added.  In the 'del' case, all addresses except the source IP
of packet are removed. In the "mark primary" case, the source
address is marked as primary.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:20 -08:00
Vlad Yasevich
6afd2e83cd [SCTP]: Discard unauthenticated ASCONF and ASCONF ACK chunks
Now that we support AUTH, discard unauthenticated ASCONF and ASCONF ACK
chunks as mandated in the ADD-IP spec.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:19 -08:00
Herbert Xu
195ad6a3ac [IPSEC]: Rename tunnel-mode functions to avoid collisions with tunnels
It appears that I've managed to create two different functions both
called xfrm6_tunnel_output.  This is because we have the plain tunnel
encapsulation named xfrmX_tunnel as well as the tunnel-mode encapsulation
which lives in the files xfrmX_mode_tunnel.c.

This patch renames functions from the latter to use the xfrmX_mode_tunnel
prefix to avoid name-space conflicts.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:18 -08:00
Mattias Nissler
ad01837593 mac80211: add PID controller based rate control algorithm
Add a new rate control algorithm based on a PID controller. It samples the
percentage of failed frames over time, feeds the result into the controller and
uses its output to control the TX rate.

Signed-off-by: Mattias Nissler <mattias.nissler@gmx.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:18 -08:00
Mattias Nissler
1abbe498e4 mac80211: clean up rate selection
Move some code out of rc80211_simple since it's probably needed for all rate
selection algorithms, and fix iwlwifi accordingly. While at it, clean up the
rate_control_get_rate() interface.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:17 -08:00
Ron Rindjunsky
98f0b0a3a4 mac80211: pass in PS_POLL frames
This patch fixes should_drop_frame function to pass in ps poll control
frames required for power save functioanlity. Interface types that do not
have interest for PS POLL frames now drop it in handler.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:16 -08:00
Herbert Xu
910ef70aa3 [IPSEC]: Do xfrm_state_check_space before encapsulation
While merging the IPsec output path I moved the encapsulation output
operation to the top of the loop so that it sits outside of the locked
section.  Unfortunately in doing so it now sits in front of the space
check as well which could be a fatal error.

This patch rearranges the calls so that the space check happens as
the thing on the output path.

This patch also fixes an incorrect goto should the encapsulation output
fail.

Thanks to Kazunori MIYAZAWA for finding this bug.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:13 -08:00
Patrick McHardy
33b8e77605 [NETFILTER]: Add CONFIG_NETFILTER_ADVANCED option
The NETFILTER_ADVANCED option hides lots of the rather obscure netfilter
options when disabled and provides defaults (M) that should allow to
run a distribution firewall without further thinking.

Defaults to 'y' to avoid breaking current configurations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:12 -08:00
Patrick McHardy
34498825cb [NETFILTER]: non-power-of-two jhash optimizations
Apply Eric Dumazet's jhash optimizations where applicable. Quoting Eric:

Thanks to jhash, hash value uses full 32 bits. Instead of returning
hash % size (implying a divide) we return the high 32 bits of the
(hash * size) that will give results between [0 and size-1] and same
hash distribution.

On most cpus, a multiply is less expensive than a divide, by an order
of magnitude.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:11 -08:00
Eric Dumazet
7b21e09d1c [NETFILTER]: xt_hashlimit: reduce overhead without IPv6
This patch generalizes the (CONFIG_IP6_NF_IPTABLES || CONFIG_IP6_NF_IPTABLES_MODULE)
test done in hashlimit_init_dst() to all the xt_hashlimit module.

This permits a size reduction of "struct dsthash_dst". This saves memory and
cpu for IPV4 only hosts.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:11 -08:00
Eric Dumazet
e2f82ac3fc [NETFILTER]: xt_hashlimit: speedup hash_dst()
1) Using jhash2() instead of jhash() is a litle bit faster if applicable.

2) Thanks to jhash, hash value uses full 32 bits.
   Instead of returning hash % size (implying a divide)
   we return the high 32 bits of the (hash * size) that will
   give results between [0 and size-1] and same hash distribution.

  On most cpus, a multiply is less expensive than a divide, by an order
  of magnitude.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:10 -08:00
Jan Engelhardt
22c2d8bca2 [NETFILTER]: xt_connlimit: use the new union nf_inet_addr
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:09 -08:00
Jan Engelhardt
e79ec50b95 [NETFILTER]: Parenthesize macro parameters
Parenthesize macro parameters.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:08 -08:00
Jan Engelhardt
643a2c15a4 [NETFILTER]: Introduce nf_inet_address
A few netfilter modules provide their own union of IPv4 and IPv6
address storage. Will unify that in this patch series.

(1/4): Rename union nf_conntrack_address to union nf_inet_addr and
move it to x_tables.h.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:07 -08:00
Jan Engelhardt
df54aae022 [NETFILTER]: x_tables: use %u format specifiers
Use %u format specifiers as ->family is unsigned.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:07 -08:00
Patrick McHardy
051578ccbc [NETFILTER]: nf_nat: properly use RCU for ip_nat_decode_session
We need to use rcu_assign_pointer/rcu_dereference to avoid races.
Also remove an obsolete CONFIG_IP_NAT_NEEDED ifdef.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:06 -08:00
Patrick McHardy
1e796fda00 [NETFILTER]: constify nf_afinfo
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:05 -08:00
Patrick McHardy
76aa1ce139 [NETFILTER]: nfnetlink_log: include GID in netlink message
Similar to Maciej Soltysiak's ipt_LOG patch, include GID in addition
to UID in netlink message.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:04 -08:00
Patrick McHardy
0dfedd2874 [NETFILTER]: nfnetlink_log: use endianness-aware attribute functions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:03 -08:00
Patrick McHardy
baab2ce7d2 [NETFILTER]: nfnetlink_{queue,log}: return proper error codes in instance_create
Currently we return EINVAL for "instance exists", "allocation failed" and
"module unloaded below us", which is completely inapproriate.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:02 -08:00
Patrick McHardy
1792bab4ca [NETFILTER]: nfnetlink_log: remove excessive debugging
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:02 -08:00
Patrick McHardy
cd21f0ac43 [NETFILTER]: nfnetlink_{queue,log}: return ENOTSUPP for unknown cfg commands
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:01 -08:00
Patrick McHardy
c0506365a9 [NETFILTER]: nfnetlink_log: fix checks in nfulnl_recv_config
Similar to the nfnetlink_queue fixes:

The peer_pid must be checked in all cases when a logging instance exists,
additionally we must check whether an instance exists before attempting
to configure it to avoid NULL ptr dereferences.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:00 -08:00
Patrick McHardy
a7c42955e0 [NETFILTER]: nf_log: remove incomprehensible comment
Whatever that comment tries to say, I don't get it and it looks like
a leftover from the time when RCU wasn't used properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:00 -08:00
Patrick McHardy
7b2f9631e7 [NETFILTER]: nf_log: constify struct nf_logger and nf_log_packet loginfo arg
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:59 -08:00
Patrick McHardy
f01ffbd6e7 [NETFILTER]: nf_log: move logging stuff to seperate header
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:58 -08:00
Patrick McHardy
cc01dcbd26 [NETFILTER]: nf_nat: pass manip type instead of hook to nf_nat_setup_info
nf_nat_setup_info gets the hook number and translates that to the
manip type to perform. This is a relict from the time when one
manip per hook could exist, the exact hook number doesn't matter
anymore, its converted to the manip type. Most callers already
know what kind of NAT they want to perform, so pass the maniptype
in directly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:57 -08:00
Patrick McHardy
ce4b1cebdc [NETFILTER]: nf_nat: sprinkle a few __read_mostlys
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:57 -08:00
Patrick McHardy
2b628a0866 [NETFILTER]: nf_nat: mark NAT protocols const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:56 -08:00
Patrick McHardy
3ee9e76038 [NETFILTER]: nf_nat_proto_gre: add missing module reference
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:55 -08:00
Patrick McHardy
d978e5daec [NETFILTER]: ctnetlink: fix expectation timeout dumping
When the timer is late its timeout might be before the current time,
in which case a very large value is dumped.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:55 -08:00
Patrick McHardy
77236b6e33 [NETFILTER]: ctnetlink: use netlink attribute helpers
Use NLA_PUT_BE32, nla_get_be32() etc.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:54 -08:00
Pablo Neira Ayuso
c7212e9d39 [NETFILTER]: nf_conntrack_sctp: add ctnetlink support
This patch adds support for SCTP to ctnetlink.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:52 -08:00
Pablo Neira Ayuso
37fccd8577 [NETFILTER]: ctnetlink: add support for secmark
This patch adds support for James Morris' connsecmark.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:52 -08:00
Pablo Neira Ayuso
0f417ce989 [NETFILTER]: ctnetlink: add support for master tuple event notification and dumping
This patch adds support for master tuple event notification and
dumping.  Conntrackd needs this information to recover related
connections appropriately.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:51 -08:00
Pablo Neira Ayuso
13eae15a24 [NETFILTER]: ctnetlink: add support for NAT sequence adjustments
The combination of NAT and helpers may produce TCP sequence adjustments.
In failover setups, this information needs to be replicated in order to
achieve a successful recovery of mangled, related connections. This patch is
particularly useful for conntrackd, see:

http://people.netfilter.org/pablo/conntrack-tools/

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:50 -08:00
Benjamin LaHaise
170080645d [NETFILTER]: xt_TCPMSS: don't allow netfilter --setmss to increase mss
When terminating DSL connections for an assortment of random customers, I've
found it necessary to use iptables to clamp the MSS used for connections to
work around the various ICMP blackholes in the greater net.  Unfortunately,
the current behaviour in Linux is imperfect and actually make things worse,
so I'm proposing the following: increasing the MSS in a packet can never be
a good thing, so make --set-mss only lower the MSS in a packet.

Yes, I am aware of --clamp-mss-to-pmtu, but it doesn't work for outgoing
connections from clients (ie web traffic), as it only looks at the PMTU on
the destination route, not the source of the packet (the DSL interfaces in
question have a 1442 byte MTU while the destination ethernet interface is
1500 -- there are problematic hosts which use a 1300 byte MTU).  Reworking
that is probably a good idea at some point, but it's more work than this is.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:50 -08:00
Patrick McHardy
d6a2ba07c3 [NETFILTER]: arp_tables: add compat support
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:49 -08:00
Patrick McHardy
11f6dff8af [NETFILTER]: arp_tables: resync get_entries() with ip_tables
Resync get_entries() with ip_tables.c by moving the checks from the
setsockopt handler to the function itself.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:48 -08:00
Patrick McHardy
41acd975b9 [NETFILTER]: arp_tables: move ARPT_SO_GET_INFO handling to seperate function
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:47 -08:00
Patrick McHardy
27e2c26b85 [NETFILTER]: arp_tables: move counter allocation to seperate function
More resyncing with ip_tables.c as preparation for compat support.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:47 -08:00
Patrick McHardy
fb5b6095f3 [NETFILTER]: arp_tables: move entry and target checks to seperate functions
Resync with ip_tables.c as preparation for compat support.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:46 -08:00
Patrick McHardy
70f0bfcf6a [NETFILTER]: arp_tables: remove ipchains compat hack
Remove compatiblity hack copied from ip_tables.c - ipchains didn't even
support arp_tables :)

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:45 -08:00
Patrick McHardy
197631201e [NETFILTER]: arp_tables: use vmalloc_node()
Use vmalloc_node() as in ip_tables.c.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:45 -08:00
Patrick McHardy
03dafbbdf8 [NETFILTER]: arp_tables: remove obsolete standard_check function
The size check is already performed by xt_check_target, no need
to do it again.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:43 -08:00
Patrick McHardy
6d6a55f42d [NETFILTER]: ip_tables: remove ipchains compatibility hack
ipchains support has been removed years ago. kill last remains.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:42 -08:00
Patrick McHardy
da4d0f6b3d [NETFILTER]: ip6_tables: use raw_smp_processor_id() in do_add_counters()
Use raw_smp_processor_id() in do_add_counters() as in ip_tables.c.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:41 -08:00
Patrick McHardy
b5dd674b2a [NETFILTER]: ip6_tables: fix stack leagage
Fix leakage of local variable on stack. This already got fixed in
ip_tables silently by the compat patches.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:40 -08:00
Patrick McHardy
c9d8fe1317 [NETFILTER]: {ip,ip6}_tables: fix format strings
Use %zu for sizeof() and remove casts.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:39 -08:00
Patrick McHardy
9c54795950 [NETFILTER]: {ip,ip6}_tables: reformat to eliminate differences
Reformat ip_tables.c and ip6_tables.c in order to eliminate non-functional
differences and minimize diff output.

This allows to get a view of the real differences using:

sed -e 's/IP6T/IPT/g' \
    -e 's/IP6/IP/g' \
    -e 's/INET6/INET/g' \
    -e 's/ip6t/ipt/g' \
    -e 's/ip6/ip/g' \
    -e 's/ipv6/ip/g' \
    -e 's/icmp6/icmp/g' \
    net/ipv6/netfilter/ip6_tables.c | \
    diff -wup /dev/stdin net/ipv4/netfilter/ip_tables.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:39 -08:00
Patrick McHardy
1fe5723773 [NETFILTER]: xt_MARK: add compat support for revision 0
Old userspace doesn't support revision 1, especially for IPv6, which
is only available in the SVN snapshot.

Add compat support for revision 0.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:38 -08:00
Patrick McHardy
311af5cbea [NETFILTER]: xt_MARK: support revision 1 for IPv6
The current netfilter SVN version includes support for this, so enable
it in the kernel as well.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:37 -08:00
Patrick McHardy
34f4c4295e [NETFILTER]: x_tables: enable compat translation for IPv6 matches/targets
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:37 -08:00
Patrick McHardy
3bc3fe5eed [NETFILTER]: ip6_tables: add compat support
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:36 -08:00
Patrick McHardy
d924357c50 [NETFILTER]: ip6_tables: resync get_entries() with ip_tables
Resync get_entries() with ip_tables.c by moving the checks from the
setsockopt handler to the function itself.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:35 -08:00
Patrick McHardy
433665c9d1 [NETFILTER]: ip6_tables: move IP6T_SO_GET_INFO handling to seperate function
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:35 -08:00
Patrick McHardy
ed1a6f5e77 [NETFILTER]: ip6_tables: move counter allocation to seperate function
More resyncing with ip_tables.c as preparation for compat support.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:34 -08:00
Patrick McHardy
3b84e92b0d [NETFILTER]: ip6_tables: use vmalloc_node()
Consistently use vmalloc_node for all counter allocations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:33 -08:00
Patrick McHardy
f173c8a1f2 [NETFILTER]: ip6_tables: move entry, match and target checks to seperate functions
Resync with ip_tables.c as preparation for compat support.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:32 -08:00
Patrick McHardy
72f36ec14f [NETFILTER]: ip6_tables: kill a few useless defines/forward declarations
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:32 -08:00
Patrick McHardy
b386d9f596 [NETFILTER]: ip_tables: move compat offset calculation to x_tables
Its needed by ip6_tables and arp_tables as well.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:31 -08:00
Patrick McHardy
73cd598df4 [NETFILTER]: ip_tables: fix compat types
Use compat types and compat iterators when dealing with compat entries for
clarity. This doesn't actually make a difference for ip_tables, but is
needed for ip6_tables and arp_tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:30 -08:00
Patrick McHardy
30c08c41be [NETFILTER]: ip_tables: account for struct ipt_entry/struct compat_ipt_entry size diff
Account for size differences when dumping entries or calculating the
entry positions. This doesn't actually make any difference for IPv4
since the structures have the same size, but its logically correct
and needed for IPv6.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:30 -08:00
Patrick McHardy
8956695131 [NETFILTER]: x_tables: make xt_compat_match_from_user usable in iterator macros
Make xt_compat_match_from_user return an int to make it usable in the
*tables iterator macros and kill a now unnecessary wrapper function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:28 -08:00
Patrick McHardy
4b4782486d [NETFILTER]: ip_tables: reformat compat code
The compat code has some very odd formating, clean it up before porting
it to ip6_tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:27 -08:00
Patrick McHardy
ac8e27fd89 [NETFILTER]: ip_tables: kill useless wrapper
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:27 -08:00
Dan Williams
374fdfbc67 introduce WEXT scan capabilities
Introduce scan capabilities to WEXT so that userspace can do intelligent
things with scan behavior such as handling hidden SSIDs more gracefully.
If the driver reports a specific scan capability, the driver must
respect the options specified in the iw_scan_req structure when handling
the SIOCSIWSCAN call, unless it's mode or state does not allow it to do
so, in which case it must return an error.

This version switches to Dave Kilroy's suggestion of claiming unused
padding space for the scan_capa field.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:25 -08:00
Johannes Berg
c49e5ea322 mac80211: conditionally include timestamp in radiotap information
This makes mac80211 include the low-level MAC timestamp
in the radiotap header if the driver indicated (by a new
RX flag) that the timestamp is valid.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:25 -08:00
Gerrit Renker
a07a5a86d0 [DCCP]: Remove unused inline function
The function follows48(), which is a special-case of dccp_delta_seqno(),
is nowhere used in the DCCP code, thus removed by this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:24 -08:00
Gerrit Renker
52515e77a7 [CCID3]: Nofeedback timer according to rfc3448bis
This implements the changes to the nofeedback timer handling suggested
in draft rfc3448bis00, section 4.4. In particular, these changes mean:

 * better handling of the lossless case (p == 0)
 * the timestamp for computing t_ld becomes obsolete
 * much more recent document (RFC 3448 is almost 5 years old)
 * concepts in rfc3448bis arose from a real, working implementation
   (cf. sec. 12)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:23 -08:00
Gerrit Renker
d8d1252f74 [CCID3]: Implement rfc3448bis changes to feedback reception
This implements the algorithm to update the allowed sending rate X upon
receiving feedback packets, as described in draft rfc3448bis, 4.2/4.3.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:22 -08:00
Gerrit Renker
5bd370a63d [CCID3]: Remove two irrelevant states in TX feedback handling
* the NO_SENT state is only triggered in bidirectional mode,
   costing unnecessary processing.
 * the TERM (terminating) state is irrelevant.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:22 -08:00
Gerrit Renker
8e138e7949 [CCID3]: Use a function to update p_inv, and p is never used
This patch
 1) concentrates previously scattered computation of p_inv into one function;
 2) removes the `p' element of the CCID3 RX sock (it is redundant);
 3) makes the tfrc_rx_info structure standalone, only used on demand.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:21 -08:00
Vlad Yasevich
9ad0977fe1 [SCTP]: Use crc32c library for checksum calculations.
The crc32c library used an identical table and algorithm
as SCTP.  Switch to using the library instead of carrying
our own table.  Using crypto layer proved to have too
much overhead compared to using the library directly.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:20 -08:00
Herbert Xu
1bf40954cf [PACKET]: Fix /proc/net/packet crash due to bogus private pointer
The seq_open_net patch changed the meaning of seq->private.
Unfortunately it missed two spots in AF_PACKET, which still
used the old way of dereferencing seq->private, thus causing
weird and wonderful crashes when reading /proc/net/packet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:19 -08:00
Joe Perches
b5cb2bbc4c [IPV4] sctp: Use ipv4_is_<type>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:17 -08:00
Joe Perches
37ef8dd7f3 [IPV4] net/netfilter: Use ipv4_is_<type>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:16 -08:00
Joe Perches
f97c1e0c6e [IPV4] net/ipv4: Use ipv4_is_<type>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:15 -08:00
Joe Perches
21cf2253eb [IPV4] net/core: Use ipv4_is_<type>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:15 -08:00
Pavel Emelyanov
586f121152 [IPV4]: Switch users of ipv4_devconf(_all) to use the pernet one
These are scattered over the code, but almost all the
"critical" places already have the proper struct net
at hand except for snmp proc showing function and routing
rtnl handler.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:12 -08:00
Pavel Emelyanov
9355bbd685 [IPV4]: Switch users of ipv4_devconf_dflt to use the pernet one
They are all collected in the net/ipv4/devinet.c file and
mostly use the IPV4_DEVCONF_DFLT macro.

So I add the net parameter to it and patch users accordingly.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:11 -08:00
Pavel Emelyanov
752d14dc6a [IPV4]: Move the devinet pointers on the struct net
This is the core.

Add all and default pointers on the netns_ipv4 and register
a new pernet subsys to initialize them.

Also add the ctl_table_header to register the
net.ipv4.ip_forward ctl.

I don't allocate additional memory for init_net, but use
global devinets.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:11 -08:00
Pavel Emelyanov
c0ce9fb304 [IPV4]: Store the net pointer on devinet's ctl tables
Some handers and strategies of devinet sysctl tables need
to know the net to propagate the ctl change to all the
net devices.

I use the (currently unused) extra2 pointer on the tables
to get it.

Holding the reference on the struct net is not possible,
because otherwise we'll get a net->ctl_table->net circular
dependency. But since the ctl tables are unregistered during
the net destruction, this is safe to get it w/o additional
protection.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:10 -08:00
Pavel Emelyanov
32e569b727 [IPV4]: Pass the net pointer to the arp_req_set_proxy()
This one will need to set the IPV4_DEVCONF_ALL(PROXY_ARP), but
there's no ways to get the net right in place, so we have to
pull one from the inet_ioctl's struct sock.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:09 -08:00
Pavel Emelyanov
ea40b324d7 [IPV4]: Make __devinet_sysctl_register return an error
Currently, this function is void, so failures in creating
sysctls for new/renamed devices are not reported to anywhere.

Fixing this is another complex (needed?) task, but this
return value is needed during the namespaces creation to
handle the case, when we failed to create "all" and "default"
entries.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:09 -08:00
Pavel Emelyanov
4bda4f250d [XFRM]: Fix potential race vs xfrm_state(only)_find and xfrm_hash_resize.
The _find calls calculate the hash value using the
xfrm_state_hmask, without the xfrm_state_lock. But the
value of this mask can change in the _resize call under
the state_lock, so we risk to fail in finding the desired
entry in hash.

I think, that the hash value is better to calculate
under the state lock.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:07 -08:00
Herbert Xu
9055e051b8 [UDP]: Move udp_stats_in6 into net/ipv4/udp.c
Now that external users may increment the counters directly, we need
to ensure that udp_stats_in6 is always available.  Otherwise we'd
either have to requrie the external users to be built as modules or
ipv6 to be built-in.

This isn't too bad because udp_stats_in6 is just a pair of pointers
plus an EXPORT, e.g., just 40 (16 + 24) bytes on x86-64.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:06 -08:00
YOSHIFUJI Hideaki
8d614434ab [SUNRPC]: Use htonl() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:05 -08:00
YOSHIFUJI Hideaki
ae445d172a [RXRPC]: Use cpu_to_be32() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:04 -08:00
YOSHIFUJI Hideaki
f831e90971 [MAC80211]: Use htons() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:04 -08:00
YOSHIFUJI Hideaki
6d82de9e57 [IRDA]: Use htons() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:03 -08:00
YOSHIFUJI Hideaki
5661df7b6c [IPVS]: Use htons() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:02 -08:00
YOSHIFUJI Hideaki
4e39430ae3 [IEEE80211]: Use htons() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:02 -08:00
YOSHIFUJI Hideaki
b98999dc38 [DECNET]: Use htons() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:01 -08:00
YOSHIFUJI Hideaki
91c5ec3ed1 [BRIDGE]: Use cpu_to_be16() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:00 -08:00
Gerrit Renker
6179983ad3 [DCCP]: Introducing CCMPS
This introduces a CCMPS field for setting a CCID-specific upper bound on the application payload
size, as is defined in RFC 4340, section 14.

Only the TX CCID is considered in setting this limit, since the RX CCID generates comparatively
small (DCCP-Ack) feedback packets. The CCMPS field includes network and transport layer header
lengths. The only current CCMPS customer is CCID4 (via RFC 4828).

A wrapper is used to allow querying the CCMPS even at times where the CCID modules may not have
been fully negotiated yet.

In dccp_sync_mss() the variable `mss_now' has been renamed into `cur_mps', to reflect that we are
dealing with an MPS, but not an MSS.
Since the DCCP code closely follows the TCP code, the identifiers `dccp_sync_mss' and
`dccps_mss_cache' have been kept, as they have direct TCP counterparts.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:59 -08:00
Gerrit Renker
84a97b0af8 [CCID]: More informative registration
The patch makes the registration messages of CCID 2/3 a bit more
informative: instead of repeating the CCID number as currently done,

        "CCID: Registered CCID 2 (ccid2)"  or
        "CCID: Registered CCID 3 (ccid3)",

the descriptive names of the CCID's (from RFCs) are now used:

	"CCID: Registered CCID 2 (TCP-like)" and
	"CCID: Registered CCID 3 (TCP-Friendly Rate Control)".

To allow spaces in the name, the slab name string has been changed to
refer to the numeric CCID identifier, using the same format as before.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:58 -08:00
Gerrit Renker
9cb2345a8c [DCCP]: Documentation for CCID operations
This adds documentation for the ccid_operations structure.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:58 -08:00
Denis V. Lunev
f5026fabda [IPV4]: Thresholds in fib_trie.c are used as consts, so make them const.
There are several thresholds for trie fib hash management. They are used
in the code as a constants. Make them constants from the compiler point of
view.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:57 -08:00
Michal Schmidt
8a4a50f98b [IPV6] sit: Rebinding of SIT tunnels to other interfaces
This is similar to the change already done for IPIP tunnels.

Once created, a SIT tunnel can't be bound to another device.
To reproduce:

# create a tunnel:
ip tunnel add tunneltest0 mode sit remote 10.0.0.1 dev eth0
# try to change the bounding device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0

tunneltest0: ipv6/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit

Notice the bound device has not changed from eth0 to eth1.

This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:56 -08:00
Michal Schmidt
ee34c1eb35 [IP_GRE]: Rebinding of GRE tunnels to other interfaces
This is similar to the change already done for IPIP tunnels.

Once created, a GRE tunnel can't be bound to another device.
To reproduce:

# create a tunnel:
ip tunnel add tunneltest0 mode gre remote 10.0.0.1 dev eth0
# try to change the bounding device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0

tunneltest0: gre/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit

Notice the bound device has not changed from eth0 to eth1.

This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:56 -08:00
Denis V. Lunev
528c4ceb42 [IPV6]: Always pass a valid nl_info to inet6_rt_notify.
This makes the code in the inet6_rt_notify more straightforward and provides
groud for namespace passing.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:55 -08:00
Herbert Xu
aef2178599 [IPSEC]: Fix zero return value in xfrm_lookup on error
Further testing shows that my ICMP relookup patch can cause xfrm_lookup
to return zero on error which isn't very nice since it leads to the caller
dying on null pointer dereference.  The bug is due to not setting err
to ENOENT just before we leave xfrm_lookup in case of no policy.

This patch moves the err setting to where it should be.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:54 -08:00
Gerrit Renker
cf86314cb7 [DCCP]: Ignore feature negotiation on Data packets
This implements [RFC 4340, p. 32]: "any feature negotiation options received
on DCCP-Data packets MUST be ignored".

Also added a FIXME for further processing, since the code currently (wrongly)
classifies empty Confirm options as invalid - this needs to be resolved in
a separate patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:54 -08:00
Gerrit Renker
5cdae198de [DCCP]: Make code assumptions explicit
This removes several `XXX' references which indicate a missing support
for non-1-byte feature values: this is unnecessary, as all currently known
(standardised) SP feature values are 1-byte quantities.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:53 -08:00
Gerrit Renker
dd6303df09 [DCCP]: Remove unused and redundant validation functions
This removes two inlines which were both called in a single function only:

 1) dccp_feat_change() is always called with either DCCPO_CHANGE_L or DCCPO_CHANGE_R as argument
    * from dccp_set_socktopt_change() via do_dccp_setsockopt() with DCCP_SOCKOPT_CHANGE_R/L
    * from __dccp_feat_init() via dccp_feat_init() also with DCCP_SOCKOPT_CHANGE_R/L.

    Hence the dccp_feat_is_valid_type() is completely unnecessary and always returns true.

 2) Due to (1), the length test reduces to 'len >= 4', which in turn makes
    dccp_feat_is_valid_length() unnecessary.

Furthermore, the inline function dccp_feat_is_reserved() was unfolded,
since only called in a single place.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:52 -08:00
Gerrit Renker
af3b867e2f [DCCP]: Support inserting options during the 3-way handshake
This provides a separate routine to insert options during the initial handshake.
The main purpose is to conduct feature negotiation, for the moment the only user
is the timestamp echo needed for the (CCID3) handshake RTT sample.

Padding of options has been put into a small separate routine, to be shared among
the two functions. This could also be used as a generic routine to finish inserting
options.

Also removed an `XXX' comment since its content was obvious.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:52 -08:00
Gerrit Renker
b4d4f7c70f [DCCP]: Handle timestamps on Request/Response exchange separately
In DCCP, timestamps can occur on packets anytime, CCID3 uses a timestamp(/echo) on the Request/Response
exchange. This patch addresses the following situation:
	* timestamps are recorded on the listening socket;
	* Responses are sent from dccp_request_sockets;
	* suppose two connections reach the listening socket with very small time in between:
	* the first timestamp value gets overwritten by the second connection request.

This is not really good, so this patch separates timestamps into
 * those which are received by the server during the initial handshake (on dccp_request_sock);
 * those which are received by the client or the client after connection establishment.

As before, a timestamp of 0 is regarded as indicating that no (meaningful) timestamp has been
received (in addition, a warning message is printed if hosts send 0-valued timestamps).

The timestamp-echoing now works as follows:
 * when a timestamp is present on the initial Request, it is placed into dreq, due to the
   call to dccp_parse_options in dccp_v{4,6}_conn_request;
 * when a timestamp is present on the Ack leading from RESPOND => OPEN, it is copied over
   from the request_sock into the child cocket in dccp_create_openreq_child;
 * timestamps received on an (established) dccp_sock are treated as before.

Since Elapsed Time is measured in hundredths of milliseconds (13.2), the new dccp_timestamp()
function is used, as it is expected that the time between receiving the timestamp and
sending the timestamp echo will be very small against the wrap-around time. As a byproduct,
this allows smaller timestamping-time fields.

Furthermore, inserting the Timestamp Echo option has been taken out of the block starting with
'!dccp_packet_without_ack()', since Timestamp Echo can be carried on any packet (5.8 and 13.3).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:51 -08:00
Gerrit Renker
8109616e2e [DCCP]: Add (missing) option parsing to request_sock processing
This adds option-parsing code to processing of Acks in the listening state
on request_socks on the server, serving two purposes
 (i)  resolves a FIXME (removed);
 (ii) paves the way for feature-negotiation during connection-setup.

There is an intended subtlety here with regard to dccp_check_req:

 Parsing options happens only after testing whether the received packet is
 a retransmitted Request.  Otherwise, if the Request contained (a possibly
 large number of) feature-negotiation options, recomputing state would have to
 happen each time a retransmitted Request arrives, which opens the door to an
 easy DoS attack.  Since in a genuine retransmission the options should not be
 different from the original, reusing the already computed state seems better.

 The other point is - if there are timestamp options on the Request, they will
 not be answered; which means that in the presence of retransmission (likely
 due to loss and/or other problems), the use of Request/Response RTT sampling
 is suspended, so that startup problems here do not propagate.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:50 -08:00
Gerrit Renker
8b81941248 [DCCP]: Allow to parse options on Request Sockets
The option parsing code currently only parses on full sk's. This causes a problem for
options sent during the initial handshake (in particular timestamps and feature-negotiation
options). Therefore, this patch extends the option parsing code with an additional argument
for request_socks: if it is non-NULL, options are parsed on the request socket, otherwise
the normal path (parsing on the sk) is used.

Subsequent patches, which implement feature negotiation during connection setup, make use
of this facility.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:50 -08:00
Gerrit Renker
7913350663 [DCCP]: Collapse repeated `len' statements into one
This replaces 4 individual assignments for `len' with a single
one, placed where the control flow of those 4 leads to.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:49 -08:00
Gerrit Renker
b8599d2070 [DCCP]: Support for server holding timewait state
This adds a socket option and signalling support for the case where the server
holds timewait state on closing the connection, as described in RFC 4340, 8.3.

Since holding timewait state at the server is the non-usual case, it is enabled
via a socket option. Documentation for this socket option has been added.

The setsockopt statement has been made resilient against different possible cases
of expressing boolean `true' values using a suggestion by Ian McDonald.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:48 -08:00
Gerrit Renker
28be544004 [DCCP]: Use maximum-RTO backoff from DCCP spec
This removes another Fixme, using the TCP maximum RTO rather than the value
specified by the DCCP specification. Across the sections in RFC 4340, 64
seconds is consistently suggested as maximum RTO backoff value; and this is
the value which is now used.

I have checked both termination cases for retransmissions of Close/CloseReq:
with the default value 15 of `retries2', and an initial icsk_retransmit = 0,
it takes about 614 seconds to declare a non-responding peer as dead, after
which the final terminating Reset is sent. With the TCP maximum RTO value of
120 seconds it takes (as might be expected) almost twice as long, about 23
minutes.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:47 -08:00
Gerrit Renker
92d31920b8 [DCCP]: Shift the retransmit timer for active-close into output.c
When performing active close, RFC 4340, 8.3. requires to retransmit the
Close/CloseReq with a backoff-retransmit timer starting at intially 2 RTTs.

This patch shifts the existing code for active-close retransmit timer
into output.c, so that the retransmit timer is started when the first
Close/CloseReq is sent. Previously, the timer was started when, after
releasing the socket in dccp_close(), the actively-closing side had not yet
reached the CLOSED/TIMEWAIT state.

The patch further reduces the initial timeout from 3 seconds to the required
2 RTTs, where - in absence of a known RTT - the fallback value specified in
RFC 4340, 3.4 is used.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:47 -08:00
Daniel Lezcano
09f7709f49 [IPV6]: fix section mismatch warnings
Removed useless and buggy __exit section in the different
ipv6 subsystems. Otherwise they will be called inside an
init section during rollbacking in case of an error in the
protocol initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:46 -08:00
Gerrit Renker
69567d0b63 [DCCP]: Perform SHUT_RD and SHUT_WR on receiving close
This patch performs two changes:

1) Close the write-end in addition to the read-end when a fin-like segment
  (Close or CloseReq) is received by DCCP. This accounts for the fact that DCCP,
  in contrast to TCP, does not have a half-close. RFC 4340 says in this respect
  that when a fin-like segment has been sent there is no guarantee at all that
  any   further data will be processed.
  Thus this patch performs SHUT_WR in addition to the SHUT_RD when a fin-like
  segment is encountered.

2) Minor change: I noted that code appears twice in different places and think it
   makes sense to put this into a self-contained function (dccp_enqueue()).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:45 -08:00
Herbert Xu
96eba69dba [DECNET]: Fix inverted wait flag in xfrm_lookup call
My previous patch made the wait flag take the opposite value to what
it should be.  This patch fixes that.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:44 -08:00
Herbert Xu
a66207121f [NET]: Check RTNL status in unregister_netdevice
The caller must hold the RTNL so let's check it in unregister_netdevice.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:43 -08:00
Herbert Xu
aebcf82c1f [IPSEC]: Do not let packets pass when ICMP flag is off
This fixes a logical error in ICMP policy checks which lets
packets through if the state ICMP flag is off.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:43 -08:00
Herbert Xu
bb72845e69 [IPSEC]: Make callers of xfrm_lookup to use XFRM_LOOKUP_WAIT
This patch converts all callers of xfrm_lookup that used an
explicit value of 1 to indiciate blocking to use the new flag
XFRM_LOOKUP_WAIT.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:42 -08:00
Herbert Xu
7233b9f33e [IPSEC]: Fix reversed ICMP6 policy check
The policy check I added for ICMP on IPv6 is reversed.  This
patch fixes that.

It also adds an skb->sp check so that unprotected packets that
fail the policy check do not crash the machine.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:41 -08:00
Michal Schmidt
5533995b62 [IPIP]: Allow rebinding the tunnel to another interface
Once created, an IP tunnel can't be bound to another device.
(reported as https://bugzilla.redhat.com/show_bug.cgi?id=419671)

To reproduce:

# create a tunnel:
ip tunnel add tunneltest0 mode ipip remote 10.0.0.1 dev eth0
# try to change the bounding device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0

tunneltest0: ip/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit

Notice the bound device has not changed from eth0 to eth1.

This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.

If the change is acceptable, I'll do the same for GRE and SIT tunnels.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:25 -08:00
Denis V. Lunev
81103a52f2 [NETNS]: network namespace was passed into dev_getbyhwaddr but not used
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:24 -08:00
Herbert Xu
8b7817f3a9 [IPSEC]: Add ICMP host relookup support
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch implements this
for ICMP traffic that originates from or terminates on localhost.

This is activated on outbound with the new policy flag XFRM_POLICY_ICMP,
and on inbound by the new state flag XFRM_STATE_ICMP.

On inbound the policy check is now performed by the ICMP protocol so
that it can repeat the policy check where necessary.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:23 -08:00
Herbert Xu
d5422efe68 [IPSEC]: Added xfrm_decode_session_reverse and xfrmX_policy_check_reverse
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch adds the functions
xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get
the reverse flow to perform such a lookup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:22 -08:00
Herbert Xu
815f4e57e9 [IPSEC]: Make xfrm_lookup flags argument a bit-field
This patch introduces an enum for bits in the flags argument of xfrm_lookup.
This is so that we can cram more information into it later.

Since all current users use just the values 0 and 1, XFRM_LOOKUP_WAIT has
been added with the value 1 << 0 to represent the current meaning of flags.

The test in __xfrm_lookup has been changed accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:21 -08:00
Gerrit Renker
3f71c81ac3 [TFRC]: Remove previous loss intervals implementation
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:20 -08:00
Gerrit Renker
954c2db868 [CCID3]: Interface CCID3 code with newer Loss Intervals Database
This hooks up the TFRC Loss Interval database with CCID 3 packet reception.
In addition, it makes the CCID-specific computation of the first loss
interval (which requires access to all the guts of CCID3) local to ccid3.c.

The patch also fixes an omission in the DCCP code, that of a default /
fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while
at it, the  upper bound of 4 seconds for an RTT sample has  been reduced to
match the initial TCP RTO value of 3 seconds from[RFC 1122, 4.2.3.1].

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:20 -08:00
Gerrit Renker
de0d411cb8 [TFRC]: CCID3 (and CCID4) needs to access these inlines
This moves two inlines back to packet_history.h: these are not private
to packet_history.c, but are needed by CCID3/4 to detect whether a new
loss is indicated, or whether a loss is already pending.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:19 -08:00
Gerrit Renker
db64196038 [CCID3]: Redundant debugging output / documentation
Each time feedback is sent two lines are printed:

	ccid3_hc_rx_send_feedback: client ... - entry
	ccid3_hc_rx_send_feedback: Interval ...usec, X_recv=..., 1/p=...

The first line is redundant and thus removed.

Further, documentation of ccid3_hc_rx_sock (capitalisation) is made consistent.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:18 -08:00
Gerrit Renker
8a9c7e92e0 [TFRC]: Ringbuffer to track loss interval history
A ringbuffer-based implementation of loss interval history is easier to
maintain, allocate, and update.

The `swap' routine to keep the RX history sorted is due to and was written
by Arnaldo Carvalho de Melo, simplifying an earlier macro-based variant.

Details:
 * access to the Loss Interval Records via macro wrappers (with safety checks);
 * simplified, on-demand allocation of entries (no extra memory consumption on
   lossless links); cache allocation is local to the module / exported as service;
 * provision of RFC-compliant algorithm to re-compute average loss interval;
 * provision of comprehensive, new loss detection algorithm
 	- support for all cases of loss, including re-ordered/duplicate packets;
 	- waiting for NDUPACK=3 packets to fill the hole;
	- updating loss records when a late-arriving packet fills a hole.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:18 -08:00
Gerrit Renker
8995a238ef [TFRC]: Loss interval code needs the macros/inlines that were moved
This moves the inlines (which were previously declared as macros) back into
packet_history.h since the loss detection code needs to be able to read entries
from the RX history in order to create the relevant loss entries: it needs at
least tfrc_rx_hist_loss_prev() and tfrc_rx_hist_last_rcv(), which in turn
require the definition of the other inlines (macros).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:16 -08:00
Gerrit Renker
df8f83fdd6 [TFRC]: Put RX/TX initialisation into tfrc.c
This separates RX/TX initialisation and puts all packet history / loss intervals
initialisation into tfrc.c.
The organisation is uniform: slab declaration -> {rx,tx}_init() -> {rx,tx}_exit()

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:15 -08:00
Denis V. Lunev
2aaef4e47f [NETNS]: separate af_packet netns data
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:15 -08:00
Denis V. Lunev
a0a53c8ba9 [NETNS]: struct net content re-work (v3)
Recently David Miller and Herbert Xu pointed out that struct net becomes
overbloated and un-maintainable. There are two solutions:
- provide a pointer to a network subsystem definition from struct net.
  This costs an additional dereferrence
- place sub-system definition into the structure itself. This will speedup
  run-time access at the cost of recompilation time

The second approach looks better for us. Other sub-systems will follow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:14 -08:00
Daniel Lezcano
7f4e4868f3 [IPV6]: make the protocol initialization to return an error code
This patchset makes the different protocols to return an error code, so
the af_inet6 module can check the initialization was correct or not.

The raw6 was taken into account to be consistent with the rest of the
protocols, but the registration is at the same place.
Because the raw6 has its own init function, the proto and the ops structure
can be moved inside the raw6.c file.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:13 -08:00
Daniel Lezcano
87c3efbfdd [IPV6]: make inet6_register_protosw to return an error code
This patch makes the inet6_register_protosw to return an error code.
The different protocols can be aware the registration was successful or
not and can pass the error to the initial caller, af_inet6.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:12 -08:00
Daniel Lezcano
853cbbaaa4 [IPV6]: make frag to return an error at initialization
This patch makes the frag_init to return an error code, so the af_inet6
module can handle the error.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:11 -08:00
Daniel Lezcano
248b238dc9 [IPV6]: make extended headers to return an error at initialization
This patch factorize the code for the differents init functions for rthdr,
nodata, destopt in a single function exthdrs_init.
This function returns an error so the af_inet6 module can check correctly
the initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:10 -08:00
Daniel Lezcano
0a3e78ac2c [IPV6]: make flowlabel to return an error
This patch makes the flowlab subsystem to return an error code and makes
some cleanup with procfs ifdefs.
The af_inet6 will use the flowlabel init return code to check the initialization
was correct.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:10 -08:00
Pavel Emelyanov
51602b2a5e [IPV4]: Cleanup sysctl manipulations in devinet.c
This includes:

 * moving neigh_sysctl_(un)register calls inside
   devinet_sysctl_(un)register ones, as they are always
   called in pairs;
 * making __devinet_sysctl_unregister() to unregister
   the ipv4_devconf struct, while original devinet_sysctl_unregister()
   works with the in_device to handle both - devconf and
   neigh sysctls;
 * make stubs for CONFIG_SYSCTL=n case to get rid of
   in-code ifdefs.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:09 -08:00
Pavel Emelyanov
95c9382a34 [INET]: Use BUILD_BUG_ON in inet_timewait_sock.c checks
Make the INET_TWDR_TWKILL_SLOTS vs sizeof(twdr->thread_slots)
check nicer.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:08 -08:00
Pavel Emelyanov
1f9e636ea2 [TCP]: Use BUILD_BUG_ON for tcp_skb_cb size checking
The sizeof(struct tcp_skb_cb) should not be less than the
sizeof(skb->cb). This is checked in net/ipv4/tcp.c, but
this check can be made more gracefully.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:07 -08:00
Eric Dumazet
ea72912c88 [NETLINK]: kzalloc() conversion
nl_pid_hash_alloc() is renamed to nl_pid_hash_zalloc().
It is now returning zeroed memory to its callers.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:06 -08:00
Eric Dumazet
64b7d96167 [NET]: dst_ifdown() cleanup
This cleanup shrinks size of net/core/dst.o on i386 from 1299 to 1289 bytes.
(This is because dev_hold()/dev_put() are doing atomic_inc()/atomic_dec() and
force compiler to re-evaluate memory contents.)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:05 -08:00
Herbert Xu
005011211f [IPSEC]: Add xfrm_input_state helper
This patch adds the xfrm_input_state helper function which returns the
current xfrm state being processed on the input path given an sk_buff.
This is currently only used by xfrm_input but will be used by ESP upon
asynchronous resumption.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:05 -08:00
Gerrit Renker
385ac2e3f2 [CCID3]: HC-receiver should not insert timestamps as HC-sender doesn't uses it
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:04 -08:00
Gerrit Renker
797eba424d [TFRC]: The function tfrc_rx_hist_entry_delete() is not used anymore
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:03 -08:00
Gerrit Renker
78282d2af5 [TFRC]: Move comment.
Moved up the comment "Receiver routines" above the first occurrence of
RX history routines.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:03 -08:00
YOSHIFUJI Hideaki
c69bce20dd [NET]: Remove unused "mibalign" argument for snmp_mib_init().
With fixes from Arnaldo Carvalho de Melo.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:02 -08:00
Denis V. Lunev
971b893e79 [IPV4]: last default route is a fib table property
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:01 -08:00
Denis V. Lunev
a2bbe6822f [IPV4]: Unify assignment of fi to fib_result
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:01 -08:00
Denis V. Lunev
c17860a039 [IPV4]: no need pass pointer to a default into fib_detect_death
ipv4: no need pass pointer to a default into fib_detect_death

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:00 -08:00
Daniel Lezcano
7e5449c215 [IPV6]: route6 remove ifdef for fib_rules
The patch defines the usual static inline functions when the code is
disabled for fib6_rules. That's allow to remove some ifdef in route.c
file and make the code a little more clear.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:59 -08:00
Daniel Lezcano
c35b7e72cd [IPV6]: remove ifdef in route6 for xfrm6
The following patch create the usual static inline functions to disable
the xfrm6_init and xfrm6_fini function when XFRM is off.
That's allow to remove some ifdef and make the code a little more clear.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:59 -08:00
Daniel Lezcano
75314fb383 [IPV6]: create route6 proc init-fini functions
Make the proc creation/destruction to be a separate function. That
allows to remove the #ifdef CONFIG_PROC_FS in the init/fini function
and make them more readable.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:58 -08:00
Pavel Emelyanov
b8e1f9b5c3 [NET] sysctl: make sysctl_somaxconn per-namespace
Just move the variable on the struct net and adjust
its usage.

Others sysctls from sys.net.core table are more
difficult to virtualize (i.e. make them per-namespace),
but I'll look at them as well a bit later.

Signed-off-by: Pavel Emelyanov <xemul@oenvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:57 -08:00
Pavel Emelyanov
790a353289 [NET] sysctl: prepare core tables to point to netns variables
Some of ctl variables are going to be on the struct
net. Here's the way to adjust the ->data pointer on the
ctl_table-s to point on the right variable.

Since some pointers still point on the global variables,
I keep turning the write bits off on such tables.

This looks to become a common procedure for net sysctls,
so later parts of this code may migrate to some more
generic place.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:56 -08:00
Pavel Emelyanov
024626e36d [NET] sysctl: make the sys.net.core sysctls per-namespace
Making them per-namespace is required for the following
two reasons:

 First, some ctl values have a per-namespace meaning.
 Second, making them writable from the sub-namespace
 is an isolation hole.

So I introduce the pernet operations to create these
tables. For init_net I use the existing statically
declared tables, for sub-namespace they are duplicated
and the write bits are removed from the mode.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:56 -08:00
Denis Cheng
b5e78337b5 [IUCV]: use LIST_HEAD instead of LIST_HEAD_INIT
these three list_head are all local variables, but can also use
LIST_HEAD.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:54 -08:00
Denis Cheng
df01812eba [XFRM] net/xfrm/xfrm_state.c: use LIST_HEAD instead of LIST_HEAD_INIT
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:54 -08:00
Denis Cheng
0e3cf7e916 [X25]: use LIST_HEAD instead of LIST_HEAD_INIT
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:53 -08:00
Denis Cheng
14d0e7b74e [LAPB] net/lapb/lapb_iface.c: use LIST_HEAD instead of LIST_HEAD_INIT
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:52 -08:00
Denis Cheng
1596c97aa8 [IPV4] net/ipv4/cipso_ipv4.c: use LIST_HEAD instead of LIST_HEAD_INIT
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:52 -08:00
Denis Cheng
3b5b34fd2b [NET] net/core/dev.c: use LIST_HEAD instead of LIST_HEAD_INIT
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:51 -08:00
Eric W. Biederman
877a9bff38 [IPV4]: Move trie_local and trie_main into the proc iterator.
We only use these variables when displaying the trie in proc so
place them into the iterator to make this explicit.  We should
probably do something smarter to handle the CONFIG_IP_MULTIPLE_TABLES
case but at least this makes it clear that the silliness is limited
to the display in /proc.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:49 -08:00
Eric W. Biederman
bb80317586 [IPV4]: Remove ip_fib_local_table and ip_fib_main_table defines.
There are only 2 users and it doesn't hurt to call fib_get_table
instead, and it makes it easier to make the fib network namespace
aware.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:49 -08:00
Daniel Lezcano
f845ab6b7d [IPV6] route6/fib6: Don't panic a kmem_cache_create.
If the kmem_cache_creation fails, the kernel will panic. It is
acceptable if the system is booting, but if the ipv6 protocol is
compiled as a module and it is loaded after the system has booted, do
we want to panic instead of just failing to initialize the protocol ?

The init function is now returning an error and this one is checked
for protocol initialization. So the ipv6 protocol will safely fails.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:48 -08:00
Daniel Lezcano
e2fddf5e96 [IPV6]: Make af_inet6 to check ip6_route_init return value.
The af_inet6 initialization function does not check the return code of
the route initilization, so if something goes wrong, the protocol
initialization will continue anyway.  This patch takes into account
the modification made in the different route's initialization
subroutines to check the return value and to make the protocol
initialization to fail.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:47 -08:00
Daniel Lezcano
433d49c3bb [IPV6]: Make ip6_route_init to return an error code.
The route initialization function does not return any value to notify
if the initialization is successful or not. This patch checks all
calls made for the initilization in order to return a value for the
caller.

Unfortunately, proc_net_fops_create will return a NULL pointer if
CONFIG_PROC_FS is off, so we can not check the return code without an
ifdef CONFIG_PROC_FS block in the ip6_route_init function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:47 -08:00
Daniel Lezcano
9eb87f3f7e [IPV6]: Make fib6_rules_init to return an error code.
When the fib_rules initialization finished, no return code is provided
so there is no way to know, for the caller, if the initialization has
been successful or has failed. This patch fix that.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:46 -08:00
Daniel Lezcano
0013cabab3 [IPV6]: Make xfrm6_init to return an error code.
The xfrm initialization function does not return any error code, so if
there is an error, the caller can not be advise of that.  This patch
checks the return code of the different called functions in order to
return a successful or failed initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:45 -08:00
Daniel Lezcano
d63bddbe90 [IPV6]: Make fib6_init to return an error code.
If there is an error in the initialization function, nothing is
followed up to the caller. So I add a return value to be set for the
init function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:45 -08:00
Denis V. Lunev
5a3e55d68e [NET]: Multiple namespaces in the all dst_ifdown routines.
Move dst entries to a namespace loopback to catch refcounting leaks.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:44 -08:00
Arnaldo Carvalho de Melo
b84a2189c4 [TFRC]: New rx history code
Credit here goes to Gerrit Renker, that provided the initial implementation for
this new codebase.

I modified it just to try to make it closer to the existing API, renaming some
functions, add namespacing and fix one bug where the tfrc_rx_hist_alloc was not
freeing the allocated ring entries on the error path.

Original changeset comment from Gerrit:
      -----------
This provides a new, self-contained and generic RX history service for TFRC
based protocols.

Details:
 * new data structure, initialisation and cleanup routines;
 * allocation of dccp_rx_hist entries local to packet_history.c,
   as a service exported by the dccp_tfrc_lib module.
 * interface to automatically track highest-received seqno;
 * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
 * a generic function to test for `data packets' as per  RFC 4340, sec. 7.7.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:43 -08:00
Gerrit Renker
30a0eacd47 [CCID3]: The receiver of a half-connection does not set window counter values
Only the sender sets window counters [RFC 4342, sections 5 and 8.1].

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:43 -08:00
Arnaldo Carvalho de Melo
d58d1af03a [TFRC]: Rename dccp_rx_ to tfrc_rx_
This is in preparation for merging the new rx history code written by Gerrit Renker.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:42 -08:00
Arnaldo Carvalho de Melo
34a9e7ea91 [TFRC]: Make the rx history slab be global
This is in preparation for merging the new rx history code written by Gerrit Renker.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:41 -08:00
Arnaldo Carvalho de Melo
e9c8b24a6a [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:40 -08:00
Gerrit Renker
2180c41ca5 [DCCP]: Introduce generic function to test for `data packets'
as per  RFC 4340, sec. 7.7.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:40 -08:00
Gerrit Renker
c40616c597 [TFRC]: Provide central source file and debug facility
This patch changes the tfrc_lib module in the following manner:

 (1) a dedicated tfrc source file to call the packet history &
     loss interval init/exit functions.
 (2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'.

Commiter note: renamed tfrc_module.c to tfrc.c, and made CONFIG_IP_DCCP_CCID3
select IP_DCCP_TFRC_LIB.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:39 -08:00
Pavel Emelyanov
f8b33fdfaf [ARP]: Consolidate some code in arp_req_set/delete_publc
The PROXY_ARP is set on devconfigs in a similar way in
both calls.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:38 -08:00
Pavel Emelyanov
46479b4329 [ARP]: Minus one level of ndentation in arp_req_delete
The same cleanup for deletion requests.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:38 -08:00
Pavel Emelyanov
43dc170117 [ARP]: Minus one level of indentation in arp_req_set
The ATF_PUBL requests are handled completely separate from
the others. Emphasize it with a separate function. This also
reduces the indentation level.

The same issue exists with the arp_delete_request, but
when I tried to make it in one patch diff produced completely
unreadable patch. So I split it into two, but they may be
done with one commit.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:37 -08:00
Pavel Emelyanov
1ff1cc202e [IPV4] ROUTE: Convert rt_hash_lock_init() macro into function
There's no need in having this function exist in a form
of macro. Properly formatted function looks much better.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:36 -08:00
Pavel Emelyanov
107f163428 [IPV4] ROUTE: Clean up proc files creation.
The rt_cache, stats/rt_cache and rt_acct(optional) files
creation looks a bit messy. Clean this out and join them
to other proc-related functions under the proper ifdef.

The struct net * argument in a new function will help net
namespaces patches look nicer.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:36 -08:00
Pavel Emelyanov
78c686e9fa [IPV4] ROUTE: Collect proc-related functions together
The net/ipv4/route.c file declares some entries for proc
to dump some routing info. The reading functions are
scattered over this file - collect them together.

Besides, remove a useless IP_RT_ACCT_CPU macro.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:35 -08:00
Herbert Xu
a59322be07 [UDP]: Only increment counter on first peek/recv
The previous move of the the UDP inDatagrams counter caused each
peek of the same packet to be counted separately.  This may be
undesirable.

This patch fixes this by adding a bit to sk_buff to record whether
this packet has already been seen through skb_recv_datagram.  We
then only increment the counter when the packet is seen for the
first time.

The only dodgy part is the fact that skb_recv_datagram doesn't have
a good way of returning this new bit of information.  So I've added
a new function __skb_recv_datagram that does return this and made
skb_recv_datagram a wrapper around it.

The plan is to eventually replace all uses of skb_recv_datagram with
this new function at which time it can be renamed its proper name.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:34 -08:00
Herbert Xu
1781f7f580 [UDP]: Restore missing inDatagrams increments
The previous move of the the UDP inDatagrams counter caused the
counting of encapsulated packets, SUNRPC data (as opposed to call)
packets and RXRPC packets to go missing.

This patch restores all of these.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:33 -08:00
Herbert Xu
27ab256864 [UDP]: Avoid repeated counting of checksum errors due to peeking
Currently it is possible for two processes to peek on the same socket
and end up incrementing the error counter twice for the same packet.

This patch fixes it by making skb_kill_datagram return whether it
succeeded in unlinking the packet and only incrementing the counter
if it did.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:32 -08:00
Pavel Emelyanov
c8fecf2242 [IPV6]: Eliminate difference in actions of sysctl and proc handler for conf.all.forwarding
The only difference in this case is that updating all.forwarding
causes the update in default.forwarding when done via proc, but
not via the system call.

Besides, this consolidates a good portion of code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:32 -08:00
Pavel Emelyanov
68dd299bc8 [INET]: Merge sys.net.ipv4.ip_forward and sys.net.ipv4.conf.all.forwarding
AFAIS these two entries should do the same thing - change the
forwarding state on ipv4_devconf and on all the devices.

I propose to merge the handlers together using ctl paths.

The inet_forward_change() is static after this and I move
it higher to be closer to other "propagation" helpers and
to avoid diff making patches based on { and } matching :)
i.e. - make them easier to read.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:31 -08:00
Pavel Emelyanov
4d43b78ac2 [IPV6]: Use sysctl paths to register ipv6 sysctl tables
I have already done this for core, ipv4 and tr tables, so repeat this
for the ipv6 ones.

This makes the ipv6.ko smaller and creates the ground needed for net
namespaces support in ipv6.ko ssctls.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:30 -08:00
Pavel Emelyanov
4a61b586cd [IPV6]: Make the ipv6/sysctl_net_ipv6.c compilation cleaner
Since this file is entirely enclosed with the
#ifdef CONFIG_SYSCTL/#endif pair, it's OK to move this
CONFIG_ into a Makefile.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:29 -08:00
Pavel Emelyanov
08913681e4 [NET]: Remove the empty net_table
I have removed all the entries from this table (core_table,
ipv4_table and tr_table), so now we can safely drop it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:29 -08:00
Pavel Emelyanov
36f0bebd98 [TR]: Use ctl paths to register net/token-ring/ table
The same thing for token-ring - use ctl paths and get
rid of external references on the tr_table.

Unfortunately, I couldn't split this patch into cleanup and
use-the-paths parts.

As a lame excuse I can say, that the cleanup is just moving
the tr_table from one file to another - closet to a single
variable, that this ctl table tunes. Since the source  file
becomes empty after the move, I remove it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:28 -08:00
Pavel Emelyanov
3e37c3f997 [IPV4]: Use ctl paths to register net/ipv4/ table
This is the same as I did for the net/core/ table in the
second patch in his series: use the paths and isolate the
whole table in the .c file.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:27 -08:00
Pavel Emelyanov
9ba6397976 [IPV4]: Cleanup the sysctl_net_ipv4.c file
This includes several cleanups:

 * tune Makefile to compile out this file when SYSCTL=n. Now
   it looks like net/core/sysctl_net_core.c one;
 * move the ipv4_config to af_inet.c to exist all the time;
 * remove additional sysctl_ip_nonlocal_bind declaration
   (it is already declared in net/ip.h);
 * remove no nonger needed ifdefs from this file.

This is a preparation for using ctl paths for net/ipv4/
sysctl table.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:27 -08:00
Pavel Emelyanov
33eb9cfc70 [NET]: Isolate the net/core/ sysctl table
Using ctl paths we can put all the stuff, related to net/core/
sysctl table, into one file and remove all the references on it.

As a good side effect this hides the "core_table" name from
the global scope :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:26 -08:00
Pavel Emelyanov
7e2e109cef [NET]: Remove unneeded ifdefs from sysctl_net_core.c
This file is already compiled out when the SYSCTL=n, so
these ifdefs, that enclose the whole file, can be removed.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:25 -08:00
Patrick McHardy
2eeeba390a [NETFILTER]: Select CONFIG_NETFILTER_NETLINK when needed
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:25 -08:00
Patrick McHardy
ab4f58c77a [NETFILTER]: remove NF_CONNTRACK_ENABLED option
Remove the NF_CONNTRACK_ENABLED option. It was meant for a smoother upgrade
to nf_conntrack, people having reconfigured their kernel at least once since
ip_conntrack was removed will have the NF_CONNTRACK option already set.
People upgrading from older kernels have to reconfigure a lot anyway.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:24 -08:00
Patrick McHardy
4ad9d4fa94 [NETFILTER]: nfnetlink_queue: update copyright
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:23 -08:00
Patrick McHardy
0ef0f46580 [NETFILTER]: nfnetlink_queue: remove useless enqueue status codes
The queueing core doesn't care about the exact return value from
the queue handler, so there's no need to go through the trouble
of returning a meaningful value as long as we indicate an error.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:23 -08:00
Patrick McHardy
861934c7c8 [NETFILTER]: nfnetlink_queue: eliminate impossible switch case
We don't need a default case in nfqnl_build_packet_message(), the
copy_mode is validated when it is set. Tell the compiler about
the possible types and remove the default case. Saves 80b of
text on x86_64.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:22 -08:00
Patrick McHardy
ea3a66ff5a [NETFILTER]: nfnetlink_queue: use endianness-aware attribute functions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:21 -08:00
Patrick McHardy
9d6023ab8b [NETFILTER]: nfnetlink_queue: mark hash table __read_mostly
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:21 -08:00
Patrick McHardy
2bd0119729 [NETFILTER]: nfnetlink_queue: remove useless debugging
Originally I wanted to just remove the QDEBUG macro and use pr_debug, but
none of the messages seems worth keeping.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:20 -08:00
Patrick McHardy
c5de0dfde8 [NETFILTER]: nfnetlink_queue: kill useless wrapper
nfqnl_set_mode takes the queue lock and calls __nfqnl_set_mode. Just move
the code from __nfqnl_set_mode to nfqnl_set_mode since there is no other
user.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:19 -08:00
Patrick McHardy
9872bec773 [NETFILTER]: nfnetlink: use RCU for queue instances hash
Use RCU for queue instances hash. Avoids multiple atomic operations
for each packet.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:18 -08:00
Patrick McHardy
a3c8e7fd4b [NETFILTER]: nfnetlink_queue: fix checks in nfqnl_recv_config
The peer_pid must be checked in all cases when a queue exists, currently
it is not checked if for NFQA_CFG_QUEUE_MAXLEN when a NFQA_CFG_CMD
attribute exists in some cases. Same for the queue existance check,
which can cause a NULL pointer dereference.

Also consistently return -ENODEV for "queue not found". -ENOENT would
be better, but that is already used to indicate a queued skb id doesn't
exist.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:18 -08:00
Patrick McHardy
e48b9b2fb3 [NETFILTER]: nfnetlink_queue: avoid unnecessary atomic operation
The sequence counter doesn't need to be an atomic_t, just move the increment
inside the locked section.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:17 -08:00
Patrick McHardy
f9c6399050 [NETFILTER]: remove annoying debugging message
Don't log "nf_hook: Verdict = QUEUE." message with NETFILTER_DEBUG=y.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:16 -08:00
Patrick McHardy
daaa8be2e0 [NETFILTER]: nf_queue: clean up error paths
Move duplicated error handling to end of function and add a helper function
to release the device and module references from the queue entry.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:16 -08:00
Patrick McHardy
4b3d15ef4a [NETFILTER]: {nfnetlink,ip,ip6}_queue: kill issue_verdict
Now that issue_verdict doesn't need to free the queue entries anymore,
all it does is disable local BHs and call nf_reinject. Move the BH
disabling to the okfn invocation in nf_reinject and kill the
issue_verdict functions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:15 -08:00
Patrick McHardy
02f014d888 [NETFILTER]: nf_queue: move list_head/skb/id to struct nf_info
Move common fields for queue management to struct nf_info and rename it
to struct nf_queue_entry. The avoids one allocation/free per packet and
simplifies the code a bit.

Alternatively we could add some private room at the tail, but since
all current users use identical structs this seems easier.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:14 -08:00
Patrick McHardy
7a6c6653b3 [NETFILTER]: ip6_queue: resync dev-index based flushing
Resync dev_cmp to take bridge devices into account.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:14 -08:00
Patrick McHardy
171b7fc4fc [NETFILTER]: ip6_queue: deobfuscate entry lookups
A queue entry lookup currently looks like this:

ipq_find_dequeue_entry -> __ipq_find_dequeue_entry ->
	__ipq_find_entry -> cmpfn -> id_cmp

Use simple open-coded list walking and kill the cmpfn for
ipq_find_dequeue_entry. Instead add it to ipq_flush (after
similar cleanups) and use ipq_flush for both complete flushes
and flushing entries related to a device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:13 -08:00
Patrick McHardy
9521409265 [NETFILTER]: ip_queue: deobfuscate entry lookups
A queue entry lookup currently looks like this:

ipq_find_dequeue_entry -> __ipq_find_dequeue_entry ->
	__ipq_find_entry -> cmpfn -> id_cmp

Use simple open-coded list walking and kill the cmpfn for
ipq_find_dequeue_entry. Instead add it to ipq_flush (after
similar cleanups) and use ipq_flush for both complete flushes
and flushing entries related to a device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:12 -08:00
Patrick McHardy
b43d8d8598 [NETFILTER]: nfnetlink_queue: deobfuscate entry lookups
A queue entry lookup currently looks like this:

find_dequeue_entry -> __find_dequeue_entry ->
	__find_entry -> cmpfn -> id_cmp

Use simple open-coded list walking and kill the cmpfn for
find_dequeue_entry. Instead add it to nfqnl_flush (after
similar cleanups) and use nfqnl_flush for both complete
flushes and flushing entries related to a device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:12 -08:00
Patrick McHardy
0ac41e8146 [NETFILTER]: {nf_netlink,ip,ip6}_queue: use list_for_each_entry
Use list_add_tail/list_for_each_entry instead of list_add and
list_for_each_prev as a preparation for switching to RCU.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:11 -08:00
Patrick McHardy
c01cd429fc [NETFILTER]: nf_queue: move queueing related functions/struct to seperate header
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy
f9d8928f83 [NETFILTER]: nf_queue: remove unused data pointer
Remove the data pointer from struct nf_queue_handler. It has never been used
and is useless for the only handler that really matters, nfnetlink_queue,
since the handler is shared between all instances.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy
e3ac529815 [NETFILTER]: nf_queue: make queue_handler const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:09 -08:00
Patrick McHardy
fb46990dba [NETFILTER]: nf_queue: remove unnecessary hook existance check
We hold a module reference for each queued packet, so the hook that
queued the packet can't disappear. Also remove an obsolete  comment
stating the opposite.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:08 -08:00
Patrick McHardy
8b1cf0db2a [NETFILTER]: nf_queue: minor cleanup
Clean up

if (x) y;

constructs. We've got nothing to hide :)

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:07 -08:00
Patrick McHardy
1999414a4e [NETFILTER]: Mark hooks __read_mostly
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:07 -08:00
Patrick McHardy
41c5b31703 [NETFILTER]: Use nf_register_hooks for multiple registrations
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:06 -08:00
Patrick McHardy
279c2c74b6 [NETFILTER]: nf_conntrack_proto_icmp: kill extern declaration in .c file
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:05 -08:00
Patrick McHardy
1841a4c7ae [NETFILTER]: nf_ct_h323: remove ipv6 module dependency
nf_conntrack_h323 needs ip6_route_output for the call forwarding filter.
Add a ->route function to nf_afinfo and use that to avoid pulling in the
ipv6 module.

Fix the #ifdef for the IPv6 code while I'm at it - the IPv6 support is
only needed when IPv6 conntrack is enabled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:05 -08:00
Patrick McHardy
193b23c5a0 [NETFILTER]: xt_hashlimit: remove ip6tables module dependency
Switch from ipv6_find_hdr to ipv6_skip_exthdr to avoid pulling in ip6_tables
and ipv6 when only using it for IPv4.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:04 -08:00
Maciej Soltysiak
17dfc93f6d [NETFILTER]: {ip,ip6}t_LOG: log GID
Log GID in addition to UID

Signed-off-by: Maciej Soltysiak <maciej.soltysiak@ae.poznan.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:03 -08:00
Patrick McHardy
50c164a81f [NETFILTER]: x_tables: add rateest match
Add rate estimator match. The rate estimator match can match on
estimated rates by the RATEEST target. It supports matching on
absolute bps/pps values, comparing two rate estimators and matching
on the difference between two rate estimators.

This is what I use to route outgoing data connections from a FTP
server over two lines based on the  available bandwidth:

# estimate outgoing rates
iptables -t mangle -A POSTROUTING -o eth0 -j RATEEST --rateest-name eth0 \
                                                     --rateest-interval 250ms \
                                                     --rateest-ewma 0.5s
iptables -t mangle -A POSTROUTING -o ppp0 -j RATEEST --rateest-name ppp0 \
                                                     --rateest-interval 250ms \
                                                     --rateest-ewma 0.5s

# mark based on available bandwidth
iptables -t mangle -A BALANCE -m state --state NEW \
                              -m helper --helper ftp \
                              -m rateest --rateest-delta \
                                         --rateest1 eth0 \
                                         --rateest-bps1 2.5mbit \
                                         --rateest-gt \
                                         --rateest2 ppp0 \
                                         --rateest-bps2 2mbit \
                              -j CONNMARK --set-mark 0x1

iptables -t mangle -A BALANCE -m state --state NEW \
                              -m helper --helper ftp \
                              -m rateest --rateest-delta \
                                         --rateest1 ppp0 \
                                         --rateest-bps1 2mbit \
                                         --rateest-gt \
                                         --rateest2 eth0 \
                                         --rateest-bps2 2.5mbit \
                              -j CONNMARK --set-mark 0x2

iptables -t mangle -A BALANCE -j CONNMARK --restore-mark

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:03 -08:00
Patrick McHardy
5859034d7e [NETFILTER]: x_tables: add RATEEST target
Add new rate estimator target (using gen_estimator). In combination with
the rateest match (next patch) this can be used for load-based multipath
routing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:02 -08:00
Patrick McHardy
cb76c6a597 [NETFILTER]: ip_tables: remove obsolete SAME target
Remove the ipt_SAME target as scheduled in feature-removal-schedule.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:01 -08:00
Jan Engelhardt
5c350e5a38 [NETFILTER]: IPv6 capable xt_TOS v1 target
Extends the xt_DSCP target by xt_TOS v1 to add support for selectively
setting and flipping any bit in the IPv4 TOS and IPv6 Priority fields.
(ipt_TOS and xt_DSCP only accepted a limited range of possible
values.)

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:00 -08:00
Jan Engelhardt
f1095ab51d [NETFILTER]: IPv6 capable xt_tos v1 match
Extends the xt_dscp match by xt_tos v1 to add support for selectively
matching any bit in the IPv4 TOS and IPv6 Priority fields. (ipt_tos
and xt_dscp only accepted a limited range of possible values.)

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:00 -08:00
Jan Engelhardt
c9fd496809 [NETFILTER]: Merge ipt_TOS into xt_DSCP
Merge ipt_TOS into xt_DSCP.

Merge ipt_TOS (tos v0 target) into xt_DSCP. They both modify the same
field in the IPv4 header, so it seems reasonable to keep them in one
piece. This is part two of the implicit 4-patch series to move tos to
xtables and extend it by IPv6.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:59 -08:00
Jan Engelhardt
c3b33e6a2c [NETFILTER]: Merge ipt_tos into xt_dscp
Merge ipt_tos into xt_dscp.

Merge ipt_tos (tos v0 match) into xt_dscp. They both match on the same
field in the IPv4 header, so it seems reasonable to keep them in one
piece. This is part one of the implicit 4-patch series to move tos to
xtables and extend it by IPv6.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:58 -08:00
Jan Engelhardt
4c37799ccf [NETFILTER]: Use lowercase names for matches in Kconfig
Unify netfilter match kconfig descriptions

Consistently use lowercase for matches in kconfig one-line
descriptions and name the match module.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:57 -08:00
Laszlo Attila Toth
e2cf5ecbea [NETFILTER]: ipt_addrtype: limit address type checking to an interface
Addrtype match has a new revision (1), which lets address type checking
limited to the interface the current packet belongs to. Either incoming
or outgoing interface can be used depending on the current hook. In the
FORWARD hook two maches should be used if both interfaces have to be checked.
The new structure is ipt_addrtype_info_v1.

Revision 0 lets older userspace programs use the match as earlier.
ipt_addrtype_info is used.

Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:56 -08:00
Laszlo Attila Toth
0553811612 [IPV4]: Add inet_dev_addr_type()
Address type search can be limited to an interface by
inet_dev_addr_type function.

Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:56 -08:00
Jan Engelhardt
0265ab44ba [NETFILTER]: merge ipt_owner/ip6t_owner in xt_owner
xt_owner merges ipt_owner and ip6t_owner, and adds a flag to match
on socket (non-)existence.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:55 -08:00
Patrick McHardy
9e67d5a739 [NETFILTER]: x_tables: remove obsolete overflow check
We're not multiplying the size with the number of CPUs anymore, so the
check is obsolete.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:54 -08:00
Eric Dumazet
259d4e41f3 [NETFILTER]: x_tables: struct xt_table_info diet
Instead of using a big array of NR_CPUS entries, we can compute the size
needed at runtime, using nr_cpu_ids

This should save some ram (especially on David's machines where NR_CPUS=4096 :
32 KB can be saved per table, and 64KB for dynamically allocated ones (because
of slab/slub alignements) )

In particular, the 'bootstrap' tables are not any more static (in data
section) but on stack as their size is now very small.

This also should reduce the size used on stack in compat functions
(get_info() declares an automatic variable, that could be bigger than kernel
stack size for big NR_CPUS)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:54 -08:00
Jan Engelhardt
d3c5ee6d54 [NETFILTER]: x_tables: consistent and unique symbol names
Give all Netfilter modules consistent and unique symbol names.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:53 -08:00
Li Zefan
4c61097957 [NETFILTER]: replace list_for_each with list_for_each_entry
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:52 -08:00
Sven Schnelle
338e8a7926 [NETFILTER]: x_tables: add TCPOPTSTRIP target
Signed-off-by: Sven Schnelle <svens@bitebene.org>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:51 -08:00
Patrick McHardy
6ac552fdc6 [NETLINK]: af_netlink.c checkpatch cleanups
Fix large number of checkpatch errors.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:50 -08:00
Herbert Xu
2fcb45b6b8 [IPSEC]: Use the correct family for input state lookup
When merging the input paths of IPsec I accidentally left a hard-coded
AF_INET for the state lookup call.  This broke IPv6 obviously.  This
patch fixes by getting the input callers to specify the family through
skb->cb.

Credit goes to Kazunori Miyazawa for diagnosing this and providing an
initial patch.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:49 -08:00
Wang Chen
bbca17680f [UDP]: Counter increment should be in USER mode for recvmsg
System calls should be USER. So change the BH to USER for
UDP*_INC_STATS_BH().

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:49 -08:00
Wang Chen
b2bf1e2659 [UDP]: Clean up for IS_UDPLITE macro
Since we have macro IS_UDPLITE, we can use it.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:48 -08:00
Wang Chen
cb75994ec3 [UDP]: Defer InDataGrams increment until recvmsg() does checksum
Thanks dave, herbert, gerrit, andi and other people for your
discussion about this problem.

UdpInDatagrams can be confusing because it counts packets that
might be dropped later.
Move UdpInDatagrams into recvmsg() as allowed by the RFC.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:47 -08:00
Ilpo Järvinen
6859d49475 [TCP]: Abstract tp->highest_sack accessing & point to next skb
Pointing to the next skb is necessary to avoid referencing
already SACKed skbs which will soon be on a separate list.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:46 -08:00
Ilpo Järvinen
7201883599 [TCP]: Cleanup local variables of clean_rtx_queue
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:46 -08:00
Ilpo Järvinen
ea60658cde [TCP]: Add unlikely() to urgent handling in clean_rtx_queue
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:45 -08:00
Ilpo Järvinen
89d478f7f2 [TCP]: Remove duplicated code block from clean_rtx_queue
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:44 -08:00
Ilpo Järvinen
234b686070 [TCP]: Add tcp_for_write_queue_from_safe and use it in mtu_probe
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:43 -08:00
Ilpo Järvinen
d67c58e9ae [TCP]: Remove local variable and use packets_in_flight directly
Lines won't be that long and it's compiler's job to optimize
them.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:43 -08:00
Ilpo Järvinen
50c4817e99 [TCP]: MTUprobe: prepare skb fields earlier
They better be valid when call to write_queue functions is made
once things that follow are going in.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:42 -08:00
Ilpo Järvinen
c3a05c6050 [TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:41 -08:00
Ilpo Järvinen
ede9f3b186 [TCP]: Unite identical code from two seqno split blocks
Bogus seqno compares just mislead, the code is identical for
both sides of the seqno compare (and was even executed just
once because of return in between).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:41 -08:00
Ilpo Järvinen
407ef1de03 [TCP]: Remove superflucious FLAG_DATA_SACKED
To get there, highest_sack must have advanced. When it advances,
a new skb is SACKed, which already sets that FLAG. Besides, the
original purpose of it has puzzled me, never understood why
LOST bit setting of retransmitted skb is marked with
FLAG_DATA_SACKED.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:40 -08:00
Ilpo Järvinen
bce392f3b0 [TCP]: Move LOSTRETRANS MIB outside !(L|S) check
Usually those skbs will have L set, not counting them as lost
retransmissions is misleading.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:39 -08:00
Pavel Emelyanov
1dab62226d [IPV6]: Use ctl paths to register addrconf sysctls
This looks very much like the patch for ipv4's devinet.

This is also intended to help us with the net namespaces
and saves the ipv6.ko size by ~320 bytes.

The difference from the first version is just the patch
offsets, that changed due to changes in the patch #2.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:38 -08:00
Pavel Emelyanov
f52295a9c5 [IPV6]: Unify and cleanup calls to addrconf_sysctl_register
Currently this call is (ab)used similar to devinet one - it
registers sysctls for devices and for the "default" confs, while
the "all" sysctls are registered separately. But unlike its
devinet brother, the passed inet6_device is needed.

The fix is to make a __addrconf_sysctl_register(), which registers
sysctls for all "devices" we need, including "default" and "all" :)

The original addrconf_sysctl_register() calls the introduced
function, passing the inet6_device, device name and ifindex (to
be used as procname and ctl_name) into it.

Thanks to Herbert again for pointing out, that we can shrink the
argument list to 1 :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:38 -08:00
Pavel Emelyanov
bfada697bd [IPV4]: Use ctl paths to register devinet sysctls
This looks very much like the patch for neighbors.

The path is also located on the stack and is prepared
inside the function. This time, the call to the registering
function is guarded with the RTNL lock, but I decided
to keep it on the stack not to litter the devinet.c file
with unneeded names and to make it look similar to the
neighbors code.

This is also intended to help us with the net namespaces
and saves the vmlinux size as well - this time by more
than 670 bytes.

The difference from the first version is just the patch
offsets, that changed due to changes in the patch #2.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:37 -08:00
Pavel Emelyanov
66f27a5203 [IPV4]: Unify and cleanup calls to devinet_sysctl_register
Currently this call is used to register sysctls for devices
and for the "default" confs. The "all" sysctls are registered
separately.

Besides, the inet_device is passed to this function, but it is
not needed there at all - just the device name and ifindex are
required.

Thanks to Herbert, who noticed, that this call doesn't even
require the devconf pointer (the last argument) - all we need
we can take from the in_device itself.

The fix is to make a __devinet_sysctl_register(), which registers
sysctls for all "devices" we need, including "default" and "all" :)

The original devinet_sysctl_register() works with struct net_device,
not the inet_device, and calls the introduced function, passing
the device name and ifindex (to be used as procname and ctl_name)
into it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:36 -08:00
John W. Linville
edae58ead5 softmac: mark as obsolete and schedule for removal
Schedule softmac for for removal in the 2.6.26 development window.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:36 -08:00
John W. Linville
88fecd092e mac80211: remove "bcn_int" and "capab" scan results info
These bits were dead code before "mac80211: Remove local->scan_flags"
(commit 6681dd3fd0e4d36a4547415853e83411baa7b705) and probably should
have been removed as part of that commit.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:34 -08:00
Ron Rindjunsky
64bd4b693f mac80211: move A-MSDU identifier to flags
This patch moves u8 amsdu_frame in ieee80211_txrx_data to the flags
section as IEEE80211_TXRXD_RX_AMSDU

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:34 -08:00
Ron Rindjunsky
d3c990fb26 mac80211: adding 802.11n configuration flows
This patch configures the 802.11n mode of operation
internally in ieee80211_conf structure and in the low-level
driver as well (through op conf_ht).
It does not include AP configuration flows.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:33 -08:00
Ron Rindjunsky
fd4c7f2fce mac80211: adding 802.11n essential A-MSDU Rx capability
This patch adds the ability to receive and handle A-MSDU frames.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:32 -08:00
Ron Rindjunsky
9f985b0eee mac80211: adding 802.11n essential A-MPDU addBA capability
This patch adds the capability to identify and answer an add block ACK
request.
As this series of patches only adds HT handling with no aggregations,
(A-MPDU aggregations acceptance is not obligatory according to 802.11n
draft) we are currently sending back a refusal upon this request.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:31 -08:00
Ron Rindjunsky
c715350828 mac80211: adding 802.11n IEs handling
This patch presents the ability to parse and compose HT IEs, and to put
the IE relevant data inside the mac80211's internal HT structures

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:31 -08:00
Ron Rindjunsky
10816d40f2 mac80211: adding 802.11n HT framework definitions
New structures:
 - ieee80211_ht_info: describing STA's HT capabilities
 - ieee80211_ht_bss_info: describing BSS's HT characteristics
Changed structures:
 - ieee80211_hw_mode: now also holds PHY HT capabilities for each HW mode
 - ieee80211_conf: ht_conf holds current self HT configuration
                   ht_bss_conf holds current BSS HT configuration
 - flag IEEE80211_CONF_SUPPORT_HT_MODE added to indicate if HT use is
   desired
 - sta_info: now also holds Peer's HT capabilities

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:30 -08:00
Ron Rindjunsky
82b3cad942 mac80211: adding MAC80211_HT_DEBUG config variable
This patch adds MAC80211_HT_DEBUG config variable
to separate HT debug features

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:29 -08:00
Johannes Berg
b1357a81a9 mac80211: allow setting drop_unencrypted with wext
This patch allows wpa_supplicant to set the drop_unencrypted setting in
mac80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:29 -08:00
Johannes Berg
e38bad4766 mac80211: make ieee80211_iterate_active_interfaces not need rtnl
Interface iteration in mac80211 can be done without holding any
locks because I converted it to RCU. Initially, I thought this
wouldn't be needed for ieee80211_iterate_active_interfaces but
it's turning out that multi-BSS AP support can be much simpler
in a driver if ieee80211_iterate_active_interfaces can be called
without holding locks. This converts it to use RCU, it adds a
requirement that the callback it invokes cannot sleep.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:28 -08:00
Ron Rindjunsky
76ee65bfaa mac80211: restructuring data Rx handlers
This patch restructures the Rx handlers chain by incorporating previously
handlers ieee80211_rx_h_802_1x_pae and ieee80211_rx_h_drop_unencrypted
into ieee80211_rx_h_data, already in 802.3 form. this scheme follows more
precisely after the IEEE802.11 data plane archituecture, and will prevent
code duplication to IEEE8021.11n A-MSDU handler.

added function:
 - ieee80211_data_to_8023: transfering 802.11 data frames to 802.3 frame
 - ieee80211_deliver_skb: delivering the 802.3 frames to upper stack
eliminated handlers:
 - ieee80211_rx_h_drop_unencrypted: now function ieee80211_drop_unencrypted
 - ieee80211_rx_h_802_1x_pae: now function ieee80211_802_1x_pae
changed handlers:
 - ieee80211_rx_h_data: now contains calls to four above function

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:27 -08:00
Zhu Yi
ece8edddf0 mac80211: hardware scan rework
The scan code in mac80211 makes the software scan assumption in various
places. For example, we stop the Tx queue during a software scan so that
all the Tx packets will be queued by the stack. We also drop frames not
related to scan in the software scan process. But these are not true for
hardware scan.

Some wireless hardwares (for example iwl3945/4965) has the ability to
perform the whole scan process by hardware and/or firmware. The hardware
scan is relative powerful in that it tries to maintain normal network
traffic while doing a scan in the background. Some drivers (i.e iwlwifi)
do provide a way to tune the hardware scan parameters (for example if the
STA is associated, what's the max time could the STA leave from the
associated channel, how long the scans get suspended after returning to
the service channel, etc). But basically this is transparent to the
stack. mac80211 should not stop Tx queues or drop Rx packets during a
hardware scan.

This patch resolves the above problem by spliting the current scan
indicator local->sta_scanning into local->sta_sw_scanning and
local->sta_hw_scanning. It then changes the scan related code to be aware
of hardware scan or software scan in various places. With this patch,
iwlwifi performs much better in the scan-while-associated condition and
disable_hw_scan=1 should never be required.

Cc: Mohamed Abbas <mohamed.abbas@intel.com>
Cc: Ben Cahill <ben.m.cahill@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:27 -08:00
Pavel Emelyanov
f68635e627 [IPV6]: Cleanup the addconf_sysctl_register
This only includes fixing the space-indented lines and
removing one unneeded else after the goto.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:26 -08:00
Pavel Emelyanov
9fa8964299 [IPV4]: Cleanup the devinet_sysctl_register
I moved the call to kmalloc() from the *t declaration into
the code (this is confusing when a variable is initialized
with the result of some call) and removed unneeded comment
near the error path. Just like I did with the neigh ctl-s.

Besides, I fixed the goto's and the labels - they were indented
with spaces :(

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:25 -08:00
Pavel Emelyanov
c3bac5a71b [NEIGH]: Use the ctl paths to create neighbours sysctls
The appropriate path is prepared right inside this function. It
is prepared similar to how the ctl tables were.

Since the path is modified, it is put on the stack, to avoid
possible races with multiple calls to neigh_sysctl_register() : it
is called by protocols and I didn't find any protection in this
case. Did I overlooked the rtnl lock?.

The stack growth of the neigh_sysctl_register() is 40 bytes. I
believe this is OK, since this is not that much and this function
is not called with the deep stack (device/protocols register).

The device's name is stored on the template to free it later.

This will help with the net namespaces, as each namespace should
have its own set of these ctls.

Besides, this saves ~350 bytes from the neigh template :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:24 -08:00
Pavel Emelyanov
3c607bbb47 [NEIGH]: Cleanup the neigh_sysctl_register
This mainly removes the err variable, as this call always
return the same error code (-ENOBUFS).

Besides, I moved the call to kmalloc() from the *t declaration
into the code (this is confusing when a variable is initialized
with the result of some call) and removed unneeded comment near
the error path.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:24 -08:00
Pavel Emelyanov
1597fbc0fa [UNIX]: Make the unix sysctl tables per-namespace
This is the core.

 * add the ctl_table_header on the struct net;
 * make the unix_sysctl_register and _unregister clone the table;
 * moves calls to them into per-net init and exit callbacks;
 * move the .data pointer in the proper place.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:23 -08:00
Pavel Emelyanov
1d430b913c [UNIX]: Use ctl paths to register unix ctl tables
Unlike previous ones, this patch is useful by its own,
as it decreases the vmlinux size :)

But it will be used later, when the per-namespace sysctl
is added.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:22 -08:00
Pavel Emelyanov
d392e49756 [UNIX]: Move the sysctl_unix_max_dgram_qlen
This will make all the sub-namespaces always use the
default value (10) and leave the tuning via sysctl
to the init namespace only.

Per-namespace tuning is coming.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:22 -08:00
Pavel Emelyanov
97577e3828 [UNIX]: Extend unix_sysctl_(un)register prototypes
Add the struct net * argument to both of them to use in
the future. Also make the register one return an error code.

It is useless right now, but will make the future patches
much simpler.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:21 -08:00
Denis V. Lunev
dd88590995 [DECNET]: Remove extra memset from dn_fib_check_nh
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:20 -08:00
Paul Moore
875179fa60 [IPSEC]: SPD auditing fix to include the netmask/prefix-length
Currently the netmask/prefix-length of an IPsec SPD entry is not included in
any of the SPD related audit messages.  This can cause a problem when the
audit log is examined as the netmask/prefix-length is vital in determining
what network traffic is affected by a particular SPD entry.  This patch fixes
this problem by adding two additional fields, "src_prefixlen" and
"dst_prefixlen", to the SPD audit messages to indicate the source and
destination netmasks.  These new fields are only included in the audit message
when the netmask/prefix-length is less than the address length, i.e. the SPD
entry applies to a network address and not a host address.

Example audit message:

 type=UNKNOWN[1415] msg=audit(1196105849.752:25): auid=0 \
   subj=root:system_r:unconfined_t:s0-s0:c0.c1023 op=SPD-add res=1 \
   src=192.168.0.0 src_prefixlen=24 dst=192.168.1.0 dst_prefixlen=24

In addition, this patch also fixes a few other things in the
xfrm_audit_common_policyinfo() function.  The IPv4 string formatting was
converted to use the standard NIPQUAD_FMT constant, the memcpy() was removed
from the IPv6 code path and replaced with a typecast (the memcpy() was acting
as a slow, implicit typecast anyway), and two local variables were created to
make referencing the XFRM security context and selector information cleaner.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:19 -08:00
Arnaldo Carvalho de Melo
9108d5f4b2 [TFRC]: Hide tx history details from the CCIDs
Based on a previous patch by Gerrit Renker.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:19 -08:00
Eric W. Biederman
95bdfccb2b [NET]: Implement the per network namespace sysctl infrastructure
The user interface is: register_net_sysctl_table and
unregister_net_sysctl_table.  Very much like the current
interface except there is a network namespace parameter.

With this any sysctl registered with register_net_sysctl_table
will only show up to tasks in the same network namespace.

All other sysctls continue to be globally visible.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:18 -08:00
Patrick McHardy
be0ea7d5da [NETFILTER]: Convert old checksum helper names
Kill the defines again, convert to the new checksum helper names and
remove the dependency of NET_ACT_NAT on NETFILTER.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:15 -08:00
Patrick McHardy
a99a00cf1a [NET]: Move netfilter checksum helpers to net/core/utils.c
This allows to get rid of the CONFIG_NETFILTER dependency of NET_ACT_NAT.
This patch redefines the old names to keep the noise low, the next patch
converts all users.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:14 -08:00
Gerrit Renker
3159afe0d2 [DCCP]: Remove duplicate test for CloseReq
This removes a redundant test for unexpected packet types. In dccp_rcv_state_process
it is tested twice whether a DCCP-server has received a CloseReq (Step 7):

 * first in the combined if-statement,
 * then in the call to dccp_rcv_closereq().

The latter is necesssary since dccp_rcv_closereq() is also called from
__dccp_rcv_established().

This patch removes the duplicate test.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:14 -08:00
Gerrit Renker
0c86962076 [DCCP]: Integrate state transitions for passive-close
This adds the necessary state transitions for the two forms of passive-close

 * PASSIVE_CLOSE    - which is entered when a host   receives a Close;
 * PASSIVE_CLOSEREQ - which is entered when a client receives a CloseReq.

Here is a detailed account of what the patch does in each state.

1) Receiving CloseReq

  The pseudo-code in 8.5 says:

     Step 13: Process CloseReq
          If P.type == CloseReq and S.state < CLOSEREQ,
              Generate Close
              S.state := CLOSING
              Set CLOSING timer.

  This means we need to address what to do in CLOSED, LISTEN, REQUEST, RESPOND, PARTOPEN, and OPEN.

   * CLOSED:         silently ignore - it may be a late or duplicate CloseReq;
   * LISTEN/RESPOND: will not appear, since Step 7 is performed first (we know we are the client);
   * REQUEST:        perform Step 13 directly (no need to enqueue packet);
   * OPEN/PARTOPEN:  enter PASSIVE_CLOSEREQ so that the application has a chance to process unread data.

  When already in PASSIVE_CLOSEREQ, no second CloseReq is enqueued. In any other state, the CloseReq is ignored.
  I think that this offers some robustness against rare and pathological cases: e.g. a simultaneous close where
  the client sends a Close and the server a CloseReq. The client will then be retransmitting its Close until it
  gets the Reset, so ignoring the CloseReq while in state CLOSING is sane.

2) Receiving Close

  The code below from 8.5 is unconditional.

     Step 14: Process Close
          If P.type == Close,
              Generate Reset(Closed)
              Tear down connection
              Drop packet and return

  Thus we need to consider all states:
   * CLOSED:           silently ignore, since this can happen when a retransmitted or late Close arrives;
   * LISTEN:           dccp_rcv_state_process() will generate a Reset ("No Connection");
   * REQUEST:          perform Step 14 directly (no need to enqueue packet);
   * RESPOND:          dccp_check_req() will generate a Reset ("Packet Error") -- left it at that;
   * OPEN/PARTOPEN:    enter PASSIVE_CLOSE so that application has a chance to process unread data;
   * CLOSEREQ:         server performed active-close -- perform Step 14;
   * CLOSING:          simultaneous-close: use a tie-breaker to avoid message ping-pong (see comment);
   * PASSIVE_CLOSEREQ: ignore - the peer has a bug (sending first a CloseReq and now a Close);
   * TIMEWAIT:         packet is ignored.

   Note that the condition of receiving a packet in state CLOSED here is different from the condition "there
   is no socket for such a connection": the socket still exists, but its state indicates it is unusable.

   Last, dccp_finish_passive_close sets either DCCP_CLOSED or DCCP_CLOSING = TCP_CLOSING, so that
   sk_stream_wait_close() will wait for the final Reset (which will trigger CLOSING => CLOSED).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:13 -08:00
Gerrit Renker
f11135a344 [DCCP]: Dedicated auxiliary states to support passive-close
This adds two auxiliary states to deal with passive closes:
  * PASSIVE_CLOSE    (reached from OPEN via reception of Close)    and
  * PASSIVE_CLOSEREQ (reached from OPEN via reception of CloseReq)
as internal intermediate states.

These states are used to allow a receiver to process unread data before
acknowledging the received connection-termination-request (the Close/CloseReq).

Without such support, it will happen that passively-closed sockets enter CLOSED
state while there is still unprocessed data in the queue; leading to unexpected
and erratic API behaviour.

PASSIVE_CLOSE has been mapped into TCPF_CLOSE_WAIT, so that the code will
seamlessly work with inet_accept() (which tests for this state).

The state names are thanks to Arnaldo, who suggested this naming scheme
following an earlier revision of this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:12 -08:00
Gerrit Renker
f53dc67c5e [DCCP]: Use AF-independent rebuild_header routine
This fixes a nasty bug: dccp_send_reset() is called by both DCCPv4 and DCCPv6, but uses
inet_sk_rebuild_header() in each case. This leads to unpredictable and weird behaviour:
under some conditions, DCCPv6 Resets were sent, in other not.

The fix is to use the AF-independent rebuild_header routine.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:12 -08:00
Arnaldo Carvalho de Melo
276f2edc52 [TFRC]: Migrate TX history to singly-linked lis
This patch was based on another made by Gerrit Renker, his changelog was:

    ------------------------------------------------------
The patch set migrates TFRC TX history to a singly-linked list.

The details are:
 * use of a consistent naming scheme (all TFRC functions now begin with `tfrc_');
 * allocation and cleanup are taken care of internally;
 * provision of a lookup function, which is used by the CCID TX infrastructure
   to determine the time a packet was sent (in turn used for RTT sampling);
 * integration of the new interface with the present use in CCID3.
    ------------------------------------------------------

Simplifications I did:

. removing the tfrc_tx_hist_head that had a pointer to the list head and
  another for the slabcache.
. No need for creating a slabcache for each CCID that wants to use the TFRC
  tx history routines, create a single slabcache when the dccp_tfrc_lib module
  init routine is called.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:11 -08:00
Ilpo Järvinen
ea4f76ae13 [TCP]: Two fixes to new sacktag code
1) Skip condition used to be wrong way around which made SACK
processing very broken, missed many blocks because of that.

2) Use highest_sack advancement only if some skbs are already
sacked because otherwise tcp_write_queue_next may move things
too far (occurs mainly with GSO). The other similar advancement
is not problem because highest_sack was previosly put to point
a sacked skb.

These problems were located because of problem report from Matt
Mathis <mathis@psc.edu>.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:10 -08:00
Pavel Emelyanov
df1b86c53d [NET]: Nicer WARN_ON in netstat_show
The

        if (statement)
                WARN_ON(1);

looks much better as

        WARN_ON(statement);

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:10 -08:00
Fred L. Templin
c7dc89c0ac [IPV6]: Add RFC4214 support
This patch includes support for the Intra-Site Automatic Tunnel
Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
module, and is configured using extensions to the "iproute2"
utility. The diffs are specific to the Linux 2.6.24-rc2 kernel
distribution.

This version includes the diff for ./include/linux/if.h which was
missing in the v2.4 submission and is needed to make the
patch compile. The patch has been installed, compiled and
tested in a clean 2.6.24-rc2 kernel build area.

Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:09 -08:00
Pavel Emelyanov
df97c708d5 [NET]: Eliminate unused argument from sk_stream_alloc_pskb
The 3rd argument is always zero (according to grep :) Eliminate
it and merge the function with sk_stream_alloc_skb.

This saves 44 more bytes, and together with the previous patch
we have:

add/remove: 1/0 grow/shrink: 0/8 up/down: 183/-751 (-568)
function                                     old     new   delta
sk_stream_alloc_skb                            -     183    +183
ip_rt_init                                   529     525      -4
arp_ignore                                   112     107      -5
__inet_lookup_listener                       284     274     -10
tcp_sendmsg                                 2583    2481    -102
tcp_sendpage                                1449    1300    -149
tso_fragment                                 417     258    -159
tcp_fragment                                1149     988    -161
__tcp_push_pending_frames                   1998    1837    -161

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:08 -08:00
Pavel Emelyanov
f561d0f27d [NET]: Uninline the sk_stream_alloc_pskb
This function seems too big for inlining. Indeed, it saves
half-a-kilo when uninlined:

add/remove: 1/0 grow/shrink: 0/7 up/down: 195/-719 (-524)
function                                     old     new   delta
sk_stream_alloc_pskb                           -     195    +195
ip_rt_init                                   529     525      -4
__inet_lookup_listener                       284     274     -10
tcp_sendmsg                                 2583    2486     -97
tcp_sendpage                                1449    1305    -144
tso_fragment                                 417     267    -150
tcp_fragment                                1149     992    -157
__tcp_push_pending_frames                   1998    1841    -157

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:07 -08:00
Joonwoo Park
3015a347dc [IPV4] fib_hash: kmalloc + memset conversion to kzalloc
fib_hash: kmalloc + memset conversion to kzalloc
fix to avoid memset entirely.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:07 -08:00
Joonwoo Park
88f8349164 [IPV4] fib_semantics: kmalloc + memset conversion to kzalloc
fib_semantics: kmalloc + memset conversion to kzalloc
fix to avoid memset entirely.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:06 -08:00
Joonwoo Park
dcaee95a1b [IPSEC]: kmalloc + memset conversion to kzalloc
2007/11/26, Patrick McHardy <kaber@trash.net>:
> How about also switching vmalloc/get_free_pages to GFP_ZERO
> and getting rid of the memset entirely while you're at it?
>

xfrm_hash: kmalloc + memset conversion to kzalloc
fix to avoid memset entirely.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:05 -08:00
Ilpo Järvinen
8512430e55 [TCP]: Move FRTO checks out from write queue abstraction funcs
Better place exists in update_send_head (other non-queue related
adjustments are done there as well) which is the only caller of
tcp_advance_send_head (now that the bogus call from mtu_probe is
gone).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:05 -08:00
Pavel Emelyanov
82d8a867ff [NET]: Make macro to specify the ptype_base size
Currently this size is 16, but as the comment says this
is so only because all the chains (except one) has the
length 1. I think, that some day this may change, so
growing this hash will be much easier.

Besides, symbolic names are read better than magic constants.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:04 -08:00
Pavel Emelyanov
8d8ad9d7c4 [NET]: Name magic constants in sock_wake_async()
The sock_wake_async() performs a bit different actions
depending on "how" argument. Unfortunately this argument
ony has numerical magic values.

I propose to give names to their constants to help people
reading this function callers understand what's going on
without looking into this function all the time.

I suppose this is 2.6.25 material, but if it's not (or the
naming seems poor/bad/awful), I can rework it against the
current net-2.6 tree.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:03 -08:00
Gerrit Renker
ce865a61c8 [DCCP]: Add support for abortive release
This continues from the previous patch and adds support for actively aborting
a DCCP connection, using a Reset Code 2, "Aborted" to inform the peer of an
abortive release.

I have tried this in various client/server settings and it works as expected.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:02 -08:00
Gerrit Renker
d83bd95bf1 [DCCP]: Check for unread data on close
This removes one FIXME with regard to close when there is still unread data.
The mechanism is implemented similar to TCP: with regard to DCCP-specifics,
a Reset with Code 2, "Aborted" is sent to the peer.

This corresponds in part to RFC 4340, 8.1.1 and 8.1.5.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:01 -08:00
Gerrit Renker
dcfbc7e97a [CCID2]: Remove misleading comment
This removes a comment which identifies an `issue' with dccp_write_xmit() where there is none.
The comment assumes it is possible that a packet is sent between the calls to

	ccid_hc_tx_send_packet(),
	dccp_transmit_skb(),
	ccid_hc_tx_packet_sent()

(in the above order) in dccp_write_xmit().

I think that this is impossible, since dccp_write_xmit() is always called under lock:

 * when called as dccp_write_xmit(sk, 1) from dccp_send_close(), the socket is locked
   (see code comment above dccp_send_close());
 * when called as dccp_write_xmit(sk, 0) from dccp_send_msg(), it is after lock_sock() has been called;
 * when called as dccp_write_xmit(sk, 0) from dccp_write_xmit_timer(), bh_lock_sock() has been called
   and the if/else statement has made sure that sk_lock.owner is not set;
 * there are no other places where dccp_write_xmit() is called.

Furthermore, the debug statement for printing the sequence number of the packet just sent has been
removed, since the entire list is being printed anyway and so the entry of that number appears last.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:01 -08:00
Gerrit Renker
a302002516 [CCID2]: Remove redundant ack-counting variable
The code used two different variables to count Acks, one of them redundant.
This patch reduces the number of Ack counters to one.

The type of the Ack counter has also been changed to u32 (twice the range of int);
and the variable has been renamed into `packets_acked' - for consistency with
RFC 3465 (and similarly named variables are used by TCP and SCTP).

Lastly, a slightly less aggressive `maxincr' increment is used (for even Ack Ratios,
maxincr was Ack Ratio/2 + 1 instead of Ack Ratio/2).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:00 -08:00
Gerrit Renker
83399361c3 [CCID2]: Remove redundant synchronisation variable
This removes the synchronisation variable `ccid2hctx_sendwait', which is set to 1
when the CCID2 sender may send a new packet, and which is set to 0 otherwise

The variable is redundant, since it is only used in combination with the hc_tx_send_packet/
hc_tx_packet_sent function pair. Both functions are called under socket lock, so the
following happens when the CCID2 may send a new packet:

 * it sets sendwait = 1 in tx_send_packet and returns 0;
 * the subsequent call to tx_packet_sent clears the sendwait flag;
 * since tx_send_packet returns 0 if and only if sendwait == 1, the BUG_ON condition
   in tx_packet_sent is never satisfied, since that function is never called when
   tx_send_packet returns a value different from 0 (cf. dccp_write_xmit);
 * the call to tx_packet_sent clears the flag so that the condition "!sendwait" is
   true the next time tx_packet_sent is called.

In other words, it is sufficient to just return 0 / not-0 to synchronise tx_send_packet
and tx_packet_sent -- which is what the patch does.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:59 -08:00
Gerrit Renker
da98e0b5d4 [CCID2]: Redundant debugging output
This reduces the amount of redundant debugging messages:

 * pipe/cwnd are printed in both tx_send_packet() and tx_packet_sent().
   Both functions are called immediately after one another, so one occurrence is sufficient.

 * Since tx_packet_sent() prints pipe/cwnd already, the second printk for pipe is redundant.

 * In tx_packet_sent() the check_sanity function is called twice (at the begin and at the end).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:59 -08:00
Gerrit Renker
95b21d7e9d [CCID2]: Replace pipe assignment-function with assignment
The function ccid2_change_pipe only does an assignment. This patch simplifies the code by
replacing the function with the assignment it performs.

Furthermore, the type of pipe is promoted from `signed' to unsigned (increasing the range).
As a result, a BUG_ON test for negative values now becomes obsolete (for safety not removed,
but replaced with a less annoying `DCCP_BUG').

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:58 -08:00
Gerrit Renker
3deeadd74b [CCID2]: Replace cwnd assignment-function with assignment
The current function ccid2_change_cwnd in effect makes only an assignment, as
the test whether cwnd has reached 0 is only required when cwnd is halved.

This patch simplifies the code by replacing the function with the assignment
it performs.

Furthermore, since ssthresh derives from cwnd and appears in many assignments and
comparisons, the type of ssthresh has also been changed to match that of cwnd.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:57 -08:00
Gerrit Renker
63df18ad7f [CCID2]: Replace read-only variable with constant
This replaces the field member `numdupack', which was used as a read-only
constant in the code, with a #define.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:57 -08:00
Gerrit Renker
7792cd8885 [CCID2]: Remove unused variable
This removes a variable `ccid2hctx_sent' which is incremented but
never referenced/read (i.e., dead code).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:56 -08:00
Gerrit Renker
900bfed471 [CCID2]: Disable broken Ack Ratio adaptation algorithm
This comments out a problematic section comprising a half-finished algorithm:

 - The variable `ccid2hctx_ackloss' is never initialised to a value different from 0 and
   hence in fact is a read-only constant.
 - The `arsent' variable counts packets other than Acks (it is incremented for every packet),
   and there is no test for Ack Loss.
 - The concept of counting Acks as such leads to a complex calculation, and the calculation
   at the moment is inconsistent with this concept.
   The problem is that the number of Acks - rather than the number of windows - is counted,
   which leads to a complex (cubic/quadratic) expression - this is not even implemented.

In its current state, the commented-out algorithm interfers with normal processing by
changing Ack Ratio incorrectly, and at the wrong times.

A new algorithm is necessary, which will not necessarily use the same variables as used by
the unfinished one; hence the old variables have been removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:55 -08:00
Gerrit Renker
b00d2bbc45 [CCID2]: Larger initial windows also for CCID2
RFC 4341, sec. 5 states that "The cwnd parameter is initialized to at most
four packets for new connections, following the rules from [RFC3390]", which
is implemented by this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:55 -08:00
Arnaldo Carvalho de Melo
e18d7a9857 [DCCP]: Initialize dccp_sock before calling the ccid constructors
This is because in the next patch CCID2 will assume that dccps_mss_cache is
non-zero.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:54 -08:00
Gerrit Renker
d50ad163e6 [CCID2]: Deadlock and spurious timeouts when Ack Ratio > cwnd
This patch removes a bug in the current code. I agree with Andrea's comment
that there is a problem here but the way it is treated does not fix it.

The problem is that whenever Ack Ratio > cwnd, starvation/deadlock occurs:
 * the receiver will not send an Ack until (Ack Ratio - cwnd) data packets
   have arrived;
 * the sender will not send any data packet before the receipt of an Ack
   advances the send window.
The only way that the connection then progresses was via RTO timeout. In one
extreme case (bulk transfer), it was observed that this happened for every single
packet; i.e. hundreds of packets, each a RTO timeout of 1..3 seconds apart:
a transfer which normally would take a fraction of a second thus grew to
several minutes.

The solution taken by this approach is to observe the relation

                   "Ack Ratio <= cwnd"

by using the constraint (1) from RFC 4341, 6.1.2; i.e. set

                 Ack Ratio = ceil(cwnd / 2)

and update it whenever either Ack Ratio or cwnd change. This ensures that
the deadlock problem can not arise.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:53 -08:00
Gerrit Renker
df054e1d00 [CCID2]: Don't assign negative values to Ack Ratio
Since it makes not sense to assign negative values to Ack Ratio, this
patch disallows this possibility.

As a consequence, a Bug test for negative Ack Ratio values becomes obsolete.

Furthermore, a check against overflow (as Ack Ratio may not exceed 2 bytes,
due to RFC 4340, 11.3) has been added.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:53 -08:00
Gerrit Renker
cfbbeabc88 [CCID2]: Fix sequence number arithmetic/comparisons
This replaces use of normal subtraction with modulo-48 subtraction.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:52 -08:00
Gerrit Renker
3de5489f47 [CCID2]: Bug in reading Ack Vectors
In CCID2 the receiver-history is sorted in ascending order of sequence number,
but the processing of received Ack Vectors requires the list traversal in the
opposite direction.

The current code has a bug in this regard: the list traversal is upwards. As a
consequence, only Ack Vectors with a run length of 1 will pass, in all other
Ack Vectors the remaining (acked) sequence numbers are missed, and may later
falsely be identified as lost.

Note: This bug is only visible when Ack Ratio > 1, since otherwise the run
      lengths of Ack Vectors are 0.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:51 -08:00
Gerrit Renker
a47c51044a [ACKVEC]: Reduce length of identifiers
This is reduces the length of the struct ackvec/ackvec_record fields. It is
a purely text-based replacement:

	s#dccpavr_#avr_#g;
	s#dccpav_#av_#g;

and increases readability somewhat.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:51 -08:00
Pavel Emelyanov
f126734735 [IPV6]: Correct the comment concerning inetsw6 table
It seems that net/ipv6/af_inet6.c was copied from net/ipv4/af_inet.c,
but one comment was not fixed.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:49 -08:00
Pavel Emelyanov
a53eb3feb2 [UNIX] Move the unix sock iterators in to proper place
The first_unix_socket() and next_unix_sockets() are now used
in proc file and in forall_unix_socets macro only.

The forall_unix_sockets is not used in this file at all so
remove it. After this move the helpers to where they really
belong, i.e. closer to proc code under the #ifdef CONFIG_PROC_FS
option.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:49 -08:00
Gerrit Renker
c86ab2b6a5 [DCCP]: Ignore Ack Vectors / Elapsed Time on DCCP-Request also
Small update with regard to RFC 4340 (references added as documentation):
on Requests, Ack Vectors / Elapsed Time should be ignored.
Length handling of Elapsed Time also simplified.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:47 -08:00
Gerrit Renker
6d57b43bf8 [DCCP]: Remove redundant dependency on IP_DCCP
This cleans up the consequences of an earlier patch which
introduced the `if IP_DCCP' clause into net/dccp/Kconfig.

The CCID Kconfig menu is sourced within this clause; as a
consequence, all tests of type `depends on IP_DCCP' are now
redundant.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:46 -08:00
Gerrit Renker
e333b3edc4 [DCCP]: Promote CCID2 as default CCID
This patch addresses the following problems:

 1. DCCP relies for its proper functioning on having at least one CCID module
    enabled (as in TCP plugable congestion control). Currently it is possible to
    disable both CCIDs and thus leave the DCCP module in a compiled, but entirely
    non-functional state: no sockets can be created when no CCID is available.
    Furthermore, the protocol is (again like TCP) not intended to be used without
    CCIDs. Last, a non-empty CCID list is needed for doing CCID feature negotiation.

 2. Internally the default CCID that is advertised by the Linux host is set to CCID2
    (DCCPF_INITIAL_CCID in include/linux/dccp.h). Disabling CCID2 in the Kconfig
    menu without changing the defaults leads to a failure `module not found' when
    trying to load the dccp module (which internally tries to load the default CCID).

 3. The specification (RFC 4340, sec. 10) treats CCID2 somewhat like a
    `minimum common denominator'; the specification says that:

    * "New connections start with CCID 2 for both endpoints"

    * "A DCCP implementation intended for general use, such as an implementation in a
       general-purpose operating system kernel, SHOULD implement at least CCID 2.
       The intent is to make CCID 2 broadly available for interoperability [...]"

    Providing CCID2 as minimum-required CCID (like Reno/Cubic in TCP) thus seems reasonable.

Hence this patch automatically selects CCID2 when DCCP is enabled. Documentation also added.

Discussions with Ian McDonald on this subject are gratefully acknowledged.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:46 -08:00
Gerrit Renker
8e8c71f1ab [DCCP]: Honour and make use of shutdown option set by user
This extends the DCCP socket API by honouring any shutdown(2) option set by the user.
The behaviour is, as much as possible, made consistent with the API for TCP's shutdown.

This patch exploits the information provided by the user via the socket API to reduce
processing costs:
 * if the read end is closed (SHUT_RD), it is not necessary to deliver to input CCID;
 * if the write end is closed (SHUT_WR), the same idea applies, but with a difference -
   as long as the TX queue has not been drained, we need to receive feedback to keep
   congestion-control rates up to date. Hence SHUT_WR is honoured only after the last
   packet (under congestion control) has been sent;
 * although SHUT_RDWR seems nonsensical, it is nevertheless supported in the same manner
   as for TCP (and agrees with test for SHUTDOWN_MASK in dccp_poll() in net/dccp/proto.c).

Furthermore, most of the code already honours the sk_shutdown flags (dccp_recvmsg() for
instance sets the read length to 0 if SHUT_RD had been called); CCID handling is now added
to this by the present patch.

There will also no longer be any delivery when the socket is in the final stages, i.e. when
one of dccp_close(), dccp_fin(), or dccp_done() has been called - which is fine since at
that stage the connection is its final stages.

Motivation and background are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/shutdown

A FIXME has been added to notify the other end if SHUT_RD has been set (RFC 4340, 11.7).

Note: There is a comment in inet_shutdown() in net/ipv4/af_inet.c which asks to "make
      sure the socket is a TCP socket". This should probably be extended to mean
      `TCP or DCCP socket' (the code is also used by UDP and raw sockets).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:44 -08:00
Gerrit Renker
c3ada46a00 [CCID3]: Inline for moving average
The moving average computation occurs so frequently in the CCID 3 code that
it merits an inline function  of its own. This is uses a suggestion by
Arnaldo as per http://www.mail-archive.com/dccp@vger.kernel.org/msg01662.html

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:43 -08:00
Gerrit Renker
a5358fdc9c [CCID3]: Accurately determine idle & application-limited periods
This fixes/updates the handling of idle and application-limited periods in CCID3,
which currently is broken: there is no detection as to how long a sender has been
idle - there is only one flag which is toggled in between function calls.

Being obsolete now, the `idle' flag is removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:42 -08:00
Gerrit Renker
eb279b79c4 [CCID3]: Ignore trivial amounts of elapsed time
This patch fixes a previously undiscovered bug; the problem is in computing
the elapsed time as the time between `receiving' the packet (i.e. skb enters
CCID module) and sending feedback:

     - there is no layer-processing, queueing, or delay involved,
     - hence the elapsed time is in the order of 1 function call
     - this is in the dimension of maximally 50..100usec
     - which renders the use of elapsed time almost entirely useless.

The fix is simply to ignore such trivial amounts of elapsed time.

As a further advantage, the now useless elapsed_time field can be removed from
the socket, which reduces the socket structure by another four bytes.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:42 -08:00
Gerrit Renker
6c08b2cf48 [CCID3]: Revert use of MSS instead of s
This updates the CCID3 code with regard to two instances of using `MSS' in place of `s':

 1. The RFC3390-based initial rate: both rfc3448bis as well as the Faster Restart
    draft now consistently use `s' instead of MSS.

 2. Now agrees with section 4.2 of rfc3448bis: "If the sender is ready to send data when
    it does not yet have a round trip sample, the value of X is set to s bytes per
    second, for segment size s [...]"

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:41 -08:00
Arnaldo Carvalho de Melo
ebb53d7565 [NET] proto: Use pcounters for the inuse field
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:40 -08:00
Johannes Berg
0c884439db mac80211: remove more forgotten code
Hopefully that's the rest. Seems I didn't do a very thorough job
removing the management interface.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:39 -08:00
Helmut Schaa
48933dea47 mac80211: Remove local->scan_flags
This patch removes all references to local->scan_flags as these are not
used anymore since the removal of prism2 ioctls.

Signed-off-by: Helmut Schaa <hschaa@suse.de>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:37 -08:00
Johannes Berg
dabeb344f5 mac80211: provide interface iterator for drivers
Sometimes drivers need to know which interfaces are associated with
their hardware. Rather than forcing those drivers to keep track of
the interfaces that were added, this adds an iteration function to
mac80211.

As it is intended to be used from the interface add/remove callbacks,
the iteration function may currently only be called under RTNL.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:37 -08:00
Pavel Emelyanov
9859a79023 [NET]: Compact sk_stream_mem_schedule() code
This function references sk->sk_prot->xxx for many times.
It turned out, that there's so many code in it, that gcc
cannot always optimize access to sk->sk_prot's fields.

After saving the sk->sk_prot on the stack and comparing
disassembled code, it turned out that the function became
~10 bytes shorter and made less dereferences (on i386 and
x86_64). Stack consumption didn't grow.

Besides, this patch drives most of this function into the
80 columns limit.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:36 -08:00
Benjamin Thery
3ef1355dcb [NET]: Make netns cleanup to run in a separate queue
This patch adds a separate workqueue for cleaning up a network
namespace. If we use the keventd workqueue to execute cleanup_net(),
there is a problem to unregister devices in IPv6. Indeed the code
that cleans up also schedule work in keventd: as long as cleanup_net()
hasn't return, dst_gc_task() cannot run and as long as dst_gc_task() has
not run, there are still some references pending on the net devices and
cleanup_net() can not unregister and exit the keventd workqueue.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Acked-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:35 -08:00
Pavel Emelyanov
85b606800b [IPVS]: Relax the module get/put in ip_vs_app.c
Both try_module_get/module_put already handle the module == NULL
case, so no need in manual checking.

This patch fits both net-2.6 and net-2.6.25.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:35 -08:00
Adrian Bunk
02d45827fa [NET] net/core/request_sock.c: Remove unused exports.
This patch removes the following unused EXPORT_SYMBOL's:
- reqsk_queue_alloc
- __reqsk_queue_destroy
- reqsk_queue_destroy

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:33 -08:00
Eric Dumazet
beb659bd8c [PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to workqueue
Every 600 seconds (ip_rt_secret_interval), a softirq flush of the
whole ip route cache is triggered. On loaded machines, this can starve
softirq for many seconds and can eventually crash.

This patch moves this flush to a workqueue context, using the worker
we intoduced in commit 39c90ece75 (IPV4:
Convert rt_check_expire() from softirq processing to workqueue.)

Also, immediate flushes (echo 0 >/proc/sys/net/ipv4/route/flush) are
using rt_do_flush() helper function, wich take attention to
rescheduling.

Next step will be to handle delayed flushes
("echo -1 >/proc/sys/net/ipv4/route/flush" or "ip route flush cache")

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:33 -08:00
Pavel Emelyanov
42a73808ed [RAW]: Consolidate proc interface.
Both ipv6/raw.c and ipv4/raw.c use the seq files to walk
through the raw sockets hash and show them.

The "walking" code is rather huge, but is identical in both
cases. The difference is the hash table to walk over and
the protocol family to check (this was not in the first
virsion of the patch, which was noticed by YOSHIFUJI)

Make the ->open store the needed hash table and the family
on the allocated raw_iter_state and make the start/next/stop
callbacks work with it.

This removes most of the code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:32 -08:00
Pavel Emelyanov
ab70768ec7 [RAW]: Consolidate proto->unhash callback
Same as the ->hash one, this is easily consolidated.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:31 -08:00
Pavel Emelyanov
65b4c50b47 [RAW]: Consolidate proto->hash callback
Having the raw_hashinfo it's easy to consolidate the
raw[46]_hash functions.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:31 -08:00
Pavel Emelyanov
b673e4dfc8 [RAW]: Introduce raw_hashinfo structure
The ipv4/raw.c and ipv6/raw.c contain many common code (most
of which is proc interface) which can be consolidated.

Most of the places to consolidate deal with the raw sockets
hashtable, so introduce a struct raw_hashinfo which describes
the raw sockets hash.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:30 -08:00