Current /proc/net is done with so called "shadows", but current
implementation is broken and has little chances to get fixed.
The problem is that dentries subtree of /proc/net directory has
fancy revalidation rules to make processes living in different
net namespaces see different entries in /proc/net subtree, but
currently, tasks see in the /proc/net subdir the contents of any
other namespace, depending on who opened the file first.
The proposed fix is to turn /proc/net into a symlink, which points
to /proc/self/net, which in turn shows what previously was in
/proc/net - the network-related info, from the net namespace the
appropriate task lives in.
# ls -l /proc/net
lrwxrwxrwx 1 root root 8 Mar 5 15:17 /proc/net -> self/net
In other words - this behaves like /proc/mounts, but unlike
"mounts", "net" is not a file, but a directory.
Changes from v2:
* Fixed discrepancy of /proc/net nlink count and selinux labeling
screwup pointed out by Stephen.
To get the correct nlink count the ->getattr callback for /proc/net
is overridden to read one from the net->proc_net entry.
To make selinux still work the net->proc_net entry is initialized
properly, i.e. with the "net" name and the proc_net parent.
Selinux fixes are
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Changes from v1:
* Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kei Tokunaga reported an interactivity problem when moving tasks
between control groups.
Tasks would retain their old vruntime when moved between groups, this
can cause funny lags. Re-set the vruntime on group move to fit within
the new tree.
Reported-by: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit db1ed684f6 ("[IPV6]
UDP: Rename IPv6 UDP files."), commit
8be8af8fa4 ("[IPV4] UDP: Move
IPv4-specific bits to other file.") and commit
e898d4db27 ("[UDP]: Allow users to
configure UDP-Lite.").
First, udplite is of such small cost, and it is a core protocol just
like TCP and normal UDP are.
We spent enormous amounts of effort to make udplite share as much code
with core UDP as possible. All of that work is less valuable if we're
just going to slap a config option on udplite support.
It is also causing build failures, as reported on linux-next, showing
that the changeset was not tested very well. In fact, this is the
second build failure resulting from the udplite change.
Finally, the config options provided was a bool, instead of a modular
option. Meaning the udplite code does not even get build tested
by allmodconfig builds, and furthermore the user is not presented
with a reasonable modular build option which is particularly needed
by distribution vendors.
Signed-off-by: David S. Miller <davem@davemloft.net>
Make them all use angle brackets and the directory name.
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
This adds the Gigabit Ethernet driver for the SSB
Gigabit Ethernet core. This driver actually is a frontend to
the Tigon3 driver. So the real work is done by tg3.
This device is used in the Linksys WRT350N.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Added support for mesh id and mesh path operation as well as
station structure dumping.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This introduces a new WEXT type IW_MODE_MESH for mesh networks,
used for scan results.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Introduced by changeset 95e41e93e1
("[IPV6]: Make ndisc_flow_init() common for later use.")
Reported by Stephen Rothwell.
In file included from net/ipv6/netfilter/ip6_tables.c:21:
include/linux/icmpv6.h:192: warning: 'struct in6_addr' declared inside parameter list
include/linux/icmpv6.h:192: warning: its scope is only this definition or declaration, which is probably not what you want
Signed-off-by: David S. Miller <davem@davemloft.net>
(Anonymous) unions can help us to avoid ugly casts.
A common cast it the (struct rtable *)skb->dst one.
Defining an union like :
union {
struct dst_entry *dst;
struct rtable *rtable;
};
permits to use skb->rtable in place.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce new LSM interfaces to allow an FS to deal with their own mount
options. This includes a new string parsing function exported from the
LSM that an FS can use to get a security data blob and a new security
data blob. This is particularly useful for an FS which uses binary
mount data, like NFS, which does not pass strings into the vfs to be
handled by the loaded LSM. Also fix a BUG() in both SELinux and SMACK
when dealing with binary mount data. If the binary mount data is less
than one page the copy_page() in security_sb_copy_data() can cause an
illegal page fault and boom. Remove all NFSisms from the SELinux code
since they were broken by past NFS changes.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits)
[IPCONFIG]: The kernel gets no IP from some DHCP servers
b43legacy: Fix module init message
rndis_wlan: fix broken data copy
libertas: compare the current command with response
libertas: fix sanity check on sequence number in command response
p54: fix eeprom parser length sanity checks
p54: fix EEPROM structure endianness
ssb: Add pcibios_enable_device() return value check
rc80211-pid: fix rate adjustment
[ESP]: Add select on AUTHENC
[TCP]: Improve ipv4 established hash function.
[NETPOLL]: Revert two bogus cleanups that broke netconsole.
[PPPOL2TP]: Add missing sock_put() in pppol2tp_tunnel_closeall()
Subject: [PPPOL2TP] add missing sock_put() in pppol2tp_recv_dequeue()
[BLUETOOTH]: l2cap info_timer delete fix in hci_conn_del
[NET]: Fix race in generic address resolution.
iucv: fix build error on !SMP
[TCP]: Must count fack_count also when skipping
[TUN]: Fix RTNL-locking in tun/tap driver
[SCTP]: Use proc_create to setup de->proc_fops.
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
debugfs: fix sparse warnings
Driver core: Fix cleanup when failing device_add().
driver core: Remove dpm_sysfs_remove() from error path of device_add()
PM: fix new mutex-locking bug in the PM core
PM: Do not acquire device semaphores upfront during suspend
kobject: properly initialize ksets
sysfs: CONFIG_SYSFS_DEPRECATED fix
driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
USB: ftdi_sio - really enable EM1010PC
USB: remove incorrect struct class_device from the printer gadget
USB: pxa2xx_udc: fix misuse of clock enable/disable calls
USB: ftdi_sio: Workaround for broken Matrix Orbital serial port
USB: Add support for AXESSTEL MV110H CDMA modem
usb-storage: update earlier scatter-gather bug fix
USB: isp116x: fix enumeration on boot
USB: ehci: handle large bulk URBs correctly (again)
USB: spruce up the device blacklist
USB: fix comment of struct usb_interface
USB: update Kconfig entry for USB_SUSPEND
usb: Add support for the mos7820/7840-based B&B USB/RS485 converter to mos7840.c
When a raid1 array is stopped, all components currently get added to the list
for auto-detection. However we should really only add components that were
found by autodetection in the first place. So add a flag to record that
information, and use it.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On an md array with a write-intent bitmap, a thread wakes up every few seconds
and scans the bitmap looking for work to do. If the array is idle, there will
be no work to do, but a lot of scanning is done to discover this.
So cache the fact that the bitmap is completely clean, and avoid scanning the
whole bitmap when the cache is known to be clean.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
iommu_is_span_boundary is used internally in the IOMMU helper
(lib/iommu-helper.c), a primitive function that judges whether a memory area
spans LLD's segment boundary or not.
It's difficult to convert some IOMMUs to use the IOMMU helper but
iommu_is_span_boundary is still useful for them. So this patch exports it.
This is needed for the parisc iommu fixes.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nothing uses mem_cgroup_uncharge apart from mem_cgroup_uncharge_page, (a
trivial wrapper around it) and mem_cgroup_end_migration (which does the same
as mem_cgroup_uncharge_page). And it often ends up having to lock just to let
its caller unlock. Remove it (but leave the silly locking until a later
patch).
Moved mem_cgroup_cache_charge next to mem_cgroup_charge in memcontrol.h.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace free_hot_cold_page's VM_BUG_ON(page_get_page_cgroup(page)) by a "Bad
page state" and clear: most users don't have CONFIG_DEBUG_VM on, and if it
were set here, it'd likely cause corruption when the page is reused.
Don't use page_assign_page_cgroup to clear it: that should be private to
memcontrol.c, and always called with the lock taken; and memmap_init_zone
doesn't need it either - like page->mapping and other pointers throughout the
kernel, Linux assumes pointers in zeroed structures are NULL pointers.
Instead use page_reset_bad_cgroup, added to memcontrol.h for this only.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Each caller of mem_cgroup_move_lists is having to use page_get_page_cgroup:
it's more convenient if it acts upon the page itself not the page_cgroup; and
in a later patch this becomes important to handle within memcontrol.c.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
vm_match_cgroup is a perverse name for a macro to match mm with cgroup: rename
it mm_match_cgroup, matching mm_init_cgroup and mm_free_cgroup.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wrap __mark_check_format() into an if(0) to make sure that parameters such as
trace_mark(mm_page_alloc, "order %u pfn %lu", order, page?page_to_pfn(page):0);
(where page_to_pfn() has side-effects) won't generate code because of the
__mark_check_format().
Thanks to Jan Kiszka for reporting this.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Include falloc.h in header-y; it defines a flag for the fallocate sysctl.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SM502 has a programmable PLL which can provide the panel pixel clock instead
of the 288MHz and 336MHz PLLs.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should be able to do ndelay(some_u64), but that can cause a call to
__divdi3() to be emitted because the ndelay() macros does a divide.
Fix it by switching to static inline which will force the u64 arg to be
treated as an unsigned long. udelay() takes an unsigned long arg.
[bunk@kernel.org: reported m68k build breakage]
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Martin Michlmayr <tbm@cyrius.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
People are adding `noinline' in various places to prevent excess stack
consumption due to gcc inlining. But once this is done, it is quite unobvious
why the `noinline' is present in the code. We can comment each and every
site, or we can use noinline_for_stack.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename Memory Controller to Memory Resource Controller. Reflect the same
changes in the CONFIG definition for the Memory Resource Controller. Group
together the config options for Resource Counters and Memory Resource
Controller.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add CONFIG_HAVE_KRETPROBES to the arch/<arch>/Kconfig file for relevant
architectures with kprobes support. This facilitates easy handling of
in-kernel modules (like samples/kprobes/kretprobe_example.c) that depend on
kretprobes being present in the kernel.
Thanks to Sam Ravnborg for helping make the patch more lean.
Per Mathieu's suggestion, added CONFIG_KRETPROBES and fixed up dependencies.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a <linux/gpio.h> defining fail/warn stubs for GPIO calls on platforms that
don't support the GPIO programming interface. That includes the arch-specific
implementation glue otherwise.
This facilitates a new model for GPIO usage: drivers that can use GPIOs if
they're available, but don't require them. One example of such a driver is
NAND driver for various FreeScale chips. On platforms update with GPIO
support, they can be used instead of a worst-case delay to verify that the
BUSY signal is off.
(Also includes a couple minor unrelated doc updates.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The definitions of struct pci_device_id arrays should generally follow
the same pattern across the entire kernel. This macro defines this
array as const and puts it into the __devinitconst section.
There are currently many definitions scattered about the kernel that
omit the __devinitdata modifier despite the documentation stating that
it should always be there. These definitions really also should have
been const, which wasn't possible before but has become so with the
addition of the __devinitconst attribute.
Furthermore, there are definitions that use "const" and __devinitdata,
which is explicitly wrong but the compiler doesn't catch section
mismatches if there's only one such one case in the module (which is
often the case).
Adding the __devinitconst modifier where there was nothing before buys
us memory. Adding the const modifier gives the compiler a chance to do
its thing. Changing __devinitdata to __devinitconst where it was wrong
actually fixes some compiler errors in older (mid-release) kernels that
were patched over by "removing" the section attribute altogether (which
wastes memory).
This macro makes it pretty difficult to get this definition wrong in
the future...
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
update the comment for the removed "driver" field and being
out-of-order of @cur_altsetting and @num_altsetting.
Signed-off-by: Lei Ming <tom.leiming@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
extern does not belong in C files, move declaration to linux/debugfs.h
fs/debugfs/file.c:42:30: warning: symbol 'debugfs_file_operations' was not declared. Should it be static?
fs/debugfs/file.c:54:31: warning: symbol 'debugfs_link_operations' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Based upon a report by Andrew Morton and code analysis done
by Jarek Poplawski.
This reverts 33f807ba0d ("[NETPOLL]:
Kill NETPOLL_RX_DROP, set but never tested.") and
c7b6ea24b4 ("[NETPOLL]: Don't need
rx_flags.").
The rx_flags did get tested for zero vs. non-zero and therefore we do
need those tests and that code which sets NETPOLL_RX_DROP et al.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
ioat: fix 'ack' handling, driver must ensure that 'ack' is zero
dmaengine: fix sparse warning
fsldma: do not cleanup descriptors in hardirq context
dmaengine: add driver for Freescale MPC85xx DMA controller
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
x86: disable KVM for Voyager and friends
KVM: VMX: Avoid rearranging switched guest msrs while they are loaded
KVM: MMU: Fix race when instantiating a shadow pte
KVM: Route irq 0 to vcpu 0 exclusively
KVM: Avoid infinite-frequency local apic timer
KVM: make MMU_DEBUG compile again
KVM: move alloc_apic_access_page() outside of non-preemptable region
KVM: SVM: fix Windows XP 64 bit installation crash
KVM: remove the usage of the mmap_sem for the protection of the memory slots.
KVM: emulate access to MSR_IA32_MCG_CTL
KVM: Make the supported cpuid list a host property rather than a vm property
KVM: Fix kvm_arch_vcpu_ioctl_set_sregs so that set_cr0 works properly
KVM: SVM: set NM intercept when enabling CR0.TS in the guest
KVM: SVM: Fix lazy FPU switching
The following commits cause a number of regressions:
commit 58e2d4ca58
Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Date: Fri Jan 25 21:08:00 2008 +0100
sched: group scheduling, change how cpu load is calculated
commit 6b2d770026
Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Date: Fri Jan 25 21:08:00 2008 +0100
sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups
Namely:
- very frequent wakeups on SMP, reported by PowerTop users.
- cacheline trashing on (large) SMP
- some latencies larger than 500ms
While there is a mergeable patch to fix the latter, the former issues
are not fixable in a manner suitable for .25 (we're at -rc3 now).
Hence we revert them and try again in v2.6.26.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch replaces the mmap_sem lock for the memory slots with a new
kvm private lock, it is needed beacuse untill now there were cases where
kvm accesses user memory while holding the mmap semaphore.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Create <linux/atmel_tc.h> based on <asm-arm/arch-at91/at91-tc.h> and the
at91sam9263 and at32ap7000 datasheets. Most AT91 and AT32 SOCs have one
or two of these TC blocks, which include three 16-bit timers that can be
interconnected in various ways.
These TC blocks can be used for external interfacing (such as PWM and
measurement), or used as somewhat quirky sixteen-bit timers.
Changes relative to the original version:
* Drop unneeded inclusion of <linux/mutex.h>
* Support an arbitrary number of TC blocks
* Return a struct with information about a TC block from
atmel_tc_alloc() instead of using a combination of return values
and "out" parameters.
* ioremap() the I/O registers on allocation
* Look up clocks and irqs for all channels
* Add "name" parameter to atmel_tc_alloc() and use this when
requesting the iomem resource.
* Check if the platform provided the necessary resources at probe()
time instead of when the TCB is allocated.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
This patch adds proper externs for two structs in include/linux/genhd.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch contains the following cleanups:
- make the needlessly global struct disk_type static
- #if 0 the unused genhd_media_change_notify()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Block layer alignment was used for two different purposes - memory
alignment and padding. This causes problems in lower layers because
drivers which only require memory alignment ends up with adjusted
rq->data_len. Separate out padding such that padding occurs iff
driver explicitly requests it.
Tomo: restorethe code to update bio in blk_rq_map_user
introduced by the commit 40b01b9bbd
according to padding alignment.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The meaning of rq->data_len was changed to the length of an allocated
buffer from the true data length. It breaks SG_IO friends and
bsg. This patch restores the meaning of rq->data_len to the true data
length and adds rq->extra_len to store an extended length (due to
drain buffer and padding).
This patch also removes the code to update bio in blk_rq_map_user
introduced by the commit 40b01b9bbd.
The commit adjusts bio according to memory alignment
(queue_dma_alignment). However, memory alignment is NOT padding
alignment. This adjustment also breaks SG_IO friends and bsg. Padding
alignment needs to be fixed in a proper way (by a separate patch).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
For later use, this patch is renaming ndisc_flow_init() to
icmpv6_flow_init() and putting it in common place.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This only made sense for the alternate fastpath which was reverted last week.
Mathieu is working on a new version that addresses the fastpath issues but that
new code first needs to go through mm and it is not clear if we need the
unique end pointers with his new scheme.
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
There are some place, that calculate the ARP header length. These
calculations are correct, but
a) some operate with "magic" constants,
b) enlarge the code length (sometimes at the cost of coding style),
c) are not informative from the first glance.
The proposal is to introduce a helper, that includes all the good
sides of these calculations.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix docbook problems in USB source files.
These cause the generated docbook to be incorrect.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the use cases for the supported cpuid list is to create a "greatest
common denominator" of cpu capabilities in a server farm. As such, it is
useful to be able to get the list without creating a virtual machine first.
Since the code does not depend on the vm in any way, all that is needed is
to move it to the device ioctl handler. The capability identifier is also
changed so that binaries made against -rc1 will fail gracefully.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds functions to setup and read the CHIPCO IRQ.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds support for 8bit wide register reads/writes.
This is needed in order to support the gigabit ethernet core.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This allows precise control over what a monitor interface shows.
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This makes nl80211 export the hardware bitrate/channel capabilities
as registered in a wiphy.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
struct net_proto_family* is not used in icmp[v6]_init, ndisc_init,
igmp_init and tcp_v4_init. Remove it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PREEMPT-RCU can get stuck if a CPU goes idle and NO_HZ is set. The
idle CPU will not progress the RCU through its grace period and a
synchronize_rcu my get stuck. Without this patch I have a box that will
not boot when PREEMPT_RCU and NO_HZ are set. That same box boots fine
with this patch.
This patch comes from the -rt kernel where it has been tested for
several months.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This removes code duplication and makes __dec_zone_page_state look like
__inc_zone_page_state.
Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
arch/sh/drivers/dma/dma-sh.c: Correct use of ! and &
serial: Move asm-sh/sci.h to linux/serial_sci.h.
sh: Fix up HAS_SR_RB typo in entry-macros.
maple: fix device detection
sh: fix rtc_resources setup for sh770x
sh: heartbeat: ioremap is expected to succeed
sh: Storage class should be before const qualifier
maple: remove unused variable
sh: SH5-103 needs to select CPU_SH5.
sh: Rename SH-3 CCR3 reg to avoid synclink_cs clash.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (79 commits)
[X25]: Use proc_create() to setup ->proc_fops first
[WANROUTER]: Use proc_create() to setup ->proc_fops first
[8021Q]: Use proc_create() to setup ->proc_fops first
[IPV4]: Use proc_create() to setup ->proc_fops first
[IPV6]: Use proc_create() to setup ->proc_fops first
[SCTP]: Use proc_create() to setup ->proc_fops first
[PKTGEN]: Use proc_create() to setup ->proc_fops first
[NEIGHBOUR]: Use proc_create() to setup ->proc_fops first
[LLC]: Use proc_create() to setup ->proc_fops first
[IPX]: Use proc_create() to setup ->proc_fops first
[SUNRPC]: Use proc_create() to setup ->proc_fops first
[ATM]: Use proc_create() to setup ->proc_fops first
[SCTP]: Update AUTH structures to match declarations in draft-16.
[SCTP]: Incorrect length was used in SCTP_*_AUTH_CHUNKS socket option
[SCTP]: Clean up naming conventions of sctp protocol/address family registration
[APPLETALK]: Use proc_create() to setup ->proc_fops first
[BNX2X]: add bnx2x to MAINTAINERS
[BNX2X]: update version, remove CVS strings
[BNX2X]: Fix Xmit bugs
[BNX2X]: Prevent PCI queue overflow
...
I overlooked the difference between __kernel_uid_t and uid_t when defining
struct compat_elf_prpsinfo. The result is a regression in 32-bit core
dumps on x86_64, where the NT_PRPSINFO note has the wrong size and layout.
This patch fixes it.
Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that we've tightened up the locking rules for RPC queue wakeups, we can
remove the RCU-safe kfree calls...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This is designed to replace the timeout timer in the individual rpc_tasks.
By putting the timer function in the wait queue, we will eventually be able
to reduce the total number of timers in use by the RPC subsystem.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Change xfrm_policy and xfrm_state walking algorithm from O(n^2) to O(n).
This is achieved adding the entries to one more list which is used
solely for walking the entries.
This also fixes some races where the dump can have duplicate or missing
entries when the SPD/SADB is modified during an ongoing dump.
Dumping SADB with 20000 entries using "time ip xfrm state" the sys
time dropped from 1.012s to 0.080s.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add namespace parameter to devinet_ioctl and locate device inside it for
state changes.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Properly add parens around the macro argument. This is not needed by
the kernel but the macro is exported to userspace, so it shouldn't
make any assumptions.
Also use NF_VERDICT_BITS instead of NF_VERDICT_QBTIS for the left-shift
since thats whats logically correct.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: add missing ext4_journal_stop()
ext4: ext4_find_next_zero_bit needs an aligned address on some arch
ext4: set EXT4_EXTENTS_FL only for directory and regular files
ext4: Don't mark filesystem error if fallocate fails
ext4: Fix BUG when writing to an unitialized extent
ext4: Don't use ext4_dec_count() if not needed
ext4: modify block allocation algorithm for the last group
ext4: Don't claim block from group which has corrupt bitmap
ext4: Get journal write access before modifying the extent tree
ext4: Fix memory and buffer head leak in callers to ext4_ext_find_extent()
ext4: Don't leave behind a half-created inode if ext4_mkdir() fails
ext4: Fix kernel BUG at fs/ext4/mballoc.c:910!
ext4: Fix locking hierarchy violation in ext4_fallocate()
Remove incorrect BKL comments in ext4
* 'v2.6.25-rc3-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep:
Subject: lockdep: include all lock classes in all_lock_classes
lockdep: increase MAX_LOCK_DEPTH
This header is needed on other architectures as well (namely h8300),
which currently fails to build without this in place. Rather than
duplicating the port definition completely there, just move this to a
common location instead.
This should get h8300 working again for 2.6.25, in addition to the
changes already pushed by Sato-san in -rc2.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
An audit of the current RPC timeout functions shows that they don't really
ever need to run in the softirq context. As long as the softirq is
able to signal that the wakeup is due to a timeout (which it can do by
setting task->tk_status to -ETIMEDOUT) then the callback functions can just
run as standard task->tk_callback functions (in the rpciod/process
context).
The only possible border-line case would be xprt_timer() for the case of
UDP, when the callback is used to reduce the size of the transport
congestion window. In testing, however, the effect of moving that update
to a callback would appear to be minor.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In all cases where we currently use rpc_wake_up_task(), we almost always
know on which waitqueue the rpc_task is actually sleeping. This will allows
us to simplify the queue locking in a future patch.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A lot of the work done by the rpc_release() callback is inappropriate for
rpciod as it will often involve things like starting a new rpc call in
order to clean up state after an interrupted NFSv4 open() call, or
calls to mntput(), etc.
This patch allows the caller of rpc_run_task() to specify that the
rpc_release callback should run on a different workqueue than the default
rpciod_workqueue.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Remove an unused variable from the definition of struct maple_device
Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Some code paths exceed the current max lock depth (XFS), so increase
this limit a bit. I looked at making this a dynamic allocated array,
but we should not advocate insane lock depths, so stay with this as
long as it works...
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Avoids sparse warnings:
kernel/sched.c:2170:17: warning: symbol 'schedule_tail' was not declared. Should it be static?
Avoids the need for an external declaration in arch/um/process.c
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Don't require platform code to be #ifdeffed according to whether
I2C is enabled or not ... if it's not enabled, let GCC compile out
all I2C device declarations. (Issue noted on an NSLU2 build that
didn't configure I2C.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the
beginning of the declaration specifiers in a declaration is an
obsolescent feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata-core: fix kernel-doc warning
sata_fsl: fix build with ATA_VERBOSE_DEBUG
[libata] ahci: AMD SB700/SB800 SATA support 64bit DMA
libata-pmp: clear hob for pmp register accesses
libata: automatically use DMADIR if drive/bridge requires it
power_state: get rid of write-only variable in SATA
pata_atiixp: Use 255 sector limit
Back in 2.6.17-rc2, a libata module parameter was added for atapi_dmadir.
That's nice, but most SATA devices which need it will tell us about it
in their IDENTIFY PACKET response, as bit-15 of word-62 of the
returned data (as per ATA7, ATA8 specifications).
So for those which specify it, we should automatically use the DMADIR bit.
Otherwise, disc writing will fail by default on many SATA-ATAPI drives.
This patch adds ATA_DFLAG_DMADIR and make ata_dev_configure() set it
if atapi_dmadir is set or identify data indicates DMADIR is necessary.
atapi_xlat() is converted to check ATA_DFLAG_DMADIR before setting
DMADIR.
Original patch is from Mark Lord.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
[NETFILTER]: fix ebtable targets return
[IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.
[NET]: Restore sanity wrt. print_mac().
[NEIGH]: Fix race between neighbor lookup and table's hash_rnd update.
[RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK
tg3: ethtool phys_id default
[BNX2]: Update version to 1.7.4.
[BNX2]: Disable parallel detect on an HP blade.
[BNX2]: More 5706S link down workaround.
ssb: Fix support for PCI devices behind a SSB->PCI bridge
zd1211rw: fix sparse warnings
rtl818x: fix sparse warnings
ssb: Fix pcicore cardbus mode
ssb: Make the GPIO API reentrancy safe
ssb: Fix the GPIO API
ssb: Fix watchdog access for devices without a chipcommon
ssb: Fix serial console on new bcm47xx devices
ath5k: Fix build warnings on some 64-bit platforms.
WDEV, ath5k, don't return int from bool function
WDEV: ath5k, fix lock imbalance
...
MAC_FMT had only one user and we tried to get rid of
that, but this created more problems than it solved.
As a result, this reverts three commits:
235365f3aa ("net/8021q/vlan_dev.c: Use
print_mac."), fea5fa875e ("[NET]: Remove
MAC_FMT"), and 8f789c4844 ("[NET]:
Elminate spurious print_mac() calls.")
Signed-off-by: David S. Miller <davem@davemloft.net>
- replace old name 'cont' with 'cgrp' (Paul Menage did this cleanup for
cgroup.c in commit bd89aabc67)
- remove a duplicate declaration of cgroup_path()
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix:
- comments about need_forkexit_callback
- comments about release agent
- typo and comment style, etc.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Not all architectures implement futex_atomic_cmpxchg_inatomic(). The default
implementation returns -ENOSYS, which is currently not handled inside of the
futex guts.
Futex PI calls and robust list exits with a held futex result in an endless
loop in the futex code on architectures which have no support.
Fixing up every place where futex_atomic_cmpxchg_inatomic() is called would
add a fair amount of extra if/else constructs to the already complex code. It
is also not possible to disable the robust feature before user space tries to
register robust lists.
Compile time disabling is not a good idea either, as there are already
architectures with runtime detection of futex_atomic_cmpxchg_inatomic support.
Detect the functionality at runtime instead by calling
cmpxchg_futex_value_locked() with a NULL pointer from the futex initialization
code. This is guaranteed to fail, but the call of
futex_atomic_cmpxchg_inatomic() happens with pagefaults disabled.
On architectures, which use the asm-generic implementation or have a runtime
CPU feature detection, a -ENOSYS return value disables the PI/robust features.
On architectures with a working implementation the call returns -EFAULT and
the PI/robust features are enabled.
The relevant syscalls return -ENOSYS and the robust list exit code is blocked,
when the detection fails.
Fixes http://lkml.org/lkml/2008/2/11/149
Originally reported by: Lennart Buytenhek
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Riku Voipio <riku.voipio@movial.fi>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge include/linux/efs_fs{_i,_dir}.h into fs/efs/efs.h. efs_vh.h remains
there because this is the IRIX volume header and shouldn't really be
handled by efs but by the partitioning code. efs_sb.h remains there for
now because it's exported to userspace. Of course this wrong and aboot
should have a copy of it's own, but I'll leave that to a separate patch to
avoid any contention.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make is_vmalloc_addr() contingent on CONFIG_MMU=y, as it won't compile
in !MMU mode.
[ Bug introduced in commit 9e2779fa28:
"is_vmalloc_addr(): Check if an address is within the vmalloc
boundaries" ].
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix build failure on sparc:
In file included from include/linux/mm.h:39,
from include/linux/memcontrol.h:24,
from include/linux/swap.h:8,
from include/linux/suspend.h:7,
from init/do_mounts.c:6:
include/asm/pgtable.h:344: warning: parameter names (without
types) in function declaration
include/asm/pgtable.h:345: warning: parameter names (without
types) in function declaration
include/asm/pgtable.h:346: error: expected '=', ',', ';', 'asm' or
'__attribute__' before '___f___swp_entry'
viro sayeth:
I've run allmodconfig builds on a bunch of target, FWIW (essentially the
same patch). Note that these includes are recent addition caused by added
inline function that had since then become a define. So while I agree with
your comments in general, in _this_ case it's pretty safe.
The commit that had done it is 3062fc67da
("memcontrol: move mm_cgroup to header file") and the switch to #define
is in commit 60c12b1202 ("memcontrol: add
vm_match_cgroup()") (BTW, that probably warranted mentioning in the
changelog of the latter).
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Robert Reif <reif@earthlink.net>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During the last step of hibernation in the "platform" mode (with the
help of ACPI) we use the suspend code, including the devices'
->suspend() methods, to prepare the system for entering the ACPI S4
system sleep state.
But at least for some devices the operations performed by the
->suspend() callback in that case must be different from its operations
during regular suspend.
For this reason, introduce the new PM event type PM_EVENT_HIBERNATE and
pass it to the device drivers' ->suspend() methods during the last phase
of hibernation, so that they can distinguish this case and handle it as
appropriate. Modify the drivers that handle PM_EVENT_SUSPEND in a
special way and need to handle PM_EVENT_HIBERNATE in the same way.
These changes are necessary to fix a hibernation regression related
to the i915 driver (ref. http://lkml.org/lkml/2008/2/22/488).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix macro argument substitution in PageHead() and PageTail() - 'page' should
have brackets surrounding it (commit 6d7779538f).
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the Intel ICH10 LPC and SMBus Controller DeviceID's.
Signed-off-by: Jason Gaston <jason.d.gaston@intel.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make hibernation work with CONFIG_DEBUG_PAGEALLOC set on x86, by
checking if the pages to be copied are marked as present in the
kernel mapping and temporarily marking them as present if that's not
the case. No functional modifications are introduced if
CONFIG_DEBUG_PAGEALLOC is unset.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
This fixes the pcicore driver to not die a horrible
crash death when inserting a cardbus card.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes the SSB watchdog access for devices without a chipcommon.
These devices have the watchdog on the extif.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes the baud settings for new devices
like the Linksys WRT350n.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Avoids lots of these, also is more readable.
include/linux/libata.h:1210:13: warning: potentially expensive pointer subtraction
Change the subtraction to addition on the other side of the comparison.
Thanks to Christer Weinigel for the suggestion.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
As reported by David Woodhouse <dwmw2@infradead.org>, using u_int32_t
in struct nf_inet_addr breaks the busybox build. Fix by using __u32.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
By allocating ->hinfo, we already have the needed indirection to cope
with the per-cpu xtables struct match_entry.
[Patrick: do this now before the revision 1 struct is used by userspace]
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the header file xt_policy.h tests __KERNEL__, it should be
unifdef'ed before exporting to userspace.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adding __devinitconst etc. the __initconst variant
were missed.
Add this one and proper definitions for .head.text for use
in .S files.
The naming .head.text is preferred over .text.head as the
latter will conflict for a function named head when introducing
-ffunctions-sections.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
libata: implement drain buffers
libata: eliminate the home grown dma padding in favour of
block: clear drain buffer if draining for write command
block: implement request_queue->dma_drain_needed
block: add request->raw_data_len
block: update bio according to DMA alignment padding
libata: update ATAPI overflow draining
elevator: make elevator_get() attempt to load the appropriate module
cfq-iosched: add hlist for browsing parallel to the radix tree
block: make blk_rq_map_user() clear ->bio if it unmaps it
fs/block_dev.c: remove #if 0'ed code
make struct def_blk_aops static
make blk_settings_init() static
make blk_ioc_init() static
make blk-core.c:request_cachep static again
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (60 commits)
[NIU]: Bump driver version and release date.
[NIU]: Fix BMAC alternate MAC address indexing.
net: fix kernel-doc warnings in header files
[IPV6]: Use BUG_ON instead of if + BUG in fib6_del_route.
[IPV6]: dst_entry leak in ip4ip6_err. (resend)
bluetooth: do not move child device other than rfcomm
bluetooth: put hci dev after del conn
[NET]: Elminate spurious print_mac() calls.
[BLUETOOTH] hci_sysfs.c: Kill build warning.
[NET]: Remove MAC_FMT
net/8021q/vlan_dev.c: Use print_mac.
[XFRM]: Fix ordering issue in xfrm_dst_hash_transfer().
[BLUETOOTH] net/bluetooth/hci_core.c: Use time_* macros
[IPV6]: Fix hardcoded removing of old module code
[NETLABEL]: Move some initialization code into __init section.
[NETLABEL]: Shrink the genl-ops registration code.
[AX25] ax25_out: check skb for NULL in ax25_kick()
[TCP]: Fix tcp_v4_send_synack() comment
[IPV4]: fix alignment of IP-Config output
Documentation: fix tcp.txt
...
that provided by the block layer
ATA requires that all DMA transfers begin and end on word boundaries.
Because of this, a large amount of machinery grew up in ide to adjust
scatterlists on this basis. However, as of 2.5, the block layer has a
dma_alignment variable which ensures both the beginning and length of a
DMA transfer are aligned on the dma_alignment boundary. Although the
block layer does adjust the beginning of the transfer to ensure this
happens, it doesn't actually adjust the length, it merely makes sure
that space is allocated for transfers beyond the declared length. The
upshot of this is that scatterlists may be padded to any size between
the actual length and the length adjusted to the dma_alignment safely
knowing that memory is allocated in this region.
Right at the moment, SCSI takes the default dma_aligment which is on a
512 byte boundary. Note that this aligment only applies to transfers
coming in from user space. However, since all kernel allocations are
automatically aligned on a minimum of 32 byte boundaries, it is safe to
adjust them in this manner as well.
tj: * Adjusting sg after padding is done in block layer. Make libata
set queue alignment correctly for ATAPI devices and drop broken
sg mangling from ata_sg_setup().
* Use request->raw_data_len for ATAPI transfer chunk size.
* Killed qc->raw_nbytes.
* Separated out killing qc->n_iter.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Draining shouldn't be done for commands where overflow may indicate
data integrity issues. Add dma_drain_needed callback to
request_queue. Drain buffer is appened iff this function returns
non-zero.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
With padding and draining moved into it, block layer now may extend
requests as directed by queue parameters, so now a request has two
sizes - the original request size and the extended size which matches
the size of area pointed to by bios and later by sgs. The latter size
is what lower layers are primarily interested in when allocating,
filling up DMA tables and setting up the controller.
Both padding and draining extend the data area to accomodate
controller characteristics. As any controller which speaks SCSI can
handle underflows, feeding larger data area is safe.
So, this patch makes the primary data length field, request->data_len,
indicate the size of full data area and add a separate length field,
request->raw_data_len, for the unmodified request size. The latter is
used to report to higher layer (userland) and where the original
request size should be fed to the controller or device.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's cumbersome to browse a radix tree from start to finish, especially
since we modify keys when a process exits. So add a hlist for the single
purpose of browsing over all known cfq_io_contexts, used for exit,
io prio change, etc.
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=9948
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Commit b2e895dbd8 #if 0'ed this code stating:
<-- snip -->
[PATCH] revert blockdev direct io back to 2.6.19 version
Andrew Vasquez is reporting as-iosched oopses and a 65% throughput
slowdown due to the recent special-casing of direct-io against
blockdevs. We don't know why either of these things are occurring.
The patch minimally reverts us back to the 2.6.19 code for a 2.6.20
release.
<-- snip -->
It has since been dead code, and unless someone wants to revive it now
it's time to remove it.
This patch also makes bio_release_pages() static again and removes the
ki_bio_count member from struct kiocb, reverting changes that had been
done for this dead code.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
This patch makes the needlessly global struct def_blk_aops static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
Add missing structure kernel-doc descriptions to sock.h & skbuff.h
to fix kernel-doc warnings.
(I think that Stephen H. sent a similar patch, but I can't find it.
I just want to kill the warnings, with either patch.)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy notes that print_mac() can get invoked
even if the result it unused (f.e. as an argument to
pr_debug() when DEBUG is not defined).
Mark this function as "__pure" to eliminate this problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix goofups of commit 76166952bb
("<linux/hdsmart.h> is not used by kernel code").
Also update include/linux/Kbuild to reflect the fact that hdsmart.h
uses __KERNEL__ ifdefs now.
Reported-by: "Robert P. J. Day" <rpjday@crashcourse.ca>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
[bart: manually ported it over via82cxxx changes]
From: Andrew Smith <asmith@tranquility.fsbusiness.co.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Fix compilation of user processes which includes videodev*.h but
not includes linux/ioctl.h:
v4l2ext_helper.c: In function 'process_ioctl':
v4l2ext_helper.c:183: warning: implicit declaration of function '_IOWR'
v4l2ext_helper.c:183: error: expected expression before 'struct'
v4l2ext_helper.c:183: error: case label does not reduce to an integer constant
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
The path variable returned via ext4_ext_find_extent is a kmalloc
variable and needs to be freeded. It also contains a reference to
buffer_head which needs to be dropped.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits)
[NET]: Make sure sockets implement splice_read
netconsole: avoid null pointer dereference at show_local_mac()
[IPV6]: Fix reversed local_df test in ip6_fragment
[XFRM]: Avoid bogus BUG() when throwing new policy away.
[AF_KEY]: Fix bug in spdadd
[NETFILTER] nf_conntrack_proto_tcp.c: Mistyped state corrected.
net: xfrm statistics depend on INET
[NETFILTER]: make secmark_tg_destroy() static
[INET]: Unexport inet_listen_wlock
[INET]: Unexport __inet_hash_connect
[NET]: Improve cache line coherency of ingress qdisc
[NET]: Fix race in dev_close(). (Bug 9750)
[IPSEC]: Fix bogus usage of u64 on input sequence number
[RTNETLINK]: Send a single notification on device state changes.
[NETLABLE]: Hide netlbl_unlabel_audit_addr6 under ifdef CONFIG_IPV6.
[NETLABEL]: Don't produce unused variables when IPv6 is off.
[NETLABEL]: Compilation for CONFIG_AUDIT=n case.
[GENETLINK]: Relax dances with genl_lock.
[NETLABEL]: Fix lookup logic of netlbl_domhsh_search_def.
[IPV6]: remove unused method declaration (net/ndisc.h).
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: add USB IDs for MacBook 3rd generation
HID: add LCSPEC from VERNIER to quirk list
HID: fix processing of event quirks
HID: Blacklist new GTCO CalComp USB device PIDs
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI: DMI: quirk for FSC ESPRIMO Mobile V5505
ACPI: DMI blacklist updates
pnpacpi: __initdata is not an identifier
ACPI: static acpi_chain_head
ACPI: static acpi_find_dsdt_initrd()
ACPI: static acpi_no_initrd_override_setup()
thinkpad_acpi: static
ACPI suspend: Execute _WAK with the right argument
cpuidle: Add Documentation
ACPI, cpuidle: Clarify C-state description in sysfs
ACPI: fix suspend regression due to idle update
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (46 commits)
sh: Fix multiple UTLB hit on UP SH-4.
sh: fix pci io access for r2d boards
sh: fix ioreadN_rep and iowriteN_rep
sh: use ctrl_in/out for on chip pci access
sh: Kill off more dead symbols.
sh: __uncached_start only on sh32.
sh: asm/irq.h needs asm/cpu/irq.h.
serial: sh-sci: Fix up SH-5 build.
sh: Get SH-5 caches working again post-unification.
maple: Fix up maple build failure.
sh: Kill off bogus SH_SDK7780_STANDALONE symbol.
sh: asm/tlb.h needs linux/pagemap.h for CONFIG_SWAP=n.
sh: Tidy include/asm-sh/hp6xx.h
maple: improve detection of attached peripherals
sh: Shut up some trivial build warnings.
sh: Update SH-5 flush_cache_sigtramp() for API changes.
sh: Fix up set_fixmap_nocache() for SH-5.
sh: Fix up pte_mkhuge() build breakage for SH-5.
sh: Disable big endian for SH-5.
sh: Handle SH7366 CPU in check_bugs().
...
* 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm:
slub: Support 4k kmallocs again to compensate for page allocator slowness
slub: Fallback to kmalloc_large for failing higher order allocs
slub: Determine gfpflags once and not every time a slab is allocated
make slub.c:slab_address() static
slub: kmalloc page allocator pass-through cleanup
slab: avoid double initialization & do initialization in 1 place
d_path() is used on a <dentry,vfsmount> pair. Lets use a struct path to
reflect this.
[akpm@linux-foundation.org: fix build in mm/memory.c]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Bryan Wu <bryan.wu@analog.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
seq_path() is always called with a dentry and a vfsmount from a struct path.
Make seq_path() take it directly as an argument.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I'm embedding struct path into struct svc_expkey.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_dcookie() is always called with a dentry and a vfsmount from a struct
path. Make get_dcookie() take it directly as an argument.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc_get_link() is always called with a dentry and a vfsmount from a struct
path. Make proc_get_link() take it directly as an argument.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
audit_log_d_path() is a d_path() wrapper that is used by the audit code. To
use a struct path in audit_log_d_path() I need to embed it into struct
avc_audit_data.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In nearly all cases the set_fs_{root,pwd}() calls work on a struct
path. Change the function to reflect this and use path_get() here.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Use struct path in fs_struct.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This introduces the symmetric function to path_put() for getting a reference
to the dentry and vfsmount of a struct path in the right order.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add path_put() functions for releasing a reference to the dentry and
vfsmount of a struct path in the right order
* Switch from path_release(nd) to path_put(&nd->path)
* Rename dput_path() to path_put_conditional()
[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.
Together with the other patches of this series
- it enforced the correct order of getting/releasing the reference count on
<dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
struct path in every place where the stack can be traversed
- it reduces the overall code size:
without patch series:
text data bss dec hex filename
5321639 858418 715768 6895825 6938d1 vmlinux
with patch series:
text data bss dec hex filename
5320026 858418 715768 6894212 693284 vmlinux
This patch:
Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the definition of struct path into its own header file for further
patches.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
path_release_on_umount() should only be called from sys_umount(). I merged the
function into sys_umount() instead of having in in namei.c.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add explanation of I_DIRTY_DATASYNC bit.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Joern Engel <joern@logfs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
configfs.h uses the container_of macro and as such should include kernel.h.
Signed-off-by: Ben Nizette <bn@niasdigital.com>
Cc: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the following compile error with CONFIG_MODULES=n
caused by commit fb40bd78b0:
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/marker.c: In function `marker_update_probes':
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/marker.c:627: error: too few arguments to function `module_update_markers'
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we hand off PAGE_SIZEd kmallocs to the page allocator in the
mistaken belief that the page allocator can handle these allocations
effectively. However, measurements indicate a minimum slowdown by the
factor of 8 (and that is only SMP, NUMA is much worse) vs the slub fastpath
which causes regressions in tbench.
Increase the number of kmalloc caches by one so that we again handle 4k
kmallocs directly from slub. 4k page buffering for the page allocator
will be performed by slub like done by slab.
At some point the page allocator fastpath should be fixed. A lot of the kernel
would benefit from a faster ability to allocate a single page. If that is
done then the 4k allocs may again be forwarded to the page allocator and this
patch could be reverted.
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Currently we determine the gfp flags to pass to the page allocator
each time a slab is being allocated.
Determine the bits to be set at the time the slab is created. Store
in a new allocflags field and add the flags in allocate_slab().
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
This adds a proper function for kmalloc page allocator pass-through. While it
simplifies any code that does slab tracing code a lot, I think it's a
worthwhile cleanup in itself.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Various user space callers ask for relative timeouts. While we fixed
that overflow issue in hrtimer_start(), the sites which convert
relative user space values to absolute timeouts themself were uncovered.
Instead of putting overflow checks into each place add a function
which does the sanity checking and convert all affected callers to use
it.
Thanks to Frans Pop, who reported the problem and tested the fixes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Frans Pop <elendil@planet.nl>
This patch removes the now unneeded registration check variable from
struct maple_device. (This patch assumes the include/linux/maple.h file
has already been patched for whitespace errors by
http://lkml.org/lkml/2008/2/6/327)
Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch is fundamentally about fixing up the whitespace problems
introduced by my previous patch (that brought the code into mainline). A
second patch will follow that will fix memory leaks. The two need to be
applied sequentially.
Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add a new sysfs entry under cpuidle states. desc - can be used by driver to
communicate to userspace any specific information about the state.
This helps in identifying the exact hardware C-states behind the ACPI C-state
definition.
Idea is to export this through powertop, which will help to map the C-state
reported by powertop to actual hardware C-state.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Convert the lmb code to use u64 instead of unsigned long for physical
addresses and sizes. This is needed to support large amounts of RAM
on 32-bit systems that support 36-bit physical addressing.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds some new magic in the MODPOST phase for CONFIG_MARKERS. Analogous
to the Module.symvers file, the build will now write a Module.markers file
when CONFIG_MARKERS=y is set. This file lists the name, defining module, and
format string of each marker, separated by \t characters. This simple text
file can be used by offline build procedures for instrumentation code,
analogous to how System.map and Module.symvers can be useful to have for
kernels other than the one you are running right now.
The strings are made easy to extract by having the __trace_mark macro define
the name and format together in a single array called __mstrtab_* in the
__markers_strings section. This is straightforward and reliable as long as
the marker structs are always defined by this macro. It is an unreasonable
amount of hairy work to extract the string pointers from the __markers section
structs, which entails handling a relocation type for every machine under the
sun.
Mathieu :
- Ran through checkpatch.pl
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: David Smith <dsmith@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RCU style multiple probes support for the Linux Kernel Markers. Common case
(one probe) is still fast and does not require dynamic allocation or a
supplementary pointer dereference on the fast path.
- Move preempt disable from the marker site to the callback.
Since we now have an internal callback, move the preempt disable/enable to the
callback instead of the marker site.
Since the callback change is done asynchronously (passing from a handler that
supports arguments to a handler that does not setup the arguments is no
arguments are passed), we can safely update it even if it is outside the
preempt disable section.
- Move probe arm to probe connection. Now, a connected probe is automatically
armed.
Remove MARK_MAX_FORMAT_LEN, unused.
This patch modifies the Linux Kernel Markers API : it removes the probe
"arm/disarm" and changes the probe function prototype : it now expects a
va_list * instead of a "...".
If we want to have more than one probe connected to a marker at a given
time (LTTng, or blktrace, ssytemtap) then we need this patch. Without it,
connecting a second probe handler to a marker will fail.
It allow us, for instance, to do interesting combinations :
Do standard tracing with LTTng and, eventually, to compute statistics
with SystemTAP, or to have a special trigger on an event that would call
a systemtap script which would stop flight recorder tracing.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Mason <mmlnx@us.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: David Smith <dsmith@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On alpha, ia64 and ppc64 only relocations to local data can go into
read-only sections. The vast majority of module parameters use the global
generic param_set_*/param_get_* functions, so the 'const' attribute for
struct kernel_param is not only useless, but it also causes compile
failures due to 'section type conflict' in those rare cases where
param_set/get are local functions.
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=8964
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move networking (core and drivers) docbook to its own networking book.
Fix a few kernel-doc errors in header and source files.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc_doulongvec_minmax() calls copy_to_user()/copy_from_user(), so we can't
hold hugetlb_lock over the call. Use a dummy variable to store the sysctl
result, like in hugetlb_sysctl_handler(), then grab the lock to update
nr_overcommit_huge_pages.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Reported-by: Miles Lane <miles.lane@gmail.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When submitting the driver for inclusion to 2.6.25 I've missed the change to
serial_core.h. This patch fixes this.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All users are gone, remove definitions and comments referring
to them.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FASTCALL() is always expanded to empty, remove it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the rt group scheduler compile time configurable.
Keep it experimental for now.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change the rt_ratio interface to rt_runtime_us, to match rt_period_us.
This avoids picking a granularity for the ratio.
Extend the /sys/kernel/uids/<uid>/ interface to allow setting
the group's rt_runtime.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the ingress qdisc members of struct net_device from the transmit
cache line to the receive cache line to avoid cache line ping-pong.
These members are only used on the receive path.
Signed-off-by: Neil Turton <nturton@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kosaki Motohito noted that "numactl --interleave=all ..." failed in the
presence of memoryless nodes. This patch attempts to fix that problem.
Some background:
numactl --interleave=all calls set_mempolicy(2) with a fully populated
[out to MAXNUMNODES] nodemask. set_mempolicy() [in do_set_mempolicy()]
calls contextualize_policy() which requires that the nodemask be a
subset of the current task's mems_allowed; else EINVAL will be returned.
A task's mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]
i.e., nodes with memory. So, a fully populated nodemask will be
declared invalid if it includes memoryless nodes.
NOTE: the same thing will occur when running in a cpuset
with restricted mem_allowed--for the same reason:
node mask contains dis-allowed nodes.
mbind(2), on the other hand, just masks off any nodes in the nodemask
that are not included in the caller's mems_allowed.
In each case [mbind() and set_mempolicy()], mpol_check_policy() will
complain [again, resulting in EINVAL] if the nodemask contains any
memoryless nodes. This is somewhat redundant as mpol_new() will remove
memoryless nodes for interleave policy, as will bind_zonelist()--called
by mpol_new() for BIND policy.
Proposed fix:
1) modify contextualize_policy logic to:
a) remember whether the incoming node mask is empty.
b) if not, restrict the nodemask to allowed nodes, as is
currently done in-line for mbind(). This guarantees
that the resulting mask includes only nodes with memory.
NOTE: this is a [benign, IMO] change in behavior for
set_mempolicy(). Dis-allowed nodes will be
silently ignored, rather than returning an error.
c) fold this code into mpol_check_policy(), replace 2 calls to
contextualize_policy() to call mpol_check_policy() directly
and remove contextualize_policy().
2) In existing mpol_check_policy() logic, after "contextualization":
a) MPOL_DEFAULT: require that in coming mask "was_empty"
b) MPOL_{BIND|INTERLEAVE}: require that contextualized nodemask
contains at least one node.
c) add a case for MPOL_PREFERRED: if in coming was not empty
and resulting mask IS empty, user specified invalid nodes.
Return EINVAL.
c) remove the now redundant check for memoryless nodes
3) remove the now redundant masking of policy nodes for interleave
policy from mpol_new().
4) Now that mpol_check_policy() contextualizes the nodemask, remove
the in-line nodes_and() from sys_mbind(). I believe that this
restores mbind() to the behavior before the memoryless-nodes
patch series. E.g., we'll no longer treat an invalid nodemask
with MPOL_PREFERRED as local allocation.
[ Patch history:
v1 -> v2:
- Communicate whether or not incoming node mask was empty to
mpol_check_policy() for better error checking.
- As suggested by David Rientjes, remove the now unused
cpuset_nodes_subset_current_mems_allowed() from cpuset.h
v2 -> v3:
- As suggested by Kosaki Motohito, fold the "contextualization"
of policy nodemask into mpol_check_policy(). Looks a little
cleaner. ]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Without this patch a Opteron test system here oopses at boot with
current git.
Calling to_pci_dev() on a NULL pointer gives a negative value so the
following NULL pointer check never triggers and then an illegal address
is referenced. Check the unadjusted original device pointer for NULL
instead.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://linux-nfs.org/~bfields/linux:
SUNPRC: Fix printk format warning
nfsd: clean up svc_reserve_auth()
NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight
NLM: don't reattempt GRANT_MSG when there is already an RPC in flight
NLM: have server-side RPC clients default to soft RPC tasks
NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clients
Allow the platform data to specify to the DM9000 driver
that there is no posibility of an attached EEPROM on the
device, so default all reads to 0xff and ignore any
write operations.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Patch from: Laurent Pinchart <laurentp@cse-semaphore.com>
This patch adds a flag to the DM9000 platform data which, when set,
configures the device to use an external PHY.
Signed-off-by: Laurent Pinchart <laurentp@cse-semaphore.com>
Signed-off-by: Ben Dooks <ben-linuy@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The old code (before move) stopped further processing of the
event after it has been already processed by the quirk handler.
The new code didn't propagate the return value properly, and
therefore the processing always proceeded, which was wrong.
This patch fixes it. Pointed out in kernel.org bugzilla #9842
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
commit 813a0eb233
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date: Fri Jan 25 22:17:10 2008 +0100
ide: switch idedisk_prepare_flush() to use REQ_TYPE_ATA_TASKFILE requests
...
broke flush requests.
Allocating IDE command structure on the stack for flush requests is not
a very brilliant idea:
- idedisk_prepare_flush() only prepares the request and it doesn't wait
for it to be completed
- there are can be multiple flush requests queued in the queue
Fix the problem (per hints from James Bottomley) by:
- dynamically allocating ide_task_t instance using kmalloc(..., GFP_ATOMIC)
- adding new taskfile flag (IDE_TFLAG_DYN)
- calling kfree() in ide_end_drive_command() if IDE_TFLAG_DYN is set
(while at it rename 'args' to 'task' and fix whitespace damage)
[ This will be fixed properly before 2.6.25 but this bug is rather
critical and the proper solution requires some more work + testing. ]
Thanks to Sebastian Siewior and Christoph Hellwig for reporting the
problem and testing patches (extra thanks to Sebastian for bisecting
it to the guilty commmit).
Tested-by: Sebastian Siewior <ide-bug@ml.breakpoint.cc>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Introduce new option CONFIG_BLK_DEV_IDEDMA_SFF for non-PCI SFF-8038i compatible
bus mastering IDE controllers (which there are a few known), thus fixing a hack
made for Palmchip BK3710 controller...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Anton Salnikov <asalnikov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This is a void function attempting to return the return value from
another void function, which seems harmless but extremely weird, and
apparently makes some compilers complain.
While we're there, clean up a little (e.g. the switch statement had a
minor style problem and seemed overkill as long as there's only one
case).
Thanks to Trond for noticing this.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
We want to allow different implementations of pci_raw_ops for standard
and extended config space on x86. Rather than clutter generic code with
knowledge of this, we make pci_raw_ops private to x86 and use it to
implement the new raw interface -- raw_pci_read() and raw_pci_write().
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Spotted by Pavel Emelyanov and Alexey Dobriyan.
hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
the local variable which lives in the caller's stack frame. This means that
if sys_restart_syscall() actually happens and it is interrupted as well, we
don't update the user-space variable, but write into the already dead stack
frame.
Introduced by commit 04c227140f
hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier
Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.
Small problem remains. man 2 nanosleep states that *rtmp should be written if
nanosleep() was interrupted (it says nothing whether it is OK to update *rmtp
if nanosleep returns 0), but (with or without this patch) we can dirty *rem
even if nanosleep() returns 0.
NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
bugs. Fixed by the next patch.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Pavel Emelyanov <xemul@sw.ru>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Toyo Abe <toyoa@mvista.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
include/linux/hrtimer.h | 2 -
kernel/hrtimer.c | 51 +++++++++++++++++++++++++-----------------------
kernel/posix-timers.c | 14 +------------
3 files changed, 30 insertions(+), 37 deletions(-)
clocksource initialization and error accumulation. This corrects a 280ppm
drift seen on some systems using acpi_pm, and affects other clocksources as
well (likely to a lesser degree).
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This flag is simply a generic "this is a crash/burn test filesystem"
marker. If it is set, then filesystem code which is "in development"
will be allowed to mount the filesystem. Filesystem code which is not
considered ready for prime-time will check for this flag, and if it is
not set, it will refuse to touch the filesystem.
As we start rolling ext4 out to distro's like Fedora, et. al, this makes
it less likely that a user might accidentally start using ext4 on a
production filesystem; a bad thing, since that will essentially make it
be unfsckable until e2fsprogs catches up.
Signed-off-by: Theodore Tso <tytso@MIT.EDU>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Other than the defconfigs, remove the entry in compiler-gcc4.h,
Kconfig.debug and feature-removal-schedule.txt.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sony MemoryStick cards are used in many products manufactured by Sony.
They are available both as storage and as IO expansion cards. Currently,
only MemoryStick Pro storage cards are supported via TI FlashMedia
MemoryStick interface.
[mboton@gmail.com: biuld fix]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Miguel Boton <mboton@gmail.co>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm_cgroup() is exclusively used to test whether an mm's mem_cgroup pointer
is pointing to a specific cgroup. Instead of returning the pointer, we can
just do the test itself in a new macro:
vm_match_cgroup(mm, cgroup)
returns non-zero if the mm's mem_cgroup points to cgroup. Otherwise it
returns zero.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CC mm/vmscan.o
In file included from
/home/bunk/linux/kernel-2.6/git/linux-2.6/mm/vmscan.c:44:
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h: In function 'is_swap_pte':
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h:48: error: implicit declaration of function 'pte_none'
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swapops.h:48: error: implicit declaration of function 'pte_present'
Does it ever make sense to ask "is this pte a swap entry?" on a machine
with no MMU? Presumably this also means it has no ptes too, right? In
which case, it's better to comment the whole function out. Then when
someone tries to ask the above meaningless question, they get a compile
error rather than a meaningless answer.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Reported-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes the build, but acpi_fan_add() still needs
to be updated to handle thermal_cooling_device_register()
returning NULL as a non-fatal condition.
Signed-off-by: Len Brown <len.brown@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/core: Remove unused struct ib_device.flags member
IB/core: Add IP checksum offload support
IPoIB: Add send gather support
IPoIB: Add high DMA feature flag
IB/mlx4: Use multiple WQ blocks to post smaller send WQEs
mlx4_core: Clean up struct mlx4_buf
mlx4_core: For 64-bit systems, vmap() kernel queue buffers
IB/mlx4: Consolidate code to get an entry from a struct mlx4_buf
Thanks to Kay for keeping us honest.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Cc: "Williams, Dan J" <dan.j.williams@intel.com>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ConnectX HCA supports shrinking WQEs, so that a single work request
can be made of multiple units of wqe_shift. This way, WRs can differ
in size, and do not have to be a power of 2 in size, saving memory and
speeding up send WR posting. Unfortunately, if we do this then the
wqe_index field in CQEs can't be used to look up the WR ID anymore, so
our implementation does this only if selective signaling is off.
Further, on 32-bit platforms, we can't use vmap() to make the QP
buffer virtually contigious. Thus we have to use constant-sized WRs to
make sure a WR is always fully within a single page-sized chunk.
Finally, we use WRs with the NOP opcode to avoid wrapping around the
queue buffer in the middle of posting a WR, and we set the
NoErrorCompletion bit to avoid getting completions with error for NOP
WRs. However, NEC is only supported starting with firmware 2.2.232,
so we use constant-sized WRs for older firmware. And, since MLX QPs
only support SEND, we use constant-sized WRs in this case.
When stamping during NOP posting, do stamping following setting of the
NOP WQE valid bit.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Background: I've implemented 1K/2K page tables for s390. These sub-page
page tables are required to properly support the s390 virtualization
instruction with KVM. The SIE instruction requires that the page tables
have 256 page table entries (pte) followed by 256 page status table entries
(pgste). The pgstes are only required if the process is using the SIE
instruction. The pgstes are updated by the hardware and by the hypervisor
for a number of reasons, one of them is dirty and reference bit tracking.
To avoid wasting memory the standard pte table allocation should return
1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
Problem: Page size on s390 is 4K, page table size is 1K or 2K. That means
the s390 version for pte_alloc_one cannot return a pointer to a struct
page. Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
cannot return a pointer to a pte either, since that would require more than
32 bit for the return value of pte_alloc_one (and the pte * would not be
accessible since its not kmapped).
Solution: The only solution I found to this dilemma is a new typedef: a
pgtable_t. For s390 pgtable_t will be a (pte *) - to be introduced with a
later patch. For everybody else it will be a (struct page *). The
additional problem with the initialization of the ptl lock and the
NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
a destructor pgtable_page_dtor. The page table allocation and free
functions need to call these two whenever a page table page is allocated or
freed. pmd_populate will get a pgtable_t instead of a struct page pointer.
To get the pgtable_t back from a pmd entry that has been installed with
pmd_populate a new function pmd_pgtable is added. It replaces the pmd_page
call in free_pte_range and apply_to_pte_range.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Probing non-ISA interrupts using the handle_percpu_irq as their handle_irq
method may crash the system because handle_percpu_irq does not check
IRQ_WAITING. This for example hits the MIPS Qemu configuration.
This patch provides two helper functions set_irq_noprobe and set_irq_probe to
set rsp. clear the IRQ_NOPROBE flag. The only current caller is MIPS code
but this really belongs into generic code.
As an aside, interrupt probing these days has become a mostly obsolete if not
dangerous art. I think Linux interrupts should be changed to default to
non-probing but that's subject of this patch.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-and-tested-by: Rob Landley <rob@landley.net>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is an outdated comment in serial_core.c also fixed.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, for every sysfs node, the callers will be responsible for
implementing store operation, so many many callers are doing duplicate
things to validate input, they have the same mistakes because they are
calling simple_strtol/ul/ll/uul, especially for module params, they are
just numeric, but you can echo such values as 0x1234xxx, 07777888 and
1234aaa, for these cases, module params store operation just ignores
succesive invalid char and converts prefix part to a numeric although input
is acctually invalid.
This patch tries to fix the aforementioned issues and implements
strict_strtox serial functions, kernel/params.c uses them to strictly
validate input, so module params will reject such values as 0x1234xxxx and
returns an error:
write error: Invalid argument
Any modules which export numeric sysfs node can use strict_strtox instead of
simple_strtox to reject any invalid input.
Here are some test results:
Before applying this patch:
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#
After applying this patch:
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo -n 4096 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#
[akpm@linux-foundation.org: fix compiler warnings]
[akpm@linux-foundation.org: fix off-by-one found by tiwai@suse.de]
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since this header is exported to userspace and all the other types in the
header have been scrubbed, this brings the last straggler in line.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the arbitrary 128 device limit for NBD. nbds_max can now be set to
any number. In certain scenarios where devices are used sparsely we have
run into the 128 device limit.
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new s_options field to struct super_block. Filesystems can save
mount options passed to them in mount or remount. It is automatically
freed when the superblock is destroyed.
A new helper function, generic_show_options() is introduced, which uses
this field to display the mount options in /proc/mounts.
Another helper function, save_mount_options() may be used by
filesystems to save the options in the super block.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Per previous discussions about cleaning up ufs_fs.h, people just want
this straight up dropped from userspace export. The only remaining
consumer (silo) has been fixed a while ago to not rely on this header.
This allows use to move it completely from include/linux/ to fs/ufs/
seeing as how the only in-kernel consumer is fs/ufs/.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Jan Kara <jack@ucw.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Makes an embedded image a bit smaller.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
do_generic_mapping_read was used by gfs2 for internals reads, but this use
of the interface was rather suboptimal (as was the whole interface) and has
been replaced by an internal helper now. This patch kills
do_generic_mapping_read and surrounding damage in preparation of additional
cleanups for the buffered read path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All of the asm-*/types.h headers have been updated to no longer check
__STRICT_ANSI__ for the 64bit types, so this brings linux/types.h in line.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PWM device setup, and a simple PWM driver exposing a programming interface
giving access to each channel's full capabilities. Note that this doesn't
support starting several channels in synch.
[hskinnemoen@atmel.com: allocate platform device dynamically]
[hskinnemoen@atmel.com: Kconfig fix]
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Andrew Victor <linux@maxim.org.za>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
delayed_work_timer_fn() is a timer function, make it static.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
From version 2.6 of the SMBIOS standard, type 10 (On Board Devices
Information) becomes obsolete. The reason for this is that no further
fields can be added to this structure without adversely affecting existing
software's ability to properly parse the data.
Therefore type 41 (Onboard Devices Extended Information) was added.
The structure is as follows:
struct smbios_type_41 {
u8 type;
u8 length;
u16 handle;
u8 reference_designation_string;
u8 device_type; /* same device type as in type 10 */
u8 device_type_instance;
u16 segment_group_number;
u8 bus_number;
u8 device_function_number;
};
For more info: http://www.dmtf.org/standards/smbios
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Printing date and version of a driver makes sense if there's a maintainer
who's maintaining and using these, but printing ancient version information
only confuses users.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
udf_debug should be enclosed with do { } while (0)
to be safely used in code like below:
if (something)
udf_debug();
else
anything;
(Otherwise compiler will not compile it with:
"error: expected expression before 'else'")
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
simple_attr_close implementes ->release so it should be named accordingly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <stefano.brivio@polimi.it>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg KH <greg@kroah.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes simple attributes might need to return an error, e.g. for
acquiring a mutex interruptibly. In fact we have that situation in
spufs already which is the original user of the simple attributes. This
patch merged the temporarily forked attributes in spufs back into the
main ones and allows to return errors.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <stefano.brivio@polimi.it>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg KH <greg@kroah.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset moves le*_add_cpu and be*_add_cpu functions from OCFS2 to core
header (1st), converts ext3 filesystem to this API (2nd) and replaces XFS
different named functions with new ones (3rd).
There are many places where these functions will be useful. Just look at:
grep -r 'cpu_to_[ble12346]*([ble12346]*_to_cpu.*[-+]' linux-src/ Patch for
ext3 is an example how conversions will probably look like.
This patch:
- move inline functions which add native byte order variable to
little/big endian variable to core header
* le16_add_cpu(__le16 *var, u16 val)
* le32_add_cpu(__le32 *var, u32 val)
* le64_add_cpu(__le64 *var, u64 val)
* be32_add_cpu(__be32 *var, u32 val)
- add for completeness:
* be16_add_cpu(__be16 *var, u16 val)
* be64_add_cpu(__be64 *var, u64 val)
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add architecture support for the MN10300/AM33 CPUs produced by MEI to the
kernel.
This patch also adds board support for the ASB2303 with the ASB2308 daughter
board, and the ASB2305. The only processor supported is the MN103E010, which
is an AM33v2 core plus on-chip devices.
[akpm@linux-foundation.org: nuke cvs control strings]
Signed-off-by: Masakazu Urade <urade.masakazu@jp.panasonic.com>
Signed-off-by: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allocate serial port UART type IDs for the MN10300 on-chip serial ports.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set.
Not all architectures support the A.OUT binfmt, so the ELF binfmt should not
be permitted to go looking for A.OUT libraries to load in such a case. Not
only that, but under such conditions A.OUT core dumps are not produced either.
To make this work, this patch also does the following:
(1) Makes the existence of the contents of linux/a.out.h contingent on
CONFIG_ARCH_SUPPORTS_AOUT.
(2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT
core dumping code.
(3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline. This
is then included only where needed. This means that this bit of arch
code will be stored in the appropriate A.OUT binfmt module rather than
the core kernel.
(4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not
needed) and FRV.
This patch depends on the previous patch to move STACK_TOP[_MAX] out of
asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT
format is available.
[jdike@addtoit.com: uml: re-remove accidentally restored code]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix typo in comments.
BTW: I have to fix coding style in arch/ia64/kernel/time.c also, otherwise
checkpatch.pl will be complaining.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Function timekeeping_is_continuous() no longer checks flag
CLOCK_IS_CONTINUOUS, and it checks CLOCK_SOURCE_VALID_FOR_HRES now. So rename
the function accordingly.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's only one caller left - the kill_pgrp one - so merge these two
functions and forget the kill_pgrp_info one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signal_struct->tsk points to the ->group_leader and thus we have the nasty
code in de_thread() which has to change it and restart ->real_timer if the
leader is changed.
Use "struct pid *leader_pid" instead. This also allows us to kill now
unneeded send_group_sig_info().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a window when de_thread() switches the leader and drops
tasklist_lock. In that window do_each_pid_task(PIDTYPE_PID) finds both new
and old leaders.
The problem is pretty much theoretical and probably can be ignored. Currently
the only users of do_each_pid_task(PIDTYPE_PID) are send_sigio/send_sigurg, so
they can send the signal to the same process twice.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pid_vnr returns the user space pid with respect to the pid namespace the
struct pid was allocated in. What we want before we return a pid to user
space is the user space pid with respect to the pid namespace of current.
pid_vnr is a very nice optimization but because it isn't quite what we want
it is easy to use pid_vnr at times when we aren't certain the struct pid
was allocated in our pid namespace.
Currently this describes at least tiocgpgrp and tiocgsid in ttyio.c the
parent process reported in the core dumps and the parent process in
get_signal_to_deliver.
So unless the performance impact is huge having an interface that does what
we want instead of always what we want should be much more reliable and
much less error prone.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
do_signal_stop() counts all sub-thread and sets ->group_stop_count
accordingly. Every thread should decrement ->group_stop_count and stop,
the last one should notify the parent.
However a sub-thread can exit before it notices the signal_pending(), or it
may be somewhere in do_exit() already. In that case the group stop never
finishes properly.
Note: this is a minimal fix, we can add some optimizations later. Say we
can return quickly if thread_group_empty(). Also, we can move some signal
related code from exit_notify() to exit_signals().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change set_special_pids() to work with struct pid, not pid_t from global name
space. This again speedups and imho cleanups the code, also a preparation for
the next patch.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the patch
"Fix ptrace_attach()/ptrace_traceme()/de_thread() race"
commit f5b40e363a
we set PT_ATTACHED and change child->parent "atomically" wrt task_list lock.
This means we can remove the checks like "PT_ATTACHED && ->parent != ptracer"
which were needed to catch the "ptrace attach is in progress" case. We can
also remove the flag itself since nobody else uses it.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an
ipc_namespace is released to free all ipcs of each type. But in fact, they
do the same thing: they loop around all ipcs to free them individually by
calling a specific routine.
This patch proposes to consolidate this by introducing a common function,
free_ipcs(), that do the job. The specific routine to call on each
individual ipcs is passed as parameter. For this, these ipc-specific
'free' routines are reworked to take a generic 'struct ipc_perm' as
parameter.
Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (3 for
msg, sem and shm, structure used to store all ipcs) These 'struct ipc_ids'
are dynamically allocated for each icp_namespace as the ipc_namespace
itself (for the init namespace, they are initialized with pointers to
static variables instead)
It is so for historical reason: in fact, before the use of idr to store the
ipcs, the ipcs were stored in tables of variable length, depending of the
maximum number of ipc allowed. Now, these 'struct ipc_ids' have a fixed
size. As they are allocated in any cases for each new ipc_namespace, there
is no gain of memory in having them allocated separately of the struct
ipc_namespace.
This patch proposes to make this table static in the struct ipc_namespace.
Thus, we can allocate all in once and get rid of all the code needed to
allocate and free these ipc_ids separately.
Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Acked-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix an off by one bug in the fault reason string reporting function, and
clean up some of the code around this buglet.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: mark gross <mgross@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Typical PDE creation code looks like:
pde = create_proc_entry("foo", 0, NULL);
if (pde)
pde->proc_fops = &foo_proc_fops;
Notice that PDE is first created, only then ->proc_fops is set up to
final value. This is a problem because right after creation
a) PDE is fully visible in /proc , and
b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
possible to ->read without ->open (see one class of oopses below).
The fix is new API called proc_create() which makes sure ->proc_fops are
set up before gluing PDE to main tree. Typical new code looks like:
pde = proc_create("foo", 0, NULL, &foo_proc_fops);
if (!pde)
return -ENOMEM;
Fix most networking users for a start.
In the long run, create_proc_entry() for regular files will go.
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/block/sda/sda1/dev
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0
EIP is at mutex_lock_nested+0x75/0x25d
EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
Call Trace:
[<c106f7ce>] seq_read+0x24/0x28a
[<c106f7aa>] seq_read+0x0/0x28a
[<c106f7ce>] seq_read+0x24/0x28a
[<c106f7aa>] seq_read+0x0/0x28a
[<c10818b8>] proc_reg_read+0x60/0x73
[<c1081858>] proc_reg_read+0x0/0x73
[<c105a34f>] vfs_read+0x6c/0x8b
[<c105a6f3>] sys_read+0x3c/0x63
[<c10025f2>] sysenter_past_esp+0x5f/0xa5
[<c10697a7>] destroy_inode+0x24/0x33
=======================
INFO: lockdep is turned off.
Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we possibly lookup the pid in the wrong pid namespace. So
seq_file convert proc_pid_status which ensures the proper pid namespaces is
passed in.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: another build fix]
[akpm@linux-foundation.org: s390 build fix]
[akpm@linux-foundation.org: fix task_name() output]
[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrew Morgan <morgan@kernel.org>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently many /proc/pid files use a crufty precursor to the current seq_file
api, and they don't have direct access to the pid_namespace or the pid of for
which they are displaying data.
So implement proc_single_file_operations to make the seq_file routines easy to
use, and to give access to the full state of the pid of we are displaying data
for.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just like with the user namespaces, move the namespace management code into
the separate .c file and mark the (already existing) PID_NS option as "depend
on NAMESPACES"
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the IPC namespace management code is spread over the ipc/*.c files.
I moved this code into ipc/namespace.c file which is compiled out when needed.
The linux/ipc_namespace.h file is used to store the prototypes of the
functions in namespace.c and the stubs for NAMESPACES=n case. This is done
so, because the stub for copy_ipc_namespace requires the knowledge of the
CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in
included into many many .c files via the sys.h->sem.h sequence so adding the
sched.h into it will make all these .c depend on sched.h which is not that
good. On the other hand the knowledge about the namespaces stuff is required
in 4 .c files only.
Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
msg.c and shm.c files. It turned out that moving these functions into
namespaces.c is not that easy because they use many other calls and macros
from the original file. Moving them would make this patch complicated. On
the other hand all these functions can be consolidated, so I will send a
separate patch doing this a bit later.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently all the namespace management code is in the kernel/utsname.c file,
so just compile it out and make stubs in the appropriate header.
The init namespace itself is in init/version.c and is in the kernel all the
time.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When I replaced hugetlb_dynamic_pool with nr_overcommit_hugepages I used
proc_doulongvec_minmax() directly. However, hugetlb.c's locking rules
require that all counter modifications occur under the hugetlb_lock. Add a
callback into the hugetlb code similar to the one for nr_hugepages. Grab
the lock around the manipulation of nr_overcommit_hugepages in
proc_doulongvec_minmax().
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch contain the core infrastructure of enhanced partition
statistics. It adds to struct hd_struct the same stats data as struct
gendisk and define basics function to manipulate them.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Rearrange fields in cache order and initialize some fields that
we didn't previously init. Remove init of ->completion_data, it's
part of a union with ->hash. Luckily clearing the rb node is the same
as setting it to null!
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The below patch allows IPsec to use CTR mode with AES encryption
algorithm. Tested this using setkey in ipsec-tools.
Signed-off-by: Joy Latten <latten@austin.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'slub-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm:
SLUB: fix checkpatch warnings
Use non atomic unlock
SLUB: Support for performance statistics
SLUB: Alternate fast paths using cmpxchg_local
SLUB: Use unique end pointer for each slab page.
SLUB: Deal with annoying gcc warning on kfree()
The statistics provided here allow the monitoring of allocator behavior but
at the cost of some (minimal) loss of performance. Counters are placed in
SLUB's per cpu data structure. The per cpu structure may be extended by the
statistics to grow larger than one cacheline which will increase the cache
footprint of SLUB.
There is a compile option to enable/disable the inclusion of the runtime
statistics and its off by default.
The slabinfo tool is enhanced to support these statistics via two options:
-D Switches the line of information displayed for a slab from size
mode to activity mode.
-A Sorts the slabs displayed by activity. This allows the display of
the slabs most important to the performance of a certain load.
-r Report option will report detailed statistics on
Example (tbench load):
slabinfo -AD ->Shows the most active slabs
Name Objects Alloc Free %Fast
skbuff_fclone_cache 33 111953835 111953835 99 99
:0000192 2666 5283688 5281047 99 99
:0001024 849 5247230 5246389 83 83
vm_area_struct 1349 119642 118355 91 22
:0004096 15 66753 66751 98 98
:0000064 2067 25297 23383 98 78
dentry 10259 28635 18464 91 45
:0000080 11004 18950 8089 98 98
:0000096 1703 12358 10784 99 98
:0000128 762 10582 9875 94 18
:0000512 184 9807 9647 95 81
:0002048 479 9669 9195 83 65
anon_vma 777 9461 9002 99 71
kmalloc-8 6492 9981 5624 99 97
:0000768 258 7174 6931 58 15
So the skbuff_fclone_cache is of highest importance for the tbench load.
Pretty high load on the 192 sized slab. Look for the aliases
slabinfo -a | grep 000192
:0000192 <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP
request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili
Likely skbuff_head_cache.
Looking into the statistics of the skbuff_fclone_cache is possible through
slabinfo skbuff_fclone_cache ->-r option implied if cache name is mentioned
.... Usual output ...
Slab Perf Counter Alloc Free %Al %Fr
--------------------------------------------------
Fastpath 111953360 111946981 99 99
Slowpath 1044 7423 0 0
Page Alloc 272 264 0 0
Add partial 25 325 0 0
Remove partial 86 264 0 0
RemoteObj/SlabFrozen 350 4832 0 0
Total 111954404 111954404
Flushes 49 Refill 0
Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)
Looks good because the fastpath is overwhelmingly taken.
skbuff_head_cache:
Slab Perf Counter Alloc Free %Al %Fr
--------------------------------------------------
Fastpath 5297262 5259882 99 99
Slowpath 4477 39586 0 0
Page Alloc 937 824 0 0
Add partial 0 2515 0 0
Remove partial 1691 824 0 0
RemoteObj/SlabFrozen 2621 9684 0 0
Total 5301739 5299468
Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)
Descriptions of the output:
Total: The total number of allocation and frees that occurred for a
slab
Fastpath: The number of allocations/frees that used the fastpath.
Slowpath: Other allocations
Page Alloc: Number of calls to the page allocator as a result of slowpath
processing
Add Partial: Number of slabs added to the partial list through free or
alloc (occurs during cpuslab flushes)
Remove Partial: Number of slabs removed from the partial list as a result of
allocations retrieving a partial slab or by a free freeing
the last object of a slab.
RemoteObj/Froz: How many times were remotely freed object encountered when a
slab was about to be deactivated. Frozen: How many times was
free able to skip list processing because the slab was in use
as the cpuslab of another processor.
Flushes: Number of times the cpuslab was flushed on request
(kmem_cache_shrink, may result from races in __slab_alloc)
Refill: Number of times we were able to refill the cpuslab from
remotely freed objects for the same slab.
Deactivate: Statistics how slabs were deactivated. Shows how they were
put onto the partial list.
In general fastpath is very good. Slowpath without partial list processing is
also desirable. Any touching of partial list uses node specific locks which
may potentially cause list lock contention.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
We use a NULL pointer on freelists to signal that there are no more objects.
However the NULL pointers of all slabs match in contrast to the pointers to
the real objects which are in different ranges for different slab pages.
Change the end pointer to be a pointer to the first object and set bit 0.
Every slab will then have a different end pointer. This is necessary to ensure
that end markers can be matched to the source slab during cmpxchg_local.
Bring back the use of the mapping field by SLUB since we would otherwise have
to call a relatively expensive function page_address() in __slab_alloc(). Use
of the mapping field allows avoiding a call to page_address() in various other
functions as well.
There is no need to change the page_mapping() function since bit 0 is set on
the mapping as also for anonymous pages. page_mapping(slab_page) will
therefore still return NULL although the mapping field is overloaded.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Many I2C hwmon drivers define a driver ID but no other code references
these, meaning that they are useless. Discard them, along with a few
IDs which are defined but never used at all.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
* Drop unused defines
* Drop unused driver ID
* Remove trailing whitespace
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
* Drop trailing spaces
* Drop unused driver ID
* Drop stray backslashes in macros
* Rename new_client to client
* Drop redundant initializations to 0
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
* Drop history, it doesn't belong there
* Drop unused struct member
* Drop bogus struct member comment
* Drop unused driver ID
* Rename new_client to client
* Drop redundant initializations to 0
* Drop useless cast
* Drop trailing space
* Fix comment
* Drop duplicate comment
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
Let drivers walk the DMI table for their own needs. Some drivers need
data stored in OEM-specific DMI records for proper operation.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (48 commits)
[SCSI] aacraid: do not set valid bit in sense information
[SCSI] ses: add new Enclosure ULD
[SCSI] enclosure: add support for enclosure services
[SCSI] sr: fix test unit ready responses
[SCSI] u14-34f: fix data direction bug
[SCSI] aacraid: pci_set_dma_max_seg_size opened up for late model controllers
[SCSI] fix BUG when sum(scatterlist) > bufflen
[SCSI] arcmsr: updates (1.20.00.15)
[SCSI] advansys: make 3 functions static
[SCSI] Small cleanups for scsi_host.h
[SCSI] dc395x: fix uninitialized var warning
[SCSI] NCR53C9x: remove driver
[SCSI] remove m68k NCR53C9x based drivers
[SCSI] dec_esp: Remove driver
[SCSI] kernel-doc: fix scsi docbook
[SCSI] update my email address
[SCSI] add protocol definitions
[SCSI] sd: handle bad lba in sense information
[SCSI] qla2xxx: Update version number to 8.02.00-k8.
[SCSI] qla2xxx: Correct issue where incorrect init-fw mailbox command was used on non-NPIV capable ISPs.
...
The enclosure misc device is really just a library providing sysfs
support for physical enclosure devices and their components.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Commit ad7f71674a ("[POWERPC] Use a
sensible default for clock_getres() in the VDSO") corrected the clock
resolution reported by the VDSO clock_getres() but introduced another
problem in that older versions of gcc (gcc-4.0 and earlier) fail to
compile the new code in arch/powerpc/kernel/asm-offsets.c.
This fixes it by introducing a new MONOTONIC_RES_NSEC define in the
generic code which is equivalent to KTIME_MONOTONIC_RES but is just an
integer constant, not a ktime union.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.infradead.org/mtd-2.6: (120 commits)
[MTD] Fix mtdoops.c compilation
[MTD] [NOR] fix startup lock when using multiple nor flash chips
[MTD] [DOC200x] eccbuf is statically defined and always evaluate to true
[MTD] Fix maps/physmap.c compilation with CONFIG_PM
[MTD] onenand: Add panic_write function to the onenand driver
[MTD] mtdoops: Use the panic_write function when present
[MTD] Add mtd panic_write function pointer
[MTD] [NAND] Freescale enhanced Local Bus Controller FCM NAND support.
[MTD] physmap.c: Add support for multiple resources
[MTD] [NAND] Fix misparenthesization introduced by commit 78b65179...
[MTD] [NAND] Fix Blackfin NFC ECC calculating bug with page size 512 bytes
[MTD] [NAND] Remove wrong operation in PM function of the BF54x NFC driver
[MTD] [NAND] Remove unused variable in plat_nand_remove
[MTD] Unlocking all Intel flash that is locked on power up.
[MTD] [NAND] at91_nand: Make mtdparts option can override board info
[MTD] mtdoops: Various minor cleanups
[MTD] mtdoops: Ensure sequential write to the buffer
[MTD] mtdoops: Perform write operations in a workqueue
[MTD] mtdoops: Add further error return code checking
[MTD] [NOR] Test devtype, not definition in flash_probe(), drivers/mtd/devices/lart.c
...
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds:
leds: Add HP Jornada 6xx driver
leds: Remove the now uneeded ixp4xx driver
leds: Add power LED to the wrap driver
leds: Fix led-gpio active_low default brightness
leds: hw acceleration for Clevo mail LED driver
leds: Add support for hardware accelerated LED flashing
leds: Standardise LED naming scheme
leds: Add clevo notebook LED driver
* git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Add missing printk levels to e_powersaver
[CPUFREQ] Fix sparse warning in powernow-k8
[CPUFREQ] Support Model D parts and newer in e_powersaver
[CPUFREQ] Powernow-k8: Update to support the latest Turion processors
[CPUFREQ] fix configuration help message
[CPUFREQ] powernow-k8 print pstate instead of fid/did for family 10h
[CPUFREQ] Eliminate cpufreq_userspace scaling_setspeed deadlock
[CPUFREQ] gx-suspmod.c: use boot_cpu_data instead of current_cpu_data
[CPUFREQ] fix incorrect comment on show_available_freqs() in freq_table.c
[CPUFREQ] drivers/cpufreq: Add missing "space"
[CPUFREQ] arch/x86: Add missing "space"
[CPUFREQ] Remove pointless Kconfig dependancy
- Cy_EVENT_OPEN_WAKEUP is simple wake_up
- Cy_EVENT_HANGUP is wake_up + tty_hangup, which schedules its own work
- Cy_EVENT_WRITE_WAKEUP is tty_wakeup which may be called directly too
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- tty_hangup schedules a bottomhalf itself, tty_wakeup doesn't need it
- call the CD code (part of work handler previously) directly from the code
(it wakes somebody up or calls tty_hangup at worse)
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tty_hangup schedules a work for hangup itself, no need to do it in the driver.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is no need to schedule a bottomhalf for either of them. One is fast
and the another schedules a bottomhalf itself.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Do not export asm/page.h during make headers_install. This removes PAGE_SIZE
from userspace headers.
Signed-off-by: Kirill A. Shutemov <k.shutemov@gmail.com>
Reviewed-by: David Woodhouse <dwmw2@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Do not export asm/elf.h during make headers_install.
Signed-off-by: Kirill A. Shutemov <k.shutemov@gmail.com>
Reviewed-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Do not export asm/user.h and linux/user.h during make headers_install.
Signed-off-by: Kirill A. Shutemov <k.shutemov@gmail.com>
Reviewed-by: David Woodhouse <dwmw2@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the old iget() call and the read_inode() superblock operation it uses
as these are really obsolete, and the use of read_inode() does not produce
proper error handling (no distinction between ENOMEM and EIO when marking an
inode bad).
Furthermore, this removes the temptation to use iget() to find an inode by
number in a filesystem from code outside that filesystem.
iget_locked() should be used instead. A new function is added in an earlier
patch (iget_failed) that is to be called to mark an inode as bad, unlock it
and release it should the get routine fail. Mark iget() and read_inode() as
being obsolete and remove references to them from the documentation.
Typically a filesystem will be modified such that the read_inode function
becomes an internal iget function, for example the following:
void thingyfs_read_inode(struct inode *inode)
{
...
}
would be changed into something like:
struct inode *thingyfs_iget(struct super_block *sp, unsigned long ino)
{
struct inode *inode;
int ret;
inode = iget_locked(sb, ino);
if (!inode)
return ERR_PTR(-ENOMEM);
if (!(inode->i_state & I_NEW))
return inode;
...
unlock_new_inode(inode);
return inode;
error:
iget_failed(inode);
return ERR_PTR(ret);
}
and then thingyfs_iget() would be called rather than iget(), for example:
ret = -EINVAL;
inode = iget(sb, ino);
if (!inode || is_bad_inode(inode))
goto error;
becomes:
inode = thingyfs_iget(sb, ino);
if (IS_ERR(inode)) {
ret = PTR_ERR(inode);
goto error;
}
Note that is_bad_inode() does not need to be called. The error returned by
thingyfs_iget() should render it unnecessary.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stop the QNX4 filesystem from using iget() and read_inode(). Replace
qnx4_read_inode() with qnx4_iget(), and call that instead of iget().
qnx4_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
qnx4_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Anders Larsen <al@alarsen.net>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stop the EXT4 filesystem from using iget() and read_inode(). Replace
ext4_read_inode() with ext4_iget(), and call that instead of iget().
ext4_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
ext4_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stop the EXT3 filesystem from using iget() and read_inode(). Replace
ext3_read_inode() with ext3_iget(), and call that instead of iget().
ext3_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
ext3_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stop the EFS filesystem from using iget() and read_inode(). Replace
efs_read_inode() with efs_iget(), and call that instead of iget(). efs_iget()
then uses iget_locked() directly and returns a proper error code instead of an
inode in the event of an error.
efs_fill_super() returns any error incurred when getting the root inode
instead of EACCES.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a function to register failure in an inode construction path. This
includes marking the inode under construction as bad, unlocking it and
releasing it.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an ERR_CAST() function to complement ERR_PTR and co. for the purposes
of casting an error entyped as one pointer type to an error of another
pointer type whilst making it explicit as to what is going on.
This provides a replacement for the ERR_PTR(PTR_ERR(p)) construct.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For readability, all the calls to vmcoreinfo_append_str() are changed to macros
having a prefix "VMCOREINFO_".
This discussion is the following:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.3/0584.html
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: Simon Horman <horms@verge.net.au>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is better that the existing offsetof() is used for VMCOREINFO_OFFSET().
This discussion is the following:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.3/0584.html
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: Simon Horman <horms@verge.net.au>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset is for the vmcoreinfo data.
The vmcoreinfo data has the minimum debugging information only for dump
filtering. makedumpfile (dump filtering command) gets it to distinguish
unnecessary pages, and makedumpfile creates a small dumpfile.
This patch:
VMCOREINFO_SIZE() should be renamed VMCOREINFO_STRUCT_SIZE() since it's always
returning the size of the struct with a given name. This change would allow
VMCOREINFO_TYPEDEF_SIZE() to simply become VMCOREINFO_SIZE() since it need not
be used exclusively for typedefs.
This discussion is the following:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.3/0582.html
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.
This patch:
Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past. This is to avoid conflicts.
Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a patch for the Compaq ASIC3 multi function chip, found in many
PDAs (iPAQs, HTCs...).
It is a simplified version of Paul Sokolovsky's first proposal [1]. With
this code, it is basically a GPIO and IRQ expander. My plan is to add more
features once this patch gets reviewed and accepted.
[1] http://lkml.org/lkml/2007/5/1/46
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
Cc: Paul Sokolovsky <pmiscml@gmail.com>
Cc: Ben Dooks <ben@trinity.fluff.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch corrects a situation that occurs when one disables all the cpus in
a cpuset.
Currently, the disabled (cpu-less) cpuset inherits the cpus of its parent,
which is incorrect because it may then overlap its cpu-exclusive sibling.
Tasks of an empty cpuset should be moved to the cpuset which is the parent of
their current cpuset. Or if the parent cpuset has no cpus, to its parent,
etc.
And the empty cpuset should be released (if it is flagged notify_on_release).
Depends on the cgroup_scan_tasks() function (proposed by David Rientjes) to
iterate through all tasks in the cpu-less cpuset. We are deliberately
avoiding a walk of the tasklist.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide cgroup_scan_tasks(), which iterates through every task in a cgroup,
calling a test function and a process function for each. And call the process
function without holding the css_set_lock lock.
The idea is David Rientjes', predicting that such a function will make it much
easier in the future to extend things that require access to each task in a
cgroup without holding the lock,
[akpm@linux-foundation.org: cleanup]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Based on the discussion at http://lkml.org/lkml/2007/12/20/383, it was felt
that control_type might not be a good thing to implement right away. We
can add this flexibility at a later point when required.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Define function for calculating the number of scan target on each Zone/LRU.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Paul Menage <menage@google.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a handler "pre_destroy" to cgroup_subsys. It is called before
cgroup_rmdir() checks all subsys's refcnt.
I think this is useful for subsys which have some extra refs even if there
are no tasks in cgroup. By adding pre_destroy(), the kernel keeps the rule
"destroy() against subsystem is called only when refcnt=0." and allows css
ref to be used by other objects than tasks.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Paul Menage <menage@google.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While using memory control cgroup, page-migration under it works as following.
==
1. uncharge all refs at try to unmap.
2. charge regs again remove_migration_ptes()
==
This is simple but has following problems.
==
The page is uncharged and charged back again if *mapped*.
- This means that cgroup before migration can be different from one after
migration
- If page is not mapped but charged as page cache, charge is just ignored
(because not mapped, it will not be uncharged before migration)
This is memory leak.
==
This patch tries to keep memory cgroup at page migration by increasing
one refcnt during it. 3 functions are added.
mem_cgroup_prepare_migration() --- increase refcnt of page->page_cgroup
mem_cgroup_end_migration() --- decrease refcnt of page->page_cgroup
mem_cgroup_page_migration() --- copy page->page_cgroup from old page to
new page.
During migration
- old page is under PG_locked.
- new page is under PG_locked, too.
- both old page and new page is not on LRU.
These 3 facts guarantee that page_cgroup() migration has no race.
Tested and worked well in x86_64/fake-NUMA box.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Creates a helper function to return non-zero if a task is a member of a
memory controller:
int task_in_mem_cgroup(const struct task_struct *task,
const struct mem_cgroup *mem);
When the OOM killer is constrained by the memory controller, the exclusion
of tasks that are not a member of that controller was previously misplaced
and appeared in the badness scoring function. It should be excluded
during the tasklist scan in select_bad_process() instead.
[akpm@linux-foundation.org: build fix]
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Inline functions must preceed their use, so mm_cgroup() should be defined
in linux/memcontrol.h.
include/linux/memcontrol.h:48: warning: 'mm_cgroup' declared inline after
being called
include/linux/memcontrol.h:48: warning: previous declaration of
'mm_cgroup' was here
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin pointed out that swap cache and page cache addition routines
could be called from non GFP_KERNEL contexts. This patch makes the
charging routine aware of the gfp context. Charging might fail if the
cgroup is over it's limit, in which case a suitable error is returned.
This patch was tested on a Powerpc box. I am still looking at being able
to test the path, through which allocations happen in non GFP_KERNEL
contexts.
[kamezawa.hiroyu@jp.fujitsu.com: problem with ZONE_MOVABLE]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make page_referenced() cgroup aware. Without this patch, page_referenced()
can cause a page to be skipped while reclaiming pages. This patch ensures
that other cgroups do not hold pages in a particular cgroup hostage. It
is required to ensure that shared pages are freed from a cgroup when they
are not actively referenced from the cgroup that brought them in
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Choose if we want cached pages to be accounted or not. By default both are
accounted for. A new set of tunables are added.
echo -n 1 > mem_control_type
switches the accounting to account for only mapped pages
echo -n 3 > mem_control_type
switches the behaviour back
[bunk@kernel.org: mm/memcontrol.c: clenups]
[akpm@linux-foundation.org: fix sparc32 build]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Out of memory handling for cgroups over their limit. A task from the
cgroup over limit is chosen using the existing OOM logic and killed.
TODO:
1. As discussed in the OLS BOF session, consider implementing a user
space policy for OOM handling.
[akpm@linux-foundation.org: fix build due to oom-killer changes]
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the interface to use bytes instead of pages. Page sizes can vary
across platforms and configurations. A new strategy routine has been added
to the resource counters infrastructure to format the data as desired.
Suggested by David Rientjes, Andrew Morton and Herbert Poetzl
Tested on a UML setup with the config for memory control enabled.
[kamezawa.hiroyu@jp.fujitsu.com: possible race fix in res_counter]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the page_cgroup to the per cgroup LRU. The reclaim algorithm has
been modified to make the isolate_lru_pages() as a pluggable component. The
scan_control data structure now accepts the cgroup on behalf of which
reclaims are carried out. try_to_free_pages() has been extended to become
cgroup aware.
[akpm@linux-foundation.org: fix warning]
[Lee.Schermerhorn@hp.com: initialize all scan_control's isolate_pages member]
[bunk@kernel.org: make do_try_to_free_pages() static]
[hugh@veritas.com: memcgroup: fix try_to_free order]
[kamezawa.hiroyu@jp.fujitsu.com: this unlock_page_cgroup() is unnecessary]
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the accounting hooks. The accounting is carried out for RSS and Page
Cache (unmapped) pages. There is now a common limit and accounting for both.
The RSS accounting is accounted at page_add_*_rmap() and page_remove_rmap()
time. Page cache is accounted at add_to_page_cache(),
__delete_from_page_cache(). Swap cache is also accounted for.
Each page's page_cgroup is protected with the last bit of the
page_cgroup pointer, this makes handling of race conditions involving
simultaneous mappings of a page easier. A reference count is kept in the
page_cgroup to deal with cases where a page might be unmapped from the RSS
of all tasks, but still lives in the page cache.
Credits go to Vaidyanathan Srinivasan for helping with reference counting work
of the page cgroup. Almost all of the page cache accounting code has help
from Vaidyanathan Srinivasan.
[hugh@veritas.com: fix swapoff breakage]
[akpm@linux-foundation.org: fix locking]
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: <Valdis.Kletnieks@vt.edu>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Basic setup routines, the mm_struct has a pointer to the cgroup that
it belongs to and the the page has a page_cgroup associated with it.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Setup the memory cgroup and add basic hooks and controls to integrate
and work with the cgroup.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With fixes from David Rientjes <rientjes@google.com>
Introduce generic structures and routines for resource accounting.
Each resource accounting cgroup is supposed to aggregate it,
cgroup_subsystem_state and its resource-specific members within.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This legacy define from the old buffer code is now only used in a single
power pc driver than doesn't compile anyway.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename old vfs_ioctl to do_ioctl, because the comment above it clearly
indicates that it is an internal function not to be exported to modules;
therefore it should have a more traditional do_XXX name. The new do_ioctl
is exported in fs.h but not to modules.
Rename the old do_ioctl to vfs_ioctl because the names vfs_XXX should
preferably be reserved to callable VFS functions which modules may call, as
many other vfs_XXX functions already do. Export the new vfs_ioctl to GPL
modules so others can use it (including Unionfs and eCryptfs). Add DocBook
for new vfs_ioctl.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The DS1WM driver incorrectly infers the IAS bit (1-wire interrupt active
high) from IRQ settings. There are devices that have IAS=0 but still need
the IRQ to trigger on a rising edge. With this patch, machines with DS1WM
that need IAS=1 have to set .active_high=1 in the ds1wm_platform_data.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Acked-by: Matt Reimer <mreimer@vpop.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MTDs are well suited for logging critical data and the mtdoops driver
allows kernel panics/oops to be written to flash in a blackbox flight
recorder fashion allowing better debugging and analysis of crashes.
Any kernel oops in user context can be easily handled since the kernel
continues as normal and any queued mtd writes are scheduled. Any kernel
oops in interrupt context results in a panic and the delayed writes will
not be scheduled however. The existing mtd->write function cannot be
called in interrupt context so these messages can never be written to
flash.
This patch adds a panic_write function pointer that drivers can
optionally implement which can be called in interrupt context. It is
only intended to be called when its known the kernel is about to panic
and we need to write to succeed. Since the kernel is not going to be
running for much longer, this function can break locks and delay to
ensure the write succeeds (but not sleep).
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Extends the leds subsystem with a blink_set() callback function which can
be optionally implemented by a LED driver. If implemented, the driver can use
the hardware acceleration for blinking a LED.
Signed-off-by: Márton Németh <nm127@freemail.hu>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
This makes the SPE register data appear in ELF core dumps, using the
new n_type value NT_PPC_SPE (0x101). This new note type is not used
by any consumers of core files yet, but support can be added. I don't
even have any hardware with SPE capabilities, so I've never seen such
a note. But this demonstrates how simple it is to export register
information in core dumps when the user_regset style is used for the
low-level code.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Two cleanups to <linux/acpi.h>:
* Stop defining acpi_mp_config, it isn't used anywhere.
* Discard nested "#ifdef CONFIG_ACPI", they are useless and
error-prone.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Add a default poll idle state with 0 latency. Provides an option to users
to use poll_idle by using 0 as the latency requirement.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Export acpi_check_resource_conflict(), sometimes drivers already have
a struct resource at hand so no need to use the wrappers to build a new
one.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Small ACPICA extension to be able to store the name of operation regions in osl.c later
In ACPI, AML can define accesses to IO ports and System Memory by Operation
Regions. Those are not registered as done by PNPACPI using resource templates
(and _CRS/_SRS methods).
The IO ports and System Memory regions may get accessed by arbitrary AML code.
When native drivers are accessing the same resources bad things can happen
(e.g. a critical shutdown temperature of 3000 C every 2 months or so).
It is not really possible to register the operation regions via
request_resource, as they often overlap with pnp or other resources (e.g.
statically setup IO resources below 0x100).
This approach stores all Operation Region declarations (IO and System Memory
only) at ACPI table parse time. It offers a similar functionality like
request_region and let drivers which are known to possibly use the same IO
ports and Memory which are also often used by ACPI (hwmon and i2c) check for
ACPI interference.
A boot parameter acpi_enforce_resources=strict/lax/no is provided, which
is default set to lax:
- strict: let conflicting drivers fail to load with an error message
- lax: let conflicting driver work normal with a warning message
- no: no functional change at all
Depending on the feedback and the kind of interferences we see, this
should be set to strict at later time.
Goal of this patch set is:
- Identify ACPI interferences in bug reports (very hard to reproduce
and to identify)
- Find BIOSes for that an ACPI driver should exist for specific HW
instead of a native one.
- stability in general
Provide acpi_check_{mem_}region.
Drivers can additionally check against possible ACPI interference by also
invoking this shortly before they call request_region.
If -EBUSY is returned, the driver must not load.
Use acpi_enforce_resources=strict/lax/no options to:
- strict: let conflicting drivers fail to load with an error message
- lax: let conflicting driver work normal with a warning message
- no: no functional change at all
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Now that struct mlx4_buf.u is a struct instead of a union because of
the vmap() changes, there's no point in having a struct at all. So
move .direct and .page_list directly into struct mlx4_buf and get rid
of a bunch of unnecessary ".u"s.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Since kernel virtual memory is not a problem on 64-bit systems, there
is no reason to use our own 2-layer page mapping scheme for large
kernel queue buffers on such systems. Instead, map the page list to a
single virtually contiguous buffer with vmap(), so that can we access
buffer memory via direct indexing.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We use struct mlx4_buf for kernel QP, CQ and SRQ buffers, and the code
to look up an entry is duplicated in get_cqe_from_buf() and the QP and
SRQ versions of get_wqe(). Factor this out into mlx4_buf_offset().
This will also make it easier to switch over to using vmap() for buffers.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eliminate cpufreq_userspace scaling_setspeed deadlock.
Luming Yu recently uncovered yet another cpufreq related deadlock.
One thread that continuously switches the governors and the other thread that
repeatedly cats the contents of cpufreq directory causes both these threads to
go into a deadlock.
Detailed examination of the deadlock showed the exact flow before the deadlock
as:
Thread 1 Thread 2
________ ________
cats files under /sys/devices/.../cpufreq/
Set governor to userspace
Adds a new sysfs entry for
scaling_setspeed
cats files under /sys/devices/.../cpufreq/
Set governor to performance
Holds cpufreq_rw_sem in write
mode
Sends a STOP notify to
userspace governor
cat /sys/devices/.../cpufreq/scaling_setspeed
Gets a handle on the above sysfs entry with
sysfs_get_active
Blocks while trying to get cpufreq_rw_sem
in read mode
Remove a sysfs entry for
scaling_setspeed
Blocks on sysfs_deactivate
while waiting for earlier
get_active (on other thread)
to drain
At this point both threads go into deadlock and any other thread that tries to
do anything with sysfs cpufreq will also block.
There seems to be no easy way to avoid this deadlock as long as
cpufreq_userspace adds/removes the sysfs entry under same kobject as cpufreq.
Below patch moves scaling_setspeed to cpufreq.c, keeping it always and calling
back the governor on read/write. This is the cleanest fix I could think of,
even though adding two callbacks in governor structure just for this seems
unnecessary.
Note that the change makes scaling_setspeed under /sys/.../cpufreq permanent
and returns <unsupported> when governor is not userspace.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
See Documentation/ABI/testing/sysfs-firmware-acpi
Based-on-original-patch-by: Luming Yu <luming.yu@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
When an ACPI table is overridden (for now this can happen only for DSDT)
display a big warning and taint the kernel with flag A.
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'async-tx-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop:
async_tx: allow architecture specific async_tx_find_channel implementations
async_tx: replace 'int_en' with operation preparation flags
async_tx: kill tx_set_src and tx_set_dest methods
async_tx: kill ASYNC_TX_ASSUME_COHERENT
iop-adma: use LIST_HEAD instead of LIST_HEAD_INIT
async_tx: use LIST_HEAD instead of LIST_HEAD_INIT
async_tx: fix compile breakage, mark do_async_xor __always_inline
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
ata_piix.c:piix_init_one() must be __devinit
sata_via.c: Remove missleading comment.
libata-core: unblacklist HITACHI drives
sata_nv: fix ATAPI issues with memory over 4GB (v7)
ata: drivers/ata/sata_mv.c needs dmapool.h
libata: kill now unused n_iter and fix sata_fsl
ahci: fix CAP.NP and PI handling
sata_mv: Support SoC controllers
Rename: linux/pata_platform.h to linux/ata_platform.h
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (35 commits)
virtio net: fix oops on interface-up
Fix PHY Lib support for gianfar and ucc_geth
forcedeth: preserve registers
forcedeth: phy status fix
forcedeth: restart tx/rx
ipvs: Make wrr "no available servers" error message rate-limited
[PPPOL2TP]: Label unused warning when CONFIG_PROC_FS is not set.
[NET_SCHED]: cls_flow: support classification based on VLAN tag
[VLAN]: Constify skb argument to vlan_get_tag()
[NET_SCHED]: cls_flow: fix key mask validity check
[NET_SCHED]: em_meta: fix compile warning
b43: Fix DMA for 30/32-bit DMA engines
b43: fix build with CONFIG_SSB_PCIHOST=n
mac80211: Is not EXPERIMENTAL anymore
iwl3945-base.c: fix off-by-one errors
b43legacy: fix DMA slot resource leakage
b43legacy: drop packets we are not able to encrypt
b43legacy: fix suspend/resume
b43legacy: fix PIO crash
Generic HDLC - use random_ether_addr()
...
Move a few kernel-only things into __KERNEL__.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__journal_abort_hard() can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Finish ITERATE_ to for_each conversion.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As this is more in line with common practice in the kernel. Also swap the
args around to be more like list_for_each.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, a given device is "claimed" by a particular array so that it cannot
be used by other arrays.
This is not ideal for DDF and other metadata schemes which have their own
partitioning concept.
So for externally managed metadata, just claim the device for md in general,
require that "offset" and "size" are set properly for each device, and make
sure that if a device is included in different arrays then the active sections
do not overlap.
This involves adding another flag to the rdev which makes it awkward to set
"->flags = 0" to clear certain flags. So now clear flags explicitly by name
when we want to clear things.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows userspace to control resync/reshape progress and synchronise it
with other activities, such as shared access in a SAN, or backing up critical
sections during a tricky reshape.
Writing a number of sectors (which must be a multiple of the chunk size if
such is meaningful) causes a resync to pause when it gets to that point.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Add a state flag 'external' to indicate that the metadata is managed
externally (by user-space) so important changes need to be
left of user-space to handle.
Alternates are non-persistant ('none') where there is no stable metadata -
after the array is stopped there is no record of it's status - and
internal which can be version 0.90 or version 1.x
These are selected by writing to the 'metadata' attribute.
- move the updating of superblocks (sync_sbs) to after we have checked if
there are any superblocks or not.
- New array state 'write_pending'. This means that the metadata records
the array as 'clean', but a write has been requested, so the metadata has
to be updated to record a 'dirty' array before the write can continue.
This change is reported to md by writing 'active' to the array_state
attribute.
- tidy up marking of sb_dirty:
- don't set sb_dirty when resync finishes as md_check_recovery
calls md_update_sb when the sync thread finishes anyway.
- Don't set sb_dirty in multipath_run as the array might not be dirty.
- don't mark superblock dirty when switching to 'clean' if there
is no internal superblock (if external, userspace can choose to
update the superblock whenever it chooses to).
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently an md array with a write-intent bitmap does not updated that bitmap
to reflect successful partial resync. Rather the entire bitmap is updated
when the resync completes.
This is because there is no guarentee that resync requests will complete in
order, and tracking each request individually is unnecessarily burdensome.
However there is value in regularly updating the bitmap, so add code to
periodically pause while all pending sync requests complete, then update the
bitmap. Doing this only every few seconds (the same as the bitmap update
time) does not notciably affect resync performance.
[snitzer@gmail.com: export bitmap_cond_end_sync]
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: "Mike Snitzer" <snitzer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes it possible to control panel pins usage with flags passed
from the platform data. Without this patch the sm501fb driver always controls
the VBIASEN and FPEN pins. The polarity and use of these pins are very
platform specific, so this patch introduces the flags
SM501FB_FLAG_PANEL_USE_VBIASEN and SM501FB_FLAG_PANEL_USE_FPEN which enable
the use of these pins.
This patch is needed to support the a Sharp LQ104V1DG21 lcd panel on SuperH
platforms such as R2D-1 and R2D-PLUS boards. Letting the sm501fb driver
control the FPEN and VBIASEN pins like today just results in lcd panel
flicker.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This second part of an extension to support more pca953x chips renames the C
and Kconfig symbols. All affected files were updated by sed, except for a
couple of obvious exceptions. It also updates the Kconfig helptext.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
First part of an extension to let the pca9539 driver support more chips,
starting with pca9534, pca9535, pca9536, pca9537, and pca9538.
This renames the files and modifies the Makefile.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a GPIO 1-wire bus master driver. The driver used the GPIO API to
control the wire and the GPIO pin can be specified using platform data
similar to i2c-gpio. The driver was tested with AT91SAM9260 + DS2401.
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide support to add an optional user defined callback to be run at
function entry of a kretprobe'd function. Also modify the kprobe smoke
tests to include an entry-handler during the kretprobe sanity test.
Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel has a divide by zero crash when trying to run the system timer
less than 100Hz. The problem is x/(HZ/USER_HZ) and related. Now
x*(USER_HZ/HZ) will be used if HZ<USER_HZ.
I'm running the Linux kernel under qemu and went to run a slower system
timer to take less CPU (and battery) on the host. I found that the kernel
paniced under emulation because of a divide by zero in three places. Here
is the patch. The base git was updated today 01-05-2008. I went for a
20Hz system time by adding config HZ_20 etc to kernel/Kconfig.hz. With
this patch I verified the system timer by looking at /proc/interrupts.
[akpm@linux-foundation.org: partially clean up the macro maze]
Signed-off-by: David Fries <david@fries.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
groups_sort() can be quite long if user loads a large gid table.
This is because GROUP_AT(group_info, some_integer) uses an integer divide.
So having to do XXX thousand divides during one syscall can lead to very
high latencies. (NGROUPS_MAX=65536)
In the past (25 Mar 2006), an analog problem was found in groups_search()
(commit d74beb9f33 ) and at that time I
changed some variables to unsigned int.
I believe that a more generic fix is to make sure NGROUPS_PER_BLOCK is
unsigned.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Added pci device id for the Quatech SPPXP-100 ExpressCard - 0x278 - to
include/linux/pci_id.h
Modified drivers/parport/parport_pc.c to support the Quatech SPPXP-100 Parallel port PCI ExpressCard
[akpm@linux-foundation.org: build fix]
Signed-off-by: Luís P Mendes <luis.p.mendes@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds support to allow asm/ptrace.h to define two new macros,
arch_ptrace_stop_needed and arch_ptrace_stop. These control special
machine-specific actions to be done before a ptrace stop. The new code
compiles away to nothing when the new macros are not defined. This is the
case on all machines to begin with.
On ia64, these macros will be defined to solve the long-standing issue of
ptrace vs register backing store.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I couldn't find any users, so removing it..
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Cc: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I couldn't find any users, so removing it..
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Acked-by: Alan Cox <alan@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The rcu_assign_pointer() primitive currently unconditionally executes a
memory barrier, even when a NULL pointer is being assigned. This has lead
some to avoid using rcu_assign_pointer() for NULL pointers, which loses the
self-documenting advantages of rcu_assign_pointer() This patch uses
__builtin_const_p() to omit needless memory barriers for NULL-pointer
assignments at compile time with no runtime penalty, as discussed in the
following thread:
http://www.mail-archive.com/netdev@vger.kernel.org/msg54852.html
Tested on x86_64 and ppc64, also compiled the four cases (NULL/non-NULL
and const/non-const) with gcc version 4.1.2, and hand-checked the
assembly output.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NR_OPEN (historically set to 1024*1024) actually forbids processes to open
more than 1024*1024 handles.
Unfortunatly some production servers hit the not so 'ridiculously high
value' of 1024*1024 file descriptors per process.
Changing NR_OPEN is not considered safe because of vmalloc space potential
exhaust.
This patch introduces a new sysctl (/proc/sys/fs/nr_open) wich defaults to
1024*1024, so that admins can decide to change this limit if their workload
needs it.
[akpm@linux-foundation.org: export it for sparc64]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, no notification event has been sent when inode's link count
changed. This is inconvenient for the application in some cases:
Suppose you have the following directory structure
foo/test
bar/
and you watch test. If someone does "mv foo/test bar/", you get event
IN_MOVE_SELF and you know something has happened with the file "test".
However if someone does "ln foo/test bar/test" and "rm foo/test" you get no
inotify event for the file "test" (only directories "foo" and "bar" receive
events).
Furthermore it could be argued that link count belongs to file's metadata and
thus IN_ATTRIB should be sent when it changes.
The following patch implements sending of IN_ATTRIB inotify events when link
count of the inode changes, i.e., when a hardlink to the inode is created or
when it is removed. This event is sent in addition to all the events sent so
far. In particular, when a last link to a file is removed, IN_ATTRIB event is
sent in addition to IN_DELETE_SELF event.
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Morten Welinder <mwelinder@gmail.com>
Cc: Robert Love <rlove@google.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use list_for_each_entry_reverse for super_blocks list and remove
unused sb_entry macro.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I was happy to discover the brand new IS_ALIGN macro and quickly used it in
my code. To my dismay I found that the generated code used division to
perform the test.
This patch fixes it by changing the % test to an &. This avoids the
division.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of allocating a fix sized array of NR_CPUS pointers for percpu_data,
we can use nr_cpu_ids, which is generally < NR_CPUS.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After some archeology (see http://logfs.org/logfs/inode_state_bits) I
finally figured out what the three I_DIRTY bits do. Maybe others would
prefer less effort to reach this insight.
Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Given a number of places in the tree that need to calculate this value
explicitly, might as well just create a macro for it.
(akpm: must be implemented as a macro for callee typeof() usage)
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for vty_init() in include/linux/vt_kern.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ad a proper prototype for migration_init() in include/linux/fs.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for signals_init() in include/linux/signal.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- All implementations can be __devinit
- The function prototypes were in asm/timex.h but they all must be the same,
so create a single declaration in linux/timex.h.
- uninline the sparc64 version to match the other architectures
- Don't bother #defining ARCH_HAS_READ_CURRENT_TIMER to a particular value.
[ezk@cs.sunysb.edu: fix build]
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch contains the scheduled removal of OSS drivers whose config
options have been removed in 2.6.23.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for show_interrupts() in include/linux/interrupt.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows a flag to be set on loop devices so that when they are
closed for the last time, they'll self-destruct.
In general, so that we can automatically allocate loop devices (as with
losetup -f) and have them disappear when we're done with them.
In particular, right now, so that we can stop relying on the hackish
special-case in umount(8) which kills off loop devices which were set up by
'mount -oloop'. That means we can stop putting crap in /etc/mtab which
doesn't belong there, which means it can be a symlink to /proc/mounts, which
means yet another writable file on the root filesystem is eliminated and the
'stateless' folks get happier... and OLPC trac #356 can be closed.
The mount(8) side of that is at
http://marc.info/?l=util-linux-ng&m=119362955431694&w=2
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Bernardo Innocenti <bernie@codewiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The 32-bit version is more efficient (and apparently gives better hash
results than the 64-bit version), so users who are only hashing a 32-bit
quantity can now opt to use the 32-bit version explicitly, rather than
promoting to a long.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The source and destination addresses are included to allow channel
selection based on address alignment.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Pass a full set of flags to drivers' per-operation 'prep' routines.
Currently the only flag passed is DMA_PREP_INTERRUPT. The expectation is
that arch-specific async_tx_find_channel() implementations can exploit this
capability to find the best channel for an operation.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Reviewed-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
The tx_set_src and tx_set_dest methods were originally implemented to allow
an array of addresses to be passed down from async_xor to the dmaengine
driver while minimizing stack overhead. Removing these methods allows
drivers to have all transaction parameters available at 'prep' time, saves
two function pointers in struct dma_async_tx_descriptor, and reduces the
number of indirect branches..
A consequence of moving this data to the 'prep' routine is that
multi-source routines like async_xor need temporary storage to convert an
array of linear addresses into an array of dma addresses. In order to keep
the same stack footprint of the previous implementation the input array is
reused as storage for the dma addresses. This requires that
sizeof(dma_addr_t) be less than or equal to sizeof(void *). As a
consequence CONFIG_DMADEVICES now depends on !CONFIG_HIGHMEM64G. It also
requires that drivers be able to make descriptor resources available when
the 'prep' routine is polled.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Remove the unused ASYNC_TX_ASSUME_COHERENT flag. Async_tx is
meant to hide the difference between asynchronous hardware and synchronous
software operations, this flag requires clients to understand cache
coherency consequences of the async path.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
qc->n_iter was used for libata's own sg walking before sg chaining
replaced it. During conversion, the field and its usage in sata_fsl
were left behind. Kill the filed and update sata_fsl.
tj: This was part of James's libata-use-block-layer-padding patch.
Separated out by me.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Marvell's Orion SoC includes SATA controllers based on Marvell's
PCI-to-SATA 88SX controllers. This patch extends the libATA sata_mv
driver to support those controllers.
[edited to use linux/ata_platform.h -jg]
Signed-off-by: Saeed Bishara <saeed@marvell.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Iterating through a device node's parents is simple enough, but dealing
with the refcounts properly is a little ugly, and replicating that logic
is asking for someone to get it wrong or forget it all together, eg:
while (dn != NULL) {
/* loop body */
tmp = of_get_parent(dn);
of_node_put(dn);
dn = tmp;
}
So add of_get_next_parent(), inspired by of_get_next_child(). The
contract is that it returns the parent and drops the reference on the
current node, this makes the loop look like:
while (dn != NULL) {
/* loop body */
dn = of_get_next_parent(dn);
}
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
egrep serial /proc/acpi/battery/BAT0/info
serial number: 32090
serial number can tell you from the imminent danger
of beeing set on fire.
Signed-off-by: maximilian attems <max@stro.at>
Acked-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
ide-cris.c:
* Add cris_setup_ports() helper and use it instead of ide_setup_ports()
(fixes random value being set in ->io_ports[IDE_IRQ_OFFSET]).
buddha.c:
* Add buddha_setup_ports() helper and use it instead of ide_setup_ports().
falconide.c:
* Add falconide_setup_ports() helper and use it instead of ide_setup_ports(),
also fix return value of falconide_init() while at it.
gayle.c:
* Add gayle_setup_ports() helper and use it instead of ide_setup_ports().
macide.c:
* Add macide_setup_ports() helper and use it instead of ide_setup_ports()
(fixes incorrect value being set in ->io_ports[IDE_IRQ_OFFSET]).
q40ide.c:
* Fix q40_ide_setup_ports() comments.
ide.c:
* Remove no longer needed ide_setup_ports().
Cc: Mikael Starvik <starvik@axis.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This is Palmchip BK3710 IDE controller support.
The IDE controller logic supports PIO, MultiWord-DMA and Ultra-DMA modes.
Supports interface to Compact Flash (CF) configured in True-IDE mode.
Bart:
- remove dead code
- fix ide_hwif_setup_dma() build problem
Signed-off-by: Anton Salnikov <asalnikov@ru.mvista.com>
Reviewed-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reviewed-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Required by next patch to use it from the flow classifier.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following is an implementation of the Windows Management
Instrumentation (WMI) ACPI interface mapper (PNP0C14).
What it does:
Parses the _WDG method and exports functions to process WMI method calls,
data block query/ set commands (both based on GUID) and does basic event
handling.
How: WMI presents an in kernel interface here (essentially, a minimal
wrapper around ACPI)
(const char *guid assume the 36 character ASCII representation of
a GUID - e.g. 67C3371D-95A3-4C37-BB61-DD47B491DAAB)
wmi_evaluate_method(const char *guid, u8 instance, u32 method_id,
const struct acpi_buffer *in, struct acpi_buffer *out)
wmi_query_block(const char *guid, u8 instance,
struct acpi_buffer *out)
wmi_set_block(const char *guid, u38 instance,
const struct acpi_buffer *in)
wmi_install_notify_handler(acpi_notify_handler handler);
wmi_remove_notify_handler(void);
wmi_get_event_data(u32 event, struct acpi_buffer *out)
wmi_has_guid(const char guid*)
wmi_has_guid() is a helper function to find if a GUID exists or not on the
system (a quick and easy way for WMI dependant drivers to see if the
the method/ block they want exists, since GUIDs are supposed to be unique).
Event handling - allow a WMI based driver to register a notifier handler
for each GUID with WMI. When a notification is sent to a GUID in WMI, the
handler registered with WMI is then called (it is left to the caller to
ask for the WMI event data associated with the GUID, if needed).
What it won't do:
Unicode - The MS article[1] calls for converting between ASCII and Unicode (or
vice versa) if a GUID is marked as "string". This is left up to the calling
driver.
Handle a MOF[1] - the WMI mapper just exports methods, data and events to
userspace. MOF handling is down to userspace.
Userspace interface - this will be added later.
[1] http://www.microsoft.com/whdc/system/pnppwr/wmi/wmi-acpi.mspx
===
ChangeLog
==
v1 (2007-10-02):
* Initial release
v2 (2007-10-05):
* Cleaned up code - split up super "wmi_evaluate_block" -> each external
symbol now handles its own ACPI calls, rather than handing off to
a "super" method (and in turn, is a lot simpler to read)
* Added a find_guid() symbol - return true if a given GUID exists on
the system
* wmi_* functions now return type acpi_status (since they are just
fancy wrappers around acpi_evaluate_object())
* Removed extra debug code
v3 (2007-10-27)
* More code clean up - now passes checkpatch.pl
* Change data block calls - ref MS spec, method ID is not required for
them, so drop it from the function parameters.
* Const'ify guid in the function call parameters.
* Fix _WDG buffer handling - copy the data to our own private structure.
* Change WMI from tristate to bool - otherwise the external functions are
not exported in linux/acpi.h if you try to build WMI as a module.
* Fix more flag comparisons.
* Add a maintainers entry - since I wrote this, I should take the blame
for it.
v4 (2007-10-30)
* Add missing brace from after fixing checkpatch errors.
* Rewrote event handling - allow external drivers to register with WMI to
handle WMI events
* Clean up flags and sanitise flag handling
v5 (2007-11-03)
* Add sysfs interface for userspace. Export events over netlink again.
* Remove module left overs, fully convert to built-in driver.
* Tweak in-kernel API to use u8 for instance, since this is what the GUID
blocks use (so instance cannot be greater than u8).
* Export wmi_get_event_data() for in kernel WMI drivers.
v6 (2007-11-07)
* Split out userspace into a different patch
v7 (2007-11-20)
* Fix driver to handle multiple PNP0C14 devices - store all GUIDs using
the kernel's built in list functions, and just keep adding to the list
every time we handle a PNP0C14 devices - GUIDs will always be unique,
and WMI callers do not know or care about different devices.
* Change WMI event handler registration to use its' own event handling
struct; we should not pass an acpi_handle down to any WMI based drivers
- they should be able to function with only the calls provided in WMI.
* Update my e-mail address
v8 (2007-11-28)
* Convert back to a module.
* Update Kconfig to default to building as a module.
* Remove an erroneous printk.
* Simply comments for string flag (since we now leave the handling to the
caller).
v9 (2007-12-07)
* Add back missing MODULE_DEVICE_TABLE for autoloading
* Checkpatch fixes
v10 (2007-12-12)
* Workaround broken GUIDs declared expensive without a WCxx method.
* Minor cleanups
v11 (2007-12-17)
* More fixing for broken GUIDs declared expensive without a WCxx method.
* Add basic EmbeddedControl region handling.
v12 (2007-12-18)
* Changed EC region handling code, as per Alexey's comments.
v13 (2007-12-27)
* Changed event handling so that we can have one event handler registered
per GUID, as per Matthew Garrett's suggestion.
v14 (2008-01-12)
* Remove ACPI debug statements
v15 (2008-02-01)
* Replace two remaining 'x == NULL' type tests with '!x'
v16 (2008-02-05)
* Change MAINTAINERS entry, as I am not, and never have been, paid to work
on WMI
* Remove 'default' line from Kconfig
Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
CC: Matthew Garrett <mjg59@srcf.ucam.org>
CC: Alexey Starikovskiy <aystarik@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
m68k allmodconfig gives
drivers/net/wireless/b43/main.c:251: error: implicit declaration of function 'mmiowb'
because CONFIG_B43=m, CONFIG_SSB_PCIHOST=n.
Might be Kconfig bustage, but this works...
Cc: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
agp: remove flush_agp_mappings calls from new flush handling code
intel-agp: introduce IS_I915 and do some cleanups..
[intel_agp] fix name for G35 chipset
intel-agp: fixup resource handling in flush code.
intel-agp: add new chipset ID
agp: remove unnecessary pci_dev_put
agp: remove uid comparison as security check
fix AGP warning
agp/intel: Add chipset flushing support for i8xx chipsets.
intel-agp: add chipset flushing support
agp: add chipset flushing support to AGP interface
Add some new card definitions and fix a typo (from Eugen Paiuc).
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it possible to unregister a led classdev object in a safe way during a
suspend/resume cycle.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it possible to unregister a Hardware Random Number Generator
device object in a safe way during a suspend/resume cycle.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Michael Buesch <mb@bu3sch.de>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it possible to unregister a misc device object in a safe way during a
suspend/resume cycle.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace latency.c use with pm_qos_params use.
Signed-off-by: mark gross <mgross@linux.intel.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following patch is a generalization of the latency.c implementation done
by Arjan last year. It provides infrastructure for more than one parameter,
and exposes a user mode interface for processes to register pm_qos
expectations of processes.
This interface provides a kernel and user mode interface for registering
performance expectations by drivers, subsystems and user space applications on
one of the parameters.
Currently we have {cpu_dma_latency, network_latency, network_throughput} as
the initial set of pm_qos parameters.
The infrastructure exposes multiple misc device nodes one per implemented
parameter. The set of parameters implement is defined by pm_qos_power_init()
and pm_qos_params.h. This is done because having the available parameters
being runtime configurable or changeable from a driver was seen as too easy to
abuse.
For each parameter a list of performance requirements is maintained along with
an aggregated target value. The aggregated target value is updated with
changes to the requirement list or elements of the list. Typically the
aggregated target value is simply the max or min of the requirement values
held in the parameter list elements.
>From kernel mode the use of this interface is simple:
pm_qos_add_requirement(param_id, name, target_value):
Will insert a named element in the list for that identified PM_QOS
parameter with the target value. Upon change to this list the new target is
recomputed and any registered notifiers are called only if the target value
is now different.
pm_qos_update_requirement(param_id, name, new_target_value):
Will search the list identified by the param_id for the named list element
and then update its target value, calling the notification tree if the
aggregated target is changed. with that name is already registered.
pm_qos_remove_requirement(param_id, name):
Will search the identified list for the named element and remove it, after
removal it will update the aggregate target and call the notification tree
if the target was changed as a result of removing the named requirement.
>From user mode:
Only processes can register a pm_qos requirement. To provide for
automatic cleanup for process the interface requires the process to register
its parameter requirements in the following way:
To register the default pm_qos target for the specific parameter, the
process must open one of /dev/[cpu_dma_latency, network_latency,
network_throughput]
As long as the device node is held open that process has a registered
requirement on the parameter. The name of the requirement is
"process_<PID>" derived from the current->pid from within the open system
call.
To change the requested target value the process needs to write a s32
value to the open device node. This translates to a
pm_qos_update_requirement call.
To remove the user mode request for a target value simply close the device
node.
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix build again]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: mark gross <mgross@linux.intel.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Adam Belay <abelay@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel_shutdown_prepare() can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Smack is the Simplified Mandatory Access Control Kernel.
Smack implements mandatory access control (MAC) using labels
attached to tasks and data containers, including files, SVIPC,
and other tasks. Smack is a kernel based scheme that requires
an absolute minimum of application support and a very small
amount of configuration data.
Smack uses extended attributes and
provides a set of general mount options, borrowing technics used
elsewhere. Smack uses netlabel for CIPSO labeling. Smack provides
a pseudo-filesystem smackfs that is used for manipulation of
system Smack attributes.
The patch, patches for ls and sshd, a README, a startup script,
and x86 binaries for ls and sshd are also available on
http://www.schaufler-ca.com
Development has been done using Fedora Core 7 in a virtual machine
environment and on an old Sony laptop.
Smack provides mandatory access controls based on the label attached
to a task and the label attached to the object it is attempting to
access. Smack labels are deliberately short (1-23 characters) text
strings. Single character labels using special characters are reserved
for system use. The only operation applied to Smack labels is equality
comparison. No wildcards or expressions, regular or otherwise, are
used. Smack labels are composed of printable characters and may not
include "/".
A file always gets the Smack label of the task that created it.
Smack defines and uses these labels:
"*" - pronounced "star"
"_" - pronounced "floor"
"^" - pronounced "hat"
"?" - pronounced "huh"
The access rules enforced by Smack are, in order:
1. Any access requested by a task labeled "*" is denied.
2. A read or execute access requested by a task labeled "^"
is permitted.
3. A read or execute access requested on an object labeled "_"
is permitted.
4. Any access requested on an object labeled "*" is permitted.
5. Any access requested by a task on an object with the same
label is permitted.
6. Any access requested that is explicitly defined in the loaded
rule set is permitted.
7. Any other access is denied.
Rules may be explicitly defined by writing subject,object,access
triples to /smack/load.
Smack rule sets can be easily defined that describe Bell&LaPadula
sensitivity, Biba integrity, and a variety of interesting
configurations. Smack rule sets can be modified on the fly to
accommodate changes in the operating environment or even the time
of day.
Some practical use cases:
Hierarchical levels. The less common of the two usual uses
for MLS systems is to define hierarchical levels, often
unclassified, confidential, secret, and so on. To set up smack
to support this, these rules could be defined:
C Unclass rx
S C rx
S Unclass rx
TS S rx
TS C rx
TS Unclass rx
A TS process can read S, C, and Unclass data, but cannot write it.
An S process can read C and Unclass. Note that specifying that
TS can read S and S can read C does not imply TS can read C, it
has to be explicitly stated.
Non-hierarchical categories. This is the more common of the
usual uses for an MLS system. Since the default rule is that a
subject cannot access an object with a different label no
access rules are required to implement compartmentalization.
A case that the Bell & LaPadula policy does not allow is demonstrated
with this Smack access rule:
A case that Bell&LaPadula does not allow that Smack does:
ESPN ABC r
ABC ESPN r
On my portable video device I have two applications, one that
shows ABC programming and the other ESPN programming. ESPN wants
to show me sport stories that show up as news, and ABC will
only provide minimal information about a sports story if ESPN
is covering it. Each side can look at the other's info, neither
can change the other. Neither can see what FOX is up to, which
is just as well all things considered.
Another case that I especially like:
SatData Guard w
Guard Publish w
A program running with the Guard label opens a UDP socket and
accepts messages sent by a program running with a SatData label.
The Guard program inspects the message to ensure it is wholesome
and if it is sends it to a program running with the Publish label.
This program then puts the information passed in an appropriate
place. Note that the Guard program cannot write to a Publish
file system object because file system semanitic require read as
well as write.
The four cases (categories, levels, mutual read, guardbox) here
are all quite real, and problems I've been asked to solve over
the years. The first two are easy to do with traditonal MLS systems
while the last two you can't without invoking privilege, at least
for a while.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: Joshua Brindle <method@manicmethod.com>
Cc: Paul Moore <paul.moore@hp.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Ahmed S. Darwish" <darwish.07@gmail.com>
Cc: Andrew G. Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The capability bounding set is a set beyond which capabilities cannot grow.
Currently cap_bset is per-system. It can be manipulated through sysctl,
but only init can add capabilities. Root can remove capabilities. By
default it includes all caps except CAP_SETPCAP.
This patch makes the bounding set per-process when file capabilities are
enabled. It is inherited at fork from parent. Noone can add elements,
CAP_SETPCAP is required to remove them.
One example use of this is to start a safer container. For instance, until
device namespaces or per-container device whitelists are introduced, it is
best to take CAP_MKNOD away from a container.
The bounding set will not affect pP and pE immediately. It will only
affect pP' and pE' after subsequent exec()s. It also does not affect pI,
and exec() does not constrain pI'. So to really start a shell with no way
of regain CAP_MKNOD, you would do
prctl(PR_CAPBSET_DROP, CAP_MKNOD);
cap_t cap = cap_get_proc();
cap_value_t caparray[1];
caparray[0] = CAP_MKNOD;
cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP);
cap_set_proc(cap);
cap_free(cap);
The following test program will get and set the bounding
set (but not pI). For instance
./bset get
(lists capabilities in bset)
./bset drop cap_net_raw
(starts shell with new bset)
(use capset, setuid binary, or binary with
file capabilities to try to increase caps)
************************************************************
cap_bound.c
************************************************************
#include <sys/prctl.h>
#include <linux/capability.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifndef PR_CAPBSET_READ
#define PR_CAPBSET_READ 23
#endif
#ifndef PR_CAPBSET_DROP
#define PR_CAPBSET_DROP 24
#endif
int usage(char *me)
{
printf("Usage: %s get\n", me);
printf(" %s drop <capability>\n", me);
return 1;
}
#define numcaps 32
char *captable[numcaps] = {
"cap_chown",
"cap_dac_override",
"cap_dac_read_search",
"cap_fowner",
"cap_fsetid",
"cap_kill",
"cap_setgid",
"cap_setuid",
"cap_setpcap",
"cap_linux_immutable",
"cap_net_bind_service",
"cap_net_broadcast",
"cap_net_admin",
"cap_net_raw",
"cap_ipc_lock",
"cap_ipc_owner",
"cap_sys_module",
"cap_sys_rawio",
"cap_sys_chroot",
"cap_sys_ptrace",
"cap_sys_pacct",
"cap_sys_admin",
"cap_sys_boot",
"cap_sys_nice",
"cap_sys_resource",
"cap_sys_time",
"cap_sys_tty_config",
"cap_mknod",
"cap_lease",
"cap_audit_write",
"cap_audit_control",
"cap_setfcap"
};
int getbcap(void)
{
int comma=0;
unsigned long i;
int ret;
printf("i know of %d capabilities\n", numcaps);
printf("capability bounding set:");
for (i=0; i<numcaps; i++) {
ret = prctl(PR_CAPBSET_READ, i);
if (ret < 0)
perror("prctl");
else if (ret==1)
printf("%s%s", (comma++) ? ", " : " ", captable[i]);
}
printf("\n");
return 0;
}
int capdrop(char *str)
{
unsigned long i;
int found=0;
for (i=0; i<numcaps; i++) {
if (strcmp(captable[i], str) == 0) {
found=1;
break;
}
}
if (!found)
return 1;
if (prctl(PR_CAPBSET_DROP, i)) {
perror("prctl");
return 1;
}
return 0;
}
int main(int argc, char *argv[])
{
if (argc<2)
return usage(argv[0]);
if (strcmp(argv[1], "get")==0)
return getbcap();
if (strcmp(argv[1], "drop")!=0 || argc<3)
return usage(argv[0]);
if (capdrop(argv[2])) {
printf("unknown capability\n");
return 1;
}
return execl("/bin/bash", "/bin/bash", NULL);
}
************************************************************
[serue@us.ibm.com: fix typo]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>a
Signed-off-by: "Serge E. Hallyn" <serue@us.ibm.com>
Tested-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KaiGai Kohei observed that this line in the linux header is not needed.
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: KaiGai Kohei <kaigai@kaigai.gr.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch supports legacy (32-bit) capability userspace, and where possible
translates 32-bit capabilities to/from userspace and the VFS to 64-bit
kernel space capabilities. If a capability set cannot be compressed into
32-bits for consumption by user space, the system call fails, with -ERANGE.
FWIW libcap-2.00 supports this change (and earlier capability formats)
http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/
[akpm@linux-foundation.org: coding-syle fixes]
[akpm@linux-foundation.org: use get_task_comm()]
[ezk@cs.sunysb.edu: build fix]
[akpm@linux-foundation.org: do not initialise statics to 0 or NULL]
[akpm@linux-foundation.org: unused var]
[serue@us.ibm.com: export __cap_ symbols]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert b68680e473 to make way for the next
patch: "Add 64-bit capability support to the kernel".
We want to keep the vfs_cap_data.data[] structure, using two 'data's for
64-bit caps (and later three for 96-bit caps), whereas
b68680e473 had gotten rid of the 'data' struct
made its members inline.
The 64-bit caps patch keeps the stack abuse fix at get_file_caps(), which was
the more important part of that patch.
[akpm@linux-foundation.org: coding-style fixes]
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch modifies the interface to inode_getsecurity to have the function
return a buffer containing the security blob and its length via parameters
instead of relying on the calling function to give it an appropriately sized
buffer.
Security blobs obtained with this function should be freed using the
release_secctx LSM hook. This alleviates the problem of the caller having to
guess a length and preallocate a buffer for this function allowing it to be
used elsewhere for Labeled NFS.
The patch also removed the unused err parameter. The conversion is similar to
the one performed by Al Viro for the security_getprocattr hook.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After making dirty a 100M file, the normal behavior is to start the
writeback for all data after 30s delays. But sometimes the following
happens instead:
- after 30s: ~4M
- after 5s: ~4M
- after 5s: all remaining 92M
Some analyze shows that the internal io dispatch queues goes like this:
s_io s_more_io
-------------------------
1) 100M,1K 0
2) 1K 96M
3) 0 96M
1) initial state with a 100M file and a 1K file
2) 4M written, nr_to_write <= 0, so write more
3) 1K written, nr_to_write > 0, no more writes(BUG)
nr_to_write > 0 in (3) fools the upper layer to think that data have all
been written out. The big dirty file is actually still sitting in
s_more_io. We cannot simply splice s_more_io back to s_io as soon as s_io
becomes empty, and let the loop in generic_sync_sb_inodes() continue: this
may starve newly expired inodes in s_dirty. It is also not an option to
draw inodes from both s_more_io and s_dirty, an let the loop go on: this
might lead to live locks, and might also starve other superblocks in sync
time(well kupdate may still starve some superblocks, that's another bug).
We have to return when a full scan of s_io completes. So nr_to_write > 0
does not necessarily mean that "all data are written". This patch
introduces a flag writeback_control.more_io to indicate that more io should
be done. With it the big dirty file no longer has to wait for the next
kupdate invokation 5s later.
In sync_sb_inodes() we only set more_io on super_blocks we actually
visited. This avoids the interaction between two pdflush deamons.
Also in __sync_single_inode() we don't blindly keep requeuing the io if the
filesystem cannot progress. Failing to do so may lead to 100% iowait.
Tested-by: Mike Snitzer <snitzer@gmail.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After running SetPageUptodate, preceeding stores to the page contents to
actually bring it uptodate may not be ordered with the store to set the
page uptodate.
Therefore, another CPU which checks PageUptodate is true, then reads the
page contents can get stale data.
Fix this by having an smp_wmb before SetPageUptodate, and smp_rmb after
PageUptodate.
Many places that test PageUptodate, do so with the page locked, and this
would be enough to ensure memory ordering in those places if
SetPageUptodate were only called while the page is locked. Unfortunately
that is not always the case for some filesystems, but it could be an idea
for the future.
Also bring the handling of anonymous page uptodateness in line with that of
file backed page management, by marking anon pages as uptodate when they
_are_ uptodate, rather than when our implementation requires that they be
marked as such. Doing allows us to get rid of the smp_wmb's in the page
copying functions, which were especially added for anonymous pages for an
analogous memory ordering problem. Both file and anonymous pages are
handled with the same barriers.
FAQ:
Q. Why not do this in flush_dcache_page?
A. Firstly, flush_dcache_page handles only one side (the smb side) of the
ordering protocol; we'd still need smp_rmb somewhere. Secondly, hiding away
memory barriers in a completely unrelated function is nasty; at least in the
PageUptodate macros, they are located together with (half) the operations
involved in the ordering. Thirdly, the smp_wmb is only required when first
bringing the page uptodate, wheras flush_dcache_page should be called each time
it is written to through the kernel mapping. It is logically the wrong place to
put it.
Q. Why does this increase my text size / reduce my performance / etc.
A. Because it is adding the necessary instructions to eliminate the data-race.
Q. Can it be improved?
A. Yes, eg. if you were to create a rule that all SetPageUptodate operations
run under the page lock, we could avoid the smp_rmb places where PageUptodate
is queried under the page lock. Requires audit of all filesystems and at least
some would need reworking. That's great you're interested, I'm eagerly awaiting
your patches.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add vm.highmem_is_dirtyable toggle
A 32 bit machine with HIGHMEM64 enabled running DCC has an MMAPed file of
approximately 2Gb size which contains a hash format that is written
randomly by the dbclean process. On 2.6.16 this process took a few
minutes. With lowmem only accounting of dirty ratios, this takes about 12
hours of 100% disk IO, all random writes.
Include a toggle in /proc/sys/vm/highmem_is_dirtyable which can be set to 1 to
add the highmem back to the total available memory count.
[akpm@linux-foundation.org: Fix the CONFIG_DETECT_SOFTLOCKUP=y build]
Signed-off-by: Bron Gondwana <brong@fastmail.fm>
Cc: Ethan Solomita <solo@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have repeatedly discussed if the cold pages still have a point. There is
one way to join the two lists: Use a single list and put the cold pages at the
end and the hot pages at the beginning. That way a single list can serve for
both types of allocations.
The discussion of the RFC for this and Mel's measurements indicate that
there may not be too much of a point left to having separate lists for
hot and cold pages (see http://marc.info/?t=119492914200001&r=1&w=2).
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Martin Bligh <mbligh@mbligh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Add comments explaing how drain_pages() works.
- Eliminate useless functions
- Rename drain_all_local_pages to drain_all_pages(). It does drain
all pages not only those of the local processor.
- Eliminate useless interrupt off / on sequences. drain_pages()
disables interrupts on its own. The execution thread is
pinned to processor by the caller. So there is no need to
disable interrupts.
- Put drain_all_pages() declaration in gfp.h and remove the
declarations from suspend.h and from mm/memory_hotplug.c
- Make software suspend call drain_all_pages(). The draining
of processor local pages is may not the right approach if
software suspend wants to support SMP. If they call drain_all_pages
then we can make drain_pages() static.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This puts all the clear_refs code where it belongs and probably lets things
compile on MMU-less systems as well.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a general page table walker
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move is_swap_pte helper function to swapops.h for use by pagemap code
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following replaces the earlier patches sent. It should address
David Rientjes's comments, and has been compile tested on all the
architectures that it touches, save for parisc.
For the /proc/<pid>/pagemap code[1], we need to able to query how
much virtual address space a particular task has. The trick is
that we do it through /proc and can't use TASK_SIZE since it
references "current" on some arches. The process opening the
/proc file might be a 32-bit process opening a 64-bit process's
pagemap file.
x86_64 already has a TASK_SIZE_OF() macro:
#define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_IA32)) ? IA32_PAGE_OFFSET : TASK_SIZE64)
I'd like to have that for other architectures. So, add it
for all the architectures that actually use "current" in
their TASK_SIZE. For the others, just add a quick #define
in sched.h to use plain old TASK_SIZE.
1. http://www.linuxworld.com/news/2007/042407-kernel.html
- MIPS portion from Ralf Baechle <ralf@linux-mips.org>
[akpm@linux-foundation.org: fix mips build]
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
move_to_swap_cache and move_from_swap_cache functions (which swizzle a page
between tmpfs page cache and swap cache, to avoid page copying) are only used
by shmem.c; and our subsequent fix for unionfs needs different treatments in
the two instances of move_from_swap_cache. Move them from swap_state.c into
their callsites shmem_writepage, shmem_unuse_inode and shmem_getpage, making
add_to_swap_cache externally visible.
shmem.c likes to say set_page_dirty where swap_state.c liked to say
SetPageDirty: respect that diversity, which __set_page_dirty_no_writeback
makes moot (and implies we should lose that "shift page from clean_pages to
dirty_pages list" comment: it's on neither).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Building in a filesystem on a loop device on a tmpfs file can hang when
swapping, the loop thread caught in that infamous throttle_vm_writeout.
In theory this is a long standing problem, which I've either never seen in
practice, or long ago suppressed the recollection, after discounting my load
and my tmpfs size as unrealistically high. But now, with the new aops, it has
become easy to hang on one machine.
Loop used to grab_cache_page before the old prepare_write to tmpfs, which
seems to have been enough to free up some memory for any swapin needed; but
the new write_begin lets tmpfs find or allocate the page (much nicer, since
grab_cache_page missed tmpfs pages in swapcache).
When allocating a fresh page, tmpfs respects loop's mapping_gfp_mask, which
has __GFP_IO|__GFP_FS stripped off, and throttle_vm_writeout is designed to
break out when __GFP_IO or GFP_FS is unset; but when tmfps swaps in,
read_swap_cache_async allocates with GFP_HIGHUSER_MOVABLE regardless of the
mapping_gfp_mask - hence the hang.
So, pass gfp_mask down the line from shmem_getpage to shmem_swapin to
swapin_readahead to read_swap_cache_async to add_to_swap_cache.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
swapin_readahead has never sat well in mm/memory.c: move it to mm/swap_state.c
beside its kindred read_swap_cache_async. Why were its args in a different
order? rearrange them. And since it was always followed by a
read_swap_cache_async of the target page, fold that in and return struct
page*. Then CONFIG_SWAP=n no longer needs valid_swaphandles and
read_swap_cache_async stubs.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both slab defrag and the large blocksize patches need to ability to take
refcounts on compound pages. May be useful in other places as well.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Checking if an address is a vmalloc address is done in a couple of places.
Define a common version in mm.h and replace the other checks.
Again the include structures suck. The definition of VMALLOC_START and
VMALLOC_END is not available in vmalloc.h since highmem.c cannot be included
there.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make vmalloc functions work the same way as kfree() and friends that
take a const void * argument.
[akpm@linux-foundation.org: fix consts, coding-style]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We already have page table manipulation for vmalloc in vmalloc.c. Move the
vmalloc_to_page() function there as well.
Move the definitions for vmalloc related functions in mm.h to a newly created
section. A better place would be vmalloc.h but mm.h is basic and may depend
on these functions. An alternative would be to include vmalloc.h in mm.h
(like done for vmstat.h).
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simplify page cache zeroing of segments of pages through 3 functions
zero_user_segments(page, start1, end1, start2, end2)
Zeros two segments of the page. It takes the position where to
start and end the zeroing which avoids length calculations and
makes code clearer.
zero_user_segment(page, start, end)
Same for a single segment.
zero_user(page, start, length)
Length variant for the case where we know the length.
We remove the zero_user_page macro. Issues:
1. Its a macro. Inline functions are preferable.
2. The KM_USER0 macro is only defined for HIGHMEM.
Having to treat this special case everywhere makes the
code needlessly complex. The parameter for zeroing is always
KM_USER0 except in one single case that we open code.
Avoiding KM_USER0 makes a lot of code not having to be dealing
with the special casing for HIGHMEM anymore. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.
Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
function could not be a macro. zero_user_* functions introduced
here can be be inline because that constant is not used when these
functions are called.
Also extract the flushing of the caches to be outside of the kmap.
[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds a new-style I2C driver with basic support for the sixteen bit
PCA9539 GPIO expanders. These chips have multiple registers, push-pull output
drivers, and (not supported in this patch) pin change interrupts.
Board-specific code must provide "pca9539_platform_data" with each chip's
"i2c_board_info". That provides the GPIO numbers to be used by that chip, and
callbacks for board-specific setup/teardown logic.
Derived from drivers/i2c/chips/pca9539.c (which has no current known users).
This is faster and simpler; it uses 16-bit register access, and cache the
OUTPUT and DIRECTION registers for fast access
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Jean Delvare <khali@linux-fr.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Philipp Zabel <philipp.zabel@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ben Gardner <bgardner@wabtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Basic driver for 8-bit SPI based MCP23S08 GPIO expander, without support for
IRQs or the shared chipselect mechanism.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Eric Miao <eric.miao@marvell.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Philipp Zabel <philipp.zabel@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ben Gardner <bgardner@wabtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a new-style I2C driver for most common 8 and 16 bit I2C based
"quasi-bidirectional" GPIO expanders: pcf8574 or pcf8575, and several
compatible models (mostly faster, supporting I2C at up to 1 MHz).
The driver exposes the GPIO signals using the platform-neutral GPIO
programming interface, so they are easily accessed by other kernel code. The
lack of such a flexible kernel API has been a big factor in the proliferation
of board-specific drivers for these chips... stuff that rarely makes it
upstream since it's so ugly. This driver will let such boards use standard
calls.
Since it's a new-style driver, these devices must be configured as part of
board-specific init. That eliminates the need for error-prone manual
configuration of module parameters, and makes compatibility with legacy
drivers (pcf8574.c, pc8575.c) for these chips easier (there's a clear
either/or disjunction).
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Jean Delvare <khali@linux-fr.org>
Cc: Eric Miao <eric.miao@marvell.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Philipp Zabel <philipp.zabel@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ben Gardner <bgardner@wabtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds PCI's accessor for segment_boundary_mask in device_dma_parameters.
The default segment_boundary is set to 0xffffffff, same to the block layer's
default value (and the scsi mid layer uses the same value).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds new accessors for segment_boundary_mask in device_dma_parameters
structure in the same way I did for max_segment_size. So we can easily change
where to place struct device_dma_parameters in the future.
dma_get_segment boundary returns 0xffffffff if dma_parms in struct device
isn't set up properly. 0xffffffff is the default value used in the block
layer and the scsi mid layer.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds IOMMU helper functions for the free area management. These
functions take care of LLD's segment boundary limit for IOMMUs. They would be
useful for IOMMUs that use bitmap for the free area management.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds struct device_dma_parameters in struct pci_dev and properly
sets up a pointer in struct device.
The default max_segment_size is set to 64K, same to the block layer's
default value.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Mostly-acked-by: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
IOMMUs merges scatter/gather segments without considering a low level
driver's restrictions. The problem is that IOMMUs can't access to the
limitations because they are in request_queue.
This patchset introduces a new structure, device_dma_parameters,
including dma information. A pointer to device_dma_parameters is added
to struct device. The bus specific structures (like pci_dev) includes
device_dma_parameters. Low level drivers can use dma_set_max_seg_size
to tell IOMMUs about the restrictions.
We can move more dma stuff in struct device (like dma_mask) to struct
device_dma_parameters later (needs some cleanups before that).
This includes patches for all the IOMMUs that could merge sg (x86_64,
ppc, IA64, alpha, sparc64, and parisc) though only the ppc patch was
tested. The patches for other IOMMUs are only compile tested.
This patch:
Add a new structure, device_dma_parameters, including dma information. A
pointer to device_dma_parameters is added to struct device.
- there are only max_segment_size and segment_boundary_mask there but we'll
move more dma stuff in struct device (like dma_mask) to struct
device_dma_parameters later. segment_boundary_mask is not supported yet.
- new accessors for the dma parameters are added. So we can easily change
where to place struct device_dma_parameters in the future.
- dma_get_max_seg_size returns 64K if dma_parms in struct device isn't set
up properly. 64K is the default max_segment_size in the block layer.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow the private_data field to be specified in platform_data for the
standard 8250/16550 UART. This field is used by DW APB type UARTs and
without this patch it's only possible to set this field when registering
the port by hand. If private_data is not set then the driver will
potentially oops with a NULL pointer dereference.
Signed-off-by: Will Newton <will.newton@gmail.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FATAL: drivers/bluetooth/btsdio: sizeof(struct sdio_device_id)=12 is not a modulo of the size of section __mod_sdio_device_table=30.
Fix definition of struct sdio_device_id in mod_devicetable.h
m68k has 16bit alignment for unsigned long.
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Pierre Ossman <drzeus@drzeus.cx>
CC: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the new timerfd API as it is implemented by the following patch:
int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags,
const struct itimerspec *utmr,
struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);
The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
The timerfd_settime() API give new settings by the timerfd fd, by optionally
retrieving the previous expiration time (in case the "otmr" parameter is not
NULL).
The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter. Otherwise it's a relative time.
The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.
Like the previous timerfd API implementation, read(2) and poll(2) are
supported (with the same interface). Here's a simple test program I used to
exercise the new timerfd APIs:
http://www.xmailserver.org/timerfd-test2.c
[akpm@linux-foundation.org: coding-style cleanups]
[akpm@linux-foundation.org: fix ia64 build]
[akpm@linux-foundation.org: fix m68k build]
[akpm@linux-foundation.org: fix mips build]
[akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
[heiko.carstens@de.ibm.com: fix s390]
[akpm@linux-foundation.org: fix powerpc build]
[akpm@linux-foundation.org: fix sparc64 more]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I think that advancing the timer against the timer's current "now" can be a
pretty common usage, so, w/out exposing hrtimer's internals, we add a new
hrtimer_forward_now() function.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As Roland pointed out, we have the very old problem with exec. de_thread()
sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then
clears signal->flags. All signals (even fatal ones) sent in this window
(which is not too small) will be lost.
With this patch exec doesn't abuse SIGNAL_GROUP_EXIT. signal_group_exit(),
the new helper, should be used to detect exit_group() or exec() in progress.
It can have more users, but this patch does only strictly necessary changes.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robin Holt <holt@sgi.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It was dumb to make get_task_comm() return void. Change it to return a
pointer to the resulting output for caller convenience.
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On Sat, 2008-01-05 at 13:35 -0800, Davide Libenzi wrote:
> I remember I talked with Arjan about this time ago. Basically, since 1)
> you can drop an epoll fd inside another epoll fd 2) callback-based wakeups
> are used, you can see a wake_up() from inside another wake_up(), but they
> will never refer to the same lock instance.
> Think about:
>
> dfd = socket(...);
> efd1 = epoll_create();
> efd2 = epoll_create();
> epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...);
> epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
>
> When a packet arrives to the device underneath "dfd", the net code will
> issue a wake_up() on its poll wake list. Epoll (efd1) has installed a
> callback wakeup entry on that queue, and the wake_up() performed by the
> "dfd" net code will end up in ep_poll_callback(). At this point epoll
> (efd1) notices that it may have some event ready, so it needs to wake up
> the waiters on its poll wait list (efd2). So it calls ep_poll_safewake()
> that ends up in another wake_up(), after having checked about the
> recursion constraints. That are, no more than EP_MAX_POLLWAKE_NESTS, to
> avoid stack blasting. Never hit the same queue, to avoid loops like:
>
> epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
> epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...);
> epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...);
> epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...);
>
> The code "if (tncur->wq == wq || ..." prevents re-entering the same
> queue/lock.
Since the epoll code is very careful to not nest same instance locks
allow the recursion.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I was notified by Randy Stewart that lksctp claims to be
"the reference implementation". First of all, "the
refrence implementation" was the original implementation
of SCTP in usersapce written ty Randy and a few others.
Second, after looking at the definiton of 'reference implementation',
we don't really meet the requirements.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Provide a way to use tc filters on vlan tag even if tag is buried in
skb due to hardware acceleration.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
if_addrlabel.h is needed for iproute2 usage.
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the following no longer used functions:
- rtattr_parse()
- rtattr_strlcpy()
- __rtattr_parse_nested_compat()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This bumps the AGP interface to 0.103.
Certain Intel chipsets contains a global write buffer, and this can require
flushing from the drm or X.org to make sure all data has hit RAM before
initiating a GPU transfer, due to a lack of coherency with the integrated
graphics device and this buffer.
This just adds generic support to the AGP interfaces, a follow-on patch
will add support to the Intel driver to use this interface.
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (44 commits)
[ARM] 4822/1: RealView: Change the REALVIEW_MPCORE configuration option
[ARM] 4821/1: RealView: Remove the platform dependencies from localtimer.c
[ARM] 4820/1: RealView: Select the timer IRQ at run-time
[ARM] 4819/1: RealView: Fix entry-macro.S to work with multiple platforms
[ARM] 4818/1: RealView: Add core-tile detection
[ARM] 4817/1: RealView: Move the AMBA resource definitions to realview_eb.c
[ARM] 4816/1: RealView: Move the platform-specific definitions into board-eb.h
[ARM] 4815/1: RealView: Add clockevents suport for the local timers
[ARM] 4814/1: RealView: Add broadcasting clockevents support for ARM11MPCore
[ARM] 4813/1: Add SMP helper functions for clockevents support
[ARM] 4812/1: RealView: clockevents support for the RealView platforms
[ARM] 4811/1: RealView: clocksource support for the RealView platforms
[ARM] 4736/1: Export atags to userspace and allow kexec to use customised atags
[ARM] 4798/1: pcm027: fix missing header file
[ARM] 4803/1: pxa: fix building issue of poodle.c caused by patch 4737/1
[ARM] 4801/1: pxa: fix building issues of missing pxa2xx-regs.h
[ARM] pxa: introduce sysdev for pxa3xx static memory controller
[ARM] pxa: add preliminary suspend/resume code for pxa3xx
[ARM] pxa: introduce sysdev for GPIO register saving/restoring
[ARM] pxa: introduce sysdev for IRQ register saving/restoring
...
* 'slub-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm:
Explain kmem_cache_cpu fields
SLUB: Do not upset lockdep
SLUB: Fix coding style violations
Add parameter to add_partial to avoid having two functions
SLUB: rename defrag to remote_node_defrag_ratio
Move count_partial before kmem_cache_shrink
SLUB: Fix sysfs refcounting
slub: fix shadowed variable sparse warnings
Add some comments explaining the fields of the kmem_cache_cpu structure.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The NUMA defrag works by allocating objects from partial slabs on remote
nodes. Rename it to
remote_node_defrag_ratio
to be clear about this.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (25 commits)
virtio: balloon driver
virtio: Use PCI revision field to indicate virtio PCI ABI version
virtio: PCI device
virtio_blk: implement naming for vda-vdz,vdaa-vdzz,vdaaa-vdzzz
virtio_blk: Dont waste major numbers
virtio_blk: provide getgeo
virtio_net: parametrize the napi_weight for virtio receive queue.
virtio: free transmit skbs when notified, not on next xmit.
virtio: flush buffers on open
virtnet: remove double ether_setup
virtio: Allow virtio to be modular and used by modules
virtio: Use the sg_phys convenience function.
virtio: Put the virtio under the virtualization menu
virtio: handle interrupts after callbacks turned off
virtio: reset function
virtio: populate network rings in the probe routine, not open
virtio: Tweak virtio_net defines
virtio: Net header needs hdr_len
virtio: remove unused id field from struct virtio_blk_outhdr
virtio: clarify NO_NOTIFY flag usage
...
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild:
scsi: fix dependency bug in aic7 Makefile
kbuild: add svn revision information to setlocalversion
kbuild: do not warn about __*init/__*exit symbols being exported
Move Kconfig.instrumentation to arch/Kconfig and init/Kconfig
Add HAVE_KPROBES
Add HAVE_OPROFILE
Create arch/Kconfig
Fix ARM to play nicely with generic Instrumentation menu
kconfig: ignore select of unknown symbol
kconfig: mark config as changed when loading an alternate config
kbuild: Spelling/grammar fixes for config DEBUG_SECTION_MISMATCH
Remove __INIT_REFOK and __INITDATA_REFOK
kbuild: print only total number of section mismatces found
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (77 commits)
[IPV6]: Reorg struct ifmcaddr6 to save some bytes
[INET_TIMEWAIT_SOCK]: Reorganize struct inet_timewait_sock to save some bytes
[DCCP]: Reorganize struct dccp_sock to save 8 bytes
[INET6]: Reorganize struct inet6_dev to save 8 bytes
[SOCK] proto: Add hashinfo member to struct proto
EMAC driver: Fix bug: The clock divisor is set to all ones at reset.
EMAC driver: fix bug - invalidate data cache of new_skb->data range when cache is WB
EMAC driver: add power down mode
EMAC driver: ADSP-BF52x arch/mach support
EMAC driver: use simpler comment headers and strip out information that is maintained in the scm's log
EMAC driver: bf537 MAC multicast hash filtering patch
EMAC driver: define MDC_CLK=2.5MHz and caculate mdc_div according to SCLK.
EMAC driver: shorten the mdelay value to solve netperf performance issue
[netdrvr] sis190: build fix
sky2: fix Wake On Lan interaction with BIOS
sky2: restore multicast addresses after recovery
pci-skeleton: Misc fixes to build neatly
phylib: Add Realtek 821x eth PHY support
natsemi: Update locking documentation
PHYLIB: Locking fixes for PHY I/O potentially sleeping
...
Currently early kernel messages, i.e., those from uncompression, go to the
debugging UART. And if it is enabled in the platform configuration, but
not initialized by the bootloader, the machine hangs, waiting for UART
status change. Besides, having those messages on another UART - typically
the console UART - may be preferrable. This patch allows selecting the
UART in kernel configuration.
Signed-off-by: Guennadi Liakhovetski <lg@denx.de>
Acked-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
After discussions with Anthony Liguori, it seems that the virtio
balloon can be made even simpler. Here's my attempt.
The device configuration tells the driver how much memory it should
take from the guest (ie. balloon size). The guest feeds the page
numbers it has taken via one virtqueue.
A second virtqueue feeds the page numbers the driver wants back: if
the device has the VIRTIO_BALLOON_F_MUST_TELL_HOST bit, then this
queue is compulsory, otherwise it's advisory (and the guest can simply
fault the pages back in).
This driver can be enhanced later to deflate the balloon via a
shrinker, oom callback or we could even go for a complete set of
in-guest regulators.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As Avi pointed out, as we continue to massage the virtio PCI ABI, we can make
things a little more friendly to users by utilizing the PCI revision field to
indicate which version of the ABI we're using. This is a hard ABI version
and incrementing it will cause the guest driver to break.
This is the necessary changes to virtio_pci to support this.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a PCI device that implements a transport for virtio. It allows virtio
devices to be used by QEMU based VMMs like KVM or Xen.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A reset function solves three problems:
1) It allows us to renegotiate features, eg. if we want to upgrade a
guest driver without rebooting the guest.
2) It gives us a clean way of shutting down virtqueues: after a reset,
we know that the buffers won't be used by the host, and
3) It helps the guest recover from messed-up drivers.
So we remove the ->shutdown hook, and the only way we now remove
feature bits is via reset.
We leave it to the driver to do the reset before it deletes queues:
the balloon driver, for example, needs to chat to the host in its
remove function.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1) Turn GSO on virtio net into an all-or-nothing (keep checksumming
separate). Having multiple bits is a pain: if you can't support something
you should handle it in software, which is still a performance win.
2) Make VIRTIO_NET_HDR_GSO_ECN a flag in the header, so it can apply to
IPv6 or v4.
3) Rename VIRTIO_NET_F_NO_CSUM to VIRTIO_NET_F_CSUM (ie. means we do
checksumming).
4) Add csum and gso params to virtio_net to allow more testing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's far easier to deal with packets if we don't have to parse the
packet to figure out the header length to know how much to pull into
the skb data. Add the field to the virtio_net_hdr struct (and fix the
spaces that somehow crept in there).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This field has been unused since an older version of virtio. Remove
it now before we freeze the ABI.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au.
The other side (host) can set the NO_NOTIFY flag as an optimization,
to say "no need to kick me when you add things". Make it clear that
this is advisory only; especially that we should always notify when
the ring is full.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Using unsigned int resulted in silent truncation of the upper 32-bit
on x86_64 resulting in an OOPS since the ring was being initialized
wrong.
Please reconsider my previous patch to just use PAGE_ALIGN(). Open
coding this sort of stuff, no matter how simple it seems, is just
asking for this sort of trouble.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Various drivers want to know when their configuration information
changes: the balloon driver is the immediate user, but the network
driver may one day have a "carrier" status as well.
This introduces that callback (lguest doesn't use it yet).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It seems that virtio_net wants to disable callbacks (interrupts) before
calling netif_rx_schedule(), so we can't use the return value to do so.
Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
now returns void, rather than a boolean.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Previously we used a type/len pair within the config space, but this
seems overkill. We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.
The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use it in virtio_net (replacing buggy version there), it's also going
to be used by TAP for partial csum support.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
fcntl(F_GETLK,..) can return pid of process for not current pid namespace
(if process is belonged to the several namespaces). It is true also for
pids in /proc/locks. So correct behavior is saving pointer to the struct
pid of the process lock owner.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Use existing dmi_get_system_info(),
Delete duplicate dmi_get_slot()
Spotted-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Len Brown <len.brown@intel.com>
Function do_timer_interrupt_hook() don't take argument regs,
and structure hrtimer_sleeper don't have member cb_pending.
So delete comments refering to these symbols.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
There is an unmatched parenthesis in the locking commentary of radix_tree.h
which is trivially fixed by the patch below.
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Having the macro to prevent multiple inclusion of
include/linux/dma-mapping.h contain the prefix "_ASM" is just begging
for possible confusion some day.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
PHY read/write functions can potentially sleep (e.g., a PHY accessed
via I2C). The following changes were made to account for this:
* Change spin locks to mutex locks
* Add a BUG_ON() to phy_read() phy_write() to warn against
calling them from an interrupt context.
* Use work queue for PHY state machine handling since
it can potentially sleep
* Change phydev lock from spinlock to mutex
Signed-off-by: Nate Case <ncase@xes-inc.com>
Acked-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 312b1485fb made __INIT_REFOK expand
into .section .section ".ref.text", "ax". Since the assembler doesn't
tolerate stuttering in the source that broke all MIPS builds.
Since with this change Sam downgraded __INIT_REFOK to just a backward
compat thing and there being only a single use in the MIPS arch code the
best solution is to delete both of __INIT_REFOK and __INITDATA_REFOK (which
was equally broken) being unused anyway these can be deleted.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Move the declaration of device_pm_schedule_removal() to device.h
and make it exported, as it will be used directly by some drivers
for unregistering device objects during suspend/resume cycles in a
safe way.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This reverts commit 6c723d5bd8.
It caused build errors on non-x86 platforms, config file confusion, and
even some boot errors on some x86-64 boxes. All around, not quite ready
for prime-time :(
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Move check_dma_crc() to ide-dma.c and add inline version for
CONFIG_BLK_DEV_IDEDMA=n case.
* Rename check_dma_crc() to ide_check_dma_crc().
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* siimage.c: use hwif->sata_scr[SATA_{ERROR,STATUS}_OFFSET] instead of
SATA_{ERROR,STATUS}_REG macros.
* Remove no longer needed SATA_*_REG macros.
While at it:
* Remove needless SATA Status register read from sil_sata_reset_poll().
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* ->nice0 and ->nice2 ide_drive_t fields are always zero so remove them.
* IDE_NICE_0 and IDE_NICE_2 defines from <linux/hdreg.h> are no longer
used by any kernel code so cover them with #ifndef/#endif __KERNEL__.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Un-static create_proc_ide_drives() and call it from ide_device_add_all().
While at it:
* Rename create_proc_ide_drives() to ide_proc_port_register_devices().
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Factor out devices setup from ide_acpi_init() to
ide_acpi_port_init_devices().
* Call ide_acpi_port_init_devices() in ide_device_add_all().
While at it:
* Remove no longer needed 'drive' field from struct ide_acpi_drive_link.
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add ->port_init_devs method to ide_hwif_t for a host specific
initialization of devices on a port. Call the new method from
ide_port_init_devices().
* Convert ht6560b, qd65xx and opti621 host drivers to use the new
->port_init_devs method.
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Use the same bit for IDE_HFLAG_CS5520 and IDE_HFLAG_VDMA host flags
(both are used only by cs5520 host driver currently).
* Add IDE_HFLAG_NO_IO32_BIT host flag and use it instead of ->no_io_32bit
ide_hwif_t field.
* Add IDE_HFLAG_NO_UNMASK_IRQS host flag, then convert dtc2278 and rz1000
host drivers to use it.
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Factor out code for finding ide_hwifs[] slot from ide_register_hw()
to ide_deprecated_find_port().
* Convert bast-ide, ide-cs and delkin_cb host drivers to use ide_device_add()
instead of ide_register_hw() (while at it drop doing "ide_unregister()" loop
which tries to unregister _all_ IDE interfaces if useable ide_hwifs[] slot
cannot be find).
This patch leaves us with only two ide_register_hw() users:
- drivers/macintosh/mediabay.c
- drivers/ide/ide.c
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add 'init_default' (flag for calling init_hwif_default()) and 'restore'
(flag for calling ide_hwif_restore()) arguments to ide_unregister().
* Update ide_unregister() users to set 'init_default' and 'restore' flags.
* No need to set 'init_default' flag in ide_register_hw() if the setup done
by init_hwif_default() is going to be overridden by ide_init_port_hw().
* No need to set 'init_default' and 'restore' flags in cleanup_module().
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Also, move xfer_func_t typedef to the ide.h since it is used by two drivers
now (more coming).
Bart:
- use __func__ while at it
Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add ->cable_detect method to ide_hwif_t.
* Call the new method in ide_init_port() if:
- the host supports UDMA modes > UDMA2 ('hwif->ultra_mask & 78')
- DMA initialization was successful (if hwif->dma_base is not set
ide_init_port() sets hwif->ultra_mask to zero)
- "idex=ata66" is not used ('hwif->cbl != ATA_CBL_PATA40_SHORT')
* Convert PCI host drivers to use ->cable_detect method.
While at it:
* Factor out cable detection to separate functions (if not already done).
* hpt366.c/it8213.c/slc90e66.c:
- don't check cable type if "idex=ata66" is used
* pdc202xx_new.c:
- add __devinit tag to pdcnew_cable_detect()
* pdc202xx_old.c:
- rename pdc202xx_old_cable_detect() to pdc2026x_old_cable_detect()
- add __devinit tag to pdc2026x_old_cable_detect()
Reviewed-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Remove 'struct pci_dev *dev' argument from ide_hwif_setup_dma().
* Un-static ide_hwif_setup_dma() and add CONFIG_BLK_DEV_IDEDMA_PCI=n version.
* Add 'const struct ide_port_info *d' argument to ide_device_add[_all]().
* Factor out generic ports init from ide_pci_setup_ports() to ide_init_port(),
move it to ide-probe.c and call it in in ide_device_add_all() instead of
ide_pci_setup_ports().
* Move ->mate setup to ide_device_add_all() from ide_port_init().
* Add IDE_HFLAG_NO_AUTOTUNE host flag for host drivers that don't enable
->autotune currently.
* Setup hwif->chipset in ide_init_port() but iff pi->chipset is set
(to not override setup done by ide_hwif_configure()).
* Add ETRAX host handling to ide_device_add_all().
* cmd640.c: set IDE_HFLAG_ABUSE_* also for CONFIG_BLK_DEV_CMD640_ENHANCED=n.
* pmac.c: make pmac_ide_setup_dma() return an error value and move DMA masks
setup to pmac_ide_setup_device().
* Add 'struct ide_port_info' instances to legacy host drivers, pass them to
ide_device_add() calls and then remove open-coded ports initialization.
Reviewed-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Fix an imprecision in CELSIUS_TO_KELVIN and move these
two macroes to a proper place.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Thomas Sujith <sujith.thomas@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* git://git.infradead.org/battery-2.6:
apm_power: check I.intval for zero value, we use it as the divisor
MAINTAINERS: remove kernel-discuss@handhelds.org list
pda_power: implement polling
pda_power: various cleanups
apm_power: support using VOLTAGE_* properties for apm calculations
pda_power: add suspend/resume support
power_supply: add few more values and props
pda_power: only register available psu
power: fix incorrect unregistration in power_supply_create_attrs error path
power: remove POWER_SUPPLY_PROP_CAPACITY_LEVEL
[BATTERY] power_supply_leds: use kasprintf
[BATTERY] Every file should include the headers containing the prototypes for its global functions.
The Generic Thermal sysfs driver for thermal management.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Thomas Sujith <sujith.thomas@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'for-linus' of git://linux-nfs.org/~bfields/linux: (100 commits)
SUNRPC: RPC program information is stored in unsigned integers
SUNRPC: Move exported symbol definitions after function declaration part 2
NLM: tear down RPC clients in nlm_shutdown_hosts
SUNRPC: spin svc_rqst initialization to its own function
nfsd: more careful input validation in nfsctl write methods
lockd: minor log message fix
knfsd: don't bother mapping putrootfh enoent to eperm
rdma: makefile
rdma: ONCRPC RDMA protocol marshalling
rdma: SVCRDMA sendto
rdma: SVCRDMA recvfrom
rdma: SVCRDMA Core Transport Services
rdma: SVCRDMA Transport Module
rdma: SVCRMDA Header File
svc: Add svc_xprt_names service to replace svc_sock_names
knfsd: Support adding transports by writing portlist file
svc: Add svc API that queries for a transport instance
svc: Add /proc/sys/sunrpc/transport files
svc: Add transport hdr size for defer/revisit
svc: Move the xprt independent code to the svc_xprt.c file
...
* 'suspend' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (38 commits)
suspend: cleanup reference to swsusp_pg_dir[]
PM: Remove obsolete /sys/devices/.../power/state docs
Hibernation: Invoke suspend notifications after console switch
Suspend: Invoke suspend notifications after console switch
Suspend: Clean up suspend_64.c
Suspend: Add config option to disable the freezer if architecture wants that
ACPI: Print message before calling _PTS
ACPI hibernation: Call _PTS before suspending devices
Hibernation: Introduce begin() and end() callbacks
ACPI suspend: Call _PTS before suspending devices
ACPI: Separate disabling of GPEs from _PTS
ACPI: Separate invocations of _GTS and _BFS from _PTS and _WAK
Suspend: Introduce begin() and end() callbacks
suspend: fix ia64 allmodconfig build
ACPI: clear GPE earily in resume to avoid warning
Suspend: Clean up Kconfig (V2)
Hibernation: Clean up Kconfig (V2)
Hibernation: Update messages
Suspend: Use common prefix in messages
Hibernation: Remove unnecessary variable declaration
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6: (64 commits)
PCI: make pci_bus a struct device
PCI: fix codingstyle issues in include/linux/pci.h
PCI: fix codingstyle issues in drivers/pci/pci.h
PCI: PCIE ASPM support
PCI: Fix fakephp deadlock
PCI: modify SB700 SATA MSI quirk
PCI: Run ACPI _OSC method on root bridges only
PCI ACPI: AER driver should only register PCIe devices with _OSC
PCI ACPI: Added a function to register _OSC with only PCIe devices.
PCI: constify function pointer tables
PCI: Convert drivers/pci/proc.c to use unlocked_ioctl
pciehp: block new requests from the device before power off
pciehp: workaround against Bad DLLP during power off
pciehp: wait for 1000ms before LED operation after power off
PCI: Remove pci_enable_device_bars() from documentation
PCI: Remove pci_enable_device_bars()
PCI: Remove users of pci_enable_device_bars()
PCI: Add pci_enable_device_{io,mem} intefaces
PCI: avoid save the same type of cap multiple times
PCI: correctly initialize a structure for pcie_save_pcix_state()
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (128 commits)
USB: fix codingstyle issues in drivers/usb/core/*.c
USB: fix codingstyle issues in drivers/usb/core/message.c
USB: fix codingstyle issues in drivers/usb/core/hcd-pci.c
USB: fix codingstyle issues in drivers/usb/core/devio.c
USB: fix codingstyle issues in drivers/usb/core/devices.c
USB: fix codingstyle issues in drivers/usb/core/*.h
USB: fix codingstyle issues in include/linux/usb/
USB: fix codingstyle issues in include/linux/usb.h
USB: mark USB drivers as being GPL only
USB: use a real vendor and product id for root hubs
USB: mount options: fix usbfs
USB: Fix usb_serial_driver structure for Kobil cardreader driver.
usb: ehci should use u16 for isochronous intervals
usb: ehci, remove false clear-reset path
USB: Use menuconfig objects
usb: ohci-sm501 driver
usb: dma bounce buffer support
USB: last abuses of intfdata in close for usb-serial drivers
USB: kl5kusb105 don't flush to logically disconnected devices
USB: oti6858: cleanup
...
Add LiMn (one of the most common for small non-rechargable batteries)
battery technology and voltage_min/_max properties support.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Anton Vorontsov <cbou@mail.ru>
The CAPACITY_LEVEL stuff defines various levels of charge; however, what
is the difference between them? What differentiates between HIGH and NORMAL,
LOW and CRITICAL, etc?
As it appears that these are fairly arbitrary, we end up making such policy
decisions in the kernel (or in hardware). This is the sort of decision that
should be made in userspace, not in the kernel.
If the hardware does not support _CAPACITY and it cannot be easily calculated,
then perhaps the driver should register a custom CAPACITY_LEVEL attribute;
however, userspace should not become accustomed to looking for such a thing,
and we should certainly not encourage drivers to provide CAPACITY_LEVEL
stubs.
The following removes support for POWER_SUPPLY_PROP_CAPACITY_LEVEL. The
OLPC battery driver is the only driver making use of this, so it's
removed from there as well.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Introduce global hibernation callback .end() and rename global
hibernation callback .start() to .begin(), in analogy with the
recent modifications of the global suspend callbacks.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
On ACPI systems the target state set by acpi_pm_set_target() is
reset by acpi_pm_finish(), but that need not be called if the
suspend fails. All platforms that use the .set_target() global
suspend callback are affected by analogous issues.
For this reason, we need an additional global suspend callback that
will reset the target state regardless of whether or not the suspend
is successful. Also, it is reasonable to rename the .set_target()
callback, since it will be used for a different purpose on ACPI
systems (due to ACPI 1.0x code ordering requirements).
Introduce the global suspend callback .end() to be executed at the
end of the suspend sequence and rename the .set_target() global
suspend callback to .begin().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch (as1008b) converts the PM notifier routines from inline
calls to out-of-line code. It also prevents pm_chain_head from
being created when CONFIG_PM_SLEEP isn't enabled, and EXPORTs the
notifier registration and unregistration routines.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Add PM_RESTORE_PREPARE and PM_POST_RESTORE notifiers to the PM core, to be used
in analogy with the existing PM_HIBERNATION_PREPARE and PM_POST_HIBERNATION
notifiers.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Move the definitions of hibernation ioctls to a separate header file in
include/linux, which can be exported to the user space.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
This moves the pci_bus class device to be a real struct device and at
the same time, place it in the device tree in the correct location.
Note, the old "bridge" symlink is now gone, but this was a non-standard
link and no userspace program used it. If you need to determine the
device that the bus is on, follow the standard device symlink, or walk
up the device tree.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
PCI Express ASPM defines a protocol for PCI Express components in the D0
state to reduce Link power by placing their Links into a low power state
and instructing the other end of the Link to do likewise. This
capability allows hardware-autonomous, dynamic Link power reduction
beyond what is achievable by software-only controlled power management.
However, The device should be configured by software appropriately.
Enabling ASPM will save power, but will introduce device latency.
This patch adds ASPM support in Linux. It introduces a global policy for
ASPM, a sysfs file /sys/module/pcie_aspm/parameters/policy can control
it. The interface can be used as a boot option too. Currently we have
below setting:
-default, BIOS default setting
-powersave, highest power saving mode, enable all available ASPM
state
and clock power management
-performance, highest performance, disable ASPM and clock power
management
By default, the 'default' policy is used currently.
In my test, power difference between powersave mode and performance mode
is about 1.3w in a system with 3 PCIE links.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The function pci_osc_support_set() traverses every root bridge when
checking for _OSC support for a capability. It quits as soon as it finds a
device/bridge that doesn't support the requested capability. This won't
work for systems that have mixed PCI and PCIe bridges when checking for
PCIe features. I split this function into two -- pci_osc_support_set() and
pcie_osc_support_set(). The latter is used when only PCIe devices should be
traversed.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Now that all in-tree users are gone, this removes pci_enable_device_bars()
completely.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The pci_enable_device_bars() interface isn't well suited to PCI
because you can't actually enable/disable BARs individually on
a device. So for example, if a device has 2 memory BARs 0 and 1,
and one of them (let's say 1) has not been successfully allocated
by the firmware or the kernel, then enabling memory decoding
shouldn't be permitted for the entire device since it will decode
whatever random address is still in that BAR 1.
So a device must be either fully enabled for IO, for Memory, or
for both. Not on a per-BAR basis.
This provides two new functions, pci_enable_device_io() and
pci_enable_device_mem() to replace pci_enable_device_bars(). The
implementation internally builds a BAR mask in order to be able
to use existing arch infrastructure.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The current pci_assign_unassigned_resources() code doesn't work properly
on 32 bits platforms with 64 bits resources. The main reason is the use
of unsigned long in various places instead of resource_size_t.
This is a pre-requisite for making powerpc use the generic code instead of
its own half-useful implementation.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
PCI error recovery usually involves the PCI adapter being reset.
If the device is using MSI, the reset will cause the MSI state
to be lost; the device driver needs to restore the MSI state.
The pci_restore_msi_state() routine is currently protected
by CONFIG_PM; remove this, and also export the symbol, so
that it can be used in a modle.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The vendor_compatible and device_compatible fields in struct pci_dev aren't
used anywhere, and are somewhat pointless. Assuming that these are
historical artifacts, remove them.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
PCIE has a mechanism to wait for Non-Posted request to complete. I think
pci_disable_device is a good place to do this.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch #if 0's the following unused global functions:
- rom.c: pci_map_rom_copy()
- rom.c: pci_remove_rom()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch makes the needlessly global pci_restore_bars() static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (100 commits)
ide: move hwif_register() call out of ide_probe_port()
ide: factor out code for tuning devices from ide_probe_port()
ide: move handling of I/O resources out of ide_probe_port()
ide: make probe_hwif() return an error value
ide: use ide_remove_port_from_hwgroup in init_irq()
ide: prepare init_irq() for using ide_remove_port_from_hwgroup()
ide: factor out code removing port from hwgroup from ide_unregister()
ide: I/O resources are released too early in ide_unregister()
ide: cleanup ide_system_bus_speed()
ide: remove needless zeroing of hwgroup fields from init_irq()
ide: remove unused ide_hwgroup_t fields
ide_platform: remove struct hwif_prop
ide: remove hwif->present manipulations from hwif_init()
ide: move wait_hwif_ready() documentation in the right place
ide: fix handling of busy I/O resources in probe_hwif()
<linux/hdsmart.h> is not used by kernel code
ide: don't include <linux/hdsmart.h>
ide-floppy: cleanup header
ide: update/add my Copyrights
ide: delete filenames/versions from comments
...
This fixes a problem where the mos7720 driver will make io to a device from
which it has been logically disconnected. It does so by introducing a flag by
which the generic usb serial code can signal the subdrivers their
disconnection and appropriate locking.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch exports two statistics to userspace:
/sys/bus/usb/device/.../power/connected_duration
/sys/bus/usb/device/.../power/active_duration
connected_duration is the total time (in msec) that the device has
been connected. active_duration is the total time the device has not
been suspended. With these two statistics, tools like PowerTOP can
calculate the percentage time that a device is active, i.e. not
suspended or auto-suspended.
Users can also use the active_duration to check if a device is actually
autosuspended. Currently, they can set power/level to auto and
power/autosuspend to a positive timeout, but there's no way to know from
userspace if a device was actually autosuspended without looking at the
dmesg output. These statistics will be useful in creating an automated
userspace script to test autosuspend for USB devices.
Signed-off-by: Sarah Sharp <sarah.a.sharp@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Given that none of the referenced header files test the proprocessor
conditional __KERNEL__, there's no point "unifdef"fing them.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When a usb serial adapter is used as console, the usb serial console
driver bumps the open_count on the port struct used but doesn't attach
a real tty to it (only a fake one temporaly). If this port is opened later
using the regular character device interface, the open method won't
initialize the port, which is the expected, and will receive a brand new
tty struct created by tty layer, which will be stored in port->tty.
When the last close is issued, open_count won't be 0 because of the
console usage and the port->tty will still contain the old tty value. This
is the last ttyUSB<n> close so the allocated tty will be freed by the
tty layer. The usb_serial and usb_serial_port are still in use by the
console, so port_free() won't be called (serial_close() ->
usb_serial_put() -> destroy_serial() -> port_free()), so the scheduled
work (port->work, usb_serial_port_work()) will still run. And
usb_serial_port_work() does:
(...)
tty = port->tty;
if (!tty)
return;
tty_wakeup(tty);
which causes (manually copied):
Faulting instruction address: 0x6b6b6b68
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT PowerMac
Modules linked in: binfmt_misc ipv6 nfs lockd nfs_acl sunrpc dm_snapshot dm_mirror dm_mod hfsplus uinput ams input_polldev genrtc cpufreq_powersave i2c_powermac therm_adt746x snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa joydev snd_aoa_i2sbus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc pmac_zilog serial_core evdev ide_cd cdrom snd appletouch soundcore snd_aoa_soundbus bcm43xx firmware_class usbhid ieee80211softmac ff_memless firewire_ohci firewire_core ieee80211 ieee80211_crypt crc_itu_t sungem sungem_phy uninorth_agp agpart ssb
NIP: 6b6b6b68 LR: c01b2108 CTR: 6b6b6b6b
REGS: c106de80 TRAP: 0400 Not tainted (2.6.24-rc2)
MSR: 40009032 <EE,ME,IR,DR> CR: 82004024 XER: 00000000
TASK = c106b4c0[5] 'events/0' THREAD: c106c000
GPR00: 6b6b6b6b c106df30 c106b4c0 c2d613a0 00009032 00000001 00001a00 00000001
GPR08: 00000008 00000000 00000000 c106c000 42004028 00000000 016ffbe0 0171a724
GPR16: 016ffcf4 00240e24 00240e70 016fee68 016ff9a4 c03046c4 c0327f50 c03046fc
GPR24: c106b6b9 c106b4c0 c101d610 c106c000 c02160fc c1eac1dc c2d613ac c2d613a0
NIP [6b6b6b68] 0x6b6b6b68
LR [c01b2108] tty_wakeup+0x6c/0x9c
Call Trace:
[c106df30] [c01b20e8] tty_wakeup+0x4c/0x9c (unreliable)
[c106df40] [c0216138] usb_serial_port_work+0x3c/0x78
[c106df50] [c00432e8] run_workqueue+0xc4/0x15c
[c106df90] [c0043798] worker_thread+0xa0/0x124
[c106dfd0] [c0048224] kthread+0x48/0x84
[c106dff0] [c00129bc] kernel_thread+0x44/0x60
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Slab corruption: size-2048 start=c2d613a0, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<c01b16d8>](release_one_tty+0xbc/0xf4)
050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Prev obj: start=c2d60b88, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<c00f30ec>](show_stat+0x410/0x428)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
This patch avoids this, clearing port->tty considering if the port is
used as serial console or not
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
G_PRINTER: Adds a USB printer gadget driver for use in printer firmware.
This adds a USB printer gadget driver for use in printer firmware.
The printer gadget channels data between the USB host and a userspace
program driving the print engine. The user space program reads and
writes the device file /dev/g_printer to receive or send printer data.
It can use ioctl calls to the device file to get or set printer status.
Signed-off-by: Craig W. Nadler <craig@nadler.us>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1011) adds a #define for the newly-created Lockable
(i.e., password-protected) subclass 0x07 for USB mass-storage devices.
The private ISD200 entry (which had been mapped to subclass 0x07) is
moved to 0xf0, which is unlikely to conflict with any official
subclass designation.
The US_SC_MIN and US_SC_MAX constants aren't used anywhere, so the
patch removes them.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Convert from class_device to device for drivers/usb/core.
Signed-off-by: Tony Jones <tonyj@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Make ide_build_sglist() and ide_destroy_dmatable() available also when
CONFIG_BLK_DEV_IDEDMA_PCI=n.
* Use ide_build_sglist() and ide_destroy_dmatable() in {ics,au1xxx-}ide.c
and remove no longer needed {ics,au}ide_build_sglist().
There should be no functionality changes caused by this patch.
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Keep pointer to struct device instead of struct pci_dev in ide_hwif_t.
While on it:
* Use *dev->dma_mask instead of pci_dev->dma_mask in ide_toggle_bounce().
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add IDE_HFLAG_NO_DSC host flag for hosts that doesn't support DSC overlap.
* Set it in aec62xx (for ATP850UF only) and hpt34x host drivers.
* Convert ide-tape device driver to check for IDE_HFLAG_NO_DSC flag.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Rename 'simplex_stat' variable to 'dma_stat' in ide_get_or_set_dma_base().
* Factor out code for forcing host out of "simplex" mode from
ide_get_or_set_dma_base() to ide_pci_clear_simplex() helper.
* Add IDE_HFLAG_CLEAR_SIMPLEX host flag and set it in alim15x3 (for M5229),
amd74xx (for AMD 7409), cmd64x (for CMD643), generic (for Netcell) and
serverworks (for CSB5) host drivers.
* Make ide_get_or_set_dma_base() test for IDE_HFLAG_CLEAR_SIMPLEX host flag
instead of checking dev->device (BTW the code was buggy because it didn't
check for dev->vendor, luckily none of these PCI Device IDs was used by
some other vendor for PCI IDE controller).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
According to http://marc.info/?l=linux-ide&m=114346138611631, the drivers must
always register 8 DMA ports with ide_setup_dma(), so its last argument is not
needed. While at it, kill some useless parens in that function...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add ide_dump_identify() debug helper for dumping raw identify data in
the hdparm friendly format (== the identify data can be extracted from
dmesg output and passed to hdparm --Istdin).
* Dump identify data in ide-probe.c::do_identify() if DEBUG is enabled.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Move lba_to_msf() and msf_to_lba() to <linux/cdrom.h>
(use 'u8' type instead of 'byte' while at it).
* Remove msf_to_lba() copy from drivers/cdrom/cdrom.c.
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
After commit 7267c33774
wait_drive_not_busy() can become static again.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
- ide_scan_pcibus() can become static
- instead of ide_scan_pci() we can use ide_scan_pcibus() directly
in module_init()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Move the initialzation in __svc_create_thread that happens prior to
thread creation to a new function. Export the function to allow
services to have better control over the svc_rqst structs.
Also rearrange the rqstp initialization to prevent NULL pointer
dereferences in svc_exit_thread in case allocations fail.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This file defines the data types used by the SVCRDMA transport module.
The principle data structure is the transport specific extension to
the svcxprt structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Create a transport independent version of the svc_sock_names function.
The toclose capability of the svc_sock_names service can be implemented
using the svc_xprt_find and svc_xprt_close services.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add a new svc function that allows a service to query whether a
transport instance has already been created. This is used in lockd
to determine whether or not a transport needs to be created when
a lockd instance is brought up.
Specifying 0 for the address family or port is effectively a wild-card,
and will result in matching the first transport in the service's list
that has a matching class name.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add a file that when read lists the set of registered svc
transports.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Some transports have a header in front of the RPC header. The current
defer/revisit processing considers only the iov_len and arg_len to
determine how much to back up when saving the original request
to revisit. Add a field to the rqstp structure to save the size
of the transport header so svc_defer can correctly compute
the start of a request.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This functionally trivial patch moves all of the transport independent
functions from the svcsock.c file to the transport independent svc_xprt.c
file.
In addition the following formatting changes were made:
- White space cleanup
- Function signatures on single line
- The inline directive was removed
- Lines over 80 columns were reformatted
- The term 'socket' was changed to 'transport' in comments
- The SMP comment was moved and updated.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This functionally empty patch removes rq_sock and unamed union
from rqstp structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch moves the transport sockaddr to the svc_xprt
structure. Convenience functions are added to set and
get the local and remote addresses of a transport from
the transport provider as well as determine the length
of a sockaddr.
A transport is responsible for setting the xpt_local
and xpt_remote addresses in the svc_xprt structure as
part of transport creation and xpo_accept processing. This
cannot be done in a generic way and in fact varies
between TCP, UDP and RDMA. A set of xpo_ functions
(e.g. getlocalname, getremotename) could have been
added but this would have resulted in additional
caching and copying of the addresses around. Note that
the xpt_local address should also be set on listening
endpoints; for TCP/RDMA this is done as part of
endpoint creation.
For connected transports like TCP and RDMA, the addresses
never change and can be set once and copied into the
rqstp structure for each request. For UDP, however, the
local and remote addresses may change for each request. In
this case, the address information is obtained from the
UDP recvmsg info and copied into the rqstp structure from
there.
A svc_xprt_local_port function was also added that returns
the local port given a transport. This is used by
svc_create_xprt when returning the port associated with
a newly created transport, and later when creating a
generic find transport service to check if a service is
already listening on a given port.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch moves the transport independent sk_deferred list to the svc_xprt
structure and updates the svc_deferred_req structure to keep pointers to
svc_xprt's directly. The deferral processing code is also moved out of the
transport dependent recvfrom functions and into the generic svc_recv path.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move the authinfo cache to svc_xprt. This allows both the TCP and RDMA
transports to share this logic. A flag bit is used to determine if
auth information is to be cached or not. Previously, this code looked
at the transport protocol.
I've also changed the spin_lock/unlock logic so that a lock is not taken for
transports that are not caching auth info.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
With the implementation of the new mark and sweep algorithm for shutting
down old connections, the sk_lastrecv field is no longer needed.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
All fields touched by svc_sock_received are now transport independent.
Change it to use svc_xprt directly. This function is called from
transport dependent code, so export it.
Update the comment to clearly state the rules for calling this function.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move the sk_mutex field to the transport independent svc_xprt structure.
Now all the fields that svc_send touches are transport neutral. Change the
svc_send function to use the transport independent svc_xprt directly instead
of the transport dependent svc_sock structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This functionally trivial patch moves the sk_reserved field to the
transport independent svc_xprt structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move sk_list and sk_ready to svc_xprt. This involves close because these
lists are walked by svcs when closing all their transports. So I combined
the moving of these lists to svc_xprt with making close transport independent.
The svc_force_sock_close has been changed to svc_close_all and takes a list
as an argument. This removes some svc internals knowledge from the svcs.
This code races with module removal and transport addition.
Thanks to Simon Holm Thøgersen for a compile fix.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Simon Holm Thøgersen <odie@cs.aau.dk>
This is another incremental change that moves transport independent
fields from svc_sock to the svc_xprt structure. The changes
should be functionally null.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This functionally trivial change moves the transport independent sk_flags
field to the transport independent svc_xprt structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Change the atomic_t reference count to a kref and move it to the
transport indepenent svc_xprt structure. Change the reference count
wrapper names to be generic.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Modify the various kernel RPC svcs to use the svc_create_xprt service.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_create_xprt function is a transport independent version
of the svc_makesock function.
Since transport instance creation contains transport dependent and
independent components, add an xpo_create transport function. The
transport implementation of this function allocates the memory for the
endpoint, implements the transport dependent initialization logic, and
calls svc_xprt_init to initialize the transport independent field (svc_xprt)
in it's data structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Previously, the accept logic looked into the socket state to determine
whether to call accept or recv when data-ready was indicated on an endpoint.
Since some transports don't use sockets, this logic now uses a flag
bit (SK_LISTENER) to identify listening endpoints. A transport function
(xpo_accept) allows each transport to define its own accept processing.
A transport's initialization logic is reponsible for setting the
SK_LISTENER bit. I didn't see any way to do this in transport independent
logic since the passive side of a UDP connection doesn't listen and
always recv's.
In the svc_recv function, if the SK_LISTENER bit is set, the transport
xpo_accept function is called to handle accept processing.
Note that all functions are defined even if they don't make sense
for a given transport. For example, accept doesn't mean anything for
UDP. The function is defined anyway and bug checks if called. The
UDP transport should never set the SK_LISTENER bit.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
In order to avoid blocking a service thread, the receive side checks
to see if there is sufficient write space to reply to the request.
Each transport has a different mechanism for determining if there is
enough write space to reply.
The code that checked for write space was coupled with code that
checked for CLOSE and CONN. These checks have been broken out into
separate statements to make the code easier to read.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Some transports add fields to the RPC header for replies, e.g. the TCP
record length. This function is called when preparing the reply header
to allow each transport to add whatever fields it requires.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add transport specific xpo_detach and xpo_free functions. The xpo_detach
function causes the transport to stop delivering data-ready events
and enqueing the transport for I/O.
The xpo_free function frees all resources associated with the particular
transport instance.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_sock_release function releases pages allocated to a thread. For
UDP this frees the receive skb. For RDMA it will post a receive WR
and bump the client credit count.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The sk_sendto and sk_recvfrom are function pointers that allow svc_sock
to be used for both UDP and TCP. Move these function pointers to the
svc_xprt_ops structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_max_payload function currently looks at the socket type
to determine the max payload. Add a max payload value to svc_xprt_class
so it can be returned directly.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The rqstp structure contains a pointer to the transport for the
RPC request. This functionaly trivial patch adds an unamed union
with pointers to both svc_sock and svc_xprt. Ultimately the
union will be removed and only the rq_xprt field will remain. This
allows incrementally extracting transport independent interfaces without
one gigundo patch.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Make TCP and UDP svc_sock transports, and register them
with the svc transport core.
A transport type (svc_sock) has an svc_xprt as its first member,
and calls svc_xprt_init to initialize this field.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The transport class (svc_xprt_class) represents a type of transport, e.g.
udp, tcp, rdma. A transport class has a unique name and a set of transport
operations kept in the svc_xprt_ops structure.
A transport class can be dynamically registered and unregisterd. The
svc_xprt_class represents the module that implements the transport
type and keeps reference counts on the module to avoid unloading while
there are active users.
The endpoint (svc_xprt) is a generic, transport independent endpoint that can
be used to send and receive data for an RPC service. It inherits it's
operations from the transport class.
A transport driver module registers and unregisters itself with svc sunrpc
by calling svc_reg_xprt_class, and svc_unreg_xprt_class respectively.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch addresses a compatibility issue with a Linux NFS server and
AIX NFS client.
I have exported /export as fsid=0 with sec=krb5:krb5i
I have mount --bind /home onto /export/home
I have exported /export/home with sec=krb5i
The AIX client mounts / -o sec=krb5:krb5i onto /mnt
If I do an ls /mnt, the AIX client gets a permission error. Looking at
the network traceIwe see a READDIR looking for attributes
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID. The response gives a
NFS4ERR_WRONGSEC which the AIX client is not expecting.
Since the AIX client is only asking for an attribute that is an
attribute of the parent file system (pseudo root in my example), it
seems reasonable that there should not be an error.
In discussing this issue with Bruce Fields, I initially proposed
ignoring the error in nfsd4_encode_dirent_fattr() if all that was being
asked for was FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID, however,
Bruce suggested that we avoid calling cross_mnt() if only these
attributes are requested.
The following patch implements bypassing cross_mnt() if only
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID are called. Since there
is some complexity in the code in nfsd4_encode_fattr(), I didn't want to
duplicate code (and introduce a maintenance nightmare), so I added a
parameter to nfsd4_encode_fattr() that indicates whether it should
ignore cross mounts and simply fill in the attribute using the passed in
dentry as opposed to it's parent.
Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This header is used only in a few places in fs/nfsd, so there seems to
be little point to having it in include/. (Thanks to Robert Day for
pointing this out.)
Cc: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Newer server features such as nfsv4 and gss depend on proc to work, so a
failure to initialize the proc files they need should be treated as
fatal.
Thanks to Andrew Morton for style fix and compile fix in case where
CONFIG_NFSD_V4 is undefined.
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
There's really nothing much the caller can do if cache unregistration
fails. And indeed, all any caller does in this case is print an error
and continue. So just return void and move the printk's inside
cache_unregister.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If the reply cache initialization fails due to a kmalloc failure,
currently we try to soldier on with a reduced (or nonexistant) reply
cache.
Better to just fail immediately: the failure is then much easier to
understand and debug, and it could save us complexity in some later
code. (But actually, it doesn't help currently because the cache is
also turned off in some odd failure cases; we should probably find a
better way to handle those failure cases some day.)
Fix some minor style problems while we're at it, and rename
nfsd_cache_init() to remove the need for a comment describing it.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: For consistency, store the length of path name strings in nfsd
argument structures as unsigned integers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: adjust the sign of the length argument of nfsd_lookup and
nfsd_lookup_dentry, for consistency with recent changes. NFSD version
4 callers already pass an unsigned file name length.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: For consistency, store the length of file name strings in nfsd
argument structures as unsigned integers. This matches the XDR routines
and client argument structures for the same operation types.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
According to The Open Group's NLM specification, NLM callers are variable
length strings. XDR variable length strings use an unsigned 32 bit length.
And internally, negative string lengths are not meaningful for the Linux
NLM implementation.
Clean up: Make nlm_lock.len and nlm_reboot.len unsigned integers. This
makes the sign of NLM string lengths consistent with the sign of xdr_netobj
lengths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
XDR strings, opaques, and net objects should all use unsigned lengths.
To wit, RFC 4506 says:
4.2. Unsigned Integer
An XDR unsigned integer is a 32-bit datum that encodes a non-negative
integer in the range [0,4294967295].
...
4.11. String
The standard defines a string of n (numbered 0 through n-1) ASCII
bytes to be the number n encoded as an unsigned integer (as described
above), and followed by the n bytes of the string.
After this patch, xdr_decode_string_inplace now matches the other XDR
string and array helpers that take a string length argument. See:
xdr_encode_opaque_fixed, xdr_encode_opaque, xdr_encode_array
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'audit.b46' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[AUDIT] Add uid, gid fields to ANOM_PROMISCUOUS message
[AUDIT] ratelimit printk messages audit
[patch 2/2] audit: complement va_copy with va_end()
[patch 1/2] kernel/audit.c: warning fix
[AUDIT] create context if auditing was ever enabled
[AUDIT] clean up audit_receive_msg()
[AUDIT] make audit=0 really stop audit messages
[AUDIT] break large execve argument logging into smaller messages
[AUDIT] include audit type in audit message when using printk
[AUDIT] do not panic on exclude messages in audit_log_pid_context()
[AUDIT] Add End of Event record
[AUDIT] add session id to audit messages
[AUDIT] collect uid, loginuid, and comm in OBJ_PID records
[AUDIT] return EINTR not ERESTART*
[PATCH] get rid of loginuid races
[PATCH] switch audit_get_loginuid() to task_struct *
This patch setups correctly MIMO PS mode flags
Signed-off-by: Guy Cohen <guy.cohen@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
execve arguments can be quite large. There is no limit on the number of
arguments and a 4G limit on the size of an argument.
this patch prints those aruguments in bite sized pieces. a userspace size
limitation of 8k was discovered so this keeps messages around 7.5k
single arguments larger than 7.5k in length are split into multiple records
and can be identified as aX[Y]=
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch adds an end of event record type. It will be sent by the kernel as
the last record when a multi-record event is triggered. This will aid realtime
analysis programs since they will now reliably know they have the last record
to complete an event. The audit daemon filters this and will not write it to
disk.
Signed-off-by: Steve Grubb <sgrubb redhat com>
Signed-off-by: Eric Paris <eparis@redhat.com>
In order to correlate audit records to an individual login add a session
id. This is incremented every time a user logs in and is included in
almost all messages which currently output the auid. The field is
labeled ses= or oses=
Signed-off-by: Eric Paris <eparis@redhat.com>
Keeping loginuid in audit_context is racy and results in messier
code. Taken to task_struct, out of the way of ->audit_context
changes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
To allow the implementation of optimized rw-locks in user space, glibc
needs a possibility to select waiters for wakeup depending on a bitset
mask.
This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS
These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an
additional argument - a bitset. Further the FUTEX_WAIT_BITS OP is
expecting an absolute timeout value instead of the relative one, which
is used for the FUTEX_WAIT OP.
FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is
stored in the futex_q structure, which is used to enqueue the waiter
into the hashed futex waitqueue.
FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup
function logically ANDs the bitset with the bitset stored in each
waiters futex_q structure. If the result is zero (i.e. none of the set
bits in the bitsets is matching), then the waiter is not woken up. If
the result is not zero (i.e. one of the set bits in the bitsets is
matching), then the waiter is woken.
The bitset provided by the caller must be non zero. In case the
provided bitset is zero the kernel returns EINVAL.
Internaly the new OPs are only extensions to the existing FUTEX_WAIT
and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits
set into the futex_wait() and futex_wake() functions.
Signed-off-by: Thomas Gleixner <tgxl@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To allow better diagnosis of tick-sched related, especially NOHZ
related problems, we need to know when the last wakeup via an irq
happened and when the CPU left the idle state.
Add two fields (idle_waketime, idle_exittime) to the tick_sched
structure and add them to the timer_list output.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
xtime_cache needs to be updated whenever xtime and or wall_to_monotic
are changed. Otherwise users of xtime_cache might see a stale (and in
the case of timezone changes utterly wrong) value until the next
update happens.
Fixup the obvious places, which miss this update.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <johnstul@us.ibm.com>
Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: kill swap_io_context()
as-iosched: fix inconsistent ioc->lock context
ide-cd: fix leftover data BUG
block: make elevator lib checkpatch compliant
cfq-iosched: make checkpatch compliant
block: make core bits checkpatch compliant
block: new end request handling interface should take unsigned byte counts
unexport add_disk_randomness
block/sunvdc.c:print_version() must be __devinit
splice: always updated atime in direct splice
It blindly copies everything in the io_context, including the lock.
That doesn't work so well for either lock ordering or lockdep.
There seems zero point in swapping io contexts on a request to request
merge, so the best point of action is to just remove it.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Convert cpu_idle_wait() to cpuidle_kick_cpus() macro which is
SMP-only, and gives error on non supported CPU.
Signed-off-by: Kevin Hilman <khilman@mvista.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Add new "flow" classifier, which is meant to extend the SFQ hashing
capabilities without hard-coding new hash functions and also allows
deterministic mappings of keys to classes, replacing some out of tree
iptables patches like IPCLASSIFY (maps IPs to classes), IPMARK (maps
IPs to marks, with fw filters to classes), ...
Some examples:
- Classic SFQ hash:
tc filter add ... flow hash \
keys src,dst,proto,proto-src,proto-dst divisor 1024
- Classic SFQ hash, but using information from conntrack to work properly in
combination with NAT:
tc filter add ... flow hash \
keys nfct-src,nfct-dst,proto,nfct-proto-src,nfct-proto-dst divisor 1024
- Map destination IPs of 192.168.0.0/24 to classids 1-257:
tc filter add ... flow map \
key dst addend -192.168.0.0 divisor 256
- alternatively:
tc filter add ... flow map \
key dst and 0xff
- similar, but reverse ordered:
tc filter add ... flow map \
key dst and 0xff xor 0xff
Perturbation is currently not supported because we can't reliable kill the
timer on destruction.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for dumping statistics and make internal queues visible as
classes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct ipv4_devconf can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Outbound sequence number overflow error status
is counted as XfrmOutStateSeqError.
o Additionaly, it changes inbound sequence number replay
error name from XfrmInSeqOutOfWindow to XfrmInStateSeqError
to apply name scheme above.
o Inbound IPv4 UDP encapsuling type mismatch error is wrongly
mapped to XfrmInStateInvalid then this patch fiex the error
to XfrmInStateMismatch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current ip route cache implementation is not suited to large caches.
We can consume a lot of CPU when cache must be invalidated, since we
currently need to evict all cache entries, and this eviction is
sometimes asynchronous. min_delay & max_delay can somewhat control this
asynchronism behavior, but whole thing is a kludge, regularly triggering
infamous soft lockup messages. When entries are still in use, this also
consumes a lot of ram, filling dst_garbage.list.
A better scheme is to use a generation identifier on each entry,
so that cache invalidation can be performed by changing the table
identifier, without having to scan all entries.
No more delayed flushing, no more stalling when secret_interval expires.
Invalidated entries will then be freed at GC time (controled by
ip_rt_gc_timeout or stress), or when an invalidated entry is found
in a chain when an insert is done.
Thus we keep a normal equilibrium.
This patch :
- renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
- Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
- Checks entry->rt_genid at appropriate places :
Reuse the existing logic for multicast list synchronization for the
unicast address list. The core of dev_mc_sync/unsync are split out as
__dev_addr_sync/unsync and moved from dev_mcast.c to dev.c. These are
then used to implement dev_unicast_sync/unsync as well.
I'm working on cleaning up Intel's FCoE stack, which generates new MAC
addresses from the fibre channel device id assigned by the fabric as
per the current draft specification in T11. When using such a
protocol in a VLAN environment it would be nice to not always be
forced into promiscuous mode, assuming the underlying Ethernet driver
supports multiple unicast addresses as well.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Normally during a dump the key of the last dumped entry is used for
continuation, but since lock is dropped it might be lost. In that case
fallback to the old counter based N^2 behaviour. This means the dump
will end up skipping some routes which matches what FIB_HASH does.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a net argument to inet6_lookup and propagate it further.
Actually, this is tcp-v6 implementation of what was done for
tcp-v4 sockets in a previous patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have INET_MATCH, INET_TW_MATCH and INET6_MATCH to test sockets and
twbuckets for matching, but ipv6 twbuckets are tested manually.
Here's the INET6_TW_MATCH to help with it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces the xt_hashlimit match revision 1. It adds support for
kernel-level inversion and grouping source and/or destination IP
addresses, allowing to limit on a per-subnet basis. While this would
technically obsolete xt_limit, xt_hashlimit is a more expensive due
to the hashbucketing.
Kernel-level inversion: Previously you had to do user-level inversion:
iptables -N foo
iptables -A foo -m hashlimit --hashlimit(-upto) 5/s -j RETURN
iptables -A foo -j DROP
iptables -A INPUT -j foo
now it is simpler:
iptables -A INPUT -m hashlimit --hashlimit-over 5/s -j DROP
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
CHECK net/ipv4/netfilter/ip_tables.c
net/ipv4/netfilter/ip_tables.c:1453:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1453:8: expected int *size
net/ipv4/netfilter/ip_tables.c:1453:8: got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1458:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1458:44: expected int *size
net/ipv4/netfilter/ip_tables.c:1458:44: got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1603:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1603:2: expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1603:2: got int *<noident>
net/ipv4/netfilter/ip_tables.c:1627:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1627:8: expected int *size
net/ipv4/netfilter/ip_tables.c:1627:8: got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1634:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1634:40: expected int *size
net/ipv4/netfilter/ip_tables.c:1634:40: got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1653:8: warning: incorrect type in argument 5 (different signedness)
net/ipv4/netfilter/ip_tables.c:1653:8: expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1653:8: got int *<noident>
net/ipv4/netfilter/ip_tables.c:1666:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1666:2: expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1666:2: got int *<noident>
CHECK net/ipv4/netfilter/arp_tables.c
net/ipv4/netfilter/arp_tables.c:1285:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1285:40: expected int *size
net/ipv4/netfilter/arp_tables.c:1285:40: got unsigned int *size
net/ipv4/netfilter/arp_tables.c:1543:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1543:44: expected int *size
net/ipv4/netfilter/arp_tables.c:1543:44: got unsigned int [usertype] *size
CHECK net/ipv6/netfilter/ip6_tables.c
net/ipv6/netfilter/ip6_tables.c:1481:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1481:8: expected int *size
net/ipv6/netfilter/ip6_tables.c:1481:8: got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1486:44: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1486:44: expected int *size
net/ipv6/netfilter/ip6_tables.c:1486:44: got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1631:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1631:2: expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1631:2: got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1655:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1655:8: expected int *size
net/ipv6/netfilter/ip6_tables.c:1655:8: got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1662:40: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1662:40: expected int *size
net/ipv6/netfilter/ip6_tables.c:1662:40: got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1680:8: warning: incorrect type in argument 5 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1680:8: expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1680:8: got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1693:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1693:2: expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1693:2: got int *<noident>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for ranges to the new revision. This doesn't affect
compatibility since the new revision was not released yet.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Propagate netns from userspace.
* arpt_register_table() registers table in supplied netns.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Propagate netns from userspace down to xt_find_table_lock()
* Register ip6 tables in netns (modules still use init_net)
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Typical table module registers xt_table structure (i.e. packet_filter)
and link it to list during it. We can't use one template for it because
corresponding list_head will become corrupted. We also can't unregister
with template because it wasn't changed at all and thus doesn't know in
which list it is.
So, we duplicate template at the very first step of table registration.
Table modules will save it for use during unregistration time and actual
filtering.
Do it at once to not screw bisection.
P.S.: renaming i.e. packet_filter => __packet_filter is temporary until
full netnsization of table modules is done.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In fact all we want is per-netns set of rules, however doing that will
unnecessary complicate routines such as ipt_hook()/ipt_do_table, so
make full xt_table array per-netns.
Every user stubbed with init_net for a while.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch from 0/-E to ptr/PTR_ERR convention.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the xt_conntrack match revision 1 by port matching (all four
{orig,repl}{src,dst}) and by packet direction matching.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before the removal of the deferred output hooks, netoutdev was used in
case of VLANs on top of a bridge to store the VLAN device, so the
deferred hooks would see the correct output device. This isn't
necessary anymore since we're calling the output hooks for the correct
device directly in the IP stack.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for combined mode algorithms with GCM being
the first algorithm supported.
Combined mode algorithms can be added through the xfrm_user interface
using the new algorithm payload type XFRMA_ALG_AEAD. Each algorithms
is identified by its name and the ICV length.
For the purposes of matching algorithms in xfrm_tmpl structures,
combined mode algorithms occupy the same name space as encryption
algorithms. This is in line with how they are negotiated using IKE.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move headers usbnet.h and rndis_host.h to include/linux/usb and fix includes
for drivers/net/usb modules. Headers are moved because rndis_wlan will be
outside drivers/net/usb in drivers/net/wireless and yet need these headers.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Teach rfkill about wimax radios.
Had to define a KEY_WIMAX as a 'key for disabling only wimax radios',
as other radio technologies have. This makes sense as hardware has
specific keys for disabling specific radios.
The RFKILL enabling part is, otherwise, a copy and paste of any other
radio technology.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
Remove commented-out code copied from NFS
NFS: Switch from intr mount option to TASK_KILLABLE
Add wait_for_completion_killable
Add wait_event_killable
Add schedule_timeout_killable
Use mutex_lock_killable in vfs_readdir
Add mutex_lock_killable
Use lock_page_killable
Add lock_page_killable
Add fatal_signal_pending
Add TASK_WAKEKILL
exit: Use task_is_*
signal: Use task_is_*
sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
ptrace: Use task_is_*
power: Use task_is_*
wait: Use TASK_NORMAL
proc/base.c: Use task_is_*
proc/array.c: Use TASK_REPORT
perfmon: Use task_is_*
...
Fixed up conflicts in NFS/sunrpc manually..
bring back the avr32, blackfin, sh, sparc architectures into working order,
by reverting the effects of this change that came in via the x86 tree:
commit a5a19c63f4
Author: Jeremy Fitzhardinge <jeremy@goop.org>
Date: Wed Jan 30 13:33:39 2008 +0100
x86: demacro asm-x86/pgalloc_32.h
Sorry about that!
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (454 commits)
[POWERPC] Cell IOMMU fixed mapping support
[POWERPC] Split out the ioid fetching/checking logic
[POWERPC] Add support to cell_iommu_setup_page_tables() for multiple windows
[POWERPC] Split out the IOMMU logic from cell_dma_dev_setup()
[POWERPC] Split cell_iommu_setup_hardware() into two parts
[POWERPC] Split out the logic that allocates struct iommus
[POWERPC] Allocate the hash table under 1G on cell
[POWERPC] Add set_dma_ops() to match get_dma_ops()
[POWERPC] 83xx: Clean up / convert mpc83xx board DTS files to v1 format.
[POWERPC] 85xx: Only invalidate TLB0 and TLB1
[POWERPC] 83xx: Fix typo in mpc837x compatible entries
[POWERPC] 85xx: convert sbc85* boards to use machine_device_initcall
[POWERPC] 83xx: rework platform Kconfig
[POWERPC] 85xx: rework platform Kconfig
[POWERPC] 86xx: Remove unused IRQ defines
[POWERPC] QE: Explicitly set address-cells and size cells for muram
[POWERPC] Convert StorCenter DTS file to /dts-v1/ format.
[POWERPC] 86xx: Convert all 86xx DTS files to /dts-v1/ format.
[PPC] Remove 85xx from arch/ppc
[PPC] Remove 83xx from arch/ppc
...
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
alpha: fix x86.git merge build error
ia64: on UP percpu variables are not small memory model
x86: fix arch/x86/kernel/test_nx.c modular build bug
s390: use generic percpu linux-2.6.git
POWERPC: use generic per cpu
ia64: use generic percpu
SPARC64: use generic percpu
percpu: change Kconfig to HAVE_SETUP_PER_CPU_AREA
modules: fold percpu_modcopy into module.c
x86: export copy_from_user_ll_nocache[_nozero]
x86: fix duplicated TIF on 64-bit
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
security: compile capabilities by default
selinux: make selinux_set_mnt_opts() static
SELinux: Add warning messages on network denial due to error
SELinux: Add network ingress and egress control permission checks
NetLabel: Add auditing to the static labeling mechanism
NetLabel: Introduce static network labels for unlabeled connections
SELinux: Allow NetLabel to directly cache SIDs
SELinux: Enable dynamic enable/disable of the network access checks
SELinux: Better integration between peer labeling subsystems
SELinux: Add a new peer class and permissions to the Flask definitions
SELinux: Add a capabilities bitmap to SELinux policy version 22
SELinux: Add a network node caching mechanism similar to the sel_netif_*() functions
SELinux: Only store the network interface's ifindex
SELinux: Convert the netif code to use ifindex values
NetLabel: Add IP address family information to the netlbl_skbuff_getattr() function
NetLabel: Add secid token support to the NetLabel secattr struct
NetLabel: Consolidate the LSM domain mapping/hashing locks
NetLabel: Cleanup the LSM domain hash functions
NetLabel: Remove unneeded RCU read locks
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
PPC: Fix powerpc vio_find_name to not use devices_subsys
Driver core: add bus_find_device_by_name function
Module: check to see if we have a built in module with the same name
x86: fix runtime error in arch/x86/kernel/cpu/mcheck/mce_amd_64.c
Driver core: Fix up build when CONFIG_BLOCK=N
ia64 has a special processor specific mapping that can be used to locate the
offset for the current per cpu area.
Cc: linux-ia64@vger.kernel.org
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Migrating the apic timer in the critical section is not very nice, and is
absolutely horrible with the real-time port. Move migration to the regular
vcpu execution path, triggered by a new bitflag.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
kvm_para.h potentially contains definitions that are to be used by userspace,
so it should not be included inside the __KERNEL__ block. To protect its own
data structures, kvm_para.h already includes its own __KERNEL__ block.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Acked-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves kvm_fpu asm-x86/kvm.h to allow every architecture to
define an own representation used for KVM_GET_FPU/KVM_SET_FPU.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Convert the synchronization of the shadow handling to a separate mmu_lock
spinlock.
Also guard fetch() by mmap_sem in read-mode to protect against alias
and memslot changes.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
In preparation for a mmu spinlock, add kvm_read_guest_atomic()
and use it in fetch() and prefetch_page().
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This adds a mechanism for exposing the virtual apic tpr to the guest, and a
protocol for letting the guest update the tpr without causing a vmexit if
conditions allow (e.g. there is no interrupt pending with a higher priority
than the new tpr).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Add a facility to report on accesses to the local apic tpr even if the
local apic is emulated in the kernel. This is basically a hack that
allows userspace to patch Windows which tends to bang on the tpr a lot.
Signed-off-by: Avi Kivity <avi@qumranet.com>
IA64 also needs to see ioapic structure in irqchip.
Signed-off-by: xiantao.zhang@intel.com <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Moving kvm_vcpu_kick() to x86.c. Since it should be
common for all archs, put its declarations in <linux/kvm_host.h>
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This paves the way for multiple architecture support. Note that while
ioapic.c could potentially be shared with ia64, it is also moved.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Currently, make headers_check barfs due to <asm/kvm.h>, which <linux/kvm.h>
includes, not existing. Rather than add a zillion <asm/kvm.h>s, export kvm.h
only if the arch actually supports it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch fixes a small issue where sturctures:
kvm_pic_state
kvm_ioapic_state
are defined inside x86 specific code and may or may not
be defined in anyway for other architectures. The problem
caused is one cannot compile userspace apps (ex. libkvm)
for other archs since a size cannot be determined for these
structures.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The current cpuid management suffers from several problems, which inhibit
passing through the host feature set to the guest:
- No way to tell which features the host supports
While some features can be supported with no changes to kvm, others
need explicit support. That means kvm needs to vet the feature set
before it is passed to the guest.
- No support for indexed or stateful cpuid entries
Some cpuid entries depend on ecx as well as on eax, or on internal
state in the processor (running cpuid multiple times with the same
input returns different output). The current cpuid machinery only
supports keying on eax.
- No support for save/restore/migrate
The internal state above needs to be exposed to userspace so it can
be saved or migrated.
This patch adds extended cpuid support by means of three new ioctls:
- KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
supports
- KVM_SET_CPUID2: sets the vcpu's cpuid table
- KVM_GET_CPUID2: gets the vcpu's cpuid table, including hidden state
[avi: fix original KVM_SET_CPUID not removing nx on non-nx hosts as it did
before]
Signed-off-by: Dan Kenigsberg <danken@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structures:
kvm_cpuid_entry
kvm_cpuid
from include/linux/kvm.h to include/asm-x86/kvm.h
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move structures:
kvm_sregs
kvm_msr_entry
kvm_msrs
kvm_msr_list
from include/linux/kvm.h to include/asm-x86/kvm.h
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structures:
kvm_segment
kvm_dtable
from include/linux/kvm.h to include/asm-x86/kvm.h
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structure lapic_state from include/linux/kvm.h
to include/asm-x86/kvm.h
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structure kvm_regs to include/asm-x86/kvm.h.
Each architecture will need to create there own version of this
structure.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structures:
kvm_pic_state
kvm_ioapic_state
to inclue/asm-x86/kvm.h.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves sturct kvm_memory_alias from include/linux/kvm.h
to include/asm-x86/kvm.h. Also have include/linux/kvm.h include
include/asm/kvm.h.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Currently kvm has a wart in that it requires three extra pages for use
as a tss when emulating real mode on Intel. This patch moves the allocation
internally, only requiring userspace to tell us where in the physical address
space we can place the tss.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Currently kvm provides hypercalls only for x86* architectures. To
provide hypercall infrastructure for other kvm architectures I split
kvm_para.h into a generic header file and architecture specific
definitions.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Instead of having the kernel allocate memory to the guest, let userspace
allocate it and pass the address to the kernel.
This is required for s390 support, but also enables features like memory
sharing and using hugetlbfs backed memory.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The user is now able to set how many mmu pages will be allocated to the guest.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch refactors the current hypercall infrastructure to better
support live migration and SMP. It eliminates the hypercall page by
trapping the UD exception that would occur if you used the wrong hypercall
instruction for the underlying architecture and replacing it with the right
one lazily.
A fall-out of this patch is that the unhandled hypercalls no longer trap to
userspace. There is very little reason though to use a hypercall to
communicate with userspace as PIO or MMIO can be used. There is no code
in tree that uses userspace hypercalls.
[avi: fix #ud injection on vmx]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide wether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues ACPI or other subsystems which are executed very early.
If the config option is not enabled, no code is changed, and if the boot
paramenter is not given, no new code is executed, and independent of that,
all new code is freed after boot, so the config option can be even enabled
in standard, non-debug kernels.
With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks) such as the printk
buffer contents, or any data which can be referenced from global pointers,
if it is stored below the 4GB limit and even memory dumps of of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so the machine can be crashed early, it does not matter.
In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
to gdb to talk to kgdb using remote remory reads and writes over FireWire.
An version of the gdb stub fore FireWire is able to read all global data
from a system which is running a a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is granted.
A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
and I've put a copy online at
ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt
It also has links to all the tools which are available to make use of it
another copy of it is online at:
ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff
Signed-Off-By: Bernhard Kaindl <bk@suse.de>
Tested-By: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
get more testing of the c_p_a() code done by not turning off
PSE on DEBUG_PAGEALLOC.
this simplifies the early pagetable setup code, and tests
the largepage-splitup code quite heavily.
In the end, all the largepages will be split up pretty quickly,
so there's no difference to how DEBUG_PAGEALLOC worked before.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Convert macros into inline functions, for better type-checking.
This patch required a little bit of fiddling with headers in order to
make __(pte|pmd)_free_tlb inline rather than macros.
asm-generic/tlb.h includes asm/pgalloc.h, though it doesn't directly
use any pgalloc definitions. I removed this include to avoid an
include cycle, but it may cause secondary compile failures by things
depending on the indirect inclusion; arch/x86/mm/hugetlbpage.c was one
such place; there may be others.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch defines the PCI identifiers found in
the RDC R-321x System-on-Chip.
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
They now look like:
hal-resmgr[13791]: segfault at 3c rip 2b9c8caec182 rsp 7fff1e825d30 error 4 in libacl.so.1.1.0[2b9c8caea000+6000]
This makes it easier to pinpoint bugs to specific libraries.
And printing the offset into a mapping also always allows to find the
correct fault point in a library even with randomized mappings. Previously
there was no way to actually find the correct code address inside
the randomized mapping.
Relies on earlier patch to shorten the printk formats.
They are often now longer than 80 characters, but I think that's worth it.
[includes fix from Eric Dumazet to check d_path error value]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
On VMs implemented using JITs that cache translated code changing the lock
prefixes is a quite costly operation that forces the JIT to throw away and
retranslate a lot of code.
Previously a SMP kernel would rewrite the locks once for each CPU which
is quite unnecessary. This patch changes the code to never switch at boot in
the normal case (SMP kernel booting with >1 CPU) or only once for SMP kernel
on UP.
This makes a significant difference in boot up performance on AMD SimNow!
Also I expect it to be a little faster on native systems too because a smp
switch does a lot of text_poke()s which each synchronize the pipeline.
v1->v2: Rename max_cpus
v1->v2: Fix off by one in UP check (Thomas Gleixner)
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The ENDPROCs() were not used everywhere. Some code used just END() instead,
while other code used nothing. um/sys-i386/checksum.S didn't #include
<linux/linkage.h> . I also got confused because gcc puts the
.type near the ENTRY, while ENDPROC puts it on the opposite end.
Signed off by: John Reiser <jreiser@BitWagon.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
introduce the "asmregparm" calling convention: for functions
implemented in assembly with a fixed regparm input parameters
calling convention.
mark the semaphore and rwsem slowpath functions with that.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Here is a quick and naive smoke test for kprobes. This is intended to
just verify if some unrelated change broke the *probes subsystem. It is
self contained, architecture agnostic and isn't of any great use by itself.
This needs to be built in the kernel and runs a basic set of tests to
verify if kprobes, jprobes and kretprobes run fine on the kernel. In case
of an error, it'll print out a message with a "BUG" prefix.
This is a start; we intend to add more tests to this bucket over time.
Thanks to Jim Keniston and Masami Hiramatsu for comments and suggestions.
Tested on x86 (32/64) and powerpc.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- Special consideration for IA64: Add the ability to specify
arch specific per cpu flags
- remove .data.percpu attribute from DEFINE_PER_CPU for non-smp case.
The arch definitions are all the same. So move them into linux/percpu.h.
We cannot move DECLARE_PER_CPU since some include files just include
asm/percpu.h to avoid include recursion problems.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
# HG changeset patch
# User Jeremy Fitzhardinge <jeremy@xensource.com>
# Date 1199317452 28800
# Node ID f7e7db3facd9406545103164f9be8f9ba1a2b549
# Parent 4d9a413a0f4c1d98dbea704f0366457b5117045d
x86: add _AT() macro to conditionally cast
Define _AT(type, value) to conditionally cast a value when compiling C
code, but not when used in assembler.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This makes ELF core dumps of 32-bit processes include a new
note type NT_386_TLS (0x200) giving the contents of the TLS
slots in struct user_desc format. This lets post mortem
examination figure out what the segment registers mean like
the debugger does with get_thread_area on a live process.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This adds a generic definition of compat_sys_ptrace that calls
compat_arch_ptrace, parallel to sys_ptrace/arch_ptrace. Some
machines needing this already define a function by that name.
The new generic function is defined only on machines that
put #define __ARCH_WANT_COMPAT_SYS_PTRACE into asm/ptrace.h.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This adds a compat_ptrace_request that is the analogue of ptrace_request
for the things that 32-on-64 ptrace implementations can share in common.
So far there are just a couple of requests handled generically.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This defines two new inlines in linux/regset.h, for use in arch_ptrace
implementations and the like. These provide simplified wrappers for using
the user_regset interfaces to copy thread regset data into the caller's
user-space memory. The inlines are trivial, but make the common uses in
places such as ptrace implementation much more concise, easier to read, and
less prone to code-copying errors.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This adds some inlines to linux/regset.h intended for arch code to use in
its user_regset get and set functions. These make it pretty easy to deal
with the interface's optional kernel-space or user-space pointers and its
generalized access to a part of the register data at a time.
In simple cases where the internal data structure matches the exported
layout (core dump format), a get function can be nothing but a call to
user_regset_copyout, and a set function a call to user_regset_copyin.
In other cases the exported layout is usually made up of a few pieces each
stored contiguously in a different internal data structure. These helpers
make it straightforward to write a get or set function by processing each
contiguous chunk of the data in order. The start_pos and end_pos arguments
are always constants, so these inlines collapse to a small amount of code.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The new header <linux/regset.h> defines the types struct user_regset and
struct user_regset_view, with some associated declarations. This new set
of interfaces will become the standard way for arch code to expose
user-mode machine-specific state. A single set of entry points into arch
code can do all the low-level work in one place to fill the needs of core
dumps, ptrace, and any other user-mode debugging facilities that might come
along in the future.
For existing arch code to adapt to the user_regset interfaces, each arch
can work from the code it already has to support core files and ptrace.
The formats you want for user_regset are the core file formats. The only
wrinkle in adapting old ptrace implementation code as user_regset get and
set functions is that these functions can be called on current as well as
on another task_struct that is stopped and switched out as for ptrace.
For some kinds of machine state, you may have to load it directly from CPU
registers or otherwise differently for current than for another thread.
(Your core dump support already handles this in elf_core_copy_regs for
current and elf_core_copy_task_regs for other tasks, so just check there.)
The set function should also be made to work on current in case that
entails some special cases, though this was never required before for
ptrace. Adding this flexibility covers the arch needs to open the door to
more sophisticated new debugging facilities that don't always need to
context-switch to do every little thing.
The copyin/copyout helper functions (in a later patch) relieve the arch
code of most of the cumbersome details of the flexible get/set interfaces.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
.. allowing to remove their declarations from a global include file
(the symbols don't exist for anything but x86).
Likewise for 64-bits' fix_processor_context(), just that that one was
properly declared in an arch-specific header.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The break_lock data structure and code for spinlocks is quite nasty.
Not only does it double the size of a spinlock but it changes locking to
a potentially less optimal trylock.
Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a
__raw_spin_is_contended that uses the lock data itself to determine whether
there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is
not set.
Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to
decouple it from the spinlock implementation, and make it typesafe (rwlocks
do not have any need_lockbreak sites -- why do they even get bloated up
with that break_lock then?).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This makes ptrace_request handle PTRACE_SINGLEBLOCK along with
PTRACE_CONT et al. The new generic code makes use of the
arch_has_block_step macro and generic entry points on machines
that define them.
[ mingo@elte.hu: bugfix ]
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This defines the new macro arch_has_block_step() in linux/ptrace.h, a
default for when asm/ptrace.h does not define it. This is the analog
of arch_has_single_step() for step-until-branch on machines that have
it. It declares the new user_enable_block_step function, which goes
with the existing user_enable_single_step and user_disable_single_step.
This is not used yet, but paves the way to harmonize on this interface
for the arch-specific calls on all machines.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This defines the new macro arch_has_single_step() in linux/ptrace.h, a
default for when asm/ptrace.h does not define it. It declares the new
user_enable_single_step and user_disable_single_step functions.
This is not used yet, but paves the way to harmonize on this interface
for the arch-specific calls on all machines.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch removes the extern struct resource declarations for
data_resource, code_resource and bss_resource on x86 and declares that
three structures as static as done on other architectures like IA64.
On i386, these structures are moved to setup_32.c (from e820_32.c) because
that's code that is not specific to e820 and also required on EFI systems.
That makes the "extern" reference superfluous.
On x86_64, data_resource, code_resource and bss_resource are passed to
e820_reserve_resources() as arguments just as done on i386 and IA64. That
also avoids the "extern" reference and it's possible to make it static.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Current idle time in kstat is based on jiffies and is coarse grained.
tick_sched.idle_sleeptime is making some attempt to keep track of idle time
in a fine grained manner. But, it is not handling the time spent in
interrupts fully.
Make tick_sched.idle_sleeptime accurate with respect to time spent on
handling interrupts and also add tick_sched.idle_lastupdate, which keeps
track of last time when idle_sleeptime was updated.
This statistics will be crucial for cpufreq-ondemand governor, which can
shed some conservative gaurd band that is uses today while setting the
frequency. The ondemand changes that uses the exact idle time is coming
soon.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The userspace API for the HPET (see Documentation/hpet.txt) did not work. The
HPET_IE_ON ioctl was failing as there was no IRQ assigned to the timer
device. This patch fixes it by allocating IRQs to timer blocks in the HPET.
arch/x86/kernel/hpet.c | 13 +++++--------
drivers/char/hpet.c | 45 ++++++++++++++++++++++++++++++++++++++-------
include/linux/hpet.h | 2 +-
3 files changed, 44 insertions(+), 16 deletions(-)
Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
On x86 the PIT might become an unusable clocksource. Add an unregister
function to provide a possibilty to remove the PIT from the list of
available clock sources.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Clean up hungarian notation from timer code.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Clean up: Follow recommendations of Chapter 5 of Documentation/CodingStyle
and use "u32" instead of "__u32" for types in definitions that are not
shared with user space.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
RPC protocol version numbers are unsigned.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: pass 5 arguments to nlmclnt_init() in a structure similar to the
new nfs_client_initdata structure.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that each NFS mount point caches its own nlm_host structure, it can be
passed to nlmclnt_proc() for each lock request. By pinning an nlm_host for
each mount point, we trade the overhead of looking up or creating a fresh
nlm_host struct during every NLM procedure call for a little extra memory.
We also restrict the nlmclnt_proc symbol to limit the use of this call to
in-tree modules.
Note that nlm_lookup_host() (just removed from the client's per-request
NLM processing) could also trigger an nlm_host garbage collection. Now
client-side nlm_host garbage collection occurs only during NFS mount
processing. Since the NFS client now holds a reference on these nlm_host
structures, they wouldn't have been affected by garbage collection
anyway.
Given that nlm_lookup_host() reorders the global nlm_host chain after
every successful lookup, and that a garbage collection could be triggered
during the call, we've removed a significant amount of per-NLM-request
CPU processing overhead.
Sidebar: there are only a few remaining references to the internals of
NFS inodes in the client-side NLM code. The only references I found are
related to extracting or comparing the inode's file handle via NFS_FH().
One is in nlmclnt_grant(); the other is in nlmclnt_setlockargs().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cache an appropriate nlm_host structure in the NFS client's mount point
metadata for later use.
Note that there is no need to set NFS_MOUNT_NONLM in the error case -- if
nfs_start_lockd() returns a non-zero value, its callers ensure that the
mount request fails outright.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We would like to remove the per-lock-operation nlm_lookup_host() call from
nlmclnt_proc().
The new architecture pins an nlm_host structure to each NFS client
superblock that has the "lock" mount option set. The NFS client passes
in the pinned nlm_host structure during each call to nlmclnt_proc(). NFS
client unmount processing "puts" the nlm_host so it can be garbage-
collected later.
This patch introduces externally callable NLM functions that handle
mount-time nlm_host set up and tear-down.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: have the set up routines explicitly pass the strings to be used
for the transport name and NETID. This removes a number of conditionals
and dependencies on rpc_xprt.prot, which is overloaded.
Tighten up type checking on the address_strings array while we're at it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, if you have a server mounted using networking protocol, you
cannot specify a different value using the 'proto=' option on another
mountpoint.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In order to be able to support setting the timeo and retrans parameters on
a per-mountpoint basis, we move the rpc_timeout structure into the
rpc_clnt.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Prepare for managing larger addresses in the NFS client by widening the
nfs_client struct's cl_addr field.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
(Modified to work with the new parameters for nfs_alloc_client)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The nfs_client's cl_ipaddr field needs to be larger to hold strings that
represent IPv6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that the RPC buffer size specified for NFSv4 SETCLIENTID procedures
matches what we are encoding into the buffer. See the definition of
struct nfs4_setclientid {} and the encode_setclientid() function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Universal addresses are defined in RFC 1833 and clarified in RFC 3530. We
need to use them in several places in the NFS and RPC clients, so move the
relevant definition and block comment to an appropriate global include
file.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move the common code for setting up the nfs_write_data and nfs_read_data
structures into fs/nfs/read.c, fs/nfs/write.c and fs/nfs/direct.c.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We want the default scheduling priority (priority == 0) to remain
RPC_PRIORITY_NORMAL.
Also ensure that the priority wait queue scheduling is per process id
instead of sometimes being per thread, and sometimes being per inode.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
xprt_disconnect() should really only be called when the transport shutdown
is completed, and it is time to wake up any pending tasks. Rename it to
xprt_disconnect_done() in order to reflect the semantical change.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add an xprt->state bit to enable the TCP ->state_change() method to signal
whether or not the TCP connection is in the process of closing down.
This will to be used by the reconnection logic in a separate patch.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When scheduling the autoclose RPC call, we want to ensure that we don't
race against the test_bit() call in xprt_clear_locked().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Added an active/deactive mechanism to the nfs_server structure
allowing async operations to hold off umount until the
operations are done.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds auditing support to the NetLabel static labeling mechanism.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
This patch introduces a mechanism for checking when labeled IPsec or SECMARK
are in use by keeping introducing a configuration reference counter for each
subsystem. In the case of labeled IPsec, whenever a labeled SA or SPD entry
is created the labeled IPsec/XFRM reference count is increased and when the
entry is removed it is decreased. In the case of SECMARK, when a SECMARK
target is created the reference count is increased and later decreased when the
target is removed. These reference counters allow SELinux to quickly determine
if either of these subsystems are enabled.
NetLabel already has a similar mechanism which provides the netlbl_enabled()
function.
This patch also renames the selinux_relabel_packet_permission() function to
selinux_secmark_relabel_packet_permission() as the original name and
description were misleading in that they referenced a single packet label which
is not the case.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits)
[IPV6] ADDRLABEL: Fix double free on label deletion.
[PPP]: Sparse warning fixes.
[IPV4] fib_trie: remove unneeded NULL check
[IPV4] fib_trie: More whitespace cleanup.
[NET_SCHED]: Use nla_policy for attribute validation in ematches
[NET_SCHED]: Use nla_policy for attribute validation in actions
[NET_SCHED]: Use nla_policy for attribute validation in classifiers
[NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
[NET_SCHED]: sch_api: introduce constant for rate table size
[NET_SCHED]: Use typeful attribute parsing helpers
[NET_SCHED]: Use typeful attribute construction helpers
[NET_SCHED]: Use NLA_PUT_STRING for string dumping
[NET_SCHED]: Use nla_nest_start/nla_nest_end
[NET_SCHED]: Propagate nla_parse return value
[NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
[NET_SCHED]: act_api: use nlmsg_parse
[NET_SCHED]: act_api: fix netlink API conversion bug
[NET_SCHED]: sch_netem: use nla_parse_nested_compat
[NET_SCHED]: sch_atm: fix format string warning
[NETNS]: Add namespace for ICMP replying code.
...
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (79 commits)
Remove references to "make dep"
kconfig: document use of HAVE_*
Introduce new section reference annotations tags: __ref, __refdata, __refconst
kbuild: warn about ld added unique sections
kbuild: add verbose option to Section mismatch reporting in modpost
kconfig: tristate choices with mixed tristate and boolean values
asm-generic/vmlix.lds.h: simplify __mem{init,exit}* dependencies
remove __attribute_used__
kbuild: support ARCH=x86 in buildtar
kconfig: remove "enable"
kbuild: simplified warning report in modpost
kbuild: introduce a few helpers in modpost
kbuild: use simpler section mismatch warnings in modpost
kbuild: link vmlinux.o before kallsyms passes
kbuild: introduce new option to enhance section mismatch analysis
Use separate sections for __dev/__cpu/__mem code/data
compiler.h: introduce __section()
all archs: consolidate init and exit sections in vmlinux.lds.h
kbuild: check section names consistently in modpost
kbuild: introduce blacklisting in modpost
...
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
Module: check to see if we have a built in module with the same name
module: add module taint on ndiswrapper
module: fix the module name length in param_sysfs_builtin
module: make module_address_lookup safe
module: better OOPS and lockdep coverage for loading modules
module: Fix gratuitous sprintf in module.c
module: wait for dependent modules doing init.
module: Don't report discarded init pages as kernel text.
module_address_lookup releases preemption then returns a pointer into
the module space. The only user (kallsyms) copies the result, so just
do that under the preempt disable.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add the functions ext4_ext_search_left() and ext4_ext_search_right(),
which are used by mballoc during ext4_ext_get_blocks to decided whether
to merge extent information.
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Johann Lombardi <johann@clusterfs.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This function is used by the ext4 multi block allocator patches.
Also add generic_find_next_le_bit
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch adds 64-bit inode version support to ext4. The lower 32 bits
are stored in the osd1.linux1.l_i_version field while the high 32 bits
are stored in the i_version_hi field newly created in the ext4_inode.
This field is incremented in case the ext4_inode is large enough. A
i_version mount option has been added to enable the feature.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>
The i_version field of the inode is changed to be a 64-bit counter that
is set on every inode creation and that is incremented every time the
inode data is modified (similarly to the "ctime" time-stamp).
The aim is to fulfill a NFSv4 requirement for rfc3530.
This first part concerns the vfs, it converts the 32-bit i_version in
the generic inode to a 64-bit, a flag is added in the super block in
order to check if the feature is enabled and the i_version is
incremented in the vfs.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
The journal checksum feature adds two new flags i.e
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.
JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the
checksum for the blocks described by the descriptor blocks.
Due to checksums, writing of the commit record no longer needs to be
synchronous. Now commit record can be sent to disk without waiting for
descriptor blocks to be written to disk. This behavior is controlled
using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be
able to recover the journal with _ASYNC_COMMIT hence it is made
incompat.
The commit header has been extended to hold the checksum along with the
type of the checksum.
For recovery in pass scan checksums are verified to ensure the sanity
and completeness(in case of _ASYNC_COMMIT) of every transaction.
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Girish Shilamkar <girish@clusterfs.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
The patch below updates the jbd stats patch to 2.6.20/jbd2.
The initial patch was posted by Alex Tomas in December 2005
(http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
It provides statistics via procfs such as transaction lifetime and size.
Sometimes, investigating performance problems, i find useful to have
stats from jbd about transaction's lifetime, size, etc. here is a
patch for review and inclusion probably.
for example, stats after creation of 3M files in htree directory:
[root@bob ~]# cat /proc/fs/jbd/sda/history
R/C tid wait run lock flush log hndls block inlog ctime write drop close
R 261 8260 2720 0 0 750 9892 8170 8187
C 259 750 0 4885 1
R 262 20 2200 10 0 770 9836 8170 8187
R 263 30 2200 10 0 3070 9812 8170 8187
R 264 0 5000 10 0 1340 0 0 0
C 261 8240 3212 4957 0
R 265 8260 1470 0 0 4640 9854 8170 8187
R 266 0 5000 10 0 1460 0 0 0
C 262 8210 2989 4868 0
R 267 8230 1490 10 0 4440 9875 8171 8188
R 268 0 5000 10 0 1260 0 0 0
C 263 7710 2937 4908 0
R 269 7730 1470 10 0 3330 9841 8170 8187
R 270 0 5000 10 0 830 0 0 0
C 265 8140 3234 4898 0
C 267 720 0 4849 1
R 271 8630 2740 20 0 740 9819 8170 8187
C 269 800 0 4214 1
R 272 40 2170 10 0 830 9716 8170 8187
R 273 40 2280 0 0 3530 9799 8170 8187
R 274 0 5000 10 0 990 0 0 0
where,
R - line for transaction's life from T_RUNNING to T_FINISHED
C - line for transaction's checkpointing
tid - transaction's id
wait - for how long we were waiting for new transaction to start
(the longest period journal_start() took in this transaction)
run - real transaction's lifetime (from T_RUNNING to T_LOCKED
lock - how long we were waiting for all handles to close
(time the transaction was in T_LOCKED)
flush - how long it took to flush all data (data=ordered)
log - how long it took to write the transaction to the log
hndls - how many handles got to the transaction
block - how many blocks got to the transaction
inlog - how many blocks are written to the log (block + descriptors)
ctime - how long it took to checkpoint the transaction
write - how many blocks have been written during checkpointing
drop - how many blocks have been dropped during checkpointing
close - how many running transactions have been closed to checkpoint this one
all times are in msec.
[root@bob ~]# cat /proc/fs/jbd/sda/info
280 transaction, each upto 8192 blocks
average:
1633ms waiting for transaction
3616ms running transaction
5ms transaction was being locked
1ms flushing data (in ordered mode)
1799ms logging transaction
11781 handles per transaction
5629 blocks per transaction
5641 logged blocks per transaction
Signed-off-by: Johann Lombardi <johann.lombardi@bull.net>
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
We are currently taking the truncate_mutex for every read. This would have
performance impact on large CPU configuration. Convert the lock to read write
semaphore and take read lock when we are trying to read the file.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
When doing a migrate from ext3 to ext4 inode we need to make sure the test
for inode type and walking inode data happens inside lock. To make this
happen move truncate_mutex early before checking the i_flags.
This actually should enable us to remove the verify_chain().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.
If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).
We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state. The locking there is subtle though (as
everywhere in JBD ;(). We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.
Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either. Better be safe if something
changes in future...
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add buffer head related helper function bh_uptodate_or_lock and
bh_submit_read which can be used by file system
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
This patch extends bg_itable_unused of ext4 group descriptor
from 16bit into 32bit. In order to add bg_itable_unused_hi into
struct ext4_group_desc, some extra fields which are already introduced into
e2fsprogs are also added in for consistency.
Signed-off-by: Coly Li <coyli@suse.de>
Cc: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Calculate & store the max offset for bitmapped files, and
catch too-large seeks, truncates, and writes in ext4, shortening
or rejecting as appropriate.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
This patch converts ext4_inode i_blocks to represent total
blocks occupied by the inode in file system block size.
Earlier the variable used to represent this in 512 byte
block size. This actually limited the total size of the file.
The feature is enabled transparently when we write an inode
whose i_blocks cannot be represnted as 512 byte units in a
48 bit variable.
inode flag EXT4_HUGE_FILE_FL
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode
to represet the higher 16 bits for i_blocks. With this change max_file
size becomes (2**48 -1 )* 512 bytes.
We add a RO_COMPAT feature to the super block to indicate that inode
have i_blocks represented as a split 48 bits. Super block with this
feature set cannot be mounted read write on a kernel with CONFIG_LSF
disabled.
Super block flag EXT4_FEATURE_RO_COMPAT_HUGE_FILE
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Rename ext4_inode.i_dir_acl to i_size_high
drop ext4_inode_info.i_dir_acl as it is not used
Rename ext4_inode.i_size to ext4_inode.i_size_lo
Add helper function for accessing the ext4_inode combined i_size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Rename i_file_acl to i_file_acl_lo. This helps
in finding bugs where we use i_file_acl instead
of the combined i_file_acl_lo and i_file_acl_high
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
In many places variables for block group are of type int, which limits the
maximum number of block groups to 2^31. Each block group can have up to
2^15 blocks, with a 4K block size, and the max filesystem size is limited to
2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB
This patch introduces a new type ext4_group_t, of type unsigned long, to
represent block group numbers in ext4.
All occurrences of block group variables are converted to type ext4_group_t.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
This patch adds a new data type ext4_lblk_t to represent
the logical file blocks.
This is the preparatory patch to support large files in ext4
The follow up patch with convert the ext4_inode i_blocks to
represent the number of blocks in file system block size. This
changes makes it possible to have a block number 2**32 -1 which
will result in overflow if the block number is represented by
signed long. This patch convert all the block number to type
ext4_lblk_t which is typedef to __u32
Also remove dead code ext4_ext_walk_space
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
With 64KB blocksize, a directory entry can have size 64KB which does not fit
into 16 bits we have for entry lenght. So we store 0xffff instead and convert
value when read from / written to disk. The patch also converts some places
to use ext4_next_entry() when we are changing them anyway.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
This patch set supports large block size(>4k, <=64k) in ext4,
just enlarging the block size limit. But it is NOT possible to have 64kB
blocksize on ext4 without some changes to the directory handling
code. The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem. The proposed solution is treat 64k rec_len
with a an impossible value like rec_len = 0xffff to handle this.
The Patch-set consists of the following 2 patches.
[1/2] ext4: enlarge blocksize
- Allow blocksize up to pagesize
[2/2] ext4: fix rec_len overflow
- prevent rec_len from overflow with 64KB blocksize
Now on 64k page ppc64 box runs with this patch set we could create a 64k
block size ext4dev, and able to handle empty directory block.
Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
in_dev_find() need a namespace to pass it to fib_get_table(), so add
an argument.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes ieee80211_bar control and start_seq_num to
match the proper bitwise attribute expected from ieee 802.11 frame
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds a mac80211 based wireless driver for the rtl8180 and
rtl8185 PCI wireless cards. Also included are some rtl8187 changes
required due to the relationship between that driver and this one.
Michael Wu is primarily responsible for the initial driver and rtl8185
support. Andreas Merello provided the additional rtl8180 support.
Thanks to Jukka Ruohonen for the donating a rtl8185 card! It was very
helpful for the rtl8225z2 code.
The Signed-off-by information below is collected from the individual
patches submitted to wireless-2.6 before merging this driver upstream.
Signed-off-by: Andrea Merello <andreamrl@tiscali.it>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds the 'ssb_pcihost_set_power_state' function.
This function allows us to set the power state of a PCI device
(for example b44 ethernet device).
Signed-off-by: Miguel Botón <mboton@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes lowlevel register access for PCMCIA based devices.
The patch also adds a temporary workaround for the device mac address.
It simply adds generation of a random address. The real SPROM extraction
will follow in another patch.
The temporary workaround will be removed then, but for now it's OK.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes extraction of some values from the SPROM.
It mainly fixes extraction of antenna related values, which
is needed for another b43 fix sent later.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This short patch modifies the IPv4 networking to enable use of the
240.0.0.0/4 (aka "class-E") address space as propsed in the internet
draft draft-fuller-240space-00.txt.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keep track of the number of VLAN devices in a vlan group. This allows
to have the caller sense when the group is going to be destroyed and
stop using it, which in turn allows to remove the wrapper around
unregister_vlan_dev for the NETDEV_UNREGISTER notifier and avoid
iterating over all possible VLAN ids whenever a device in unregistered.
Also fix what looks like a use-after-free (but is actually safe since
we're holding the RTNL), the real_dev reference should not be dropped
while we still use it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The print_mac function is not very suitable for debugging printks
in performance critical paths since without ifdefs it will always
get called. MAC_FMT can be used with pr_debug without any overhead
when debugging is disabled.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only user already includes __FUNCTION__ (vlan_proto_init) in the
output, which is enough to identify what the message is about.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix 3 space indentation and some overly long lines by moving the
comments to a kdoc structure description.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
- struct packet_type is not used
- struct vlan_group is declared later in the file before the first use
- struct net_device is not needed since netdevice.h is included
- struct vlan_collection does not exist
- struct vlan_dev_info is declared later in the file before the first use
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a specific helper for netlink kernel socket disposal. This just
let the code look better and provides a ground for proper disposal
inside a namespace.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The old, now unused, data structures and SPROM extraction routines
are removed.
Signed-off-by: Larry Finger<Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In disagreement with the SPROM specs, revision 3 devices appear to have
moved the MAC address.
Change ssb to handle the revision 4 SPROM, which is a different size.
This change in size is handled by adding a new variable to the ssb_sprom
struct and using it whenever possible. For those routines that do not
have access to this structure, a 'u16 size' argument is added.
The new PCI_ID for the BCM4328 is also added.
Testing of the Revision 4 SPROM, which is used on the BCM4328, was done
by Michael Gerdau <mgerdau@tiscali.de>.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The SPROM's for various devices utilizing the Sonics Silicon Backplane come
with various revisions. The Revision 2 SPROM inherited the data layout of 1, and
Revision 3 inherited the layout of 2. The first instance of Revision 4 has
now been found in a BCM4328 wireless LAN card. This device does not inherit any
layout from previous versions. Although it was possible to create a data
structure that kept all the old layouts, we decided to start fresh, keep only
those SPROM variables that are used by the drivers that utilize ssb, and to
do the conversion in such a manner that neither compilation or execution will
be affected if a bisection lands in the middle of these changes, while keeping
the patches as small as possible.
In this patch, the sprom structures are changed while maintaining the old ones.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
- whitespaces vs tabs
- use 80 cols
- use if_mii
- use netdev_priv
- remove useless cast to void *
- PCI device id does not need to be globally available
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Spotted by Pablo Neira Ayuso <pablo@netfilter.org>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves ipt_iprange to xt_iprange, in preparation for adding
IPv6 support to xt_iprange.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces the xt_mark match revision 1. It uses fixed types,
eventually obsoleting revision 0 some day (uses nonfixed types).
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces the xt_conntrack match revision 1. It uses fixed types, the
new nf_inet_addr and comes with IPv6 support, thereby completely
superseding xt_state.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend union nf_inet_addr with struct in_addr and in6_addr. Useful
because a lot of in-kernel IPv4 and IPv6 functions use
in_addr/in6_addr.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces the xt_connmark match revision 1. It uses fixed types,
eventually obsoleting revision 0 some day (uses nonfixed types).
(Unfixed types like "unsigned long" do not play well with mixed
user-/kernelspace "bitness", e.g. 32/64, as is common on SPARC64,
and need extra compat code.)
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces the xt_MARK target revision 2. It uses fixed types, and
also uses the more expressive XOR logic.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces the xt_CONNMARK target revision 1. It uses fixed types, and
also uses the more expressive XOR logic. Futhermore, it allows to
selectively pick bits from both the ctmark and the nfmark in the SAVE
and RESTORE operations.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
arp_ignore has two arguments: dev & in_dev. dev is used for
inet_confirm_addr calling only.
inet_confirm_addr, in turn, either gets in_dev from the device passed
or iterates over all network devices if the device passed is NULL. It
seems logical to directly pass in_dev into inet_confirm_addr.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
alg_key_len is currently defined as 'signed int'. This unfortunatly
leads to integer divides in several paths.
Converting it to unsigned is safe and saves 208 bytes of text on i386.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cat /proc/net/atm/arp causes the NULL pointer dereference in the
get_proc_net+0xc/0x3a. This happens as proc_get_net believes that the
parent proc dir entry contains struct net.
Fix this assumption for "net/atm" case.
The problem is introduced by the commit c0097b07abf5f92ab135d024dd41bd2aada1512f
from Eric W. Biederman/Daniel Lezcano.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conntracks subsystem has a similar infrastructure
to maintain ctl_paths, but since we already have it
on the generic level, I think it's OK to switch to
using it.
So, basically, this patch just replaces the ctl_table-s
with ctl_path-s, nf_register_sysctl_table with
register_sysctl_paths() and removes no longer needed code.
After this the net/netfilter/nf_sysctl.c file contains
the paths only.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can avoid divides (as seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y on
x86) changing vlan_group_get_device()/vlan_group_set_device() id
parameter from signed to unsigned.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the basic needed abilities and functions for A-MPDU Rx session
changed functions:
- ieee80211_sta_process_addba_request - Rx A-MPDU initialization enabled
- ieee80211_stop - stops all A-MPDU Rx in case interface goes down
added functions:
- ieee80211_send_delba - used for sending out Del BA in A-MPDU sessions
- ieee80211_sta_stop_rx_BA_session - stopping Rx A-MPDU session
- sta_rx_agg_session_timer_expired - stops A-MPDU Rx use if load is too
low
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- 'cb' is a fake struct member. In a previous patch struct cn_callback
was renamed to cn_callback_id, so 'cb' should have been deleted at that
time.
- 'nls' isn't used and is redundant, we can retrieve this data through
cn_callback_entry.pdev->nls.
- 'seq' and 'group' should be u32, as they are declared to be u32 in
other places.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Struct member netlink_groups is never used, and I don't see how it can
be useful.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before pushing pcounter to Linus tree, I would like to make some adjustments.
Goal is to reduce kernel text size, by unlining too big functions.
When a pcounter is bound to a statically defined per_cpu variable,
we define two small helpers functions. (No more folding function
using the fat for_each_possible_cpu(cpu) ... )
static DEFINE_PER_CPU(int, NAME##_pcounter_values);
static void NAME##_pcounter_add(struct pcounter *self, int val)
{
__get_cpu_var(NAME##_pcounter_values) += val;
}
static int NAME##_pcounter_getval(const struct pcounter *self, int cpu)
{
return per_cpu(NAME##_pcounter_values, cpu);
}
Fast path is therefore unchanged, while folding/alloc/free is now unlined.
This saves 228 bytes on i386
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains the scheduled removal of the shaper driver.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
print_mac() used many most net drivers and format_addr() used by
net-sysfs.c are very similar and they can be intergrated.
format_addr() is also identically redefined in the qla4xxx iscsi
driver.
Export a new function sysfs_format_mac() to be used by net-sysfs,
qla4xxx and others in the future. Both print_mac() and
sysfs_format_mac() call _format_mac_addr() to do the formatting.
Changed print_mac() to use unsigned char * to be consistent with
net_device struct's dev_addr. Added buffer length overrun checking
as suggested by Joe Perches.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After a station is added to the kernel's structures, userspace
has to be able to retrieve statistics about that station, especially
whether the station was idle and how much bytes were transferred
to and from it. This adds the necessary code to nl80211.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds station handling to cfg80211/nl80211.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the necessary API to cfg80211/nl80211 to allow
changing beaconing settings.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This introduces key handling to cfg80211/nl80211. Default
and group keys can be added, changed and removed; sequence
counters for each key can be retrieved.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This statistics is shown factor dropped by transformation
at /proc/net/xfrm_stat for developer.
It is a counter designed from current transformation source code
and defined as linux private MIB.
See Documentation/networking/xfrm_proc.txt for the detail.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
A few netfilter modules provide their own union of IPv4 and IPv6
address storage. Will unify that in this patch series.
(1/4): Rename union nf_conntrack_address to union nf_inet_addr and
move it to x_tables.h.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to use rcu_assign_pointer/rcu_dereference to avoid races.
Also remove an obsolete CONFIG_IP_NAT_NEEDED ifdef.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to Maciej Soltysiak's ipt_LOG patch, include GID in addition
to UID in netlink message.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for James Morris' connsecmark.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The combination of NAT and helpers may produce TCP sequence adjustments.
In failover setups, this information needs to be replicated in order to
achieve a successful recovery of mangled, related connections. This patch is
particularly useful for conntrackd, see:
http://people.netfilter.org/pablo/conntrack-tools/
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use compat types and compat iterators when dealing with compat entries for
clarity. This doesn't actually make a difference for ip_tables, but is
needed for ip6_tables and arp_tables.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make xt_compat_match_from_user return an int to make it usable in the
*tables iterator macros and kill a now unnecessary wrapper function.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce scan capabilities to WEXT so that userspace can do intelligent
things with scan behavior such as handling hidden SSIDs more gracefully.
If the driver reports a specific scan capability, the driver must
respect the options specified in the iw_scan_req structure when handling
the SIOCSIWSCAN call, unless it's mode or state does not allow it to do
so, in which case it must return an error.
This version switches to Dave Kilroy's suggestion of claiming unused
padding space for the scan_capa field.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change IPV4 specific macros LOOPBACK MULTICAST LOCAL_MCAST BADCLASS
and ZERONET macros to inline functions ipv4_is_<type>(__be32 addr)
Adds type safety and arguably some readability.
Changes since last submission:
Removed ipv4_addr_octets function
Used hex constants
Converted recently added rfc3330 macros
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are scattered over the code, but almost all the
"critical" places already have the proper struct net
at hand except for snmp proc showing function and routing
rtnl handler.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In DCCP, timestamps can occur on packets anytime, CCID3 uses a timestamp(/echo) on the Request/Response
exchange. This patch addresses the following situation:
* timestamps are recorded on the listening socket;
* Responses are sent from dccp_request_sockets;
* suppose two connections reach the listening socket with very small time in between:
* the first timestamp value gets overwritten by the second connection request.
This is not really good, so this patch separates timestamps into
* those which are received by the server during the initial handshake (on dccp_request_sock);
* those which are received by the client or the client after connection establishment.
As before, a timestamp of 0 is regarded as indicating that no (meaningful) timestamp has been
received (in addition, a warning message is printed if hosts send 0-valued timestamps).
The timestamp-echoing now works as follows:
* when a timestamp is present on the initial Request, it is placed into dreq, due to the
call to dccp_parse_options in dccp_v{4,6}_conn_request;
* when a timestamp is present on the Ack leading from RESPOND => OPEN, it is copied over
from the request_sock into the child cocket in dccp_create_openreq_child;
* timestamps received on an (established) dccp_sock are treated as before.
Since Elapsed Time is measured in hundredths of milliseconds (13.2), the new dccp_timestamp()
function is used, as it is expected that the time between receiving the timestamp and
sending the timestamp echo will be very small against the wrap-around time. As a byproduct,
this allows smaller timestamping-time fields.
Furthermore, inserting the Timestamp Echo option has been taken out of the block starting with
'!dccp_packet_without_ack()', since Timestamp Echo can be carried on any packet (5.8 and 13.3).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The option parsing code currently only parses on full sk's. This causes a problem for
options sent during the initial handshake (in particular timestamps and feature-negotiation
options). Therefore, this patch extends the option parsing code with an additional argument
for request_socks: if it is non-NULL, options are parsed on the request socket, otherwise
the normal path (parsing on the sk) is used.
Subsequent patches, which implement feature negotiation during connection setup, make use
of this facility.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a socket option and signalling support for the case where the server
holds timewait state on closing the connection, as described in RFC 4340, 8.3.
Since holding timewait state at the server is the non-usual case, it is enabled
via a socket option. Documentation for this socket option has been added.
The setsockopt statement has been made resilient against different possible cases
of expressing boolean `true' values using a suggestion by Ian McDonald.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload. This patch implements this
for ICMP traffic that originates from or terminates on localhost.
This is activated on outbound with the new policy flag XFRM_POLICY_ICMP,
and on inbound by the new state flag XFRM_STATE_ICMP.
On inbound the policy check is now performed by the ICMP protocol so
that it can repeat the policy check where necessary.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload. This patch adds the functions
xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get
the reverse flow to perform such a lookup.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is essentially IN_DEV_ANDCONF with proper arguments.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just move the variable on the struct net and adjust
its usage.
Others sysctls from sys.net.core table are more
difficult to virtualize (i.e. make them per-namespace),
but I'll look at them as well a bit later.
Signed-off-by: Pavel Emelyanov <xemul@oenvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous move of the the UDP inDatagrams counter caused each
peek of the same packet to be counted separately. This may be
undesirable.
This patch fixes this by adding a bit to sk_buff to record whether
this packet has already been seen through skb_recv_datagram. We
then only increment the counter when the packet is seen for the
first time.
The only dodgy part is the fact that skb_recv_datagram doesn't have
a good way of returning this new bit of information. So I've added
a new function __skb_recv_datagram that does return this and made
skb_recv_datagram a wrapper around it.
The plan is to eventually replace all uses of skb_recv_datagram with
this new function at which time it can be renamed its proper name.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently it is possible for two processes to peek on the same socket
and end up incrementing the error counter twice for the same packet.
This patch fixes it by making skb_kill_datagram return whether it
succeeded in unlinking the packet and only incrementing the counter
if it did.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
AFAIS these two entries should do the same thing - change the
forwarding state on ipv4_devconf and on all the devices.
I propose to merge the handlers together using ctl paths.
The inet_forward_change() is static after this and I move
it higher to be closer to other "propagation" helpers and
to avoid diff making patches based on { and } matching :)
i.e. - make them easier to read.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I have removed all the entries from this table (core_table,
ipv4_table and tr_table), so now we can safely drop it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The same thing for token-ring - use ctl paths and get
rid of external references on the tr_table.
Unfortunately, I couldn't split this patch into cleanup and
use-the-paths parts.
As a lame excuse I can say, that the cleanup is just moving
the tr_table from one file to another - closet to a single
variable, that this ctl table tunes. Since the source file
becomes empty after the move, I remove it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move common fields for queue management to struct nf_info and rename it
to struct nf_queue_entry. The avoids one allocation/free per packet and
simplifies the code a bit.
Alternatively we could add some private room at the tail, but since
all current users use identical structs this seems easier.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the data pointer from struct nf_queue_handler. It has never been used
and is useless for the only handler that really matters, nfnetlink_queue,
since the handler is shared between all instances.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_conntrack_h323 needs ip6_route_output for the call forwarding filter.
Add a ->route function to nf_afinfo and use that to avoid pulling in the
ipv6 module.
Fix the #ifdef for the IPv6 code while I'm at it - the IPv6 support is
only needed when IPv6 conntrack is enabled.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rate estimator match. The rate estimator match can match on
estimated rates by the RATEEST target. It supports matching on
absolute bps/pps values, comparing two rate estimators and matching
on the difference between two rate estimators.
This is what I use to route outgoing data connections from a FTP
server over two lines based on the available bandwidth:
# estimate outgoing rates
iptables -t mangle -A POSTROUTING -o eth0 -j RATEEST --rateest-name eth0 \
--rateest-interval 250ms \
--rateest-ewma 0.5s
iptables -t mangle -A POSTROUTING -o ppp0 -j RATEEST --rateest-name ppp0 \
--rateest-interval 250ms \
--rateest-ewma 0.5s
# mark based on available bandwidth
iptables -t mangle -A BALANCE -m state --state NEW \
-m helper --helper ftp \
-m rateest --rateest-delta \
--rateest1 eth0 \
--rateest-bps1 2.5mbit \
--rateest-gt \
--rateest2 ppp0 \
--rateest-bps2 2mbit \
-j CONNMARK --set-mark 0x1
iptables -t mangle -A BALANCE -m state --state NEW \
-m helper --helper ftp \
-m rateest --rateest-delta \
--rateest1 ppp0 \
--rateest-bps1 2mbit \
--rateest-gt \
--rateest2 eth0 \
--rateest-bps2 2.5mbit \
-j CONNMARK --set-mark 0x2
iptables -t mangle -A BALANCE -j CONNMARK --restore-mark
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new rate estimator target (using gen_estimator). In combination with
the rateest match (next patch) this can be used for load-based multipath
routing.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extends the xt_DSCP target by xt_TOS v1 to add support for selectively
setting and flipping any bit in the IPv4 TOS and IPv6 Priority fields.
(ipt_TOS and xt_DSCP only accepted a limited range of possible
values.)
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extends the xt_dscp match by xt_tos v1 to add support for selectively
matching any bit in the IPv4 TOS and IPv6 Priority fields. (ipt_tos
and xt_dscp only accepted a limited range of possible values.)
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Addrtype match has a new revision (1), which lets address type checking
limited to the interface the current packet belongs to. Either incoming
or outgoing interface can be used depending on the current hook. In the
FORWARD hook two maches should be used if both interfaces have to be checked.
The new structure is ipt_addrtype_info_v1.
Revision 0 lets older userspace programs use the match as earlier.
ipt_addrtype_info is used.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
xt_owner merges ipt_owner and ip6t_owner, and adds a flag to match
on socket (non-)existence.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using a big array of NR_CPUS entries, we can compute the size
needed at runtime, using nr_cpu_ids
This should save some ram (especially on David's machines where NR_CPUS=4096 :
32 KB can be saved per table, and 64KB for dynamically allocated ones (because
of slab/slub alignements) )
In particular, the 'bootstrap' tables are not any more static (in data
section) but on stack as their size is now very small.
This also should reduce the size used on stack in compat functions
(get_info() declares an automatic variable, that could be bigger than kernel
stack size for big NR_CPUS)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sven Schnelle <svens@bitebene.org>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the basic infrastructure for per namespace sysctls.
A list of lists of sysctl headers is added, allowing each namespace to have
it's own list of sysctl headers.
Each list of sysctl headers has a lookup function to find the first
sysctl header in the list, allowing the lists to have a per namespace
instance.
register_sysct_root is added to tell sysctl.c about additional
lists of sysctl_headers. As all of the users are expected to be in
kernel no unregister function is provided.
sysctl_head_next is updated to walk through the list of lists.
__register_sysctl_paths is added to add a new sysctl table on
a non-default sysctl list.
The only intrusive part of this patch is propagating the information
to decided which list of sysctls to use for sysctl_check_table.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
By doing this we allow users of register_sysctl_paths that build
and dynamically allocate their ctl_table to be simpler. This allows
them to just remember the ctl_table_header returned from
register_sysctl_paths from which they can now find the
ctl_table array they need to free.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a number of modules that register a sysctl table
somewhere deeply nested in the sysctl hierarchy, such as
fs/nfs, fs/xfs, dev/cdrom, etc.
They all specify several dummy ctl_tables for the path name.
This patch implements register_sysctl_path that takes
an additional path name, and makes up dummy sysctl nodes
for each component.
This patch was originally written by Olaf Kirch and
brought to my attention and reworked some by Olaf Hering.
I have changed a few additional things so the bugs are mine.
After converting all of the easy callers Olaf Hering observed
allyesconfig ARCH=i386, the patch reduces the final binary size by 9369 bytes.
.text +897
.data -7008
text data bss dec hex filename
26959310 4045899 4718592 35723801 2211a19 ../vmlinux-vanilla
26960207 4038891 4718592 35717690 221023a ../O-allyesconfig/vmlinux
So this change is both a space savings and a code simplification.
CC: Olaf Kirch <okir@suse.de>
CC: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kill the defines again, convert to the new checksum helper names and
remove the dependency of NET_ACT_NAT on NETFILTER.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows to get rid of the CONFIG_NETFILTER dependency of NET_ACT_NAT.
This patch redefines the old names to keep the noise low, the next patch
converts all users.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the necessary state transitions for the two forms of passive-close
* PASSIVE_CLOSE - which is entered when a host receives a Close;
* PASSIVE_CLOSEREQ - which is entered when a client receives a CloseReq.
Here is a detailed account of what the patch does in each state.
1) Receiving CloseReq
The pseudo-code in 8.5 says:
Step 13: Process CloseReq
If P.type == CloseReq and S.state < CLOSEREQ,
Generate Close
S.state := CLOSING
Set CLOSING timer.
This means we need to address what to do in CLOSED, LISTEN, REQUEST, RESPOND, PARTOPEN, and OPEN.
* CLOSED: silently ignore - it may be a late or duplicate CloseReq;
* LISTEN/RESPOND: will not appear, since Step 7 is performed first (we know we are the client);
* REQUEST: perform Step 13 directly (no need to enqueue packet);
* OPEN/PARTOPEN: enter PASSIVE_CLOSEREQ so that the application has a chance to process unread data.
When already in PASSIVE_CLOSEREQ, no second CloseReq is enqueued. In any other state, the CloseReq is ignored.
I think that this offers some robustness against rare and pathological cases: e.g. a simultaneous close where
the client sends a Close and the server a CloseReq. The client will then be retransmitting its Close until it
gets the Reset, so ignoring the CloseReq while in state CLOSING is sane.
2) Receiving Close
The code below from 8.5 is unconditional.
Step 14: Process Close
If P.type == Close,
Generate Reset(Closed)
Tear down connection
Drop packet and return
Thus we need to consider all states:
* CLOSED: silently ignore, since this can happen when a retransmitted or late Close arrives;
* LISTEN: dccp_rcv_state_process() will generate a Reset ("No Connection");
* REQUEST: perform Step 14 directly (no need to enqueue packet);
* RESPOND: dccp_check_req() will generate a Reset ("Packet Error") -- left it at that;
* OPEN/PARTOPEN: enter PASSIVE_CLOSE so that application has a chance to process unread data;
* CLOSEREQ: server performed active-close -- perform Step 14;
* CLOSING: simultaneous-close: use a tie-breaker to avoid message ping-pong (see comment);
* PASSIVE_CLOSEREQ: ignore - the peer has a bug (sending first a CloseReq and now a Close);
* TIMEWAIT: packet is ignored.
Note that the condition of receiving a packet in state CLOSED here is different from the condition "there
is no socket for such a connection": the socket still exists, but its state indicates it is unusable.
Last, dccp_finish_passive_close sets either DCCP_CLOSED or DCCP_CLOSING = TCP_CLOSING, so that
sk_stream_wait_close() will wait for the final Reset (which will trigger CLOSING => CLOSED).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds two auxiliary states to deal with passive closes:
* PASSIVE_CLOSE (reached from OPEN via reception of Close) and
* PASSIVE_CLOSEREQ (reached from OPEN via reception of CloseReq)
as internal intermediate states.
These states are used to allow a receiver to process unread data before
acknowledging the received connection-termination-request (the Close/CloseReq).
Without such support, it will happen that passively-closed sockets enter CLOSED
state while there is still unprocessed data in the queue; leading to unexpected
and erratic API behaviour.
PASSIVE_CLOSE has been mapped into TCPF_CLOSE_WAIT, so that the code will
seamlessly work with inet_accept() (which tests for this state).
The state names are thanks to Arnaldo, who suggested this naming scheme
following an earlier revision of this patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch includes support for the Intra-Site Automatic Tunnel
Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
module, and is configured using extensions to the "iproute2"
utility. The diffs are specific to the Linux 2.6.24-rc2 kernel
distribution.
This version includes the diff for ./include/linux/if.h which was
missing in the v2.4 submission and is needed to make the
patch compile. The patch has been installed, compiled and
tested in a clean 2.6.24-rc2 kernel build area.
Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sock_wake_async() performs a bit different actions
depending on "how" argument. Unfortunately this argument
ony has numerical magic values.
I propose to give names to their constants to help people
reading this function callers understand what's going on
without looking into this function all the time.
I suppose this is 2.6.25 material, but if it's not (or the
naming seems poor/bad/awful), I can rework it against the
current net-2.6 tree.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
I keep getting this build error and couldn't find anyone fixing
it in archives. ...Maybe all net developers except me build
just SMP kernels :-).
In file included from include/net/sock.h:50,
from ipc/mqueue.c:35:
include/linux/pcounter.h: In function 'pcounter_add':
include/linux/pcounter.h:87: error: 'struct pcounter' has no
member named 'value'
make[1]: *** [ipc/mqueue.o] Error 1
make: *** [ipc] Error 2
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This decouples PARTOPEN from TCP-specific stream-states.
It thus addresses the FIXME.
The code has been checked with regard to dependency on PARTOPEN and FIN_WAIT1
states (to which PARTOPEN previously was mapped): there is no difference, as
PARTOPEN is always referred to directly (i.e. not via the mapping to TCP
state).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This just generalises what was introduced by Eric Dumazet for the struct proto
inuse field in 286ab3d460:
[NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way.
Please look at the comment in there to see the rationale.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds several structs and definitions to ieee80211.h
to support 802.11n draft specifications.
As 802.11n depends on and extends the 802.11e standard in several issues,
there are also several definitions that belong to 802.11e.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After this patch none of the netlink callback support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network namespaces.
Changes from v2:
- IPv6 addrlabel processing
Changes from v1:
- no need for special rtnl_unlock handling
- fixed IPv6 ndisc
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds IEEE80211_MAX_FRAME_LEN which is useful for drivers trying
to determine how much to allocate for their RX buffers.
It also updates the comment on IEEE80211_MAX_DATA_LEN based on revisions
in 802.11e.
IEEE80211_MAX_FRAG_THRESHOLD and IEEE80211_MAX_RTS_THRESHOLD are also
revised due to the new maximum frame size.
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rx_flags variable is redundant. Turning rx on/off is done
via setting the rx_np pointer.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The local_mac is managed by the network device, no need to keep a
spare copy and all the management problems that could cause.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the missing Kbuild entries and the missing Kbuild file
in include/linux/can for the CAN subsystem.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the use of plain integers instead of __u32 in a struct
that is visible from kernel space and user space.
Thanks to Sam Ravnborg for pointing out the wrong plain int usage.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the CAN broadcast manager (bcm) protocol.
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the CAN raw protocol.
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the CAN core functionality but no protocols or drivers.
No protocol implementations are included here. They come as separate
patches. Protocol numbers are already in include/linux/can.h.
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a protocol/address family number, ARP hardware type,
ethernet packet type, and a line discipline number for the SocketCAN
implementation.
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Key points of this patch are:
- In case new SACK information is advance only type, no skb
processing below previously discovered highest point is done
- Optimize cases below highest point too since there's no need
to always go up to highest point (which is very likely still
present in that SACK), this is not entirely true though
because I'm dropping the fastpath_skb_hint which could
previously optimize those cases even better. Whether that's
significant, I'm not too sure.
Currently it will provide skipping by walking. Combined with
RB-tree, all skipping would become fast too regardless of window
size (can be done incrementally later).
Previously a number of cases in TCP SACK processing fails to
take advantage of costly stored information in sack_recv_cache,
most importantly, expected events such as cumulative ACK and new
hole ACKs. Processing on such ACKs result in rather long walks
building up latencies (which easily gets nasty when window is
huge). Those latencies are often completely unnecessary
compared with the amount of _new_ information received, usually
for cumulative ACK there's no new information at all, yet TCP
walks whole queue unnecessary potentially taking a number of
costly cache misses on the way, etc.!
Since the inclusion of highest_sack, there's a lot information
that is very likely redundant (SACK fastpath hint stuff,
fackets_out, highest_sack), though there's no ultimate guarantee
that they'll remain the same whole the time (in all unearthly
scenarios). Take advantage of this knowledge here and drop
fastpath hint and use direct access to highest SACKed skb as
a replacement.
Effectively "special cased" fastpath is dropped. This change
adds some complexity to introduce better coveraged "fastpath",
though the added complexity should make TCP behave more cache
friendly.
The current ACK's SACK blocks are compared against each cached
block individially and only ranges that are new are then scanned
by the high constant walk. For other parts of write queue, even
when in previously known part of the SACK blocks, a faster skip
function is used (if necessary at all). In addition, whenever
possible, TCP fast-forwards to highest_sack skb that was made
available by an earlier patch. In typical case, no other things
but this fast-forward and mandatory markings after that occur
making the access pattern quite similar to the former fastpath
"special case".
DSACKs are special case that must always be walked.
The local to recv_sack_cache copying could be more intelligent
w.r.t DSACKs which are likely to be there only once but that
is left to a separate patch.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is going to replace the sack fastpath hint quite soon... :-)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Policy table is implemented as an RCU linear list since we do not expect
large list nor frequent updates.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow caller to pass in a release function, there might be
other resources that need releasing as well. Needed for
network receive.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today we have the following annotations for functions/data
referencing __init/__exit functions / data:
__init_refok => for init functions
__initdata_refok => for init data
__exit_refok => for exit functions
There is really no difference between the __init and __exit
versions and simplify it and to introduce a shorter annotation
the following new annotations are introduced:
__ref => for functions (code) that
references __*init / __*exit
__refdata => for variables
__refconst => for const variables
Whit this annotation is it more obvious what the annotation
is for and there is no longer the arbitary division
between __init and __exit code.
The mechanishm is the same as before - a special section
is created which is made part of the usual sections
in the linker script.
We will start to see annotations like this:
-static struct pci_serial_quirk pci_serial_quirks[] = {
+static const struct pci_serial_quirk pci_serial_quirks[] __refconst = {
-----------------
-static struct notifier_block __cpuinitdata cpuid_class_cpu_notifier =
+static struct notifier_block cpuid_class_cpu_notifier __refdata =
----------------
-static int threshold_cpu_callback(struct notifier_block *nfb,
+static int __ref threshold_cpu_callback(struct notifier_block *nfb,
[The above is just random samples].
Note: No modifications were needed in modpost
to support the new sections due to the newly introduced
blacklisting.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Remove the deprecated __attribute_used__.
[Introduce __section in a few places to silence checkpatch /sam]
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Introducing separate sections for __dev* (HOTPLUG),
__cpu* (HOTPLUG_CPU) and __mem* (MEMORY_HOTPLUG)
allows us to do a much more reliable Section mismatch
check in modpost. We are no longer dependent on the actual
configuration of for example HOTPLUG.
This has the effect that all users see much more
Section mismatch warnings than before because they
were almost all hidden when HOTPLUG was enabled.
The advantage of this is that when building a piece
of code then it is much more likely that the Section
mismatch errors are spotted and the warnings will be
felt less random of nature.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Adrian Bunk <bunk@kernel.org>
auxvec.h, i2c-dev.h and vt.h *should* be unifdef'ed i2o-dev.h does not need
unifdef'ing
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (24 commits)
HID: ADS/Tech Radio si470x needs blacklist entry
HID: Logitech Extreme 3D needs NOGET quirk
HID: Refactor MS Presenter 8K key mapping
HID: MS Presenter mapping for PID 0x0701
HID: Support Samsung IR remote
HID: fix compilation of hidbp drivers without usbhid
HID: Blacklist the Gretag-Macbeth Huey display colorimeter
HID: the `bit' in hidinput_mapping_quirks() is an out parameter
HID: remove redundant WARN_ON()s in order not to scare users
HID: force hiddev creation for SONY PS3 controller
HID: Use hid blacklist in usbmouse/usbkbd
HID: proper handling of MS 4k and 6k devices
HID: remove unused variable in quirk event handler
HID: hid-input quirk for BTC 8193
HID: separate hid-input event quirks from generic code
HID: refactor mapping to input subsystem for quirky devices
HID: Microsoft Wireless Optical Desktop 3.0 quirk
HID: Add support for Logitech Elite keyboards
HID: add full support for Genius KB-29E
HID: fix a potential bug in pointer casting
...
* 'sg' of git://git.kernel.dk/linux-2.6-block:
SG: work with the SCSI fixed maximum allocations.
SG: Convert SCSI to use scatterlist helpers for sg chaining
SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers
Samsung USB remotes (0419:0001) are rejected by kernel 2.6.23, because the
report descriptor from the remote contains a 48 bit HID report field. HID 1.11
states: Fields may span at most 4 bytes.
This patch, based on 2.6.23, fixes this by modifying the internal report
descriptor in hid-quirks.c. Additional user space support (e.g. LIRC) is
required to fetch the information from the hiddev interface.
The burden to reconstruct the data is moved into userspace (lirc through hiddev).
There is no need to set HID_QUIRK_HIDDEV quirk, as the device has also output
applications, which trigger the creation of hiddev device automatically.
Signed-off-by: Robert Schedel <r.schedel@yahoo.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Fix a panic, by changing
hidinput_mapping_quirks(,, unsigned long *bit,)
to
hidinput_mapping_quirks(,, unsigned long **bit,)
The `bit' in this function is an out parameter.
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This removes ugly macros IS_* to distinguish devices that
need special handling in hid-input, and establish proper
quirks for them.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
BTC 8193 keyboard handles its scrollwheel in very non-standard way.
It produces two non-standard usages for scrolling up and down, in
both cases with postive value equaling to 1. We handle this by temporary
mapping, which we then catch in quirk event handler, and remap to
negative HWHEEL even in order to introduce correct behavior.
Also the button requires special mapping, as it triggers standard-violating
usage code.
Reported in kernel.org bugzilla #9385
Reported-by: Kir Kolyshkin <kir@sacred.ru>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This patch separates also the hid-input quirks that have to be
applied at the time the event occurs, so that the generic code
handling HUT-compliant devices is not messed up by them too much.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Currently, the handling of mapping between hid and input for devices
that don't conform to HUT 1.12 specification is very messy -- no per-device
handling, no blacklists, conditions on idVendor and idProduct placed
all over the code.
This patch moves all the device-specific input mapping to a separate
file, and introduces a blacklist-style handling for non-standard
device-specific mappings.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Genius KB-29E has broken report descriptor, which causes some of the
Consumer usages to appear incorrectly as Button usages. We fix it by
fixing the report descriptor before it is being parsed.
Also a few of the keys violate the HUT standard, so they need a special
handling. They currently fall into "Reserved" range as per HUT 1.12.
Reported-by: Szekeres Istvan <szekeres@iii.hu>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This mouse distinguishes horizontal wheel from vertical by a special "pseudo
event" GenericDesktop.00b8, with values of 0 for vertical and 8 for horizontal
wheel. Because this event is supplied by the parser too late, we need to delay
a wheel event, wait for this one and send either REL_WHEEL or REL_HWHEEL to
input depending on the event value.
Signed-off-by: Pavel Troller <patrol@sinus.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Preserve identifiers exposed in build and run time configuration though in
order not to break existing configurations.
This is in preparation for adding support for Apple aluminum USB keyboards.
Signed-off-by: Michel Daenzer <michel@tungstengraphics.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* orion: (26 commits)
[ARM] Orion: implement power-off method for QNAP TS-109/209
[ARM] Orion: add support for QNAP TS-109/TS-209
[ARM] Orion: I2C support
[I2C] i2c-mv64xxx: Don't set i2c_adapter.retries
[I2C] Split mv643xx I2C platform support
[ARM] Orion: enable CONFIG_RTC_DRV_M41T80 for D-Link DNS-323
[ARM] Orion defconfig
[ARM] Orion: add support for Orion/MV88F5181 based D-Link DNS-323
[ARM] Orion: MV88F5181 support bits
[ARM] Orion: Buffalo/Revogear Kurobox Pro support
[ARM] OrionNAS RD board support
[ARM] Orion: support for Marvell Orion-2 (88F5281) Development Board
[ARM] Orion: common platform setup for Gigabit Ethernet port
[ARM] Orion: platform device registration for UART, USB and NAND
[ARM] Orion: system timer support
[ARM] Orion edge GPIO IRQ support
[ARM] Orion: IRQ support
[ARM] Orion: provide GPIO method for enabling hardware assisted blinking
[ARM] Orion: GPIO support
[ARM] Orion: programable address map support
...
Conflicts:
arch/arm/Kconfig
arch/arm/Makefile
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
SCSI sg table allocation has a maximum size (of SCSI_MAX_SG_SEGMENTS,
currently 128) and this will cause a BUG_ON() in SCSI if something
tries an allocation over it. This patch adds a size limit to the
chaining allocator to allow the specification of the maximum
allocation size for chaining, so we always chain in units of the
maximum SCSI allocation size.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
These DMA drain buffer implementations in drivers are pretty horrible
to do in terms of manipulating the scatterlist. Plus they're being
done at least in drivers/ide and drivers/ata, so we now have code
duplication.
The one use case for this, as I understand it is AHCI controllers doing
PIO mode to mmc devices but translating this to DMA at the controller
level.
So, what about adding a callback to the block layer that permits the
adding of the drain buffer for the problem devices. The idea is that
you'd do this in slave_configure after you find one of these devices.
The beauty of doing it in the block layer is that it quietly adds the
drain buffer to the end of the sg list, so it automatically gets mapped
(and unmapped) without anything unusual having to be done to the
scatterlist in driver/scsi or drivers/ata and without any alteration to
the transfer length.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
syslets (or other threads/processes that want io context sharing) can
set this to enforce sharing of io context.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The io context sharing introduced a per-ioc spinlock, that would protect
the cfq io context lookup. That is a regression from the original, since
we never needed any locking there because the ioc/cic were process private.
The cic lookup is changed from an rbtree construct to a radix tree, which
we can then use RCU to make the reader side lockless. That is the performance
critical path, modifying the radix tree is only done on process creation
(when that process first does IO, actually) and on process exit (if that
process has done IO).
As it so happens, radix trees are also much faster for this type of
lookup where the key is a pointer. It's a very sparse tree.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch converts 'uptodate' arguments of no longer exported
interfaces, end_that_request_first/last, to 'error', and removes
internal conversions for it in blk_end_request interfaces.
Also, this patch removes no longer needed end_io_error().
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch removes the following functions:
o end_that_request_first()
o end_that_request_chunk()
and stops exporting the functions below:
o end_that_request_last()
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds a variant of the interface, blk_end_bidi_request(),
which completes a bidi request.
Bidi request must be completed as a whole, both rq and rq->next_rq
at once. So the interface has 2 arguments for completion size.
As for ->end_io, only rq->end_io is called (rq->next_rq->end_io is not
called). So if special completion handling is needed, the handler
must be set to rq->end_io.
And the handler must take care of freeing next_rq too, since
the interface doesn't care of it if rq->end_io is not NULL.
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds a variant of the interface, blk_end_request_callback(),
which has driver callback feature.
Drivers may need to do special works between end_that_request_first()
and end_that_request_last().
For such drivers, blk_end_request_callback() allows it to pass
a callback function which is called between end_that_request_first()
and end_that_request_last().
This interface is only for fallback of other blk_end_request interfaces.
Drivers should avoid their tricky behaviors and use other interfaces
as much as possible.
Currently, only one driver, ide-cd, needs this interface.
So this interface should/will be removed, after the driver removes
such tricky behaviors.
o ide-cd (cdrom_newpc_intr())
In PIO mode, cdrom_newpc_intr() needs to defer end_that_request_last()
until the device clears DRQ_STAT and raises an interrupt after
end_that_request_first().
So end_that_request_first() and end_that_request_last() are called
separately in cdrom_newpc_intr().
This means blk_end_request_callback() has to return without
completing request even if no leftover in the request.
To satisfy the requirement, callback function has return value
so that drivers can tell blk_end_request_callback() to return
without completing request.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds/exports functions to get the size of request in bytes.
They are useful because blk_end_request interfaces take bytes
as a completed I/O size instead of sectors.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds 2 new interfaces for request completion:
o blk_end_request() : called without queue lock
o __blk_end_request() : called with queue lock held
blk_end_request takes 'error' as an argument instead of 'uptodate',
which current end_that_request_* take.
The meanings of values are below and the value is used when bio is
completed.
0 : success
< 0 : error
Some device drivers call some generic functions below between
end_that_request_{first/chunk} and end_that_request_last().
o add_disk_randomness()
o blk_queue_end_tag()
o blkdev_dequeue_request()
These are called in the blk_end_request interfaces as a part of
generic request completion.
So all device drivers become to call above functions.
To decide whether to call blkdev_dequeue_request(), blk_end_request
uses list_empty(&rq->queuelist) (blk_queued_rq() macro is added for it).
So drivers must re-initialize it using list_init() or so before calling
blk_end_request if drivers use it for its specific purpose.
(Currently, there is no driver which completes request without
re-initializing the queuelist after used it. So rq->queuelist
can be used for the purpose above.)
"Normal" drivers can be converted to use blk_end_request()
in a standard way shown below.
a) end_that_request_{chunk/first}
spin_lock_irqsave()
(add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
end_that_request_last()
spin_unlock_irqrestore()
=> blk_end_request()
b) spin_lock_irqsave()
end_that_request_{chunk/first}
(add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
end_that_request_last()
spin_unlock_irqrestore()
=> spin_lock_irqsave()
__blk_end_request()
spin_unlock_irqsave()
c) spin_lock_irqsave()
(add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
end_that_request_last()
spin_unlock_irqrestore()
=> blk_end_request() or spin_lock_irqsave()
__blk_end_request()
spin_unlock_irqrestore()
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Manually doing chained sg lists is not trivial, so add some helpers
to make sure that drivers get it right.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Let queue_dma_alignment return 0 if it was specifically set to 0.
This permits devices with no particular alignment restrictions to
use arbitrary user space buffers without copying.
Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Since the SCSI layer uses the request queues from the block layer, blktrace can
also be used to trace the requests to all SCSI devices (like SCSI tape drives),
not only disks. The only missing part is the ioctl interface to start and stop
tracing.
This patch adds the SETUP, START, STOP and TEARDOWN ioctls from blktrace to the
sg device files. With this change, blktrace can be used for SCSI devices like
for disks, e.g.: blktrace -d /dev/sg1 -o - | blkparse -i -
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The driver core, and some other parts of the kernel just want to find a
device based on a name for a specific bus. Give them a simple wrapper
to prevent them from having to always roll their own.
This will be used in the PPC patch later in this series.
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds a i2c_new_dummy() primitive to help work with devices
that consume multiple addresses, which include many I2C eeproms
and at least one RTC.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
The i2c_adapter.clients list of i2c_client nodes duplicates driver
model state. This patch starts removing that list, letting us remove
most existing users of those i2c-core lists.
* The core I2C code now iterates over the driver model's list instead
of the i2c-internal one in some places where it's safe:
- Passing a command/ioctl to each client, a mechanims
used almost exclusively by DVB adapters;
- Device address checking, in both i2c-core and i2c-dev.
* Provide i2c_verify_client() to use with driver model iterators.
* Flag the relevant i2c_adapter and i2c_client fields as deprecated,
to help prevent new users from appearing.
For the moment the list needs to stick around, since some issues show
up when deleting devices created by legacy I2C drivers. (They don't
follow standard driver model rules. Removing those devices can cause
self-deadlocks.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Discard all I2C driver IDs that aren't used anywhere. That's not just a
couple of them, but more like 49 or one quarter of all defined IDs! And
this is just a first pass, next will come all IDs that are set but
never used, or used but never set.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Move the tps65010 header file from the OMAP arch directory to the
more generic <linux/i2c/...> directory, and remove the spurious
dependency of this driver on OMAP.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
i2c_driver.list is superfluous, this list duplicates the one
maintained by the driver core. Drop it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
i2c_adapter.list is superfluous, this list duplicates the one
maintained by the driver core. Drop it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Use more standard prototypes for i2c_use_client() and
i2c_release_client(). The former now returns a pointer to the client,
and the latter no longer returns anything. This matches what all other
subsystems do.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: David Brownell <david-b@pacbell.net>
Don't implement our own reference counting mechanism for i2c clients
when the driver model already has one.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: David Brownell <david-b@pacbell.net>
This patch allows much of the I2C client address data to move from initdata
into text.
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
This patch contains the overdue removal of three I2C drivers.
[JD: In fact only i2c-ixp4xx can be removed at the moment, the other two
platforms don't implement the generic GPIO layer yet.]
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
This patch contains the scheduled removal of legacy I2C RTC drivers with
replacement drivers.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Based on the earlier work by Tejun Heo.
All users are gone so we can finally remove it.
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Based on the earlier work by Tejun Heo.
Switch set_xfer_rate() to use REQ_TYPE_ATA_TASKFILE requests
and make ide_wait_cmd() static.
There should be no functionality changes caused by this patch.
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Use wait_drive_not_busy() in drive_cmd_intr().
v2:
* Fix wait_drive_not_busy() comment (noticed by Sergei).
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
task_end_request() modified to always call ide_end_drive_cmd()
for taskfile requests. Previously, ide_end_drive_cmd() was
called only when IDE_TFLAG_FLAGGED was set. Also,
ide_dma_intr() is modified to use task_end_request().
Enables TASKFILE ioctls to get valid register outputs on
successful completion.
Bart:
- ported it over recent IDE changes
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add IDE_TFLAG_{HOB,TF,DEVICE} defines.
* Set IDE_TFLAG_IN_* flags in {do_rw,ide_no_data,ide_raw}_taskfile() users.
* Remove no longer needed ->tf_flags setup from ide_end_drive_cmd().
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
In ide_taskfile_ioctl(), there was a race condition involving
drive->io_32bit. It was cleared and restored during ioctl
requests but there was no synchronization with other requests.
So, other requests could execute with the altered ->io_32bit
setting or updated drive->io_32bit could be overwritten by
ide_taskfile_ioctl().
This patch adds IDE_TFLAG_IO_16BIT flag to indicate to
ide_pio_datablock() that 16-bit I/O is needed regardless of
drive->io_32bit settting.
Bart:
- ported it over recent IDE changes
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Remove broken disk byte-swapping support:
- it can cause a data corruption on SMP (or if using PREEMPT on UP)
- all data coming from disk are byte-swapped by taskfile_*_data() which
results in incorrect identify data being reported by /proc/ide/ and IOCTLs
- "hdx=bswap/byteswap" kernel parameter has been broken on m68k host drivers
(including Atari/Q40 ones) since 2.5.x days (because of 'hwif' zero-ing)
- byte-swapping is limited to PIO transfers (for working with TiVo disks on
x86 machines using user-space solutions or dm-byteswap should result in
much better performance because DMA can be used)
For previous discussions please see:
http://www.ussg.iu.edu/hypermail/linux/kernel/0201.0/0768.htmlhttp://lkml.org/lkml/2004/2/28/111
[ I have dm-byteswap device mapper target if somebody is interested
(patch is for 2.6.4 though but I'll dust it off if needed). ]
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Make remaining built-in only IDE host drivers modular, add ide-scan-pci.c
file for probing PCI host drivers registered with IDE core (special case
for built-in IDE and CONFIG_IDEPCI_PCIBUS_ORDER=y) and then take care of
the ordering in which all IDE host drivers are probed when IDE is built-in
during link time.
* Move probing of gayle, falconide, macide, q40ide and buddha (m68k arch
specific) host drivers, before PCI ones (no PCI on m68k), ide-cris (cris
arch specific), cmd640 (x86 arch specific) and pmac (ppc arch specific).
* Move probing of ide-cris (cris arch specific) host driver before cmd640
(x86 arch specific).
* Move probing of mpc8xx (ppc specific) host driver before ide-pnp (depends
on ISA and none of ppc platform that use mpc8xx supports ISA) and ide-h8300
(h8300 arch specific).
* Add "probe_vlb" kernel parameter to cmd640 host driver and update
Documentation/ide.txt accordingly.
* Make IDE_ARM config option visible so it can also be disabled if needed.
* Remove bogus comment from ide.c while at it.
v2:
* Fix two issues spotted by Sergei:
- replace ENOMEM error value by ENOENT in ide-h8300 host driver
- fix MODULE_PARM_DESC() in cmd640 host driver
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Rename init_hwif_data() to ide_init_port_data() and export it.
* For all users of ide_register_hw() with 'initializing' argument set
hwif->present and hwif->hold are always zero so convert these host
drivers to use ide_find_port()+ide_init_port_data()+ide_init_port_hw()
instead (also no need for init_hwif_default() call since the setup
done by it gets over-ridden by ide_init_port_hw() call).
* Drop 'initializing' argument from ide_register_hw().
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add ide_init_port_hw() helper.
* rapide.c: convert rapide_locate_hwif() to rapide_setup_ports()
and use ide_init_port_hw().
* ide_platform.c: convert plat_ide_locate_hwif() to plat_ide_setup_ports()
and use ide_init_port_hw().
* sgiioc4.c: use ide_init_port_hw().
* pmac.c: add 'hw_regs_t *hw' argument to pmac_ide_setup_device(),
setup 'hw' in pmac_ide_{macio,pci}_attach() and use ide_init_port_hw()
in pmac_ide_setup_device().
This patch is a preparation for the future changes in the IDE probing code.
There should be no functionality changes caused by this patch.
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: Jeremy Higdon <jeremy@sgi.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Fix build break of powerpc holly_defconfig:
In file included from arch/powerpc/platforms/embedded6xx/holly.c:24:
include/linux/ide.h:1206: error: 'CONFIG_IDE_MAX_HWIFS' undeclared here (not in a function)
There's no need to have a sized array in the prototype, might as well
turn it into a pointer.
It could probably be argued that large parts of the include file can be
covered under #ifdef CONFIG_IDE, but that's a larger undertaking.
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Rename ide_device_add() to ide_device_add_all() and make it accept
'u8 idx[MAX_HWIFS]' instead of 'u8 idx[4]' as an argument.
* Add ide_device_add() wrapper for ide_device_add_all().
* Convert ide_generic_init() to use ide_device_add_all().
* Remove no longer needed ideprobe_init().
There should be no functionality changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Assign drive->quirk_list in ->quirkproc implementations:
- hpt366.c::hpt3xx_quirkproc()
- pdc202xx_new.c::pdcnew_quirkproc()
- pdc202xx_old.c::pdc202xx_quirkproc()
* Make ->quirkproc void.
* Move calling ->quirkproc from do_identify() to probe_hwif().
* Convert it821x_fixups() to it821x_quirkproc() in it821x.c.
* Convert siimage_fixup() to sil_quirkproc() in siimage.c, also remove
no longer needed drive->present check from is_dev_seagate_sata().
* Convert ide_undecoded_slave() to accept 'drive' instead of 'hwif'
as an argument. Then convert ide_register_hw() to accept 'quirkproc'
argument instead of 'fixup' one.
* Remove no longer needed ->fixup method.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Merge ->dma_host_{on,off} methods into ->dma_host_set method
which takes 'int on' argument.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Make ide_dma_off_quietly() and __ide_dma_on() always available.
* Drop "__" prefix from __ide_dma_on().
* Check for presence of ->dma_host_on instead of ->ide_dma_on.
* Convert all users of ->ide_dma_on and ->dma_off_quietly methods
to use ide_dma_on() and ide_dma_off_quietly() instead.
* Remove no longer needed ->ide_dma_on and ->dma_off_quietly methods
from ide_hwif_t.
* Make ide_dma_on() void.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Fix SWDMA/MWDMA masks in cy82c693_chipset.
* Add IDE_HFLAG_CY82C693 host flag and use it in ide_tune_dma() to
check whether the DMA should be enabled even if ide_max_dma_mode()
fails.
* Convert cy82c693_dma_enable() to become cy82c693_set_dma_mode()
and remove no longer needed cy82c693_ide_dma_on(). Then set
IDE_HFLAG_CY82C693 instead of IDE_HFLAG_TRUST_BIOS_FOR_DMA in
cy82c693_chipset.
* Bump driver version.
As a result of this patch cy82c693 driver will configure and use DMA on
all SWDMA0-2 and MWDMA0-2 capable ATA devices instead of relying on BIOS.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
I2C adapter drivers are supposed to handle retries on nack by themselves
if they do, so there's no point in setting .retries if they don't.
As this retry mechanism is going away (at least in its current form),
clean this up now so that we don't get build failures later.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Mark A. Greer <mgreer@mvista.com>
The motivation for this change is to allow other chips, like the
Marvell Orion ARM SoC family, to use the existing i2c-mv64xxx driver.
Signed-off-by: Tzachi Perelstein <tzachi@marvell.com>
Acked-by: Nicolas Pitre <nico@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Acked-by: Mark A. Greer <mgreer@mvista.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
[SCSI] usbstorage: use last_sector_bug flag universally
[SCSI] libsas: abstract STP task status into a function
[SCSI] ultrastor: clean up inline asm warnings
[SCSI] aic7xxx: fix firmware build
[SCSI] aacraid: fib context lock for management ioctls
[SCSI] ch: remove forward declarations
[SCSI] ch: fix device minor number management bug
[SCSI] ch: handle class_device_create failure properly
[SCSI] NCR5380: fix section mismatch
[SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
[SCSI] IB/iSER: add logical unit reset support
[SCSI] don't use __GFP_DMA for sense buffers if not required
[SCSI] use dynamically allocated sense buffer
[SCSI] scsi.h: add macro for enclosure bit of inquiry data
[SCSI] sd: add fix for devices with last sector access problems
[SCSI] fix pcmcia compile problem
[SCSI] aacraid: add Voodoo Lite class of cards.
[SCSI] aacraid: add new driver features flags
[SCSI] qla2xxx: Update version number to 8.02.00-k7.
[SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
...
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (67 commits)
fix drivers/ata/sata_fsl.c double-decl
[libata] Prefer SCSI_SENSE_BUFFERSIZE to sizeof()
pata_legacy: Merge winbond support
ata_generic: Cenatek support
pata_winbond: error return
pata_serverworks: Fix cable types and cosmetics
pata_mpc52xx: remove un-needed assignment
libata: fix off-by-one in error categorization
ahci: factor out AHCI enabling and enable AHCI before reading CAP
ata_piix: implement SIDPR SCR access
ata_piix: convert to prepare - activate initialization
libata: factor out ata_pci_activate_sff_host() from ata_pci_one()
[libata] Prefer SCSI_SENSE_BUFFERSIZE to sizeof()
pata_legacy: resychronize with upstream changes and resubmit
[libata] pata_legacy: typo fix
[libata] pata_winbond: update for new ->data_xfer hook
pata_pcmcia: convert to new data_xfer prototype
libata annotations and fixes
libata: use dev_driver_string() instead of "libata" in libata-sff.c
ata_piix: kill unused constants and flags
...
This allows others to use the DLM constants without being tied to the
function API of fs/dlm.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (509 commits)
V4L/DVB (7078): radio: fix sf16fmi section mismatch
V4L/DVB (7077): bt878: remove handcrafted PCI subsystem ID check
V4L/DVB (7075): Make a local function static
V4L/DVB (7074): DiB7000P: correct tuning problem for 7MHz channel
V4L/DVB (7073): DiB7070: Reception quality improved
V4L/DVB (7072): sets the MT2060 IF1 frequency according to EEPROM
V4L/DVB (7071): DiB0700: Start streaming the right way
V4L/DVB (7070): Fix some tuning problems
V4L/DVB (7069): Support for myTV.t
V4L/DVB (7068): Add support for WinTV Nova-T-CE driver
V4L/DVB (7067): fix autoserach in the Hauppauge NOVA-T 500
V4L/DVB (7066): ASUS My Cinema U3000 Mini DVBT Tuner
V4L/DVB (7065): Artec T14BR patches
V4L/DVB (7063): xc5000: Fix OOPS caused by missing firmware
V4L/DVB (7062): radio-si570x: Some fixes and new USB ID addition
V4L/DVB (7061): radio-si470x: Some cleanups
V4L/DVB (7060): em28xx: remove has_tuner
V4L/DVB (7059): cx88: Ensure the tuner is reset correctly
V4L/DVB (7058): IR corrections for the Pinnacle 800i
V4L/DVB (7056): tuner: suppress obsolete tuner i2c address warning for XC5000 tuners
...
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (67 commits)
ide: remove redundant DMA blacklist check from __ide_dma_on()
ide: cleanup ide_set_dma()
ide: remove redundant ->ide_dma_on call from set_using_dma()
sc1200: move DMA timings to timing tables
ide: add IDE_HFLAG_ABUSE_SET_DMA_MODE host flag
sis5513: factor out UDMA programming code
pdc202xx_new: move PIO programming code to pdcnew_set_pio_mode()
ide: make 'extra' field in struct ide_port_info u8
ide: kill duplicate code in ide_dump_{ata,atapi}_status()
ide-disk: use ide_get_lba_addr()
ide: printk fix
ide: add ide_tf_read() helper
ide: fix registers loading order in ide_dump_ata_status()
ide-disk: use do_rw_taskfile() (take 2)
ide-disk: add ide_tf_set_cmd() helper
ide-disk: extend timeout for PIO-in commands
ide: remove 'handler' field from ide_task_t (take 2)
ide: use ->data_phase to set ->handler in do_rw_taskfile()
ide: convert do_rw_taskfile() to use ->data_phase
ide: merge flagged_taskfile() into do_rw_taskfile()
...
* Add IDE_HFLAG_ABUSE_SET_DMA_MODE host flag and use it to decide
what to do with transfer modes < XFER_PIO_0 in ide_set_xfer_rate().
* Set IDE_HFLAG_ABUSE_SET_DMA_MODE in host drivers that need it
(aec62xx, amd74xx, cs5520, cs5535, hpt34x, hpt366, pdc202xx_old,
serverworks, tc86c001 and via82cxxx) and cleanup ->set_dma_mode
methods in host drivers that don't (IDE core code guarantees that
->set_dma_mode will be called only for modes which are present
in SWDMA/MWDMA/UDMA masks).
While at it:
* Add IDE_HFLAGS_HPT34X/HPT3XX/PDC202XX/SVWKS define in
hpt34x/hpt366/pdc202xx_old/serverworks host driver.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
The maximum value used currently for 'extra' field in struct ide_port_info
is 240.
Make 'extra' u8 so it packs nicely together with enablebits[] and 'chipset'
fields (ide_pci_enablebit_t is 3 bytes and hwif_chipset_t is 1 byte).
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Export ide_get_lba_addr().
* Convert idedisk_{read_native,set}_max_address() to use ide_get_lba_addr().
* Remove incorrect comment from idedisk_read_native_max_address()
(noticed by Sergei).
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Factor out code reading taskfile registers from ide_end_drive_cmd()
to the new ide_tf_read() helper.
* Add IDE_TFLAG_IN_* taskfile flags to indicate the need to load
particular IDE taskfile register in ide_tf_read().
* Update ide_end_drive_cmd() to set respective IDE_TFLAG_IN_* taksfile flags.
* Add ide_get_lba_addr() for getting LBA sector address from taskfile struct.
* Factor out code getting sector address from ide_dump_ata_status()
to the new ide_dump_sector() function.
* Convert ide_dump_sector() to use ide_tf_read() and ide_get_lba_addr().
* Remove no longer needed ide_read_24().
The only change in functionality caused by this patch is that
ide_dump_ata_status() no longer prints "high"/"low" parts of LBA48
sector address (of course LBA48 sector address is still printed).
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add IDE_TFLAG_DMA_PIO_FALLBACK taskfile flag to indicate the need
to skip loading taskfile registers in do_rw_taskfile().
* Export do_rw_taskfile().
* Convert __ide_do_rw_disk() to use do_rw_taskfile().
* Unexport ide_tf_load().
* Unexport {pre_task_out,task_in}_intr() and make it static.
* Remove incorrect comment about do_rw_taskfile() from <linux/ide.h>.
There should be no functionality changes caused by this patch.
v2:
* Add missing blk_fs_request() check to task_dma_ok() (for VDMA).
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add IDE_TFLAG_CUSTOM_HANDLER taskfile flag and use it for internal requests
which require custom handlers. Check the flag in do_rw_taskfile() and set
handler accordingly.
* Cleanup ide_init_{specify,restore,setmult}_cmd() and rename it to
ide_tf_set_{specify,restore,setmult}_cmd().
* Make {set_geometry,recal,set_multmode}_intr() static.
* Remove no longer needed 'handler' field from ide_task_t.
v2:
* 'handler' in do_rw_taskfile() must be set to NULL initially.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Use ->data_phase to set ->handler in do_rw_taskfile() instead of
setting ->handler in callers of ide_raw_taskfile()/do_rw_taskfile().
* Unexport task_no_data_intr() and make it static.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Use task->data_phase in do_rw_taskfile() to decide what to do.
* task->prehandler is only used by TASKFILE[_MULTI]_OUT so just
use pre_task_out_intr() directly and remove no longer needed
'prehandler' field from ide_task_t.
* Remove no longer needed ide_pre_handler_t type.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Based on the earlier work by Tejun Heo.
task->data_phase == TASKFILE_MULTI_{IN,OUT} vs drive->mult_count == 0
check is needed also for ide_taskfile_ioctl() requests that don't have
IDE_TFLAG_FLAGGED taskfile flag set.
Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add IDE_TFLAG_IN_DATA taskfile flag to indicate the need of reading
IDE_DATA_REG in ide_end_drive_cmd().
Set the new flag in ide_taskfile_ioctl() if ->in_flags.b.data is set.
* Add IDE_TFLAG_FLAGGED_SET_IN_FLAGS taskfile flag to indicate the
need of modifying ->in_flags in ide_taskfile_ioctl().
Set the new flag in flagged_taskfile() and move the code modifying
->tf_in_flags to ide_taskfile_ioctl().
While at it remove the bogus comment: ->tf_in_flags (except .b.data)
have no effect on selection of registers to read.
* Remove no longer needed 'tf_in_flags' field from ide_task_t.
As the result we finally have the internals of HDIO_DRIVE_TASKFILE ioctl
separated from the core IDE code.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add 'data_buf' and 'nsect' variables in ide_taskfile_ioctl()
to cache data buffer pointer and number of sectors to transfer
(this allows us to have only one ide_diag_taskfile() call).
* Add IDE_TFLAG_WRITE taskfile flag and use it to check whether
the REQ_RW request flag should be set.
* Move ->command_type handling from ide_diag_taskfile() to
ide_taskfile_ioctl() and use ->req_cmd instead of ->command_type.
* Add 'nsect' parameter to ide_raw_taskfile().
* Merge ide_diag_taskfile() into ide_raw_taskfile().
* Initialize ->data_phase explicitly in idedisk_prepare_flush(),
ide_start_power_step() and ide_disk_special().
* Remove no longer needed 'command_type' field from ide_task_t.
* Add #ifndef/#endif __KERNEL__ to <linux/hdreg.h> around no
longer used by kernel IDE_DRIVE_TASK_* and TASKFILE_* defines.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Given that:
* hpt366.c::hpt3xx_intrproc() is the only user of hwif->intrproc
* hpt366.c::hpt3xx_quirkproc() sets drive->quirk_list to 1 for quirky drives
which is a value unique to hpt366 host driver
we can remove hwif->intproc and just check for drive->quirk_list == 1
in ide_do_request().
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Add ide_pktcmd_tf_load() helper and convert ATAPI device drivers to use it.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Remove atapi_ireason_t.
While at it:
* replace 'HWIF(drive)' by 'drive->hwif' (or just 'hwif' where possible)
v2:
* v1 had CD and IO bits reversed in many places.
* Use CD and IO defines from <linux/hdreg.h>.
v3:
* Fix incorrect "(ireason & IO) == test_bit()". (Noticed by Sergei)
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Remove ata_nsector_t, ata_data_t (unused) and atapi_bcount_t.
While at it:
* replace 'HWIF(drive)' by 'hwif'
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Remove atapi_feature_t.
While at it:
* replace 'HWIF(drive)' by 'hwif'
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Remove atapi_error_t.
While at it:
* replace 'HWIF(drive)' by 'drive->hwif'
v2:
* Add {ILI,EOM,LFS}_ERR defines to <linux/hdreg.h>.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Remove ata_status_t (unused) and atapi_status_t.
While at it:
* replace 'HWIF(drive)' by 'drive->hwif' (or just 'hwif' where possible)
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
special_t is used only internally by the IDE subsystem (it isn't
related to hardware registers and isn't exported to the user-space).
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Based on the earlier work by Tejun Heo.
All users are gone so we can finally remove it.
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add IDE_TFLAG_OUT_DEVICE taskfile flag to indicate the need of writing
the Device register and handle it in ide_tf_load().
Update ide_tf_load() and {do_rw,flagged}_taskfile() users accordingly.
* Use struct ide_taskfile and ide_tf_load() in execute_drive_cmd().
* Make the debugging code dump all taskfile registers for both
REQ_ATA_TYPE_{CMD,TASK} requests and move it to ide_tf_load()
so it also covers REQ_ATA_TYPE_TASKFILE requests.
There should be no functionality changes caused by this patch
(unless DEBUG is defined).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Remove stale ide.h "configuration options":
* INITIAL_MULT_COUNT - always defined to 0
* SUPPORT_SLOW_DATA_PORTS - unused
* OK_TO_RESET_CONTROLLER - always defined to 1
* DISABLE_IRQ_NOSYNC - always defined to 0
Leave SUPPORT_VLB_SYNC (defined to 0 for CRIS and FRV, otherwise to 1)
for now but disallow overriding it by <asm/ide.h>.
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Based on the earlier work by Tejun Heo.
* Move setting IDE_TFLAG_LBA48 taskfile flag from do_rw_taskfile()
function to the callers.
* Add IDE_TFLAG_FLAGGED taskfile flag for flagged taskfiles coming
from ide_taskfile_ioctl(). Check it instead of ->tf_out_flags.all.
* Add IDE_TFLAG_OUT_DATA taskfile flag to indicate the need to load
IDE data register in ide_tf_load().
* Add IDE_TFLAG_OUT_* taskfile flags to indicate the need to load
particular IDE taskfile registers in ide_tf_load().
* Update do_rw_taskfile() and ide_tf_load() users to set respective
IDE_TFLAG_OUT_* taksfile flags.
* Add task_dma_ok() helper.
* Use IDE_TFLAG_FLAGGED taskfile flag to select HIHI mask in ide_tf_load().
* Use do_rw_taskfile() in flagged_taskfile().
* Remove no longer needed 'tf_out_flags' field from ide_task_t.
There should be no functionality changes caused by this patch.
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add ide_no_data_taskfile() helper and convert ide_raw_taskfile() w/ NO DATA
protocol users to use it instead.
* Set ->data_phase explicitly in ide_no_data_taskfile()
(TASKFILE_NO_DATA is defined as 0x0000).
* Unexport task_no_data_intr().
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Based on the earlier work by Tejun Heo.
* Add 'tf_flags' field (for taskfile flags) to ide_task_t.
* Add IDE_TFLAG_LBA48 taskfile flag for LBA48 taskfiles.
* Add IDE_TFLAG_NO_SELECT_MASK taskfile flag for __ide_do_rw_disk()
which doesn't use SELECT_MASK() (looks like a bug but it requires
some more investigation).
* Split off ide_tf_load() helper from do_rw_taskfile().
* Convert __ide_do_rw_disk() to use ide_tf_load().
There should be no functionality changes caused by this patch.
Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Don't set write-only ide_task_t.hobRegister[6] and ide_task_t.hobRegister[7]
in idedisk_set_max_address_ext().
* Add struct ide_taskfile and use it in ide_task_t instead of tfRegister[]
and hobRegister[].
* Remove no longer needed IDE_CONTROL_OFFSET_HOB define.
* Add #ifndef/#endif __KERNEL__ around definitions of {task,hob}_struct_t.
While at it:
* Use ATA_LBA define for LBA bit (0x40) as suggested by Tejun Heo.
v2:
* Add missing newlines. (Noticed by Sergei)
* Use ~ATA_LBA instead of 0xBF. (Noticed by Sergei)
* Use unnamed unions for error/feature and status/command.
(Suggested by Sergei).
There should be no functionality changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Remove task_ioreg_t typedef from the kernel code (but leave it
in <linux/hdreg.h> for #ifndef/#endif __KERNEL__ case).
While at it also move sata_ioreg_t typedef under #ifndef/#endif __KERNEL__.
v2:
Remove name of the second parameter from ide_execute_command() declaration.
(Noticed by Sergei).
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Convert cmd64x, hpt366 and pdc202xx_old host drivers to use
pci_resource_start(hwif->pci_dev, 4) instead of hwif->dma_master.
* Remove no longer needed ->dma_master field from ide_hwif_t.
v2:
* Use the more readable 'hwif->dma_base - (hwif->channel * 8)' instead of
pci_resource_start(hwif->pci_dev, 4).
v3:
* Use hwif->extra_base in hpt366/pdc20xx_old + some cosmetic fixups over v2
(suggested by Sergei).
v4:
* Correct offsets in hpt3xxn_set_clock().
v5:
* Use hwif->extra_base in hpt366 for _real_ this time. (Noticed by Sergei)
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Right now, the linux kernel (with scheduler statistics enabled) keeps track
of the maximum time a process is waiting to be scheduled. While the maximum
is a very useful metric, tracking average and total is equally useful
(at least for latencytop) to figure out the accumulated effect of scheduler
delays. The accumulated effect is important to judge the performance impact
of scheduler tuning/behavior.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LatencyTOP kernel infrastructure; it measures latencies in the
scheduler and tracks it system wide and per process.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For some crazy reason (trying to work around hw problem in i810) I wanted
to use HZ around 4000.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently all highres=off timers are run from softirq context, but
HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timers expect to run from irq context.
Fix this up by splitting it similar to the highres=on case.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We need to teach no_hz about the rt throttling because its tick driven.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Extend group scheduling to also cover the realtime classes. It uses the time
limiting introduced by the previous patch to allow multiple realtime groups.
The hard time limit is required to keep behaviour deterministic.
The algorithms used make the realtime scheduler O(tg), linear scaling wrt the
number of task groups. This is the worst case behaviour I can't seem to get out
of, the avg. case of the algorithms can be improved, I focused on correctness
and worst case.
[ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Very simple time limit on the realtime scheduling classes.
Allow the rq's realtime class to consume sched_rt_ratio of every
sched_rt_period slice. If the class exceeds this quota the fair class
will preempt the realtime class.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use HR-timers (when available) to deliver an accurate preemption tick.
The regular scheduler tick that runs at 1/HZ can be too coarse when nice
level are used. The fairness system will still keep the cpu utilisation 'fair'
by then delaying the task that got an excessive amount of CPU time but try to
minimize this by delivering preemption points spot-on.
The average frequency of this extra interrupt is sched_latency / nr_latency.
Which need not be higher than 1/HZ, its just that the distribution within the
sched_latency period is important.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Why do we even have cond_resched when real preemption
is on? It seems to be a waste of space and time.
remove cond_resched with CONFIG_PREEMPT on.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce a new rlimit that allows the user to set a runtime timeout on
real-time tasks their slice. Once this limit is exceeded the task will receive
SIGXCPU.
So it measures runtime since the last sleep.
Input and ideas by Thomas Gleixner and Lennart Poettering.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Lennart Poettering <mzxreary@0pointer.de>
CC: Michael Kerrisk <mtk.manpages@googlemail.com>
CC: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the task_struct members specific to rt scheduling together.
A future optimization could be to put sched_entity and sched_rt_entity
into a union.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements a new version of RCU which allows its read-side
critical sections to be preempted. It uses a set of counter pairs
to keep track of the read-side critical sections and flips them
when all tasks exit read-side critical section. The details
of this implementation can be found in this paper -
http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf
and the article-
http://lwn.net/Articles/253651/
This patch was developed as a part of the -rt kernel development and
meant to provide better latencies when read-side critical sections of
RCU don't disable preemption. As a consequence of keeping track of RCU
readers, the readers have a slight overhead (optimizations in the paper).
This implementation co-exists with the "classic" RCU implementations
and can be switched to at compiler.
Also includes RCU tracing summarized in debugfs.
[ akpm@linux-foundation.org: build fixes on non-preempt architectures ]
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Reviewed-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch re-organizes the RCU code to enable multiple implementations
of RCU. Users of RCU continues to include rcupdate.h and the
RCU interfaces remain the same. This is in preparation for
subsequently merging the preemptible RCU implementation.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch makes RCU use softirq instead of tasklets.
It also adds a memory barrier after raising the softirq
inorder to ensure that the cpu sees the most recently updated
value of rcu->cur while processing callbacks.
The discussion of the related theoretical race pointed out
by James Huang can be found here --> http://lkml.org/lkml/2007/11/20/603
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Reviewed-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dmitry Adamushko found that the current implementation of the RT
balancing code left out changes to the sched_setscheduler and
rt_mutex_setprio.
This patch addresses this issue by adding methods to the schedule classes
to handle being switched out of (switched_from) and being switched into
(switched_to) a sched_class. Also a method for changing of priorities
is also added (prio_changed).
This patch also removes some duplicate logic between rt_mutex_setprio and
sched_setscheduler.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To make the main sched.c code more agnostic to the schedule classes.
Instead of having specific hooks in the schedule code for the RT class
balancing. They are replaced with a pre_schedule, post_schedule
and task_wake_up methods. These methods may be used by any of the classes
but currently, only the sched_rt class implements them.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We add the notion of a root-domain which will be used later to rescope
global variables to per-domain variables. Each exclusive cpuset
essentially defines an island domain by fully partitioning the member cpus
from any other cpuset. However, we currently still maintain some
policy/state as global variables which transcend all cpusets. Consider,
for instance, rt-overload state.
Whenever a new exclusive cpuset is created, we also create a new
root-domain object and move each cpu member to the root-domain's span.
By default the system creates a single root-domain with all cpus as
members (mimicking the global state we have today).
We add some plumbing for storing class specific data in our root-domain.
Whenever a RQ is switching root-domains (because of repartitioning) we
give each sched_class the opportunity to remove any state from its old
domain and add state to the new one. This logic doesn't have any clients
yet but it will later in the series.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Christoph Lameter <clameter@sgi.com>
CC: Paul Jackson <pj@sgi.com>
CC: Simon Derr <simon.derr@bull.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The current wake-up code path tries to determine if it can optimize the
wake-up to "this_cpu" by computing load calculations. The problem is that
these calculations are only relevant to SCHED_OTHER tasks where load is king.
For RT tasks, priority is king. So the load calculation is completely wasted
bandwidth.
Therefore, we create a new sched_class interface to help with
pre-wakeup routing decisions and move the load calculation as a function
of CFS task's class.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some RT tasks (particularly kthreads) are bound to one specific CPU.
It is fairly common for two or more bound tasks to get queued up at the
same time. Consider, for instance, softirq_timer and softirq_sched. A
timer goes off in an ISR which schedules softirq_thread to run at RT50.
Then the timer handler determines that it's time to smp-rebalance the
system so it schedules softirq_sched to run. So we are in a situation
where we have two RT50 tasks queued, and the system will go into
rt-overload condition to request other CPUs for help.
This causes two problems in the current code:
1) If a high-priority bound task and a low-priority unbounded task queue
up behind the running task, we will fail to ever relocate the unbounded
task because we terminate the search on the first unmovable task.
2) We spend precious futile cycles in the fast-path trying to pull
overloaded tasks over. It is therefore optimial to strive to avoid the
overhead all together if we can cheaply detect the condition before
overload even occurs.
This patch tries to achieve this optimization by utilizing the hamming
weight of the task->cpus_allowed mask. A weight of 1 indicates that
the task cannot be migrated. We will then utilize this information to
skip non-migratable tasks and to eliminate uncessary rebalance attempts.
We introduce a per-rq variable to count the number of migratable tasks
that are currently running. We only go into overload if we have more
than one rt task, AND at least one of them is migratable.
In addition, we introduce a per-task variable to cache the cpus_allowed
weight, since the hamming calculation is probably relatively expensive.
We only update the cached value when the mask is updated which should be
relatively infrequent, especially compared to scheduling frequency
in the fast path.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this patch extends the soft-lockup detector to automatically
detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
printed the following way:
------------------>
INFO: task prctl:3042 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
prctl D fd5e3793 0 3042 2997
f6050f38 00000046 00000001 fd5e3793 00000009 c06d8264 c06dae80 00000286
f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 00000001 f6050000 00000000
f7e92d00 00000286 f6050f18 c0489d1a f6050f40 00006605 00000000 c0133a5b
Call Trace:
[<c04883a5>] schedule_timeout+0x6d/0x8b
[<c04883d8>] schedule_timeout_uninterruptible+0x15/0x17
[<c0133a76>] msleep+0x10/0x16
[<c0138974>] sys_prctl+0x30/0x1e2
[<c0104c52>] sysenter_past_esp+0x5f/0xa5
=======================
2 locks held by prctl/3042:
#0: (&sb->s_type->i_mutex_key#5){--..}, at: [<c0197d11>] do_fsync+0x38/0x7a
#1: (jbd_handle){--..}, at: [<c01ca3d2>] journal_start+0xc7/0xe9
<------------------
the current default timeout is 120 seconds. Such messages are printed
up to 10 times per bootup. If the system has crashed already then the
messages are not printed.
if lockdep is enabled then all held locks are printed as well.
this feature is a natural extension to the softlockup-detector (kernel
locked up without scheduling) and to the NMI watchdog (kernel locked up
with IRQs disabled).
[ Gautham R Shenoy <ego@in.ibm.com>: CPU hotplug fixes. ]
[ Andrew Morton <akpm@linux-foundation.org>: build warning fix. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
This patch converts the known per-subsystem mutexes to get_online_cpus
put_online_cpus. It also eliminates the CPU_LOCK_ACQUIRE and
CPU_LOCK_RELEASE hotplug notification events.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use
get_online_cpus and put_online_cpus instead as it highlights the
refcount semantics in these operations.
The new API guarantees protection against the cpu-hotplug operation, but
it doesn't guarantee serialized access to any of the local data
structures. Hence the changes needs to be reviewed.
In case of pseries_add_processor/pseries_remove_processor, use
cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the
cpu_present_map there.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements a Refcount + Waitqueue based model for
cpu-hotplug.
Now, a thread which wants to prevent cpu-hotplug, will bump up a global
refcount and the thread which wants to perform a cpu-hotplug operation
will block till the global refcount goes to zero.
The readers, if any, during an ongoing cpu-hotplug operation are blocked
until the cpu-hotplug operation is over.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Paul Jackson <pj@sgi.com> [For !CONFIG_HOTPLUG_CPU ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The current load balancing scheme isn't good enough for precise
group fairness.
For example: on a 8-cpu system, I created 3 groups as under:
a = 8 tasks (cpu.shares = 1024)
b = 4 tasks (cpu.shares = 1024)
c = 3 tasks (cpu.shares = 1024)
a, b and c are task groups that have equal weight. We would expect each
of the groups to receive 33.33% of cpu bandwidth under a fair scheduler.
This is what I get with the latest scheduler git tree:
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--------------------------------------------------------------------------------
Col1 | Col2 | Col3 | Col4
------|---------|-------|-------------------------------------------------------
a | 277.676 | 57.8% | 54.1% 54.1% 54.1% 54.2% 56.7% 62.2% 62.8% 64.5%
b | 116.108 | 24.2% | 47.4% 48.1% 48.7% 49.3%
c | 86.326 | 18.0% | 47.5% 47.9% 48.5%
--------------------------------------------------------------------------------
Explanation of o/p:
Col1 -> Group name
Col2 -> Cumulative execution time (in seconds) received by all tasks of that
group in a 60sec window across 8 cpus
Col3 -> CPU bandwidth received by the group in the 60sec window, expressed in
percentage. Col3 data is derived as:
Col3 = 100 * Col2 / (NR_CPUS * 60)
Col4 -> CPU bandwidth received by each individual task of the group.
Col4 = 100 * cpu_time_recd_by_task / 60
[I can share the test case that produces a similar o/p if reqd]
The deviation from desired group fairness is as below:
a = +24.47%
b = -9.13%
c = -15.33%
which is quite high.
After the patch below is applied, here are the results:
--------------------------------------------------------------------------------
Col1 | Col2 | Col3 | Col4
------|---------|-------|-------------------------------------------------------
a | 163.112 | 34.0% | 33.2% 33.4% 33.5% 33.5% 33.7% 34.4% 34.8% 35.3%
b | 156.220 | 32.5% | 63.3% 64.5% 66.1% 66.5%
c | 160.653 | 33.5% | 85.8% 90.6% 91.4%
--------------------------------------------------------------------------------
Deviation from desired group fairness is as below:
a = +0.67%
b = -0.83%
c = +0.17%
which is far better IMO. Most of other runs have yielded a deviation within
+-2% at the most, which is good.
Why do we see bad (group) fairness with current scheuler?
=========================================================
Currently cpu's weight is just the summation of individual task weights.
This can yield incorrect results. For ex: consider three groups as below
on a 2-cpu system:
CPU0 CPU1
---------------------------
A (10) B(5)
C(5)
---------------------------
Group A has 10 tasks, all on CPU0, Group B and C have 5 tasks each all
of which are on CPU1. Each task has the same weight (NICE_0_LOAD =
1024).
The current scheme would yield a cpu weight of 10240 (10*1024) for each cpu and
the load balancer will think both CPUs are perfectly balanced and won't
move around any tasks. This, however, would yield this bandwidth:
A = 50%
B = 25%
C = 25%
which is not the desired result.
What's changing in the patch?
=============================
- How cpu weights are calculated when CONFIF_FAIR_GROUP_SCHED is
defined (see below)
- API Change
- Two tunables introduced in sysfs (under SCHED_DEBUG) to
control the frequency at which the load balance monitor
thread runs.
The basic change made in this patch is how cpu weight (rq->load.weight) is
calculated. Its now calculated as the summation of group weights on a cpu,
rather than summation of task weights. Weight exerted by a group on a
cpu is dependent on the shares allocated to it and also the number of
tasks the group has on that cpu compared to the total number of
(runnable) tasks the group has in the system.
Let,
W(K,i) = Weight of group K on cpu i
T(K,i) = Task load present in group K's cfs_rq on cpu i
T(K) = Total task load of group K across various cpus
S(K) = Shares allocated to group K
NRCPUS = Number of online cpus in the scheduler domain to
which group K is assigned.
Then,
W(K,i) = S(K) * NRCPUS * T(K,i) / T(K)
A load balance monitor thread is created at bootup, which periodically
runs and adjusts group's weight on each cpu. To avoid its overhead, two
min/max tunables are introduced (under SCHED_DEBUG) to control the rate
at which it runs.
Fixes from: Peter Zijlstra <a.p.zijlstra@chello.nl>
- don't start the load_balance_monitor when there is only a single cpu.
- rename the kthread because its currently longer than TASK_COMM_LEN
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
selinux: make mls_compute_sid always polyinstantiate
security/selinux: constify function pointer tables and fields
security: add a secctx_to_secid() hook
security: call security_file_permission from rw_verify_area
security: remove security_sb_post_mountroot hook
Security: remove security.h include from mm.h
Security: remove security_file_mmap hook sparse-warnings (NULL as 0).
Security: add get, set, and cloning of superblock security information
security/selinux: Add missing "space"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (125 commits)
[CRYPTO] twofish: Merge common glue code
[CRYPTO] hifn_795x: Fixup container_of() usage
[CRYPTO] cast6: inline bloat--
[CRYPTO] api: Set default CRYPTO_MINALIGN to unsigned long long
[CRYPTO] tcrypt: Make xcbc available as a standalone test
[CRYPTO] xcbc: Remove bogus hash/cipher test
[CRYPTO] xcbc: Fix algorithm leak when block size check fails
[CRYPTO] tcrypt: Zero axbuf in the right function
[CRYPTO] padlock: Only reset the key once for each CBC and ECB operation
[CRYPTO] api: Include sched.h for cond_resched in scatterwalk.h
[CRYPTO] salsa20-asm: Remove unnecessary dependency on CRYPTO_SALSA20
[CRYPTO] tcrypt: Add select of AEAD
[CRYPTO] salsa20: Add x86-64 assembly version
[CRYPTO] salsa20_i586: Salsa20 stream cipher algorithm (i586 version)
[CRYPTO] gcm: Introduce rfc4106
[CRYPTO] api: Show async type
[CRYPTO] chainiv: Avoid lock spinning where possible
[CRYPTO] seqiv: Add select AEAD in Kconfig
[CRYPTO] scatterwalk: Handle zero nbytes in scatterwalk_map_and_copy
[CRYPTO] null: Allow setkey on digest_null
...
We have to be able to change individual LEBs for utilities like
ubifsck, ubifstune. For example, ubifsck has to be able to fix
errors on the media, ubifstune has to be able to change the
the superblock, hence this ioctl.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Add the following class iteration functions for driver use:
class_for_each_device
class_find_device
class_for_each_child
class_find_child
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This name is just passed to platform_device_alloc which has its parameter
declared const.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
All kobjects require a dynamically allocated name now. We no longer
need to keep track if the name is statically assigned, we can just
unconditionally free() all kobject names on cleanup.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are no in-kernel users of kobject_unregister() so it should be
removed.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We save the current state in the object itself, so we can do proper
cleanup when the last reference is dropped.
If the initial reference is dropped, the object will be removed from
sysfs if needed, if an "add" event was sent, "remove" will be send, and
the allocated resources are released.
This allows us to clean up some driver core usage as well as allowing us
to do other such changes to the rest of the kernel.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
No one is calling this anymore, so just remove it and hard-code the one
internal-use of it.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The function is no longer used by anyone in the kernel, and it prevents
the proper sending of the kobject uevent after the needed files are set
up by the caller. kobject_init_and_add() can be used in its place.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Now that the old kobject_init() function is gone, rename
kobject_init_ng() to kobject_init() to clean up the namespace.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The old kobject_init() function is on longer in use, so let us remove it
from the public scope (kset mess in the kobject.c file still uses it,
but that can be cleaned up later very simply.)
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Now that the old kobject_add() function is gone, rename kobject_add_ng()
to kobject_add() to clean up the namespace.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The old kobject_add() function is on longer in use, so let us remove it
from the public scope (kset mess in the kobject.c file still uses it,
but that can be cleaned up later very simply.)
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This moves the block devices to /sys/class/block. It will create a
flat list of all block devices, with the disks and partitions in one
directory. For compatibility /sys/block is created and contains symlinks
to the disks.
/sys/class/block
|-- sda -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
|-- sda1 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1
|-- sda10 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda10
|-- sda5 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda5
|-- sda6 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda6
|-- sda7 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda7
|-- sda8 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda8
|-- sda9 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda9
`-- sr0 -> ../../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0
/sys/block/
|-- sda -> ../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
`-- sr0 -> ../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch removes the kobject, and a few other driver-core-only fields
out of struct driver and into the driver core only. Now drivers can be
safely create on the stack or statically (like they currently are.)
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The module driver specific code should belong in the driver core, not in
the kernel/ directory. So move this code. This is done in preparation
for some struct device_driver rework that should be confined to the
driver core code only.
This also lets us keep from exporting these functions, as no external
code should ever be calling it.
Thanks to Andrew Morton for the !CONFIG_MODULES fix.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The iseries driver wants to hang kobjects off of its driver, so, to
preserve backwards compatibility, we need to add a call to the driver
core to allow future changes to work properly.
Hopefully no one uses this function in the future and the iseries_veth
driver authors come to their senses so I can remove this hack...
Cc: Dave Larson <larson1@us.ibm.com>
Cc: Santiago Leon <santil@us.ibm.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is lot like default attributes for devices (and indeed,
a lot of the code is lifted from there).
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
struct bus_type is static everywhere in the kernel. This moves the
kobject in the structure out of it, and a bunch of other private only to
the driver core fields are now moved to a private structure. This lets
us dynamically create the backing kobject properly and gives us the
chance to be able to document to users exactly how to use the struct
bus_type as there are no fields they can improperly access.
Thanks to Kay for the build fixes on this patch.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This allows an easier way to get to the device klist associated with a
struct bus_type (you have three to choose from...) This will make it
easier to move these fields to be dynamic in a future patch.
The only user of this is the PCI core which horribly abuses this
interface to rearrange the order of the pci devices. This should be
done using the existing bus device walking functions, but that's left
for future patches.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This allows an easier way to get to the kset associated with a struct
bus_type (you have three to choose from...) This will make it easier to
move these fields to be dynamic in a future patch.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This isn't used by anything in the driver core, and by no one in the 204
different usages of it in the kernel tree. Remove this field so no one
gets any idea that it is needed to be used.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The uio kobject code is "wierd". This patch should hopefully fix it up
to be sane and not leak memory anymore.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Hans J. Koch <hjk@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
/sys/power should not be a kset, that's overkill. This patch renames it
to power_kset and fixes up all usages of it in the tree.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
These functions are no longer used and are the last remants of the old
subsystem crap. So delete them for good.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
kernel_kset does not need to be a kset, but a much simpler kobject now
that we have kobj_attributes.
We also rename kernel_kset to kernel_kobj to catch all users of this
symbol with a build error instead of an easy-to-ignore build warning.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This macro is no longer used. ksets should be created dynamically with
a call to kset_create_and_add() not declared statically.
Yes, there are 5 remaining static struct kset usages in the kernel tree,
but they will be fixed up soon.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is no firmware "subsystem" it's just a directory in /sys that
other portions of the kernel want to hook into. So make it a kobject
not a kset to help alivate anyone who tries to do some odd kset-like
things with this.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
These functions are no longer called or needed, so we can remove them.
As I rewrote the whole firmware.c file, add my copyright.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Remove the no longer needed subsys_attributes, they are all converted to
the more sensical kobj_attributes.
There is no longer a magic fallback in sysfs attribute operations, all
kobjects which create simple attributes need explicitely a ktype
assigned, which tells the core what was intended here.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Needed for future firmware subsystem cleanups.
In the end, the firmware_register/unregister functions will be deleted
entirely, but we need this symbol so that subsystems can migrate over.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Clean up the use of ksets and kobjects. Kobjects are instances of
objects (like struct user_info), ksets are collections of objects of a
similar type (like the uids directory containing the user_info directories).
So, use kobjects for the user_info directories, and a kset for the "uids"
directory.
On object cleanup, the final kobject_put() was missing.
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add kobj_sysfs_ops to replace subsys_sysfs_ops. There is no
need for special kset operations, we want to be able to use
simple attribute operations at any kobject, not only ksets.
The whole concept of any default sysfs attribute operations
will go away with the upcoming removal of subsys_sysfs_ops.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dynamically create the kset instead of declaring it statically.
Having 3 static kobjects in one structure is not only foolish, but ripe
for nasty race conditions if handled improperly. We also rename the
field to catch any potential users of it (not that there should be
outside of the driver core...)
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dynamically create the kset instead of declaring it statically.
Having 3 static kobjects in one structure is not only foolish, but ripe
for nasty race conditions if handled improperly. We also rename the
field to catch any potential users of it (not that there should be
outside of the driver core...)
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dynamically create the kset instead of declaring it statically. We also
rename power_subsys to power_kset to catch all users of the variable and
we properly export it so that people don't have to guess that it really
is present in the system.
The pseries code is wierd, why is it createing /sys/power if CONFIG_PM
is disabled? Oh well, stupid big boxes ignoring config options...
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dynamically create the kset instead of declaring it statically. We also
rename module_subsys to module_kset to catch all users of the variable.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>