Commit Graph

12658 Commits

Author SHA1 Message Date
Adrian McMenamin
1795cf48b3 sh/maple: clean maple bus code
This patch cleans up the handling of the maple bus queue to remove
the risk of races when adding packets. It also removes references to the
redundant connect and disconnect functions.

Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2008-07-29 22:10:56 +09:00
FUJITA Tomonori
8978b74253 generic, x86: fix add iommu_num_pages helper function
This IOMMU helper function doesn't work for some architectures:

  http://marc.info/?l=linux-kernel&m=121699304403202&w=2

It also breaks POWER and SPARC builds:

  http://marc.info/?l=linux-kernel&m=121730388001890&w=2

Currently, only x86 IOMMUs use this so let's move it to x86 for
now.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-29 12:12:48 +02:00
Avi Kivity
ed84862433 KVM: Advertise synchronized mmu support to userspace
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-29 12:34:02 +03:00
Andrea Arcangeli
e930bffe95 KVM: Synchronize guest physical memory map to host virtual memory map
Synchronize changes to host virtual addresses which are part of
a KVM memory slot to the KVM shadow mmu.  This allows pte operations
like swapping, page migration, and madvise() to transparently work
with KVM.

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-29 12:33:53 +03:00
Linus Torvalds
5dfb66ba8c Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd
* 'for-linus' of git://git.o-hand.com/linux-mfd:
  mfd: accept pure device as a parent, not only platform_device
  mfd: add platform_data to mfd_cell
  mfd: Coding style fixes
  mfd: Use to_platform_device instead of container_of
2008-07-28 18:15:41 -07:00
Linus Torvalds
1d9b9f6a53 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits)
  x86/PCI: use dev_printk when possible
  PCI: add D3 power state avoidance quirk
  PCI: fix bogus "'device' may be used uninitialized" warning in pci_slot
  PCI: add an option to allow ASPM enabled forcibly
  PCI: disable ASPM on pre-1.1 PCIe devices
  PCI: disable ASPM per ACPI FADT setting
  PCI MSI: Don't disable MSIs if the mask bit isn't supported
  PCI: handle 64-bit resources better on 32-bit machines
  PCI: rewrite PCI BAR reading code
  PCI: document pci_target_state
  PCI hotplug: fix typo in pcie hotplug output
  x86 gart: replace to_pages macro with iommu_num_pages
  x86, AMD IOMMU: replace to_pages macro with iommu_num_pages
  iommu: add iommu_num_pages helper function
  dma-coherent: add documentation to new interfaces
  Cris: convert to using generic dma-coherent mem allocator
  Sh: use generic per-device coherent dma allocator
  ARM: support generic per-device coherent dma mem
  Generic dma-coherent: fix DMA_MEMORY_EXCLUSIVE
  x86: use generic per-device dma coherent allocator
  ...
2008-07-28 18:14:24 -07:00
Dmitry Baryshkov
424f525a12 mfd: accept pure device as a parent, not only platform_device
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2008-07-29 01:30:26 +02:00
Hisashi Hifumi
8ab22b9abb vfs: pagecache usage optimization for pagesize!=blocksize
When we read some part of a file through pagecache, if there is a
pagecache of corresponding index but this page is not uptodate, read IO
is issued and this page will be uptodate.

I think this is good for pagesize == blocksize environment but there is
room for improvement on pagesize != blocksize environment.  Because in
this case a page can have multiple buffers and even if a page is not
uptodate, some buffers can be uptodate.

So I suggest that when all buffers which correspond to a part of a file
that we want to read are uptodate, use this pagecache and copy data from
this pagecache to user buffer even if a page is not uptodate.  This can
reduce read IO and improve system throughput.

I wrote a benchmark program and got result number with this program.

This benchmark do:

  1: mount and open a test file.

  2: create a 512MB file.

  3: close a file and umount.

  4: mount and again open a test file.

  5: pwrite randomly 300000 times on a test file.  offset is aligned
     by IO size(1024bytes).

  6: measure time of preading randomly 100000 times on a test file.

The result was:
	2.6.26
        330 sec

	2.6.26-patched
        226 sec

Arch:i386
Filesystem:ext3
Blocksize:1024 bytes
Memory: 1GB

On ext3/4, a file is written through buffer/block.  So random read/write
mixed workloads or random read after random write workloads are optimized
with this patch under pagesize != blocksize environment.  This test result
showed this.

The benchmark program is as follows:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>

#define LEN 1024
#define LOOP 1024*512 /* 512MB */

main(void)
{
	unsigned long i, offset, filesize;
	int fd;
	char buf[LEN];
	time_t t1, t2;

	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount\n");
		exit(1);
	}
	memset(buf, 0, LEN);
	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC);
	if (fd < 0) {
		perror("cannot open file\n");
		exit(1);
	}
	for (i = 0; i < LOOP; i++)
		write(fd, buf, LEN);
	close(fd);
	if (umount("/root/test1/") < 0) {
		perror("cannot umount\n");
		exit(1);
	}
	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount\n");
		exit(1);
	}
	fd = open("/root/test1/testfile", O_RDWR);
	if (fd < 0) {
		perror("cannot open file\n");
		exit(1);
	}

	filesize = LEN * LOOP;
	for (i = 0; i < 300000; i++){
		offset = (random() % filesize) & (~(LEN - 1));
		pwrite(fd, buf, LEN, offset);
	}
	printf("start test\n");
	time(&t1);
	for (i = 0; i < 100000; i++){
		offset = (random() % filesize) & (~(LEN - 1));
		pread(fd, buf, LEN, offset);
	}
	time(&t2);
	printf("%ld sec\n", t2-t1);
	close(fd);
	if (umount("/root/test1/") < 0) {
		perror("cannot umount\n");
		exit(1);
	}
}

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@ucw.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-28 16:30:21 -07:00
Andrea Arcangeli
cddb8a5c14 mmu-notifiers: core
With KVM/GFP/XPMEM there isn't just the primary CPU MMU pointing to pages.
 There are secondary MMUs (with secondary sptes and secondary tlbs) too.
sptes in the kvm case are shadow pagetables, but when I say spte in
mmu-notifier context, I mean "secondary pte".  In GRU case there's no
actual secondary pte and there's only a secondary tlb because the GRU
secondary MMU has no knowledge about sptes and every secondary tlb miss
event in the MMU always generates a page fault that has to be resolved by
the CPU (this is not the case of KVM where the a secondary tlb miss will
walk sptes in hardware and it will refill the secondary tlb transparently
to software if the corresponding spte is present).  The same way
zap_page_range has to invalidate the pte before freeing the page, the spte
(and secondary tlb) must also be invalidated before any page is freed and
reused.

Currently we take a page_count pin on every page mapped by sptes, but that
means the pages can't be swapped whenever they're mapped by any spte
because they're part of the guest working set.  Furthermore a spte unmap
event can immediately lead to a page to be freed when the pin is released
(so requiring the same complex and relatively slow tlb_gather smp safe
logic we have in zap_page_range and that can be avoided completely if the
spte unmap event doesn't require an unpin of the page previously mapped in
the secondary MMU).

The mmu notifiers allow kvm/GRU/XPMEM to attach to the tsk->mm and know
when the VM is swapping or freeing or doing anything on the primary MMU so
that the secondary MMU code can drop sptes before the pages are freed,
avoiding all page pinning and allowing 100% reliable swapping of guest
physical address space.  Furthermore it avoids the code that teardown the
mappings of the secondary MMU, to implement a logic like tlb_gather in
zap_page_range that would require many IPI to flush other cpu tlbs, for
each fixed number of spte unmapped.

To make an example: if what happens on the primary MMU is a protection
downgrade (from writeable to wrprotect) the secondary MMU mappings will be
invalidated, and the next secondary-mmu-page-fault will call
get_user_pages and trigger a do_wp_page through get_user_pages if it
called get_user_pages with write=1, and it'll re-establishing an updated
spte or secondary-tlb-mapping on the copied page.  Or it will setup a
readonly spte or readonly tlb mapping if it's a guest-read, if it calls
get_user_pages with write=0.  This is just an example.

This allows to map any page pointed by any pte (and in turn visible in the
primary CPU MMU), into a secondary MMU (be it a pure tlb like GRU, or an
full MMU with both sptes and secondary-tlb like the shadow-pagetable layer
with kvm), or a remote DMA in software like XPMEM (hence needing of
schedule in XPMEM code to send the invalidate to the remote node, while no
need to schedule in kvm/gru as it's an immediate event like invalidating
primary-mmu pte).

At least for KVM without this patch it's impossible to swap guests
reliably.  And having this feature and removing the page pin allows
several other optimizations that simplify life considerably.

Dependencies:

1) mm_take_all_locks() to register the mmu notifier when the whole VM
   isn't doing anything with "mm".  This allows mmu notifier users to keep
   track if the VM is in the middle of the invalidate_range_begin/end
   critical section with an atomic counter incraese in range_begin and
   decreased in range_end.  No secondary MMU page fault is allowed to map
   any spte or secondary tlb reference, while the VM is in the middle of
   range_begin/end as any page returned by get_user_pages in that critical
   section could later immediately be freed without any further
   ->invalidate_page notification (invalidate_range_begin/end works on
   ranges and ->invalidate_page isn't called immediately before freeing
   the page).  To stop all page freeing and pagetable overwrites the
   mmap_sem must be taken in write mode and all other anon_vma/i_mmap
   locks must be taken too.

2) It'd be a waste to add branches in the VM if nobody could possibly
   run KVM/GRU/XPMEM on the kernel, so mmu notifiers will only enabled if
   CONFIG_KVM=m/y.  In the current kernel kvm won't yet take advantage of
   mmu notifiers, but this already allows to compile a KVM external module
   against a kernel with mmu notifiers enabled and from the next pull from
   kvm.git we'll start using them.  And GRU/XPMEM will also be able to
   continue the development by enabling KVM=m in their config, until they
   submit all GRU/XPMEM GPLv2 code to the mainline kernel.  Then they can
   also enable MMU_NOTIFIERS in the same way KVM does it (even if KVM=n).
   This guarantees nobody selects MMU_NOTIFIER=y if KVM and GRU and XPMEM
   are all =n.

The mmu_notifier_register call can fail because mm_take_all_locks may be
interrupted by a signal and return -EINTR.  Because mmu_notifier_reigster
is used when a driver startup, a failure can be gracefully handled.  Here
an example of the change applied to kvm to register the mmu notifiers.
Usually when a driver startups other allocations are required anyway and
-ENOMEM failure paths exists already.

 struct  kvm *kvm_arch_create_vm(void)
 {
        struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
+       int err;

        if (!kvm)
                return ERR_PTR(-ENOMEM);

        INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);

+       kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops;
+       err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm);
+       if (err) {
+               kfree(kvm);
+               return ERR_PTR(err);
+       }
+
        return kvm;
 }

mmu_notifier_unregister returns void and it's reliable.

The patch also adds a few needed but missing includes that would prevent
kernel to compile after these changes on non-x86 archs (x86 didn't need
them by luck).

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix mm/filemap_xip.c build]
[akpm@linux-foundation.org: fix mm/mmu_notifier.c build]
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kanoj Sarcar <kanojsarcar@yahoo.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Chris Wright <chrisw@redhat.com>
Cc: Marcelo Tosatti <marcelo@kvack.org>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Izik Eidus <izike@qumranet.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-28 16:30:21 -07:00
Andrea Arcangeli
7906d00cd1 mmu-notifiers: add mm_take_all_locks() operation
mm_take_all_locks holds off reclaim from an entire mm_struct.  This allows
mmu notifiers to register into the mm at any time with the guarantee that
no mmu operation is in progress on the mm.

This operation locks against the VM for all pte/vma/mm related operations
that could ever happen on a certain mm.  This includes vmtruncate,
try_to_unmap, and all page faults.

The caller must take the mmap_sem in write mode before calling
mm_take_all_locks().  The caller isn't allowed to release the mmap_sem
until mm_drop_all_locks() returns.

mmap_sem in write mode is required in order to block all operations that
could modify pagetables and free pages without need of altering the vma
layout (for example populate_range() with nonlinear vmas).  It's also
needed in write mode to avoid new anon_vmas to be associated with existing
vmas.

A single task can't take more than one mm_take_all_locks() in a row or it
would deadlock.

mm_take_all_locks() and mm_drop_all_locks are expensive operations that
may have to take thousand of locks.

mm_take_all_locks() can fail if it's interrupted by signals.

When mmu_notifier_register returns, we must be sure that the driver is
notified if some task is in the middle of a vmtruncate for the 'mm' where
the mmu notifier was registered (mmu_notifier_invalidate_range_start/end
is run around the vmtruncation but mmu_notifier_register can run after
mmu_notifier_invalidate_range_start and before
mmu_notifier_invalidate_range_end).  Same problem for rmap paths.  And
we've to remove page pinning to avoid replicating the tlb_gather logic
inside KVM (and GRU doesn't work well with page pinning regardless of
needing tlb_gather), so without mm_take_all_locks when vmtruncate frees
the page, kvm would have no way to notice that it mapped into sptes a page
that is going into the freelist without a chance of any further
mmu_notifier notification.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kanoj Sarcar <kanojsarcar@yahoo.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Chris Wright <chrisw@redhat.com>
Cc: Marcelo Tosatti <marcelo@kvack.org>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Izik Eidus <izike@qumranet.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-28 16:30:21 -07:00
Andrea Arcangeli
6beeac76f5 mmu-notifiers: add list_del_init_rcu()
Introduce list_del_init_rcu() and document it.

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kanoj Sarcar <kanojsarcar@yahoo.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Chris Wright <chrisw@redhat.com>
Cc: Marcelo Tosatti <marcelo@kvack.org>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Izik Eidus <izike@qumranet.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-28 16:30:20 -07:00
Mike Rapoport
56edb58be1 mfd: add platform_data to mfd_cell
Adding platform_data to mfd_cell allows passing of platform data directly
to the platform_device created for each cell and thus reuse of existing
drivers.
On the other side it can be used as a hook to mfd_cell itself
removing the need in mfd_get_cell method.

Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Acked-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2008-07-29 01:23:32 +02:00
Alan Cox
979b1791e5 PCI: add D3 power state avoidance quirk
Libata has some hacks to deal with certain controllers going silly in D3
state. The right way to handle this is to keep a PCI device flag for
such devices. That can then be generalised for no ATA devices with power
problems.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-07-28 15:12:11 -07:00
Shaohua Li
149e16372a PCI: disable ASPM on pre-1.1 PCIe devices
Disable ASPM on pre-1.1 PCIe devices, as many of them don't implement it
correctly.

Tested-by: Jack Howarth <howarth@bromo.msbb.uc.edu>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-07-28 14:56:57 -07:00
Shaohua Li
5fde244d39 PCI: disable ASPM per ACPI FADT setting
The ACPI FADT table includes an ASPM control bit. If the bit is set, do
not enable ASPM since it may indicate that the platform doesn't actually
support the feature.

Tested-by: Jack Howarth <howarth@bromo.msbb.uc.edu>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-07-28 14:56:09 -07:00
Ingo Molnar
9e3ee1c39c Merge branch 'linus' into cpus4096
Conflicts:

	kernel/stop_machine.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 23:32:00 +02:00
Jesse Barnes
29111f579f Merge branch 'x86/iommu' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into for-linus 2008-07-28 14:31:10 -07:00
Linus Torvalds
e56b3bc794 cpu masks: optimize and clean up cpumask_of_cpu()
Clean up and optimize cpumask_of_cpu(), by sharing all the zero words.

Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns
creating a huge array of constant bitmasks, realize that the zero words
can be shared.

In other words, on a 64-bit architecture, we only ever need 64 of these
arrays - with a different bit set in one single world (with enough zero
words around it so that we can create any bitmask by just offsetting in
that big array). And then we just put enough zeroes around it that we
can point every single cpumask to be one of those things.

So when we have 4k CPU's, instead of having 4k arrays (of 4k bits each,
with one bit set in each array - 2MB memory total), we have exactly 64
arrays instead, each 8k bits in size (64kB total).

And then we just point cpumask(n) to the right position (which we can
calculate dynamically). Once we have the right arrays, getting
"cpumask(n)" ends up being:

  static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
  {
          const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
          p -= cpu / BITS_PER_LONG;
          return (const cpumask_t *)p;
  }

This brings other advantages and simplifications as well:

 - we are not wasting memory that is just filled with a single bit in
   various different places

 - we don't need all those games to re-create the arrays in some dense
   format, because they're already going to be dense enough.

if we compile a kernel for up to 4k CPU's, "wasting" that 64kB of memory
is a non-issue (especially since by doing this "overlapping" trick we
probably get better cache behaviour anyway).

[ mingo@elte.hu:

  Converted Linus's mails into a commit. See:

     http://lkml.org/lkml/2008/7/27/156
     http://lkml.org/lkml/2008/7/28/320

  Also applied a family filter - which also has the side-effect of leaving
  out the bits where Linus calls me an idio... Oh, never mind ;-)
]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 22:20:41 +02:00
Ingo Molnar
414f746d23 Merge branch 'linus' into cpus4096 2008-07-28 21:14:43 +02:00
Linus Torvalds
f934fb19ef Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: add driver for Atmel integrated touchscreen controller
  Input: ads7846 - optimize order of calculating Rt in ads7846_rx()
  Input: ads7846 - fix sparse endian warnings
  Input: uinput - remove duplicate include
  Input: serio - offload resume to kseriod
  Input: serio - mark serio_register_driver() __must_check
2008-07-28 09:59:26 -07:00
Ben Dooks
7f71ac9374 mfd: Coding style fixes
Fix some coding style fixes in the mfd core driver.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2008-07-28 18:29:09 +02:00
Linus Torvalds
d9089c296b Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
  powerpc: Disable 64K hugetlb support when doing 64K SPU mappings
  powerpc/powermac: Fixup default serial port device for pmac_zilog
  powerpc/powermac: Use sane default baudrate for SCC debugging
  powerpc/mm: Implement _PAGE_SPECIAL & pte_special() for 64-bit
  powerpc: Show processor cache information in sysfs
  powerpc: Make core id information available to userspace
  powerpc: Make core sibling information available to userspace
  powerpc/vio: More fallout from dma_mapping_error API change
  ibmveth: Fix multiple errors with dma_mapping_error conversion
  powerpc/pseries: Fix CMO sysdev attribute API change fallout
  powerpc: Enable tracehook for the architecture
  powerpc: Add TIF_NOTIFY_RESUME support for tracehook
  powerpc: Add asm/syscall.h with the tracehook entry points
  powerpc: Make syscall tracing use tracehook.h helpers
  powerpc: Call tracehook_signal_handler() when setting up signal frames
  powerpc: Update cpu_sibling_maps dynamically
  powerpc: register_cpu_online should be __cpuinit
  powerpc: kill useless SMT code in prom_hold_cpus
  powerpc: Fix 8xx build failure
  powerpc: Fix vio build warnings
  ...
2008-07-28 09:05:35 -07:00
Linus Torvalds
b10a8b7238 Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (72 commits)
  sh: SuperH Mobile CEU and camera platform data for AP325RXA
  sh: Update smc911x platform data for AP325RXA
  sh: SuperH Mobile LCDC platform data for AP325RXA
  sh: Add SuperH Mobile CEU platform data for Migo-R
  sh: Add SuperH Mobile LCDC platform data for Migo-R
  sh: Move asid_cache() out of ifdef to fix SH-3/4 nommu build.
  sh: Workaround for __put_user_asm() bug with gcc 4.x on big-endian.
  sh: Wire up new syscalls.
  sh: fix uImage Entry Point
  sh_keysc: remove request_mem_region() and release_mem_region()
  sh: Don't miss pending signals returning to user mode after signal processing
  sh: Use clk_always_enable() on sh7366
  sh: Use clk_always_enable() on sh7343 / SE77343
  sh: Use clk_always_enable() on sh7722 / Migo-R / SE7722
  sh: Use clk_always_enable() on sh7723 / ap325rxa
  sh: Introduce clk_always_enable() function
  sh: Show all clocks and their state in /proc/clocks
  sh: Merge sh7343 and sh7722 clock code
  sh: Add SuperH Mobile MSTPCR bits to clock framework
  sh: Use arch_flags to simplify sh7722 siu clock code
  ...
2008-07-28 08:41:13 -07:00
Linus Torvalds
37eaf8c746 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  stop_machine: fix up ftrace.c
  stop_machine: Wean existing callers off stop_machine_run()
  stop_machine(): stop_machine_run() changed to use cpu mask
  Hotplug CPU: don't check cpu_online after take_cpu_down
  Simplify stop_machine
  stop_machine: add ALL_CPUS option
  module: fix build warning with !CONFIG_KALLSYMS
2008-07-28 08:37:46 -07:00
Jeremy Fitzhardinge
d974ae379a generic, memparse(): constify argument
memparse()'s first argument can be const, so it should be.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 15:05:23 +02:00
Adrian McMenamin
306cfd630a maple: tidy maple_driver code by removing redundant connect/disconnect
The connect and disconnect functions are unnecessary - everything they do can be
accomplished in the initial probe - so remove them.

Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2008-07-28 18:10:30 +09:00
Barry Naujok
9403540c06 dcache: Add case-insensitive support d_ci_add() routine
This add a dcache entry to the dcache for lookup, but changing the name
that is associated with the entry rather than the one passed in to the
lookup routine.

First, it sees if the case-exact match already exists in the dcache and
uses it if one exists. Otherwise, it allocates a new node with the new
name and splices it into the dcache.

Original code from ntfs_lookup in fs/ntfs/namei.c by Anton Altaparmakov.

Signed-off-by: Barry Naujok <bnaujok@sgi.com>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Acked-by: Christoph Hellwig <hch@infradead.org>
2008-07-28 16:58:39 +10:00
Benjamin Herrenschmidt
d65d830ca0 Merge commit 'gcl/gcl-next' 2008-07-28 16:30:40 +10:00
Rusty Russell
eeec4fad96 stop_machine(): stop_machine_run() changed to use cpu mask
Instead of a "cpu" arg with magic values NR_CPUS (any cpu) and ~0 (all
cpus), pass a cpumask_t.  Allow NULL for the common case (where we
don't care which CPU the function is run on): temporary cpumask_t's
are usually considered bad for stack space.

This deprecates stop_machine_run, to be removed soon when all the
callers are dead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-28 12:16:30 +10:00
Rusty Russell
ffdb5976c4 Simplify stop_machine
stop_machine creates a kthread which creates kernel threads.  We can
create those threads directly and simplify things a little.  Some care
must be taken with CPU hotunplug, which has special needs, but that code
seems more robust than it was in the past.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
2008-07-28 12:16:29 +10:00
Jason Baron
5c2aed6225 stop_machine: add ALL_CPUS option
-allow stop_mahcine_run() to call a function on all cpus. Calling
 stop_machine_run() with a 'ALL_CPUS' invokes this new behavior.
 stop_machine_run() proceeds as normal until the calling cpu has
 invoked 'fn'. Then, we tell all the other cpus to call 'fn'.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: mingo@elte.hu
CC: akpm@osdl.org
2008-07-28 12:16:28 +10:00
Mauro Carvalho Chehab
c2f90e9536 Merge ../linux-2.6 2008-07-27 22:23:18 -03:00
Andrea Righi
940389b8af task IO accounting: move all IO statistics in struct task_io_accounting
Simplify the code of include/linux/task_io_accounting.h.

It is also more reasonable to have all the task i/o-related statistics in a
single struct (task_io_accounting).

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-27 16:12:28 -07:00
Mauro Carvalho Chehab
eb703027ac Merge ../linux-2.6 2008-07-27 18:11:53 -03:00
Linus Torvalds
837b41b5de Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  firewire: state userland requirements in Kconfig help
  firewire: avoid memleak after phy config transmit failure
  firewire: fw-ohci: TSB43AB22/A dualbuffer workaround
  firewire: queue the right number of data
  firewire: warn on unfinished transactions during card removal
  firewire: small fw_fill_request cleanup
  firewire: fully initialize fw_transaction before marking it pending
  firewire: fix race of bus reset with request transmission
2008-07-27 10:24:06 -07:00
Andrea Righi
5995477ab7 task IO accounting: improve code readability
Put all i/o statistics in struct proc_io_accounting and use inline functions to
initialize and increment statistics, removing a lot of single variable
assignments.

This also reduces the kernel size as following (with CONFIG_TASK_XACCT=y and
CONFIG_TASK_IO_ACCOUNTING=y).

    text    data     bss     dec     hex filename
   11651       0       0   11651    2d83 kernel/exit.o.before
   11619       0       0   11619    2d63 kernel/exit.o.after
   10886     132     136   11154    2b92 kernel/fork.o.before
   10758     132     136   11026    2b12 kernel/fork.o.after

 3082029  807968 4818600 8708597  84e1f5 vmlinux.o.before
 3081869  807968 4818600 8708437  84e155 vmlinux.o.after

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-27 09:58:20 -07:00
Mauro Carvalho Chehab
50cb993ea6 Merge ../linux-2.6 2008-07-27 12:25:57 -03:00
Mauro Carvalho Chehab
9fa0f6db3a V4L/DVB (8522): videodev2: Fix merge conflict
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-07-27 12:24:02 -03:00
Hans Verkuil
c1d7f4f164 V4L/DVB (8524): videodev: copy the VID_TYPE defines to videodev.h
The VID_TYPE defines are V4L1 specific, so copy them back to videodev.h.
In videodev2.h ensure that they are not used in the kernel (you need
to include videodev.h instead) and mark them are deprecated.

Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-07-27 11:07:12 -03:00
Jean-Francois Moine
1250ac6d4a V4L/DVB (8518): gspca: Remove the remaining frame decoding functions from the subdrivers.
SPCA505 and SPCA508 added in the pixel formats.
Decode functions and associated resources removed in spca505, 506 and 508.
The decode routines are now found in the V4L library.

Signed-off-by: Jean-Francois Moine <moinejf@free.fr>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-07-27 11:06:42 -03:00
Mauro Carvalho Chehab
9993e51c0c V4L/DVB (8502): videodev2.h: CodingStyle cleanups
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-07-27 11:06:20 -03:00
Linus Torvalds
8be1a6d6c7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files
  mlx4_core: Add VLAN tag field to WQE control segment struct
  RDMA/nes: CM connection setup/teardown rework
  IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG
  IPoIB/cm: Connected mode is no longer EXPERIMENTAL
  RDMA/ucm: BKL is not needed for ib_ucm_open()
  RDMA/ucma: BKL is not needed for ucma_open()
2008-07-26 20:40:36 -07:00
Linus Torvalds
9ee08c2df4 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (57 commits)
  [MTD] [NAND] subpage read feature as a way to increase performance. 
  CPUFREQ: S3C24XX NAND driver frequency scaling support.
  [MTD][NAND] au1550nd: remove unused variable
  [MTD] jedec_probe: Fix SST 16-bit chip detection
  [MTD][MTDPART] Fix a division by zero bug
  [MTD][MTDPART] Cleanup and document the erase region handling
  [MTD][MTDPART] Handle most checkpatch findings
  [MTD][MTDPART] Seperate main loop from per-partition code in add_mtd_partition
  [MTD] physmap: resume already suspended chips on failure to suspend
  [MTD] physmap: Fix suspend/resume/shutdown bugs.
  [MTD] [NOR] Fix -ETIMEO errors in CFI driver
  [MTD] [NAND] fsl_elbc_nand: fix section mismatch with CONFIG_MTD_OF_PARTS=y
  [JFFS2] Use .unlocked_ioctl
  [MTD] Fix const assignment in the MTD command line partitioning driver
  [MTD] [NOR] gen_probe: No debug message when debugging is disabled
  [MTD] [NAND] remove __PPC__ hardcoded address from DiskOnChip drivers
  [MTD] [MAPS] Remove the bast-flash driver.
  [MTD] [NAND] fsl_elbc_nand: ecclayout cleanups
  [MTD] [NAND] fsl_elbc_nand: implement support for flash-based BBT
  [MTD] [NAND] fsl_elbc_nand: fix OOB workability for large page NAND chips
  ...
2008-07-26 20:30:56 -07:00
Linus Torvalds
eaf0ba5ef6 Merge branch 'tracehook' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace
* 'tracehook' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace:
  tracehook: comment fixes
2008-07-26 20:29:39 -07:00
Linus Torvalds
bdee6ac7d1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
  atmel-mci: debugfs support
  mmc: Add per-card debugfs support
  mmc: Export internal host state through debugfs
  imxmmc: fix crash when no platform data is provided
  imxmmc: fix platform resources
  imxmmc: remove DEBUG definition
  mmc_spi: put signals to low power off fix
2008-07-26 20:27:31 -07:00
Linus Torvalds
4836e30078 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits)
  [PATCH] fix RLIM_NOFILE handling
  [PATCH] get rid of corner case in dup3() entirely
  [PATCH] remove remaining namei_{32,64}.h crap
  [PATCH] get rid of indirect users of namei.h
  [PATCH] get rid of __user_path_lookup_open
  [PATCH] f_count may wrap around
  [PATCH] dup3 fix
  [PATCH] don't pass nameidata to __ncp_lookup_validate()
  [PATCH] don't pass nameidata to gfs2_lookupi()
  [PATCH] new (local) helper: user_path_parent()
  [PATCH] sanitize __user_walk_fd() et.al.
  [PATCH] preparation to __user_walk_fd cleanup
  [PATCH] kill nameidata passing to permission(), rename to inode_permission()
  [PATCH] take noexec checks to very few callers that care
  Re: [PATCH 3/6] vfs: open_exec cleanup
  [patch 4/4] vfs: immutable inode checking cleanup
  [patch 3/4] fat: dont call notify_change
  [patch 2/4] vfs: utimes cleanup
  [patch 1/4] vfs: utimes: move owner check into inode_change_ok()
  [PATCH] vfs: use kstrdup() and check failing allocation
  ...
2008-07-26 20:23:44 -07:00
Linus Torvalds
5c7c204aec Merge git://git.kernel.org/pub/scm/linux/kernel/git/kkeil/ISDN-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/kkeil/ISDN-2.6:
  Add layer1 over IP support
  Add mISDN HFC multiport driver
  Add mISDN HFC PCI driver
  Add mISDN DSP
  Add mISDN core files
  Define AF_ISDN and PF_ISDN
  Add mISDN driver
2008-07-26 20:19:41 -07:00
Linus Torvalds
2284284281 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  netns: fix ip_rt_frag_needed rt_is_expired
  netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences
  netfilter: fix double-free and use-after free
  netfilter: arptables in netns for real
  netfilter: ip{,6}tables_security: fix future section mismatch
  selinux: use nf_register_hooks()
  netfilter: ebtables: use nf_register_hooks()
  Revert "pkt_sched: sch_sfq: dump a real number of flows"
  qeth: use dev->ml_priv instead of dev->priv
  syncookies: Make sure ECN is disabled
  net: drop unused BUG_TRAP()
  net: convert BUG_TRAP to generic WARN_ON
  drivers/net: convert BUG_TRAP to generic WARN_ON
2008-07-26 20:17:56 -07:00
Andrea Righi
510a35d4a4 hugetlb: remove unused variable warning
Remove the following warning when CONFIG_HUGETLB_PAGE is not set:

	ipc/shm.c: In function `shm_get_stat':
	ipc/shm.c:565: warning: unused variable `h'

[akpm@linux-foundation.org: use tabs, not spaces]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 20:16:47 -07:00
Al Viro
3f8206d496 [PATCH] get rid of indirect users of namei.h
fs.h needs path.h, not namei.h; nfs_fs.h doesn't need it at all.
Several places in the tree needed direct include.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:42 -04:00
Al Viro
964bd18362 [PATCH] get rid of __user_path_lookup_open
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:41 -04:00
Al Viro
516e0cc564 [PATCH] f_count may wrap around
make it atomic_long_t; while we are at it, get rid of useless checks in affs,
hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:40 -04:00
Al Viro
2d8f30380a [PATCH] sanitize __user_walk_fd() et.al.
* do not pass nameidata; struct path is all the callers want.
* switch to new helpers:
	user_path_at(dfd, pathname, flags, &path)
	user_path(pathname, &path)
	user_lpath(pathname, &path)
	user_path_dir(pathname, &path)  (fail if not a directory)
  The last 3 are trivial macro wrappers for the first one.
* remove nameidata in callers.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:34 -04:00
Al Viro
f419a2e3b6 [PATCH] kill nameidata passing to permission(), rename to inode_permission()
Incidentally, the name that gives hundreds of false positives on grep
is not a good idea...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:31 -04:00
Miklos Szeredi
9767d74957 [patch 1/4] vfs: utimes: move owner check into inode_change_ok()
Add a new ia_valid flag: ATTR_TIMES_SET, to handle the
UTIMES_OMIT/UTIMES_NOW and UTIMES_NOW/UTIMES_OMIT cases.  In these
cases neither ATTR_MTIME_SET nor ATTR_ATIME_SET is in the flags, yet
the POSIX draft specifies that permission checking is performed the
same way as if one or both of the times was explicitly set to a
timestamp.

See the path "vfs: utimensat(): fix error checking for
{UTIME_NOW,UTIME_OMIT} case" by Michael Kerrisk for the patch
introducing this behavior.

This is a cleanup, as well as allowing filesystems (NFS/fuse/...) to
perform their own permission checking instead of the default.

CC: Ulrich Drepper <drepper@redhat.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:25 -04:00
Li Zefan
88b387824f [PATCH] vfs: use kstrdup() and check failing allocation
- use kstrdup() instead of kmalloc() + memcpy()
- return NULL if allocating ->mnt_devname failed
- mnt_devname should be const

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:24 -04:00
Al Viro
b77b0646ef [PATCH] pass MAY_OPEN to vfs_permission() explicitly
... and get rid of the last "let's deduce mask from nameidata->flags"
bit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:22 -04:00
Al Viro
a110343f0d [PATCH] fix MAY_CHDIR/MAY_ACCESS/LOOKUP_ACCESS mess
* MAY_CHDIR is redundant - it's an equivalent of MAY_ACCESS
* MAY_ACCESS on fuse should affect only the last step of pathname resolution
* fchdir() and chroot() should pass MAY_ACCESS, for the same reason why
  chdir() needs that.
* now that we pass MAY_ACCESS explicitly in all cases, LOOKUP_ACCESS can be
  removed; it has no business being in nameidata.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:21 -04:00
Al Viro
7f2da1e7d0 [PATCH] kill altroot
long overdue...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:20 -04:00
Al Viro
8bb79224b8 [PATCH] permission checks for chdir need special treatment only on the last step
... so we ought to pass MAY_CHDIR to vfs_permission() instead of having
it triggered on every step of preceding pathname resolution.  LOOKUP_CHDIR
is killed by that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:19 -04:00
Miklos Szeredi
db2e747b14 [patch 5/5] vfs: remove mode parameter from vfs_symlink()
Remove the unused mode parameter from vfs_symlink and callers.

Thanks to Tetsuo Handa for noticing.

CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-07-26 20:53:18 -04:00
Miklos Szeredi
2f1936b877 [patch 3/5] vfs: change remove_suid() to file_remove_suid()
All calls to remove_suid() are made with a file pointer, because
(similarly to file_update_time) it is called when the file is written.

Clean up callers by passing in a file instead of a dentry.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-07-26 20:53:16 -04:00
Al Viro
e6305c43ed [PATCH] sanitize ->permission() prototype
* kill nameidata * argument; map the 3 bits in ->flags anybody cares
  about to new MAY_... ones and pass with the mask.
* kill redundant gfs2_iop_permission()
* sanitize ecryptfs_permission()
* fix remaining places where ->permission() instances might barf on new
  MAY_... found in mask.

The obvious next target in that direction is permission(9)

folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:14 -04:00
Al Viro
9043476f72 [PATCH] sanitize proc_sysctl
* keep references to ctl_table_head and ctl_table in /proc/sys inodes
* grab the former during operations, use the latter for access to
  entry if that succeeds
* have ->d_compare() check if table should be seen for one who does lookup;
  that allows us to avoid flipping inodes - if we have the same name resolve
  to different things, we'll just keep several dentries and ->d_compare()
  will reject the wrong ones.
* have ->lookup() and ->readdir() scan the table of our inode first, then
  walk all ctl_table_header and scan ->attached_by for those that are
  attached to our directory.
* implement ->getattr().
* get rid of insane amounts of tree-walking
* get rid of the need to know dentry in ->permission() and of the contortions
  induced by that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:12 -04:00
Al Viro
ae7edecc9b [PATCH] sysctl: keep track of tree relationships
In a sense, that's the heart of the series.  It's based on the following
property of the trees we are actually asked to add: they can be split into
stem that is already covered by registered trees and crown that is entirely
new.  IOW, if a/b and a/c/d are introduced by our tree, then a/c is also
introduced by it.

That allows to associate tree and table entry with each node in the union;
while directory nodes might be covered by many trees, only one will cover
the node by its crown.  And that will allow much saner logics for /proc/sys
in the next patches.  This patch introduces the data structures needed to
keep track of that.

When adding a sysctl table, we find a "parent" one.  Which is to say,
find the deepest node on its stem that already is present in one of the
tables from our table set or its ancestor sets.  That table will be our
parent and that node in it - attachment point.  Add our table to list
anchored in parent, have it refer the parent and contents of attachment
point.  Also remember where its crown lives.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:11 -04:00
Al Viro
f7e6ced406 [PATCH] allow delayed freeing of ctl_table_header
Refcount the sucker; instead of freeing it by the end of unregistration
just drop the refcount and free only when it hits zero.  Make sure that
we _always_ make ->unregistering non-NULL in start_unregistering().

That allows anybody to get a reference to such puppy, preventing its
freeing and reuse.  It does *not* block unregistration.  Anybody who
holds such a reference can
	* try to grab a "use" reference (ctl_head_grab()); that will
succeeds if and only if it hadn't entered unregistration yet.  If it
succeeds, we can use it in all normal ways until we release the "use"
reference (with ctl_head_finish()).  Note that this relies on having
->unregistering become non-NULL in all cases when one starts to unregister
the sucker.
	* keep pointers to ctl_table entries; they *can* be freed if
the entire thing is unregistered.  However, if ctl_head_grab() succeeds,
we know that unregistration had not happened (and will not happen until
ctl_head_finish()) and such pointers can be used safely.

IOW, now we can have inodes under /proc/sys keep references to ctl_table
entries, protecting them with references to ctl_table_header and
grabbing the latter for the duration of operations that require access
to ctl_table.  That won't cause deadlocks, since unregistration will not
be stopped by mere keeping a reference to ctl_table_header.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:09 -04:00
Al Viro
734550921e [PATCH] beginning of sysctl cleanup - ctl_table_set
New object: set of sysctls [currently - root and per-net-ns].
Contains: pointer to parent set, list of tables and "should I see this set?"
method (->is_seen(set)).
Current lists of tables are subsumed by that; net-ns contains such a beast.
->lookup() for ctl_table_root returns pointer to ctl_table_set instead of
that to ->list of that ctl_table_set.

[folded compile fixes by rdd for configs without sysctl]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:08 -04:00
Denys Vlasenko
d2d9648ec6 [PATCH] reuse xxx_fifo_fops for xxx_pipe_fops
Merge fifo and pipe file_operations.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:06 -04:00
Pekka Enberg
93bc4e89c2 netfilter: fix double-free and use-after free
As suggested by Patrick McHardy, introduce a __krealloc() that doesn't
free the original buffer to fix a double-free and use-after-free bug
introduced by me in netfilter that uses RCU.

Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Tested-by: Dieter Ries <clip2@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-26 17:49:33 -07:00
Karsten Keil
af69fb3a8f Add mISDN HFC multiport driver
Enable support for cards with Cologne Chip AG's HFC multiport
chip.

Signed-off-by: Karsten Keil <kkeil@suse.de>
2008-07-27 02:00:43 +02:00
Karsten Keil
960366cf8d Add mISDN DSP
Enable support for digital audio processing capability.
This module may be used for special applications that require
cross connecting of bchannels, conferencing, dtmf decoding
echo cancelation, tone generation, and Blowfish encryption and
decryption.
It may use hardware features if available.

Signed-off-by: Karsten Keil <kkeil@suse.de>
2008-07-27 01:56:38 +02:00
Karsten Keil
1b2b03f8e5 Add mISDN core files
Add mISDN core files

Signed-off-by: Karsten Keil <kkeil@suse.de>
2008-07-27 01:54:58 +02:00
Karsten Keil
04578dd330 Define AF_ISDN and PF_ISDN
Define the address and protocol family value for mISDN.

Signed-off-by: Karsten Keil <kkeil@suse.de>
2008-07-27 01:47:00 +02:00
Haavard Skinnemoen
f4b7f927b5 mmc: Add per-card debugfs support
For each card successfully added to the bus, create a subdirectory under
the host's debugfs root with information about the card.

At the moment, only a single file is added to the card directory for
all cards: "state". It reflects the "state" field in struct mmc_card,
indicating whether the card is present, readonly, etc.

For MMC and SD cards (not SDIO), another file is added: "status".
Reading this file will ask the card about its current status and
return it. This can be useful if the card just refuses to respond to
any commands, which might indicate that the card state is not what the
MMC core thinks it is (due to a missing stop command, for example.)

Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-07-27 01:26:17 +02:00
Haavard Skinnemoen
6edd8ee60a mmc: Export internal host state through debugfs
When CONFIG_DEBUG_FS is set, create a few files under /sys/kernel/debug
containing information about an mmc host's internal state. Currently,
just a single file is created, "ios", which contains information about
the current operating parameters for the bus (clock speed, bus width,
etc.)

Host drivers can add additional files and directories under the host's
root directory by passing the debugfs_root field in struct mmc_host as
the 'parent' parameter to debugfs_create_*.

Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-07-27 01:26:16 +02:00
Roland McGrath
a9906a1919 tracehook: comment fixes
This fixes some typos and errors in <linux/tracehook.h> comments.
No code changes.

Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-26 14:41:26 -07:00
Linus Torvalds
fb3b806144 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, AMD IOMMU: include amd_iommu_last_bdf in device initialization
  x86: fix IBM Summit based systems' phys_cpu_present_map on 32-bit kernels
  x86, RDC321x: remove gpio.h complications
  x86, RDC321x: add to mach-default
  crashdump: fix undefined reference to `elfcorehdr_addr'
  flag parameters: fix compile error of sys_epoll_create1
2008-07-26 13:25:05 -07:00
Adrian Bunk
9580d85f9c drivers/char/rtc.c: make 2 functions static
The following functions can now become static:
 - rtc_interrupt()
 - rtc_get_rtc_time()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Bernhard Walle <bwalle@suse.de>
Acked-by: Paul Gortmaker <p_gortmaker@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:12 -07:00
Adrian Bunk
7c363b8c65 mm/swapfile.c: make code static
This patch makes the following needlessly global code static:
 - swap_lock
 - nr_swapfiles
 - struct swap_list

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:12 -07:00
Adrian Bunk
15f59adae0 make mm/memory.c:print_bad_pte() static
This patch makes the needlessly global print_bad_pte() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:12 -07:00
Adrian Bunk
9d8fddfb17 mm/allocpercpu.c: make 4 functions static
This patch makes the following needlessly global functions static:
 - percpu_depopulate()
 - __percpu_depopulate_mask()
 - percpu_populate()
 - __percpu_populate_mask()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:12 -07:00
Roland McGrath
bbc698636e task_current_syscall
This adds the new function task_current_syscall() on machines where the
asm/syscall.h interface is supported (CONFIG_HAVE_ARCH_TRACEHOOK).  It's
exported for modules to use in the future.  This function safely samples
the state of a blocked thread to collect what system call it is blocked
in, and the six system call argument registers.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:10 -07:00
Roland McGrath
85ba2d862e tracehook: wait_task_inactive
This extends wait_task_inactive() with a new argument so it can be used in
a "soft" mode where it will check for the task changing state unexpectedly
and back off.  There is no change to existing callers.  This lays the
groundwork to allow robust, noninvasive tracing that can try to sample a
blocked thread but back off safely if it wakes up.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Roland McGrath
828c365cc8 tracehook: asm/syscall.h
This adds asm-generic/syscall.h, which documents what a real
asm-ARCH/syscall.h file should define.  This is not used yet, but will
provide all the machine-dependent details of examining a user system call
about to begin, in progress, or just ended.

Each arch should add an asm-ARCH/syscall.h that defines all the entry
points documented in asm-generic/syscall.h, as short inlines if possible.
This lets us write new tracing code that understands user system call
registers, without any new arch-specific work.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Roland McGrath
64b1208d5b tracehook: TIF_NOTIFY_RESUME
This adds tracehook.h inlines to enable a new arch feature in support of
user debugging/tracing.  This is not used yet, but it lays the groundwork
for a debugger to be able to wrangle a task that's possibly running,
without interrupting its syscalls in progress.

Each arch should define TIF_NOTIFY_RESUME, and in their entry.S code treat
it much like TIF_SIGPENDING.  That is, it causes you to take the slow path
when returning to user mode, where you get the full user-mode state
accessible as for signal handling or ptrace.  The arch code should check
TIF_NOTIFY_RESUME after handling TIF_SIGPENDING.  When it's set, clear it
and then call tracehook_notify_resume().

In future, tracing code will call set_notify_resume() when it wants to get
a callback in tracehook_notify_resume().

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Roland McGrath
b787f7ba67 tracehook: force signal_pending()
This defines a new hook tracehook_force_sigpending() that lets tracing
code decide to force TIF_SIGPENDING on in recalc_sigpending().

This is not used yet, so it compiles away to nothing for now.  It lays the
groundwork for new tracing code that can interrupt a task synthetically
without actually sending a signal.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Roland McGrath
2b2a1ff64a tracehook: death
This moves the ptrace logic in task death (exit_notify) into tracehook.h
inlines.  Some code is rearranged slightly to make things nicer.  There is
no change, only cleanup.

There is one hook called with the tasklist_lock write-locked, as ptrace
needs.  There is also a new hook called after exit_state changes and
without locks.  This is a better place for tracing work to be in the
future, since it doesn't delay the whole system with locking.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Roland McGrath
fa00b80b3c tracehook: job control
This defines the tracehook_notify_jctl() hook to formalize the ptrace
effects on the job control notifications.  There is no change, only
cleanup.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Roland McGrath
7bcf6a2ca5 tracehook: get_signal_to_deliver
This defines the tracehook_get_signal() hook to allow tracing code to slip
in before normal signal dequeuing.  This lays the groundwork for new
tracing features that can inject synthetic signals outside the normal
queue or control the disposition of delivered signals.  The calling
convention lets tracehook_get_signal() decide both exactly what will
happen and what signal number to report in the handler/exit.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Roland McGrath
283d7559e7 tracehook: syscall
This adds standard tracehook.h inlines for arch code to call when
TIF_SYSCALL_TRACE has been set.  This replaces having each arch implement
the ptrace guts for its syscall tracing support.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Roland McGrath
445a91d2fe tracehook: tracehook_consider_fatal_signal
This defines tracehook_consider_fatal_signal() has a fine-grained hook for
deciding to skip the special cases for a fatal signal, as ptrace does.
There is no change, only cleanup.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Roland McGrath
35de254dc6 tracehook: tracehook_consider_ignored_signal
This defines tracehook_consider_ignored_signal() has a fine-grained hook
for deciding to prevent the normal short-circuit of sending an ignored
signal, as ptrace does.  There is no change, only cleanup.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Roland McGrath
c45aea2761 tracehook: tracehook_signal_handler
This defines tracehook_signal_handler() as a hook for the arch signal
handling code to call.  It gives ptrace the opportunity to stop for a
pseudo-single-step trap immediately after signal handler setup is done.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Roland McGrath
fa8e26ccd4 tracehook: tracehook_expect_breakpoints
This adds tracehook_expect_breakpoints() as a formal hook for the nommu
code to use for its, "Is text-poking likely?" check at mmap time.  This
names the actual semantics the code means to test, and documents it.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Roland McGrath
0d094efeb1 tracehook: tracehook_tracer_task
This adds the tracehook_tracer_task() hook to consolidate all forms of
"Who is using ptrace on me?" logic.  This is used for "TracerPid:" in
/proc and for permission checks.  We also clean up the selinux code the
called an identical accessor.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:08 -07:00
Roland McGrath
dae33574dc tracehook: release_task
This moves the ptrace-related logic from release_task into tracehook.h and
ptrace.h inlines.  It provides clean hooks both before and after locking
tasklist_lock, for future tracing logic to do more cleanup without the
lock.

This also changes release_task() itself in the rare "zap_leader" case to
set the leader to EXIT_DEAD before iterating.  This maintains the
invariant that release_task() only ever handles a task in EXIT_DEAD.  This
is a common-sense invariant that is already always true except in this one
arcane case of zombie leader whose parent ignores SIGCHLD.

This change is harmless and only costs one store in this one rare case.
It keeps the expected state more consisently sane, which is nicer when
debugging weirdness in release_task().  It also lets some future code in
the tracehook entry points rely on this invariant for bookkeeping.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:08 -07:00
Roland McGrath
daded34be9 tracehook: vfork-done
This moves the PTRACE_EVENT_VFORK_DONE tracing into a tracehook.h inline,
tracehook_report_vfork_done().  The change has no effect, just clean-up.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:08 -07:00
Roland McGrath
09a05394fe tracehook: clone
This moves all the ptrace initialization and tracing logic for task
creation into tracehook.h and ptrace.h inlines.  It reorganizes the code
slightly, but should not change any behavior.

There are four tracehook entry points, at each important stage of task
creation.  This keeps the interface from the core fork.c code fairly
clean, while supporting the complex setup required for ptrace or something
like it.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:08 -07:00
Roland McGrath
30199f5a46 tracehook: exit
This moves the PTRACE_EVENT_EXIT tracing into a tracehook.h inline,
tracehook_report_exec().  The change has no effect, just clean-up.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:08 -07:00
Roland McGrath
6341c393fc tracehook: exec
This moves all the ptrace hooks related to exec into tracehook.h inlines.

This also lifts the calls for tracing out of the binfmt load_binary hooks
into search_binary_handler() after it calls into the binfmt module.  This
change has no effect, since all the binfmt modules' load_binary functions
did the call at the end on success, and now search_binary_handler() does
it immediately after return if successful.  We consolidate the repeated
code, and binfmt modules no longer need to import ptrace_notify().

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:08 -07:00
Roland McGrath
88ac2921a7 tracehook: add linux/tracehook.h
This patch series introduces the "tracehook" interface layer of inlines in
<linux/tracehook.h>.  There are more details in the log entry for patch
01/23 and in the header file comments inside that patch.  Most of these
changes move code around with little or no change, and they should not
break anything or change any behavior.

This sets a new standard for uniform arch support to enable clean
arch-independent implementations of new debugging and tracing stuff,
denoted by CONFIG_HAVE_ARCH_TRACEHOOK.  Patch 20/23 adds that symbol to
arch/Kconfig, with comments listing everything an arch has to do before
setting "select HAVE_ARCH_TRACEHOOK".  These are elaborted a bit at:

	http://sourceware.org/systemtap/wiki/utrace/arch/HowTo

The new inlines that arch code must define or call have detailed kerneldoc
comments in the generic header files that say what is required.

No arch is obligated to do any work, and no arch's build should be broken
by these changes.  There are several steps that each arch should take so
it can set HAVE_ARCH_TRACEHOOK.  Most of these are simple.  Providing this
support will let new things people add for doing debugging and tracing of
user-level threads "just work" for your arch in the future.  For an arch
that does not provide HAVE_ARCH_TRACEHOOK, some new options for such
features will not be available for config.

I have done some arch work and will submit this to the arch maintainers
after the generic tracehook series settles in.  For now, that work is
available in my GIT repositories, and in patch and mbox-of-patches form at
http://people.redhat.com/roland/utrace/2.6-current/

This paves the way for my "utrace" work, to be submitted later.  But it is
not innately tied to that.  I hope that the tracehook series can go in
soon regardless of what eventually does or doesn't go on top of it.  For
anyone implementing any kind of new tracing/debugging plan, or just
understanding all the context of the existing ptrace implementation,
having tracehook.h makes things much easier to find and understand.

This patch:

This adds the new kernel-internal header file <linux/tracehook.h>.  This
is not yet used at all.  The comments in the header introduce what the
following series of patches is about.

The aim is to formalize and consolidate all the places that the core
kernel code and the arch code now ties into the ptrace implementation.

These patches mostly don't cause any functional change.  They just move
the details of ptrace logic out of core code into tracehook.h inlines,
where they are mostly compiled away to the same as before.  All that
changes is that everything is thoroughly documented and any future
reworking of ptrace, or addition of something new, would not have to touch
core code all over, just change the tracehook.h inlines.

The new linux/ptrace.h inlines are used by the following patches in the
new tracehook_*() inlines.  Using these helpers for the ptrace event stops
makes it simple to change or disable the old ptrace implementation of
these stops conditionally later.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:08 -07:00
Alexey Dobriyan
51cc50685a SL*B: drop kmem cache argument from constructor
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres.  Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:07 -07:00
Nick Piggin
19fd623127 mm: spinlock tree_lock
mapping->tree_lock has no read lockers.  convert the lock from an rwlock
to a spinlock.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:06 -07:00
Nick Piggin
e286781d5f mm: speculative page references
If we can be sure that elevating the page_count on a pagecache page will
pin it, we can speculatively run this operation, and subsequently check to
see if we hit the right page rather than relying on holding a lock or
otherwise pinning a reference to the page.

This can be done if get_page/put_page behaves consistently throughout the
whole tree (ie.  if we "get" the page after it has been used for something
else, we must be able to free it with a put_page).

Actually, there is a period where the count behaves differently: when the
page is free or if it is a constituent page of a compound page.  We need
an atomic_inc_not_zero operation to ensure we don't try to grab the page
in either case.

This patch introduces the core locking protocol to the pagecache (ie.
adds page_cache_get_speculative, and tweaks some update-side code to make
it work).

Thanks to Hugh for pointing out an improvement to the algorithm setting
page_count to zero when we have control of all references, in order to
hold off speculative getters.

[kamezawa.hiroyu@jp.fujitsu.com: fix migration_entry_wait()]
[hugh@veritas.com: fix add_to_page_cache]
[akpm@linux-foundation.org: repair a comment]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:06 -07:00
Nick Piggin
47feff2c8e radix-tree: add gang_lookup_slot, gang_lookup_slot_tag
Introduce gang_lookup_slot() and gang_lookup_slot_tag() functions, which
are used by lockless pagecache.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:06 -07:00
Nick Piggin
21cc199baa mm: introduce get_user_pages_fast
Introduce a new get_user_pages_fast mm API, which is basically a
get_user_pages with a less general API (but still tends to be suited to
the common case):

- task and mm are always current and current->mm
- force is always 0
- pages is always non-NULL
- don't pass back vmas

This restricted API can be implemented in a much more scalable way on many
architectures when the ptes are present, by walking the page tables
locklessly (no mmap_sem or page table locks).  When the ptes are not
populated, get_user_pages_fast() could be slower.

This is implemented locklessly on x86, and used in some key direct IO call
sites, in later patches, which provides nearly 10% performance improvement
on a threaded database workload.

Lots of other code could use this too, depending on use cases (eg.  grep
drivers/).  And it might inspire some new and clever ways to use it.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:05 -07:00
Huang Weiyi
080ccd4573 include/linux/aio.h: removed duplicated include
Removed duplicated include <linux/uio.h> in include/linux/aio.h

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Eduard - Gabriel Munteanu
20d8b67c06 relay: add buffer-only channels; useful for early logging
Allows one to create and use a channel with no associated files.  Files
can be initialized later.  This is useful in scenarios such as logging in
early code, before VFS is up.  Therefore, such channels can be created and
used as soon as kmem_cache_init() completed.

This is needed by kmemtrace to do tracing in early kernel code.

[kosaki.motohiro@jp.fujitsu.com: build fix]
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Eduard - Gabriel Munteanu
7babe8db99 Full conversion to early_initcall() interface, remove old interface
A previous patch added the early_initcall(), to allow a cleaner hooking of
pre-SMP initcalls.  Now we remove the older interface, converting all
existing users to the new one.

[akpm@linux-foundation.org: cleanups]
[akpm@linux-foundation.org: build fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Eduard - Gabriel Munteanu
c2147a5092 Better interface for hooking early initcalls
Added early initcall (pre-SMP) support, using an identical interface to
that of regular initcalls.  Functions called from do_pre_smp_initcalls()
could be converted to use this cleaner interface.

This is required by CPU hotplug, because early users have to register
notifiers before going SMP.  One such CPU hotplug user is the relay
interface with buffer-only channels, which needs to register such a
notifier, to be usable in early code.  This in turn is used by kmemtrace.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Huang Ying
89081d17f7 kexec jump: save/restore device state
This patch implements devices state save/restore before after kexec.

This patch together with features in kexec_jump patch can be used for
following:

- A simple hibernation implementation without ACPI support.  You can kexec a
  hibernating kernel, save the memory image of original system and shutdown
  the system.  When resuming, you restore the memory image of original system
  via ordinary kexec load then jump back.

- Kernel/system debug through making system snapshot.  You can make system
  snapshot, jump back, do some thing and make another system snapshot.

- Cooperative multi-kernel/system.  With kexec jump, you can switch between
  several kernels/systems quickly without boot process except the first time.
  This appears like swap a whole kernel/system out/in.

- A general method to call program in physical mode (paging turning
  off). This can be used to invoke BIOS code under Linux.

The following user-space tools can be used with kexec jump:

- kexec-tools needs to be patched to support kexec jump. The patches
  and the precompiled kexec can be download from the following URL:
       source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

- makedumpfile with patches are used as memory image saving tool, it
  can exclude free pages from original kernel memory image file. The
  patches and the precompiled makedumpfile can be download from the
  following URL:
       source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10

- An initramfs image can be used as the root file system of kexeced
  kernel. An initramfs image built with "BuildRoot" can be downloaded
  from the following URL:
       initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz
  All user space tools above are included in the initramfs image.

Usage example of simple hibernation:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_RELOCATABLE=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PM=y
CONFIG_HIBERNATION=y
CONFIG_KEXEC_JUMP=y

2. Build an initramfs image contains kexec-tool and makedumpfile, or
   download the pre-built initramfs image, called rootfs.gz in
   following text.

3. Prepare a partition to save memory image of original kernel, called
   hibernating partition in following text.

4. Boot kernel compiled in step 1 (kernel A).

5. In the kernel A, load kernel compiled in step 1 (kernel B) with
   /sbin/kexec. The shell command line can be as follow:

   /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000
     --mem-max=0xffffff --initrd=rootfs.gz

6. Boot the kernel B with following shell command line:

   /sbin/kexec -e

7. The kernel B will boot as normal kexec. In kernel B the memory
   image of kernel A can be saved into hibernating partition as
   follow:

   jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '='`
   echo $jump_back_entry > kexec_jump_back_entry
   cp /proc/vmcore dump.elf

   Then you can shutdown the machine as normal.

8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
   root file system.

9. In kernel C, load the memory image of kernel A as follow:

   /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf

10. Jump back to the kernel A as follow:

   /sbin/kexec -e

   Then, kernel A is resumed.

Implementation point:

To support jumping between two kernels, before jumping to (executing)
the new kernel and jumping back to the original kernel, the devices
are put into quiescent state, and the state of devices and CPU is
saved. After jumping back from kexeced kernel and jumping to the new
kernel, the state of devices and CPU are restored accordingly. The
devices/CPU state save/restore code of software suspend is called to
implement corresponding function.

Known issues:

- Because the segment number supported by sys_kexec_load is limited,
  hibernation image with many segments may not be load. This is
  planned to be eliminated by adding a new flag to sys_kexec_load to
  make a image can be loaded with multiple sys_kexec_load invoking.

Now, only the i386 architecture is supported.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Huang Ying
3ab8352137 kexec jump
This patch provides an enhancement to kexec/kdump.  It implements the
following features:

- Backup/restore memory used by the original kernel before/after
  kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call program in
physical mode (paging turning off).  This can be used to call BIOS code under
Linux.

kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:

       source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

Usage example of calling some physical mode code and return:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y

2. Build patched kexec-tool or download the pre-built one.

3. Build some physical mode executable named such as "phy_mode"

4. Boot kernel compiled in step 1.

5. Load physical mode executable with /sbin/kexec. The shell command
   line can be as follow:

   /sbin/kexec --load-preserve-context --args-none phy_mode

6. Call physical mode executable with following shell command line:

   /sbin/kexec -e

Implementation point:

To support jumping without reserving memory.  One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page).  When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped.  Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.

C ABI (calling convention) is used as communication protocol between
kernel and called code.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.

Now, only the i386 architecture is supported.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Alex Dubov
17017d8d2c memstick: add "start" and "stop" methods to memstick device
In some cases it may be desirable to ensure that associated driver is not
going to access the media in some period of time.  "start" and "stop"
methods are provided therefore to allow it.

Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Alex Dubov
b77899985b memstick: allow "set_param" method to return an error code
Some controllers (Jmicron, for instance) can report temporal failure
condition during power-on.  It is desirable to account for this using a
return value of "set_param" device method.  The return value can also be
handy to distinguish between supported and unsupported device parameters
in run time.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Adrian Bunk
929dfb24fb parport/share.c: proper externs
This patch adds proper externs for parport_default_timeslice and
parport_default_spintime in include/linux/parport.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:03 -07:00
FUJITA Tomonori
8d8bb39b9e dma-mapping: add the device argument to dma_mapping_error()
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:

This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).

I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread).  So I
CC'ed this to KVM camp.  Comments are appreciated.

A pointer to dma_mapping_ops to struct dev_archdata is added.  If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
NULL, the system-wide dma_ops pointer is used as before.

If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging).  It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.

The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
device.  Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.

The first patch adds the device argument to dma_mapping_error.  The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.

This patch:

dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations.  So we can't have dma_mapping_ops per device.

Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use device
argument.

[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:03 -07:00
Andrew Morton
16d69265b9 uninline arch_pick_mmap_layout()
Fix this, on avr32:

  include/linux/utsname.h:35,
                   from init/main.c:20:
  include/linux/sched.h: In function 'arch_pick_mmap_layout':
  include/linux/sched.h:2149: error: implicit declaration of function 'PAGE_ALIGN'

Reported-by: Adrian Bunk <bunk@kernel.org>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:01 -07:00
Mauro Carvalho Chehab
fdd2a7e2da V4L/DVB (8500a): videotext.h: whitespace cleanup
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-07-26 13:25:25 -03:00
Mike Travis
b8d317d10c cpumask: make cpumask_of_cpu_map generic
If an arch doesn't define cpumask_of_cpu_map, create a generic
statically-initialized one for them.  This allows removal of the buggy
cpumask_of_cpu() macro (&cpumask_of_cpu() gives address of
out-of-scope var).

An arch with NR_CPUS of 4096 probably wants to allocate this itself
based on the actual number of CPUs, since otherwise they're using 2MB
of rodata (1024 cpus means 128k).  That's what
CONFIG_HAVE_CPUMASK_OF_CPU_MAP is for (only x86/64 does so at the
moment).

In future as we support more CPUs, we'll need to resort to a
get_cpu_map()/put_cpu_map() allocation scheme.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 16:40:32 +02:00
Ingo Molnar
10d3285d0b Merge branch 'x86/urgent' into x86/core
Conflicts:

	include/asm-x86/gpio.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 16:30:19 +02:00
Ingo Molnar
6dec3a10a7 Merge branch 'x86/x2apic' into x86/core
Conflicts:

	include/asm-x86/i8259.h
	include/asm-x86/msidef.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 16:29:23 +02:00
Yinghai Lu
0af36739af usb: move ehci reg def
prepare x86: usb debug port early console

move ehci struct def to linux/usrb/ehci_def.h from host/ehci.h

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "Arjan van de Ven" <arjan@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Greg KH" <greg@kroah.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 16:17:01 +02:00
Joerg Roedel
3bc9f79ee1 iommu: add iommu_num_pages helper function
Calculating the number of pages from given address and length numbers is a task
required in multiple IOMMU implementations. So implement this as a generic
function into the IOMMU helper code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 15:43:05 +02:00
Robert Richter
ee648bc77f OProfile: add IBS code macros
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: oprofile-list <oprofile-list@lists.sourceforge.net>
Cc: Barry Kasindorf <barry.kasindorf@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 11:48:04 +02:00
Robert Richter
021f8b75e7 x86: add PCI IDs for AMD Barcelona PCI devices
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: oprofile-list <oprofile-list@lists.sourceforge.net>
Cc: Barry Kasindorf <barry.kasindorf@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 11:47:59 +02:00
Ingo Molnar
36ac26171a crashdump: fix undefined reference to `elfcorehdr_addr'
fix build bug introduced by 95b68dec0d "calgary iommu: use the first
kernels TCE tables in kdump":

arch/x86/kernel/built-in.o: In function `calgary_iommu_init':
(.init.text+0x8399): undefined reference to `elfcorehdr_addr'
arch/x86/kernel/built-in.o: In function `calgary_iommu_init':
(.init.text+0x856c): undefined reference to `elfcorehdr_addr'
arch/x86/kernel/built-in.o: In function `detect_calgary':
(.init.text+0x8c68): undefined reference to `elfcorehdr_addr'
arch/x86/kernel/built-in.o: In function `detect_calgary':
(.init.text+0x8d0c): undefined reference to `elfcorehdr_addr'

make elfcorehdr_addr a generally available symbol.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 11:26:23 +02:00
Ilpo Järvinen
ec34c702ca net: drop unused BUG_TRAP()
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-25 21:45:49 -07:00
Grant Likely
284b018973 spi: Add OF binding support for SPI busses
This patch adds support for populating an SPI bus based on data in the
OF device tree.  This is useful for powerpc platforms which use the
device tree instead of discrete code for describing platform layout.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-07-25 22:34:40 -04:00
Grant Likely
dc87c98e8f spi: split up spi_new_device() to allow two stage registration.
spi_new_device() allocates and registers an spi device all in one swoop.
If the driver needs to add extra data to the spi_device before it is
registered, then this causes problems.  This is needed for OF device
tree support so that the SPI device tree helper can add a pointer to
the device node after the device is allocated, but before the device
is registered.  OF aware SPI devices can then retrieve data out of the
device node to populate a platform data structure.

This patch splits the allocation and registration portions of code out
of spi_new_device() and creates two new functions; spi_alloc_device()
and spi_register_device().  spi_new_device() is modified to use the new
functions for allocation and registration.  None of the existing users
of spi_new_device() should be affected by this change.

Drivers using the new API can forego the use of spi_board_info
structure to describe the device layout and populate data into the
spi_device structure directly.

This change is in preparation for adding an OF device tree parser to
generate spi_devices based on data in the device tree.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
2008-07-25 22:34:29 -04:00
Grant Likely
3f07af494d of: adapt of_find_i2c_driver() to be usable by SPI also
SPI has a similar problem as I2C in that it needs to determine an
appropriate modalias value for each device node.  This patch adapts
the of_i2c of_find_i2c_driver() function to be usable by of_spi also.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-07-25 22:25:13 -04:00
Harvey Harrison
b4615e69b6 sys_paccept definition missing __user annotation
Introduced by commit aaca0bdca5 ("flag
parameters: paccept"):

  net/socket.c:1515:17: error: symbol 'sys_paccept' redeclared with different type (originally declared at include/linux/syscalls.h:413) - incompatible argument 4 (different address spaces)

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 17:28:49 -07:00
Linus Torvalds
ff5d48a6d1 Merge git://git.infradead.org/embedded-2.6
* git://git.infradead.org/embedded-2.6:
  Make console charset translation optional
2008-07-25 12:02:08 -07:00
Linus Torvalds
762b8291be Merge git://git.infradead.org/~dwmw2/random-2.6
* git://git.infradead.org/~dwmw2/random-2.6:
  remove dummy asm/kvm.h files
  firmware: create firmware binaries during 'make modules'.
2008-07-25 12:01:37 -07:00
Johannes Weiner
c6af5e9f8a bootmem: Move node allocation macros back to !HAVE_ARCH_BOOTMEM_NODE
These got unintentionally moved, put them back as x86 provides its own
versions.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 11:36:44 -07:00
Adrian Bunk
7dcf2a9fce remove dummy asm/kvm.h files
This patch removes the dummy asm/kvm.h files on architectures not (yet)
supporting KVM and uses the same conditional headers installation as
already used for a.out.h .

Also removed are superfluous install rules in the s390 and x86 Kbuild
files (they are already in Kbuild.asm).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2008-07-25 14:35:50 -04:00
Linus Torvalds
5047887caf Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (34 commits)
  powerpc: Wireup new syscalls
  Move update_mmu_cache() declaration from tlbflush.h to pgtable.h
  powerpc/pseries: Remove kmalloc call in handling writes to lparcfg
  powerpc/pseries: Update arch vector to indicate support for CMO
  ibmvfc: Add support for collaborative memory overcommit
  ibmvscsi: driver enablement for CMO
  ibmveth: enable driver for CMO
  ibmveth: Automatically enable larger rx buffer pools for larger mtu
  powerpc/pseries: Verify CMO memory entitlement updates with virtual I/O
  powerpc/pseries: vio bus support for CMO
  powerpc/pseries: iommu enablement for CMO
  powerpc/pseries: Add CMO paging statistics
  powerpc/pseries: Add collaborative memory manager
  powerpc/pseries: Utilities to set firmware page state
  powerpc/pseries: Enable CMO feature during platform setup
  powerpc/pseries: Split retrieval of processor entitlement data into a helper routine
  powerpc/pseries: Add memory entitlement capabilities to /proc/ppc64/lparcfg
  powerpc/pseries: Split processor entitlement retrieval and gathering to helper routines
  powerpc/pseries: Remove extraneous error reporting for hcall failures in lparcfg
  powerpc: Fix compile error with binutils 2.15
  ...

Fixed up conflict in arch/powerpc/platforms/52xx/Kconfig manually.
2008-07-25 11:08:17 -07:00
Linus Torvalds
996abf053e Merge branch 'linux-next' of git://git.infradead.org/~dedekind/ubi-2.6
* 'linux-next' of git://git.infradead.org/~dedekind/ubi-2.6: (22 commits)
  UBI: always start the background thread
  UBI: fix gcc warning
  UBI: remove pre-sqnum images support
  UBI: fix kernel-doc errors and warnings
  UBI: fix checkpatch.pl errors and warnings
  UBI: bugfix - do not torture PEB needlessly
  UBI: rework scrubbing messages
  UBI: implement multiple volumes rename
  UBI: fix and re-work debugging stuff
  UBI: amend commentaries
  UBI: fix error message
  UBI: improve mkvol request validation
  UBI: add ubi_sync() interface
  UBI: fix 64-bit calculations
  UBI: fix LEB locking
  UBI: fix memory leak on error path
  UBI: do not forget to free internal volumes
  UBI: fix memory leak
  UBI: avoid unnecessary division operations
  UBI: fix buffer padding
  ...
2008-07-25 11:02:17 -07:00
Arthur Jones
8f421c595a edac: i5100 new intel chipset driver
Preliminary support for the Intel 5100 MCH.  CE and UE errors are reported
along with the current DIMM label information and other memory parameters.

Reasons why this is preliminary:

1) This chip has 2 independent memory controllers which, for best
   perforance, use interleaved accesses to the DDR2 memory.  This
   architecture does not map very well to the current edac data structures
   which depend on symmetric channel access to the interleaved data.
   Without core changes, the best I could do for now is to map both memory
   controllers to different csrows (first all ranks of controller 0, then
   all ranks of controller 1).  Someone much more familiar with the edac
   core than I will probably need to come up with a more general data
   structure to handle the interleaving and de-interleaving of the two
   memory controllers.

2) I have not yet tackled the de-interleaving of the rank/controller
   address space into the physical address space of the CPU.  There is
   nothing fundamentally missing, it is just ending up to be a lot of
   code, and I'd rather keep it separate for now, esp since it doesn't
   work yet...

3) The code depends on a particular i5100 chip select to DIMM mainboard
   chip select mapping.  This mapping seems obvious to me in order to
   support dual and single ranked memory, but it is not unique and DIMM
   labels could be wrong on other mainboards.  There is no way to query
   this mapping that I know of.

4) The code requires that the i5100 is in 32GB mode.  Only 4 ranks per
   controller, 2 ranks per DIMM are supported.  I do not have hardware
   (nor do I expect to have hardware anytime soon) for the 48GB (6 ranks
   per controller) mode.

5) The serial presence detect code should be broken out into a "real"
   i2c driver so that decode-dimms.pl can work.

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Miklos Szeredi
33670fa296 fuse: nfs export special lookups
Implement the get_parent export operation by sending a LOOKUP request with
".." as the name.

Implement looking up an inode by node ID after it has been evicted from
the cache.  This is done by seding a LOOKUP request with "." as the name
(for all file types, not just directories).

The filesystem can set the FUSE_EXPORT_SUPPORT flag in the INIT reply, to
indicate that it supports these special lookups.

Thanks to John Muir for the original implementation of this feature.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: David Teigland <teigland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Miklos Szeredi
bde74e4bc6 locks: add special return value for asynchronous locks
Use a special error value FILE_LOCK_DEFERRED to mean that a locking
operation returned asynchronously.  This is returned by

  posix_lock_file() for sleeping locks to mean that the lock has been
  queued on the block list, and will be woken up when it might become
  available and needs to be retried (either fl_lmops->fl_notify() is
  called or fl_wait is woken up).

  f_op->lock() to mean either the above, or that the filesystem will
  call back with fl_lmops->fl_grant() when the result of the locking
  operation is known.  The filesystem can do this for sleeping as well
  as non-sleeping locks.

This is to make sure, that return values of -EAGAIN and -EINPROGRESS by
filesystems are not mistaken to mean an asynchronous locking.

This also makes error handling in fs/locks.c and lockd/svclock.c slightly
cleaner.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: David Teigland <teigland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:47 -07:00
Keika Kobayashi
016ae219b9 per-task-delay-accounting: update taskstats for memory reclaim delay
Add members for memory reclaim delay to taskstats, and accumulate them in
__delayacct_add_tsk() .

Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:47 -07:00
Keika Kobayashi
873b477177 per-task-delay-accounting: add memory reclaim delay
Sometimes, application responses become bad under heavy memory load.
Applications take a bit time to reclaim memory.  The statistics, how long
memory reclaim takes, will be useful to measure memory usage.

This patch adds accounting memory reclaim to per-task-delay-accounting for
accounting the time of do_try_to_free_pages().

<i.e>

- When System is under low memory load,
  memory reclaim may not occur.

$ free
             total       used       free     shared    buffers     cached
Mem:       8197800    1577300    6620500          0       4808    1516724
-/+ buffers/cache:      55768    8142032
Swap:     16386292          0   16386292

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 5069748  10612 3014060    0    0     0     0    3   26  0  0 100  0
 0  0      0 5069748  10612 3014060    0    0     0     0    4   22  0  0 100  0
 0  0      0 5069748  10612 3014060    0    0     0     0    3   18  0  0 100  0

Measure the time of tar command.

$ ls -s test.dat
1501472 test.dat

$ time tar cvf test.tar test.dat
real    0m13.388s
user    0m0.116s
sys     0m5.304s

$ ./delayget -d -p <pid>
CPU             count     real total  virtual total    delay total
                  428     5528345500     5477116080       62749891
IO              count    delay total
                  338     8078977189
SWAP            count    delay total
                    0              0
RECLAIM         count    delay total
                    0              0

- When system is under heavy memory load
  memory reclaim may occur.

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 7159032  49724   1812   3012    0    0     0     0    3   24  0  0 100  0
 0  0 7159032  49724   1812   3012    0    0     0     0    4   24  0  0 100  0
 0  0 7159032  49848   1812   3012    0    0     0     0    3   22  0  0 100  0

In this case, one process uses more 8G memory
by execution of malloc() and memset().

$ time tar cvf test.tar test.dat
real    1m38.563s        <-  increased by 85 sec
user    0m0.140s
sys     0m7.060s

$ ./delayget -d -p <pid>
CPU             count     real total  virtual total    delay total
                 9021     7140446250     7315277975      923201824
IO              count    delay total
                 8965    90466349669
SWAP            count    delay total
                    3       21036367
RECLAIM         count    delay total
                  740    61011951153

In the later case, the value of RECLAIM is increasing.
So, taskstats can show how much memory reclaim influences TAT.

Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujistu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:47 -07:00
Andrea Righi
297c5d9263 task IO accounting: provide distinct tgid/tid I/O statistics
Report per-thread I/O statistics in /proc/pid/task/tid/io and aggregate
parent I/O statistics in /proc/pid/io.  This approach follows the same
model used to account per-process and per-thread CPU times.

As a practial application, this allows for example to quickly find the top
I/O consumer when a process spawns many child threads that perform the
actual I/O work, because the aggregated I/O statistics can always be found
in /proc/pid/io.

[ Oleg Nesterov points out that we should check that the task is still
  alive before we iterate over the threads, but also says that we can do
  that fixup on top of this later.  - Linus ]

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: Matt Heaton <matt@hostmonster.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Acked-by-with-comments: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:47 -07:00
Pavel Emelyanov
0b6b030fc3 bsdacct: switch from global bsd_acct_struct instance to per-pidns one
Allocate the structure on the first call to sys_acct().  After this each
namespace, that ordered the accounting, will live with this structure till
its own death.

Two notes
- routines, that close the accounting on fs umount time use
  the init_pid_ns's acct by now;
- accounting routine accounts to dying task's namespace
  (also by now).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:47 -07:00
Pavel Emelyanov
20fad13ac6 pidns: add the struct bsd_acct_struct pointer on pid_namespace struct
All the bsdacct-related info will be stored in the area, pointer by this
one.

It will be NULL automatically for all new namespaces.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:46 -07:00
Jonathan Lim
49b5cf3472 accounting: account for user time when updating memory integrals
Adapt acct_update_integrals() to include user time when calculating the time
difference.  The units of acct_rss_mem1 and acct_vm_mem1 are also changed from
pages-jiffies to pages-usecs to avoid calling jiffies_to_usecs() in
xacct_add_tsk() which might overflow.

Signed-off-by: Jonathan Lim <jlim@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:46 -07:00
Pavel Emelyanov
dbda0de526 pidns: remove find_task_by_pid, unused for a long time
It seems to me that it was a mistake marking this function as deprecated
and scheduling it for removal, rather than resolutely removing it after
the last caller's death.

Anyway - better late, then never.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:45 -07:00
Pavel Emelyanov
e49859e71e pidns: remove now unused find_pid function.
This one had the only users so far - the kill_proc, which is removed, so
drop this (invalid in namespaced world) call too.

And of course - erase all references on it from comments.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:45 -07:00
Pavel Emelyanov
19b0cfcca4 pidns: remove now unused kill_proc function
This function operated on a pid_t to kill a task, which is no longer valid
in a containerized system.

It has finally lost all its users and we can safely remove it from the
tree.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:45 -07:00
Richard Kennedy
33166b1ffc shrink struct pid by removing padding on 64 bit builds
When struct pid is built on a 64 bit platform gcc has to insert padding to
maintain the correct alignment, by simply reordering its members the
memory usage shrinks from 88 bytes to 80.

I've successfully run with this patch on my desktop AMD64 machine.

There are no significant kernel size changes to a default config.X86_64
on the latest git v2.6.26-rc1

   text    data     bss     dec     hex filename
5404828  976760  734280 7115868  6c945c vmlinux
5404811  976760  734280 7115851  6c944b vmlinux.pid-patch

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:45 -07:00
Adrian Bunk
3ae4eed34b proper pid{hash,map}_init() prototypes
This patch adds proper prototypes for pid{hash,map}_init() in
include/linux/pid_namespace.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:45 -07:00
Alexey Dobriyan
881adb8535 proc: always do ->release
Current two-stage scheme of removing PDE emphasizes one bug in proc:

		open
				rmmod
				remove_proc_entry
		close

->release won't be called because ->proc_fops were cleared.  In simple
cases it's small memory leak.

For every ->open, ->release has to be done.  List of openers is introduced
which is traversed at remove_proc_entry() if neeeded.

Discussions with Al long ago (sigh).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:44 -07:00
Adrian Bunk
6e644c3126 move proc_kmsg_operations to fs/proc/internal.h
This patch moves the extern of struct proc_kmsg_operations to
fs/proc/internal.h and adds an #include "internal.h" to fs/proc/kmsg.c
so that the latter sees the former.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:44 -07:00
Abdel Benamrouche
d805dda412 fs/partition/check.c: fix return value warning
fs/partitions/check.c:381: warning: ignoring return value of ___device_add___,
  declared with attribute warn_unused_result

[akpm@linux-foundation.org: multiple-return-statements-per-function are evil]
Signed-off-by: Abdel Benamrouche <draconux@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:44 -07:00
Nadia Derbey
9eefe520c8 ipc: do not use a negative value to re-enable msgmni automatic recomputing
This patch proposes an alternative to the "magical
positive-versus-negative number trick" Andrew complained about last week
in http://lkml.org/lkml/2008/6/24/418.

This had been introduced with the patches that scale msgmni to the amount
of lowmem.  With these patches, msgmni has a registered notification
routine that recomputes msgmni value upon memory add/remove or ipc
namespace creation/ removal.

When msgmni is changed from user space (i.e.  value written to the proc
file), that notification routine is unregistered, and the way to make it
registered back is to write a negative value into the proc file.  This is
the "magical positive-versus-negative number trick".

To fix this, a new proc file is introduced: /proc/sys/kernel/auto_msgmni.
This file acts as ON/OFF for msgmni automatic recomputing.

With this patch, the process is the following:
1) kernel boots in "automatic recomputing mode"
   /proc/sys/kernel/msgmni contains the value that has been computed (depends
                           on lowmem)
   /proc/sys/kernel/automatic_msgmni contains "1"

2) echo <val> > /proc/sys/kernel/msgmni
   . sets msg_ctlmni to <val>
   . de-activates automatic recomputing (i.e. if, say, some memory is added
     msgmni won't be recomputed anymore)
   . /proc/sys/kernel/automatic_msgmni now contains "0"

3) echo "0" > /proc/sys/kernel/automatic_msgmni
   . de-activates msgmni automatic recomputing
     this has the same effect as 2) except that msg_ctlmni's value stays
     blocked at its current value)

3) echo "1" > /proc/sys/kernel/automatic_msgmni
   . recomputes msgmni's value based on the current available memory size
     and number of ipc namespaces
   . re-activates automatic recomputing for msgmni.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:42 -07:00
Manfred Spraul
380af1b33b ipc/sem.c: rewrite undo list locking
The attached patch:
- reverses the locking order of ulp->lock and sem_lock:
  Previously, it was first ulp->lock, then inside sem_lock.
  Now it's the other way around.
- converts the undo structure to rcu.

Benefits:
- With the old locking order, IPC_RMID could not kfree the undo structures.
  The stale entries remained in the linked lists and were released later.
- The patch fixes a a race in semtimedop(): if both IPC_RMID and a semget() that
  recreates exactly the same id happen between find_alloc_undo() and sem_lock,
  then semtimedop() would access already kfree'd memory.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:42 -07:00
Manfred Spraul
a1193f8ec0 ipc/sem.c: convert sem_array.sem_pending to struct list_head
sem_array.sem_pending is a double linked list, the attached patch converts
it to struct list_head.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:42 -07:00
Manfred Spraul
2c0c29d414 ipc/sem.c: remove unused entries from struct sem_queue
sem_queue.sma and sem_queue.id were never used, the attached patch removes
them.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:42 -07:00
Manfred Spraul
4daa28f6d8 ipc/sem.c: convert undo structures to struct list_head
The undo structures contain two linked lists, the attached patch replaces
them with generic struct list_head lists.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:42 -07:00
Nadia Derbey
f9c46d6ea5 idr: make idr_find rcu-safe
Make idr_find rcu-safe: it can now be called inside an rcu_read critical
section.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:42 -07:00
Nadia Derbey
944ca05c7b idr: error checking factorization
Do some code factorization in the return code analysis.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:41 -07:00
Nadia Derbey
2027d1abc2 idr: change the idr structure
After scalability problems have been detected when using the sysV ipcs, I have
proposed to use an RCU based implementation of the IDR api instead (see
threads http://lkml.org/lkml/2008/4/11/212 and
http://lkml.org/lkml/2008/4/29/295).

This resulted in many people asking to convert the idr API and make it rcu
safe (because most of the code was duplicated and thus unmaintanable and
unreviewable).

So here is a first attempt.

The important change wrt to the idr API itself is during idr removes: idr
layers are freed after a grace period, instead of being moved to the free
list.

The important change wrt to ipcs, is that idr_find() can now be called
locklessly inside a rcu read critical section.

Here are the results I've got for the pmsg test sent by Manfred:

   2.6.25-rc3-mm1   2.6.25-rc3-mm1+   2.6.25-mm1   Patched 2.6.25-mm1
1         1168441           1064021       876000               947488
2         1094264            921059      1549592              1730685
3         2082520           1738165      1694370              2324880
4         2079929           1695521       404553              2400408
5         2898758            406566       391283              3246580
6         2921417            261275       263249              3752148
7         3308761            126056       191742              4243142
8         3329456            100129       141722              4275780

1st column: stock 2.6.25-rc3-mm1
2nd column: 2.6.25-rc3-mm1 + ipc patches (store ipcs into idrs)
3nd column: stock 2.6.25-mm1
4th column: 2.6.25-mm1 + this pacth series.

This patch:

Add an rcu_head to the idr_layer structure in order to free it after a grace
period.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:41 -07:00
Chandru
95b68dec0d calgary iommu: use the first kernels TCE tables in kdump
kdump kernel fails to boot with calgary iommu and aacraid driver on a x366
box.  The ongoing dma's of aacraid from the first kernel continue to exist
until the driver is loaded in the kdump kernel.  Calgary is initialized
prior to aacraid and creation of new tce tables causes wrong dma's to
occur.  Here we try to get the tce tables of the first kernel in kdump
kernel and use them.  While in the kdump kernel we do not allocate new tce
tables but instead read the base address register contents of calgary
iommu and use the tables that the registers point to.  With these changes
the kdump kernel and hence aacraid now boots normally.

Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:41 -07:00
Oleg Nesterov
3da1c84c00 workqueues: make get_online_cpus() useable for work->func()
workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under
cpu_maps_update_begin().  This means that the multithreaded workqueues
can't use get_online_cpus() due to the possible deadlock, very bad and
very old problem.

Introduce the new state, CPU_POST_DEAD, which is called after
cpu_hotplug_done() but before cpu_maps_update_done().

Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD.
This means that create/destroy functions can't rely on get_online_cpus()
any longer and should take cpu_add_remove_lock instead.

[akpm@linux-foundation.org: fix CONFIG_SMP=n]
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:40 -07:00
Oleg Nesterov
db70089722 workqueues: implement flush_work()
Most of users of flush_workqueue() can be changed to use cancel_work_sync(),
but sometimes we really need to wait for the completion and cancelling is not
an option. schedule_on_each_cpu() is good example.

Add the new helper, flush_work(work), which waits for the completion of the
specific work_struct. More precisely, it "flushes" the result of of the last
queue_work() which is visible to the caller.

For example, this code

	queue_work(wq, work);
	/* WINDOW */
	queue_work(wq, work);

	flush_work(work);

doesn't necessary work "as expected". What can happen in the WINDOW above is

	- wq starts the execution of work->func()

	- the caller migrates to another CPU

now, after the 2nd queue_work() this work is active on the previous CPU, and
at the same time it is queued on another. In this case flush_work(work) may
return before the first work->func() completes.

It is trivial to add another helper

	int flush_work_sync(struct work_struct *work)
	{
		return flush_work(work) || wait_on_work(work);
	}

which works "more correctly", but it has to iterate over all CPUs and thus
it much slower than flush_work().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Acked-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:40 -07:00
Oleg Nesterov
a94e2d408e coredump: kill mm->core_done
Now that we have core_state->dumper list we can use it to wake up the
sub-threads waiting for the coredump completion.

This uglifies the code and .text grows by 47 bytes, but otoh mm_struct
lessens by sizeof(struct completion).  Also, with this change we can
decouple exit_mm() from the coredumping code.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:40 -07:00
Oleg Nesterov
b564daf806 coredump: construct the list of coredumping threads at startup time
binfmt->core_dump() has to iterate over the all threads in system in order
to find the coredumping threads and construct the list using the
GFP_ATOMIC allocations.

With this patch each thread allocates the list node on exit_mm()'s stack and
adds itself to the list.

This allows us to do further changes:

	- simplify ->core_dump()

	- change exit_mm() to clear ->mm first, then wait for ->core_done.
	  this makes the coredumping process visible to oom_kill

	- kill mm->core_done

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:40 -07:00
Oleg Nesterov
c5f1cc8c18 coredump: turn core_state->nr_threads into atomic_t
Turn core_state->nr_threads into atomic_t and kill now unneeded
down_write(&mm->mmap_sem) in exit_mm().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:39 -07:00
Oleg Nesterov
999d9fc167 coredump: move mm->core_waiters into struct core_state
Move mm->core_waiters into "struct core_state" allocated on stack.  This
shrinks mm_struct a little bit and allows further changes.

This patch mostly does s/core_waiters/core_state.  The only essential
change is that coredump_wait() must clear mm->core_state before return.

The coredump_wait()'s path is uglified and .text grows by 30 bytes, this
is fixed by the next patch.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:39 -07:00
Oleg Nesterov
32ecb1f26d coredump: turn mm->core_startup_done into the pointer to struct core_state
mm->core_startup_done points to "struct completion startup_done" allocated
on the coredump_wait()'s stack.  Introduce the new structure, core_state,
which holds this "struct completion".  This way we can add more info
visible to the threads participating in coredump without enlarging
mm_struct.

No changes in affected .o files.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:39 -07:00
Oleg Nesterov
246bb0b1de kill PF_BORROWED_MM in favour of PF_KTHREAD
Kill PF_BORROWED_MM.  Change use_mm/unuse_mm to not play with ->flags, and
do s/PF_BORROWED_MM/PF_KTHREAD/ for a couple of other users.

No functional changes yet.  But this allows us to do further
fixes/cleanups.

oom_kill/ptrace/etc often check "p->mm != NULL" to filter out the
kthreads, this is wrong because of use_mm().  The problem with
PF_BORROWED_MM is that we need task_lock() to avoid races.  With this
patch we can check PF_KTHREAD directly, or use a simple lockless helper:

	/* The result must not be dereferenced !!! */
	struct mm_struct *__get_task_mm(struct task_struct *tsk)
	{
		if (tsk->flags & PF_KTHREAD)
			return NULL;
		return tsk->mm;
	}

Note also ecard_task().  It runs with ->mm != NULL, but it's the kernel
thread without PF_BORROWED_MM.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:39 -07:00
Oleg Nesterov
7b34e4283c introduce PF_KTHREAD flag
Introduce the new PF_KTHREAD flag to mark the kernel threads.  It is set
by INIT_TASK() and copied to the forked childs (we could set it in
kthreadd() along with PF_NOFREEZE instead).

daemonize() was changed as well.  In that case testing of PF_KTHREAD is
racy, but daemonize() is hopeless anyway.

This flag is cleared in do_execve(), before search_binary_handler().
Probably not the best place, we can do this in exec_mmap() or in
start_thread(), or clear it along with PF_FORKNOEXEC.  But I think this
doesn't matter in practice, and if do_execve() fails kthread should die
soon.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:39 -07:00
Oleg Nesterov
364d3c13c1 ptrace: give more respect to SIGKILL
ptrace_stop() has some complicated checks to prevent the scheduling in the
TASK_TRACED state with the pending SIGKILL, but these checks are racy, and
they depend on arch_ptrace_stop_needed().

This patch assumes that the traced task should die asap if it was killed by
SIGKILL, in that case schedule()->signal_pending_state() has no reason to
ignore the TASK_WAKEKILL part of TASK_TRACED, and we can kill this nasty
special case.

Note: do_exit()->ptrace_notify() is special, the killed task can already
dequeue SIGKILL at this point. Another indication that fatal_signal_pending()
is not exactly right.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:39 -07:00
KAMEZAWA Hiroyuki
12b9804419 res_counter: limit change support ebusy
Add an interface to set limit.  This is necessary to memory resource
controller because it shrinks usage at set limit.

Other controllers may not need this interface to shrink usage because
shrinking is not necessary or impossible.

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:37 -07:00
KAMEZAWA Hiroyuki
c9b0ed5148 memcg: helper function for relcaim from shmem.
A new call, mem_cgroup_shrink_usage() is added for shmem handling and
relacing non-standard usage of mem_cgroup_charge/uncharge.

Now, shmem calls mem_cgroup_charge() just for reclaim some pages from
mem_cgroup.  In general, shmem is used by some process group and not for
global resource (like file caches).  So, it's reasonable to reclaim pages
from mem_cgroup where shmem is mainly used.

[hugh@veritas.com: shmem_getpage release page sooner]
[hugh@veritas.com: mem_cgroup_shrink_usage css_put]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:37 -07:00
KAMEZAWA Hiroyuki
69029cd550 memcg: remove refcnt from page_cgroup
memcg: performance improvements

Patch Description
 1/5 ... remove refcnt fron page_cgroup patch (shmem handling is fixed)
 2/5 ... swapcache handling patch
 3/5 ... add helper function for shmem's memory reclaim patch
 4/5 ... optimize by likely/unlikely ppatch
 5/5 ... remove redundunt check patch (shmem handling is fixed.)

Unix bench result.

== 2.6.26-rc2-mm1 + memory resource controller
Execl Throughput                           2915.4 lps   (29.6 secs, 3 samples)
C Compiler Throughput                      1019.3 lpm   (60.0 secs, 3 samples)
Shell Scripts (1 concurrent)               5796.0 lpm   (60.0 secs, 3 samples)
Shell Scripts (8 concurrent)               1097.7 lpm   (60.0 secs, 3 samples)
Shell Scripts (16 concurrent)               565.3 lpm   (60.0 secs, 3 samples)
File Read 1024 bufsize 2000 maxblocks    1022128.0 KBps  (30.0 secs, 3 samples)
File Write 1024 bufsize 2000 maxblocks   544057.0 KBps  (30.0 secs, 3 samples)
File Copy 1024 bufsize 2000 maxblocks    346481.0 KBps  (30.0 secs, 3 samples)
File Read 256 bufsize 500 maxblocks      319325.0 KBps  (30.0 secs, 3 samples)
File Write 256 bufsize 500 maxblocks     148788.0 KBps  (30.0 secs, 3 samples)
File Copy 256 bufsize 500 maxblocks       99051.0 KBps  (30.0 secs, 3 samples)
File Read 4096 bufsize 8000 maxblocks    2058917.0 KBps  (30.0 secs, 3 samples)
File Write 4096 bufsize 8000 maxblocks   1606109.0 KBps  (30.0 secs, 3 samples)
File Copy 4096 bufsize 8000 maxblocks    854789.0 KBps  (30.0 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places         126145.2 lpm   (30.0 secs, 3 samples)

                     INDEX VALUES
TEST                                        BASELINE     RESULT      INDEX

Execl Throughput                                43.0     2915.4      678.0
File Copy 1024 bufsize 2000 maxblocks         3960.0   346481.0      875.0
File Copy 256 bufsize 500 maxblocks           1655.0    99051.0      598.5
File Copy 4096 bufsize 8000 maxblocks         5800.0   854789.0     1473.8
Shell Scripts (8 concurrent)                     6.0     1097.7     1829.5
                                                                 =========
     FINAL SCORE                                                     991.3

== 2.6.26-rc2-mm1 + this set ==
Execl Throughput                           3012.9 lps   (29.9 secs, 3 samples)
C Compiler Throughput                       981.0 lpm   (60.0 secs, 3 samples)
Shell Scripts (1 concurrent)               5872.0 lpm   (60.0 secs, 3 samples)
Shell Scripts (8 concurrent)               1120.3 lpm   (60.0 secs, 3 samples)
Shell Scripts (16 concurrent)               578.0 lpm   (60.0 secs, 3 samples)
File Read 1024 bufsize 2000 maxblocks    1003993.0 KBps  (30.0 secs, 3 samples)
File Write 1024 bufsize 2000 maxblocks   550452.0 KBps  (30.0 secs, 3 samples)
File Copy 1024 bufsize 2000 maxblocks    347159.0 KBps  (30.0 secs, 3 samples)
File Read 256 bufsize 500 maxblocks      314644.0 KBps  (30.0 secs, 3 samples)
File Write 256 bufsize 500 maxblocks     151852.0 KBps  (30.0 secs, 3 samples)
File Copy 256 bufsize 500 maxblocks      101000.0 KBps  (30.0 secs, 3 samples)
File Read 4096 bufsize 8000 maxblocks    2033256.0 KBps  (30.0 secs, 3 samples)
File Write 4096 bufsize 8000 maxblocks   1611814.0 KBps  (30.0 secs, 3 samples)
File Copy 4096 bufsize 8000 maxblocks    847979.0 KBps  (30.0 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places         128148.7 lpm   (30.0 secs, 3 samples)

                     INDEX VALUES
TEST                                        BASELINE     RESULT      INDEX

Execl Throughput                                43.0     3012.9      700.7
File Copy 1024 bufsize 2000 maxblocks         3960.0   347159.0      876.7
File Copy 256 bufsize 500 maxblocks           1655.0   101000.0      610.3
File Copy 4096 bufsize 8000 maxblocks         5800.0   847979.0     1462.0
Shell Scripts (8 concurrent)                     6.0     1120.3     1867.2
                                                                 =========
     FINAL SCORE                                                    1004.6

This patch:

Remove refcnt from page_cgroup().

After this,

 * A page is charged only when !page_mapped() && no page_cgroup is assigned.
	* Anon page is newly mapped.
	* File page is added to mapping->tree.

 * A page is uncharged only when
	* Anon page is fully unmapped.
	* File page is removed from LRU.

There is no change in behavior from user's view.

This patch also removes unnecessary calls in rmap.c which was used only for
refcnt mangement.

[akpm@linux-foundation.org: fix warning]
[hugh@veritas.com: fix shmem_unuse_inode charging]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:37 -07:00
KAMEZAWA Hiroyuki
e8589cc189 memcg: better migration handling
This patch changes page migration under memory controller to use a
different algorithm.  (thanks to Christoph for new idea.)

Before:
 - page_cgroup is migrated from an old page to a new page.
After:
 - a new page is accounted , no reuse of page_cgroup.

Pros:

 - We can avoid compliated lock depndencies and races in migration.

Cons:

 - new param to mem_cgroup_charge_common().

 - mem_cgroup_getref() is added for handling ref_cnt ping-pong.

This version simplifies complicated lock dependency in page migraiton
under memory resource controller.

  new refcnt sequence is following.

a mapped page:
  prepage_migration() ..... +1 to NEW page
  try_to_unmap()      ..... all refs to OLD page is gone.
  move_pages()        ..... +1 to NEW page if page cache.
  remap...            ..... all refs from *map* is added to NEW one.
  end_migration()     ..... -1 to New page.

  page's mapcount + (page_is_cache) refs are added to NEW one.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:37 -07:00
Serge E. Hallyn
e885dcde75 cgroup_clone: use pid of newly created task for new cgroup
cgroup_clone creates a new cgroup with the pid of the task.  This works
correctly for unshare, but for clone cgroup_clone is called from
copy_namespaces inside copy_process, which happens before the new pid is
created.  As a result, the new cgroup was created with current's pid.
This patch:

	1. Moves the call inside copy_process to after the new pid
	   is created
	2. Passes the struct pid into ns_cgroup_clone (as it is not
	   yet attached to the task)
	3. Passes a name from ns_cgroup_clone() into cgroup_clone()
	   so as to keep cgroup_clone() itself simpler
	4. Uses pid_vnr() to get the process id value, so that the
	   pid used to name the new cgroup is always the pid as it
	   would be known to the task which did the cloning or
	   unsharing.  I think that is the most intuitive thing to
	   do.  This way, task t1 does clone(CLONE_NEWPID) to get
	   t2, which does clone(CLONE_NEWPID) to get t3, then the
	   cgroup for t3 will be named for the pid by which t2 knows
	   t3.

(Thanks to Dan Smith for finding the main bug)

Changelog:
	June 11: Incorporate Paul Menage's feedback:  don't pass
	         NULL to ns_cgroup_clone from unshare, and reduce
		 patch size by using 'nodename' in cgroup_clone.
	June 10: Original version

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge Hallyn <serge@us.ibm.com>
Acked-by: Paul Menage <menage@google.com>
Tested-by: Dan Smith <danms@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:37 -07:00
Paul Menage
856c13aa1f cgroup files: convert res_counter_write() to be a cgroups write_string() handler
Currently res_counter_write() is a raw file handler even though it's
ultimately taking a number, since in some cases it wants to
pre-process the string when converting it to a number.

This patch converts res_counter_write() from a raw file handler to a
write_string() handler; this allows some of the boilerplate
copying/locking/checking to be removed, and simplies the cleanup path,
since these functions are now performed by the cgroups framework.

[lizf@cn.fujitsu.com: build fix]
Signed-off-by: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:36 -07:00
Paul Menage
84eea84288 cgroups: misc cleanups to write_string patchset
This patch contains cleanups suggested by reviewers for the recent
write_string() patchset:

- pair cgroup_lock_live_group() with cgroup_unlock() in cgroup.c for
  clarity, rather than directly unlocking cgroup_mutex.

- make the return type of cgroup_lock_live_group() a bool

- use a #define'd constant for the local buffer size in read/write functions

Signed-off-by: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:35 -07:00
Paul Menage
e788e066c6 cgroup files: move the release_agent file to use typed handlers
Adds cgroup_release_agent_write() and cgroup_release_agent_show()
methods to handle writing/reading the path to a cgroup hierarchy's
release agent. As a result, cgroup_common_file_read() is now unnecessary.

As part of the change, a previously-tolerated race in
cgroup_release_agent() is avoided by copying the current
release_agent_path prior to calling call_usermode_helper().

Signed-off-by: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:35 -07:00
Paul Menage
db3b14978a cgroup files: add write_string cgroup control file method
This patch adds a write_string() method for cgroups control files. The
semantics are that a buffer is copied from userspace to kernelspace
and the handler function invoked on that buffer.  The buffer is
guaranteed to be nul-terminated, and no longer than max_write_len
(defaulting to 64 bytes if unspecified). Later patches will convert
existing raw file write handlers in control group subsystems to use
this method.

Signed-off-by: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Balbir Singh <balbir@in.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:35 -07:00
Paul Menage
ce16b49d37 cgroup files: clean up whitespace in struct cftype
This patch removes some extraneous spaces from method declarations in
struct cftype, to fit in with conventional kernel style.

Signed-off-by: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:35 -07:00
Pavel Emelyanov
f2992db2a4 Mark res_counter_charge(_locked) with __must_check
Ignoring their return values may result in counter underflow in the future -
when the value charged will be uncharged (or in "leaks" - when the value is
not uncharged).

This also prevents from using charging routines to decrement the
counter value (i.e. uncharge it) ;)

(Current code works OK with res_counter, however :) )

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:35 -07:00
Jan Kara
657d3bfa98 quota: implement sending information via netlink about user below quota
Sometimes it may be useful for userspace to know (e.g.  for some hosting
guys) that some user stopped exceeding his hardlimit or softlimit in
quotas.  Implement sending of such events to userspace via quota netlink
protocol so that they don't have to poll for such events.  Based on idea
and initial implementation by Vladislav Bogdanov.

Cc: Vladislav Bogdanov <slava@nsys.by>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:35 -07:00
Jan Kara
03b063436c quota: convert macros to inline functions
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:35 -07:00
Jan Kara
74abb9890d quota: move function-macros from quota.h to quotaops.h
Move declarations of some macros, which should be in fact functions to
quotaops.h.  This way they can be later converted to inline functions
because we can now use declarations from quota.h.  Also add necessary
includes of quotaops.h to a few files.

[akpm@linux-foundation.org: fix JFS build]
[akpm@linux-foundation.org: fix UFS build]
[vegard.nossum@gmail.com: fix QUOTA=n build]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Arjen Pool <arjenpool@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:35 -07:00
Jan Kara
02a55ca871 quota: cleanup loop in sync_dquots()
Make loop in sync_dquots() checking whether there's something to write
more readable, remove useless variable and macro info_any_dirty() which
is used only in this place.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: "Vegard Nossum" <vegard.nossum@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:35 -07:00
Jan Kara
b85f4b87a5 quota: rename quota functions from upper case, make bigger ones non-inline
Cleanup quotaops.h: Rename functions from uppercase to lowercase (and
define backward compatibility macros), move larger functions to dquot.c
and make them non-inline.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:35 -07:00
Joe Peterson
b271e067c8 fatfs: add UTC timestamp option
Provide a new mount option ("tz=UTC") for DOS (vfat/msdos) filesystems,
allowing timestamps to be in coordinated universal time (UTC) rather than
local time in applications where doing this is advantageous.

In particular, portable devices that use fat/vfat (such as digital
cameras) can benefit from using UTC in their internal clocks, thus
avoiding daylight saving time errors and general time ambiguity issues.
The user of the device does not have to worry about changing the time when
moving from place or when daylight saving changes.

The new mount option, when set, disables the counter-adjustment that Linux
currently makes to FAT timestamp info in anticipation of the normal
userspace time zone correction.  When used in this new mode, all daylight
saving time and time zone handling is done in userspace as is normal for
many other filesystems (like ext3).  The default mode, which remains
unchanged, is still appropriate when mounting volumes written in Windows
(because of its use of local time).

I originally based this patch on one submitted last year by Paul Collins,
but I updated it to work with current source and changed variable/option
naming.  Ogawa Hirofumi (who maintains these filesystems) and I discussed
this patch at length on lkml, and he suggested using the option name in
the attached version of the patch.  Barry Bouwsma pointed out a good
addition to the patch as well.

Signed-off-by: Joe Peterson <joe@skyrush.com>
Signed-off-by: Paul Collins <paul@ondioline.org>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Barry Bouwsma <free_beer_for_all@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:34 -07:00
Adrian Bunk
e8938a62a8 remove unused #include <linux/dirent.h>'s
Remove some unused #include <linux/dirent.h>'s.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:34 -07:00
Adrian Bunk
cf6ae8b50e remove the in-kernel struct dirent{,64}
The kernel struct dirent{,64} were different from the ones in
userspace.

Even worse, we exported the kernel ones to userspace.

But after the fat usages are fixed we can remove the conflicting
kernel versions.

Reviewed-by: H. Peter Anvin <hpa@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:34 -07:00
Rene Scharfe
7557bc66be msdos fs: remove unsettable atari option
It has been impossible to set the option 'atari' of the MSDOS filesystem
for several years.  Since nobody seems to have missed it, let's remove its
remains.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:34 -07:00
OGAWA Hirofumi
4596c8aaf9 fat: fix VFAT_IOCTL_READDIR_xxx and cleanup for userland
"struct dirent" is a kernel type here, but is a **different type** in
userspace!  This means both the structure and the IOCTL number is wrong!

So, this adds new "struct __fat_dirent" to generate correct IOCTL number.
And kernel stuff moves to under __KERNEL__.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:34 -07:00
Jeff Mahoney
90415deac7 reiserfs: convert j_commit_lock to mutex
j_commit_lock is a semaphore but uses it as if it were a mutex.  This patch
converts it to a mutex.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:33 -07:00
Jeff Mahoney
afe7025907 reiserfs: convert j_flush_sem to mutex
j_flush_sem is a semaphore but uses it as if it were a mutex.  This patch
converts it to a mutex.

[akpm@linux-foundation.org: fix mutex_trylock retval treatment]
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:33 -07:00
Jeff Mahoney
f68215c464 reiserfs: convert j_lock to mutex
j_lock is a semaphore but uses it as if it were a mutex.  This patch converts
it to a mutex.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:33 -07:00
Adrian Bunk
de0ca06a99 coda: remove CODA_FS_OLD_API
While fixing CONFIG_ leakages to the userspace kernel headers I ran into
CODA_FS_OLD_API.

After five years, are there still people using the old API left?
Especially considering that you have to choose at compile time which API
to support in the kernel (and distributions tend to offer the new API for
some time).

Jan: "The old API can definitely go.  Around the time the new
      interface went in there were some non-Coda userspace file system
      implementations that took a while longer to convert to the new API,
      but by now they all switched to the new interface or in some cases
      to a FUSE-based solution."

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:33 -07:00
Duane Griffin
ae76dd9a6b ext3: handle corrupted orphan list at mount
If the orphan node list includes valid, untruncatable nodes with nlink > 0
the ext3_orphan_cleanup loop which attempts to delete them will not do so,
causing it to loop forever. Fix by checking for such nodes in the
ext3_orphan_get function.

This patch fixes the second case (image hdb.20000009.softlockup.gz)
reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: printk warning fix]
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:32 -07:00
Samuel Thibault
50c33a84db ext2: fix typo in Hurd part of include/linux/ext2_fs.h
Fix typo in Hurd part of include/linux/ext2_fs.h

The ';' here is redundant or can even pose problem.  This is actually not
used by the Linux kernel, but it is exposed in GNU/Hurd.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:31 -07:00
Eric Miao
bbcd6d543d gpio: max732x driver
This adds a driver supporting a family of I2C port expanders from Maxim,
which includes the MAX7319 and MAX7320-7327 chips.

[dbrownell@users.sourceforge.net: minor fixes]
Signed-off-by: Jack Ren <jack.ren@marvell.com>
Signed-off-by: Eric Miao <eric.miao@marvell.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:30 -07:00
David Brownell
8f1cc3b10e gpio: mcp23s08 handles multiple chips per chipselect
Teach the mcp23s08 driver about a curious feature of these chips: up to
four of them can share the same chipselect, with the SPI signals wired in
parallel, by matching two bits in the first protocol byte against two
address lines on the chip.

This is handled by three software changes:

  * Platform data now holds an array of per-chip structs, not
    just one chip's address and pullup configuration.

  * Probe() and remove() now use another level of structure,
    wrapping an instance of the original structure for each
    mcp23s08 chip sharing that chipselect.

  * The HAEN bit is set, so that the hardware address bits can no
    longer be ignored (boot firmware may not have enabled them).

The "one struct per chip" preserves the guts of the current code,
but platform_data will need minor changes.

    OLD:
	/* incorrect "slave" ID may not have mattered */
	.slave = 3,
	.pullups = BIT(3) | BIT(1) | BIT(0),

    NEW:
	/* slave address _must_ match chip's wiring */
	.chip[3] = {
		.is_present = true,
		.pullups = BIT(3) | BIT(1) | BIT(0),
	},

There's no change in how things _behave_ for spi_device nodes with a
single mcp23s08 chip.  New multi-chip configurations assign GPIOs in
sequence, without holes.  The spi_device just resembles a bigger
controller, but internally it has multiple gpio_chip instances.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:30 -07:00
David Brownell
d8f388d8dc gpio: sysfs interface
This adds a simple sysfs interface for GPIOs.

    /sys/class/gpio
    	/export ... asks the kernel to export a GPIO to userspace
    	/unexport ... to return a GPIO to the kernel
        /gpioN ... for each exported GPIO #N
	    /value ... always readable, writes fail for input GPIOs
	    /direction ... r/w as: in, out (default low); write high, low
	/gpiochipN ... for each gpiochip; #N is its first GPIO
	    /base ... (r/o) same as N
	    /label ... (r/o) descriptive, not necessarily unique
	    /ngpio ... (r/o) number of GPIOs; numbered N .. N+(ngpio - 1)

GPIOs claimed by kernel code may be exported by its owner using a new
gpio_export() call, which should be most useful for driver debugging.
Such exports may optionally be done without a "direction" attribute.

Userspace may ask to take over a GPIO by writing to a sysfs control file,
helping to cope with incomplete board support or other "one-off"
requirements that don't merit full kernel support:

  echo 23 > /sys/class/gpio/export
	... will gpio_request(23, "sysfs") and gpio_export(23);
	use /sys/class/gpio/gpio-23/direction to (re)configure it,
	when that GPIO can be used as both input and output.
  echo 23 > /sys/class/gpio/unexport
	... will gpio_free(23), when it was exported as above

The extra D-space footprint is a few hundred bytes, except for the sysfs
resources associated with each exported GPIO.  The additional I-space
footprint is about two thirds of the current size of gpiolib (!).  Since
no /dev node creation is involved, no "udev" support is needed.

Related changes:

  * This adds a device pointer to "struct gpio_chip".  When GPIO
    providers initialize that, sysfs gpio class devices become children of
    that device instead of being "virtual" devices.

  * The (few) gpio_chip providers which have such a device node have
    been updated.

  * Some gpio_chip drivers also needed to update their module "owner"
    field ...  for which missing kerneldoc was added.

  * Some gpio_chips don't support input GPIOs.  Those GPIOs are now
    flagged appropriately when the chip is registered.

Based on previous patches, and discussion both on and off LKML.

A Documentation/ABI/testing/sysfs-gpio update is ready to submit once this
merges to mainline.

[akpm@linux-foundation.org: a few maintenance build fixes]
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:30 -07:00
Srinivasa D S
ef53d9c5e4 kprobes: improve kretprobe scalability with hashed locking
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table.  We have one
global kretprobe lock to serialise the access to these lists.  This causes
only one kretprobe handler to execute at a time.  Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).

Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.

Solution:

 1) Instead of having one global lock to protect kretprobe instances
    present in kretprobe object and kretprobe hash table.  We will have
    two locks, one lock for protecting kretprobe hash table and another
    lock for kretporbe object.

 2) We hold lock present in kretprobe object while we modify kretprobe
    instance in kretprobe object and we hold per-hash-list lock while
    modifying kretprobe instances present in that hash list.  To prevent
    deadlock, we never grab a per-hash-list lock while holding a kretprobe
    lock.

 3) We can remove used_instances from struct kretprobe, as we can
    track used instances of kretprobe instances using kretprobe hash
    table.

Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.

cacheline              non-cacheline             Un-patched kernel
aligned patch 	       aligned patch
===============================================================================
real    9m46.784s       9m54.412s                  10m2.450s
user    40m5.715s       40m7.142s                  40m4.273s
sys     2m57.754s       2m58.583s                  3m17.430s
===========================================================

Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real    9m26.389s
user    40m8.775s
sys     2m7.283s
=========================

Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:30 -07:00
Ben Dooks
42cd2366fb sm501: gpio I2C support
Add support for adding the GPIO based I2C resources.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Arnaud Patard <apatard@mandriva.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:30 -07:00
Arnaud Patard
60e540d617 sm501: gpio dynamic registration for PCI devices
The SM501 PCI card requires a dyanmic gpio allocation as the number of
cards is not known at compile time.  Fixup the platform data and
registration to deal with this.

Acked-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Arnaud Patard <apatard@mandriva.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:30 -07:00
Ben Dooks
f61be273d3 sm501: add gpiolib support
Add support for exporting the GPIOs on the SM501 via gpiolib.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Arnaud Patard <apatard@mandriva.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:29 -07:00
Ben Dooks
472dba7d11 sm501: add power control callback
Add callback to get or set the power control if the device has the sleep
connected to some form of GPIO.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Arnaud Patard <apatard@mandriva.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:29 -07:00
Dave Young
717115e1a5 printk ratelimiting rewrite
All ratelimit user use same jiffies and burst params, so some messages
(callbacks) will be lost.

For example:
a call printk_ratelimit(5 * HZ, 1)
b call printk_ratelimit(5 * HZ, 1) before the 5*HZ timeout of a, then b will
will be supressed.

- rewrite __ratelimit, and use a ratelimit_state as parameter.  Thanks for
  hints from andrew.

- Add WARN_ON_RATELIMIT, update rcupreempt.h

- remove __printk_ratelimit

- use __ratelimit in net_ratelimit

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:29 -07:00
Vegard Nossum
2711b793eb kallsyms: unify 32- and 64-bit code
Use the %p format string which already accounts for the padding you need
with a pointer type on a particular architecture.

Also replace the macro with a static inline function to match the rest of
the file.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:29 -07:00
Arjan van de Ven
b6c6393700 Rename WARN() to WARNING() to clear the namespace
We want to use WARN() as a variant of WARN_ON(), however a few drivers are
using WARN() internally.  This patch renames these to WARNING() to avoid the
namespace clash.  A few cases were defining but not using the thing, for those
cases I just deleted the definition.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Greg KH <greg@kroah.com>
Cc: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:29 -07:00
Robert P. J. Day
4500d067ee init.h: remove obsolete content
Remove apparently obsolete content from init.h referring to gcc 2.9x
and to "no_module_init".

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:28 -07:00
KOSAKI Motohiro
ac331d158e call_usermodehelper(): increase reliability
Presently call_usermodehelper_setup() uses GFP_ATOMIC.  but it can return
NULL _very_ easily.

GFP_ATOMIC is needed only when we can't sleep.  and, GFP_KERNEL is robust
and better.

thus, I add gfp_mask argument to call_usermodehelper_setup().

So, its callers pass the gfp_t as below:

call_usermodehelper() and call_usermodehelper_keys():
	depend on 'wait' argument.
call_usermodehelper_pipe():
	always GFP_KERNEL because always run under process context.
orderly_poweroff():
	pass to GFP_ATOMIC because may run under interrupt context.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Paul Menage" <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:28 -07:00
Andrew Morton
cebbd3fb80 build-kernel-profileo-only-when-requested-cleanups
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:27 -07:00
Adrian Bunk
b03f6489f9 build kernel/profile.o only when requested
Build kernel/profile.o only if CONFIG_PROFILING is enabled.

This makes CONFIG_PROFILING=n kernels smaller.

As a bonus, some profile_tick() calls and one branch from schedule() are
now eliminated with CONFIG_PROFILING=n (but I doubt these are
measurable effects).

This patch changes the effects of CONFIG_PROFILING=n, but I don't think
having more than two choices would be the better choice.

This patch also adds the name of the first parameter to the prototypes
of profile_{hits,tick}() since I anyway had to add them for the dummy
functions.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:27 -07:00
Robert P. J. Day
e0ce0da9fe lists: remove a redundant conditional definition of list_add()
Remove the conditional surrounding the definition of list_add() from list.h
since, if you define CONFIG_DEBUG_LIST, the definition you will subsequently
pick up from lib/list_debug.c will be absolutely identical, at which point you
can remove that redundant definition from list_debug.c as well.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:27 -07:00
Robert P. J. Day
b39c08cb69 Remove apparently unused fd1772.h header file.
This header file has been unused for quite some time, and the
corresponding source files appear to have been removed back in commit
99eb8a550d ("Remove the arm26 port")

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:27 -07:00
Harvey Harrison
8b5ac31e27 include: use get/put_unaligned_* helpers
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:26 -07:00
Steven Rostedt
3f307891ce locking: add typecheck on irqsave and friends for correct flags
There haave been several areas in the kernel where an int has been used for
flags in local_irq_save() and friends instead of a long.  This can cause some
hard to debug problems on some architectures.

This patch adds a typecheck inside the irqsave and restore functions to flag
these cases.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:26 -07:00
Andrew Morton
e0deaff470 split the typecheck macros out of include/linux/kernel.h
Needed to fix up a recursive include snafu in
locking-add-typecheck-on-irqsave-and-friends-for-correct-flags.patch

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:26 -07:00
Yevgeny Petrilin
25c94d010a mlx4_core: Add VLAN tag field to WQE control segment struct
Add fields for VLAN tag and insert VLAN tag flag to the control
section struct.  These fields will be used for sending ethernet
packets.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-25 10:30:06 -07:00
David Miller
3d6f4a20cc endian: Always evaluate arguments.
Changeset 7fa897b91a ("ide: trivial sparse
annotations") created an IDE bootup regression on big-endian systems.

In drivers/ide/ide-iops.c, function ide_fixstring() we now have the
loop:

		for (p = end ; p != s;)
			be16_to_cpus((u16 *)(p -= 2));

which will never terminate on big-endian because in such
a configuration be16_to_cpus() evaluates to "do { } while (0)"

Therefore, always evaluate the arguments to nop endian transformation
operations.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 09:28:09 -07:00
Alexey Korolev
3d45955962 [MTD] [NAND] subpage read feature as a way to increase performance.
This patch enables NAND subpage read functionality.
If upper layer drivers are requesting to read non page aligned data NAND
subpage-read functionality reads the only whose ECC regions which include
requested data when original code reads whole page.
This significantly improves performance in many cases.

Here are some digits :

UBI volume mount time
No subpage reads: 5.75 seconds
Subpage read patch: 2.42 seconds

Open/stat time for files on JFFS2 volume:
No subpage read  0m 5.36s
Subpage read     0m 2.88s

Signed-off-by Alexey Korolev <akorolev@infradead.org>
Acked-by: Artem Bityutskiy <dedekind@infradead.org>
Acked-by: Jörn Engel <joern@logfs.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2008-07-25 10:49:50 -04:00
David Woodhouse
ff877ea80e Merge branch 'linux-next' of git://git.infradead.org/~dedekind/ubi-2.6 2008-07-25 10:40:14 -04:00
Stefan Richter
95984f62c9 firewire: fw-ohci: TSB43AB22/A dualbuffer workaround
Isochronous reception in dualbuffer mode is reportedly broken with
TI TSB43AB22A on x86-64.  Descriptor addresses above 2G have been
determined as the trigger:
https://bugzilla.redhat.com/show_bug.cgi?id=435550

Two fixes are possible:
  - pci_set_consistent_dma_mask(pdev, DMA_31BIT_MASK);
    at least when IR descriptors are allocated, or
  - simply don't use dualbuffer.
This fix implements the latter workaround.

But we keep using dualbuffer on x86-32 which won't give us highmen (and
thus physical addresses outside the 31bit range) in coherent DMA memory
allocations.  Right now we could for example also whitelist PPC32, but
DMA mapping implementation details are expected to change there.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
2008-07-25 15:41:23 +02:00
Ingo Molnar
10a010f695 Merge branch 'linus' into x86/x2apic
Conflicts:

	drivers/pci/dmar.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-25 13:08:16 +02:00
Nathan Lynch
483fad1c3f ELF loader support for auxvec base platform string
Some IBM POWER-based platforms have the ability to run in a
mode which mostly appears to the OS as a different processor from the
actual hardware.  For example, a Power6 system may appear to be a
Power5+, which makes the AT_PLATFORM value "power5+".  This means that
programs are restricted to the ISA supported by Power5+;
Power6-specific instructions are treated as illegal.

However, some applications (virtual machines, optimized libraries) can
benefit from knowledge of the underlying CPU model.  A new aux vector
entry, AT_BASE_PLATFORM, will denote the actual hardware.  For
example, on a Power6 system in Power5+ compatibility mode, AT_PLATFORM
will be "power5+" and AT_BASE_PLATFORM will be "power6".  The idea is
that AT_PLATFORM indicates the instruction set supported, while
AT_BASE_PLATFORM indicates the underlying microarchitecture.

If the architecture has defined ELF_BASE_PLATFORM, copy that value to
the user stack in the same manner as ELF_PLATFORM.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-25 15:44:39 +10:00
Linus Torvalds
832fe9c222 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  virtio: Add transport feature handling stub for virtio_ring.
  virtio: Rename set_features to finalize_features
  virtio: Formally reserve bits 28-31 to be 'transport' features.
  s390: use virtio_console for KVM on s390
  virtio: console as a config option
  virtio_console: use virtqueue notification for hvc_console
  hvc_console: rework setup to replace irq functions with callbacks
  virtio_blk: check for hardsector size from host
  virtio: Use bus_type probe and remove methods
  virtio: don't always force a notification when ring is full
  virtio: clarify that ABI is usable by any implementations
  virtio: Recycle unused recv buffer pages for large skbs in net driver
  virtio net: Allow receiving SG packets
  virtio net: Add ethtool ops for SG/GSO
  virtio: fix virtio_net xmit of freed skb bug
2008-07-24 19:11:49 -07:00
Rusty Russell
ed9559d38a Label kthread_create() with printf attribute tag.
Obvious misc patch been in my queue (& linux-next) for over a cycle.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 19:11:15 -07:00
Rusty Russell
e34f872567 virtio: Add transport feature handling stub for virtio_ring.
To prepare for virtio_ring transport feature bits, hook in a call in
all the users to manipulate them.  This currently just clears all the
bits, since it doesn't understand any features.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:14 +10:00
Rusty Russell
c624896e48 virtio: Rename set_features to finalize_features
Rather than explicitly handing the features to the lower-level, we just
hand the virtio_device and have it set the features.  This make it clear
that it has the chance to manipulate the features of the device at this
point (and that all feature negotiation is already done).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:12 +10:00
Rusty Russell
dd7c7bc462 virtio: Formally reserve bits 28-31 to be 'transport' features.
We assign feature bits as required, but it makes sense to reserve some
for the particular transport, rather than the particular device.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:07 +10:00
Christian Borntraeger
066f4d82a6 virtio_blk: check for hardsector size from host
Currently virtio_blk assumes a 512 byte hard sector size. This can cause
trouble / performance issues if the backing has a different block size
(like a file on an ext3 file system formatted with 4k block size or a dasd).

Lets add a feature flag that tells the guest to use a different hard sector
size than 512 byte.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:05 +10:00
Rusty Russell
674bfc23c5 virtio: clarify that ABI is usable by any implementations
We want others to implement and use virtio, so it makes sense to BSD
license the non-__KERNEL__ parts of the headers to make this crystal
clear.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Mark McLoughlin <markmc@redhat.com>
Acked-by: Ryan Harper <ryanh@us.ibm.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
Acked-by: Anthony Liguori <aliguori@us.ibm.com>
2008-07-25 12:06:04 +10:00
Linus Torvalds
b5684b83b1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (76 commits)
  ide: use proper printk() KERN_* levels in ide-probe.c
  ide: fix for EATA SCSI HBA in ATA emulating mode
  ide: remove stale comments from drivers/ide/Makefile
  ide: enable local IRQs in all handlers for TASKFILE_NO_DATA data phase
  ide-scsi: remove kmalloced struct request
  ht6560b: remove old history
  ht6560b: update email address
  ide-cd: fix oops when using growisofs
  gayle: release resources on ide_host_add() failure
  palm_bk3710: add UltraDMA/100 support
  ide: trivial sparse annotations
  ide: ide-tape.c sparse annotations and unaligned access removal
  ide: drop 'name' parameter from ->init_chipset method
  ide: prefix messages from IDE PCI host drivers by driver name
  it821x: remove DECLARE_ITE_DEV() macro
  it8213: remove DECLARE_ITE_DEV() macro
  ide: include PCI device name in messages from IDE PCI host drivers
  ide: remove <asm/ide.h> for some archs
  ide-generic: remove ide_default_{io_base,irq}() inlines (take 3)
  ide-generic: is no longer needed on ppc32
  ...
2008-07-24 14:55:09 -07:00
Linus Torvalds
5042d99795 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: fixup sparse endianness warnings in proc.c
  PCI PM: make more PCI PM core functionality available to drivers
  PCI/DMAR: don't assume presence of RMRRs
  PCI hotplug: fix error path in pci_slot's register_slot
2008-07-24 13:57:13 -07:00
Bartlomiej Zolnierkiewicz
a326b02b0c ide: drop 'name' parameter from ->init_chipset method
There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-24 22:53:33 +02:00
Bartlomiej Zolnierkiewicz
2a8f7450f8 ide: remove <asm/ide.h> for some archs
* Remove <linux/irq.h> include from <asm-ia64.h> (<linux/ide.h> includes
  <linux/interrupt.h> which is enough).

* Remove <asm/ide.h> for alpha/blackfin/h8300/ia64/m32r/sh/x86/xtensa
  (this leaves us with arm/frv/m68k/mips/mn10300/parisc/powerpc/sparc[64]).

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-24 22:53:31 +02:00
Bartlomiej Zolnierkiewicz
d83b8b85cd ide: define MAX_HWIFS in <linux/ide.h>
* Now that ide_hwif_t instances are allocated dynamically
  the difference between MAX_HWIFS == 2 and MAX_HWIFS == 10
  is ~100 bytes (x86-32) so use MAX_HWIFS == 10 on all archs
  except these ones that use MAX_HWIFS == 1.

* Define MAX_HWIFS in <linux/ide.h> instead of <asm/ide.h>.

[ Please note that avr32/cris/v850 have no <asm/ide.h>
  and alpha/ia64/sh always define CONFIG_IDE_MAX_HWIFS. ]

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-24 22:53:30 +02:00
Bartlomiej Zolnierkiewicz
ef0b04276d ide: add ide_pci_remove() helper
* Add 'unsigned long host_flags' field to struct ide_host.

* Set ->host_flags in ide_host_alloc_all().

* Always set PCI dev's ->driver_data in ide_pci_init_{one,two}().

* Add ide_pci_remove() helper (the default implementation for
  struct pci_driver's ->remove method).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-24 22:53:19 +02:00
Bartlomiej Zolnierkiewicz
08da591e14 ide: add ide_device_{get,put}() helpers
* Add 'struct ide_host *host' field to ide_hwif_t and set it
  in ide_host_alloc_all().

* Add ide_device_{get,put}() helpers loosely based on SCSI's
  scsi_device_{get,put}() ones.

* Convert IDE device drivers to use ide_device_{get,put}().

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-24 22:53:15 +02:00
Bartlomiej Zolnierkiewicz
6cdf6eb357 ide: add ->dev and ->host_priv fields to struct ide_host
* Add 'struct device *dev[2]' and 'void *host_priv' fields
  to struct ide_host.

* Set ->dev[] in ide_host_alloc_all()/ide_setup_pci_device[s]().

* Pass 'void *priv' argument to ide_setup_pci_device[s]()
  and use it to set ->host_priv.

* Set PCI dev's ->driver_data to point to the struct ide_host
  instance if PCI host driver wants to use ->host_priv.

* Rename ide_setup_pci_device[s]() to ide_pci_init_{one,two}().

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-24 22:53:14 +02:00
Linus Torvalds
5c402355ad Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  MAINTAINERS: Remove Glenn Streiff from NetEffect entry
  mlx4_core: Improve error message when not enough UAR pages are available
  IB/mlx4: Add support for memory management extensions and local DMA L_Key
  IB/mthca: Keep free count for MTT buddy allocator
  mlx4_core: Keep free count for MTT buddy allocator
  mlx4_code: Add missing FW status return code
  IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg
  mlx4_core: Add module parameter to enable QoS support
  RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes
  IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures
  IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one()
  IB/ehca: Release mutex in error path of alloc_small_queue_page()
  IB/ehca: Use default value for Local CA ACK Delay if FW returns 0
  IB/ehca: Filter PATH_MIG events if QP was never armed
  IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event
  RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event
  RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event
2008-07-24 12:56:07 -07:00
Linus Torvalds
ecc8b655b3 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well
  nohz: prevent tick stop outside of the idle loop
2008-07-24 12:55:01 -07:00
Linus Torvalds
7540081c6b Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc
* 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
  Remove __DECLARE_SEMAPHORE_GENERIC
  Remove asm/semaphore.h
  Remove use of asm/semaphore.h
  Add missing semaphore.h includes
  Remove mention of semaphores from kernel-locking
2008-07-24 12:24:40 -07:00
Linus Torvalds
c54554d388 Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds:
  leds: Ensure led->trigger is set earlier
  leds: Add support for Philips PCA955x I2C LED drivers
  leds: Fix sparse warnings in leds-h1940 driver
  leds: mark led_classdev.default_trigger as const
  leds: fix unsigned value overflow in atmel pwm driver
  leds: Add pca9532 platform data for Thecus N2100
  leds: Add pca9532 led driver
2008-07-24 12:16:02 -07:00
Steven Whitehouse
f9247273cb UFS: add const to parser token table
This patch adds a "const" to the parser token table. I've done an
allmodconfig build to see if this produces any warnings/failures and the
patch includes a fix for the only warning that was produced.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Alexander Viro <aviro@redhat.com>
Acked-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 11:50:15 -07:00
Philippe De Muyter
5bb49fcd50 video/fb: cleanup FB_MAJOR usage
Currently, linux/major.h defines a GRAPHDEV_MAJOR (29) that nobody uses,
and linux/fb.h defines the real FB_MAJOR (also 29), that only fbmem.c
needs.  Drop GRAPHDEV_MAJOR from major.h, move FB_MAJOR definition from
fb.h to major.h, and fix fbmem.c to use major.h's definition.

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:41 -07:00
Hans-Christian Egtvedt
3e074058d7 fbdev: LCD backlight driver using Atmel PWM driver
This patch adds a platform driver using the ATMEL PWM driver to control a
backlight which requires a PWM signal and optional GPIO signal for discrete
on/off signal.  It has been tested on Favr-32 board from EarthLCD.

The driver is configurable by supplying a struct with the platform data.  See
the include/linux/atmel-pwm-bl.h for details.

The board code for Favr-32 will be submitted to the AVR32 kernel list.

Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:41 -07:00
Ben Dooks
0c531360ed lcd: add lcd_device to check_fb() entry in lcd_ops
Add the lcd_device being checked to the check_fb entry of lcd_ops.  This
ensures that any driver using this to check against it's own state can do
so, and also makes all the calls in lcd_ops more orthogonal in their
arguments.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:40 -07:00
Ben Dooks
206c5d69d0 sm501: add inversion controls for VBIASEN and FPEN
Add flags to allow the driver to invert the sense of both VBIASEN and FPEN
signals comming from the SM501.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:40 -07:00
Krzysztof Helt
01a2d9ed85 tridentfb: acceleration constants change
This patch replaces deprecated constant FB_ACCELF_TEXT with
FBINFO_HWACCEL_DISABLED and adds constants for Trident families of
accelerators.

The FBINFO_HWACCEL_DISABLED is correctly used so noaccel parameter works
now.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:36 -07:00
David Brownell
d3de851a44 rtc: BCD codeshrink
This updates <linux/bcd.h> to define the key routines as constant
functions, which the macros will then call.  Newer code can now call
bcd2bin() instead of SCREAMING BCD2BIN() TO THE FOUR WINDS.

This lets each driver shrink their codespace by using N function calls to
a single (global) copy of those routines, instead of N inlined copies of
these functions per driver.

These routines aren't used in speed-critical code.  Almost all callers are
in the RTC framework.  Typical per-driver savings is near 300 bytes.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Adrian Bunk <bunk@kernel.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:33 -07:00
David Brownell
53e84b672c rtc: ds1305/ds1306 driver
Support the Dallas/Maxim DS1305 and DS1306 RTC chips.  These use SPI, and
support alarms, NVRAM, and a trickle charger for use when their backup
power supply is a supercap or rechargeable cell.

This basic driver doesn't yet support suspend/resume or wakealarms.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:33 -07:00
David Brownell
5ad31a5751 rtc: remove BKL for ioctl()
Remove implicit use of BKL in ioctl() from the RTC framework.

Instead, the rtc->ops_lock is used.  That's the same lock that already
protects the RTC operations when they're issued through the exported
rtc_*() calls in drivers/rtc/interface.c ...  making this a bugfix, not
just a cleanup, since both ioctl calls and set_alarm() need to update IRQ
enable flags and that implies a common lock (which RTC drivers as a rule
do not provide on their own).

A new comment at the declaration of "struct rtc_class_ops" summarizes
current locking rules.  It's not clear to me that the exceptions listed
there should exist ...  if not, those are pre-existing problems which can
be fixed in a patch that doesn't relate to BKL removal.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Jonathan Corbet <corbet@lwn.net>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:33 -07:00
Ian Kent
aa55ddf340 autofs4: remove unused ioctls
The ioctls AUTOFS_IOC_TOGGLEREGHOST and AUTOFS_IOC_ASKREGHOST were added
several years ago but what they were intended for has never been
implemented (as far as I'm aware noone uses them) so remove them.

Signed-off-by: Ian Kent <raven@themaw.net>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:33 -07:00
Grant Likely
102eb97564 spi: make spi_board_info.modalias a char array
Currently, 'modalias' in the spi_device structure is a 'const char *'.
The spi_new_device() function fills in the modalias value from a passed in
spi_board_info data block.  Since it is a pointer copy, the new spi_device
remains dependent on the spi_board_info structure after the new spi_device
is registered (no other fields in spi_device directly depend on the
spi_board_info structure; all of the other data is copied).

This causes a problem when dynamically propulating the list of attached
SPI devices.  For example, in arch/powerpc, the list of SPI devices can be
populated from data in the device tree.  With the current code, the device
tree adapter must kmalloc() a new spi_board_info structure for each new
SPI device it finds in the device tree, and there is no simple mechanism
in place for keeping track of these allocations.

This patch changes modalias from a 'const char *' to a fixed char array.
By copying the modalias string instead of referencing it, the dependency
on the spi_board_info structure is eliminated and an outside caller does
not need to maintain a separate spi_board_info allocation for each device.

If searched through the code to the best of my ability for any references
to modalias which may be affected by this change and haven't found
anything.  It has been tested with the lite5200b platform in arch/powerpc.

[dbrownell@users.sourceforge.net: cope with linux-next changes: KOBJ_NAME_LEN obliterated, etc]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:30 -07:00
Ulrich Drepper
9fe5ad9c8c flag parameters add-on: remove epoll_create size param
Remove the size parameter from the new epoll_create syscall and renames the
syscall itself.  The updated test program follows.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_epoll_create2
# ifdef __x86_64__
#  define __NR_epoll_create2 291
# elif defined __i386__
#  define __NR_epoll_create2 329
# else
#  error "need __NR_epoll_create2"
# endif
#endif

#define EPOLL_CLOEXEC O_CLOEXEC

int
main (void)
{
  int fd = syscall (__NR_epoll_create2, 0);
  if (fd == -1)
    {
      puts ("epoll_create2(0) failed");
      return 1;
    }
  int coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (coe & FD_CLOEXEC)
    {
      puts ("epoll_create2(0) set close-on-exec flag");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_epoll_create2, EPOLL_CLOEXEC);
  if (fd == -1)
    {
      puts ("epoll_create2(EPOLL_CLOEXEC) failed");
      return 1;
    }
  coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((coe & FD_CLOEXEC) == 0)
    {
      puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:29 -07:00
Ulrich Drepper
510df2dd48 flag parameters: NONBLOCK in inotify_init
This patch adds non-blocking support for inotify_init1.  The
additional changes needed are minimal.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_inotify_init1
# ifdef __x86_64__
#  define __NR_inotify_init1 294
# elif defined __i386__
#  define __NR_inotify_init1 332
# else
#  error "need __NR_inotify_init1"
# endif
#endif

#define IN_NONBLOCK O_NONBLOCK

int
main (void)
{
  int fd = syscall (__NR_inotify_init1, 0);
  if (fd == -1)
    {
      puts ("inotify_init1(0) failed");
      return 1;
    }
  int fl = fcntl (fd, F_GETFL);
  if (fl == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (fl & O_NONBLOCK)
    {
      puts ("inotify_init1(0) set non-blocking mode");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_inotify_init1, IN_NONBLOCK);
  if (fd == -1)
    {
      puts ("inotify_init1(IN_NONBLOCK) failed");
      return 1;
    }
  fl = fcntl (fd, F_GETFL);
  if (fl == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((fl & O_NONBLOCK) == 0)
    {
      puts ("inotify_init1(IN_NONBLOCK) set non-blocking mode");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:29 -07:00
Ulrich Drepper
be61a86d72 flag parameters: NONBLOCK in pipe
This patch adds O_NONBLOCK support to pipe2.  It is minimally more involved
than the patches for eventfd et.al but still trivial.  The interfaces of the
create_write_pipe and create_read_pipe helper functions were changed and the
one other caller as well.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_pipe2
# ifdef __x86_64__
#  define __NR_pipe2 293
# elif defined __i386__
#  define __NR_pipe2 331
# else
#  error "need __NR_pipe2"
# endif
#endif

int
main (void)
{
  int fds[2];
  if (syscall (__NR_pipe2, fds, 0) == -1)
    {
      puts ("pipe2(0) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      int fl = fcntl (fds[i], F_GETFL);
      if (fl == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if (fl & O_NONBLOCK)
        {
          printf ("pipe2(0) set non-blocking mode for fds[%d]\n", i);
          return 1;
        }
      close (fds[i]);
    }

  if (syscall (__NR_pipe2, fds, O_NONBLOCK) == -1)
    {
      puts ("pipe2(O_NONBLOCK) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      int fl = fcntl (fds[i], F_GETFL);
      if (fl == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if ((fl & O_NONBLOCK) == 0)
        {
          printf ("pipe2(O_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
          return 1;
        }
      close (fds[i]);
    }

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:29 -07:00
Ulrich Drepper
6b1ef0e60d flag parameters: NONBLOCK in timerfd_create
This patch adds support for the TFD_NONBLOCK flag to timerfd_create.  The
additional changes needed are minimal.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_timerfd_create
# ifdef __x86_64__
#  define __NR_timerfd_create 283
# elif defined __i386__
#  define __NR_timerfd_create 322
# else
#  error "need __NR_timerfd_create"
# endif
#endif

#define TFD_NONBLOCK O_NONBLOCK

int
main (void)
{
  int fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, 0);
  if (fd == -1)
    {
      puts ("timerfd_create(0) failed");
      return 1;
    }
  int fl = fcntl (fd, F_GETFL);
  if (fl == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (fl & O_NONBLOCK)
    {
      puts ("timerfd_create(0) set non-blocking mode");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, TFD_NONBLOCK);
  if (fd == -1)
    {
      puts ("timerfd_create(TFD_NONBLOCK) failed");
      return 1;
    }
  fl = fcntl (fd, F_GETFL);
  if (fl == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((fl & O_NONBLOCK) == 0)
    {
      puts ("timerfd_create(TFD_NONBLOCK) set non-blocking mode");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:29 -07:00
Ulrich Drepper
e7d476dfdf flag parameters: NONBLOCK in eventfd
This patch adds support for the EFD_NONBLOCK flag to eventfd2.  The
additional changes needed are minimal.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_eventfd2
# ifdef __x86_64__
#  define __NR_eventfd2 290
# elif defined __i386__
#  define __NR_eventfd2 328
# else
#  error "need __NR_eventfd2"
# endif
#endif

#define EFD_NONBLOCK O_NONBLOCK

int
main (void)
{
  int fd = syscall (__NR_eventfd2, 1, 0);
  if (fd == -1)
    {
      puts ("eventfd2(0) failed");
      return 1;
    }
  int fl = fcntl (fd, F_GETFL);
  if (fl == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (fl & O_NONBLOCK)
    {
      puts ("eventfd2(0) sets non-blocking mode");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_eventfd2, 1, EFD_NONBLOCK);
  if (fd == -1)
    {
      puts ("eventfd2(EFD_NONBLOCK) failed");
      return 1;
    }
  fl = fcntl (fd, F_GETFL);
  if (fl == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((fl & O_NONBLOCK) == 0)
    {
      puts ("eventfd2(EFD_NONBLOCK) does not set non-blocking mode");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:29 -07:00
Ulrich Drepper
5fb5e04926 flag parameters: NONBLOCK in signalfd
This patch adds support for the SFD_NONBLOCK flag to signalfd4.  The
additional changes needed are minimal.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_signalfd4
# ifdef __x86_64__
#  define __NR_signalfd4 289
# elif defined __i386__
#  define __NR_signalfd4 327
# else
#  error "need __NR_signalfd4"
# endif
#endif

#define SFD_NONBLOCK O_NONBLOCK

int
main (void)
{
  sigset_t ss;
  sigemptyset (&ss);
  sigaddset (&ss, SIGUSR1);
  int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0);
  if (fd == -1)
    {
      puts ("signalfd4(0) failed");
      return 1;
    }
  int fl = fcntl (fd, F_GETFL);
  if (fl == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (fl & O_NONBLOCK)
    {
      puts ("signalfd4(0) set non-blocking mode");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_NONBLOCK);
  if (fd == -1)
    {
      puts ("signalfd4(SFD_NONBLOCK) failed");
      return 1;
    }
  fl = fcntl (fd, F_GETFL);
  if (fl == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((fl & O_NONBLOCK) == 0)
    {
      puts ("signalfd4(SFD_NONBLOCK) does not set non-blocking mode");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:29 -07:00
Ulrich Drepper
77d2720059 flag parameters: NONBLOCK in socket and socketpair
This patch introduces support for the SOCK_NONBLOCK flag in socket,
socketpair, and  paccept.  To do this the internal function sock_attach_fd
gets an additional parameter which it uses to set the appropriate flag for
the file descriptor.

Given that in modern, scalable programs almost all socket connections are
non-blocking and the minimal additional cost for the new functionality
I see no reason not to add this code.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/syscall.h>

#ifndef __NR_paccept
# ifdef __x86_64__
#  define __NR_paccept 288
# elif defined __i386__
#  define SYS_PACCEPT 18
#  define USE_SOCKETCALL 1
# else
#  error "need __NR_paccept"
# endif
#endif

#ifdef USE_SOCKETCALL
# define paccept(fd, addr, addrlen, mask, flags) \
  ({ long args[6] = { \
       (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \
     syscall (__NR_socketcall, SYS_PACCEPT, args); })
#else
# define paccept(fd, addr, addrlen, mask, flags) \
  syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags)
#endif

#define PORT 57392

#define SOCK_NONBLOCK O_NONBLOCK

static pthread_barrier_t b;

static void *
tf (void *arg)
{
  pthread_barrier_wait (&b);
  int s = socket (AF_INET, SOCK_STREAM, 0);
  struct sockaddr_in sin;
  sin.sin_family = AF_INET;
  sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  sin.sin_port = htons (PORT);
  connect (s, (const struct sockaddr *) &sin, sizeof (sin));
  close (s);
  pthread_barrier_wait (&b);

  pthread_barrier_wait (&b);
  s = socket (AF_INET, SOCK_STREAM, 0);
  sin.sin_port = htons (PORT);
  connect (s, (const struct sockaddr *) &sin, sizeof (sin));
  close (s);
  pthread_barrier_wait (&b);

  return NULL;
}

int
main (void)
{
  int fd;
  fd = socket (PF_INET, SOCK_STREAM, 0);
  if (fd == -1)
    {
      puts ("socket(0) failed");
      return 1;
    }
  int fl = fcntl (fd, F_GETFL);
  if (fl == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (fl & O_NONBLOCK)
    {
      puts ("socket(0) set non-blocking mode");
      return 1;
    }
  close (fd);

  fd = socket (PF_INET, SOCK_STREAM|SOCK_NONBLOCK, 0);
  if (fd == -1)
    {
      puts ("socket(SOCK_NONBLOCK) failed");
      return 1;
    }
  fl = fcntl (fd, F_GETFL);
  if (fl == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((fl & O_NONBLOCK) == 0)
    {
      puts ("socket(SOCK_NONBLOCK) does not set non-blocking mode");
      return 1;
    }
  close (fd);

  int fds[2];
  if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1)
    {
      puts ("socketpair(0) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      fl = fcntl (fds[i], F_GETFL);
      if (fl == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if (fl & O_NONBLOCK)
        {
          printf ("socketpair(0) set non-blocking mode for fds[%d]\n", i);
          return 1;
        }
      close (fds[i]);
    }

  if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_NONBLOCK, 0, fds) == -1)
    {
      puts ("socketpair(SOCK_NONBLOCK) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      fl = fcntl (fds[i], F_GETFL);
      if (fl == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if ((fl & O_NONBLOCK) == 0)
        {
          printf ("socketpair(SOCK_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
          return 1;
        }
      close (fds[i]);
    }

  pthread_barrier_init (&b, NULL, 2);

  struct sockaddr_in sin;
  pthread_t th;
  if (pthread_create (&th, NULL, tf, NULL) != 0)
    {
      puts ("pthread_create failed");
      return 1;
    }

  int s = socket (AF_INET, SOCK_STREAM, 0);
  int reuse = 1;
  setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
  sin.sin_family = AF_INET;
  sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  sin.sin_port = htons (PORT);
  bind (s, (struct sockaddr *) &sin, sizeof (sin));
  listen (s, SOMAXCONN);

  pthread_barrier_wait (&b);

  int s2 = paccept (s, NULL, 0, NULL, 0);
  if (s2 < 0)
    {
      puts ("paccept(0) failed");
      return 1;
    }

  fl = fcntl (s2, F_GETFL);
  if (fl & O_NONBLOCK)
    {
      puts ("paccept(0) set non-blocking mode");
      return 1;
    }
  close (s2);
  close (s);

  pthread_barrier_wait (&b);

  s = socket (AF_INET, SOCK_STREAM, 0);
  sin.sin_port = htons (PORT);
  setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
  bind (s, (struct sockaddr *) &sin, sizeof (sin));
  listen (s, SOMAXCONN);

  pthread_barrier_wait (&b);

  s2 = paccept (s, NULL, 0, NULL, SOCK_NONBLOCK);
  if (s2 < 0)
    {
      puts ("paccept(SOCK_NONBLOCK) failed");
      return 1;
    }

  fl = fcntl (s2, F_GETFL);
  if ((fl & O_NONBLOCK) == 0)
    {
      puts ("paccept(SOCK_NONBLOCK) does not set non-blocking mode");
      return 1;
    }
  close (s2);
  close (s);

  pthread_barrier_wait (&b);
  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:29 -07:00
Ulrich Drepper
4006553b06 flag parameters: inotify_init
This patch introduces the new syscall inotify_init1 (note: the 1 stands for
the one parameter the syscall takes, as opposed to no parameter before).  The
values accepted for this parameter are function-specific and defined in the
inotify.h header.  Here the values must match the O_* flags, though.  In this
patch CLOEXEC support is introduced.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_inotify_init1
# ifdef __x86_64__
#  define __NR_inotify_init1 294
# elif defined __i386__
#  define __NR_inotify_init1 332
# else
#  error "need __NR_inotify_init1"
# endif
#endif

#define IN_CLOEXEC O_CLOEXEC

int
main (void)
{
  int fd;
  fd = syscall (__NR_inotify_init1, 0);
  if (fd == -1)
    {
      puts ("inotify_init1(0) failed");
      return 1;
    }
  int coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (coe & FD_CLOEXEC)
    {
      puts ("inotify_init1(0) set close-on-exit");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_inotify_init1, IN_CLOEXEC);
  if (fd == -1)
    {
      puts ("inotify_init1(IN_CLOEXEC) failed");
      return 1;
    }
  coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((coe & FD_CLOEXEC) == 0)
    {
      puts ("inotify_init1(O_CLOEXEC) does not set close-on-exit");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:28 -07:00
Ulrich Drepper
ed8cae8ba0 flag parameters: pipe
This patch introduces the new syscall pipe2 which is like pipe but it also
takes an additional parameter which takes a flag value.  This patch implements
the handling of O_CLOEXEC for the flag.  I did not add support for the new
syscall for the architectures which have a special sys_pipe implementation.  I
think the maintainers of those archs have the chance to go with the unified
implementation but that's up to them.

The implementation introduces do_pipe_flags.  I did that instead of changing
all callers of do_pipe because some of the callers are written in assembler.
I would probably screw up changing the assembly code.  To avoid breaking code
do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
changed over to do_pipe_flags the old do_pipe function can be removed.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_pipe2
# ifdef __x86_64__
#  define __NR_pipe2 293
# elif defined __i386__
#  define __NR_pipe2 331
# else
#  error "need __NR_pipe2"
# endif
#endif

int
main (void)
{
  int fd[2];
  if (syscall (__NR_pipe2, fd, 0) != 0)
    {
      puts ("pipe2(0) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      int coe = fcntl (fd[i], F_GETFD);
      if (coe == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if (coe & FD_CLOEXEC)
        {
          printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
          return 1;
        }
    }
  close (fd[0]);
  close (fd[1]);

  if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
    {
      puts ("pipe2(O_CLOEXEC) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      int coe = fcntl (fd[i], F_GETFD);
      if (coe == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if ((coe & FD_CLOEXEC) == 0)
        {
          printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
          return 1;
        }
    }
  close (fd[0]);
  close (fd[1]);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:28 -07:00
Ulrich Drepper
336dd1f70f flag parameters: dup2
This patch adds the new dup3 syscall.  It extends the old dup2 syscall by one
parameter which is meant to hold a flag value.  Support for the O_CLOEXEC flag
is added in this patch.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_dup3
# ifdef __x86_64__
#  define __NR_dup3 292
# elif defined __i386__
#  define __NR_dup3 330
# else
#  error "need __NR_dup3"
# endif
#endif

int
main (void)
{
  int fd = syscall (__NR_dup3, 1, 4, 0);
  if (fd == -1)
    {
      puts ("dup3(0) failed");
      return 1;
    }
  int coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (coe & FD_CLOEXEC)
    {
      puts ("dup3(0) set close-on-exec flag");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_dup3, 1, 4, O_CLOEXEC);
  if (fd == -1)
    {
      puts ("dup3(O_CLOEXEC) failed");
      return 1;
    }
  coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((coe & FD_CLOEXEC) == 0)
    {
      puts ("dup3(O_CLOEXEC) set close-on-exec flag");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:28 -07:00
Ulrich Drepper
a0998b50c3 flag parameters: epoll_create
This patch adds the new epoll_create2 syscall.  It extends the old epoll_create
syscall by one parameter which is meant to hold a flag value.  In this
patch the only flag support is EPOLL_CLOEXEC which causes the close-on-exec
flag for the returned file descriptor to be set.

A new name EPOLL_CLOEXEC is introduced which in this implementation must
have the same value as O_CLOEXEC.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_epoll_create2
# ifdef __x86_64__
#  define __NR_epoll_create2 291
# elif defined __i386__
#  define __NR_epoll_create2 329
# else
#  error "need __NR_epoll_create2"
# endif
#endif

#define EPOLL_CLOEXEC O_CLOEXEC

int
main (void)
{
  int fd = syscall (__NR_epoll_create2, 1, 0);
  if (fd == -1)
    {
      puts ("epoll_create2(0) failed");
      return 1;
    }
  int coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (coe & FD_CLOEXEC)
    {
      puts ("epoll_create2(0) set close-on-exec flag");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_epoll_create2, 1, EPOLL_CLOEXEC);
  if (fd == -1)
    {
      puts ("epoll_create2(EPOLL_CLOEXEC) failed");
      return 1;
    }
  coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((coe & FD_CLOEXEC) == 0)
    {
      puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:28 -07:00
Ulrich Drepper
11fcb6c146 flag parameters: timerfd_create
The timerfd_create syscall already has a flags parameter.  It just is
unused so far.  This patch changes this by introducing the TFD_CLOEXEC
flag to set the close-on-exec flag for the returned file descriptor.

A new name TFD_CLOEXEC is introduced which in this implementation must
have the same value as O_CLOEXEC.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_timerfd_create
# ifdef __x86_64__
#  define __NR_timerfd_create 283
# elif defined __i386__
#  define __NR_timerfd_create 322
# else
#  error "need __NR_timerfd_create"
# endif
#endif

#define TFD_CLOEXEC O_CLOEXEC

int
main (void)
{
  int fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, 0);
  if (fd == -1)
    {
      puts ("timerfd_create(0) failed");
      return 1;
    }
  int coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (coe & FD_CLOEXEC)
    {
      puts ("timerfd_create(0) set close-on-exec flag");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, TFD_CLOEXEC);
  if (fd == -1)
    {
      puts ("timerfd_create(TFD_CLOEXEC) failed");
      return 1;
    }
  coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((coe & FD_CLOEXEC) == 0)
    {
      puts ("timerfd_create(TFD_CLOEXEC) set close-on-exec flag");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:27 -07:00
Ulrich Drepper
b087498eb5 flag parameters: eventfd
This patch adds the new eventfd2 syscall.  It extends the old eventfd
syscall by one parameter which is meant to hold a flag value.  In this
patch the only flag support is EFD_CLOEXEC which causes the close-on-exec
flag for the returned file descriptor to be set.

A new name EFD_CLOEXEC is introduced which in this implementation must
have the same value as O_CLOEXEC.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_eventfd2
# ifdef __x86_64__
#  define __NR_eventfd2 290
# elif defined __i386__
#  define __NR_eventfd2 328
# else
#  error "need __NR_eventfd2"
# endif
#endif

#define EFD_CLOEXEC O_CLOEXEC

int
main (void)
{
  int fd = syscall (__NR_eventfd2, 1, 0);
  if (fd == -1)
    {
      puts ("eventfd2(0) failed");
      return 1;
    }
  int coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (coe & FD_CLOEXEC)
    {
      puts ("eventfd2(0) sets close-on-exec flag");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_eventfd2, 1, EFD_CLOEXEC);
  if (fd == -1)
    {
      puts ("eventfd2(EFD_CLOEXEC) failed");
      return 1;
    }
  coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((coe & FD_CLOEXEC) == 0)
    {
      puts ("eventfd2(EFD_CLOEXEC) does not set close-on-exec flag");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:27 -07:00
Ulrich Drepper
9deb27baed flag parameters: signalfd
This patch adds the new signalfd4 syscall.  It extends the old signalfd
syscall by one parameter which is meant to hold a flag value.  In this
patch the only flag support is SFD_CLOEXEC which causes the close-on-exec
flag for the returned file descriptor to be set.

A new name SFD_CLOEXEC is introduced which in this implementation must
have the same value as O_CLOEXEC.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_signalfd4
# ifdef __x86_64__
#  define __NR_signalfd4 289
# elif defined __i386__
#  define __NR_signalfd4 327
# else
#  error "need __NR_signalfd4"
# endif
#endif

#define SFD_CLOEXEC O_CLOEXEC

int
main (void)
{
  sigset_t ss;
  sigemptyset (&ss);
  sigaddset (&ss, SIGUSR1);
  int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0);
  if (fd == -1)
    {
      puts ("signalfd4(0) failed");
      return 1;
    }
  int coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (coe & FD_CLOEXEC)
    {
      puts ("signalfd4(0) set close-on-exec flag");
      return 1;
    }
  close (fd);

  fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_CLOEXEC);
  if (fd == -1)
    {
      puts ("signalfd4(SFD_CLOEXEC) failed");
      return 1;
    }
  coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((coe & FD_CLOEXEC) == 0)
    {
      puts ("signalfd4(SFD_CLOEXEC) does not set close-on-exec flag");
      return 1;
    }
  close (fd);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:27 -07:00
Ulrich Drepper
7d9dbca342 flag parameters: anon_inode_getfd extension
This patch just extends the anon_inode_getfd interface to take an additional
parameter with a flag value.  The flag value is passed on to
get_unused_fd_flags in anticipation for a use with the O_CLOEXEC flag.

No actual semantic changes here, the changed callers all pass 0 for now.

[akpm@linux-foundation.org: KVM fix]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:27 -07:00
Ulrich Drepper
c019bbc612 flag parameters: paccept w/out set_restore_sigmask
Some platforms do not have support to restore the signal mask in the
return path from a syscall.  For those platforms syscalls like pselect are
not defined at all.  This is, I think, not a good choice for paccept()
since paccept() adds more value on top of accept() than just the signal
mask handling.

Therefore this patch defines a scaled down version of the sys_paccept
function for those platforms.  It returns -EINVAL in case the signal mask
is non-NULL but behaves the same otherwise.

Note that I explicitly included <linux/thread_info.h>.  I saw that it is
currently included but indirectly two levels down.  There is too much risk
in relying on this.  The header might change and then suddenly the
function definition would change without anyone immediately noticing.

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:27 -07:00
Ulrich Drepper
aaca0bdca5 flag parameters: paccept
This patch is by far the most complex in the series.  It adds a new syscall
paccept.  This syscall differs from accept in that it adds (at the userlevel)
two additional parameters:

- a signal mask
- a flags value

The flags parameter can be used to set flag like SOCK_CLOEXEC.  This is
imlpemented here as well.  Some people argued that this is a property which
should be inherited from the file desriptor for the server but this is against
POSIX.  Additionally, we really want the signal mask parameter as well
(similar to pselect, ppoll, etc).  So an interface change in inevitable.

The flag value is the same as for socket and socketpair.  I think diverging
here will only create confusion.  Similar to the filesystem interfaces where
the use of the O_* constants differs, it is acceptable here.

The signal mask is handled as for pselect etc.  The mask is temporarily
installed for the thread and removed before the call returns.  I modeled the
code after pselect.  If there is a problem it's likely also in pselect.

For architectures which use socketcall I maintained this interface instead of
adding a system call.  The symmetry shouldn't be broken.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/syscall.h>

#ifndef __NR_paccept
# ifdef __x86_64__
#  define __NR_paccept 288
# elif defined __i386__
#  define SYS_PACCEPT 18
#  define USE_SOCKETCALL 1
# else
#  error "need __NR_paccept"
# endif
#endif

#ifdef USE_SOCKETCALL
# define paccept(fd, addr, addrlen, mask, flags) \
  ({ long args[6] = { \
       (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \
     syscall (__NR_socketcall, SYS_PACCEPT, args); })
#else
# define paccept(fd, addr, addrlen, mask, flags) \
  syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags)
#endif

#define PORT 57392

#define SOCK_CLOEXEC O_CLOEXEC

static pthread_barrier_t b;

static void *
tf (void *arg)
{
  pthread_barrier_wait (&b);
  int s = socket (AF_INET, SOCK_STREAM, 0);
  struct sockaddr_in sin;
  sin.sin_family = AF_INET;
  sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  sin.sin_port = htons (PORT);
  connect (s, (const struct sockaddr *) &sin, sizeof (sin));
  close (s);

  pthread_barrier_wait (&b);
  s = socket (AF_INET, SOCK_STREAM, 0);
  sin.sin_port = htons (PORT);
  connect (s, (const struct sockaddr *) &sin, sizeof (sin));
  close (s);
  pthread_barrier_wait (&b);

  pthread_barrier_wait (&b);
  sleep (2);
  pthread_kill ((pthread_t) arg, SIGUSR1);

  return NULL;
}

static void
handler (int s)
{
}

int
main (void)
{
  pthread_barrier_init (&b, NULL, 2);

  struct sockaddr_in sin;
  pthread_t th;
  if (pthread_create (&th, NULL, tf, (void *) pthread_self ()) != 0)
    {
      puts ("pthread_create failed");
      return 1;
    }

  int s = socket (AF_INET, SOCK_STREAM, 0);
  int reuse = 1;
  setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
  sin.sin_family = AF_INET;
  sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  sin.sin_port = htons (PORT);
  bind (s, (struct sockaddr *) &sin, sizeof (sin));
  listen (s, SOMAXCONN);

  pthread_barrier_wait (&b);

  int s2 = paccept (s, NULL, 0, NULL, 0);
  if (s2 < 0)
    {
      puts ("paccept(0) failed");
      return 1;
    }

  int coe = fcntl (s2, F_GETFD);
  if (coe & FD_CLOEXEC)
    {
      puts ("paccept(0) set close-on-exec-flag");
      return 1;
    }
  close (s2);

  pthread_barrier_wait (&b);

  s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC);
  if (s2 < 0)
    {
      puts ("paccept(SOCK_CLOEXEC) failed");
      return 1;
    }

  coe = fcntl (s2, F_GETFD);
  if ((coe & FD_CLOEXEC) == 0)
    {
      puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag");
      return 1;
    }
  close (s2);

  pthread_barrier_wait (&b);

  struct sigaction sa;
  sa.sa_handler = handler;
  sa.sa_flags = 0;
  sigemptyset (&sa.sa_mask);
  sigaction (SIGUSR1, &sa, NULL);

  sigset_t ss;
  pthread_sigmask (SIG_SETMASK, NULL, &ss);
  sigaddset (&ss, SIGUSR1);
  pthread_sigmask (SIG_SETMASK, &ss, NULL);

  sigdelset (&ss, SIGUSR1);
  alarm (4);
  pthread_barrier_wait (&b);

  errno = 0 ;
  s2 = paccept (s, NULL, 0, &ss, 0);
  if (s2 != -1 || errno != EINTR)
    {
      puts ("paccept did not fail with EINTR");
      return 1;
    }

  close (s);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[akpm@linux-foundation.org: make it compile]
[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Roland McGrath <roland@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:27 -07:00
Ulrich Drepper
a677a039be flag parameters: socket and socketpair
This patch adds support for flag values which are ORed to the type passwd
to socket and socketpair.  The additional code is minimal.  The flag
values in this implementation can and must match the O_* flags.  This
avoids overhead in the conversion.

The internal functions sock_alloc_fd and sock_map_fd get a new parameters
and all callers are changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define PORT 57392

/* For Linux these must be the same.  */
#define SOCK_CLOEXEC O_CLOEXEC

int
main (void)
{
  int fd;
  fd = socket (PF_INET, SOCK_STREAM, 0);
  if (fd == -1)
    {
      puts ("socket(0) failed");
      return 1;
    }
  int coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if (coe & FD_CLOEXEC)
    {
      puts ("socket(0) set close-on-exec flag");
      return 1;
    }
  close (fd);

  fd = socket (PF_INET, SOCK_STREAM|SOCK_CLOEXEC, 0);
  if (fd == -1)
    {
      puts ("socket(SOCK_CLOEXEC) failed");
      return 1;
    }
  coe = fcntl (fd, F_GETFD);
  if (coe == -1)
    {
      puts ("fcntl failed");
      return 1;
    }
  if ((coe & FD_CLOEXEC) == 0)
    {
      puts ("socket(SOCK_CLOEXEC) does not set close-on-exec flag");
      return 1;
    }
  close (fd);

  int fds[2];
  if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1)
    {
      puts ("socketpair(0) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      coe = fcntl (fds[i], F_GETFD);
      if (coe == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if (coe & FD_CLOEXEC)
        {
          printf ("socketpair(0) set close-on-exec flag for fds[%d]\n", i);
          return 1;
        }
      close (fds[i]);
    }

  if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, fds) == -1)
    {
      puts ("socketpair(SOCK_CLOEXEC) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      coe = fcntl (fds[i], F_GETFD);
      if (coe == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if ((coe & FD_CLOEXEC) == 0)
        {
          printf ("socketpair(SOCK_CLOEXEC) does not set close-on-exec flag for fds[%d]\n", i);
          return 1;
        }
      close (fds[i]);
    }

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:27 -07:00
Adrian Bunk
f606ddf42f remove the v850 port
Trying to compile the v850 port brings many compile errors, one of them exists
since at least kernel 2.6.19.

There also seems to be noone willing to bring this port back into a usable
state.

This patch therefore removes the v850 port.

If anyone ever decides to revive the v850 port the code will still be
available from older kernels, and it wouldn't be impossible for the port to
reenter the kernel if it would become actively maintained again.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:24 -07:00
Shaohua Li
bdfe6b7c68 pm: acpi hibernation: utilize hardware signature
ACPI defines a hardware signature.  BIOS calculates the signature according to
hardware configure and if hardware changes while hibernated, the signature
will change.  In that case, S4 resume should fail.

Still, there may be systems on which this mechanism does not work correctly,
so it is better to provide a workaround for them.  For this reason, add a new
switch to the acpi_sleep= command line argument allowing one to disable
hardware signature checking.

[shaohua.li@intel.com: build fix]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: <Valdis.Kletnieks@vt.edu>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:24 -07:00
Zhang Rui
c1a220e7ac pm: introduce new interfaces schedule_work_on() and queue_work_on()
This interface allows adding a job on a specific cpu.

Although a work struct on a cpu will be scheduled to other cpu if the cpu
dies, there is a recursion if a work task tries to offline the cpu it's
running on.  we need to schedule the task to a specific cpu in this case.
http://bugzilla.kernel.org/show_bug.cgi?id=10897

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Rus <harbour@sfinx.od.ua>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:23 -07:00
Alan Stern
8111d1b552 pm: add new PM_EVENT codes for runtime power transitions
This patch (as1112) adds some new PM_EVENT_* codes for use by kernel
subsystems.  They describe runtime power-state transitions of the sort already
implemented by the USB subsystem.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:23 -07:00
Rafael J. Wysocki
8c363265d5 pm: drop unnecessary includes from pm.h
Drop unnecessary includes from include/linux/pm.h .

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:23 -07:00
Rafael J. Wysocki
e7ecb331e1 pm: remove remaining obsolete definitions from pm.h
Remove the remaining obsolete definitions from include/linux/pm.h and move
the definitions of PM_SUSPEND and PM_RESUME to the header of h3600 which
is the only user of them.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:22 -07:00
Rafael J. Wysocki
558481f038 pm: remove definition of struct pm_dev
Remove the definition of 'struct pm_dev', which is not used any more,
along with some related stuff from include/linux/pm.h .

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:22 -07:00
Adrian Bunk
d75f65fd24 remove include/linux/pm_legacy.h
Remove the obsolete and no longer used include/linux/pm_legacy.h

Reviewed-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Pavel Machek <pavel@suse.cz>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:22 -07:00
Hugh Dickins
9b3e43a747 security: remove unused forwards
Why would linux/security.h need forward declarations for nfsctl_arg and
swap_info_struct?  It's hard to imagine: remove them.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:22 -07:00
Andrew G. Morgan
5459c164f0 security: protect legacy applications from executing with insufficient privilege
When cap_bset suppresses some of the forced (fP) capabilities of a file,
it is generally only safe to execute the program if it understands how to
recognize it doesn't have enough privilege to work correctly.  For legacy
applications (fE!=0), which have no non-destructive way to determine that
they are missing privilege, we fail to execute (EPERM) any executable that
requires fP capabilities, but would otherwise get pP' < fP.  This is a
fail-safe permission check.

For some discussion of why it is problematic for (legacy) privileged
applications to run with less than the set of capabilities requested for
them, see:

 http://userweb.kernel.org/~morgan/sendmail-capabilities-war-story.html

With this iteration of this support, we do not include setuid-0 based
privilege protection from the bounding set.  That is, the admin can still
(ab)use the bounding set to suppress the privileges of a setuid-0 program.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:22 -07:00
Gerald Schaefer
83d1674a94 mm: make CONFIG_MIGRATION available w/o CONFIG_NUMA
We'd like to support CONFIG_MEMORY_HOTREMOVE on s390, which depends on
CONFIG_MIGRATION.  So far, CONFIG_MIGRATION is only available with NUMA
support.

This patch makes CONFIG_MIGRATION selectable for architectures that define
ARCH_ENABLE_MEMORY_HOTREMOVE.  When MIGRATION is enabled w/o NUMA, the
kernel won't compile because migrate_vmas() does not know about
vm_ops->migrate() and vma_migratable() does not know about policy_zone.
To fix this, those two functions can be restricted to '#ifdef CONFIG_NUMA'
because they are not being used w/o NUMA.  vma_migratable() is moved over
from migrate.h to mempolicy.h.

[kosaki.motohiro@jp.fujitsu.com: build fix]
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: KOSAKI Motorhiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:21 -07:00
Milton Miller
9ca908f47b kcalloc: remove runtime division
While in all cases in the kernel we know the size of the elements to be
created, we don't always know the count of elements.  By commuting the size
and count in the overflow check, the compiler can reduce the runtime division
of size_t with a compare to a (unique) constant in these cases.

Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:21 -07:00
Badari Pulavarty
5c755e9fd8 memory-hotplug: add sysfs removable attribute for hotplug memory remove
Memory may be hot-removed on a per-memory-block basis, particularly on
POWER where the SPARSEMEM section size often matches the memory-block
size.  A user-level agent must be able to identify which sections of
memory are likely to be removable before attempting the potentially
expensive operation.  This patch adds a file called "removable" to the
memory directory in sysfs to help such an agent.  In this patch, a memory
block is considered removable if;

o It contains only MOVABLE pageblocks
o It contains only pageblocks with free pages regardless of pageblock type

On the other hand, a memory block starting with a PageReserved() page will
never be considered removable.  Without this patch, the user-agent is
forced to choose a memory block to remove randomly.

Sample output of the sysfs files:

./memory/memory0/removable: 0
./memory/memory1/removable: 0
./memory/memory2/removable: 0
./memory/memory3/removable: 0
./memory/memory4/removable: 0
./memory/memory5/removable: 0
./memory/memory6/removable: 0
./memory/memory7/removable: 1
./memory/memory8/removable: 0
./memory/memory9/removable: 0
./memory/memory10/removable: 0
./memory/memory11/removable: 0
./memory/memory12/removable: 0
./memory/memory13/removable: 0
./memory/memory14/removable: 0
./memory/memory15/removable: 0
./memory/memory16/removable: 0
./memory/memory17/removable: 1
./memory/memory18/removable: 1
./memory/memory19/removable: 1
./memory/memory20/removable: 1
./memory/memory21/removable: 1
./memory/memory22/removable: 1

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:21 -07:00
Yasunori Goto
af370fb8cb memory hotplug: small fixes to bootmem freeing for memory hotremove
- Change some naming
  * Magic -> types
  * MIX_INFO -> MIX_SECTION_INFO
  * Change definition of bootmem type from direct hex value

- __free_pages_bootmem() becomes __meminit.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:21 -07:00
Andrea Righi
27ac792ca0 PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:

	u64 val = PAGE_ALIGN(size);

always returns a value < 4GB even if size is greater than 4GB.

The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):

#define PAGE_SHIFT      12
#define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK       (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr)       (((addr)+PAGE_SIZE-1)&PAGE_MASK)

The "~" is performed on a 32-bit value, so everything in "and" with
PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
Using the ALIGN() macro seems to be the right way, because it uses
typeof(addr) for the mask.

Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.

See also lkml discussion: http://lkml.org/lkml/2008/6/11/237

[akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
[akpm@linux-foundation.org: fix v850]
[akpm@linux-foundation.org: fix powerpc]
[akpm@linux-foundation.org: fix arm]
[akpm@linux-foundation.org: fix mips]
[akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
[akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
[akpm@linux-foundation.org: fix powerpc]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:21 -07:00
Timur Tabi
2be0ffe2b2 mm: add alloc_pages_exact() and free_pages_exact()
alloc_pages_exact() is similar to alloc_pages(), except that it allocates
the minimum number of pages to fulfill the request.  This is useful if you
want to allocate a very large buffer that is slightly larger than an even
power-of-two number of pages.  In that case, alloc_pages() will waste a
lot of memory.

I have a video driver that wants to allocate a 5MB buffer.  alloc_pages()
wiill waste 3MB of physically-contiguous memory.

Signed-off-by: Timur Tabi <timur@freescale.com>
Cc: Andi Kleen <andi@firstfloor.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:20 -07:00
Johannes Weiner
3560e249ab bootmem: replace node_boot_start in struct bootmem_data
Almost all users of this field need a PFN instead of a physical address,
so replace node_boot_start with node_min_pfn.

[Lee.Schermerhorn@hp.com: fix spurious BUG_ON() in mark_bootmem()]
Signed-off-by: Johannes Weiner <hannes@saeureba.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:20 -07:00
Johannes Weiner
5f2809e69c bootmem: clean up alloc_bootmem_core
alloc_bootmem_core has become quite nasty to read over time.  This is a
clean rewrite that keeps the semantics.

bdata->last_pos has been dropped.

bdata->last_success has been renamed to hint_idx and it is now an index
relative to the node's range.  Since further block searching might start
at this index, it is now set to the end of a succeeded allocation rather
than its beginning.

bdata->last_offset has been renamed to last_end_off to be more clear that
it represents the ending address of the last allocation relative to the
node.

[y-goto@jp.fujitsu.com: fix new alloc_bootmem_core()]
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:20 -07:00
Johannes Weiner
223e8dc924 bootmem: reorder code to match new bootmem structure
This only reorders functions so that further patches will be easier to
read.  No code changed.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Jon Tollefson
53ba51d21d hugetlb: allow arch overridden hugepage allocation
Allow alloc_bootmem_huge_page() to be overridden by architectures that
can't always use bootmem.  This requires huge_boot_pages to be available
for use by this function.

This is required for powerpc 16G pages, which have to be reserved prior to
boot-time.  The location of these pages are indicated in the device tree.

Acked-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:19 -07:00
Andi Kleen
ceb8687961 hugetlb: introduce pud_huge
Straight forward extensions for huge pages located in the PUD instead of
PMDs.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:18 -07:00
Andi Kleen
b54bbf7b81 mm: introduce non panic alloc_bootmem
Straight forward variant of the existing __alloc_bootmem_node, only
subsequent patch when allocating giant hugepages at boot -- don't want to
panic if we can't allocate as many as the user asked for.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:17 -07:00
Nishanth Aravamudan
a343787016 hugetlb: new sysfs interface
Provide new hugepages user APIs that are more suited to multiple hstates
in sysfs.  There is a new directory, /sys/kernel/hugepages.  Underneath
that directory there will be a directory per-supported hugepage size,
e.g.:

/sys/kernel/hugepages/hugepages-64kB
/sys/kernel/hugepages/hugepages-16384kB
/sys/kernel/hugepages/hugepages-16777216kB

corresponding to 64k, 16m and 16g respectively.  Within each
hugepages-size directory there are a number of files, corresponding to the
tracked counters in the hstate, e.g.:

/sys/kernel/hugepages/hugepages-64/nr_hugepages
/sys/kernel/hugepages/hugepages-64/nr_overcommit_hugepages
/sys/kernel/hugepages/hugepages-64/free_hugepages
/sys/kernel/hugepages/hugepages-64/resv_hugepages
/sys/kernel/hugepages/hugepages-64/surplus_hugepages

Of these files, the first two are read-write and the latter three are
read-only.  The size of the hugepage being manipulated is trivially
deducible from the enclosing directory and is always expressed in kB (to
match meminfo).

[dave@linux.vnet.ibm.com: fix build]
[nacc@us.ibm.com: hugetlb: hang off of /sys/kernel/mm rather than /sys/kernel]
[nacc@us.ibm.com: hugetlb: remove CONFIG_SYSFS dependency]
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:17 -07:00
Andi Kleen
a137e1cc6d hugetlbfs: per mount huge page sizes
Add the ability to configure the hugetlb hstate used on a per mount basis.

- Add a new pagesize= option to the hugetlbfs mount that allows setting
  the page size
- This option causes the mount code to find the hstate corresponding to the
  specified size, and sets up a pointer to the hstate in the mount's
  superblock.
- Change the hstate accessors to use this information rather than the
  global_hstate they were using (requires a slight change in mm/memory.c
  so we don't NULL deref in the error-unmap path -- see comments).

[np: take hstate out of hugetlbfs inode and vma->vm_private_data]

Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:17 -07:00
Andi Kleen
e5ff215941 hugetlb: multiple hstates for multiple page sizes
Add basic support for more than one hstate in hugetlbfs.  This is the key
to supporting multiple hugetlbfs page sizes at once.

- Rather than a single hstate, we now have an array, with an iterator
- default_hstate continues to be the struct hstate which we use by default
- Add functions for architectures to register new hstates

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:17 -07:00
Andi Kleen
a551643895 hugetlb: modular state for hugetlb page size
The goal of this patchset is to support multiple hugetlb page sizes.  This
is achieved by introducing a new struct hstate structure, which
encapsulates the important hugetlb state and constants (eg.  huge page
size, number of huge pages currently allocated, etc).

The hstate structure is then passed around the code which requires these
fields, they will do the right thing regardless of the exact hstate they
are operating on.

This patch adds the hstate structure, with a single global instance of it
(default_hstate), and does the basic work of converting hugetlb to use the
hstate.

Future patches will add more hstate structures to allow for different
hugetlbfs mounts to have different page sizes.

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:17 -07:00
Nishanth Aravamudan
ff7ea79cf7 mm: create /sys/kernel/mm
Add a kobject to create /sys/kernel/mm when sysfs is mounted.  The kobject
will exist regardless.  This will allow for the hugepage related sysfs
directories to exist under the mm "subsystem" directory.  Add an ABI file
appropriately.

[kosaki.motohiro@jp.fujitsu.com: fix build]
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:17 -07:00
Andy Whitcroft
cdfd4325c0 mm: record MAP_NORESERVE status on vmas and fix small page mprotect reservations
With Mel's hugetlb private reservation support patches applied, strict
overcommit semantics are applied to both shared and private huge page
mappings.  This can be a problem if an application relied on unlimited
overcommit semantics for private mappings.  An example of this would be an
application which maps a huge area with the intention of using it very
sparsely.  These application would benefit from being able to opt-out of
the strict overcommit.  It should be noted that prior to hugetlb
supporting demand faulting all mappings were fully populated and so
applications of this type should be rare.

This patch stack implements the MAP_NORESERVE mmap() flag for huge page
mappings.  This flag has the same meaning as for small page mappings,
suppressing reservations for that mapping.

Thanks to Mel Gorman for reviewing a number of early versions of these
patches.

This patch:

When a small page mapping is created with mmap() reservations are created
by default for any memory pages required.  When the region is read/write
the reservation is increased for every page, no reservation is needed for
read-only regions (as they implicitly share the zero page).  Reservations
are tracked via the VM_ACCOUNT vma flag which is present when the region
has reservation backing it.  When we convert a region from read-only to
read-write new reservations are aquired and VM_ACCOUNT is set.  However,
when a read-only map is created with MAP_NORESERVE it is indistinguishable
from a normal mapping.  When we then convert that to read/write we are
forced to incorrectly create reservations for it as we have no record of
the original MAP_NORESERVE.

This patch introduces a new vma flag VM_NORESERVE which records the
presence of the original MAP_NORESERVE flag.  This allows us to
distinguish these two circumstances and correctly account the reserve.

As well as fixing this FIXME in the code, this makes it much easier to
introduce MAP_NORESERVE support for huge pages as this flag is available
consistantly for the life of the mapping.  VM_ACCOUNT on the other hand is
heavily used at the generic level in association with small pages.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Johannes Weiner <hannes@saeurebad.de>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:16 -07:00
Mel Gorman
04f2cbe356 hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed
After patch 2 in this series, a process that successfully calls mmap() for
a MAP_PRIVATE mapping will be guaranteed to successfully fault until a
process calls fork().  At that point, the next write fault from the parent
could fail due to COW if the child still has a reference.

We only reserve pages for the parent but a copy must be made to avoid
leaking data from the parent to the child after fork().  Reserves could be
taken for both parent and child at fork time to guarantee faults but if
the mapping is large it is highly likely we will not have sufficient pages
for the reservation, and it is common to fork only to exec() immediatly
after.  A failure here would be very undesirable.

Note that the current behaviour of mainline with MAP_PRIVATE pages is
pretty bad.  The following situation is allowed to occur today.

1. Process calls mmap(MAP_PRIVATE)
2. Process calls mlock() to fault all pages and makes sure it succeeds
3. Process forks()
4. Process writes to MAP_PRIVATE mapping while child still exists
5. If the COW fails at this point, the process gets SIGKILLed even though it
   had taken care to ensure the pages existed

This patch improves the situation by guaranteeing the reliability of the
process that successfully calls mmap().  When the parent performs COW, it
will try to satisfy the allocation without using reserves.  If that fails
the parent will steal the page leaving any children without a page.
Faults from the child after that point will result in failure.  If the
child COW happens first, an attempt will be made to allocate the page
without reserves and the child will get SIGKILLed on failure.

To summarise the new behaviour:

1. If the original mapper performs COW on a private mapping with multiple
   references, it will attempt to allocate a hugepage from the pool or
   the buddy allocator without using the existing reserves. On fail, VMAs
   mapping the same area are traversed and the page being COW'd is unmapped
   where found. It will then steal the original page as the last mapper in
   the normal way.

2. The VMAs the pages were unmapped from are flagged to note that pages
   with data no longer exist. Future no-page faults on those VMAs will
   terminate the process as otherwise it would appear that data was corrupted.
   A warning is printed to the console that this situation occured.

2. If the child performs COW first, it will attempt to satisfy the COW
   from the pool if there are enough pages or via the buddy allocator if
   overcommit is allowed and the buddy allocator can satisfy the request. If
   it fails, the child will be killed.

If the pool is large enough, existing applications will not notice that
the reserves were a factor.  Existing applications depending on the
no-reserves been set are unlikely to exist as for much of the history of
hugetlbfs, pages were prefaulted at mmap(), allocating the pages at that
point or failing the mmap().

[npiggin@suse.de: fix CONFIG_HUGETLB=n build]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:16 -07:00
Mel Gorman
a1e78772d7 hugetlb: reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork()
This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in
a similar manner to the reservations taken for MAP_SHARED mappings.  The
reserve count is accounted both globally and on a per-VMA basis for
private mappings.  This guarantees that a process that successfully calls
mmap() will successfully fault all pages in the future unless fork() is
called.

The characteristics of private mappings of hugetlbfs files behaviour after
this patch are;

1. The process calling mmap() is guaranteed to succeed all future faults until
   it forks().
2. On fork(), the parent may die due to SIGKILL on writes to the private
   mapping if enough pages are not available for the COW. For reasonably
   reliable behaviour in the face of a small huge page pool, children of
   hugepage-aware processes should not reference the mappings; such as
   might occur when fork()ing to exec().
3. On fork(), the child VMAs inherit no reserves. Reads on pages already
   faulted by the parent will succeed. Successful writes will depend on enough
   huge pages being free in the pool.
4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper
   and at fault time otherwise.

Before this patch, all reads or writes in the child potentially needs page
allocations that can later lead to the death of the parent.  This applies
to reads and writes of uninstantiated pages as well as COW.  After the
patch it is only a write to an instantiated page that causes problems.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:16 -07:00
Johannes Weiner
9109fb7b35 mm: drop unneeded pgdat argument from free_area_init_node()
free_area_init_node() gets passed in the node id as well as the node
descriptor.  This is redundant as the function can trivially get the node
descriptor itself by means of NODE_DATA() and the node's id.

I checked all the users and NODE_DATA() seems to be usable everywhere
from where this function is called.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:16 -07:00
Andrew Morton
2185e69f68 mapping_set_error: add unlikely()
This is called on a per-page basis and in the vast majority of cases
`error' is zero.

Cc: Guillaume Chazarain <guichaz@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Andy Whitcroft
9023cb7e85 slob: record page flag overlays explicitly
SLOB reuses two page bits for internal purposes, it overlays PG_active and
PG_private.  This is hidden away in slob.c.  Document these overlays
explicitly in the main page-flags enum along with all the others.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Andy Whitcroft
8a38082d21 slub: record page flag overlays explicitly
SLUB reuses two page bits for internal purposes, it overlays PG_active and
PG_error.  This is hidden away in slub.c.  Document these overlays
explicitly in the main page-flags enum along with all the others.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Andy Whitcroft
0cad47cf13 page-flags: record page flag overlays explicitly
With the recent page flag reorganisation we have a single enum which
defines the valid page flags and their values, nice and clear.  However
there are a number of bits which are overloaded by different subsystems.
Firstly there is PG_owner_priv_1 which is used by filesystems and by XEN.
Secondly both SLOB and SLUB use a couple of extra page bits to manage
internal state for pages they own; both overlay other bits.  All of these
"aliases" are scattered about the source making it very hard for a reader
to know if the bits are safe to rely on in all contexts; confusion here is
bad.

As we now have a single place where the bits are clearly assigned it makes
sense to clarify the reuse of bits by making the aliases explicit and
visible with the original bit assignments.  This patch creates explicit
aliases within the enum itself for the overloaded bits, creates standard
bit accessors PageFoo etc.  and uses those throughout.

This version pulls the bit manipulation out to standard named page bit
accessors as suggested by Christoph, it retains the explicit mapping to
the overlayed bits.  A fusion of both ideas.  This has been SLUB and SLOB
have been compile tested on x86_64 only, and SLUB boot tested.  If people
feel this is worth doing then I can run a fuller set of testing.

This patch:

Some page flags are used for more than one purpose, for example
PG_owner_priv_1.  Currently there are individual accessors for each user,
each built using the common flag name far away from the bit definitions.
This makes it hard to see all possible uses of these bits.

Now that we have a single enum to generate the bit orders it makes sense
to express overlays in the same place.  So create per use aliases for this
bit in the main page-flags enum and use those in the accessors.

[akpm@linux-foundation.org: fix xen]
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Kentaro Makita
da3bbdd463 fix soft lock up at NFS mount via per-SB LRU-list of unused dentries
[Summary]

 Split LRU-list of unused dentries to one per superblock to avoid soft
 lock up during NFS mounts and remounting of any filesystem.

 Previously I posted here:
 http://lkml.org/lkml/2008/3/5/590

[Descriptions]

- background

  dentry_unused is a list of dentries which are not referenced.
  dentry_unused grows up when references on directories or files are
  released.  This list can be very long if there is huge free memory.

- the problem

  When shrink_dcache_sb() is called, it scans all dentry_unused linearly
  under spin_lock(), and if dentry->d_sb is differnt from given
  superblock, scan next dentry.  This scan costs very much if there are
  many entries, and very ineffective if there are many superblocks.

  IOW, When we need to shrink unused dentries on one dentry, but scans
  unused dentries on all superblocks in the system.  For example, we scan
  500 dentries to unmount a filesystem, but scans 1,000,000 or more unused
  dentries on other superblocks.

  In our case , At mounting NFS*, shrink_dcache_sb() is called to shrink
  unused dentries on NFS, but scans 100,000,000 unused dentries on
  superblocks in the system such as local ext3 filesystems.  I hear NFS
  mounting took 1 min on some system in use.

* : NFS uses virtual filesystem in rpc layer, so NFS is affected by
  this problem.

  100,000,000 is possible number on large systems.

  Per-superblock LRU of unused dentried can reduce the cost in
  reasonable manner.

- How to fix

  I found this problem is solved by David Chinner's "Per-superblock
  unused dentry LRU lists V3"(1), so I rebase it and add some fix to
  reclaim with fairness, which is in Andrew Morton's comments(2).

  1) http://lkml.org/lkml/2006/5/25/318
  2) http://lkml.org/lkml/2006/5/25/320

  Split LRU-list of unused dentries to each superblocks.  Then, NFS
  mounting will check dentries under a superblock instead of all.  But
  this spliting will break LRU of dentry-unused.  So, I've attempted to
  make reclaim unused dentrins with fairness by calculate number of
  dentries to scan on this sb based on following way

  number of dentries to scan on this sb =
  count * (number of dentries on this sb / number of dentries in the machine)

- ToDo
 - I have to measuring performance number and do stress tests.

 - When unmount occurs during prune_dcache(), scanning on same
  superblock, It is unable to reach next superblock because it is gone
  away.  We restart scannig superblock from first one, it causes
  unfairness of reclaim unused dentries on first superblock.  But I think
  this happens very rarely.

- Test Results

  Result on 6GB boxes with excessive unused dentries.

Without patch:

$ cat /proc/sys/fs/dentry-state
10181835        10180203        45      0       0       0
# mount -t nfs 10.124.60.70:/work/kernel-src nfs
real    0m1.830s
user    0m0.001s
sys     0m1.653s

 With this patch:
$ cat /proc/sys/fs/dentry-state
10236610        10234751        45      0       0       0
# mount -t nfs 10.124.60.70:/work/kernel-src nfs
real    0m0.106s
user    0m0.002s
sys     0m0.032s

[akpm@linux-foundation.org: fix comments]
Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: David Chinner <dgc@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Jan Beulich
42b7772812 mm: remove double indirection on tlb parameter to free_pgd_range() & Co
The double indirection here is not needed anywhere and hence (at least)
confusing.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Rik van Riel
28b2ee20c7 access_process_vm device memory infrastructure
In order to be able to debug things like the X server and programs using
the PPC Cell SPUs, the debugger needs to be able to access device memory
through ptrace and /proc/pid/mem.

This patch:

Add the generic_access_phys access function and put the hooks in place
to allow access_process_vm to access device or PPC Cell SPU memory.

[riel@redhat.com: Add documentation for the vm_ops->access function]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Benjamin Herrensmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Nick Piggin
0d71d10a42 mm: remove nopfn
There are no users of nopfn in the tree. Remove it.

[hugh@veritas.com: fix build error]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Adrian Bunk
c748e1340e mm/vmstat.c: proper externs
This patch adds proper extern declarations for five variables in
include/linux/vmstat.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:14 -07:00
KOSAKI Motohiro
e4048e5dc4 page allocator: inline some __alloc_pages() wrappers
Two zonelist patch series rewrote __page_alloc() largely.  Now, it is just
a wrapper function.  Inlining them will save a function call.

[akpm@linux-foundation.org: export __alloc_pages_internal]
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:14 -07:00
Johannes Weiner
ffc6421f07 mm: unexport __alloc_bootmem_core()
This function has no external callers, so unexport it.  Also fix its naming
inconsistency.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:14 -07:00
Johannes Weiner
b61bfa3c46 mm: move bootmem descriptors definition to a single place
There are a lot of places that define either a single bootmem descriptor or an
array of them.  Use only one central array with MAX_NUMNODES items instead.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:14 -07:00
FUJITA Tomonori
8b05c7e6e1 add a helper function to test if an object is on the stack
lib/debugobjects.c has a function to test if an object is on the stack.
The block layer and ide needs it (they need to avoid DMA from/to stack
buffers).  This patch moves the function to include/linux/sched.h so that
everyone can use it.

lib/debugobjects.c uses current->stack but this patch uses a
task_stack_page() accessor, which is a preferable way to access the stack.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:14 -07:00
Akinobu Mita
e108526e77 move memory_read_from_buffer() from fs.h to string.h
James Bottomley warns that inclusion of linux/fs.h in a low level
driver was always a danger signal.  This patch moves
memory_read_from_buffer() from fs.h to string.h and fixes includes in
existing memory_read_from_buffer() users.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Bob Moore <robert.moore@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:13 -07:00
Matthew Wilcox
b552068999 Remove __DECLARE_SEMAPHORE_GENERIC
There are no users of __DECLARE_SEMAPHORE_GENERIC in the kernel

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-07-24 08:31:21 -04:00
Artem Bityutskiy
85c6e6e282 UBI: amend commentaries
Hch asked not to use "unit" for sub-systems, let it be so.
Also some other commentaries modifications.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:56 +03:00
Artem Bityutskiy
a5bf619041 UBI: add ubi_sync() interface
To flush MTD device caches.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:56 +03:00
Linus Torvalds
7f9dce3837 Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: hrtick_enabled() should use cpu_active()
  sched, x86: clean up hrtick implementation
  sched: fix build error, provide partition_sched_domains() unconditionally
  sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed
  cpu hotplug: Make cpu_active_map synchronization dependency clear
  cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
  sched: rework of "prioritize non-migratable tasks over migratable ones"
  sched: reduce stack size in isolated_cpu_setup()
  Revert parts of "ftrace: do not trace scheduler functions"

Fixed up conflicts in include/asm-x86/thread_info.h (due to the
TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and
kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr()
introduction).
2008-07-23 19:36:53 -07:00
Linus Torvalds
26dcce0fab Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
  NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
  cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
  NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
  NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
  cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
  cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
  cpumask: Provide a generic set of CPUMASK_ALLOC macros
  cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
  cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
  cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
  cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
  cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
  cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
  Revert "cpumask: introduce new APIs"
  cpumask: make for_each_cpu_mask a bit smaller
  net: Pass reference to cpumask variable in net/sunrpc/svc.c
  ...

Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
2008-07-23 18:37:44 -07:00
Linus Torvalds
d7b6de14a0 Merge branch 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  softlockup: fix invalid proc_handler for softlockup_panic
  softlockup: fix watchdog task wakeup frequency
  softlockup: fix watchdog task wakeup frequency
  softlockup: show irqtrace
  softlockup: print a module list on being stuck
  softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression
  softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds
  softlockup: fix softlockup_thresh fix
  softlockup: fix softlockup_thresh unaligned access and disable detection at runtime
  softlockup: allow panic on lockup
2008-07-23 18:34:13 -07:00
Linus Torvalds
30d38542ec Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (85 commits)
  [ARM] pxa: add base support for PXA930 Handheld Platform (aka SAAR)
  [ARM] pxa: add base support for PXA930 Evaluation Board (aka TavorEVB)
  [ARM] pxa: add base support for PXA930 (aka Tavor-P)
  [ARM] Update mach-types
  [ARM] pxa: make littleton to use the new smc91x platform data
  [ARM] pxa: make zylonite to use the new smc91x platform data
  [ARM] pxa: make mainstone to use the new smc91x platform data
  [ARM] pxa: make lubbock to use new smc91x platform data
  [NET] smc91x: prepare SMC_USE_PXA_DMA to be specified in platform data
  [NET] smc91x: prepare for SMC_IO_SHIFT to be a platform configurable variable
  [NET] smc91x: add SMC91X_NOWAIT flag to platform data
  [NET] smc91x: favor the use of SMC91X_USE_* instead of SMC_CAN_USE_*
  [NET] smc91x: remove "irq_flags" from "struct smc91x_platdata"
  [ARM] 5146/1: pxa2xx: convert all boards to call pxa2xx_transceiver_mode helper
  Support for LCD on e740 e750 e400 and e800 e-series PDAs
  E-series UDC support
  PXA UDC - allow use of inverted GPIO for pullup
  Add e350 support
  Fix broken e-series build
  E-series GPIO / IRQ definitions.
  ...
2008-07-23 18:24:08 -07:00
Dan Williams
d8e64406a0 md: delay notification of 'active_idle' to the recovery thread
sysfs_notify might sleep, so do not call it from md_safemode_timeout.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-23 13:09:48 -07:00
Linus Torvalds
20b7997e8a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
  sdhci: highmem capable PIO routines
  sg: reimplement sg mapping iterator
  mmc_test: print message when attaching to card
  mmc: Remove Russell as primecell mci maintainer
  mmc_block: bounce buffer highmem support
  sdhci: fix bad warning from commit c8b3e02
  sdhci: add warnings for bad buffers in ADMA path
  mmc_test: test oversized sg lists
  mmc_test: highmem tests
  s3cmci: ensure host stopped on machine shutdown
  au1xmmc: suspend/resume implementation
  s3cmci: fixes for section mismatch warnings
  pxamci: trivial fix of DMA alignment register bit clearing
2008-07-23 12:04:34 -07:00
Linus Torvalds
5554b35933 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (24 commits)
  I/OAT: I/OAT version 3.0 support
  I/OAT: tcp_dma_copybreak default value dependent on I/OAT version
  I/OAT: Add watchdog/reset functionality to ioatdma
  iop_adma: cleanup iop_chan_xor_slot_count
  iop_adma: document how to calculate the minimum descriptor pool size
  iop_adma: directly reclaim descriptors on allocation failure
  async_tx: make async_tx_test_ack a boolean routine
  async_tx: remove depend_tx from async_tx_sync_epilog
  async_tx: export async_tx_quiesce
  async_tx: fix handling of the "out of descriptor" condition in async_xor
  async_tx: ensure the xor destination buffer remains dma-mapped
  async_tx: list_for_each_entry_rcu() cleanup
  dmaengine: Driver for the Synopsys DesignWare DMA controller
  dmaengine: Add slave DMA interface
  dmaengine: add DMA_COMPL_SKIP_{SRC,DEST}_UNMAP flags to control dma unmap
  dmaengine: Add dma_client parameter to device_alloc_chan_resources
  dmatest: Simple DMA memcpy test client
  dmaengine: DMA engine driver for Marvell XOR engine
  iop-adma: fix platform driver hotplug/coldplug
  dmaengine: track the number of clients using a channel
  ...

Fixed up conflict in drivers/dca/dca-sysfs.c manually
2008-07-23 12:03:18 -07:00
Linus Torvalds
e669e8179d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (60 commits)
  ide: small whitespace fixes
  ide: ide-cd_ioctl.c fix sparse integer as NULL pointer warnings
  ide: ide-cd.c fix sparse endianness warnings
  ide-cd: convert to using the new atapi_flags
  ide: remove unused PC_FLAG_DRQ_INTERRUPT
  ide-scsi: convert to using the new atapi_flags
  ide-tape: convert to using the new atapi_flags
  ide-floppy: convert to using the new atapi_flags (take 2)
  ide: add per-device flags
  ide: use rq->cmd instead of pc->c in atapi common code
  ide-scsi: pass packet command in rq->cmd
  ide-tape: pass packet command in rq->cmd
  ide-tape: make room for packet command ids in rq->cmd
  ide-floppy: pass packet command in rq->cmd
  ide: remove pc->callback member from ide_atapi_pc
  ide-scsi: use drive->pc_callback instead of pc->callback
  ide-tape: use drive->pc_callback instead of pc->callback
  ide-floppy: use drive->pc_callback instead of pc->callback
  ide: push pc callback pointer into the ide_drive_t structure
  drivers/ide/ide-tape.c: remove double kfree
  ...
2008-07-23 11:59:09 -07:00
Dmitry Torokhov
a822bea796 Input: serio - mark serio_register_driver() __must_check
Also remove extra declaration of serio_register_driver().

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2008-07-23 14:01:49 -04:00
Borislav Petkov
ac77ef8b03 ide: remove unused PC_FLAG_DRQ_INTERRUPT
There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:56:01 +02:00
Borislav Petkov
ea68d270ff ide-floppy: convert to using the new atapi_flags (take 2)
while at it, remove PC_FLAG_ZIP_DRIVE from the packed command flags altogether
and query the drive type through drive->atapi_flags.

v2:
ide-floppy fix.

There should be no functionality change resulting from this patch.

[bart: IDE_FLAG_* -> IDE_AFLAG_*, dev_flags -> atapi_flags]

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:56:01 +02:00
Borislav Petkov
3b8ac5398c ide: add per-device flags
Push device flags up into ide_drive_t.

There should be no functionality change resulting from this patch.

[bart: IDE_FLAG_* -> IDE_AFLAG_*, dev_flags -> atapi_flags]

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:56:01 +02:00
Borislav Petkov
8bcda3bc49 ide: remove pc->callback member from ide_atapi_pc
There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:56:00 +02:00
Borislav Petkov
d7c26ebb5b ide: push pc callback pointer into the ide_drive_t structure
Refrain from carrying the callback ptr with every packet command since the
callback function is only one anyways. ide_drive_t is probably not the most
suitable place for it right now but is the more sane solution. Besides, these
structs are going to be reorganized anyways during the generic ide rewrite.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:59 +02:00
Bartlomiej Zolnierkiewicz
8a69580e1e ide: add ide_host_free() helper (take 2)
* Add ide_host_free() helper and convert ide_host_remove() to use it.

* Fix handling of ide_host_register() failure in ide_host_add(),
  icside.c, ide-generic.c, falconide.c and sgiioc4.c.

While at it:

* Fix handling of ide_host_alloc_all() failure in ide-generic.c.

* Fix handling of ide_host_alloc() failure in falconide.c
  (also return the correct error value if no device is found).

v2:
* falconide build fix. (From Stephen Rothwell)

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:59 +02:00
Bartlomiej Zolnierkiewicz
6f904d0152 ide: add ide_host_add() helper
Add ide_host_add() helper which does ide_host_alloc()+ide_host_register(),
then convert ide_setup_pci_device[s](), ide_legacy_device_add() and some
host drivers to use it.

While at it:

* Fix ide_setup_pci_device[s](), ide_arm.c, gayle.c, ide-4drives.c,
  macide.c, q40ide.c, cmd640.c and cs5520.c to return correct error value.

* -ENOENT -> -ENOMEM in rapide.c, ide-h8300.c, ide-generic.c, au1xxx-ide.c
  and pmac.c

* -ENODEV -> -ENOMEM in palm_bk3710.c, ide_platform.c and delkin_cb.c

* -1 -> -ENOMEM in ide-pnp.c

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:57 +02:00
Bartlomiej Zolnierkiewicz
48c3c10726 ide: add struct ide_host (take 3)
* Add struct ide_host which keeps pointers to host's ports.

* Add ide_host_alloc[_all]() and ide_host_remove() helpers.

* Pass 'struct ide_host *host' instead of 'u8 *idx' to
  ide_device_add[_all]() and rename it to ide_host_register[_all]().

* Convert host drivers and core code to use struct ide_host.

* Remove no longer needed ide_find_port().

* Make ide_find_port_slot() static.

* Unexport ide_unregister().

v2:
* Add missing 'struct ide_host *host' to macide.c.

v3:
* Fix build problem in pmac.c (s/ide_alloc_host/ide_host_alloc/)
  (Noticed by Stephen Rothwell).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:57 +02:00
Bartlomiej Zolnierkiewicz
374e042c3e ide: add struct ide_tp_ops (take 2)
* Add struct ide_tp_ops for transport methods.

* Add 'const struct ide_tp_ops *tp_ops' to struct ide_port_info
  and ide_hwif_t.

* Set the default hwif->tp_ops in ide_init_port_data().

* Set host driver specific hwif->tp_ops in ide_init_port().

* Export ide_exec_command(), ide_read_status(), ide_read_altstatus(),
  ide_read_sff_dma_status(), ide_set_irq(), ide_tf_{load,read}()
  and ata_{in,out}put_data().

* Convert host drivers and core code to use struct ide_tp_ops.

* Remove no longer needed default_hwif_transport().

* Cleanup ide_hwif_t from methods that are now in struct ide_tp_ops.

While at it:

* Use struct ide_port_info in falconide.c and q40ide.c.

* Rename ata_{in,out}put_data() to ide_{in,out}put_data().

v2:

* Fix missing convertion in ns87415.c.

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:56 +02:00
Bartlomiej Zolnierkiewicz
d6276b5f5c ide: add 'config' field to hw_regs_t
Add 'config' field to hw_regs_t and use it to set hwif->config_data in
ide_init_port_hw(), then convert ide_legacy_init_one() to use hw->config.

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:56 +02:00
Bartlomiej Zolnierkiewicz
3b2a5c7149 ide: filter out "default" transfer mode values in set_xfer_rate()
* Filter out "default" transfer mode values (0x00 - default PIO mode,
  0x01 - default PIO mode w/ IORDY disabled) in write handler for obsoleted
  /proc/ide/hd?/settings:current_speed setting.

  Allowing "default" transfer mode values is a dangerous thing to do as
  we don't support programming controller to the "default" transfer mode
  and devices often use different values for the default and maximum PIO
  mode (i.e. PIO2 default and PIO4 maximum) so the controller will stay
  programmed for higher PIO mode while device will use the lower PIO mode.

  There is no functionality loss as by using special IOCTLs device can
  still be programmed to "default" transfer modes (it is only useful for
  debugging/testing purposes anyway).

* Remove no longer needed IDE_HFLAG_ABUSE_SET_DMA_MODE host flag, it was
  previously used by few host drivers to program the controller to PIO0
  timings for "default" transfer mode == 0x01 (although some host drivers
  would program invalid PIO timings instead).

* Cleanup ide_set_xfer_rate() and add BUG_ON().

Suggested-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:56 +02:00
Bartlomiej Zolnierkiewicz
ba4b2e607e ide: remove dead Virtual DMA support
Lets remove dead Virtual DMA support for now so it doesn't clutter
core IDE code (it can be bring back when there is a need for it):

* Remove IDE_HFLAG_VDMA host flag.

* Remove ide_drive_t.vdma flag.

* cs5520.c: remove stale FIXMEs, cs5520_dma_host_set() and cs5520_dma_ops
  (also there is no longer a need to set IDE_HFLAG_NO_ATAPI_DMA).

There should be no functional changes caused by this patch.

Cc: TAKADA Yoshihito <takada@mbf.nifty.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:55 +02:00
Bartlomiej Zolnierkiewicz
761052e676 ide: remove ->INB, ->OUTB and ->OUTBSYNC methods
* Remove no longer needed ->INB, ->OUTB and ->OUTBSYNC methods.

Then:

* Remove no longer used default_hwif_[mm]iops() and ide_[mm_]outbsync().

* Cleanup SuperIO handling in ns87415.c.

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:54 +02:00
Bartlomiej Zolnierkiewicz
1823649b5a ide: add ide_read_bcount_and_ireason() helper
Add ide_read_bcount_and_ireason() helper and use it instead of ->INB
in {cdrom_newpc,ide_pc}_intr().

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:54 +02:00
Bartlomiej Zolnierkiewicz
92eb43800a ide: use ->tf_read in ide_read_error()
* Add IDE_TFLAG_IN_FEATURE taskfile flag for reading Feature
  register and handle it in ->tf_read.

* Convert ide_read_error() to use ->tf_read instead of ->INB,
  then uninline and export it.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:53 +02:00
Bartlomiej Zolnierkiewicz
6e6afb3b74 ide: add ->set_irq method
Add ->set_irq method for setting nIEN bit of ATA Device Control
register and use it instead of ide_set_irq().

While at it:

* Use ->set_irq in init_irq() and do_reset1().

* Don't use HWIF() macro in ide_check_pm_state().

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:52 +02:00
Bartlomiej Zolnierkiewicz
1f6d8a0fd8 ide: add ->read_altstatus method
* Remove ide_read_altstatus() inline helper.

* Add ->read_altstatus method for reading ATA Alternate Status
  register and use it instead of ->INB.

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:52 +02:00
Bartlomiej Zolnierkiewicz
b73c7ee25d ide: add ->read_status method
* Remove ide_read_status() inline helper.

* Add ->read_status method for reading ATA Status register
  and use it instead of ->INB.

While at it:

* Don't use HWGROUP() macro.

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:52 +02:00
Bartlomiej Zolnierkiewicz
c6dfa867bb ide: add ->exec_command method
Add ->exec_command method for writing ATA Command register
and use it instead of ->OUTBSYNC.

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:51 +02:00
Bartlomiej Zolnierkiewicz
ebb00fb55d ide: factor out simplex handling from ide_pci_dma_base()
* Factor out simplex handling from ide_pci_dma_base() to
  ide_pci_check_simplex().

* Set hwif->dma_base early in ->init_dma method / ide_hwif_setup_dma()
  and reset it in ide_init_port() if DMA initialization fails.

* Use ->read_sff_dma_status instead of ->INB in ide_pci_dma_base().

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:51 +02:00
Bartlomiej Zolnierkiewicz
81e8d5a34f ide: remove ide_setup_dma()
Export sff_dma_ops and then remove ide_setup_dma().

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:51 +02:00
Bartlomiej Zolnierkiewicz
cab7f8eda4 ide: remove ->dma_{status,command} fields from ide_hwif_t
* Use ->dma_base + offset instead of ->dma_{status,command}
  and remove no longer needed ->dma_{status,command}.

While at it:

* Use ATA_DMA_* defines.

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:51 +02:00
Bartlomiej Zolnierkiewicz
b2f951aabc ide: add ->read_sff_dma_status method
Add ->read_sff_dma_status method for reading DMA Status register
and use it instead of ->INB.

While at it:

* Use inb() directly in ns87415.c::ns87415_dma_end().

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:50 +02:00
Bartlomiej Zolnierkiewicz
c97c6aca75 ide: pass hw_regs_t-s to ide_device_add[_all]() (take 3)
* Add 'hw_regs_t **hws' argument to ide_device_add[_all]() and convert
  host drivers + ide_legacy_init_one() + ide_setup_pci_device[s]() to use
  it instead of calling ide_init_port_hw() directly.

  [ However if host has > 1 port we must still set hwif->chipset to hint
    consecutive ide_find_port() call that the previous slot is occupied. ]

* Unexport ide_init_port_hw().

v2:
* Use defines instead of hard-coded values in buddha.c, gayle.c and q40ide.c.
  (Suggested by Geert Uytterhoeven)

* Better patch description.

v3:
* Fix build problem in ide-cs.c. (Noticed by Stephen Rothwell)

There should be no functional changes caused by this patch.

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-23 19:55:50 +02:00
Roland Dreier
95d04f0735 IB/mlx4: Add support for memory management extensions and local DMA L_Key
Add support for the following operations to mlx4 when device firmware
supports them:

 - Send with invalidate and local invalidate send queue work requests;
 - Allocate/free fast register MRs;
 - Allocate/free fast register MR page lists;
 - Fast register MR send queue work requests;
 - Local DMA L_Key.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-23 08:12:26 -07:00
Rafi Rubin
f472f80034 HID: add n-trig digitizer usage
This adds a hid usage that is reported by the N-Trig digitizer in the Dell
Latitude XT screen.

Signed-off-by: Rafi Rubin <rafi@seas.upenn.edu>
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-07-23 15:25:21 +02:00
Tejun Heo
137d3edb48 sg: reimplement sg mapping iterator
This is alternative implementation of sg content iterator introduced
by commit 83e7d317... from Pierre Ossman in next-20080716.  As there's
already an sg iterator which iterates over sg entries themselves, name
this sg_mapping_iterator.

Slightly edited description from the original implementation follows.

Iteration over a sg list is not that trivial when you take into
account that memory pages might have to be mapped before being used.
Unfortunately, that means that some parts of the kernel restrict
themselves to directly accesible memory just to not have to deal with
the mess.

This patch adds a simple iterator system that allows any code to
easily traverse an sg list and not have to deal with all the details.
The user can decide to consume part of the iteration.  Also, iteration
can be stopped and resumed later if releasing the kmap between
iteration steps is necessary.  These features are useful to implement
piecemeal sg copying for interrupt drive PIO for example.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-07-23 14:42:09 +02:00
Nate Case
f46e9203d9 leds: Add support for Philips PCA955x I2C LED drivers
This driver supports the PCA9550, PCA9551, PCA9552, and PCA9553
LED driver chips.

Signed-off-by: Nate Case <ncase@xes-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
2008-07-23 09:49:56 +01:00
Anton Vorontsov
781a54e766 leds: mark led_classdev.default_trigger as const
LED classdev core doesn't modify memory pointed by the default_trigger,
so mark it as const and we'll able to pass const char *s without getting
compiler warnings.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
2008-07-23 09:49:56 +01:00
Riku Voipio
e14fa82439 leds: Add pca9532 led driver
NXP pca9532 is a LED dimmer/controller attached to i2c bus.  It allows
attaching upto 16 leds which can either be on, off or dimmed and/or blinked
with the two PWM modulators available.

This driver is a "new-style" i2c driver that adheres to the driver model and
implements the led framework api.  Since the leds connected to the driver are
platform specific, it is only useful when platform data is passed to the
driver to define what leds are connected to which pins.

Signed-off-by: Riku Voipio <riku.voipio@iki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
2008-07-23 09:49:56 +01:00
Linus Torvalds
c010b2f76c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits)
  ipw2200: Call netif_*_queue() interfaces properly.
  netxen: Needs to include linux/vmalloc.h
  [netdrvr] atl1d: fix !CONFIG_PM build
  r6040: rework init_one error handling
  r6040: bump release number to 0.18
  r6040: handle RX fifo full and no descriptor interrupts
  r6040: change the default waiting time
  r6040: use definitions for magic values in descriptor status
  r6040: completely rework the RX path
  r6040: call napi_disable when puting down the interface and set lp->dev accordingly.
  mv643xx_eth: fix NETPOLL build
  r6040: rework the RX buffers allocation routine
  r6040: fix scheduling while atomic in r6040_tx_timeout
  r6040: fix null pointer access and tx timeouts
  r6040: prefix all functions with r6040
  rndis_host: support WM6 devices as modems
  at91_ether: use netstats in net_device structure
  sfc: Create one RX queue and interrupt per CPU package by default
  sfc: Use a separate workqueue for resets
  sfc: I2C adapter initialisation fixes
  ...
2008-07-22 19:09:51 -07:00
David S. Miller
7cf75262a4 Merge branch 'upstream-davem' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2008-07-22 17:54:47 -07:00
Maciej Sosnowski
7f1b358a23 I/OAT: I/OAT version 3.0 support
This patch adds to ioatdma and dca modules
support for Intel I/OAT DMA engine ver.3 (aka CB3 device).
The main features of I/OAT ver.3 are:
 * 8 single channel DMA devices (8 channels total)
 * 8 DCA providers, each can accept 2 requesters
 * 8-bit TAG values and 32-bit extended APIC IDs

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-22 17:30:57 -07:00
Rafael J. Wysocki
e5899e1b7d PCI PM: make more PCI PM core functionality available to drivers
Make more PCI PM core functionality available to drivers

* Export pci_pme_capable() so that it can be called directly by
  drivers (for example, tg3 needs that).

* Move the state choosing part of pci_prepare_to_sleep() to a
  separate function, pci_target_state(), that can be called directly
  by drivers (for example, tg3 needs that).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-07-22 14:25:38 -07:00
Roland Dreier
47b374752a IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg
Make the struct name consistent with other WQE segment struct types
defined in <linux/mlx4/qp.h>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:19:39 -07:00
Adrian Bunk
8086cd451f netns: make get_proc_net() static
get_proc_net() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:19:19 -07:00
Dave Jones
d29f749e25 net: Fix build failure with 'make mandocs'.
The function header comments have to go with the functions
they are documenting, or things go horribly wrong when we
try to process them with the docbook tools.

Warning(include/linux/netdevice.h:1006): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1033): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1067): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1093): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1474): No description found for parameter 'txq'
Error(net/core/dev.c:1674): cannot understand prototype: 'u32 simple_tx_hashrnd; '

Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 14:09:06 -07:00
Linus Torvalds
6eaaaac974 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  remove CONFIG_KMOD from core kernel code
  remove CONFIG_KMOD from lib
  remove CONFIG_KMOD from sparc64
  rework try_then_request_module to do less in non-modular kernels
  remove mention of CONFIG_KMOD from documentation
  make CONFIG_KMOD invisible
  modules: Take a shortcut for checking if an address is in a module
  module: turn longs into ints for module sizes
  Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs
  module: reorder struct module to save space on 64 bit builds
  module: generic each_symbol iterator function
  module: don't use stop_machine for waiting rmmod
2008-07-22 13:17:15 -07:00
Linus Torvalds
06b8147c5d Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (49 commits)
  powerpc: Fix build bug with binutils < 2.18 and GCC < 4.2
  powerpc/eeh: Don't panic when EEH_MAX_FAILS is exceeded
  fbdev: Teaches offb about palette on radeon r5xx/r6xx
  powerpc/cell/edac: Log a syndrome code in case of correctable error
  powerpc/cell: Add DMA_ATTR_WEAK_ORDERING dma attribute and use in Cell IOMMU code
  powerpc: Indicate which oprofile counters to use while in compat mode
  powerpc/boot: Change spaces to tabs
  powerpc: Remove duplicate 6xx option in Kconfig
  powerpc: Use PPC_LONG and PPC_LONG_ALIGN in lib/string.S
  powerpc: Use PPC_LONG_ALIGN in uaccess.h
  powerpc: Add a #define for aligning to a long-sized boundary
  powerpc: Fix OF parsing of 64 bits PCI addresses
  powerpc: Use WARN_ON(1) instead of __WARN()
  powerpc: Fix support for latencytop
  powerpc/ps3: Update ps3_defconfig
  powerpc/ps3: Add a sub-match id to ps3_system_bus
  powerpc: Add a 6xx defconfig
  powerpc/dma: Use the struct dma_attrs in iommu code
  powerpc/cell: Add support for power button of future IBM cell blades
  powerpc/cell: Cleanup sysreset_hack for IBM cell blades
  ...
2008-07-22 13:16:01 -07:00
Linus Torvalds
53baaaa968 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (79 commits)
  arm: bus_id -> dev_name() and dev_set_name() conversions
  sparc64: fix up bus_id changes in sparc core code
  3c59x: handle pci_name() being const
  MTD: handle pci_name() being const
  HP iLO driver
  sysdev: Convert the x86 mce tolerant sysdev attribute to generic attribute
  sysdev: Add utility functions for simple int/ulong variable sysdev attributes
  sysdev: Pass the attribute to the low level sysdev show/store function
  driver core: Suppress sysfs warnings for device_rename().
  kobject: Transmit return value of call_usermodehelper() to caller
  sysfs-rules.txt: reword API stability statement
  debugfs: Implement debugfs_remove_recursive()
  HOWTO: change email addresses of James in HOWTO
  always enable FW_LOADER unless EMBEDDED=y
  uio-howto.tmpl: use unique output names
  uio-howto.tmpl: use standard copyright/legal markings
  sysfs: don't call notify_change
  sysdev: fix debugging statements in registration code.
  kobject: should use kobject_put() in kset-example
  kobject: reorder kobject to save space on 64 bit builds
  ...
2008-07-22 13:13:47 -07:00
Laurent Pinchart
217d5a5195 fs_enet: Remove unused fields in the fs_mii_bb_platform_info structure.
The mdio_port, mdio_bit, mdc_port and mdc_bit fields in the
fs_mii_bb_platform_info structure are left-overs from the move to the Phy
Abstraction Layer subsystem. They are not used anymore and can be safely
removed.

Signed-off-by: Laurent Pinchart <laurentp@cse-semaphore.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-22 16:08:58 -04:00
Paul Fulghum
e5590717af synclink_gt: add serial bit order control
Add control of hardware serial bit order between LSB first
(default/standard) and MSB first.

Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-22 13:03:29 -07:00
Alan Cox
9e98966c7b tty: rework break handling
Some hardware needs to do break handling itself and may have partial
support only. Make break_ctl return an error code. Add a tty driver flag
so you can indicate driver hardware side break support.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-22 13:03:28 -07:00
Alan Cox
01e1abb2c2 tty: Split ldisc code into its own file
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-22 13:03:27 -07:00
Alan Cox
95da310e66 usb_serial: API all change
USB serial likes to use port->tty back pointers for the real work it does and
to do so without any actual locking. Unfortunately when you consider hangup
events, hangup/parallel reopen or even worse hangup followed by parallel close
events the tty->port and port->tty pointers are not guaranteed to be the same
as port->tty is the active tty while tty->port is the port the tty may or
may not still be attached to.

So rework the entire API to pass the tty struct. For console cases we need
to pass both for now. This shows up multiple drivers that immediately crash
with USB console some of which have been fixed in the process.

Longer term we need a proper tty as console abstraction

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-22 13:03:22 -07:00
Juergen Beisert
0c36ec3147 gpio: gpio driver for max7301 SPI GPIO expander
Maxim's MAX7301 is an SPI GPIO expander with 28 GPIOs.  Note: MAX7301's
interrupt feature is not supported yet.

[akpm@linux-foundation.org: coding-style fixes]
[g.liakhovetski@pengutronix.de: Fix inaccuracies in comments, check spi_setup()
return code, mask off high byte in max7301_read()]
Signed-off-by: Juergen Beisert <j.beisert@pengutronix.de>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-22 09:59:40 -07:00
John Reiser
6519108746 execve filename: document and export via auxiliary vector
The Linux kernel puts the filename argument of execve() into the new
address space.  Many developers are surprised to learn this.  Those who
know and could use it, object "But it's not documented."

Those who want to use it dislike the expression
  (char *)(1+ strlen(env[-1+ n_env]) + env[-1+ n_env])
because it requires locating the last original environment variable,
and assumes that the filename follows the characters.

This patch documents the insertion of the filename, and makes it easier
to find by adding a new tag AT_EXECFN in the ElfXX_auxv_t; see <elf.h>.

In many cases readlink("/proc/self/exe",) gives the same answer.  But if
all the original pages get unmapped, then the kernel erases the symlink
for /proc/self/exe.  This can happen when a program decompressor does a
good job of cleaning up after uncompressing directly to memory, so that
the address space of the target program looks the same as if compression
had never happened.  One example is http://upx.sourceforge.net .

One notable use of the underlying concept (what path containED the
executable) is glibc expanding $ORIGIN in DT_RUNPATH.  In practice for
the near term, it may be a good idea for user-mode code to use both
/proc/self/exe and AT_EXECFN as fall-back methods for each other.
/proc/self/exe can fail due to unmapping, AT_EXECFN can fail because it
won't be present on non-new systems.  The auxvec or {AT_EXECFN}.d_val
also can get overwritten, although in nearly all cases this would be the
result of a bug.

The runtime cost is one NEW_AUX_ENT using two words of stack space.  The
underlying value is maintained already as bprm->exec; setup_arg_pages()
in fs/exec.c slides it for stack_shift, etc.

Signed-off-by: John Reiser <jreiser@BitWagon.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-22 09:59:40 -07:00
Johannes Berg
a1ef5adb4c remove CONFIG_KMOD from core kernel code
Always compile request_module when the kernel allows modules.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22 19:24:31 +10:00
Johannes Berg
df648c9fbe rework try_then_request_module to do less in non-modular kernels
This reworks try_then_request_module to only invoke the "lookup"
function "x" once when the kernel is not modular.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22 19:24:29 +10:00
Denys Vlasenko
2f0f2a334b module: turn longs into ints for module sizes
This shrinks module.o and each *.ko file.

And finally, structure members which hold length of module
code (four such members there) and count of symbols
are converted from longs to ints.

We cannot possibly have a module where 32 bits won't
be enough to hold such counts.

For one, module loading checks module size for sanity
before loading, so such insanely big module will fail
that test first.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22 19:24:27 +10:00
Denys Vlasenko
f7f5b67557 Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs
module.c and module.h conatains code for finding
exported symbols which are declared with EXPORT_UNUSED_SYMBOL,
and this code is compiled in even if CONFIG_UNUSED_SYMBOLS is not set
and thus there can be no EXPORT_UNUSED_SYMBOLs in modules anyway
(because EXPORT_UNUSED_SYMBOL(x) are compiled out to nothing then).

This patch adds required #ifdefs.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22 19:24:27 +10:00
Richard Kennedy
af5406895a module: reorder struct module to save space on 64 bit builds
reorder struct module to save space on 64 bit builds.
saves 1 cacheline_size  (128 on default x86_64 & 64 on AMD
Opteron/athlon) when CONFIG_MODULE_UNLOAD=y.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22 19:24:26 +10:00
Benjamin Herrenschmidt
8725f25acc Merge commit 'origin/master'
Manually fixed up:

	drivers/net/fs_enet/fs_enet-main.c
2008-07-22 17:12:37 +10:00
Ingo Molnar
76c3bb15d6 Merge branch 'linus' into x86/x2apic 2008-07-22 09:06:21 +02:00
Greg Kroah-Hartman
eadcf0d704 MTD: handle pci_name() being const
This changes the MTD core to handle pci_name() now returning a constant
string.

Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:55:03 -07:00
Andi Kleen
9800794ac1 sysdev: Add utility functions for simple int/ulong variable sysdev attributes
This adds a new sysdev_ext_attribute that stores a pointer to the variable
it manages and some utility functions/macro to easily use them.

Previously all users wrote custom macros to generate show/store
functions for each variable, with this it is possible to avoid
that in many cases.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:55:02 -07:00
Andi Kleen
4a0b2b4dbe sysdev: Pass the attribute to the low level sysdev show/store function
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.

I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.

I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.

Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:55:02 -07:00
Cornelia Huck
36ce6dad6e driver core: Suppress sysfs warnings for device_rename().
driver core: Suppress sysfs warnings for device_rename().

Renaming network devices to an already existing name is not
something we want sysfs to print a scary warning for, since the
callers can deal with this correctly. So let's introduce
sysfs_create_link_nowarn() which gets rid of the common warning.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:55:01 -07:00
Haavard Skinnemoen
9505e63756 debugfs: Implement debugfs_remove_recursive()
debugfs_remove_recursive() will remove a dentry and all its children.
Drivers can use this to zap their whole debugfs tree so that they don't
need to keep track of every single debugfs dentry they created.

It may fail to remove the whole tree in certain cases:

sh-3.2# rmmod atmel-mci < /sys/kernel/debug/mmc0/ios/clock
mmc0: card b368 removed
atmel_mci atmel_mci.0: Lost dma0chan1, falling back to PIO
sh-3.2# ls /sys/kernel/debug/mmc0/
ios

But I'm not sure if that case can be handled in any sane manner.

Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Cc: Pierre Ossman <drzeus-list@drzeus.cx>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:59 -07:00
Richard Kennedy
a231934bdf kobject: reorder kobject to save space on 64 bit builds
reorder kobject to save space on 64 bit builds.
shrinks from 72 to 64 bytes & moves allocated kobject to a smaller
slab.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:56 -07:00
Uwe Kleine-König
6d8333c24d UIO: minor style and comment fixes
Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Signed-off-by: Hans J. Koch <hjk@linutronix.de>
2008-07-21 21:54:55 -07:00
Hans J. Koch
328a14e70e UIO: Add write function to allow irq masking
Sometimes it is necessary to enable/disable the interrupt of a UIO device
from the userspace part of the driver. With this patch, the UIO kernel driver
can implement an "irqcontrol()" function that does this. Userspace can write
an s32 value to /dev/uioX (usually 0 or 1 to turn the irq off or on). The
UIO core will then call the driver's irqcontrol function.

Signed-off-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Acked-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:55 -07:00
Greg Kroah-Hartman
b98cb4b7fe driver core: remove DEVICE_ID_SIZE define
There is no such thing as a "device id size" in the driver core, so
remove the define and fix up any users of this odd define in the rest of
the kernel.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:53 -07:00
Kay Sievers
ca52a49846 driver core: remove DEVICE_NAME_SIZE define
There is no such thing as a "device name size" in the driver core, so
remove the define and fix up any users of this odd define in the rest of
the kernel.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:53 -07:00
Kay Sievers
aab0de2451 driver core: remove KOBJ_NAME_LEN define
Kobjects do not have a limit in name size since a while, so stop
pretending that they do.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:52 -07:00
Matthew Wilcox
d2a3b9146e class: add lockdep infrastructure
This adds the infrastructure to properly handle lockdep issues when the
internal class semaphore is changed to a mutex.

Matthew wrote the original patch, and Greg fixed it up to work properly
with the class_create() function.


From: Matthew Wilcox <matthew@wil.cx>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:52 -07:00
Greg Kroah-Hartman
7c71448b8a class: move driver core specific parts to a private structure
This moves the portions of struct class that are dynamic (kobject and
lock and lists) out of the main structure and into a dynamic, private,
structure.


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:51 -07:00
Greg Kroah-Hartman
695794ae0c Driver Core: add ability for class_find_device to start in middle of list
This mirrors the functionality that driver_find_device has as well.

We add a start variable, and all callers of the function are fixed up at
the same time.

The block layer will be using this new functionality in a follow-on
patch.


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:47 -07:00
Greg Kroah-Hartman
93562b5376 Driver Core: add ability for class_for_each_device to start in middle of list
This mirrors the functionality that driver_for_each_device has as well.

We add a start variable, and all callers of the function are fixed up at
the same time.

The block layer will be using this new functionality in a follow-on
patch.


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:47 -07:00
Greg Kroah-Hartman
4e10673944 device create: convert device_create_drvdata to device_create
Now that device_create() has been audited, rename things back to the
original call to be sane.

Keep the device_create_drvdata macro around to make merges easier.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:47 -07:00
Greg Kroah-Hartman
ccea44fadc driver core: remove device_create()
There are no more users of this, and it is racy.  Use
device_create_drvdata() or device_create_vargs() instead.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:47 -07:00
Dan Williams
e105b8bfc7 sysfs: add /sys/dev/{char,block} to lookup sysfs path by major:minor
Why?:
There are occasions where userspace would like to access sysfs
attributes for a device but it may not know how sysfs has named the
device or the path.  For example what is the sysfs path for
/dev/disk/by-id/ata-ST3160827AS_5MT004CK?  With this change a call to
stat(2) returns the major:minor then userspace can see that
/sys/dev/block/8:32 links to /sys/block/sdc.

What are the alternatives?:
1/ Add an ioctl to return the path: Doable, but sysfs is meant to reduce
   the need to proliferate ioctl interfaces into the kernel, so this
   seems counter productive.

2/ Use udev to create these symlinks: Also doable, but it adds a
   udev dependency to utilities that might be running in a limited
   environment like an initramfs.

3/ Do a full-tree search of sysfs.

[kay.sievers@vrfy.org: fix duplicate registrations]
[kay.sievers@vrfy.org: cleanup suggestions]

Cc: Neil Brown <neilb@suse.de>
Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: SL Baur <steve@xemacs.org>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Mark Lord <lkml@rtr.ca>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:40 -07:00
Mark Nelson
1ed6af7344 powerpc/cell: Add DMA_ATTR_WEAK_ORDERING dma attribute and use in Cell IOMMU code
Introduce a new dma attriblue DMA_ATTR_WEAK_ORDERING to use weak ordering
on DMA mappings in the Cell processor. Add the code to the Cell's IOMMU
implementation to use this code.

Dynamic mappings can be weakly or strongly ordered on an individual basis
but the fixed mapping has to be either completely strong or completely weak.
This is currently decided by a kernel boot option (pass iommu_fixed=weak
for a weakly ordered fixed linear mapping, strongly ordered is the default).

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-22 10:39:36 +10:00
Wolfgang Grandegger
18ad7a61e1 of_gpio: Should use new <linux/gpio.h> header
Since commit 7560fa60fc (gpio: <linux/gpio.h>
and "no GPIO support here" stubs) drivers can use GPIOs if they're available,
but don't require them.

This patch actually enables this feature.

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-22 10:39:30 +10:00
Linus Torvalds
93ded9b8fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (100 commits)
  usb-storage: revert DMA-alignment change for Wireless USB
  USB: use reset_resume when normal resume fails
  usb_gadget: composite cdc gadget fault handling
  usb gadget: minor USBCV fix for composite framework
  USB: Fix bug with byte order in isp116x-hcd.c fio write/read
  USB: fix double kfree in ipaq in error case
  USB: fix build error in cdc-acm for CONFIG_PM=n
  USB: remove board-specific UP2OCR configuration from pxa27x-udc
  USB: EHCI: Reconciling USB register differences on MPC85xx vs MPC83xx
  USB: Fix pointer/int cast in USB devio code
  usb gadget: g_cdc dependso on NET
  USB: Au1xxx-usb: suspend/resume support.
  USB: Au1xxx-usb: clean up ohci/ehci bus glue sources.
  usbfs: don't store bad pointers in registration
  usbfs: fix race between open and unregister
  usbfs: simplify the lookup-by-minor routines
  usbfs: send disconnect signals when device is unregistered
  USB: Force unbinding of drivers lacking reset_resume or other methods
  USB: ohci-pnx4008: I2C cleanups and fixes
  USB: debug port converter does not accept more than 8 byte packets
  ...
2008-07-21 15:42:53 -07:00
Alan Stern
78d9a487ee USB: Force unbinding of drivers lacking reset_resume or other methods
This patch (as1024) takes care of a FIXME issue: Drivers that don't
have the necessary suspend, resume, reset_resume, pre_reset, or
post_reset methods will be unbound and their interface reprobed when
one of the unsupported events occurs.

This is made slightly more difficult by the fact that bind operations
won't work during a system sleep transition.  So instead the code has
to defer the operation until the transition ends.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 15:16:40 -07:00
Ming Lei
742120c631 USB: fix usb_reset_device and usb_reset_composite_device(take 3)
This patch renames the existing usb_reset_device in hub.c to
usb_reset_and_verify_device and renames the existing
usb_reset_composite_device to usb_reset_device. Also the new
usb_reset_and_verify_device does't need to be EXPORTED .

The idea of the patch is that external interface driver
should warn the other interfaces' driver of the same
device before and after reseting the usb device. One interface
driver shoud call _old_ usb_reset_composite_device instead of
_old_ usb_reset_device since it can't assume the device contains
only one interface. The _old_ usb_reset_composite_device
is safe for single interface device also. we rename the two
functions to make the change easily.

This patch is under guideline from Alan Stern.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
2008-07-21 15:16:33 -07:00
Ming Lei
625f694936 USB: remove interface parameter of usb_reset_composite_device
From the current implementation of usb_reset_composite_device
function, the iface parameter is no longer useful. This function
doesn't do something special for the iface usb_interface,compared
with other interfaces in the usb_device. So remove the parameter
and fix the related caller.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 15:16:32 -07:00
Alan Stern
f579c2b46f USB Gadget: documentation update
This patch (as1102) clarifies two points in the USB Gadget kerneldoc:

	Request completion callbacks are always made with interrupts
	disabled;

	Device controllers may not support STALLing the status stage
	of a control transfer after the data stage is over.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 15:16:27 -07:00
Felipe Balbi
e0d795e4f3 usb: irda: cleanup on ir-usb module
General cleanup on ir-usb module. Introduced
a common header that could be used also on
usb gadget framework.

Lot's of cleanups and now using macros from the header
file.

Signed-off-by: Felipe Balbi <me@felipebalbi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 15:16:27 -07:00
David Brownell
40982be52d usb gadget: composite gadget core
Add <linux/usb/composite.h> interfaces for composite gadget drivers, and
basic implementation support behind it:

  - struct usb_function ... groups one or more interfaces into a function
    managed as one unit within a configuration, to which it's added by
    usb_add_function().

  - struct usb_configuration ... groups one or more such functions into
    a configuration managed as one unit by a driver, to which it's added
    by usb_add_config().  These operate at either high or full/low speeds
    and at a given bMaxPower.

  - struct usb_composite_driver ... groups one or more such configurations
    into a gadget driver, which may be registered or unregistered.

  - struct usb_composite_dev ... a usb_composite_driver manages this; it
    wraps the usb_gadget exposed by the controller driver.

This also includes some basic kerneldoc.

How to use it (the short version):  provide a usb_composite_driver with a
bind() that calls usb_add_config() for each of the needed configurations.
The configurations in turn have bind() calls, which will usb_add_function()
for each function required.  Each function's bind() allocates resources
needed to perform its tasks, like endpoints; sometimes configurations will
allocate resources too.

Separate patches will convert most gadget drivers to this infrastructure.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 15:16:01 -07:00
David Brownell
a4c39c41bf usb gadget: descriptor copying support
Define three new descriptor manipulation utilities, for use when
setting up functions that may have multiple instances:

	usb_copy_descriptors() to copy a vector of descriptors
	usb_free_descriptors() to free the copy
	usb_find_endpoint() to find a copied version

These will be used as follows.  Functions will continue to have static
tables of descriptors they update, now used as __initdata templates.

When a function creates a new instance, it patches those tables with
relevant interface and string IDs, plus endpoint assignments.  Then it
copies those morphed descriptors, associates the copies with the new
function instance, and records the endpoint descriptors to use when
activating the endpoints.  When initialization is done, only the copies
remain in memory.  The copies are freed on driver removal.

This ensures that each instance has descriptors which hold the right
instance-specific data.  Two instances in the same configuration will
obviously never share the same interface IDs or use the same endpoints.
Instances in different configurations won't do so either, which means
this is slightly less memory-efficient in some cases.

This also includes a bugfix to the epautoconf code that shows up with
this usage model.  It must replace the previous endpoint number when
updating the template descriptors, not just mask in a few more bits.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 15:16:00 -07:00
Adrian Bunk
ea05af61a8 USB: remove CVS keywords
This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 15:15:55 -07:00
Alan Stern
9da82bd464 USB: implement "soft" unbinding
This patch (as1091) changes the way usbcore handles interface
unbinding.  If the interface's driver supports "soft" unbinding (a new
flag in the driver structure) then in-flight URBs are not cancelled
and endpoints are not disabled.  Instead the driver is allowed to
continue communicating with the device (although of course it should
stop before its disconnect routine returns).

The purpose of this change is to allow drivers to do a clean shutdown
when they get unbound from a device that is still plugged in.  Killing
all the URBs and disabling the endpoints before calling the driver's
disconnect method doesn't give the driver any control over what
happens, and it can leave devices in indeterminate states.  For
example, when usb-storage unbinds it doesn't want to stop while in the
middle of transmitting a SCSI command.

The soft_unbind flag is added because in the past, a number of drivers
have experienced problems related to ongoing I/O after their disconnect
routine returned.  Hence "soft" unbinding is made available only to
drivers that claim to support it.

The patch also replaces "interface_to_usbdev(intf)" with "udev" in a
couple of places, a minor simplification.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 15:15:54 -07:00
Greg Kroah-Hartman
1b26da1510 USB: handle pci_name() being const
This changes usb_create_hcd() to be able to handle the fact that
pci_name() has changed to a constant string.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 15:15:46 -07:00
Linus Torvalds
6d52dcbe56 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] cpufreq: remove CVS keywords
  [CPUFREQ] change cpu freq arrays to per_cpu variables
2008-07-21 15:10:37 -07:00
David S. Miller
ebb36a9781 ipv6: __KERNEL__ ifdef struct ipv6_devconf
Based upon a report by Olaf Hering.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-21 13:41:16 -07:00
Arjan van de Ven
6579e57b31 net: Print the module name as part of the watchdog message
As suggested by Dave:

This patch adds a function to get the driver name from a struct net_device,
and consequently uses this in the watchdog timeout handler to print as 
part of the message. 

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-21 13:31:48 -07:00
Linus Torvalds
e89970aa93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  netfilter: nf_conntrack_sctp: fix sparse warnings
  netfilter: nf_nat_sip: c= is optional for session
  netfilter: xt_TCPMSS: collapse tcpmss_reverse_mtu{4,6} into one function
  netfilter: nfnetlink_log: send complete hardware header
  netfilter: xt_time: fix time's time_mt()'s use of do_div()
  netfilter: accounting rework: ct_extend + 64bit counters (v4)
  netlink: add NLA_PUT_BE64 macro
  netfilter: nf_nat_core: eliminate useless find_appropriate_src for IP_NAT_RANGE_PROTO_RANDOM
  hdlcdrv: Fix CRC calculation.
  Revert "pkt_sched: Make default qdisc nonshared-multiqueue safe."
  net: In __netif_schedule() use WARN_ON instead of BUG_ON
  net: Improve simple_tx_hash().
  pkt_sched: Remove unused variable skb in dev_deactivate_queue function.
  sunhme: Remove stop/wake TX queue calls in set-multicast-list handler.
  ucc_geth: do not touch net queue in adjust_link phylib callback
  gianfar: do not touch net queue in adjust_link phylib callback
  atl1: Do not wake queue before queue has been started.
2008-07-21 11:29:52 -07:00
Linus Torvalds
72a73693aa Merge branch 'x86/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (160 commits)
  x86: remove extra calling to get ext cpuid level
  x86: use setup_clear_cpu_cap() when disabling the lapic
  KVM: fix exception entry / build bug, on 64-bit
  x86: add unknown_nmi_panic kernel parameter
  x86, VisWS: turn into generic arch, eliminate leftover files
  x86: add ->pre_time_init to x86_quirks
  x86: extend and use x86_quirks to clean up NUMAQ code
  x86: introduce x86_quirks
  x86: improve debug printout: add target bootmem range in early_res_to_bootmem()
  Subject: devmem, x86: fix rename of CONFIG_NONPROMISC_DEVMEM
  x86: remove arch_get_ram_range
  x86: Add a debugfs interface to dump PAT memtype
  x86: Add a arch directory for x86 under debugfs
  x86: i386: reduce boot fixmap space
  i386/xen: add proper unwind annotations to xen_sysenter_target
  x86: reduce force_mwait visibility
  x86: reduce forbid_dac's visibility
  x86: fix two modpost warnings
  x86: check function status in EDD boot code
  x86_64: ia32_signal.c: remove signal number conversion
  ...
2008-07-21 10:34:25 -07:00
Linus Torvalds
b7e6f62fe2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
  dm crypt: add merge
  dm table: remove merge_bvec sector restriction
  dm: linear add merge
  dm: introduce merge_bvec_fn
  dm snapshot: use per device mempools
  dm snapshot: fix race during exception creation
  dm snapshot: track snapshot reads
  dm mpath: fix test for reinstate_path
  dm mpath: return parameter error
  dm io: remove struct padding
  dm log: make dm_dirty_log init and exit static
  dm mpath: free path selector on invalid args
2008-07-21 10:30:10 -07:00
Linus Torvalds
8a392625b6 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md: (52 commits)
  md: Protect access to mddev->disks list using RCU
  md: only count actual openers as access which prevent a 'stop'
  md: linear: Make array_size sector-based and rename it to array_sectors.
  md: Make mddev->array_size sector-based.
  md: Make super_type->rdev_size_change() take sector-based sizes.
  md: Fix check for overlapping devices.
  md: Tidy up rdev_size_store a bit:
  md: Remove some unused macros.
  md: Turn rdev->sb_offset into a sector-based quantity.
  md: Make calc_dev_sboffset() return a sector count.
  md: Replace calc_dev_size() by calc_num_sectors().
  md: Make update_size() take the number of sectors.
  md: Better control of when do_md_stop is allowed to stop the array.
  md: get_disk_info(): Don't convert between signed and unsigned and back.
  md: Simplify restart_array().
  md: alloc_disk_sb(): Return proper error value.
  md: Simplify sb_equal().
  md: Simplify uuid_equal().
  md: sb_equal(): Fix misleading printk.
  md: Fix a typo in the comment to cmd_match().
  ...
2008-07-21 10:29:12 -07:00
Eric Leblond
72961ecf84 netfilter: nfnetlink_log: send complete hardware header
This patch adds some fields to NFLOG to be able to send the complete
hardware header with all necessary informations.
It sends to userspace:
 * the type of hardware link
 * the lenght of hardware header
 * the hardware header

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-21 10:11:00 -07:00
Krzysztof Piotr Oledzki
584015727a netfilter: accounting rework: ct_extend + 64bit counters (v4)
Initially netfilter has had 64bit counters for conntrack-based accounting, but
it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are
still required, for example for "connbytes" extension. However, 64bit counters
waste a lot of memory and it was not possible to enable/disable it runtime.

This patch:
 - reimplements accounting with respect to the extension infrastructure,
 - makes one global version of seq_print_acct() instead of two seq_print_counters(),
 - makes it possible to enable it at boot time (for CONFIG_SYSCTL/CONFIG_SYSFS=n),
 - makes it possible to enable/disable it at runtime by sysctl or sysfs,
 - extends counters from 32bit to 64bit,
 - renames ip_conntrack_counter -> nf_conn_counter,
 - enables accounting code unconditionally (no longer depends on CONFIG_NF_CT_ACCT),
 - set initial accounting enable state based on CONFIG_NF_CT_ACCT
 - removes buggy IPCT_COUNTER_FILLING event handling.

If accounting is enabled newly created connections get additional acct extend.
Old connections are not changed as it is not possible to add a ct_extend area
to confirmed conntrack. Accounting is performed for all connections with
acct extend regardless of a current state of "net.netfilter.nf_conntrack_acct".

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-21 10:10:58 -07:00
Ingo Molnar
eb6a12c242 Merge branch 'linus' into cpus4096-for-linus
Conflicts:

	net/sunrpc/svc.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-21 17:19:50 +02:00
Ingo Molnar
acee709cab Merge branches 'x86/urgent', 'x86/amd-iommu', 'x86/apic', 'x86/cleanups', 'x86/core', 'x86/cpu', 'x86/fixmap', 'x86/gart', 'x86/kprobes', 'x86/memtest', 'x86/modules', 'x86/nmi', 'x86/pat', 'x86/reboot', 'x86/setup', 'x86/step', 'x86/unify-pci', 'x86/uv', 'x86/xen' and 'xen-64bit' into x86/for-linus 2008-07-21 16:37:17 +02:00
Milan Broz
f6fccb1213 dm: introduce merge_bvec_fn
Introduce a bvec merge function for device mapper devices
for dynamic size restrictions.

This code ensures the requested biovec lies within a single
target and then calls a target-specific function to check
against any constraints imposed by underlying devices.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-07-21 12:00:37 +01:00
NeilBrown
4b80991c6c md: Protect access to mddev->disks list using RCU
All modifications and most access to the mddev->disks list are made
under the reconfig_mutex lock.  However there are three places where
the list is walked without any locking.  If a reconfig happens at this
time, havoc (and oops) can ensue.

So use RCU to protect these accesses:
  - wrap them in rcu_read_{,un}lock()
  - use list_for_each_entry_rcu
  - add to the list with list_add_rcu
  - delete from the list with list_del_rcu
  - delay the 'free' with call_rcu rather than schedule_work

Note that export_rdev did a list_del_init on this list.  In almost all
cases the entry was not in the list anymore so it was a no-op and so
safe.  It is no longer safe as after list_del_rcu we may not touch
the list_head.
An audit shows that export_rdev is called:
  - after unbind_rdev_from_array, in which case the delete has
     already been done,
  - after bind_rdev_to_array fails, in which case the delete isn't needed.
  - before the device has been put on a list at all (e.g. in
      add_new_disk where reading the superblock fails).
  - and in autorun devices after a failure when the device is on a
      different list.

So remove the list_del_init call from export_rdev, and add it back
immediately before the called to export_rdev for that last case.

Note also that ->same_set is sometimes used for lists other than
mddev->list (e.g. candidates).  In these cases rcu is not needed.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-07-21 17:05:25 +10:00
NeilBrown
f2ea68cf42 md: only count actual openers as access which prevent a 'stop'
Open isn't the only thing that increments ->active.  e.g. reading
/proc/mdstat will increment it briefly.  So to avoid false positives
in testing for concurrent access, introduce a new counter that counts
just the number of times the md device it open.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-07-21 17:05:25 +10:00
Andre Noll
d6e2215052 md: linear: Make array_size sector-based and rename it to array_sectors.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2008-07-21 17:05:25 +10:00
Andre Noll
f233ea5c9e md: Make mddev->array_size sector-based.
This patch renames the array_size field of struct mddev_s to array_sectors
and converts all instances to use units of 512 byte sectors instead of 1k
blocks.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2008-07-21 17:05:22 +10:00
Dmitry Torokhov
908cf4b925 Merge master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 into next 2008-07-21 00:55:14 -04:00
Linus Torvalds
14b395e35d Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.27' of git://linux-nfs.org/~bfields/linux: (51 commits)
  nfsd: nfs4xdr.c do-while is not a compound statement
  nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c
  lockd: Pass "struct sockaddr *" to new failover-by-IP function
  lockd: get host reference in nlmsvc_create_block() instead of callers
  lockd: minor svclock.c style fixes
  lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_lock
  lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_testlock
  lockd: nlm_release_host() checks for NULL, caller needn't
  file lock: reorder struct file_lock to save space on 64 bit builds
  nfsd: take file and mnt write in nfs4_upgrade_open
  nfsd: document open share bit tracking
  nfsd: tabulate nfs4 xdr encoding functions
  nfsd: dprint operation names
  svcrdma: Change WR context get/put to use the kmem cache
  svcrdma: Create a kmem cache for the WR contexts
  svcrdma: Add flush_scheduled_work to module exit function
  svcrdma: Limit ORD based on client's advertised IRD
  svcrdma: Remove unused wait q from svcrdma_xprt structure
  svcrdma: Remove unneeded spin locks from __svc_rdma_free
  svcrdma: Add dma map count and WARN_ON
  ...
2008-07-20 21:21:46 -07:00
Linus Torvalds
ae0645a451 Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd
* 'for-linus' of git://git.o-hand.com/linux-mfd:
  mfd: let asic3 use mem resource instead of bus_shift
  mfd: remove DS1WM register definitions from asic3.h
  mfd: add ASIC3_CONFIG_GPIO templates
  mfd: fix the asic3 irq demux code
  mfd: asic3 should depend on gpiolib
  mfd: fix asic3 config array initialisation
  mfd: move asic3 probe functions into __init section
  mfd: Use uppercase only for asic3 macros and defines
  mfd: use dev_* macros for asic3 debugging
  mfd: New asic3 gpio configuration code
  mfd: asic3 children platform data removal
  mfd: asic3 gpiolib support
2008-07-20 21:16:27 -07:00
Linus Torvalds
f894d18380 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (277 commits)
  V4L/DVB (8415): gspca: Infinite loop in i2c_w() of etoms.
  V4L/DVB (8414): videodev/cx18: fix get_index bug and error-handling lock-ups
  V4L/DVB (8411): videobuf-dma-contig.c: fix 64-bit build for pre-2.6.24 kernels
  V4L/DVB (8410): sh_mobile_ceu_camera: fix 64-bit compiler warnings
  V4L/DVB (8397): video: convert select VIDEO_ZORAN_ZR36060 into depends on
  V4L/DVB (8396): video: Fix Kbuild dependency for VIDEO_IR_I2C
  V4L/DVB (8395): saa7134: Fix Kbuild dependency of ir-kbd-i2c
  V4L/DVB (8394): ir-common: CodingStyle fix: move EXPORT_SYMBOL_GPL to their proper places
  V4L/DVB (8393): media/video: Fix depencencies for VIDEOBUF
  V4L/DVB (8392): media/Kconfig: Convert V4L1_COMPAT select into "depends on"
  V4L/DVB (8390): videodev: add comment and remove magic number.
  V4L/DVB (8389): videodev: simplify get_index()
  V4L/DVB (8387): Some cosmetic changes
  V4L/DVB (8381): ov7670: fix compile warnings
  V4L/DVB (8380): saa7115: use saa7115_auto instead of saa711x as the autodetect driver name.
  V4L/DVB (8379): saa7127: Make device detection optional
  V4L/DVB (8378): cx18: move cx18_av_vbi_setup to av-core.c and rename to cx18_av_std_setup
  V4L/DVB (8377): ivtv/cx18: ensure the default control values are correct
  V4L/DVB (8376): cx25840: move cx25840_vbi_setup to core.c and rename to cx25840_std_setup
  V4L/DVB (8374): gspca: No conflict of 0c45:6011 with the sn9c102 driver.
  ...
2008-07-20 21:14:42 -07:00
Linus Torvalds
f076ab8d04 Merge branch 'kvm-updates-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (70 commits)
  KVM: Adjust smp_call_function_mask() callers to new requirements
  KVM: MMU: Fix potential race setting upper shadow ptes on nonpae hosts
  KVM: x86 emulator: emulate clflush
  KVM: MMU: improve invalid shadow root page handling
  KVM: MMU: nuke shadowed pgtable pages and ptes on memslot destruction
  KVM: Prefix some x86 low level function with kvm_, to avoid namespace issues
  KVM: check injected pic irq within valid pic irqs
  KVM: x86 emulator: Fix HLT instruction
  KVM: Apply the kernel sigmask to vcpus blocked due to being uninitialized
  KVM: VMX: Add ept_sync_context in flush_tlb
  KVM: mmu_shrink: kvm_mmu_zap_page requires slots_lock to be held
  x86: KVM guest: make kvm_smp_prepare_boot_cpu() static
  KVM: SVM: fix suspend/resume support
  KVM: s390: rename private structures
  KVM: s390: Set guest storage limit and offset to sane values
  KVM: Fix memory leak on guest exit
  KVM: s390: dont allocate dirty bitmap
  KVM: move slots_lock acquision down to vapic_exit
  KVM: VMX: Fake emulate Intel perfctr MSRs
  KVM: VMX: Fix a wrong usage of vmcs_config
  ...
2008-07-20 21:13:26 -07:00
Linus Torvalds
db6d8c7a40 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (1232 commits)
  iucv: Fix bad merging.
  net_sched: Add size table for qdiscs
  net_sched: Add accessor function for packet length for qdiscs
  net_sched: Add qdisc_enqueue wrapper
  highmem: Export totalhigh_pages.
  ipv6 mcast: Omit redundant address family checks in ip6_mc_source().
  net: Use standard structures for generic socket address structures.
  ipv6 netns: Make several "global" sysctl variables namespace aware.
  netns: Use net_eq() to compare net-namespaces for optimization.
  ipv6: remove unused macros from net/ipv6.h
  ipv6: remove unused parameter from ip6_ra_control
  tcp: fix kernel panic with listening_get_next
  tcp: Remove redundant checks when setting eff_sacks
  tcp: options clean up
  tcp: Fix MD5 signatures for non-linear skbs
  sctp: Update sctp global memory limit allocations.
  sctp: remove unnecessary byteshifting, calculate directly in big-endian
  sctp: Allow only 1 listening socket with SO_REUSEADDR
  sctp: Do not leak memory on multiple listen() calls
  sctp: Support ipv6only AF_INET6 sockets.
  ...
2008-07-20 17:43:29 -07:00
Linus Torvalds
f7df406dce Merge branch 'configfs-fixup-ptr-error' of git://oss.oracle.com/git/jlbec/linux-2.6
* 'configfs-fixup-ptr-error' of git://oss.oracle.com/git/jlbec/linux-2.6:
  configfs: Allow ->make_item() and ->make_group() to return detailed errors.
  Revert "configfs: Allow ->make_item() and ->make_group() to return detailed errors."
2008-07-20 17:17:52 -07:00
Alan Cox
44b7d1b37f tty: add more tty_port fields
Move more bits into the tty_port structure

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:12:38 -07:00
Alan Cox
77451e53e0 cyclades: use tty_port
Switch cyclades to use the new tty_port structure

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:12:38 -07:00
Alan Cox
f8ae476416 stallion: use tty_port
Switch the stallion driver to use the tty_port structure

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:12:37 -07:00
Alan Cox
b02f5ad6a3 istallion: use tty_port
Switch istallion to use the new tty_port structure

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:12:37 -07:00
Alan Cox
b5391e29f4 gs: use tty_port
Switch drivers using the old "generic serial" driver to use the tty_port
structures

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:12:36 -07:00
Alan Cox
4982d6b37a esp: use tty_port
Switch esp to use the new tty_port structures

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:12:36 -07:00
Alan Cox
7a4d29f426 tty.h: clean up
Coding style clean up and white space tidy

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:12:36 -07:00
Alan Cox
df4f4dd429 serial: use tty_port
Switch the serial_core based drivers to use the new tty_port structure.
We can't quite use all of it yet because of the dynamically allocated
extras in the serial_core layer.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:12:35 -07:00
Alan Cox
6f67048cd0 tty: Introduce a tty_port common structure
Every tty driver has its own concept of a port structure and because
they all differ we cannot extract commonality.  Begin fixing this by
creating a structure drivers can elect to use so that over time we can
push fields into this and create commonality and then introduce common
methods.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:12:35 -07:00
Alan Cox
a352def21a tty: Ldisc revamp
Move the line disciplines towards a conventional ->ops arrangement.  For
the moment the actual 'tty_ldisc' struct in the tty is kept as part of
the tty struct but this can then be changed if it turns out that when it
all settles down we want to refcount ldiscs separately to the tty.

Pull the ldisc code out of /proc and put it with our ldisc code.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:12:34 -07:00
Haavard Skinnemoen
6bb0e3a59a Subject: [PATCH 1/2] serial: Add flush_buffer() operation to uart_ops
Serial drivers using DMA (like the atmel_serial driver) tend to get very
confused when the xmit buffer is flushed and nobody told them.  They
also tend to spew a lot of garbage since the DMA engine keeps running
after the buffer is flushed and possibly refilled with unrelated data.

This patch adds a new flush_buffer operation to the uart_ops struct,
along with a call to it from uart_flush_buffer() right after the xmit
buffer has been cleared. The driver can implement this in order to
syncronize its internal DMA state with the xmit buffer when the buffer
is flushed.

Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:12:34 -07:00
Philipp Zabel
99cdb0c8c5 mfd: let asic3 use mem resource instead of bus_shift
The bus_shift parameter in platform_data is not needed
as we can tell the driver with the IOMEM_RESOURCE whether
the ASIC is located on a 16bit or 32bit memory bus.

The htc-egpio driver uses a more descriptive bus_width parameter,
but for drivers where the register map size fixed, we don't even
need this.

Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2008-07-20 19:56:44 +02:00
Philipp Zabel
279cac484e mfd: remove DS1WM register definitions from asic3.h
There is a dedicated ds1wm driver, no need to duplicate this
information here.

Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2008-07-20 19:56:24 +02:00
Philipp Zabel
4a67b528e0 mfd: add ASIC3_CONFIG_GPIO templates
As ASIC3 GPIO alternate function configuration is expected to be similar
for several devices, it is convenient to define descriptive macros. This
patch is inspired by the PXA MFP configuration, the alternate functions
were observed on hx4700 and blueangel.

Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2008-07-20 19:56:19 +02:00
Samuel Ortiz
3b8139f8b1 mfd: Use uppercase only for asic3 macros and defines
Let's be consistent and use uppercase only, for both macro and defines.

Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-07-20 19:55:14 +02:00
Samuel Ortiz
3b26bf1722 mfd: New asic3 gpio configuration code
The ASIC3 GPIO configuration code is a bit obscure and hardly readable.
This patch changes it so that it is now more readable and understandable,
by being more explicit.

Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-07-20 19:55:00 +02:00
Samuel Ortiz
1effe5bc6c mfd: asic3 children platform data removal
Platform devices should be dynamically allocated, and each supported
device should have its own platform data.
For now we just remove this buggy code.

Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-07-20 19:54:51 +02:00
Samuel Ortiz
6f2384c4bd mfd: asic3 gpiolib support
ASIC3 is, among other things, a GPIO extender. We should thus have it
supporting the current gpiolib API.

Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-07-20 19:52:38 +02:00
Jean Delvare
5a367dfb73 V4L/DVB (8246): tvaudio: Stop I2C driver ID abuse
The tvaudio driver is using "official" I2C device IDs for internal
purpose. There must be some historical reason behind this but anyway,
it shouldn't do that. As the stored values are never used, the easiest
way to fix the problem is simply to remove them altogether.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-07-20 07:18:38 -03:00
Jean Delvare
be99af6679 V4L/DVB (8245): ovcamchip: Delete stray I2C bus ID
I2C_HW_SMBUS_OVFX2 is referenced in ovcamchip_core.c, but no bus uses
this driver ID, so we can remove the reference. As far as I can see,
the Cypress FX2 webcam is handled by a different driver (dvb-usb).

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-07-20 07:18:34 -03:00
Hans de Goede
ab8f12cf8e V4L/DVB (8197): gspca: pac207 frames no more decoded in the subdriver.
videodev2: New pixfmt
pac207:   Remove the specific decoding.
main:     get_buff_size operation added for the subdriver.

Signed-off-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Signed-off-by: Jean-Francois Moine <moinejf@free.fr>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-07-20 07:17:02 -03:00
Hans de Goede
54ab92ca05 V4L/DVB (8194): gspca: Fix the format of the low resolution mode of spca561.
The low (half) res modes of the spca561 are not spca561 compressed, but are
raw bayer, this patches fixes this and adds a PIX_FMT define for the GBRG
bayer format used by the spca561 in low res mode.

Signed-off-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Signed-off-by: Jean-Francois Moine <moinejf@free.fr>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-07-20 07:16:47 -03:00
Jean-Francois Moine
6a7eba24e4 V4L/DVB (8157): gspca: all subdrivers
- remaning subdrivers added
- remove the decoding helper and some specific frame decodings

Signed-off-by: Jean-Francois Moine <moinejf@free.fr>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-07-20 07:14:49 -03:00
Tobias Lorenz
1d0ba5f378 V4L/DVB (7942): Hardware frequency seek ioctl interface
Signed-off-by: Tobias Lorenz <tobias.lorenz@gmx.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2008-07-20 07:07:12 -03:00
Marcelo Tosatti
34d4cb8fca KVM: MMU: nuke shadowed pgtable pages and ptes on memslot destruction
Flush the shadow mmu before removing regions to avoid stale entries.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:40 +03:00
Tan, Li
9ef621d3be KVM: Support mixed endian machines
Currently kvmtrace is not portable. This will prevent from copying a
trace file from big-endian target to little-endian workstation for analysis.
In the patch, kernel outputs metadata containing a magic number to trace
log, and changes 64-bit words to be u64 instead of a pair of u32s.

Signed-off-by: Tan Li <li.tan@intel.com>
Acked-by: Jerone Young <jyoung5@us.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:32 +03:00
Laurent Vivier
5f94c1741b KVM: Add coalesced MMIO support (common part)
This patch adds all needed structures to coalesce MMIOs.
Until an architecture uses it, it is not compiled.

Coalesced MMIO introduces two ioctl() to define where are the MMIO zones that
can be coalesced:

- KVM_REGISTER_COALESCED_MMIO registers a coalesced MMIO zone.
  It requests one parameter (struct kvm_coalesced_mmio_zone) which defines
  a memory area where MMIOs can be coalesced until the next switch to
  user space. The maximum number of MMIO zones is KVM_COALESCED_MMIO_ZONE_MAX.

- KVM_UNREGISTER_COALESCED_MMIO cancels all registered zones inside
  the given bounds (bounds are also given by struct kvm_coalesced_mmio_zone).

The userspace client can check kernel coalesced MMIO availability by asking
ioctl(KVM_CHECK_EXTENSION) for the KVM_CAP_COALESCED_MMIO capability.
The ioctl() call to KVM_CAP_COALESCED_MMIO will return 0 if not supported,
or the page offset where will be stored the ring buffer.
The page offset depends on the architecture.

After an ioctl(KVM_RUN), the first page of the KVM memory mapped points to
a kvm_run structure. The offset given by KVM_CAP_COALESCED_MMIO is
an offset to the coalesced MMIO ring expressed in PAGE_SIZE relatively
to the address of the start of th kvm_run structure. The MMIO ring buffer
is defined by the structure kvm_coalesced_mmio_ring.

[akio: fix oops during guest shutdown]

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:31 +03:00
Laurent Vivier
92760499d0 KVM: kvm_io_device: extend in_range() to manage len and write attribute
Modify member in_range() of structure kvm_io_device to pass length and the type
of the I/O (write or read).

This modification allows to use kvm_io_device with coalesced MMIO.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:30 +03:00
Avi Kivity
7cc8883074 KVM: Remove decache_vcpus_on_cpu() and related callbacks
Obsoleted by the vmx-specific per-cpu list.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:25 +03:00
Mike Travis
80422d3431 cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
* Rename CPUMASK_VAR --> CPUMASK_PTR (and simplify)

  * Fix a semantic error in CPUMASK_ALLOC

  * Add a bit of commentry to cpumask.h

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-20 10:21:12 +02:00
Jussi Kivilinna
175f9c1bba net_sched: Add size table for qdiscs
Add size table functions for qdiscs and calculate packet size in
qdisc_enqueue().

Based on patch by Patrick McHardy
 http://marc.info/?l=linux-netdev&m=115201979221729&w=2

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-20 00:08:47 -07:00
YOSHIFUJI Hideaki
230b183921 net: Use standard structures for generic socket address structures.
Use sockaddr_storage{} for generic socket address storage
and ensures proper alignment.
Use sockaddr{} for pointers to omit several casts.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 22:35:47 -07:00
Adam Langley
4389dded77 tcp: Remove redundant checks when setting eff_sacks
Remove redundant checks when setting eff_sacks and make the number of SACKs a
compile time constant. Now that the options code knows how many SACK blocks can
fit in the header, we don't need to have the SACK code guessing at it.

Signed-off-by: Adam Langley <agl@imperialviolet.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19 00:07:02 -07:00
David S. Miller
3072367300 pkt_sched: Manage qdisc list inside of root qdisc.
Idea is from Patrick McHardy.

Instead of managing the list of qdiscs on the device level, manage it
in the root qdisc of a netdev_queue.  This solves all kinds of
visibility issues during qdisc destruction.

The way to iterate over all qdiscs of a netdev_queue is to visit
the netdev_queue->qdisc, and then traverse it's list.

The only special case is to ignore builting qdiscs at the root when
dumping or doing a qdisc_lookup().  That was not needed previously
because builtin qdiscs were not added to the device's qdisc_list.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 22:50:15 -07:00
Matthew Garrett
92c4989092 Input: add switch for dock events
Add a SW_DOCK switch to input.h.  ACPI docks currently send their docking
status as a uevent, but not all docks are ACPI or correspond to a device.
In that case, it makes more sense to simply generate an input event on
docking or undocking.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2008-07-19 00:52:43 -04:00
Mark Brown
5ec461d083 Input: add microphone insert switch definition
Add a new switch type to the input API for reporting microphone
insertion. This will be used by the ALSA jack reporting API.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2008-07-19 00:52:36 -04:00
Patrick McHardy
8913336a7e packet: add PACKET_RESERVE sockopt
Add new sockopt to reserve some headroom in the mmaped ring frames in
front of the packet payload. This can be used f.i. when the VLAN header
needs to be (re)constructed to avoid moving the entire payload.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 18:05:19 -07:00
venkatesh.pallipadi@intel.com
ae79cdaacb x86: Add a arch directory for x86 under debugfs
Add a directory for x86 arch under debugfs. Can be used to accumulate all
x86 specific debugfs files.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-07-18 17:22:04 -07:00
Ingo Molnar
a208f37a46 Merge branch 'linus' into x86/x2apic 2008-07-18 22:50:34 +02:00
Mike Travis
77586c2bda cpumask: Provide a generic set of CPUMASK_ALLOC macros
* Provide a generic set of CPUMASK_ALLOC macros patterned after the
    SCHED_CPUMASK_ALLOC macros.  This is used where multiple cpumask_t
    variables are declared on the stack to reduce the amount of stack
    space required.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 22:03:00 +02:00
Mike Travis
65c0118453 cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
* This patch replaces the dangerous lvalue version of cpumask_of_cpu
    with new cpumask_of_cpu_ptr macros.  These are patterned after the
    node_to_cpumask_ptr macros.

    In general terms, if there is a cpumask_of_cpu_map[] then a pointer to
    the cpumask_of_cpu_map[cpu] entry is used.  The cpumask_of_cpu_map
    is provided when there is a large NR_CPUS count, reducing
    greatly the amount of code generated and stack space used for
    cpumask_of_cpu().  The pointer to the cpumask_t value is needed for
    calling set_cpus_allowed_ptr() to reduce the amount of stack space
    needed to pass the cpumask_t value.

    If there isn't a cpumask_of_cpu_map[], then a temporary variable is
    declared and filled in with value from cpumask_of_cpu(cpu) as well as
    a pointer variable pointing to this temporary variable.  Afterwards,
    the pointer is used to reference the cpumask value.  The compiler
    will optimize out the extra dereference through the pointer as well
    as the stack space used for the pointer, resulting in identical code.

    A good example of the orthogonal usages is in net/sunrpc/svc.c:

	case SVC_POOL_PERCPU:
	{
		unsigned int cpu = m->pool_to[pidx];
		cpumask_of_cpu_ptr(cpumask, cpu);

		*oldmask = current->cpus_allowed;
		set_cpus_allowed_ptr(current, cpumask);
		return 1;
	}
	case SVC_POOL_PERNODE:
	{
		unsigned int node = m->pool_to[pidx];
		node_to_cpumask_ptr(nodecpumask, node);

		*oldmask = current->cpus_allowed;
		set_cpus_allowed_ptr(current, nodecpumask);
		return 1;
	}

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 22:02:57 +02:00
Ingo Molnar
bb2c018b09 Merge branch 'linus' into cpus4096
Conflicts:

	drivers/acpi/processor_throttling.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 22:00:54 +02:00
Ingo Molnar
9b610fda0d Merge branch 'linus' into timers/nohz 2008-07-18 19:53:16 +02:00
Thomas Gleixner
b8f8c3cf0a nohz: prevent tick stop outside of the idle loop
Jack Ren and Eric Miao tracked down the following long standing
problem in the NOHZ code:

	scheduler switch to idle task
	enable interrupts

Window starts here

	----> interrupt happens (does not set NEED_RESCHED)
	      	irq_exit() stops the tick

	----> interrupt happens (does set NEED_RESCHED)

	return from schedule()
	
	cpu_idle(): preempt_disable();

Window ends here

The interrupts can happen at any point inside the race window. The
first interrupt stops the tick, the second one causes the scheduler to
rerun and switch away from idle again and we end up with the tick
disabled.

The fact that it needs two interrupts where the first one does not set
NEED_RESCHED and the second one does made the bug obscure and extremly
hard to reproduce and analyse. Kudos to Jack and Eric.

Solution: Limit the NOHZ functionality to the idle loop to make sure
that we can not run into such a situation ever again.

cpu_idle()
{
	preempt_disable();

	while(1) {
		 tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
		 			          are in the idle loop

		 while (!need_resched())
		       halt();

		 tick_nohz_restart_sched_tick(); <- disables NOHZ mode
		 preempt_enable_no_resched();
		 schedule();
		 preempt_disable();
	}
}

In hindsight we should have done this forever, but ... 

/me grabs a large brown paperbag.

Debugged-by: Jack Ren <jack.ren@marvell.com>, 
Debugged-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-18 18:10:28 +02:00
Lai Jiangshan
5127bed588 rcu classic: new algorithm for callbacks-processing(v2)
This is v2, it's a little deference from v1 that I
had send to lkml.
use ACCESS_ONCE
use rcu_batch_after/rcu_batch_before for batch # comparison.

rcutorture test result:
(hotplugs: do cpu-online/offline once per second)

No CONFIG_NO_HZ:           OK, 12hours
No CONFIG_NO_HZ, hotplugs: OK, 12hours
CONFIG_NO_HZ=y:            OK, 24hours
CONFIG_NO_HZ=y, hotplugs:  Failed.
(Failed also without my patch applied, exactly the same bug occurred,
http://lkml.org/lkml/2008/7/3/24)

v1's email thread:
http://lkml.org/lkml/2008/6/2/539

v1's description:

The code/algorithm of the implement of current callbacks-processing
is very efficient and technical. But when I studied it and I found
a disadvantage:

In multi-CPU systems, when a new RCU callback is being
queued(call_rcu[_bh]), this callback will be invoked after the grace
period for the batch with batch number = rcp->cur+2 has completed
very very likely in current implement. Actually, this callback can be
invoked after the grace period for the batch with
batch number = rcp->cur+1 has completed. The delay of invocation means
that latency of synchronize_rcu() is extended. But more important thing
is that the callbacks usually free memory, and these works are delayed
too! it's necessary for reclaimer to free memory as soon as
possible when left memory is few.

A very simple way can solve this problem:
a field(struct rcu_head::batch) is added to record the batch number for
the RCU callback. And when a new RCU callback is being queued, we
determine the batch number for this callback(head->batch = rcp->cur+1)
and we move this callback to rdp->donelist if we find
that head->batch <= rcp->completed when we process callbacks.
This simple way reduces the wait time for invocation a lot. (about
2.5Grace Period -> 1.5Grace Period in average in multi-CPU systems)

This is my algorithm. But I do not add any field for struct rcu_head
in my implement. We just need to memorize the last 2 batches and
their batch number, because these 2 batches include all entries that
for whom the grace period hasn't completed. So we use a special
linked-list rather than add a field.
Please see the comment of struct rcu_data.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Gautham Shenoy <ego@in.ibm.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 16:07:33 +02:00
Lai Jiangshan
3cac97cbb1 rcu classic: simplify the next pending batch
use a batch number(rcp->pending) instead of a flag(rcp->next_pending)

rcu_start_batch() need to change this flag, so mb()s is needed
for memory-access safe.

but(after this patch applied) rcu_start_batch() do not change
this batch number(rcp->pending), rcp->pending is managed by
__rcu_process_callbacks only, and troublesome mb()s are eliminated.

And codes look simpler and clearer.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Gautham Shenoy <ego@in.ibm.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 16:07:32 +02:00
Ingo Molnar
1b427c153a sched: fix build error, provide partition_sched_domains() unconditionally
provide an empty partition_sched_domains() definition for the UP case:

 include/linux/cpuset.h: In function ‘rebuild_sched_domains':
 include/linux/cpuset.h:163: error: implicit declaration of function ‘partition_sched_domains'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 14:02:46 +02:00
Max Krasnyansky
e761b77252 cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
This is based on Linus' idea of creating cpu_active_map that prevents
scheduler load balancer from migrating tasks to the cpu that is going
down.

It allows us to simplify domain management code and avoid unecessary
domain rebuilds during cpu hotplug event handling.

Please ignore the cpusets part for now. It needs some more work in order
to avoid crazy lock nesting. Although I did simplfy and unify domain
reinitialization logic. We now simply call partition_sched_domains() in
all the cases. This means that we're using exact same code paths as in
cpusets case and hence the test below cover cpusets too.
Cpuset changes to make rebuild_sched_domains() callable from various
contexts are in the separate patch (right next after this one).

This not only boots but also easily handles
	while true; do make clean; make -j 8; done
and
	while true; do on-off-cpu 1; done
at the same time.
(on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing).

Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
this on right now in gnome-terminal and things are moving just fine.

Also this is running with most of the debug features enabled (lockdep,
mutex, etc) no BUG_ONs or lockdep complaints so far.

I believe I addressed all of the Dmitry's comments for original Linus'
version. I changed both fair and rt balancer to mask out non-active cpus.
And replaced cpu_is_offline() with !cpu_active() in the main scheduler
code where it made sense (to me).

Signed-off-by: Max Krasnyanskiy <maxk@qualcomm.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Gregory Haskins <ghaskins@novell.com>
Cc: dmitry.adamushko@gmail.com
Cc: pj@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 13:22:25 +02:00
Pavel Emelyanov
b6fcbdb4f2 proc: consolidate per-net single-release callers
They are symmetrical to single_open ones :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:07:44 -07:00
Pavel Emelyanov
de05c557b2 proc: consolidate per-net single_open callers
There are already 7 of them - time to kill some duplicate code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:07:21 -07:00
David S. Miller
49997d7515 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	Documentation/powerpc/booting-without-of.txt
	drivers/atm/Makefile
	drivers/net/fs_enet/fs_enet-main.c
	drivers/pci/pci-acpi.c
	net/8021q/vlan.c
	net/iucv/iucv.c
2008-07-18 02:39:39 -07:00
David S. Miller
8387400092 pkt_sched: Kill netdev_queue lock.
We can simply use the qdisc->q.lock for all of the
qdisc tree synchronization.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:30 -07:00
David S. Miller
ead81cc5fc netdevice: Move qdisc_list back into net_device proper.
And give it it's own lock.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:26 -07:00
David S. Miller
37437bb2e1 pkt_sched: Schedule qdiscs instead of netdev_queue.
When we have shared qdiscs, packets come out of the qdiscs
for multiple transmit queues.

Therefore it doesn't make any sense to schedule the transmit
queue when logically we cannot know ahead of time the TX
queue of the SKB that the qdisc->dequeue() will give us.

Just for sanity I added a BUG check to make sure we never
get into a state where the noop_qdisc is scheduled.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:20 -07:00
David S. Miller
e2627c8c22 pkt_sched: Make QDISC_RUNNING a qdisc state.
Currently it is associated with a netdev_queue, but when we have
qdisc sharing that no longer makes any sense.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:18 -07:00
David S. Miller
d3b753db7c pkt_sched: Move gso_skb into Qdisc.
We liberate any dangling gso_skb during qdisc destruction.

It really only matters for the root qdisc.  But when qdiscs
can be shared by multiple netdev_queue objects, we can't
have the gso_skb in the netdev_queue any more.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:18 -07:00
David S. Miller
92831bc395 netdev: Kill plain netif_schedule()
No more users.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:16 -07:00
David S. Miller
eae792b722 netdev: Add netdev->select_queue() method.
Devices or device layers can set this to control the queue selection
performed by dev_pick_tx().

This function runs under RCU protection, which allows overriding
functions to have some way of synchronizing with things like dynamic
->real_num_tx_queues adjustments.

This makes the spinlock prefetch in dev_queue_xmit() a little bit
less effective, but that's the price right now for correctness.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:10 -07:00
David S. Miller
e3c50d5d25 netdev: netdev_priv() can now be sane again.
The private area of a netdev is now at a fixed offset once more.

Unfortunately, some assumptions that netdev_priv() == netdev->priv
crept back into the tree.  In particular this happened in the
loopback driver.  Make it use netdev->ml_priv.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:09 -07:00
David S. Miller
6b0fb1261a netdev: Kill struct net_device_subqueue and netdev->egress_subqueue*
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:08 -07:00
David S. Miller
fd2ea0a79f net: Use queue aware tests throughout.
This effectively "flips the switch" by making the core networking
and multiqueue-aware drivers use the new TX multiqueue structures.

Non-multiqueue drivers need no changes.  The interfaces they use such
as netif_stop_queue() degenerate into an operation on TX queue zero.
So everything "just works" for them.

Code that really wants to do "X" to all TX queues now invokes a
routine that does so, such as netif_tx_wake_all_queues(),
netif_tx_stop_all_queues(), etc.

pktgen and netpoll required a little bit more surgery than the others.

In particular the pktgen changes, whilst functional, could be largely
improved.  The initial check in pktgen_xmit() will sometimes check the
wrong queue, which is mostly harmless.  The thing to do is probably to
invoke fill_packet() earlier.

The bulk of the netpoll changes is to make the code operate solely on
the TX queue indicated by by the SKB queue mapping.

Setting of the SKB queue mapping is entirely confined inside of
net/core/dev.c:dev_pick_tx().  If we end up needing any kind of
special semantics (drops, for example) it will be implemented here.

Finally, we now have a "real_num_tx_queues" which is where the driver
indicates how many TX queues are actually active.

With IGB changes from Jeff Kirsher.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:07 -07:00
David S. Miller
1d8ae3fdeb pkt_sched: Remove RR scheduler.
This actually fixes a bug added by the RR scheduler changes.  The
->bands and ->prio2band parameters were being set outside of the
sch_tree_lock() and thus could result in strange behavior and
inconsistencies.

It might be possible, in the new design (where there will be one qdisc
per device TX queue) to allow similar functionality via a TX hash
algorithm for RR but I really see no reason to export this aspect of
how these multiqueue cards actually implement the scheduling of the
the individual DMA TX rings and the single physical MAC/PHY port.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:04 -07:00
David S. Miller
09e83b5d7d netdev: Kill NETIF_F_MULTI_QUEUE.
There is no need for a feature bit for something that
can be tested by simply checking the TX queue count.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:03 -07:00
David S. Miller
e8a0464cc9 netdev: Allocate multiple queues for TX.
alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.

Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces.  This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.

Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:00 -07:00
Dan Williams
0839875e0c async_tx: make async_tx_test_ack a boolean routine
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-17 17:59:56 -07:00
Dan Williams
3dce017137 async_tx: remove depend_tx from async_tx_sync_epilog
All callers of async_tx_sync_epilog have called async_tx_quiesce on the
depend_tx, so async_tx_sync_epilog need only call the callback to
complete the operation.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-17 17:59:55 -07:00
Dan Williams
d2c52b7983 async_tx: export async_tx_quiesce
Replace open coded "wait and acknowledge" instances with async_tx_quiesce.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-17 17:59:55 -07:00
Joel Becker
a6795e9ebb configfs: Allow ->make_item() and ->make_group() to return detailed errors.
The configfs operations ->make_item() and ->make_group() currently
return a new item/group.  A return of NULL signifies an error.  Because
of this, -ENOMEM is the only return code bubbled up the stack.

Multiple folks have requested the ability to return specific error codes
when these operations fail.  This patch adds that ability by changing the
->make_item/group() ops to return ERR_PTR() values.  These errors are
bubbled up appropriately.  NULL returns are changed to -ENOMEM for
compatibility.

Also updated are the in-kernel users of configfs.

This is a rework of reverted commit 11c3b79218.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-07-17 15:21:29 -07:00
Joel Becker
f89ab8619e Revert "configfs: Allow ->make_item() and ->make_group() to return detailed errors."
This reverts commit 11c3b79218.  The code
will move to PTR_ERR().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-07-17 14:53:48 -07:00
Linus Torvalds
5b664cb235 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  [PATCH] ocfs2: fix oops in mmap_truncate testing
  configfs: call drop_link() to cleanup after create_link() failure
  configfs: Allow ->make_item() and ->make_group() to return detailed errors.
  configfs: Fix failing mkdir() making racing rmdir() fail
  configfs: Fix deadlock with racing rmdir() and rename()
  configfs: Make configfs_new_dirent() return error code instead of NULL
  configfs: Protect configfs_dirent s_links list mutations
  configfs: Introduce configfs_dirent_lock
  ocfs2: Don't snprintf() without a format.
  ocfs2: Fix CONFIG_OCFS2_DEBUG_FS #ifdefs
  ocfs2/net: Silence build warnings on sparc64
  ocfs2: Handle error during journal load
  ocfs2: Silence an error message in ocfs2_file_aio_read()
  ocfs2: use simple_read_from_buffer()
  ocfs2: fix printk format warnings with OCFS2_FS_STATS=n
  [PATCH 2/2] ocfs2: Instrument fs cluster locks
  [PATCH 1/2] ocfs2: Add CONFIG_OCFS2_FS_STATS config option
2008-07-17 10:55:51 -07:00
Roland McGrath
f470021adb ptrace children revamp
ptrace no longer fiddles with the children/sibling links, and the
old ptrace_children list is gone.  Now ptrace, whether of one's own
children or another's via PTRACE_ATTACH, just uses the new ptraced
list instead.

There should be no user-visible difference that matters.  The only
change is the order in which do_wait() sees multiple stopped
children and stopped ptrace attachees.  Since wait_task_stopped()
was changed earlier so it no longer reorders the children list, we
already know this won't cause any new problems.

Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-16 18:02:33 -07:00
Linus Torvalds
dc7c65db28 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
  Revert "x86/PCI: ACPI based PCI gap calculation"
  PCI: remove unnecessary volatile in PCIe hotplug struct controller
  x86/PCI: ACPI based PCI gap calculation
  PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
  PCI PM: Fix pci_prepare_to_sleep
  x86/PCI: Fix PCI config space for domains > 0
  Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
  PCI: Simplify PCI device PM code
  PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
  PCI ACPI: Rework PCI handling of wake-up
  ACPI: Introduce new device wakeup flag 'prepared'
  ACPI: Introduce acpi_device_sleep_wake function
  PCI: rework pci_set_power_state function to call platform first
  PCI: Introduce platform_pci_power_manageable function
  ACPI: Introduce acpi_bus_power_manageable function
  PCI: make pci_name use dev_name
  PCI: handle pci_name() being const
  PCI: add stub for pci_set_consistent_dma_mask()
  PCI: remove unused arch pcibios_update_resource() functions
  PCI: fix pci_setup_device()'s sprinting into a const buffer
  ...

Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.
2008-07-16 17:25:46 -07:00
Kumar Gala
b219108cba fs_enet: Remove !CONFIG_PPC_CPM_NEW_BINDING code
Now that arch/ppc is gone we always define CONFIG_PPC_CPM_NEW_BINDING so
we can remove all the code associated with !CONFIG_PPC_CPM_NEW_BINDING.

Also fixed some asm/of_platform.h to linux/of_platform.h (and of_device.h)

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-07-16 17:57:49 -05:00
Scott Wood
d87eb12785 gianfar: Add magic packet and suspend/resume support.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-07-16 17:57:47 -05:00
Scott Wood
d49747bdfb powerpc/mpc83xx: Power Management support
Basic PM support for 83xx.  Standby is implemented as sleep.
Suspend-to-RAM is implemented as "deep sleep" (with the processor
turned off) on 831x.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-07-16 17:57:30 -05:00
Linus Torvalds
8a0ca91e1d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc: (68 commits)
  sdio_uart: Fix SDIO break control to now return success or an error
  mmc: host driver for Ricoh Bay1Controllers
  sdio: sdio_io.c Fix sparse warnings
  sdio: fix the use of hard coded timeout value.
  mmc: OLPC: update vdd/powerup quirk comment
  mmc: fix spares errors of sdhci.c
  mmc: remove multiwrite capability
  wbsd: fix bad dma_addr_t conversion
  atmel-mci: Driver for Atmel on-chip MMC controllers
  mmc: fix sdio_io sparse errors
  mmc: wbsd.c fix shadowing of 'dma' variable
  MMC: S3C24XX: Refuse incorrectly aligned transfers
  MMC: S3C24XX: Add maintainer entry
  MMC: S3C24XX: Update error debugging.
  MMC: S3C24XX: Add media presence test to request handling.
  MMC: S3C24XX: Fix use of msecs where jiffies are needed
  MMC: S3C24XX: Add MODULE_ALIAS() entries for the platform devices
  MMC: S3C24XX: Fix s3c2410_dma_request() return code check.
  MMC: S3C24XX: Allow card-detect on non-IRQ capable pin
  MMC: S3C24XX: Ensure host->mrq->data is valid
  ...

Manually fixed up bogus executable bits on drivers/mmc/core/sdio_io.c
and include/linux/mmc/sdio_func.h when merging.
2008-07-16 15:17:52 -07:00
Linus Torvalds
9c1be0c471 Merge branch 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6
* 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6:
  UBIFS: include to compilation
  UBIFS: add new flash file system
  UBIFS: add brief documentation
  MAINTAINERS: add UBIFS section
  do_mounts: allow UBI root device name
  VFS: export sync_sb_inodes
  VFS: move inode_lock into sync_sb_inodes
2008-07-16 15:02:57 -07:00
Linus Torvalds
42fdd144a4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (76 commits)
  IDE: Report errors during drive reset back to user space
  Update documentation of HDIO_DRIVE_RESET ioctl
  IDE: Remove unused code
  IDE: Fix HDIO_DRIVE_RESET handling
  hd.c: remove the #include <linux/mc146818rtc.h>
  update the BLK_DEV_HD help text
  move ide/legacy/hd.c to drivers/block/
  ide/legacy/hd.c: use late_initcall()
  remove BLK_DEV_HD_ONLY
  ide: endian annotations in ide-floppy.c
  ide-floppy: zero out the whole struct ide_atapi_pc on init
  ide-floppy: fold idefloppy_create_test_unit_ready_cmd into idefloppy_open
  ide-cd: move request prep chunk from cdrom_do_newpc_cont to rq issue path
  ide-cd: move request prep from cdrom_start_rw_cont to rq issue path
  ide-cd: move request prep from cdrom_start_seek_continuation to rq issue path
  ide-cd: fold cdrom_start_seek into ide_cd_do_request
  ide-cd: simplify request issuing path
  ide-cd: mv ide_do_rw_cdrom ide_cd_do_request
  ide-cd: cdrom_start_seek: remove unused argument block
  ide-cd: ide_do_rw_cdrom: add the catch-all bad request case to the if-else block
  ...
2008-07-16 14:53:54 -07:00
Linus Torvalds
4314652bb4 Merge branch 'release-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-acpi-merge-2.6
* 'release-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-acpi-merge-2.6: (87 commits)
  Fix FADT parsing
  Add the ability to reset the machine using the RESET_REG in ACPI's FADT table.
  ACPI: use dev_printk when possible
  PNPACPI: add support for HP vendor-specific CCSR descriptors
  PNP: avoid legacy IDE IRQs
  PNP: convert resource options to single linked list
  ISAPNP: handle independent options following dependent ones
  PNP: remove extra 0x100 bit from option priority
  PNP: support optional IRQ resources
  PNP: rename pnp_register_*_resource() local variables
  PNPACPI: ignore _PRS interrupt numbers larger than PNP_IRQ_NR
  PNP: centralize resource option allocations
  PNP: remove redundant pnp_can_configure() check
  PNP: make resource assignment functions return 0 (success) or -EBUSY (failure)
  PNP: in debug resource dump, make empty list obvious
  PNP: improve resource assignment debug
  PNP: increase I/O port & memory option address sizes
  PNP: introduce pnp_irq_mask_t typedef
  PNP: make resource option structures private to PNP subsystem
  PNP: define PNP-specific IORESOURCE_IO_* flags alongside IRQ, DMA, MEM
  ...
2008-07-16 14:52:12 -07:00
Martin K. Petersen
d442cc44c0 block: Trivial fix for blk_integrity_rq()
Fail integrity check gracefully when request does not have a bio
attached (BLOCK_PC).

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-16 14:51:41 -07:00
Linus Torvalds
8df1b049bc Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (82 commits)
  NFSv4: Remove BKL from the nfsv4 state recovery
  SUNRPC: Remove the BKL from the callback functions
  NFS: Remove BKL from the readdir code
  NFS: Remove BKL from the symlink code
  NFS: Remove BKL from the sillydelete operations
  NFS: Remove the BKL from the rename, rmdir and unlink operations
  NFS: Remove BKL from NFS lookup code
  NFS: Remove the BKL from nfs_link()
  NFS: Remove the BKL from the inode creation operations
  NFS: Remove BKL usage from open()
  NFS: Remove BKL usage from the write path
  NFS: Remove the BKL from the permission checking code
  NFS: Remove attribute update related BKL references
  NFS: Remove BKL requirement from attribute updates
  NFS: Protect inode->i_nlink updates using inode->i_lock
  nfs: set correct fl_len in nlmclnt_test()
  SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon
  SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier
  SUNRPC: None of rpcb_create's callers wants a privileged source port
  SUNRPC: Introduce a specific rpcb_create for contacting localhost
  ...
2008-07-16 14:49:49 -07:00
Bjorn Helgaas
1f32ca31e7 PNP: convert resource options to single linked list
ISAPNP, PNPBIOS, and ACPI describe the "possible resource settings" of
a device, i.e., the possibilities an OS bus driver has when it assigns
I/O port, MMIO, and other resources to the device.

PNP used to maintain this "possible resource setting" information in
one independent option structure and a list of dependent option
structures for each device.  Each of these option structures had lists
of I/O, memory, IRQ, and DMA resources, for example:

  dev
    independent options
      ind-io0  -> ind-io1  ...
      ind-mem0 -> ind-mem1 ...
      ...
    dependent option set 0
      dep0-io0  -> dep0-io1  ...
      dep0-mem0 -> dep0-mem1 ...
      ...
    dependent option set 1
      dep1-io0  -> dep1-io1  ...
      dep1-mem0 -> dep1-mem1 ...
      ...
    ...

This data structure was designed for ISAPNP, where the OS configures
device resource settings by writing directly to configuration
registers.  The OS can write the registers in arbitrary order much
like it writes PCI BARs.

However, for PNPBIOS and ACPI devices, the OS uses firmware interfaces
that perform device configuration, and it is important to pass the
desired settings to those interfaces in the correct order.  The OS
learns the correct order by using firmware interfaces that return the
"current resource settings" and "possible resource settings," but the
option structures above doesn't store the ordering information.

This patch replaces the independent and dependent lists with a single
list of options.  For example, a device might have possible resource
settings like this:

  dev
    options
      ind-io0 -> dep0-io0 -> dep1->io0 -> ind-io1 ...

All the possible settings are in the same list, in the order they
come from the firmware "possible resource settings" list.  Each entry
is tagged with an independent/dependent flag.  Dependent entries also
have a "set number" and an optional priority value.  All dependent
entries must be assigned from the same set.  For example, the OS can
use all the entries from dependent set 0, or all the entries from
dependent set 1, but it cannot mix entries from set 0 with entries
from set 1.

Prior to this patch PNP didn't keep track of the order of this list,
and it assigned all independent options first, then all dependent
ones.  Using the example above, that resulted in a "desired
configuration" list like this:

  ind->io0 -> ind->io1 -> depN-io0 ...

instead of the list the firmware expects, which looks like this:

  ind->io0 -> depN-io0 -> ind-io1 ...

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-07-16 23:27:07 +02:00
Bjorn Helgaas
d5ebde6ef5 PNP: support optional IRQ resources
This patch adds an IORESOURCE_IRQ_OPTIONAL flag for use when
assigning resources to a device.  If the flag is set and we are
unable to assign an IRQ to the device, we can leave the IRQ
disabled but allow the overall resource allocation to succeed.

Some devices request an IRQ, but can run without an IRQ
(possibly with degraded performance).  This flag lets us run
the device without the IRQ instead of just leaving the
device disabled.

This is a reimplementation of this previous change by Rene
Herman <rene.herman@gmail.com>:
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3b73a223661ed137c5d3d2635f954382e94f5a43

I reimplemented this for two reasons:
    - to prepare for converting all resource options into a single linked
      list, as opposed to the per-resource-type lists we have now, and
    - to preserve the order and number of resource options.

In PNPBIOS and ACPI, we configure a device by giving firmware a
list of resource assignments.  It is important that this list
has exactly the same number of resources, in the same order,
as the "template" list we got from the firmware in the first
place.

The problem of a sound card MPU401 being left disabled for want of
an IRQ was reported by Uwe Bugla <uwe.bugla@gmx.de>.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-07-16 23:27:07 +02:00
Bjorn Helgaas
a1802c4295 PNP: make resource option structures private to PNP subsystem
Nothing outside the PNP subsystem should need access to a
device's resource options, so this patch moves the option
structure declarations to a private header file.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-07-16 23:27:06 +02:00
Bjorn Helgaas
08c9f262f2 PNP: define PNP-specific IORESOURCE_IO_* flags alongside IRQ, DMA, MEM
PNP previously defined PNP_PORT_FLAG_16BITADDR and PNP_PORT_FLAG_FIXED
in a private header file, but put those flags in struct resource.flags
fields.  Better to make them IORESOURCE_IO_* flags like the existing
IRQ, DMA, and MEM flags.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-07-16 23:27:06 +02:00
Bjorn Helgaas
57fd51a8be PNP: add pnp_possible_config() -- can a device could be configured this way?
As part of a heuristic to identify modem devices, 8250_pnp.c
checks to see whether a device can be configured at any of the
legacy COM port addresses.

This patch moves the code that traverses the PNP "possible resource
options" from 8250_pnp.c to the PNP subsystem.  This encapsulation
is important because a future patch will change the implementation
of those resource options.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-07-16 23:27:06 +02:00
Bjorn Helgaas
aee3ad815d PNP: replace pnp_resource_table with dynamically allocated resources
PNP used to have a fixed-size pnp_resource_table for tracking the
resources used by a device.  This table often overflowed, so we've
had to increase the table size, which wastes memory because most
devices have very few resources.

This patch replaces the table with a linked list of resources where
the entries are allocated on demand.

This removes messages like these:

    pnpacpi: exceeded the max number of IO resources
    00:01: too many I/O port resources

References:

    http://bugzilla.kernel.org/show_bug.cgi?id=9535
    http://bugzilla.kernel.org/show_bug.cgi?id=9740
    http://lkml.org/lkml/2007/11/30/110

This patch also changes the way PNP uses the IORESOURCE_UNSET,
IORESOURCE_AUTO, and IORESOURCE_DISABLED flags.

Prior to this patch, the pnp_resource_table entries used the flags
like this:

    IORESOURCE_UNSET
	This table entry is unused and available for use.  When this flag
	is set, we shouldn't look at anything else in the resource structure.
	This flag is set when a resource table entry is initialized.

    IORESOURCE_AUTO
	This resource was assigned automatically by pnp_assign_{io,mem,etc}().

	This flag is set when a resource table entry is initialized and
	cleared whenever we discover a resource setting by reading an ISAPNP
	config register, parsing a PNPBIOS resource data stream, parsing an
	ACPI _CRS list, or interpreting a sysfs "set" command.

	Resources marked IORESOURCE_AUTO are reinitialized and marked as
	IORESOURCE_UNSET by pnp_clean_resource_table() in these cases:

	    - before we attempt to assign resources automatically,
	    - if we fail to assign resources automatically,
	    - after disabling a device

    IORESOURCE_DISABLED
	Set by pnp_assign_{io,mem,etc}() when automatic assignment fails.
	Also set by PNPBIOS and PNPACPI for:

	    - invalid IRQs or GSI registration failures
	    - invalid DMA channels
	    - I/O ports above 0x10000
	    - mem ranges with negative length

After this patch, there is no pnp_resource_table, and the resource list
entries use the flags like this:

    IORESOURCE_UNSET
	This flag is no longer used in PNP.  Instead of keeping
	IORESOURCE_UNSET entries in the resource list, we remove
	entries from the list and free them.

    IORESOURCE_AUTO
	No change in meaning: it still means the resource was assigned
	automatically by pnp_assign_{port,mem,etc}(), but these functions
	now set the bit explicitly.

	We still "clean" a device's resource list in the same places,
	but rather than reinitializing IORESOURCE_AUTO entries, we
	just remove them from the list.

	Note that IORESOURCE_AUTO entries are always at the end of the
	list, so removing them doesn't reorder other list entries.
	This is because non-IORESOURCE_AUTO entries are added by the
	ISAPNP, PNPBIOS, or PNPACPI "get resources" methods and by the
	sysfs "set" command.  In each of these cases, we completely free
	the resource list first.

    IORESOURCE_DISABLED
	In addition to the cases where we used to set this flag, ISAPNP now
	adds an IORESOURCE_DISABLED resource when it reads a configuration
	register with a "disabled" value.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-07-16 23:27:05 +02:00
Bjorn Helgaas
20bfdbba72 PNP: make pnp_{port,mem,etc}_start(), et al work for invalid resources
Some callers use pnp_port_start() and similar functions without
making sure the resource is valid.  This patch makes us fall
back to returning the initial values if the resource is not
valid or not even present.

This mostly preserves the previous behavior, where we would just
return the initial values set by pnp_init_resource_table().  The
original 2.6.25 code didn't range-check the "bar", so it would
return garbage if the bar exceeded the table size.  This code
returns sensible values instead.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-07-16 23:27:05 +02:00
Rafael J. Wysocki
ebb12db51f Freezer: Introduce PF_FREEZER_NOSIG
The freezer currently attempts to distinguish kernel threads from
user space tasks by checking if their mm pointer is unset and it
does not send fake signals to kernel threads.  However, there are
kernel threads, mostly related to networking, that behave like
user space tasks and may want to be sent a fake signal to be frozen.

Introduce the new process flag PF_FREEZER_NOSIG that will be set
by default for all kernel threads and make the freezer only send
fake signals to the tasks having PF_FREEZER_NOSIG unset.  Provide
the set_freezable_with_signal() function to be called by the kernel
threads that want to be sent a fake signal for freezing.

This patch should not change the freezer's observable behavior.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-07-16 23:27:03 +02:00
Elias Oltmanns
3ef5eb424e IDE: Remove unused code
Remove some code which has been made obsolete and hasn't worked properly
before anyway.  Part of the infrastructure may be reintroduced in a
follow up patch to implement a working command aborting facility.

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
Cc: "Alan Cox" <alan@lxorguk.ukuu.org.uk>
Cc: "Randy Dunlap" <randy.dunlap@oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-16 20:33:48 +02:00
Elias Oltmanns
79e36a9f54 IDE: Fix HDIO_DRIVE_RESET handling
Currently, the code path executing an HDIO_DRIVE_RESET ioctl is broken
in various ways.  Most importantly, it is treated as an out of band
request in an illegal way which may very likely lead to system lock ups.
Use the drive's request queue to avoid this problem (and fix a locking
issue for free along the way).

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
Cc: "Alan Cox" <alan@lxorguk.ukuu.org.uk>
Cc: "Randy Dunlap" <randy.dunlap@oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-16 20:33:48 +02:00
Bartlomiej Zolnierkiewicz
e6d95bd149 ide: ->port_init_devs -> ->init_dev
Change ->port_init_devs method to take 'ide_drive_t *' as an argument
instead of 'ide_hwif_t *' and rename it to ->init_dev.

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-16 20:33:42 +02:00
Bartlomiej Zolnierkiewicz
c56c5648a3 ide: set hwif->dev in ide_init_port_hw() (take 2)
* Add 'parent' field to hw_regs_t for optional parent device pointer (needed
  by macio PMAC IDE controllers) and set hwif->dev in ide_init_port_hw().

* Update au1xxx-ide.c, sgiioc4.c, pmac.c and setup-pci.c accordingly.

v2:

* Update scc_pata.c.

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-16 20:33:40 +02:00
Bartlomiej Zolnierkiewicz
63b51c6d1d ide: make ide_hwifs[] static
Move ide_hwifs[] from ide.c to ide-probe.c and make it static.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-16 20:33:40 +02:00
Bartlomiej Zolnierkiewicz
9ad5409375 ide: move PIO blacklist to ide-pio-blacklist.c
Move PIO blacklist to ide-pio-blacklist.c.

While at it:

- fix comment

- fix whitespace damage

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-16 20:33:39 +02:00
Bartlomiej Zolnierkiewicz
3e153cfb5e ide: remove no longer used ide_pio_timings[]
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-16 20:33:39 +02:00
Bartlomiej Zolnierkiewicz
c9d6c1a237 ide: move ide_pio_cycle_time() to ide-timings.c
All ide_pio_cycle_time() users already select CONFIG_IDE_TIMINGS
so move the function from ide-lib.c to ide-timings.c.

While at it:

- convert ide_pio_cycle_time() to use ide_timing_find_mode()

- cleanup ide_pio_cycle_time() a bit

There should be no functional changes caused by this patch.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-16 20:33:39 +02:00
Bartlomiej Zolnierkiewicz
f06ab3402a ide: convert ide-timing.h to ide-timings.c library (take 2)
* Don't include ide-timing.h in cs5535 and sis5513 host drivers
  (they don't need it currently).

* Convert ide-timing.h to ide-timings.c library and add CONFIG_IDE_TIMINGS
  config option to be selected by host drivers using the library.

While at it:

- fix ide_timing_find_mode() placement

v2:
* Add missing EXPORT_SYMBOLs. (Stephen Rothwell <sfr@canb.auug.org.au>)

There should be no functional changes caused by this patch.

Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-16 20:33:37 +02:00
Bartlomiej Zolnierkiewicz
3be53f3f21 ide: move some bits from ide-timing.h to <linux/ide.h>
Move struct ide_timing and IDE_TIMING_* defines to <linux/ide.h>
from drivers/ide/ide-timing.h.

While at it:

- use u8/u16 instead of short for struct ide_timing fields

- use enum for IDE_TIMING_*

There should be no functional changes caused by this patch.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-16 20:33:36 +02:00
Linus Torvalds
45158894d4 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (249 commits)
  powerpc: Fix pte_update for CONFIG_PTE_64BIT and !PTE_ATOMIC_UPDATES
  powerpc: Fix a build problem on ppc32 with new DMA_ATTRs
  ibm_newemac: Add MII mode support to the EMAC RGMII bridge.
  powerpc: Don't spin on sync instruction at boot time
  powerpc: Add VSX load/store alignment exception handler
  powerpc: fix giveup_vsx to save registers correctly
  powerpc: support for latencytop
  powerpc: Remove unnecessary condition when sanity-checking WIMG bits
  powerpc: Add PPC_FEATURE_PSERIES_PERFMON_COMPAT
  powerpc: Add driver for Barrier Synchronization Register
  powerpc: mman.h export fixups
  powerpc/fsl: update crypto node definition and device tree instances
  powerpc/fsl: Refactor device bindings
  powerpc/85xx: Minor fixes for 85xxds and 8536ds board.
  powerpc: Add 82xx/83xx/86xx to 6xx Multiplatform
  powerpc/85xx: publish of device for cds platforms
  powerpc/booke: don't reinitialize time base
  powerpc/86xx: Refactor pic init
  powerpc/CPM: Add i2c pins to dts and board setup
  cpm_uart: Support uart_wait_until_sent()
  ...
2008-07-15 19:04:58 -07:00
Linus Torvalds
89a93f2f48 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (102 commits)
  [SCSI] scsi_dh: fix kconfig related build errors
  [SCSI] sym53c8xx: Fix bogus sym_que_entry re-implementation of container_of
  [SCSI] scsi_cmnd.h: remove double inclusion of linux/blkdev.h
  [SCSI] make struct scsi_{host,target}_type static
  [SCSI] fix locking in host use of blk_plug_device()
  [SCSI] zfcp: Cleanup external header file
  [SCSI] zfcp: Cleanup code in zfcp_erp.c
  [SCSI] zfcp: zfcp_fsf cleanup.
  [SCSI] zfcp: consolidate sysfs things into one file.
  [SCSI] zfcp: Cleanup of code in zfcp_aux.c
  [SCSI] zfcp: Cleanup of code in zfcp_scsi.c
  [SCSI] zfcp: Move status accessors from zfcp to SCSI include file.
  [SCSI] zfcp: Small QDIO cleanups
  [SCSI] zfcp: Adapter reopen for large number of unsolicited status
  [SCSI] zfcp: Fix error checking for ELS ADISC requests
  [SCSI] zfcp: wait until adapter is finished with ERP during auto-port
  [SCSI] ibmvfc: IBM Power Virtual Fibre Channel Adapter Client Driver
  [SCSI] sg: Add target reset support
  [SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC
  [SCSI] sd: Move scsi_disk() accessor function to sd.h
  ...
2008-07-15 18:58:04 -07:00
Benjamin Herrenschmidt
84c3d4aaec Merge commit 'origin/master'
Manual merge of:

	arch/powerpc/Kconfig
	arch/powerpc/kernel/stacktrace.c
	arch/powerpc/mm/slice.c
	arch/ppc/kernel/smp.c
2008-07-16 11:07:59 +10:00
Trond Myklebust
e89e896d31 Merge branch 'devel' into next
Conflicts:

	fs/nfs/file.c

Fix up the conflict with Jon Corbet's bkl-removal tree
2008-07-15 18:34:16 -04:00
Ingo Molnar
82638844d9 Merge branch 'linus' into cpus4096
Conflicts:

	arch/x86/xen/smp.c
	kernel/sched_rt.c
	net/iucv/iucv.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 00:29:07 +02:00
Chuck Lever
c2e1b09ff2 SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon
Introduce a new API to register RPC services on IPv6 interfaces to allow
the NFS server and lockd to advertise on IPv6 networks.

Unlike rpcb_register(), the new rpcb_v4_register() function uses rpcbind
protocol version 4 to contact the local rpcbind daemon.  The version 4
SET/UNSET procedures allow services to register address families besides
AF_INET, register at specific network interfaces, and register transport
protocols besides UDP and TCP.  All of this functionality is exposed via
the new rpcb_v4_register() kernel API.

A user-space rpcbind daemon implementation that supports version 4 of the
rpcbind protocol is required in order to make use of this new API.

Note that rpcbind version 3 is sufficient to support the new rpcbind
facilities listed above, but most extant implementations use version 4.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:08:55 -04:00
Ingo Molnar
1e09481365 Merge branch 'linus' into core/softlockup
Conflicts:

	kernel/softlockup.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15 23:12:58 +02:00
Linus Torvalds
59190f4213 Merge branch 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
  generic-ipi: more merge fallout
  generic-ipi: merge fix
  x86, visws: use mach-default/entry_arch.h
  x86, visws: fix generic-ipi build
  generic-ipi: fixlet
  generic-ipi: fix s390 build bug
  generic-ipi: fix linux-next tree build failure
  fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
  fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
  fix "smp_call_function: get rid of the unused nonatomic/retry argument"
  on_each_cpu(): kill unused 'retry' parameter
  smp_call_function: get rid of the unused nonatomic/retry argument
  sh: convert to generic helpers for IPI function calls
  parisc: convert to generic helpers for IPI function calls
  mips: convert to generic helpers for IPI function calls
  m32r: convert to generic helpers for IPI function calls
  arm: convert to generic helpers for IPI function calls
  alpha: convert to generic helpers for IPI function calls
  ia64: convert to generic helpers for IPI function calls
  powerpc: convert to generic helpers for IPI function calls
  ...

Fix trivial conflicts due to rcu updates in kernel/rcupdate.c manually
2008-07-15 14:12:03 -07:00
Chuck Lever
367c8c7bd9 lockd: Pass "struct sockaddr *" to new failover-by-IP function
Pass a more generic socket address type to nlmsvc_unlock_all_by_ip() to
allow for future support of IPv6.  Also provide additional sanity
checking in failover_unlock_ip() when constructing the server's IP
address.

As an added bonus, provide clean kerneldoc comments on related NLM
interfaces which were recently added.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-15 16:11:29 -04:00
Ingo Molnar
1a781a777b Merge branch 'generic-ipi' into generic-ipi-for-linus
Conflicts:

	arch/powerpc/Kconfig
	arch/s390/kernel/time.c
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/cpu/perfctr-watchdog.c
	arch/x86/kernel/i8259_64.c
	arch/x86/kernel/ldt.c
	arch/x86/kernel/nmi_64.c
	arch/x86/kernel/smpboot.c
	arch/x86/xen/smp.c
	include/asm-x86/hw_irq_32.h
	include/asm-x86/hw_irq_64.h
	include/asm-x86/mach-default/irq_vectors.h
	include/asm-x86/mach-voyager/irq_vectors.h
	include/asm-x86/smp.h
	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15 21:55:59 +02:00
Ingo Molnar
6c9fcaf2ee Merge branch 'core/rcu' into core/rcu-for-linus 2008-07-15 21:10:12 +02:00
Jeff Layton
6cde4de807 lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_lock
nlmsvc_lock calls nlmsvc_lookup_host to find a nlm_host struct. The
callers of this function, however, call nlmsvc_retrieve_args or
nlm4svc_retrieve_args, which also return a nlm_host struct.

Change nlmsvc_lock to take a host arg instead of calling
nlmsvc_lookup_host itself and change the callers to pass a pointer to
the nlm_host they've already found.

Since nlmsvc_testlock() now just uses the caller's reference, we no
longer need to get or release it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-15 14:53:33 -04:00
Jeff Layton
8f920d5e29 lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_testlock
nlmsvc_testlock calls nlmsvc_lookup_host to find a nlm_host struct. The
callers of this functions, however, call nlmsvc_retrieve_args or
nlm4svc_retrieve_args, which also return a nlm_host struct.

Change nlmsvc_testlock to take a host arg instead of calling
nlmsvc_lookup_host itself and change the callers to pass a pointer to
the nlm_host they've already found.

We take a reference to host in the place where nlmsvc_testlock()
previous did a new lookup, so the reference counting is unchanged from
before.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-15 14:26:52 -04:00
Linus Torvalds
b312bf359e Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  AHCI: Remove an unnecessary flush from ahci_qc_issue
  AHCI: speed up resume
  [libata] Add support for VPD page b1
  ata: endianness annotations in pata drivers
  libata-eh: update atapi_eh_request_sense() to take @dev instead of @qc
  [libata] sata_svw: update code comments relating to data corruption
  libata/ahci: enclosure management support
  libata: improve EH internal command timeout handling
  libata: use ULONG_MAX to terminate reset timeout table
  libata: improve EH retry delay handling
  libata: consistently use msecs for time durations
2008-07-15 11:18:10 -07:00
Linus Torvalds
dc221eae08 Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6: (56 commits)
  i2c: Add detection capability to new-style drivers
  i2c: Call client_unregister for new-style devices too
  i2c: Clean up old chip drivers
  i2c-ibm_iic: Register child nodes
  i2c: New-style EEPROM driver using device IDs
  i2c: Export the i2c_bus_type symbol
  i2c-au1550: Fix PM support
  i2c-dev: Delete empty detach_client callback
  i2c: Drop stray references to lm_sensors
  i2c: Check for ACPI resource conflicts
  i2c-ocores: basic PM support
  i2c-sibyte: SWARM I2C board initialization
  i2c-i801: Fix handling of error conditions
  i2c-i801: Rename local variable temp to status
  i2c-i801: Properly report bus arbitration loss
  i2c-i801: Remove verbose debugging messages
  i2c-algo-pcf: Drop unused struct members
  i2c-algo-pcf: Multi-master lost-arbitration improvement
  i2c: Deprecate the legacy gpio drivers
  i2c-pxa: Initialize early
  ...
2008-07-15 11:16:05 -07:00
Linus Torvalds
98339cbd36 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (80 commits)
  ide-floppy: fix unfortunate function naming
  ide-tape: unify idetape_create_read/write_cmd
  ide: add ide_pc_intr() helper
  ide-{floppy,scsi}: read Status Register before stopping DMA engine
  ide-scsi: add more debugging to idescsi_pc_intr()
  ide-scsi: use pc->callback
  ide-floppy: add more debugging to idefloppy_pc_intr()
  ide-tape: always log debug info in idetape_pc_intr() if debugging is enabled
  ide-tape: add ide_tape_io_buffers() helper
  ide-tape: factor out DSC handling from idetape_pc_intr()
  ide-{floppy,tape}: move checking of ->failed_pc to ->callback
  ide: add ide_issue_pc() helper
  ide: add PC_FLAG_DRQ_INTERRUPT pc flag
  ide-scsi: move idescsi_map_sg() call out from idescsi_issue_pc()
  ide: add ide_transfer_pc() helper
  ide-scsi: set drive->scsi flag for devices handled by the driver
  ide-{cd,floppy,tape}: remove checking for drive->scsi
  ide: add PC_FLAG_ZIP_DRIVE pc flag
  ide-tape: factor out waiting for good ireason from idetape_transfer_pc()
  ide-tape: set PC_FLAG_DMA_IN_PROGRESS flag in idetape_transfer_pc()
  ...
2008-07-15 11:15:36 -07:00
Bartlomiej Zolnierkiewicz
646c0cb6c4 ide: add ide_pc_intr() helper
* ide-tape.c: add 'drive' argument to idetape_update_buffers().

* Add generic ide_pc_intr() helper to ide-atapi.c and then
  convert ide-{floppy,tape,scsi} device drivers to use it.

* ide-tape.c: remove no longer needed DBG_PC_INTR.

There should be no functional changes caused by this patch
(unless the debugging is explicitely compiled in).

Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:22:03 +02:00
Bartlomiej Zolnierkiewicz
6bf1641ca1 ide: add ide_issue_pc() helper
Add generic ide_issue_pc() helper to ide-atapi.c and then
convert ide-{floppy,tape,scsi} device drivers to use it.

There should be no functional changes caused by this patch.

Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:22:00 +02:00
Bartlomiej Zolnierkiewicz
28c7214bd8 ide: add PC_FLAG_DRQ_INTERRUPT pc flag
Add PC_FLAG_DRQ_INTERRUPT pc flag, set it in ide*_do_request()
and check for it (instead of checking for IDE*_FLAG_DRQ_INTERRUPT)
in ide*_issue_pc().  This is a preparation for adding generic
ide_issue_pc() helper.

There should be no functional changes caused by this patch.

Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:59 +02:00
Bartlomiej Zolnierkiewicz
594c16d8dd ide: add ide_transfer_pc() helper
* Add ide-atapi.c file for generic ATAPI support together with
  CONFIG_IDE_ATAPI config option.

* Add generic ide_transfer_pc() helper to ide-atapi.c and then
  convert ide-{floppy,tape,scsi} device drivers to use it.

There should be no functional changes caused by this patch.

Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:58 +02:00
Bartlomiej Zolnierkiewicz
5d41893c0f ide: add PC_FLAG_ZIP_DRIVE pc flag
Add PC_FLAG_ZIP_DRIVE pc flag, set it in idefloppy_do_request()
and check for it (instead of checking for IDEFLOPPY_FLAG_ZIP_DRIVE)
in idefloppy_transfer_pc().  This is a preparation for adding
generic ide_transfer_pc() helper.

There should be no functional changes caused by this patch.

Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:57 +02:00
Bartlomiej Zolnierkiewicz
5e33109582 ide-{floppy,tape}: PC_FLAG_DMA_RECOMMENDED -> PC_FLAG_DMA_OK
* Use PC_FLAG_DMA_OK flag instead of PC_FLAG_DMA_RECOMMENDED one.

* Remove no longer used PC_FLAG_DMA_RECOMMENDED flag.

There should be no functional changes caused by this patch.

Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:56 +02:00
Bartlomiej Zolnierkiewicz
1b06e92aa0 ide-{floppy,tape}: merge pc->idefloppy_callback and pc->idetape_callback
Merge pc->idefloppy_callback and pc->idetape_callback into pc->callback.

There should be no functional changes caused by this patch.

Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:56 +02:00
Bartlomiej Zolnierkiewicz
92f5daff2b ide-tape: make pc->idetape_callback void
There should be no functional changes caused by this patch.

Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:55 +02:00
FUJITA Tomonori
63f5abb095 ide: remove action argument in ide_do_drive_cmd
ide_do_drive_cmd is called only with ide_preempt action argument. So
we can remove the action argument in ide_do_drive_cmd and ide_action_t
typedef.

This patch also includes two minor cleanups: 1) ide_do_drive_cmd
always succeeds so we don't need the return value; 2) the callers use
blk_rq_init before ide_do_drive_cmd so there is no need to initialize
rq->errors.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:51 +02:00
Bartlomiej Zolnierkiewicz
ff07488346 ide: remove drive->ctl
Remove drive->ctl (it is always equal to 0x08 after init time).

While at it:

* Use ATA_DEVCTL_OBS define.

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:50 +02:00
Bartlomiej Zolnierkiewicz
0fd04dcc2e ide: use ->OUTBSYNC in ide_set_irq()
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:50 +02:00
Bartlomiej Zolnierkiewicz
f8c4bd0ab2 ide: pass 'hwif *' instead of 'drive *' to ->OUTBSYNC method
There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:49 +02:00
Bartlomiej Zolnierkiewicz
1357214461 ide: remove ->mmio flag from ide_hwif_t
Since scc_pata host driver no longer uses IDE PCI layer / ide_dma_setup()
and all other ->mmio users set also IDE_HFLAG_MMIO host flag we can safely
remove ->mmio flag.

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:49 +02:00
Bartlomiej Zolnierkiewicz
ed4af48fd6 ide: move IRQ unmasking out from ->tf_load method
Move IRQ unmasking out from ->tf_load method to its users.

There should be no functional changes caused by this patch
(SELECT_MASK() is NOP except for hpt366, icside and sgiioc4).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:48 +02:00
Bartlomiej Zolnierkiewicz
9a410e79b5 ide: remove IDE_TFLAG_NO_SELECT_MASK taskfile flag
Always call SELECT_MASK(..., 0) in ide_tf_load() (needs to be done
to match ide_set_irq(..., 1)) and then remove IDE_TFLAG_NO_SELECT_MASK
taskfile flag.

This change should only affect hpt366 and icside host drivers since
->maskproc(..., 0) for sgiioc4 is equivalent to ide_set_irq(..., 1).

Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:48 +02:00
Bartlomiej Zolnierkiewicz
931ee0dc5c ide: remove obsoleted "ide=" kernel parameters
* Remove obsoleted "ide=" kernel parameters.

* Remove no longer needed:
  - ide_setup()
  - parse_options()
  - __setup("", ...)
  - module_param(options, ...)

* Use module_{init,exit}() for MODULE=y case and remove MODULE ifdef.

* Make ide_*acpi* and ide_doubler variables static.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:47 +02:00
Bartlomiej Zolnierkiewicz
30e5ee4d1a ide: remove obsoleted "idebus=" kernel parameter
* Remove obsoleted "idebus=" kernel parameter.

* Remove no longer needed ide_system_bus_speed() and system_bus_clock()
  (together with idebus_parameter and system_bus_speed variables).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:46 +02:00
FUJITA Tomonori
681a561b7e block: unexport blk_end_sync_rq
All the users of blk_end_sync_rq has gone (they are converted to use
blk_execute_rq). This unexports blk_end_sync_rq.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:45 +02:00
FUJITA Tomonori
124cafc5eb ide: remove ide_init_drive_cmd
ide_init_drive_cmd just calls blk_rq_init. This converts the users of
ide_init_drive_cmd to use blk_rq_init directly and removes
ide_init_drive_cmd.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Borislav Petkov <petkovbb@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-15 21:21:44 +02:00
Linus Torvalds
61d97f4fcf Merge branch 'genirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'genirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: remove extraneous checks in manage.c
  genirq: Expose default irq affinity mask (take 3)
2008-07-15 10:39:22 -07:00
Linus Torvalds
38c46578ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  [GFS2] Fix GFS2's use of do_div() in its quota calculations
  [GFS2] Remove unused declaration
  [GFS2] Remove support for unused and pointless flag
  [GFS2] Replace rgrp "recent list" with mru list
  [GFS2] Allow local DF locks when holding a cached EX glock
  [GFS2] Fix delayed demote race
  [GFS2] don't call permission()
  [GFS2] Fix module building
  [GFS2] Glock documentation
  [GFS2] Remove all_list from lock_dlm
  [GFS2] Remove obsolete conversion deadlock avoidance code
  [GFS2] Remove remote lock dropping code
  [GFS2] kernel panic mounting volume
  [GFS2] Revise readpage locking
  [GFS2] Fix ordering of args for list_add
  [GFS2] trivial sparse lock annotations
  [GFS2] No lock_nolock
  [GFS2] Fix ordering bug in lock_dlm
  [GFS2] Clean up the glock core
2008-07-15 10:38:46 -07:00
Linus Torvalds
e7849f16c1 Merge branch 'core/topology' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/topology' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cputopology: always define CPU topology information, clean up
  cpu topology: always define CPU topology information
2008-07-15 10:32:39 -07:00
Linus Torvalds
8d2567a620 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (61 commits)
  ext4: Documention update for new ordered mode and delayed allocation
  ext4: do not set extents feature from the kernel
  ext4: Don't allow nonextenst mount option for large filesystem
  ext4: Enable delalloc by default.
  ext4: delayed allocation i_blocks fix for stat
  ext4: fix delalloc i_disksize early update issue
  ext4: Handle page without buffers in ext4_*_writepage()
  ext4: Add ordered mode support for delalloc
  ext4: Invert lock ordering of page_lock and transaction start in delalloc
  mm: Add range_cont mode for writeback
  ext4: delayed allocation ENOSPC handling
  percpu_counter: new function percpu_counter_sum_and_set
  ext4: Add delayed allocation support in data=writeback mode
  vfs: add hooks for ext4's delayed allocation support
  jbd2: Remove data=ordered mode support using jbd buffer heads
  ext4: Use new framework for data=ordered mode in JBD2
  jbd2: Implement data=ordered mode handling via inodes
  vfs: export filemap_fdatawrite_range()
  ext4: Fix lock inversion in ext4_ext_truncate()
  ext4: Invert the locking order of page_lock and transaction start
  ...
2008-07-15 08:36:38 -07:00
Benzi Zbit
62a7573ee9 sdio: fix the use of hard coded timeout value.
This adds reading and using of enable_timeout from the CIS

Signed-off-by: Benzi Zbit <benzi.zbit@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-07-15 15:47:03 +02:00
Pierre Ossman
23af60398a mmc: remove multiwrite capability
Relax requirements on host controllers and only require that they do not
report a transfer count than is larger than the actual one (i.e. a lower
value is okay). This is how many other parts of the kernel behaves so
upper layers should already be prepared to handle that scenario. This
gives us a performance boost on MMC cards.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-07-15 14:14:49 +02:00
Tomas Winkler
6d37333163 mmc: fix sdio_io sparse errors
This patch fixes sdio_io sparse errors.
This fix changes signature of API functions,
changing
unsigned char -> u8
unsigned short -> u16
unsigned long -> u32 - this was probably a bug in 64 bit platforms

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-07-15 14:14:48 +02:00
Pierre Ossman
ad3868b2ec mmc,sdio: helper function for transfer padding
There are a lot of crappy controllers out there that cannot handle
all the request sizes that the MMC/SD/SDIO specifications require.
In case the card driver can pad the data to overcome the problems,
this commit adds a helper that calculates how much that padding
should be.

A corresponding helper is also added for SDIO, but it can also deal
with all the complexities of splitting up a large transfer efficiently.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-07-15 14:14:44 +02:00
Anton Vorontsov
08f80bb519 mmc: change .get_ro() callback semantics
Now get_ro() callback must return 0/1 values for its logical states, and
negative errno values in case of error. If particular host instance doesn't
support RO/WP switch, it should return -ENOSYS.

This patch changes some hosts in two ways:

1. Now functions should be smart to not return negative values in
   "RO asserted" case (particularly gpio_ calls could return negative
   values for the outermost GPIOs).

   Also, board code usually passes get_ro() callbacks that directly return
   gpioreg & bit result, so at91_mci, imxmmc, pxamci and mmc_spi's get_ro()
   handlers need take special care when returning platform's values to the
   mmc core.

2. In case of host instance didn't implement get_ro() callback, it should
   really return -ENOSYS and let the mmc core decide what to do about it
   (mmc core thinks the same way as the hosts, so it isn't functional
   change).

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-07-15 14:14:41 +02:00
Anton Vorontsov
619ef4b421 mmc_spi: add support for card-detection polling
This patch adds new platform data variable "caps", so platforms
could pass theirs capabilities into MMC core (for example, platforms
without interrupt on the CD line will most probably want to pass
MMC_CAP_NEEDS_POLL).

New platform get_cd() callback provided to optimize polling.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-07-15 14:14:41 +02:00
Anton Vorontsov
28f52482b4 mmc: add support for card-detection polling
Some hosts (and boards that use mmc_spi) do not use interrupts on the CD
line, so they can't trigger mmc_detect_change. We want to poll the card
and see if there was a change. 1 second poll interval seems resonable.

This patch also implements .get_cd() host operation, that could be used
by the hosts that are able to report card-detect status without need to
talk MMC.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-07-15 14:14:41 +02:00
Adrian Bunk
150a55683b include/linux/mmc/mmc.h: remove CVS tags
This patch removes a CVS tag that wasn't updated for a long time.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-07-15 14:14:41 +02:00
Pierre Ossman
4489428ab5 sdhci: support JMicron secondary interface
JMicron chips sometimes have two interfaces to work around limitations
in Microsoft's sdhci driver. This patch allows us to use either interface.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-07-15 14:14:40 +02:00
David S. Miller
e308a5d806 netdev: Add netdev->addr_list_lock protection.
Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:13:44 -07:00
David S. Miller
f1f28aa351 netdev: Add addr_list_lock to struct net_device.
This will be used to protect the per-device unicast and multicast
address lists, as well as the callbacks into the drivers which
configure such state such as ->set_rx_mode() and ->set_multicast_list().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:08:33 -07:00
Ron Livne
521e575b9a IB/mlx4: Add support for blocking multicast loopback packets
Add support for handling the IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK
flag by using the per-multicast group loopback blocking feature of
mlx4 hardware.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Patrick McHardy
393e52e33c packet: deliver VLAN TCI to userspace
Store the VLAN tag in the auxillary data/tpacket2_hdr so userspace can
properly deal with hardware VLAN tagging/stripping.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:50:39 -07:00
Patrick McHardy
bbd6ef87c5 packet: support extensible, 64 bit clean mmaped ring structure
The tpacket_hdr is not 64 bit clean due to use of an unsigned long
and can't be extended because the following struct sockaddr_ll needs
to be at a fixed offset.

Add support for a version 2 tpacket protocol that removes these
limitations.

Userspace can query the header size through a new getsockopt option
and change the protocol version through a setsockopt option. The
changes needed to switch to the new protocol version are:

1. replace struct tpacket_hdr by struct tpacket2_hdr
2. query header len and save
3. set protocol version to 2
 - set up ring as usual
4. for getting the sockaddr_ll, use (void *)hdr + TPACKET_ALIGN(hdrlen)
   instead of (void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))

Steps 2 and 4 can be omitted if the struct sockaddr_ll isn't needed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:50:15 -07:00
Patrick McHardy
bc1d0411b8 vlan: deliver packets received with VLAN acceleration to network taps
When VLAN header stripping is used, packets currently bypass packet
sockets (and other network taps) completely. For locally existing
VLANs, they appear directly on the VLAN device, for unknown VLANs
they are silently dropped.

Add a new function netif_nit_deliver() to deliver incoming packets
to all network interface taps and use it in __vlan_hwaccel_rx() to
make VLAN packets visible on the underlying device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:49:30 -07:00
Patrick McHardy
6aa895b047 vlan: Don't store VLAN tag in cb
Use a real skb member to store the skb to avoid clashes with qdiscs,
which are allowed to use the cb area themselves. As currently only real
devices that consume the skb set the NETIF_F_HW_VLAN_TX flag, no explicit
invalidation is neccessary.

The new member fills a hole on 64 bit, the skb layout changes from:

        __u32                      mark;                 /*   172     4 */
        sk_buff_data_t             transport_header;     /*   176     4 */
        sk_buff_data_t             network_header;       /*   180     4 */
        sk_buff_data_t             mac_header;           /*   184     4 */
        sk_buff_data_t             tail;                 /*   188     4 */
        /* --- cacheline 3 boundary (192 bytes) --- */
        sk_buff_data_t             end;                  /*   192     4 */

        /* XXX 4 bytes hole, try to pack */

to

        __u32                      mark;                 /*   172     4 */
        __u16                      vlan_tci;             /*   176     2 */

        /* XXX 2 bytes hole, try to pack */

        sk_buff_data_t             transport_header;     /*   180     4 */
        sk_buff_data_t             network_header;       /*   184     4 */

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:49:06 -07:00
Benjamin Herrenschmidt
43d2548bb2 Merge commit '85082fd7cbe3173198aac0eb5e85ab1edcc6352c' into test-build
Manual fixup of:

	arch/powerpc/Kconfig
2008-07-15 15:44:51 +10:00
David S. Miller
925068dcdc Merge branch 'davem-next' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2008-07-14 22:30:17 -07:00
Max Krasnyansky
f271b2cc78 tun: Fix/rewrite packet filtering logic
Please see the following thread to get some context on this
	http://marc.info/?l=linux-netdev&m=121564433018903&w=2

Basically the issue is that current multi-cast filtering stuff in
the TUN/TAP driver is seriously broken.
Original patch went in without proper review and ACK. It was broken and
confusing to start with and subsequent patches broke it completely.
To give you an idea of what's broken here are some of the issues:

- Very confusing comments throughout the code that imply that the
character device is a network interface in its own right, and that packets
are passed between the two nics. Which is completely wrong.

- Wrong set of ioctls is used for setting up filters. They look like
shortcuts for manipulating state of the tun/tap network interface but
in reality manipulate the state of the TX filter.

- ioctls that were originally used for setting address of the the TX filter
got "fixed" and now set the address of the network interface itself. Which
made filter totaly useless.

- Filtering is done too late. Instead of filtering early on, to avoid
unnecessary wakeups, filtering is done in the read() call.

The list goes on and on :)

So the patch cleans all that up. It introduces simple and clean interface for
setting up TX filters (TUNSETTXFILTER + tun_filter spec) and does filtering
before enqueuing the packets.

TX filtering is useful in the scenarios where TAP is part of a bridge, in
which case it gets all broadcast, multicast and potentially other packets when
the bridge is learning. So for example Ethernet tunnelling app may want to
setup TX filters to avoid tunnelling multicast traffic. QEMU and other
hypervisors can push RX filtering that is currently done in the guest into the
host context therefore saving wakeups and unnecessary data transfer.

Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:18:19 -07:00
David S. Miller
fc943b12e4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2008-07-14 20:40:34 -07:00
Patrick McHardy
72d9794f44 net-sched: cls_flow: add perturbation support
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 20:36:32 -07:00
David S. Miller
0c4c8cae44 Merge branch 'master' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6 2008-07-14 20:32:07 -07:00
David S. Miller
2aec609fb4 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	net/netfilter/nf_conntrack_proto_tcp.c
2008-07-14 20:23:54 -07:00
Linus Torvalds
5a86102248 Merge branch 'for-2.6.27' of git://git.infradead.org/users/dwmw2/firmware-2.6
* 'for-2.6.27' of git://git.infradead.org/users/dwmw2/firmware-2.6: (64 commits)
  firmware: convert sb16_csp driver to use firmware loader exclusively
  dsp56k: use request_firmware
  edgeport-ti: use request_firmware()
  edgeport: use request_firmware()
  vicam: use request_firmware()
  dabusb: use request_firmware()
  cpia2: use request_firmware()
  ip2: use request_firmware()
  firmware: convert Ambassador ATM driver to request_firmware()
  whiteheat: use request_firmware()
  ti_usb_3410_5052: use request_firmware()
  emi62: use request_firmware()
  emi26: use request_firmware()
  keyspan_pda: use request_firmware()
  keyspan: use request_firmware()
  ttusb-budget: use request_firmware()
  kaweth: use request_firmware()
  smctr: use request_firmware()
  firmware: convert ymfpci driver to use firmware loader exclusively
  firmware: convert maestro3 driver to use firmware loader exclusively
  ...

Fix up trivial conflicts with BKL removal in drivers/char/dsp56k.c and
drivers/char/ip2/ip2main.c manually.
2008-07-14 16:54:07 -07:00
Linus Torvalds
85082fd7cb Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (241 commits)
  [ARM] 5171/1: ep93xx: fix compilation of modules using clocks
  [ARM] 5133/2: at91sam9g20 defconfig file
  [ARM] 5130/4: Support for the at91sam9g20
  [ARM] 5160/1: IOP3XX: gpio/gpiolib support
  [ARM] at91: Fix NAND FLASH timings for at91sam9x evaluation kits.
  [ARM] 5084/1: zylonite: Register AC97 device
  [ARM] 5085/2: PXA: Move AC97 over to the new central device declaration model
  [ARM] 5120/1: pxa: correct platform driver names for PXA25x and PXA27x UDC drivers
  [ARM] 5147/1: pxaficp_ir: drop pxa_gpio_mode calls, as pin setting
  [ARM] 5145/1: PXA2xx: provide api to control IrDA pins state
  [ARM] 5144/1: pxaficp_ir: cleanup includes
  [ARM] pxa: remove pxa_set_cken()
  [ARM] pxa: allow clk aliases
  [ARM] Feroceon: don't disable BPU on boot
  [ARM] Orion: LED support for HP mv2120
  [ARM] Orion: add RD88F5181L-FXO support
  [ARM] Orion: add RD88F5181L-GE support
  [ARM] Orion: add Netgear WNR854T support
  [ARM] s3c2410_defconfig: update for current build
  [ARM] Acer n30: Minor style and indentation fixes.
  ...
2008-07-14 16:06:58 -07:00
David Woodhouse
751851af7a Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Conflicts:

	sound/pci/Kconfig
2008-07-14 15:51:11 -07:00
Russell King
53ffe3b440 [ARM] Merge most of the PXA work for initial merge
This includes PXA work up to the SPI changes for the initial merge,
since e172274ccc depends on the SPI
tree being merged.

Conflicts:

	arch/arm/configs/em_x270_defconfig
	arch/arm/configs/xm_x270_defconfig
2008-07-14 23:34:46 +01:00
Linus Torvalds
666484f025 Merge branch 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  softirq: remove irqs_disabled warning from local_bh_enable
  softirq: remove initialization of static per-cpu variable
  Remove argument from open_softirq which is always NULL
2008-07-14 15:28:42 -07:00
Linus Torvalds
4bb0057f99 Merge branch 'core/printk' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/printk' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, generic: mark early_printk as asmlinkage
  printk: export console_drivers
  printk: remember the message level for multi-line output
  printk: refactor processing of line severity tokens
  printk: don't prefer unsuited consoles on registration
  printk: clean up recursion check related static variables
  namespacecheck: more kernel/printk.c fixes
  namespacecheck: fix kernel printk.c
2008-07-14 15:27:43 -07:00
Linus Torvalds
40e7babbb5 Merge branch 'core/locking' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/locking' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: fix kernel/fork.c warning
  lockdep: fix ftrace irq tracing false positive
  lockdep: remove duplicate definition of STATIC_LOCKDEP_MAP_INIT
  lockdep: add lock_class information to lock_chain and output it
  lockdep: add lock_class information to lock_chain and output it
  lockdep: output lock_class key instead of address for forward dependency output
  __mutex_lock_common: use signal_pending_state()
  mutex-debug: check mutex magic before owner

Fixed up conflict in kernel/fork.c manually
2008-07-14 14:55:13 -07:00
Linus Torvalds
948769a5ba Merge branch 'sched/new-API-sched_setscheduler' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched/new-API-sched_setscheduler' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: add new API sched_setscheduler_nocheck: add a flag to control access checks
2008-07-14 14:50:49 -07:00
Linus Torvalds
e18425a0ab Merge branch 'tracing/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (228 commits)
  ftrace: build fix for ftraced_suspend
  ftrace: separate out the function enabled variable
  ftrace: add ftrace_kill_atomic
  ftrace: use current CPU for function startup
  ftrace: start wakeup tracing after setting function tracer
  ftrace: check proper config for preempt type
  ftrace: trace schedule
  ftrace: define function trace nop
  ftrace: move sched_switch enable after markers
  ftrace: prevent ftrace modifications while being kprobe'd, v2
  fix "ftrace: store mcount address in rec->ip"
  mmiotrace broken in linux-next (8-bit writes only)
  ftrace: avoid modifying kprobe'd records
  ftrace: freeze kprobe'd records
  kprobes: enable clean usage of get_kprobe
  ftrace: store mcount address in rec->ip
  ftrace: build fix with gcc 4.3
  namespacecheck: fixes
  ftrace: fix "notrace" filtering priority
  ftrace: fix printout
  ...
2008-07-14 14:49:54 -07:00
Linus Torvalds
d1794f2c5b Merge branch 'bkl-removal' of git://git.lwn.net/linux-2.6
* 'bkl-removal' of git://git.lwn.net/linux-2.6: (146 commits)
  IB/umad: BKL is not needed for ib_umad_open()
  IB/uverbs: BKL is not needed for ib_uverbs_open()
  bf561-coreb: BKL unneeded for open()
  Call fasync() functions without the BKL
  snd/PCM: fasync BKL pushdown
  ipmi: fasync BKL pushdown
  ecryptfs: fasync BKL pushdown
  Bluetooth VHCI: fasync BKL pushdown
  tty_io: fasync BKL pushdown
  tun: fasync BKL pushdown
  i2o: fasync BKL pushdown
  mpt: fasync BKL pushdown
  Remove BKL from remote_llseek v2
  Make FAT users happier by not deadlocking
  x86-mce: BKL pushdown
  vmwatchdog: BKL pushdown
  vmcp: BKL pushdown
  via-pmu: BKL pushdown
  uml-random: BKL pushdown
  uml-mmapper: BKL pushdown
  ...
2008-07-14 14:48:31 -07:00
Stephen Rothwell
c300bd2fb5 PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
drivers/pci/pci.c needs pm_wakeup.h since it uses device_set_wakup_capable().
The latter also needs to be stubbed out for !CONFIG_PM.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-07-14 14:30:21 -07:00
Jonathan Corbet
2fceef397f Merge commit 'v2.6.26' into bkl-removal 2008-07-14 15:29:34 -06:00
Joel Becker
11c3b79218 configfs: Allow ->make_item() and ->make_group() to return detailed errors.
The configfs operations ->make_item() and ->make_group() currently
return a new item/group.  A return of NULL signifies an error.  Because
of this, -ENOMEM is the only return code bubbled up the stack.

Multiple folks have requested the ability to return specific error codes
when these operations fail.  This patch adds that ability by changing the
->make_item/group() ops to return an int.

Also updated are the in-kernel users of configfs.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-07-14 13:57:16 -07:00
Linus Torvalds
17489c058e Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (76 commits)
  sched_clock: and multiplier for TSC to gtod drift
  sched_clock: record TSC after gtod
  sched_clock: only update deltas with local reads.
  sched_clock: fix calculation of other CPU
  sched_clock: stop maximum check on NO HZ
  sched_clock: widen the max and min time
  sched_clock: record from last tick
  sched: fix accounting in task delay accounting & migration
  sched: add avg-overlap support to RT tasks
  sched: terminate newidle balancing once at least one task has moved over
  sched: fix warning
  sched: build fix
  sched: sched_clock_cpu() based cpu_clock(), lockdep fix
  sched: export cpu_clock
  sched: make sched_{rt,fair}.c ifdefs more readable
  sched: bias effective_load() error towards failing wake_affine().
  sched: incremental effective_load()
  sched: correct wakeup weight calculations
  sched: fix mult overflow
  sched: update shares on wakeup
  ...
2008-07-14 13:54:49 -07:00
Linus Torvalds
a3da5bf84a Merge branch 'x86/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (821 commits)
  x86: make 64bit hpet_set_mapping to use ioremap too, v2
  x86: get x86_phys_bits early
  x86: max_low_pfn_mapped fix #4
  x86: change _node_to_cpumask_ptr to return const ptr
  x86: I/O APIC: remove an IRQ2-mask hack
  x86: fix numaq_tsc_disable calling
  x86, e820: remove end_user_pfn
  x86: max_low_pfn_mapped fix, #3
  x86: max_low_pfn_mapped fix, #2
  x86: max_low_pfn_mapped fix, #1
  x86_64: fix delayed signals
  x86: remove conflicting nx6325 and nx6125 quirks
  x86: Recover timer_ack lost in the merge of the NMI watchdog
  x86: I/O APIC: Never configure IRQ2
  x86: L-APIC: Always fully configure IRQ0
  x86: L-APIC: Set IRQ0 as edge-triggered
  x86: merge dwarf2 headers
  x86: use AS_CFI instead of UNWIND_INFO
  x86: use ignore macro instead of hash comment
  x86: use matching CFI_ENDPROC
  ...
2008-07-14 13:43:24 -07:00
Linus Torvalds
3b23e665b6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (50 commits)
  crypto: ixp4xx - Select CRYPTO_AUTHENC
  crypto: s390 - Respect STFL bit
  crypto: talitos - Add support for sha256 and md5 variants
  crypto: hash - Move ahash functions into crypto/hash.h
  crypto: crc32c - Add ahash implementation
  crypto: hash - Added scatter list walking helper
  crypto: prng - Deterministic CPRNG
  crypto: hash - Removed vestigial ahash fields
  crypto: hash - Fixed digest size check
  crypto: rmd - sparse annotations
  crypto: rmd128 - sparse annotations
  crypto: camellia - Use kernel-provided bitops, unaligned access helpers
  crypto: talitos - Use proper form for algorithm driver names
  crypto: talitos - Add support for 3des
  crypto: padlock - Make module loading quieter when hardware isn't available
  crypto: tcrpyt - Remove unnecessary kmap/kunmap calls
  crypto: ixp4xx - Hardware crypto support for IXP4xx CPUs
  crypto: talitos - Freescale integrated security engine (SEC) driver
  [CRYPTO] tcrypt: Add self test for des3_ebe cipher operating in cbc mode
  [CRYPTO] rmd: Use pointer form of endian swapping operations
  ...
2008-07-14 13:40:42 -07:00
Jean Delvare
4735c98f84 i2c: Add detection capability to new-style drivers
Add a mechanism to let new-style i2c drivers optionally autodetect
devices they would support on selected buses and ask i2c-core to
instantiate them. This is a replacement for legacy i2c drivers, much
cleaner.

Where drivers had to implement both a legacy i2c_driver and a
new-style i2c_driver so far, this mechanism makes it possible to get
rid of the legacy i2c_driver and implement both enumerated and
detected device support with just one (new-style) i2c_driver.

Here is a quick conversion guide for these drivers, step by step:

* Delete the legacy driver definition, registration and removal.
  Delete the attach_adapter and detach_client methods of the legacy
  driver.

* Change the prototype of the legacy detect function from
    static int foo_detect(struct i2c_adapter *adapter, int address, int kind);
  to
    static int foo_detect(struct i2c_client *client, int kind,
    			  struct i2c_board_info *info);

* Set the new-style driver detect callback to this new function, and
  set its address_data to &addr_data (addr_data is generally provided
  by I2C_CLIENT_INSMOD.)

* Add the appropriate class to the new-style driver. This is
  typically the class the legacy attach_adapter method was checking
  for. Class checking is now mandatory (done by i2c-core.) See
  <linux/i2c.h> for the list of available classes.

* Remove the i2c_client allocation and freeing from the detect
  function. A pre-allocated client is now handed to you by i2c-core,
  and is freed automatically.

* Make the detect function fill the type field of the i2c_board_info
  structure it was passed as a parameter, and return 0, on success. If
  the detection fails, return -ENODEV.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-07-14 22:38:36 +02:00
Wolfram Sang
2b7a5056a0 i2c: New-style EEPROM driver using device IDs
Add a new-style driver for most I2C EEPROMs, giving sysfs read/write
access to their data. Tested with various chips and clock rates.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-07-14 22:38:35 +02:00
Jon Smirl
e9ca9eb9d7 i2c: Export the i2c_bus_type symbol
Export the root of the i2c bus so that PowerPC device tree code can
iterate over devices on the i2c bus.

Signed-off-by: Jon Smirl <jonsmirl@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-07-14 22:38:35 +02:00
Jean Delvare
f6a7110520 i2c-dev: Delete empty detach_client callback
Implementing detach_client is optional, so there is no point in
an empty implementation.

Likewise, i2c driver IDs are optional, and we don't need one.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-07-14 22:38:34 +02:00
Jean Delvare
e3e7fc3c40 i2c-algo-pcf: Drop unused struct members
Struct members udelay and timeout aren't used anywhere, so drop them.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Eric Brower <ebrower@gmail.com>
2008-07-14 22:38:31 +02:00
Eric Brower
0573d11b2b i2c-algo-pcf: Multi-master lost-arbitration improvement
Improve lost-arbitration handling of PCF8584.  This is necessary for
support of a currently out-of-kernel driver for Sun Microsystems E250
environmental management; perhaps others.

Signed-off-by: Eric Brower <ebrower@gmail.com>
Acked-by: Dan Smolik <marvin@mydatex.cz>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-07-14 22:38:31 +02:00
Jean Delvare
3401b2fff3 i2c: Let bus drivers add SPD to their class
Let general purpose I2C/SMBus bus drivers add SPD to their class. Once
this is done, we will be able to tell the eeprom driver to only probe
for SPD EEPROMs and similar on these buses.

Note that I took a conservative approach here, adding I2C_CLASS_SPD to
many drivers that have no idea whether they can host SPD EEPROMs or not.
This is to make sure that the eeprom driver doesn't stop probing buses
where SPD EEPROMs or equivalent live.

So, bus driver maintainers and users should feel free to remove the SPD
class from drivers those buses never have SPD EEPROMs or they don't
want the eeprom driver to bind to them. Likewise, feel free to add the
SPD class to any bus driver I might have missed.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-07-14 22:38:29 +02:00
Jean Delvare
c1b6b4f234 i2c: Let framebuffer drivers set their I2C bus class to DDC
Let framebuffer drivers set their I2C bus class to DDC. Once this is
done, we will be able to tell the eeprom driver to only probe for
EDID EEPROMs on these buses.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-07-14 22:38:28 +02:00
Jean Delvare
ae7193f7fa i2c: Update stray references to smbus_access
That function is actually named i2c_smbus_xfer.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-07-14 22:38:24 +02:00
Jean Delvare
67c2e66571 i2c: Delete unused function i2c_smbus_write_quick
Function i2c_smbus_write_quick has no users left, so we can delete it.

Also update the list of these helper functions which are gone but
could be added back if needed.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-07-14 22:38:23 +02:00
Adrian Bunk
20a9b6e7c3 i2c: Remove 3 deprecated bus drivers
This patch contains the scheduled removal of i2c-i810, i2c-prosavage
and i2c-savage4.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-07-14 22:38:22 +02:00
Linus Torvalds
847106ff62 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (25 commits)
  security: remove register_security hook
  security: remove dummy module fix
  security: remove dummy module
  security: remove unused sb_get_mnt_opts hook
  LSM/SELinux: show LSM mount options in /proc/mounts
  SELinux: allow fstype unknown to policy to use xattrs if present
  security: fix return of void-valued expressions
  SELinux: use do_each_thread as a proper do/while block
  SELinux: remove unused and shadowed addrlen variable
  SELinux: more user friendly unknown handling printk
  selinux: change handling of invalid classes (Was: Re: 2.6.26-rc5-mm1 selinux whine)
  SELinux: drop load_mutex in security_load_policy
  SELinux: fix off by 1 reference of class_to_string in context_struct_compute_av
  SELinux: open code sidtab lock
  SELinux: open code load_mutex
  SELinux: open code policy_rwlock
  selinux: fix endianness bug in network node address handling
  selinux: simplify ioctl checking
  SELinux: enable processes with mac_admin to get the raw inode contexts
  Security: split proc ptrace checking into read vs. attach
  ...
2008-07-14 13:36:55 -07:00
Linus Torvalds
b7f80afa28 Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (71 commits)
  [S390] sclp_tty: Fix scheduling while atomic bug.
  [S390] sclp_tty: remove ioctl interface.
  [S390] Remove P390 support.
  [S390] Cleanup vmcp printk messages.
  [S390] Cleanup lcs printk messages.
  [S390] Cleanup kprobes printk messages.
  [S390] Cleanup vmwatch printk messages.
  [S390] Cleanup dcssblk printk messages.
  [S390] Cleanup zfcp dumper printk messages.
  [S390] Cleanup vmlogrdr printk messages.
  [S390] Cleanup s390 debug feature print messages.
  [S390] Cleanup monreader printk messages.
  [S390] Cleanup appldata printk messages.
  [S390] Cleanup smsgiucv printk messages.
  [S390] Cleanup cpacf printk messages.
  [S390] Cleanup qeth print messages.
  [S390] Cleanup netiucv printk messages.
  [S390] Cleanup iucv printk messages.
  [S390] Cleanup sclp printk messages.
  [S390] Cleanup zcrypt printk messages.
  ...
2008-07-14 13:25:01 -07:00
Linus Torvalds
dddec01eb8 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (37 commits)
  splice: fix generic_file_splice_read() race with page invalidation
  ramfs: enable splice write
  drivers/block/pktcdvd.c: avoid useless memset
  cdrom: revert commit 22a9189 (cdrom: use kmalloced buffers instead of buffers on stack)
  scsi: sr avoids useless buffer allocation
  block: blk_rq_map_kern uses the bounce buffers for stack buffers
  block: add blk_queue_update_dma_pad
  DAC960: push down BKL
  pktcdvd: push BKL down into driver
  paride: push ioctl down into driver
  block: use get_unaligned_* helpers
  block: extend queue_flag bitops
  block: request_module(): use format string
  Add bvec_merge_data to handle stacked devices and ->merge_bvec()
  block: integrity flags can't use bit ops on unsigned short
  cmdfilter: extend default read filter
  sg: fix odd style (extra parenthesis) introduced by cmd filter patch
  block: add bounce support to blk_rq_map_user_iov
  cfq-iosched: get rid of enable_idle being unused warning
  allow userspace to modify scsi command filter on per device basis
  ...
2008-07-14 13:15:14 -07:00
Kristen Carlson Accardi
18f7ba4c2f libata/ahci: enclosure management support
Add Enclosure Management support to libata and ahci.

Signed-off-by:  Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-14 15:59:33 -04:00
Tejun Heo
87fbc5a060 libata: improve EH internal command timeout handling
ATA_TMOUT_INTERNAL which was 30secs were used for all internal
commands which is way too long when something goes wrong.  This patch
implements command type based stepped timeouts.  Different command
types can use different timeouts and each command type can use
different timeout values after timeouts.

ie. the initial timeout is set to a value which should cover most of
the cases but not too long so that run away cases don't delay things
too much.  After the first try times out, the second try can use
longer timeout and if that one times out too, it can go for full 30sec
timeout.

IDENTIFYs use 5s - 10s - 30s timeout and all other commands use 5s -
10s timeouts.

This patch significantly cuts down the needed time to handle failure
cases while still allowing libata to work with nut job devices through
retries.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-14 15:59:32 -04:00
Tejun Heo
0a2c0f5615 libata: improve EH retry delay handling
EH retries were delayed by 5 seconds to ensure that resets don't occur
back-to-back.  However, this 5 second delay is superflous or excessive
in many cases.  For example, after IDENTIFY times out, there's no
reason to wait five more seconds before retrying.

This patch adds ehc->last_reset timestamp and record the timestamp for
the last reset trial or success and uses it to space resets by
ATA_EH_RESET_COOL_DOWN which is 5 secs and removes unconditional 5 sec
sleeps.

As this change makes inter-try waits often shorter and they're
redundant in nature, this patch also removes the "retrying..."
messages.

While at it, convert explicit rounding up division to DIV_ROUND_UP().

This change speeds up EH in many cases w/o sacrificing robustness.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-14 15:59:32 -04:00
Tejun Heo
341c2c958e libata: consistently use msecs for time durations
libata has been using mix of jiffies and msecs for time druations.
This is getting confusing.  As writing sub HZ values in jiffies is
PITA and msecs_to_jiffies() can't be used as initializer, unify unit
for all time durations to msecs.  So, durations are in msecs and
deadlines are in jiffies.  ata_deadline() is added to compute deadline
from a start time and duration in msecs.

While at it, drop now superflous _msec suffix from arguments and
rename @timeout to @deadline if it represents a fixed point in time
rather than duration.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-14 15:59:32 -04:00
Michael Buesch
9c0c7a429a ssb: Include dma-mapping.h
ssb.h implements DMA mapping functions, so it should
include dma-mapping.h. This fixes compile failures on certain architectures.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-14 14:52:56 -04:00
Artem Bityutskiy
4ee6afd344 VFS: export sync_sb_inodes
This patch exports the 'sync_sb_inodes()' which is needed for
UBIFS because it has to force write-back from time to time.
Namely, the UBIFS budgeting subsystem forces write-back when
its pessimistic callculations show that there is no free
space on the media.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-14 19:10:52 +03:00
Ingo Molnar
5806b81ac1 Merge branch 'auto-ftrace-next' into tracing/for-linus
Conflicts:

	arch/x86/kernel/entry_32.S
	arch/x86/kernel/process_32.c
	arch/x86/kernel/process_64.c
	arch/x86/lib/Makefile
	include/asm-x86/irqflags.h
	kernel/Makefile
	kernel/sched.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 16:11:52 +02:00
Ingo Molnar
d14c8a680c Merge branch 'sched/for-linus' into tracing/for-linus 2008-07-14 16:11:02 +02:00
Ingo Molnar
6712e299b7 Merge branch 'tracing/ftrace' into auto-ftrace-next 2008-07-14 15:58:35 +02:00
Kumar Gala
2f3804edf9 powerpc/85xx: Add support for MPC8536DS
Add support for the MPC8536 process and MPC8536DS reference board.  The
MPC8536 is an e500v2 based SoC which eTSEC, USB, SATA, PCI, and PCIe.

The USB and SATA IP blocks are similiar to those on the PQ2 Pro SoCs and
thus use the same drivers.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-07-14 07:55:37 -05:00
Ingo Molnar
361833efac Merge branch 'sched/clock' into sched/devel 2008-07-14 12:19:13 +02:00
Ingo Molnar
b4ba0ba24b Merge commit 'v2.6.26' into core/locking 2008-07-14 10:31:59 +02:00
Cornelia Huck
f08adc008d [S390] css: Use css_device_id for bus matching.
css_device_id exists, so use it for determining the right driver
(and add a match_flags which is always 1 for valid types).

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2008-07-14 10:02:12 +02:00
Cornelia Huck
7e9db9eaef [S390] cio: Introduce modalias for css bus.
Add modalias and subchannel type attributes for all subchannels.
I/O subchannel specific attributes are now created in
io_subchannel_probe(). modalias and subchannel type are also
added to the uevent for the css bus. Also make the css modalias
known.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2008-07-14 10:02:05 +02:00
James Morris
6f0f0fd496 security: remove register_security hook
The register security hook is no longer required, as the capability
module is always registered.  LSMs wishing to stack capability as
a secondary module should do so explicitly.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-14 15:04:06 +10:00
Miklos Szeredi
b478a9f988 security: remove unused sb_get_mnt_opts hook
The sb_get_mnt_opts() hook is unused, and is superseded by the
sb_show_options() hook.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: James Morris <jmorris@namei.org>
2008-07-14 15:02:05 +10:00
Eric Paris
2069f45784 LSM/SELinux: show LSM mount options in /proc/mounts
This patch causes SELinux mount options to show up in /proc/mounts.  As
with other code in the area seq_put errors are ignored.  Other LSM's
will not have their mount options displayed until they fill in their own
security_sb_show_options() function.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: James Morris <jmorris@namei.org>
2008-07-14 15:02:05 +10:00
Stephen Smalley
006ebb40d3 Security: split proc ptrace checking into read vs. attach
Enable security modules to distinguish reading of process state via
proc from full ptrace access by renaming ptrace_may_attach to
ptrace_may_access and adding a mode argument indicating whether only
read access or full attach access is requested.  This allows security
modules to permit access to reading process state without granting
full ptrace access.  The base DAC/capability checking remains unchanged.

Read access to /proc/pid/mem continues to apply a full ptrace attach
check since check_mem_permission() already requires the current task
to already be ptracing the target.  The other ptrace checks within
proc for elements like environ, maps, and fds are changed to pass the
read mode instead of attach.

In the SELinux case, we model such reading of process state as a
reading of a proc file labeled with the target process' label.  This
enables SELinux policy to permit such reading of process state without
permitting control or manipulation of the target process, as there are
a number of cases where programs probe for such information via proc
but do not need to be able to control the target (e.g. procps,
lsof, PolicyKit, ConsoleKit).  At present we have to choose between
allowing full ptrace in policy (more permissive than required/desired)
or breaking functionality (or in some cases just silencing the denials
via dontaudit rules but this can hide genuine attacks).

This version of the patch incorporates comments from Casey Schaufler
(change/replace existing ptrace_may_attach interface, pass access
mode), and Chris Wright (provide greater consistency in the checking).

Note that like their predecessors __ptrace_may_attach and
ptrace_may_attach, the __ptrace_may_access and ptrace_may_access
interfaces use different return value conventions from each other (0
or -errno vs. 1 or 0).  I retained this difference to avoid any
changes to the caller logic but made the difference clearer by
changing the latter interface to return a bool rather than an int and
by adding a comment about it to ptrace.h for any future callers.

Signed-off-by:  Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-07-14 15:01:47 +10:00
Benjamin Herrenschmidt
11c2d8174e Merge commit 'origin/HEAD' into test-merge
Manual fixup of include/asm-powerpc/pgtable-ppc64.h
2008-07-14 14:29:49 +10:00
Richard Kennedy
afc1246f91 file lock: reorder struct file_lock to save space on 64 bit builds
Reduce sizeof struct file_lock by 8 on 64 bit builds allowing +1 objects
per slab in the file_lock_cache

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-13 17:22:24 -04:00
Russell King
044e5f45e4 Merge branch 'pxa' into devel
Conflicts:

	arch/arm/configs/em_x270_defconfig
	arch/arm/configs/xm_x270_defconfig
2008-07-13 12:05:49 +01:00
Gerrit Renker
5b5d0e7048 dccp: Upgrade NDP count from 3 to 6 bytes
RFC 4340, 7.7 specifies up to 6 bytes for the NDP Count option, whereas the code
is currently limited to up to 3 bytes. This seems to be a relict of an earlier 
draft version and is brought up to date by the patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2008-07-13 11:51:40 +01:00
Ingo Molnar
54ef76f37b Merge branch 'linus' into sched/devel 2008-07-13 08:50:13 +02:00
Eric Miao
52256c0e06 [NET] smc91x: prepare SMC_USE_PXA_DMA to be specified in platform data
Now that the original SMC_USE_PXA_DMA specific code will always being
built if CONFIG_ARCH_PXA is defined, so to make this part of the code
to be PXA public, and still prevent it from being built if support of
PXA is not selected.

A SMC91X_USE_DMA flag is added to the platform data to allow platform
to choose its usage of DMA. Note this flag itself is so named to be
generic enough (assuming other platforms can also use DMA).

It keeps backward compatibility to set the SMC91X_USE_DMA flag if
SMC_USE_PXA_DMA is still defined.

Signed-off-by: Eric Miao <eric.miao@marvell.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-07-12 21:52:41 +01:00
Eric Miao
159198862a [NET] smc91x: prepare for SMC_IO_SHIFT to be a platform configurable variable
Now one can use the following code

  #define SMC_IO_SHIFT	lp->io_shift

to make SMC_IO_SHIFT a variable. This, however, will slightly increase
the CPU overhead and have negative impact on the network performance.
The tradeoff is, this can be specified in the smc91x platform data so
that multiple boards support can be built in a single zImage.

Signed-off-by: Eric Miao <eric.miao@marvell.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-07-12 21:52:40 +01:00
Eric Miao
c4f0e76747 [NET] smc91x: add SMC91X_NOWAIT flag to platform data
And also favors the usage of SMC91X_NOWAIT over the hardcoded SMC_NOWAIT
by converting "nowait" (module parameter overridable) to platform flag.

There are several possibilities:

  1. platform data present - preferred and use as is
  2. platform data absent  - use "nowait", it can be:
       a. SMC_NOWAIT if defined
       b. default to 0 if SMC_NOWAIT isn't defined
       c. overriden by module parameter

Signed-off-by: Eric Miao <eric.miao@marvell.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-07-12 21:52:40 +01:00
Eric Miao
d280eadc4f [NET] smc91x: remove "irq_flags" from "struct smc91x_platdata"
IRQ trigger type can be specified in the IRQ resource definition by
IORESOURCE_IRQ_*, we need only one way to specify this.

This also fixes the following small issue:

To allow dynamic support for multiple platforms, when those relevant
macros are not defined for one specific platform, the default case
will be:

	- SMC_DYNAMIC_BUS_CONFIG defined
	- and SMC_IRQ_FLAGS = IRQF_TRIGGER_RISING

While if "irq_flags" is missing when defining the smc91x_platdata,
usually as follows:

  static struct smc91x_platdata xxxx_smc91x_data = {
	.flags	= SMC91X_USE_XXBIT,
  };

The lp->cfg.irq_flags will always be overriden by the above structure
(due to a memcpy), thus rendering lp->cfg.irq_flags to be "0" always.
(regardless of the default SMC_IRQ_FLAGS or IORESOURCE_IRQ_* flags)

Fixes this by forcing to use IORESOURCE_IRQ_* flags if present, and
make the only user of smc91x_platdata.irq_flags (renesas/migor) to
use IORESOURCE_IRQ_*.

Signed-off-by: Eric Miao <eric.miao@marvell.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-07-12 21:52:40 +01:00
Russell King
7fecc34e07 Merge branch 'pxa-tosa' into pxa
Conflicts:

	arch/arm/mach-pxa/Kconfig
	arch/arm/mach-pxa/tosa.c
	arch/arm/mach-pxa/spitz.c
2008-07-12 21:43:01 +01:00
Martin K. Petersen
f11f594edb [SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC
The SCSI Block Protocol uses this 16-bit CRC to verify the integrity
of each data sector.  crc_t10dif() is used by sd_dif.c when performing
I/O to or from disks formatted with protection information.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:32 -05:00
Suresh Siddha
75c46fa61b x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure
MSI and MSI-X support for interrupt remapping infrastructure.

MSI address register will be programmed with interrupt-remapping table
entry(IRTE) index and the IRTE will contain information about the vector,
cpu destination, etc.

For MSI-X, all the IRTE's will be consecutively allocated in the table,
and the address registers will contain the starting index to the block
and the data register will contain the subindex with in that block.

This also introduces a new irq_chip for cleaner irq migration (in the process
context as opposed to the current irq migration in the context of an interrupt.
interrupt-remapping infrastructure will help us achieve this).

As MSI is edge triggered, irq migration is a simple atomic update(of vector
and cpu destination) of IRTE and flushing the hardware cache.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 08:45:05 +02:00
Suresh Siddha
89027d35aa x64, x2apic/intr-remap: IO-APIC support for interrupt-remapping
IO-APIC support in the presence of interrupt-remapping infrastructure.

IO-APIC RTE will be programmed with interrupt-remapping table entry(IRTE)
index and the IRTE will contain information about the vector, cpu destination,
trigger mode etc, which traditionally was present in the IO-APIC RTE.

Introduce a new irq_chip for cleaner irq migration (in the process
context as opposed to the current irq migration in the context of an interrupt.
interrupt-remapping infrastructure will help us achieve this cleanly).

For edge triggered, irq migration is a simple atomic update(of vector
and cpu destination) of IRTE and flush the hardware cache.

For level triggered, we need to modify the io-apic RTE aswell with the update
vector information, along with modifying IRTE with vector and cpu destination.
So irq migration for level triggered is little  bit more complex compared to
edge triggered migration. But the good news is, we use the same algorithm
for level triggered migration as we have today, only difference being,
we now initiate the irq migration from process context instead of the
interrupt context.

In future, when we do a directed EOI (combined with cpu EOI broadcast
suppression) to the IO-APIC, level triggered irq migration will also be
as simple as edge triggered migration and we can do the irq migration
with a simple atomic update to IO-APIC RTE.

TBD: some tests/changes needed in the presence of fixup_irqs() for
level triggered irq migration.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 08:45:05 +02:00
Suresh Siddha
72b1e22dfc x64, x2apic/intr-remap: generic irq migration support from process context
Generic infrastructure for migrating the irq from the process context in the
presence of CONFIG_GENERIC_PENDING_IRQ.

This will be used later for migrating irq in the presence of
interrupt-remapping.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 08:44:55 +02:00
Suresh Siddha
b6fcb33ad6 x64, x2apic/intr-remap: routines managing Interrupt remapping table entries.
Routines handling the management of interrupt remapping table entries.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 08:44:54 +02:00
Suresh Siddha
2ae2101069 x64, x2apic/intr-remap: Interrupt remapping infrastructure
Interrupt remapping (part of Intel Virtualization Tech for directed I/O)
infrastructure.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 08:44:53 +02:00
Suresh Siddha
ad3ad3f6a2 x64, x2apic/intr-remap: parse ioapic scope under vt-d structures
Parse the vt-d device scope structures to find the mapping between IO-APICs
and the interrupt remapping hardware units.

This will be used later for enabling Interrupt-remapping for IOAPIC devices.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 08:44:50 +02:00
Suresh Siddha
1886e8a90a x64, x2apic/intr-remap: code re-structuring, to be used by both DMA and Interrupt remapping
Allocate the iommu during the parse of DMA remapping hardware
definition structures. And also, introduce routines for device
scope initialization which will be explicitly called during
dma-remapping initialization.

These will be used for enabling interrupt remapping separately from the
existing DMA-remapping enabling sequence.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 08:44:48 +02:00
Ingo Molnar
ae94b8075a Merge branch 'linus' into x86/core
Conflicts:

	arch/x86/mm/ioremap.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 07:29:02 +02:00
Aneesh Kumar K.V
06d6cf6959 mm: Add range_cont mode for writeback
Filesystems like ext4 needs to start a new transaction in
the writepages for block allocation. This happens with delayed
allocation and there is limit to how many credits we can request
from the journal layer. So we call write_cache_pages multiple
times with wbc->nr_to_write set to the maximum possible value
limitted by the max journal credits available.

Add a new mode to writeback that enables us to handle this
behaviour. In the new mode we update the wbc->range_start
to point to the new offset to be written. Next call to
call to write_cache_pages will start writeout from specified
range_start offset. In the new mode we also limit writing
to the specified wbc->range_end.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 19:27:31 -04:00
Mingming Cao
e8ced39d5e percpu_counter: new function percpu_counter_sum_and_set
Delayed allocation need to check free blocks at every write time.
percpu_counter_read_positive() is not quit accurate. delayed
allocation need a more accurate accounting, but using
percpu_counter_sum_positive() is frequently is quite expensive.

This patch added a new function to update center counter when sum
per-cpu counter, to increase the accurate rate for next
percpu_counter_read() and require less calling expensive
percpu_counter_sum().

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 19:27:31 -04:00
Alex Tomas
29a814d2ee vfs: add hooks for ext4's delayed allocation support
Export mpage_bio_submit() and __mpage_writepage() for the benefit of
ext4's delayed allocation support.   Also change __block_write_full_page
so that if buffers that have the BH_Delay flag set it will call
get_block() to get the physical block allocated, just as in the
!BH_Mapped case.

Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 19:27:31 -04:00
Jan Kara
87c89c232c jbd2: Remove data=ordered mode support using jbd buffer heads
Signed-off-by: Jan Kara <jack@suse.cz>
2008-07-11 19:27:31 -04:00
Jan Kara
c851ed5401 jbd2: Implement data=ordered mode handling via inodes
This patch adds necessary framework into JBD2 to be able to track inodes
with each transaction and write-out their dirty data during transaction
commit time.

This new ordered mode brings all sorts of advantages such as possibility 
to get rid of journal heads and buffer heads for data buffers in ordered 
mode, better ordering of writes on transaction commit, simplification of
 some JBD code, no more anonymous pages when truncate of data being 
committed happens.  Also with this new ordered mode, delayed allocation 
on ordered mode is much simpler.

Signed-off-by: Jan Kara <jack@suse.cz>
2008-07-11 19:27:31 -04:00
Jan Kara
f4c0a0fdfa vfs: export filemap_fdatawrite_range()
Make filemap_fdatawrite_range() function public, so that it can later
be used in ordered mode rewrite by JBD/JBD2.

Signed-off-by: Jan Kara <jack@suse.cz>
2008-07-11 19:27:31 -04:00
Theodore Ts'o
736603ab29 jbd2: Add commit time into the commit block
Carlo Wood has demonstrated that it's possible to recover deleted
files from the journal.  Something that will make this easier is if we
can put the time of the commit into commit block.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 19:27:31 -04:00
Steven Rostedt
af52a90a14 sched_clock: stop maximum check on NO HZ
Working with ftrace I would get large jumps of 11 millisecs or more with
the clock tracer. This killed the latencing timings of ftrace and also
caused the irqoff self tests to fail.

What was happening is with NO_HZ the idle would stop the jiffy counter and
before the jiffy counter was updated the sched_clock would have a bad
delta jiffies to compare with the gtod with the maximum.

The jiffies would stop and the last sched_tick would record the last gtod.
On wakeup, the sched clock update would compare the gtod + delta jiffies
(which would be zero) and compare it to the TSC. The TSC would have
correctly (with a stable TSC) moved forward several jiffies. But because the
jiffies has not been updated yet the clock would be prevented from moving
forward because it would appear that the TSC jumped too far ahead.

The clock would then virtually stop, until the jiffies are updated. Then
the next sched clock update would see that the clock was very much behind
since the delta jiffies is now correct. This would then jump the clock
forward by several jiffies.

This caused ftrace to report several milliseconds of interrupts off
latency at every resume from NO_HZ idle.

This patch adds hooks into the nohz code to disable the checking of the
maximum clock update when nohz is in effect. It resumes the max check
when nohz has updated the jiffies again.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:53:26 +02:00
Steven Rostedt
a2bb6a3d85 ftrace: add ftrace_kill_atomic
It has been suggested that I add a way to disable the function tracer
on an oops. This code adds a ftrace_kill_atomic. It is not meant to be
used in normal situations. It will disable the ftrace tracer, but will
not perform the nice shutdown that requires scheduling.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 15:49:21 +02:00
David Woodhouse
a8931ef380 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-07-11 14:36:25 +01:00
Andre Noll
7e93a89251 md: Remove some unused macros.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11 22:02:23 +10:00
Andre Noll
0f420358e3 md: Turn rdev->sb_offset into a sector-based quantity.
Rename it to sb_start to make sure all users have been converted.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-11 22:02:23 +10:00
Neil Brown
0306d5efbf Merge branch 'master' into for-next 2008-07-11 21:57:40 +10:00
Ingo Molnar
0c81b2a144 Merge branch 'linus' into core/rcu
Conflicts:

	include/linux/rculist.h
	kernel/rcupreempt.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-11 10:46:50 +02:00
Alexander Duyck
d815653404 net: add netif_napi_del function to allow for removal of napistructs
Adds netif_napi_del function which is used to remove the napi struct from
the netdev napi_list in cases where CONFIG_NETPOLL was enabled.
The motivation for adding this is to handle the case in which the number of
queues on a device changes due to a configuration change.  Previously the
napi structs for each queue would be left in the list until the netdev was
freed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-11 01:20:33 -04:00
Linus Torvalds
e5a5816f78 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
  tun: Persistent devices can get stuck in xoff state
  xfrm: Add a XFRM_STATE_AF_UNSPEC flag to xfrm_usersa_info
  ipv6: missed namespace context in ipv6_rthdr_rcv
  netlabel: netlink_unicast calls kfree_skb on error path by itself
  ipv4: fib_trie: Fix lookup error return
  tcp: correct kcalloc usage
  ip: sysctl documentation cleanup
  Documentation: clarify tcp_{r,w}mem sysctl docs
  netfilter: nf_nat_snmp_basic: fix a range check in NAT for SNMP
  netfilter: nf_conntrack_tcp: fix endless loop
  libertas: fix memory alignment problems on the blackfin
  zd1211rw: stop beacons on remove_interface
  rt2x00: Disable synchronization during initialization
  rc80211_pid: Fix fast_start parameter handling
  sctp: Add documentation for sctp sysctl variable
  ipv6: fix race between ipv6_del_addr and DAD timer
  irda: Fix netlink error path return value
  irda: New device ID for nsc-ircc
  irda: via-ircc proper dma freeing
  sctp: Mark the tsn as received after all allocations finish
  ...
2008-07-10 17:58:47 -07:00
Steffen Klassert
ccf9b3b83d xfrm: Add a XFRM_STATE_AF_UNSPEC flag to xfrm_usersa_info
Add a XFRM_STATE_AF_UNSPEC flag to handle the AF_UNSPEC behavior for
the selector family. Userspace applications can set this flag to leave
the selector family of the xfrm_state unspecified.  This can be used
to to handle inter family tunnels if the selector is not set from
userspace.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-10 16:55:37 -07:00
David Woodhouse
f1485f3deb ihex: request_ihex_firmware() function to load and validate firmware
Provide a helper to load the file and validate it in one call, to
simplify error handling in the drivers which are going to use it.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2008-07-10 14:47:38 +01:00
David Woodhouse
bacfe09dd7 ihex.h: binary representation of ihex records
Some devices need their firmware as a set of {address, len, data...}
records in some specific order rather than a simple blob.

The normal way of doing this kind of thing is 'ihex', which is a text
format and not entirely suitable for use in the kernel.

This provides a binary representation which is very similar, but much
more compact -- and a helper routine to skip to the next record,
because the alignment constraints mean that everybody will screw it up
for themselves otherwise.

Also a helper function which can verify that a 'struct firmware'
contains a valid set of ihex records, and that following them won't run
off the end of the loaded data.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2008-07-10 14:47:36 +01:00
David Woodhouse
5658c76944 firmware: allow firmware files to be built into kernel image
Some drivers have their own hacks to bypass the kernel's firmware loader
and build their firmware into the kernel; this renders those unnecessary.

Other drivers don't use the firmware loader at all, because they always
want the firmware to be available. This allows them to start using the
firmware loader.

A third set of drivers already use the firmware loader, but can't be
used without help from userspace, which sometimes requires an initrd.
This allows them to work in a static kernel.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2008-07-10 14:30:13 +01:00
David Woodhouse
b7a39bd0af firmware: make fw->data const
In preparation for supporting firmware files linked into the static
kernel, make fw->data const to ensure that users aren't modifying it (so
that we can pass a pointer to the original in-kernel copy, rather than
having to copy it).

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2008-07-10 14:29:25 +01:00
Herbert Xu
18e33e6d5c crypto: hash - Move ahash functions into crypto/hash.h
All new crypto interfaces should go into individual files as much
as possible in order to ensure that crypto.h does not collapse under
its own weight.

This patch moves the ahash code into crypto/hash.h and crypto/internal/hash.h
respectively.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-07-10 20:35:18 +08:00
Herbert Xu
166247f46a crypto: hash - Removed vestigial ahash fields
The base field in ahash_tfm appears to have been cut-n-pasted from
ablkcipher.  It isn't needed here at all.  Similarly, the info field
in ahash_request also appears to have originated from its cipher
counter-part and is vestigial.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-07-10 20:35:18 +08:00
Loc Ho
004a403c2e [CRYPTO] hash: Add asynchronous hash support
This patch adds asynchronous hash and digest support.

Signed-off-by: Loc Ho <lho@amcc.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-07-10 20:35:13 +08:00
Ingo Molnar
5373fdbdc1 Merge branch 'tracing/mmiotrace' into auto-ftrace-next 2008-07-10 11:43:06 +02:00
Ingo Molnar
bac0c9103b Merge branch 'tracing/ftrace' into auto-ftrace-next 2008-07-10 11:43:00 +02:00
Ingo Molnar
9e4144abf8 Merge branch 'linus' into core/printk
Conflicts:

	kernel/printk.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-10 08:17:14 +02:00
Russell King
f974a8ec96 Merge branch 'machtypes' into pxa-palm 2008-07-09 21:34:25 +01:00
Chuck Lever
0e0cab744b NFS: use documenting macro constants for initializing ac{reg, dir}{min, max}
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:40 -04:00
Trond Myklebust
259875efed NFS: set transport defaults after mount option parsing is finished
Move the UDP/TCP default timeo/retrans settings for text mounts to
nfs_init_timeout_values(), which was were they were always being
initialised (and sanity checked) for binary mounts.
Document the default timeout values using appropriate #defines.

Ensure that we initialise and sanity check the transport protocols that
may have been specified by the user.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:38 -04:00
Chuck Lever
ce3b7e1906 NFS: Add string length argument to nfs_parse_server_address
To make nfs_parse_server_address() more generally useful, allow it to
accept input strings that are not terminated with '\0'.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:28 -04:00
Trond Myklebust
e468bae97d NFS: Allow redirtying of a completed unstable write.
Currently, if an unstable write completes, we cannot redirty the page in
order to reflect a new change in the page data until after we've sent a
COMMIT request.

This patch allows a page rewrite to proceed without the unnecessary COMMIT
step, putting it immediately back onto the dirty page list, undoing the
VM unstable write accounting, and removing the NFS_PAGE_TAG_COMMIT tag from
the NFS radix tree.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:24 -04:00
Chuck Lever
34e8f92831 NFS: Move fs/nfs/iostat.h to include/linux
The fs/nfs/iostat.h header has definitions that were designed to be exposed
to user space.  Move these definitions under include/linux so user space can
use the definitions in applications that read /proc/self/mountstats.

Also address a handful of coding style issues called out by checkpatch.pl in
fs/nfs/iostat.h.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:17 -04:00
Trond Myklebust
46cb650c22 NFS: Remove the redundant file_open entry from struct nfs_rpc_ops
All instances are set to nfs_open(), so we should just remove the redundant
indirection. Ditto for the file_release op

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:16 -04:00
\\\"J. Bruce Fields\\\
a486aeda9b rpc: minor cleanup of scheduler callback code
Try to make the comment here a little more clear and concise.

Also, this macro definition seems unnecessary.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:14 -04:00
Olga Kornievskaia
b6b6152c46 rpc: bring back cl_chatty
The cl_chatty flag alows us to control whether a given rpc client leaves

	"server X not responding, timed out"

messages in the syslog.  Such messages make sense for ordinary nfs
clients (where an unresponsive server means applications on the
mountpoint are probably hanging), but not for the callback client (which
can fail more commonly, with the only result just of disabling some
optimizations).

Previously cl_chatty was removed, do to lack of users; reinstate it, and
use it for the nfsd's callback client.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:10 -04:00
Trond Myklebust
2116271a34 NFS: Add correct bounds checking to NFSv2 locks
NFSv2 file locking currently fails the Connectathon tests, because the
calls to the VFS locking code do not return an EINVAL error if the
struct file_lock overflows the 32-bit boundaries.

The problem is due to the fact that we occasionally call helpers from
fs/locks.c in order to avoid RPC calls to the server when we know that a
local process holds the lock. These helpers are, of course, always
64-bit enabled, so EINVAL is not returned in cases when it would if
the call had gone to the NLM code.

For consistency, we therefore add support for a bounds-checking helper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:40 -04:00
Ingo Molnar
d028203c04 Merge branch 'x86/core' into x86/unify-pci 2008-07-09 11:39:02 +02:00
Dave Kleikamp
aba46c5027 powerpc/mm: Define flags for Strong Access Ordering
This patch defines:

- PROT_SAO, which is passed into mmap() and mprotect() in the prot field
- VM_SAO in vma->vm_flags, and
- _PAGE_SAO, the combination of WIMG bits in the pte that enables strong
access ordering for the page.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-09 16:30:45 +10:00
Dave Kleikamp
b845f313d7 mm: Allow architectures to define additional protection bits
This patch allows architectures to define functions to deal with
additional protections bits for mmap() and mprotect().

arch_calc_vm_prot_bits() maps additonal protection bits to vm_flags
arch_vm_get_page_prot() maps additional vm_flags to the vma's vm_page_prot
arch_validate_prot() checks for valid values of the protection bits

Note: vm_get_page_prot() is now pretty ugly, but the generated code
should be identical for architectures that don't define additional
protection bits.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-09 16:30:45 +10:00
David S. Miller
79d16385c7 netdev: Move atomic queue state bits into netdev_queue.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:14:46 -07:00
David S. Miller
b19fa1fa91 net: Delete NETDEVICES_MULTIQUEUE kconfig option.
Multiple TX queue support is a core networking feature.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:14:24 -07:00
David S. Miller
c773e847ea netdev: Move _xmit_lock and xmit_lock_owner into netdev_queue.
Accesses are mostly structured such that when there are multiple TX
queues the code transformations will be a little bit simpler.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:13:53 -07:00
David S. Miller
86d804e10a netdev: Make netif_schedule() routines work with netdev_queue objects.
Only plain netif_schedule() remains taking a net_device, mostly as a
compatability item while we transition the rest of these interfaces.

Everything else calls netif_schedule_queue() or __netif_schedule(),
both of which take a netdev_queue pointer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:11:25 -07:00
David S. Miller
970565bbad netdev: Move gso_skb into netdev_queue.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:10:33 -07:00
David S. Miller
ee609cb362 netdev: Move next_sched into struct netdev_queue.
We schedule queues, not the device, for output queue processing in BH.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 22:58:37 -07:00
David S. Miller
816f3258e7 netdev: Kill qdisc_ingress, use netdev->rx_queue.qdisc instead.
Now that our qdisc management is bi-directional, per-queue, and fully
orthogonal, there is no reason to have a special ingress qdisc pointer
in struct net_device.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 22:49:00 -07:00
David S. Miller
b0e1e6462d netdev: Move rest of qdisc state into struct netdev_queue
Now qdisc, qdisc_sleeping, and qdisc_list also live there.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 17:42:10 -07:00
David S. Miller
555353cfa1 netdev: The ingress_lock member is no longer needed.
Every qdisc is assosciated with a queue, and in the case of ingress
qdiscs that will now be netdev->rx_queue so using that queue's lock is
the thing to do.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 17:33:13 -07:00
David S. Miller
dc2b48475a netdev: Move queue_lock into struct netdev_queue.
The lock is now an attribute of the device queue.

One thing to notice is that "suspicious" places
emerge which will need specific training about
multiple queue handling.  They are so marked with
explicit "netdev->rx_queue" and "netdev->tx_queue"
references.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 17:18:23 -07:00
David S. Miller
bb949fbd18 netdev: Create netdev_queue abstraction.
A netdev_queue is an entity managed by a qdisc.

Currently there is one RX and one TX queue, and a netdev_queue merely
contains a backpointer to the net_device.

The Qdisc struct is augmented with a netdev_queue pointer as well.

Eventually the 'dev' Qdisc member will go away and we will have the
resulting hierarchy:

	net_device --> netdev_queue --> Qdisc

Also, qdisc_alloc() and qdisc_create_dflt() now take a netdev_queue
pointer argument.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 16:55:56 -07:00
David S. Miller
54dceb008f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2008-07-08 15:39:41 -07:00
Patrick McHardy
11a100f844 vlan: avoid header copying and linearisation where possible
- vlan_dev_reorder_header() is only called on the receive path after
  calling skb_share_check(). This means we can use skb_cow() since
  all we need is a writable header.

- vlan_dev_hard_header() includes a work-around for some apparently
  broken out of tree MPLS code. The hard_header functions can expect
  to always have a headroom of at least there own hard_header_len
  available, so the reallocation check is unnecessary.

- __vlan_put_tag() can use skb_cow_head() to avoid the skb_unshare()
  copy when the header is writable.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 15:36:57 -07:00
Haavard Skinnemoen
3bfb1d20b5 dmaengine: Driver for the Synopsys DesignWare DMA controller
This adds a driver for the Synopsys DesignWare DMA controller (aka
DMACA on AVR32 systems.) This DMA controller can be found integrated
on the AT32AP7000 chip and is primarily meant for peripheral DMA
transfer, but can also be used for memory-to-memory transfers.

This patch is based on a driver from David Brownell which was based on
an older version of the DMA Engine framework. It also implements the
proposed extensions to the DMA Engine API for slave DMA operations.

The dmatest client shows no problems, but there may still be room for
improvement performance-wise. DMA slave transfer performance is
definitely "good enough"; reading 100 MiB from an SD card running at ~20
MHz yields ~7.2 MiB/s average transfer rate.

Full documentation for this controller can be found in the Synopsys
DW AHB DMAC Databook:

http://www.synopsys.com/designware/docs/iip/DW_ahb_dmac/latest/doc/dw_ahb_dmac_db.pdf

The controller has lots of implementation options, so it's usually a
good idea to check the data sheet of the chip it's intergrated on as
well. The AT32AP7000 data sheet can be found here:

http://www.atmel.com/dyn/products/datasheets.asp?family_id=682


Changes since v4:
  * Use client_count instead of dma_chan_is_in_use()
  * Add missing include
  * Unmap buffers unless client told us not to

Changes since v3:
  * Update to latest DMA engine and DMA slave APIs
  * Embed the hw descriptor into the sw descriptor
  * Clean up and update MODULE_DESCRIPTION, copyright date, etc.

Changes since v2:
  * Dequeue all pending transfers in terminate_all()
  * Rename dw_dmac.h -> dw_dmac_regs.h
  * Define and use controller-specific dma_slave data
  * Fix up a few outdated comments
  * Define hardware registers as structs (doesn't generate better
    code, unfortunately, but it looks nicer.)
  * Get number of channels from platform_data instead of hardcoding it
    based on CONFIG_WHATEVER_CPU.
  * Give slave clients exclusive access to the channel

Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>,
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-08 11:59:42 -07:00
Haavard Skinnemoen
dc0ee6435c dmaengine: Add slave DMA interface
This patch adds the necessary interfaces to the DMA Engine framework
to use functionality found on most embedded DMA controllers: DMA from
and to I/O registers with hardware handshaking.

In this context, hardware hanshaking means that the peripheral that
owns the I/O registers in question is able to tell the DMA controller
when more data is available for reading, or when there is room for
more data to be written. This usually happens internally on the chip,
but these signals may also be exported outside the chip for things
like IDE DMA, etc.

A new struct dma_slave is introduced. This contains information that
the DMA engine driver needs to set up slave transfers to and from a
slave device. Most engines supporting DMA slave transfers will want to
extend this structure with controller-specific parameters.  This
additional information is usually passed from the platform/board code
through the client driver.

A "slave" pointer is added to the dma_client struct. This must point
to a valid dma_slave structure iff the DMA_SLAVE capability is
requested.  The DMA engine driver may use this information in its
device_alloc_chan_resources hook to configure the DMA controller for
slave transfers from and to the given slave device.

A new operation for preparing slave DMA transfers is added to struct
dma_device. This takes a scatterlist and returns a single descriptor
representing the whole transfer.

Another new operation for terminating all pending transfers is added as
well. The latter is needed because there may be errors outside the scope
of the DMA Engine framework that may require DMA operations to be
terminated prematurely.

DMA Engine drivers may extend the dma_device, dma_chan and/or
dma_slave_descriptor structures to allow controller-specific
operations. The client driver can detect such extensions by looking at
the DMA Engine's struct device, or it can request a specific DMA
Engine device by setting the dma_dev field in struct dma_slave.

dmaslave interface changes since v4:
  * Fix checkpatch errors
  * Fix changelog (there are no slave descriptors anymore)

dmaslave interface changes since v3:
  * Use dma_data_direction instead of a new enum
  * Submit slave transfers as scatterlists
  * Remove the DMA slave descriptor struct

dmaslave interface changes since v2:
  * Add a dma_dev field to struct dma_slave. If set, the client can
    only be bound to the DMA controller that corresponds to this
    device.  This allows controller-specific extensions of the
    dma_slave structure; if the device matches, the controller may
    safely assume its extensions are present.
  * Move reg_width into struct dma_slave as there are currently no
    users that need to be able to set the width on a per-transfer
    basis.

dmaslave interface changes since v1:
  * Drop the set_direction and set_width descriptor hooks. Pass the
    direction and width to the prep function instead.
  * Declare a dma_slave struct with fixed information about a slave,
    i.e. register addresses, handshake interfaces and such.
  * Add pointer to a dma_slave struct to dma_client. Can be NULL if
    the DMA_SLAVE capability isn't requested.
  * Drop the set_slave device hook since the alloc_chan_resources hook
    now has enough information to set up the channel for slave
    transfers.

Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-08 11:59:35 -07:00
Dan Williams
e1d181efb1 dmaengine: add DMA_COMPL_SKIP_{SRC,DEST}_UNMAP flags to control dma unmap
In some cases client code may need the dma-driver to skip the unmap of source
and/or destination buffers.  Setting these flags indicates to the driver to
skip the unmap step.  In this regard async_xor is currently broken in that it
allows the destination buffer to be unmapped while an operation is still in
progress, i.e. when the number of sources exceeds the hardware channel's
maximum (fixed in a subsequent patch).

Acked-by: Saeed Bishara <saeed@marvell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-08 11:59:12 -07:00
Haavard Skinnemoen
848c536a37 dmaengine: Add dma_client parameter to device_alloc_chan_resources
A DMA controller capable of doing slave transfers may need to know a
few things about the slave when preparing the channel. We don't want
to add this information to struct dma_channel since the channel hasn't
yet been bound to a client at this point.

Instead, pass a reference to the client requesting the channel to the
driver's device_alloc_chan_resources hook so that it can pick the
necessary information from the dma_client struct by itself.

[dan.j.williams@intel.com: fixed up fsldma and mv_xor]
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-08 11:58:58 -07:00
Dan Williams
7cc5bf9a3a dmaengine: track the number of clients using a channel
Haavard's dma-slave interface would like to test for exclusive access to a
channel.  The standard channel refcounting is not sufficient in that it
tracks more than just client references, it is also inaccurate as reference
counts are percpu until the channel is removed.

This change also enables a future fix to deallocate resources when a client
declines to use a capable channel.

Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-08 11:58:21 -07:00
Harvey Harrison
238f74a227 mac80211: move QOS control helpers into ieee80211.h
Also remove the WLAN_IS_QOS_DATA inline after removing the last
two users.  This starts moving away from using rx->fc to using
the header frame_control directly.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-08 14:15:59 -04:00
Bartlomiej Zolnierkiewicz
a861beb140 ide: add __ide_default_irq() inline helper
Add __ide_default_irq() inline helper and use it instead of
ide_default_irq() in ide-probe.c and ns87415.c (all host drivers
except IDE PCI ones always setup hwif->irq so it is enough to
check only for I/O bases 0x1f0 and 0x170).

This fixes post-2.6.25 regression since ide_default_irq()
define could shadow ide_default_irq() inline.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-07-08 19:27:22 +02:00
Bernhard Walle
69ac9cd629 sysfs: add /sys/firmware/memmap
This patch adds /sys/firmware/memmap interface that represents the BIOS
(or Firmware) provided memory map. The tree looks like:

    /sys/firmware/memmap/0/start   (hex number)
                           end     (hex number)
                           type    (string)
    ...                 /1/start
                           end
                           type

With the following shell snippet one can print the memory map in the same form
the kernel prints itself when booting on x86 (the E820 map).

  --------- 8< --------------------------
    #!/bin/sh
    cd /sys/firmware/memmap
    for dir in * ; do
        start=$(cat $dir/start)
        end=$(cat $dir/end)
        type=$(cat $dir/type)
        printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type"
    done
  --------- >8 --------------------------

That patch only provides the needed interface:

 1. The sysfs interface.
 2. The structure and enumeration definition.
 3. The function firmware_map_add() and firmware_map_add_early()
    that should be called from architecture code (E820/EFI, for
    example) to add the contents to the interface.

If the kernel is compiled without CONFIG_FIRMWARE_MEMMAP, the interface does
nothing without cluttering the architecture-specific code with #ifdef's.

The purpose of the new interface is kexec: While /proc/iomem represents
the *used* memory map (e.g. modified via kernel parameters like 'memmap'
and 'mem'), the /sys/firmware/memmap tree represents the unmodified memory
map provided via the firmware. So kexec can:

 - use the original memory map for rebooting,
 - use the /proc/iomem for setting up the ELF core headers for kdump
   case that should only represent the memory of the system.

The patch has been tested on i386 and x86_64.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Acked-by: Greg KH <gregkh@suse.de>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Cc: yhlu.kernel@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 17:55:41 +02:00
Niels de Vos
f3d1eb19ab Input: serio - trivial documentation fix
In include/linux/serio.h two different define-series are documented as
"Serio types". However the second series contains defines for the
different protocols.

Signed-off-by: Niels de Vos <niels.devos@wincor-nixdorf.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2008-07-08 10:33:01 -04:00
Ron Rindjunsky
429a380571 mac80211: add block ack request capability
This patch adds block ack request capability

Signed-off-by: Ester Kummer <ester.kummer@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-07-08 10:21:34 -04:00
Yinghai Lu
3c999f1426 x86: check command line when CONFIG_X86_MPPARSE is not set, v2
if acpi=off, acpi=noirq and pci=noacpi, we need to disable apic.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:31 +02:00
Yinghai Lu
d52d53b8a5 RFC x86: try to remove arch_get_ram_range
want to remove arch_get_ram_range, and use early_node_map instead.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:27 +02:00
Jeremy Fitzhardinge
a7bf0bd5e6 build: add __page_aligned_data and __page_aligned_bss
Making a variable page-aligned by using
__attribute__((section(".data.page_aligned"))) is fragile because if
sizeof(variable) is not also a multiple of page size, it leaves
variables in the remainder of the section unaligned.

This patch introduces two new qualifiers, __page_aligned_data and
__page_aligned_bss to set the section *and* the alignment of
variables.  This makes page-aligned variables more robust because the
linker will make sure they're aligned properly.  Unfortunately it
requires *all* page-aligned data to use these macros...

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:13 +02:00
Patrick McHardy
9bb8582efb vlan: TCI related type and naming cleanups
The VLAN code contains multiple spots that use tag, id and tci as
identifiers for arguments and variables incorrectly and they actually
contain or are expected to contain something different. Additionally
types are used inconsistently (unsigned short vs u16) and identifiers
are sometimes capitalized.

- consistently use u16 for storing TCI, ID or QoS values
- consistently use vlan_id and vlan_tci for storing the respective values
- remove capitalization
- add kdoc comment to netif_hwaccel_{rx,receive_skb}

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 03:24:44 -07:00
Patrick McHardy
df6b6a0cf6 vlan: remove useless struct hlist_node declaration from if_vlan.h
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 03:24:14 -07:00
Patrick McHardy
22d1ba74bb vlan: move struct vlan_dev_info to private header
Hide struct vlan_dev_info from drivers to prevent them from growing
more creative ways to use it. Provide accessors for the two drivers
that currently use it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 03:23:57 -07:00
Patrick McHardy
7750f403cb vlan: uninline __vlan_hwaccel_rx
The function is huge and included at least once in every VLAN acceleration
capable driver. Uninline it; to avoid having drivers depend on the VLAN
module, the function is always built in statically when VLAN is enabled.

With all VLAN acceleration capable drivers that build on x86_64 enabled,
this results in:

   text    data     bss     dec     hex filename
6515227  854044  343968 7713239  75b1d7 vmlinux.inlined
6505637  854044  343968 7703649  758c61 vmlinux.uninlined
----------------------------------------------------------
  -9590

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 03:23:36 -07:00
Patrick McHardy
acc81e1465 vlan: fix network_header/mac_header adjustments
Lennert Buytenhek points out that the VLAN code incorrectly adjusts
skb->network_header to point in the middle of the VLAN header and
additionally tries to adjust skb->mac_header without checking for
validity.

The network_header should not be touched at all since we're only
adding headers in front of it, mac_header adjustments are not
necessary at all.

Based on patch by Lennert Buytenhek <buytenh@wantstofly.org>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 03:21:27 -07:00
Richard Kennedy
2c693610fe net: remove padding from struct socket on 64bit & increase objects/cache
remove padding from struct socket reducing its size by 8 bytes.
    
This allows more objects/cache in sock_inode_cache
12 objects/cache when cacheline size is 128 (generic x86_64)
    
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 03:03:01 -07:00
Ingo Molnar
2b4fa851b2 Merge branch 'x86/numa' into x86/devel
Conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/e820.c
	arch/x86/kernel/efi_64.c
	arch/x86/kernel/mpparse.c
	arch/x86/kernel/setup.c
	arch/x86/kernel/setup_32.c
	arch/x86/mm/init_64.c
	include/asm-x86/proto.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:59:23 +02:00
Joonwoo Park
4ad3f26162 netfilter: fix string extension for case insensitive pattern matching
The flag XT_STRING_FLAG_IGNORECASE indicates case insensitive string
matching. netfilter can find cmd.exe, Cmd.exe, cMd.exe and etc easily.

A new revision 1 was added, in the meantime invert of xt_string_info
was moved into flags as a flag. If revision is 1, The flag
XT_STRING_FLAG_INVERT indicates invert matching.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 02:38:56 -07:00
Joonwoo Park
dde77e6044 textsearch: convert kmalloc + memset to kzalloc
convert kmalloc + memset to kzalloc for alloc_ts_config

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 02:38:40 -07:00
Joonwoo Park
b9c7967831 textsearch: support for case insensitive searching
The function textsearch_prepare has a new flag to support case
insensitive searching.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 02:37:31 -07:00
Adrian Bunk
cdf060a5d3 netfilter: cleanup netfilter_ipv6.h userspace header
Kernel functions are not for userspace.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 02:36:40 -07:00
Mike Travis
3461b0af02 x86: remove static boot_cpu_pda array v2
* Remove the boot_cpu_pda array and pointer table from the data section.
    Allocate the pointer table and array during init.  do_boot_cpu()
    will reallocate the pda in node local memory and if the cpu is being
    brought up before the bootmem array is released (after_bootmem = 0),
    then it will free the initial pda.  This will happen for all cpus
    present at system startup.

    This removes 512k + 32k bytes from the data section.

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-08 11:31:25 +02:00
Ingo Molnar
3de352bbd8 Merge branch 'x86/mpparse' into x86/devel
Conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/io_apic_32.c
	arch/x86/kernel/setup_64.c
	arch/x86/mm/init_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:14:58 +02:00
Yinghai Lu
b5bc6c0e55 x86, mm: use add_highpages_with_active_regions() for high pages init v2
use early_node_map to init high pages, so we can remove page_is_ram() and
page_is_reserved_early() in the big loop with add_one_highpage

also remove page_is_reserved_early(), it is not needed anymore.

v2: fix the build of other platforms

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:37:25 +02:00
Yinghai Lu
cc1050bafe x86: replace shrink_active_range() with remove_active_range()
in case we have kva before ramdisk on a node, we still need to use
those ranges.

v2: reserve_early kva ram area, in case there are holes in highmem, to avoid
    those area could be treat as free high pages.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:36:29 +02:00
Yinghai Lu
d2dbf34332 x86: clean up reserve_bootmem_generic() and port it to 32-bit
1. add reserve_bootmem_generic for 32bit
2. change len to unsigned long
3. make early_res_to_bootmem to use it

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:36:17 +02:00
Ingo Molnar
896395c290 Merge branch 'linus' into tmp.x86.mpparse.new 2008-07-08 10:32:56 +02:00
Ingo Molnar
1b8ba39a3f Merge branch 'x86/irq' into x86/devel
Conflicts:

	arch/x86/kernel/i8259.c
	arch/x86/kernel/irqinit_64.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 09:53:57 +02:00
Ingo Molnar
58cf35228f Merge branches 'x86/mmio', 'x86/delay', 'x86/idle', 'x86/oprofile', 'x86/debug', 'x86/ptrace' and 'x86/amd-iommu' into x86/devel 2008-07-08 09:46:15 +02:00
Ingo Molnar
6924d1ab8b Merge branches 'x86/numa-fixes', 'x86/apic', 'x86/apm', 'x86/bitops', 'x86/build', 'x86/cleanups', 'x86/cpa', 'x86/cpu', 'x86/defconfig', 'x86/gart', 'x86/i8259', 'x86/intel', 'x86/irqstats', 'x86/kconfig', 'x86/ldt', 'x86/mce', 'x86/memtest', 'x86/pat', 'x86/ptemask', 'x86/resumetrace', 'x86/threadinfo', 'x86/timers', 'x86/vdso' and 'x86/xen' into x86/devel 2008-07-08 09:16:56 +02:00
Neil Brown
0529613a19 Merge branch 'for-neil' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/md into for-next 2008-07-08 10:13:28 +10:00
Neil Brown
5b1a4bf220 Merge branch 'master' into for-next 2008-07-08 10:11:50 +10:00
Rafael J. Wysocki
337001b6c4 PCI: Simplify PCI device PM code
If the offset of PCI device's PM capability in its configuration space,
the mask of states that the device supports PME# from and the D1 and D2
support bits are cached in the corresponding struct pci_dev, the PCI
device PM code can be simplified quite a bit.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-07-07 16:26:50 -07:00
Rafael J. Wysocki
404cc2d8ce PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
Introduce functions pci_prepare_to_sleep() and pci_back_from_sleep(),
to be used by the PCI drivers that want to place their devices into
the lowest power state appropiate for them (PCI_D3hot, if the device
is not supposed to wake up the system, or the deepest state from
which the wake-up is possible, otherwise) while the system is being
prepared to go into a sleeping state and to put them back into D0
during the subsequent transition to the working state.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-07-07 16:26:33 -07:00
Rafael J. Wysocki
eb9d0fe40e PCI ACPI: Rework PCI handling of wake-up
* Introduce function acpi_pm_device_sleep_wake() for enabling and
  disabling the system wake-up capability of devices that are power
  manageable by ACPI.

* Introduce function acpi_bus_can_wakeup() allowing other (dependent)
  subsystems to check if ACPI is able to enable the system wake-up
  capability of given device.

* Introduce callback .sleep_wake() in struct pci_platform_pm_ops and
  for the ACPI PCI 'driver' make it use acpi_pm_device_sleep_wake().

* Introduce callback .can_wakeup() in struct pci_platform_pm_ops and
  for the ACPI 'driver' make it use acpi_bus_can_wakeup().

* Move the PME# handlig code out of pci_enable_wake() and split it
  into two functions, pci_pme_capable() and pci_pme_active(),
  allowing the caller to check if given device is capable of
  generating PME# from given power state and to enable/disable the
  device's PME# functionality, respectively.

* Modify pci_enable_wake() to use the new ACPI callbacks and the new
  PME#-related functions.

* Drop the generic .platform_enable_wakeup() callback that is not
  used any more.

* Introduce device_set_wakeup_capable() that will set the
  power.can_wakeup flag of given device.

* Rework PCI device PM initialization so that, if given device is
  capable of generating wake-up events, either natively through the
  PME# mechanism, or with the help of the platform, its
  power.can_wakeup flag is set and its power.should_wakeup flag is
  unset as appropriate.

* Make ACPI set the power.can_wakeup flag for devices found to be
  wake-up capable by it.

* Make the ACPI wake-up code enable/disable GPEs for devices that
  have the wakeup.flags.prepared flag set (which means that their
  wake-up power has been enabled).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-07-07 16:26:28 -07:00
Greg KH
c6c4f070a6 PCI: make pci_name use dev_name
Also fixes up the sparc code that was assuming this is not a constant.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-07-07 16:02:40 -07:00
Claudio Nieder
7342239273 Input: add driver for Tabletkiosk Sahara TouchIT-213 touchscreen
Signed-off-by: Claudio Nieder <private@claudio.ch>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2008-07-07 12:00:30 -04:00
Dmitry Baryshkov
f024ff10b1 [ARM] 5128/1: tc6393xb: tmio-nand support
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-07-07 13:22:08 +01:00
Dmitry Baryshkov
aa613de676 [ARM] 5127/1: Core MFD support
This patch provides a common subdevice registration system for MFD type
chips, using platfrom device.

Signed-off-by: Ian Molton <spyro@f2s.com>
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-07-07 13:22:06 +01:00
Dmitry Baryshkov
d6315949ac [ARM] 5096/2: Support Toshiba TC6393XB Mobile I/O Controller.
Add support for Toshiba TC6393XB companion chip. Currently
only GPIO and part of IRQ features of the device are supported.

Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-07-07 13:22:01 +01:00
Ingo Molnar
d763d5edf9 Merge branch 'linus' into tracing/mmiotrace 2008-07-07 08:07:35 +02:00
Ingo Molnar
032f82786f Merge commit 'v2.6.26-rc9' into sched/devel 2008-07-07 08:01:26 +02:00
Ingo Molnar
9982fbface Revert "cpumask: introduce new APIs"
This reverts commit acb7669c12.

the wrappers are not needed anymore.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-06 14:24:08 +02:00
Ingo Molnar
68083e05d7 Merge commit 'v2.6.26-rc9' into cpus4096 2008-07-06 14:23:39 +02:00
David S. Miller
ea2aca084b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	Documentation/feature-removal-schedule.txt
	drivers/net/wan/hdlc_fr.c
	drivers/net/wireless/iwlwifi/iwl-4965.c
	drivers/net/wireless/iwlwifi/iwl3945-base.c
2008-07-05 23:08:07 -07:00
Patrick McHardy
70c03b49b8 vlan: Add GVRP support
Add GVRP support for dynamically registering VLANs with switches.

By default GVRP is disabled because we only support the applicant-only
participant model, which means it should not be enabled on vlans that
are members of a bridge. Since there is currently no way to cleanly
determine that, the user is responsible for enabling it.

The code is pretty small and low impact, its wrapped in a config
option though because it depends on the GARP implementation and
the STP core.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:26:57 -07:00
Patrick McHardy
eca9ebac65 net: Add GARP applicant-only participant
Add an implementation of the GARP (Generic Attribute Registration Protocol)
applicant-only participant. This will be used by the following patch to
add GVRP support to the VLAN code.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:26:13 -07:00
David S. Miller
4ce2417bfb Merge branch 'davem-next' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2008-07-05 21:03:31 -07:00
Eduard - Gabriel Munteanu
ca31e146d5 Move _RET_IP_ and _THIS_IP_ to include/linux/kernel.h
These two macros are useful beyond lock debugging. Moved definitions from
include/linux/debug_locks.h to include/linux/kernel.h, so code that needs
them does not have to include the former, which would have been a less
intuitive choice of a header.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-05 13:10:50 -07:00
Stephen Rothwell
acb7669c12 cpumask: introduce new APIs
In linux-next there is a commit ("x86: Add performance variants of cpumask
operators") which, as part of the 4096 cpu support work adds some new APIs
for dealing with cpu masks.  Add trivial versions of these now so that
subsystems can update in a timely manner and avoid conflicts in linux-next
and the next merge window.

Cc: Mike Travis <travis@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04 10:40:09 -07:00
Andres Salomon
e08c1694d9 olpc: sdhci: add quirk for the Marvell CaFe's vdd/powerup issue
This has been sitting around unloved for way too long..

The Marvell CaFe chip's SD implementation chokes during card insertion
if one attempts to set the voltage and power up in the same
SDHCI_POWER_CONTROL register write.  This adds a quirk that does
that particular dance in two steps.

It also adds an entry to pci_ids.h for the CaFe chip's SD device.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Pierre Ossman <drzeus-list@drzeus.cx>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04 10:40:09 -07:00
Andrew G. Morgan
086f7316f0 security: filesystem capabilities: fix fragile setuid fixup code
This commit includes a bugfix for the fragile setuid fixup code in the
case that filesystem capabilities are supported (in access()).  The effect
of this fix is gated on filesystem capability support because changing
securebits is only supported when filesystem capabilities support is
configured.)

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04 10:40:08 -07:00
Stephen Rothwell
93921f5c2c Introduce rculist.h
In linux-next there is a commit ("rcu: split list.h and move rcu-protected
lists into rculist.h") that moved the rcu related list iterators from
list.h to rculist.h.  Add a trivial version of the file now so that
various subsystem trees can start using it now for -next changes and so
reduce the build errors caused by adding uses of the moved functions.

Cc: Franck Bui-Huu <fbuihuu@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04 10:40:07 -07:00
Miguel Ojeda
450c622e9f Miguel Ojeda has moved
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04 10:40:05 -07:00
James Bottomley
69d44a1835 firmware: fix the request_firmware() dummy
> the build (.config attached) failed, make ends with :
> ...
>   UPD     include/linux/compile.h
>   CC      init/version.o
>   LD      init/built-in.o
>   LD      vmlinux
> drivers/built-in.o: In function `sas_request_addr':
> (.text+0x33bab): undefined reference to `request_firmware'
> drivers/built-in.o: In function `sas_request_addr':
> (.text+0x33c3f): undefined reference to `release_firmware'
> make: *** [vmlinux] Error 1

There's a slight fault in the stub logic.  It fails for FW_LOADER=m and
the user =y.

This should fix it.

This patch fixes the following 2.6.26-rc regression:
  http://bugzilla.kernel.org/show_bug.cgi?id=10730

Reviewed-by: Toralf Foerster <toralf.foerster@gmx.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04 10:40:04 -07:00
Christoph Lameter
cde5353599 Christoph has moved
Remove all clameter@sgi.com addresses from the kernel tree since they will
become invalid on June 27th.  Change my maintainer email address for the
slab allocators to cl@linux-foundation.org (which will be the new email
address for the future).

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04 10:40:04 -07:00
Linus Torvalds
3ea9eed493 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slub: Do not use 192 byte sized cache if minimum alignment is 128 byte
2008-07-04 09:48:21 -07:00
Krzysztof Halasa
198191c4a7 WAN: convert drivers to use built-in netdev_stats
There is no point in using separate net_device_stats structs when
the one in struct net_device is present. Compiles.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-04 08:47:41 -04:00
Heiko Carstens
ba8dd03ac0 generic-ipi: fix s390 build bug
forgot to remove #include <linux/spinlock.h> from linux/smp.h while
fixing the original s390 build bug.

Patch below fixes this build bug caused by header inclusion dependencies:

  CC      kernel/timer.o
In file included from include/linux/spinlock.h:87,
                 from include/linux/smp.h:11,
                 from include/linux/kernel_stat.h:4,
                 from kernel/timer.c:22:
include/asm/spinlock.h: In function '__raw_spin_lock':
include/asm/spinlock.h:69: error: implicit declaration of function 'smp_processor_id'

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-04 11:26:40 +02:00
FUJITA Tomonori
27f8221af4 block: add blk_queue_update_dma_pad
This adds blk_queue_update_dma_pad to prevent LLDs from overwriting
the dma pad mask wrongly (we added blk_queue_update_dma_alignment due
to the same reason).

This also converts libata to use blk_queue_update_dma_pad instead of
blk_queue_dma_pad.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-07-04 09:52:13 +02:00
J. Bruce Fields
e86322f611 Merge branch 'for-bfields' of git://linux-nfs.org/~tomtucker/xprt-switch-2.6 into for-2.6.27 2008-07-03 16:24:06 -04:00
Christoph Lameter
41d54d3bf8 slub: Do not use 192 byte sized cache if minimum alignment is 128 byte
The 192 byte cache is not necessary if we have a basic alignment of 128
byte. If it would be used then the 192 would be aligned to the next 128 byte
boundary which would result in another 256 byte cache. Two 256 kmalloc caches
cause sysfs to complain about a duplicate entry.

MIPS needs 128 byte aligned kmalloc caches and spits out warnings on boot without
this patch.

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-07-03 19:01:55 +03:00
Philipp Zabel
3b73125af6 [ARM] 5044/1: pwm_bl: add init/notify/exit callbacks
This allows platform code to manipulate GPIOs and brightness level as
needed.

Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-07-03 13:25:05 +01:00
eric miao
42796d37da [ARM] pxa: add generic PWM backlight driver
Patch mostly from Eric Miao, with minor edits by rmk to convert
Eric's driver to a generic PWM-based backlight driver.

Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-07-03 13:25:00 +01:00
Jens Axboe
e48ec69005 block: extend queue_flag bitops
Add test_and_clear and test_and_set.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-07-03 13:21:15 +02:00
Alasdair G Kergon
cc371e66e3 Add bvec_merge_data to handle stacked devices and ->merge_bvec()
When devices are stacked, one device's merge_bvec_fn may need to perform
the mapping and then call one or more functions for its underlying devices.

The following bio fields are used:
  bio->bi_sector
  bio->bi_bdev
  bio->bi_size
  bio->bi_rw  using bio_data_dir()

This patch creates a new struct bvec_merge_data holding a copy of those
fields to avoid having to change them directly in the struct bio when
going down the stack only to have to change them back again on the way
back up.  (And then when the bio gets mapped for real, the whole
exercise gets repeated, but that's a problem for another day...)

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-07-03 13:21:15 +02:00
Jens Axboe
b24498d477 block: integrity flags can't use bit ops on unsigned short
Just use normal open coded bit operations instead, they need not be
atomic.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-07-03 13:21:15 +02:00
Adel Gadllah
0b07de85a7 allow userspace to modify scsi command filter on per device basis
This patch exports the per-gendisk command filter to user space through
sysfs, so it can be changed by the system administrator.
All users of the old cmd filter have been converted to use the new one.

Original patch from Peter Jones.

Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com>
Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-07-03 13:21:14 +02:00
Jens Axboe
6e2401ad6f block: integrity cleanups
- No need to check for NULL bio, we'll get an immediate oops anyway.
- Make bio_integrity() a proper function.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-07-03 13:21:14 +02:00
Jens Axboe
da9cbc8739 block: blkdev.h cleanup, move iocontext stuff to iocontext.h
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-07-03 13:21:14 +02:00
Martin K. Petersen
7ba1ba12ee block: Block layer data integrity support
Some block devices support verifying the integrity of requests by way
of checksums or other protection information that is submitted along
with the I/O.

This patch implements support for generating and verifying integrity
metadata, as well as correctly merging, splitting and cloning bios and
requests that have this extra information attached.

See Documentation/block/data-integrity.txt for more information.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-07-03 13:21:13 +02:00
Martin K. Petersen
51d654e1d8 block: Globalize bio_set and bio_vec_slab
Move struct bio_set and biovec_slab definitions to bio.h so they can
be used outside of bio.c.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-07-03 13:21:13 +02:00
Jens Axboe
244b4d56f8 block: kill request_queue_t
Everything was moved to struct request_queue a few kernel revisions
ago, maintaining the deprecated typedef to avoid breaking things.
Now the time has come to get rid of that typedef.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-07-03 13:21:12 +02:00
Alan D. Brunelle
02c62304e6 Added in user-injected messages into blk traces
This allows a user to annotate the blk trace stream: writing a suitable
message to {/sys/kernel/debug}/block/<dsf>/msg will have it propagated
into the trace stream.

Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-07-03 13:21:12 +02:00
Rusty Russell
f43798c276 tun: Allow GSO using virtio_net_hdr
Add a IFF_VNET_HDR flag.  This uses the same ABI as virtio_net
(ie. prepending struct virtio_net_hdr to packets) to indicate GSO and
checksum information.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-03 03:48:02 -07:00
Rusty Russell
5228ddc98f tun: TUNSETFEATURES to set gso features.
ethtool is useful for setting (some) device fields, but it's
root-only.  Finer feature control is available through a tun-specific
ioctl.

(Includes Mark McLoughlin <markmc@redhat.com>'s fix to hold rtnl sem).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-03 03:46:16 -07:00
Rusty Russell
07240fd090 tun: Interface to query tun/tap features.
The problem with introducing checksum offload and gso to tun is they
need to set dev->features to enable GSO and/or checksumming, which is
supposed to be done before register_netdevice(), ie. as part of
TUNSETIFF.

Unfortunately, TUNSETIFF has always just ignored flags it doesn't
understand, so there's no good way of detecting whether the kernel
supports new IFF_ flags.

This patch implements a TUNGETFEATURES ioctl which returns all the valid IFF
flags.  It could be extended later to include other features.

Here's an example program which uses it:

#include <linux/if_tun.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <err.h>
#include <stdio.h>

static struct {
	unsigned int flag;
	const char *name;
} known_flags[] = {
	{ IFF_TUN, "TUN" },
	{ IFF_TAP, "TAP" },
	{ IFF_NO_PI, "NO_PI" },
	{ IFF_ONE_QUEUE, "ONE_QUEUE" },
};

int main()
{
	unsigned int features, i;

	int netfd = open("/dev/net/tun", O_RDWR);
	if (netfd < 0)
		err(1, "Opening /dev/net/tun");

	if (ioctl(netfd, TUNGETFEATURES, &features) != 0) {
		printf("Kernel does not support TUNGETFEATURES, guessing\n");
		features = (IFF_TUN|IFF_TAP|IFF_NO_PI|IFF_ONE_QUEUE);
	}
	printf("Available features are: ");
	for (i = 0; i < sizeof(known_flags)/sizeof(known_flags[0]); i++) {
		if (features & known_flags[i].flag) {
			features &= ~known_flags[i].flag;
			printf("%s ", known_flags[i].name);
		}
	}
	if (features)
		printf("(UNKNOWN %#x)", features);
	printf("\n");
	return 0;
}

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-03 03:45:32 -07:00
YOSHIFUJI Hideaki
e0835f8fa5 ipv4,ipv6 mroute: Add some helper inline functions to remove ugly ifdefs.
ip{,v6}_mroute_{set,get}sockopt() should not matter by optimization but
it would be better not to depend on optimization semantically.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:57 +09:00
Wang Chen
03d2f897e9 ipv4: Do cleanup for ip_mr_init
Same as ip6_mr_init(), make ip_mr_init() return errno if fails.
But do not do error handling in inet_init(), just print a msg.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:57 +09:00
Wang Chen
623d1a1af7 ipv6: Do cleanup for ip6_mr_init.
If do not do it, we will get following issues:
1. Leaving junks after inet6_init failing halfway.
2. Leaving proc and notifier junks after ipv6 modules unloading.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:56 +09:00
YOSHIFUJI Hideaki
1b34be74cb ipv6 addrconf: add accept_dad sysctl to control DAD operation.
- If 0, disable DAD.
- If 1, perform DAD (default).
- If >1, perform DAD and disable IPv6 operation if DAD for MAC-based
  link-local address has been failed (RFC4862 5.4.5).

We do not follow RFC4862 by default.  Refer to the netdev thread entitled
"Linux IPv6 DAD not full conform to RFC 4862 ?"
	http://www.spinics.net/lists/netdev/msg52027.html

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:56 +09:00
YOSHIFUJI Hideaki
778d80be52 ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:55 +09:00
Linus Torvalds
0e77a07ff9 Merge branch 'for-2.6.26' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.26' of git://git.kernel.dk/linux-2.6-block:
  Properly notify block layer of sync writes
  block: Fix the starving writes bug in the anticipatory IO scheduler
2008-07-02 19:25:36 -07:00
Linus Torvalds
f7572da502 Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
  i2c: Fix bad hint about irqs in i2c.h
  i2c: Documentation: fix device matching description
2008-07-02 19:00:29 -07:00
Linus Torvalds
821b03ffac Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (55 commits)
  net: fib_rules: fix error code for unsupported families
  netdevice: Fix wrong string handle in kernel command line parsing
  net: Tyop of sk_filter() comment
  netlink: Unneeded local variable
  net-sched: fix filter destruction in atm/hfsc qdisc destruction
  net-sched: change tcf_destroy_chain() to clear start of filter list
  ipv4: fix sysctl documentation of time related values
  mac80211: don't accept WEP keys other than WEP40 and WEP104
  hostap: fix sparse warnings
  hostap: don't report useless WDS frames by default
  textsearch: fix Boyer-Moore text search bug
  netfilter: nf_conntrack_tcp: fixing to check the lower bound of valid ACK
  ipv6 route: Convert rt6_device_match() to use RT6_LOOKUP_F_xxx flags.
  netlabel: Fix a problem when dumping the default IPv6 static labels
  net/inet_lro: remove setting skb->ip_summed when not LRO-able
  inet fragments: fix race between inet_frag_find and inet_frag_secret_rebuild
  CONNECTOR: add a proc entry to list connectors
  netlink: Fix some doc comments in net/netlink/attr.c
  tcp: /proc/net/tcp rto,ato values not scaled properly (v2)
  include/linux/netdevice.h: don't export MAX_HEADER to userspace
  ...
2008-07-02 18:43:16 -07:00
Andi Kleen
9465efc9e9 Remove BKL from remote_llseek v2
- Replace remote_llseek with generic_file_llseek_unlocked (to force compilation
failures in all users)
- Change all users to either use generic_file_llseek_unlocked directly or
take the BKL around. I changed the file systems who don't use the BKL
for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS
take the BKL, but explicitely in their own source now.

I moved them all over in a single patch to avoid unbisectable sections.

Open problem: 32bit kernels can corrupt fpos because its modification
is not atomic, but they can do that anyways because there's other paths who
modify it without BKL.

Do we need a special lock for the pos/f_version = 0 checks?

Trond says the NFS BKL is likely not needed, but keep it for now
until his full audit.

v2: Use generic_file_llseek_unlocked instead of remote_llseek_unlocked
    and factor duplicated code (suggested by hch)

Cc: Trond.Myklebust@netapp.com
Cc: swhiteho@redhat.com
Cc: sfrench@samba.org
Cc: vandrove@vc.cvut.cz

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-02 15:06:27 -06:00
Tom Tucker
8948896c9e svcrdma: Change WR context get/put to use the kmem cache
Change the WR context pool to be shared across mount points. This
reduces the RDMA transport memory footprint significantly since
idle mounts don't consume WR context memory.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:02:02 -05:00
Tom Tucker
779a48577b svcrdma: Remove unused wait q from svcrdma_xprt structure
The sc_read_wait queue head is no longer used. Remove it.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:01:58 -05:00
Tom Tucker
87295b6c5c svcrdma: Add dma map count and WARN_ON
Add a dma map count in order to verify that all DMA mapping resources
have been freed when the transport is closed.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:01:56 -05:00
Tom Tucker
f820c57ebf svcrdma: Use reply and chunk map for RDMA_READ processing
Modify the RDMA_READ processing to use the reply and chunk list mapping data
types. Also add a special purpose 'hdr_count' field in in the context to hold
the header page count instead of overloading the SGE length field and
corrupting the DMA map length.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:01:55 -05:00
Tom Tucker
ab96dddbed svcrdma: Add a type for keeping NFS RPC mapping
Create a new data structure to hold the remote client address space
to local server address space mapping.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:01:53 -05:00
Santwona Behera
0853ad66b1 netdev: Add support for rx flow hash configuration, using ethtool.
Added new interfaces to ethtool to configure receive network flow
distribution across multiple rx rings using hashing.

Signed-off-by: Santwona Behera <santwona.behera@sun.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-02 03:47:41 -07:00
Wolfram Sang
8e29da9ee8 i2c: Fix bad hint about irqs in i2c.h
i2c.h mentions -1 as a not-issued irq. This false hint was taken by
of_i2c and caused crashes. Don't give any advice as 'no irq' is not
consistent across all architectures yet and it is not needed internally
by the i2c-core.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-07-01 22:38:18 +02:00
Jens Axboe
18ce3751cc Properly notify block layer of sync writes
fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
then immediately wait on them. Conceptually, that makes them sync writes
and we should treat them as such so that the IO schedulers can handle
them appropriately.

This patch fixes a write starvation issue that Lin Ming reported, where
xx is stuck for more than 2 minutes because of a large number of
synchronous IO in the system:

INFO: task kjournald:20558 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kjournald     D ffff810010820978  6712 20558      2
ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
Call Trace:
[<ffffffff803ba6f2>] kobject_get+0x12/0x17
[<ffffffff80247537>] getnstimeofday+0x2f/0x83
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d195>] io_schedule+0x5d/0x9f
[<ffffffff8029c1e7>] sync_buffer+0x3b/0x3f
[<ffffffff8066d3f0>] __wait_on_bit+0x40/0x6f
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d48b>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff80243909>] wake_bit_function+0x0/0x23
[<ffffffff8029e3ad>] sync_dirty_buffer+0x98/0xcb
[<ffffffff8030056b>] journal_commit_transaction+0x97d/0xcb6
[<ffffffff8023a676>] lock_timer_base+0x26/0x4b
[<ffffffff8030300a>] kjournald+0xc1/0x1fb
[<ffffffff802438db>] autoremove_wake_function+0x0/0x2e
[<ffffffff80302f49>] kjournald+0x0/0x1fb
[<ffffffff802437bb>] kthread+0x47/0x74
[<ffffffff8022de51>] schedule_tail+0x28/0x5d
[<ffffffff8020cac8>] child_rip+0xa/0x12
[<ffffffff80243774>] kthread+0x0/0x74
[<ffffffff8020cabe>] child_rip+0x0/0x12

Lin Ming confirms that this patch fixes the issue. I've run tests with
it for the past week and no ill effects have been observed, so I'm
proposing it for inclusion into 2.6.26.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-07-01 09:07:34 +02:00
Michael Neuling
f3e909c275 powerpc: Update for VSX core file and ptrace
This correctly hooks the VSX dump into Roland McGrath core file
infrastructure.  It adds the VSX dump information as an additional elf
note in the core file (after talking more to the tool chain/gdb guys).
This also ensures the formats are consistent between signals, ptrace
and core files.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01 14:47:09 +10:00
Adrian Bunk
9b4a8dd2e9 drivers/macintosh: Various cleanups
This contains the following cleanups:
- make the following needlessly global code static:
  - adb.c: adb_controller
  - adb.c: adb_init()
  - adbhid.c: adb_to_linux_keycodes[]  (also make it const)
  - via-pmu68k.c: backlight_level
  - via-pmu68k.c: backlight_enabled
- remove the following unused code:
  - via-pmu68k.c: sleep_notifier_list

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01 11:28:06 +10:00
Dan Williams
b5470dc5fc md: resolve external metadata handling deadlock in md_allow_write
md_allow_write() marks the metadata dirty while holding mddev->lock and then
waits for the write to complete.  For externally managed metadata this causes a
deadlock as userspace needs to take the lock to communicate that the metadata
update has completed.

Change md_allow_write() in the 'external' case to start the 'mark active'
operation and then return -EAGAIN.  The expected side effects while waiting for
userspace to write 'active' to 'array_state' are holding off reshape (code
currently handles -ENOMEM), cause some 'stripe_cache_size' change requests to
fail, cause some GET_BITMAP_FILE ioctl requests to fall back to GFP_NOIO, and
cause updates to 'raid_disks' to fail.  Except for 'stripe_cache_size' changes
these failures can be mitigated by coordinating with mdmon.

md_write_start() still prevents writes from occurring until the metadata
handler has had a chance to take action as it unconditionally waits for
MD_CHANGE_CLEAN to be cleared.

[neilb@suse.de: return -EAGAIN, try GFP_NOIO]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-06-30 17:18:19 -07:00
Randy Dunlap
80be038593 PCI: add stub for pci_set_consistent_dma_mask()
When CONFIG_PCI=n, there is no stub for pci_set_consistent_dma_mask(),
so add one like other similar stubs.  Otherwise there can be build errors,
as here:

linux-next-20080630/drivers/ssb/main.c:1175: error: implicit declaration of
function 'pci_set_consistent_dma_mask'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-30 11:41:36 -07:00
Linus Torvalds
e1441b9a41 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: fix locking in force-feedback core
  Input: add KEY_MEDIA_REPEAT definition
2008-06-30 08:58:09 -07:00
Richard Lemon
3cadd2d989 Input: Add driver for iNexio serial touchscreen.
Signed-off-by: Richard Lemon <richard@codelemon.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2008-06-30 09:37:32 -04:00
Bastien Nocera
4bbff7e408 Input: add KEY_MEDIA_REPEAT definition
This patch adds the Repeat key to the input layer. The usage
in the HUT is 0xBC (listed under "15.7 Transport Controls").

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2008-06-30 09:25:12 -04:00
Paul Mackerras
e9a4b6a3f6 Merge branch 'linux-2.6' 2008-06-30 10:16:50 +10:00
Linus Torvalds
0acbbee440 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  dock: bay: Don't call acpi_walk_namespace() when ACPI is disabled.
  ACPI: don't walk tables if ACPI was disabled
  thermal: Create CONFIG_THERMAL_HWMON=n
2008-06-29 12:22:30 -07:00
Linus Torvalds
535e49f48e Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
  kbuild: fix a.out.h export to userspace with O= build.
2008-06-29 12:21:56 -07:00
Linus Torvalds
a4480ac4f9 Merge branch 'audit.b52' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b52' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  [PATCH] remove useless argument type in audit_filter_user()
  [PATCH] audit: fix kernel-doc parameter notation
  [PATCH] kernel/audit.c: nlh->nlmsg_type is gotten more than once
2008-06-29 12:15:10 -07:00
Linus Torvalds
4f46accee4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  [patch 2/3] vfs: dcache cleanups
  [patch 1/3] vfs: dcache sparse fixes
  [patch 3/3] vfs: make d_path() consistent across mount operations
  [patch 4/4] flock: remove unused fields from file_lock_operations
  [patch 3/4] vfs: fix ERR_PTR abuse in generic_readlink
  [patch 2/4] fs: make struct file arg to d_path const
  [patch 1/4] vfs: path_{get,put}() cleanups
  [patch for 2.6.26 4/4] vfs: utimensat(): fix write access check for futimens()
  [patch for 2.6.26 3/4] vfs: utimensat(): fix error checking for {UTIME_NOW,UTIME_OMIT} case
  [patch for 2.6.26 1/4] vfs: utimensat(): ignore tv_sec if tv_nsec == UTIME_OMIT or UTIME_NOW
  [patch for 2.6.26 2/4] vfs: utimensat(): be consistent with utime() for immutable and append-only files
  [PATCH] fix cgroup-inflicted breakage in block_dev.c
2008-06-29 12:14:37 -07:00
David S. Miller
28f49d8fec Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2008-06-28 22:57:58 -07:00
David S. Miller
332e4af80d Merge branch 'davem-next' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2008-06-28 21:28:46 -07:00
Jussi Kivilinna
818727badc rndis_host: pass buffer length to rndis_command
Pass buffer length to rndis_command so that rndis_command can read full
response buffer from device instead of max CONTROL_BUFFER_SIZE bytes.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-28 10:23:34 -04:00
David S. Miller
1b63ba8a86 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/iwlwifi/iwl4965-base.c
2008-06-28 01:19:40 -07:00
Eli Cohen
251a4b320f net/inet_lro: remove setting skb->ip_summed when not LRO-able
When an SKB cannot be chained to a session, the current code attempts
to "restore" its ip_summed field from lro_mgr->ip_summed. However,
lro_mgr->ip_summed does not hold the original value; in fact, we'd
better not touch skb->ip_summed since it is not modified by the code
in the path leading to a failure to chain it.  Also use a cleaer
comment to the describe the ip_summed field of struct net_lro_mgr.

Issue raised by Or Gerlitz <ogerlitz@voltaire.com>

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27 20:09:00 -07:00
Adrian Bunk
c88e6f51c2 include/linux/netdevice.h: don't export MAX_HEADER to userspace
Due to the CONFIG_'s the value is anyway not correct in userspace.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27 19:54:54 -07:00
Dan Williams
d8ee0728b5 md: replace R5_WantPrexor with R5_WantDrain, add 'prexor' reconstruct_states
From: Dan Williams <dan.j.williams@intel.com>

Currently ops_run_biodrain and other locations have extra logic to determine
which blocks are processed in the prexor and non-prexor cases.  This can be
eliminated if handle_write_operations5 flags the blocks to be processed in all
cases via R5_Wantdrain.  The presence of the prexor operation is tracked in
sh->reconstruct_state.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
2008-06-28 08:32:06 +10:00
Dan Williams
600aa10993 md: replace STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} with 'reconstruct_states'
From: Dan Williams <dan.j.williams@intel.com>

Track the state of reconstruct operations (recalculating the parity block
usually due to incoming writes, or as part of array expansion)  Reduces the
scope of the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags to only tracking whether
a reconstruct operation has been requested via the ops_request field of struct
stripe_head_state.

This is the final step in the removal of ops.{pending,ack,complete,count}, i.e.
the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags only request an operation and do
not track the state of the operation.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
2008-06-28 08:32:05 +10:00
Dan Williams
ecc65c9b3f md: replace STRIPE_OP_CHECK with 'check_states'
From: Dan Williams <dan.j.williams@intel.com>

The STRIPE_OP_* flags record the state of stripe operations which are
performed outside the stripe lock.  Their use in indicating which
operations need to be run is straightforward; however, interpolating what
the next state of the stripe should be based on a given combination of
these flags is not straightforward, and has led to bugs.  An easier to read
implementation with minimal degrees of freedom is needed.

Towards this goal, this patch introduces explicit states to replace what was
previously interpolated from the STRIPE_OP_* flags.  For now this only converts
the handle_parity_checks5 path, removing a user of the
ops.{pending,ack,complete,count} fields of struct stripe_operations.

This conversion also found a remaining issue with the current code.  There is
a small window for a drive to fail between when we schedule a repair and when
the parity calculation for that repair completes.  When this happens we will
writeback to 'failed_num' when we really want to write back to 'pd_idx'.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
2008-06-28 08:31:57 +10:00
Dan Williams
2b7497f0e0 md: kill STRIPE_OP_IO flag
From: Dan Williams <dan.j.williams@intel.com>

The R5_Want{Read,Write} flags already gate i/o.  So, this flag is
superfluous and we can unconditionally call ops_run_io().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
2008-06-28 08:31:52 +10:00
Dan Williams
b203886edb md: kill STRIPE_OP_MOD_DMA in raid5 offload
From: Dan Williams <dan.j.williams@intel.com>

This micro-optimization allowed the raid code to skip a re-read of the
parity block after checking parity.  It took advantage of the fact that
xor-offload-engines have their own internal result buffer and can check
parity without writing to memory.  Remove it for the following reasons:

1/ It is a layering violation for MD to need to manage the DMA and
   non-DMA paths within async_xor_zero_sum
2/ Bad precedent to toggle the 'ops' flags outside the lock
3/ Hard to realize a performance gain as reads will not need an updated
   parity block and writes will dirty it anyways.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
2008-06-28 08:31:50 +10:00
Neil Brown
526647320e Make sure all changes to md/dev-XX/state are notified
The important state change happens during an interrupt
in md_error.  So just set a flag there and call sysfs_notify
later in process context.

Signed-off-by: Neil Brown <neilb@suse.de>
2008-06-28 08:31:44 +10:00
Neil Brown
72a23c211e Make sure all changes to md/sync_action are notified.
When the 'resync' thread starts or stops, when we explicitly
set sync_action, or when we determine that there is definitely nothing
to do, we notify sync_action.

To stop "sync_action" from occasionally showing the wrong value,
we introduce a new flags - MD_RECOVERY_RECOVER - to say that a
recovery is probably needed or happening, and we make sure
that we set MD_RECOVERY_RUNNING before clearing MD_RECOVERY_NEEDED.

Signed-off-by: Neil Brown <neilb@suse.de>
2008-06-28 08:31:41 +10:00
Neil Brown
5e96ee65c8 Allow setting start point for requested check/repair
This makes it possible to just resync a small part of an array.
e.g. if a drive reports that it has questionable sectors,
a 'repair' of just the region covering those sectors will
cause them to be read and, if there is an error, re-written
with correct data.

Signed-off-by: Neil Brown <neilb@suse.de>
2008-06-28 08:31:24 +10:00
Neil Brown
a0da84f35b Improve setting of "events_cleared" for write-intent bitmaps.
When an array is degraded, bits in the write-intent bitmap are not
cleared, so that if the missing device is re-added, it can be synced
by only updated those parts of the device that have changed since
it was removed.

The enable this a 'events_cleared' value is stored. It is the event
counter for the array the last time that any bits were cleared.

Sometimes - if a device disappears from an array while it is 'clean' -
the events_cleared value gets updated incorrectly (there are subtle
ordering issues between updateing events in the main metadata and the
bitmap metadata) resulting in the missing device appearing to require
a full resync when it is re-added.

With this patch, we update events_cleared precisely when we are about
to clear a bit in the bitmap.  We record events_cleared when we clear
the bit internally, and copy that to the superblock which is written
out before the bit on storage.  This makes it more "obviously correct".

We also need to update events_cleared when the event_count is going
backwards (as happens on a dirty->clean transition of a non-degraded
array).

Thanks to Mike Snitzer for identifying this problem and testing early
"fixes".

Cc:  "Mike Snitzer" <snitzer@gmail.com>
Signed-off-by: Neil Brown <neilb@suse.de>
2008-06-28 08:31:22 +10:00
David Woodhouse
b660398101 kbuild: fix a.out.h export to userspace with O= build.
We need to check for existence of the a.out.h header in the source tree,
not the object tree, if we want it to get the right answer with O=.

Signed-off-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-06-27 23:13:54 +02:00
Wang Chen
9433f6dd3a PCI: Fix comment of pci_dynids
struct pci_driver has no field of driver_data.
It's in pci_device_id.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-27 13:06:54 -07:00
Luis R. Rodriguez
ffd7891dc9 mac80211: Let drivers have access to TKIP key offets for TX and RX MIC
Some drivers may want to to use the TKIP key offsets for TX and RX
MIC so lets move this out. Lets also clear up a bit how this is used
internally in mac80211.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27 09:09:17 -04:00
Michael Buesch
f225763a7d ssb, b43, b43legacy, b44: Rewrite SSB DMA API
This is a rewrite of the DMA API for SSB devices.
This is needed, because the old (non-existing) "API" made too many bad
assumptions on the API of the host-bus (PCI).
This introduces an almost complete SSB-DMA-API that maps to the lowlevel
bus-API based on the bustype.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-27 09:09:15 -04:00
Peter Zijlstra
2398f2c6d3 sched: update shares on wakeup
We found that the affine wakeup code needs rather accurate load figures
to be effective. The trouble is that updating the load figures is fairly
expensive with group scheduling. Therefore ratelimit the updating.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 14:31:45 +02:00
Peter Zijlstra
b6a86c746f sched: fix sched_domain aggregation
Keeping the aggregate on the first cpu of the sched domain has two problems:
 - it could collide between different sched domains on different cpus
 - it could slow things down because of the remote accesses

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 14:31:32 +02:00
Peter Zijlstra
c09595f63b sched: revert revert of: fair-group: SMP-nice for group scheduling
Try again..

Initial commit: 18d95a2832
Revert: 6363ca57c7

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 14:31:29 +02:00
Steven Whitehouse
b2cad26cfc [GFS2] Remove obsolete conversion deadlock avoidance code
This is only used by GFS1 so can be removed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27 09:39:47 +01:00
Steven Whitehouse
1bdad60633 [GFS2] Remove remote lock dropping code
There are several reasons why this is undesirable:

 1. It never happens during normal operation anyway
 2. If it does happen it causes performance to be very, very poor
 3. It isn't likely to solve the original problem (memory shortage
    on remote DLM node) it was supposed to solve
 4. It uses a bunch of arbitrary constants which are unlikely to be
    correct for any particular situation and for which the tuning seems
    to be a black art.
 5. In an N node cluster, only 1/N of the dropped locked will actually
    contribute to solving the problem on average.

So all in all we are better off without it. This also makes merging
the lock_dlm module into GFS2 a bit easier.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27 09:39:44 +01:00
Assaf Krauss
b662348662 mac80211: 11h - Handling measurement request
This patch handles the 11h measurement request information element.
This is minimal requested implementation - refuse measurement.

Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26 16:49:14 -04:00
Assaf Krauss
f2df38596a mac80211: 11h Infrastructure - Parsing
This patch introduces parsing of 11h and 11d related elements from incoming
management frames.

Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26 16:49:14 -04:00
Henrique de Moraes Holschuh
5005657cbd rfkill: rename the rfkill_state states and add block-locked state
The current naming of rfkill_state causes a lot of confusion: not only the
"kill" in rfkill suggests negative logic, but also the fact that rfkill cannot
turn anything on (it can just force something off or stop forcing something
off) is often forgotten.

Rename RFKILL_STATE_OFF to RFKILL_STATE_SOFT_BLOCKED (transmitter is blocked
and will not operate; state can be changed by a toggle_radio request), and
RFKILL_STATE_ON to RFKILL_STATE_UNBLOCKED (transmitter is not blocked, and may
operate).

Also, add a new third state, RFKILL_STATE_HARD_BLOCKED (transmitter is blocked
and will not operate; state cannot be changed through a toggle_radio request),
which is used by drivers to indicate a wireless transmiter was blocked by a
hardware rfkill line that accepts no overrides.

Keep the old names as #defines, but document them as deprecated.  This way,
drivers can be converted to the new names *and* verified to actually use rfkill
correctly one by one.

Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26 14:21:22 -04:00
Henrique de Moraes Holschuh
79399a8d19 rfkill: add notifier chains support
Add a notifier chain for use by the rfkill class.  This notifier chain
signals the following events (more to be added when needed):

  1. rfkill: rfkill device state has changed

A pointer to the rfkill struct will be passed as a parameter.

The notifier message types have been added to include/linux/rfkill.h
instead of to include/linux/notifier.h in order to avoid the madness of
modifying a header used globally (and that triggers an almost full tree
rebuild every time it is touched) with information that is of interest only
to code that includes the rfkill.h header.

Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26 14:21:21 -04:00
Henrique de Moraes Holschuh
477576a073 rfkill: add the WWAN radio type
Unfortunately, instead of adding a generic Wireless WAN type, a technology-
specific type (WiMAX) was added.  That's useless for other WWAN devices,
such as EDGE, UMTS, X-RTT and other such radios.

Add a WWAN rfkill type for generic wireless WAN devices.  No keys are added
as most devices really want to use KEY_WLAN for WWAN control (in a cycle of
none, WLAN, WWAN, WLAN+WWAN) and need no specific keycode added.

Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Cc: Iñaky Pérez-González <inaky.perez-gonzalez@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26 14:21:20 -04:00
Henrique de Moraes Holschuh
801e49af4c rfkill: add read-write rfkill switch support
Currently, rfkill support for read/write rfkill switches is hacked through
a round-trip over the input layer and rfkill-input to let a driver sync
rfkill->state to hardware changes.

This is buggy and sub-optimal.  It causes real problems.  It is best to
think of the rfkill class as supporting only write-only switches at the
moment.

In order to implement the read/write functionality properly:

Add a get_state() hook that is called by the class every time it needs to
fetch the current state of the switch.  Add a call to this hook every time
the *current* state of the radio plays a role in a decision.

Also add a force_state() method that can be used to forcefully syncronize
the class' idea of the current state of the switch.  This allows for a
faster implementation of the read/write functionality, as a driver which
get events on switch changes can avoid the need for a get_state() hook.

If the get_state() hook is left as NULL, current behaviour is maintained,
so this change is fully backwards compatible with the current rfkill
drivers.

For hardware that issues events when the rfkill state changes, leave
get_state() NULL in the rfkill struct, set the initial state properly
before registering with the rfkill class, and use the force_state() method
in the driver to keep the rfkill interface up-to-date.

get_state() can be called by the class from atomic context. It must not
sleep.

Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26 14:21:20 -04:00
Henrique de Moraes Holschuh
f3146aff7f rfkill: clarify meaning of rfkill states
rfkill really should have been named rfswitch.  As it is, one can get
confused whether RFKILL_STATE_ON means the KILL switch is on (and
therefore, the radio is being *blocked* from operating), or whether it
means the RADIO rf output is on.

Clearly state that RFKILL_STATE_ON means the radio is *unblocked* from
operating (i.e. there is no rf killing going on).

Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-06-26 14:21:19 -04:00
Jens Axboe
15c8b6c1aa on_each_cpu(): kill unused 'retry' parameter
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:38 +02:00
Jens Axboe
8691e5a8f6 smp_call_function: get rid of the unused nonatomic/retry argument
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:35 +02:00
Jens Axboe
3d44223327 Add generic helpers for arch IPI function calls
This adds kernel/smp.c which contains helpers for IPI function calls. In
addition to supporting the existing smp_call_function() in a more efficient
manner, it also adds a more scalable variant called smp_call_function_single()
for calling a given function on a single CPU only.

The core of this is based on the x86-64 patch from Nick Piggin, lots of
changes since then. "Alan D. Brunelle" <Alan.Brunelle@hp.com> has
contributed lots of fixes and suggestions as well. Also thanks to
Paul E. McKenney <paulmck@linux.vnet.ibm.com> for reviewing RCU usage
and getting rid of the data allocation fallback deadlock.

Acked-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:21:34 +02:00
Ingo Molnar
9a13150109 Merge commit 'v2.6.26-rc8' into core/rcu 2008-06-26 09:24:23 +02:00
Rene Herman
16d7523973 thermal: Create CONFIG_THERMAL_HWMON=n
A bug in libsensors <= 2.10.6 is exposed
when this new hwmon I/F is enabled.
Create CONFIG_THERMAL_HWMON=n
until some time after libsensors 2.10.7 ships
so those users can run the latest kernel.

libsensors 3.x is already fixed -- those users
can use CONFIG_THERMAL_HWMON=y now.

Signed-off-by: Rene Herman <rene.herman@gmail.com>
Acked-by: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-06-25 19:25:42 -04:00
John W. Linville
1839cea91e Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/wireless-2.6 2008-06-25 15:17:58 -04:00
Ingo Molnar
c7e745c6de Merge commit 'v2.6.26-rc8' into core/softlockup 2008-06-25 17:49:08 +02:00
Ingo Molnar
cbd6712406 Merge branch 'linus' into x86/irq 2008-06-25 12:30:49 +02:00
Ingo Molnar
28f73e51d0 Merge branch 'linus' into x86/delay
Conflicts:

	arch/x86/kernel/tsc_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 12:30:10 +02:00
Ingo Molnar
97e6722b8d Merge branch 'linus' into tracing/ftrace 2008-06-25 12:27:56 +02:00
Ingo Molnar
773dc8eaca Merge branch 'linus' into sched/new-API-sched_setscheduler 2008-06-25 12:27:05 +02:00
Ingo Molnar
f57aec5a87 Merge branch 'linus' into sched/devel
Conflicts:

	kernel/sched_rt.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 12:26:59 +02:00
Ingo Molnar
ace7f1b796 Merge branch 'linus' into core/softirq 2008-06-25 12:23:59 +02:00
Ingo Molnar
d02859ecb3 Merge commit 'v2.6.26-rc8' into x86/xen
Conflicts:

	arch/x86/xen/enlighten.c
	arch/x86/xen/mmu.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 12:16:51 +02:00
Peng Haitao
d8de72473e [PATCH] remove useless argument type in audit_filter_user()
The second argument "type" is not used in audit_filter_user(), so I think that type can be removed. If I'm wrong, please tell me.

Signed-off-by: Peng Haitao <penght@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-24 23:36:35 -04:00
Ben Dooks
f8dd0ecbb7 DM9000: Allow the use of the NSR register to get link status.
The DM9000's internal PHY reports a copy of the link status
in the NSR register of the chip. Reading the status when
polling for link status is faster as it eliminates the need
to sleep, but does not print as much information.

Add an platform flag to force this behaviour, and a Kconfig
option to allow it to be forced to the faster method always.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:58:07 -04:00
Alok Kataria
f3f3149f35 x86: use cpu_khz for loops_per_jiffy calculation, cleanup
As suggested by Ingo, remove all references to tsc from init/calibrate.c

TSC is x86 specific, and using tsc in variable names in a generic file should
be avoided. lpj_tsc is now called lpj_fine, since it is related to fine tuning
of lpj value. Also tsc_rate_*  is called timer_rate_*

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Daniel Hecht <dhecht@vmware.com>
Cc: Tim Mann <mann@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Cc: Sahil Rihan <srihan@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:53:46 +02:00
Li Zefan
a033c332e0 lockdep: remove duplicate definition of STATIC_LOCKDEP_MAP_INIT
STATIC_LOCKDEP_MAP_INIT is defined twice in lockdep.h. I guess
it's a copy & paste.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:43:07 +02:00
Marcelo Tosatti
06e0564566 KVM: close timer injection race window in __vcpu_run
If a timer fires after kvm_inject_pending_timer_irqs() but before
local_irq_disable() the code will enter guest mode and only inject such
timer interrupt the next time an unrelated event causes an exit.

It would be simpler if the timer->pending irq conversion could be done
with IRQ's disabled, so that the above problem cannot happen.

For now introduce a new vcpu requests bit to cancel guest entry.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 12:16:59 +03:00
Eilon Greenstein
34f80b04f3 bnx2x: Add support for BCM57711 HW
Supporting the 57711 and 57711E - refers to in the code as E1H. The
57710 is referred to as E1.

To support the new members in the family, the bnx2x structure was
divided to 3 parts: common, port and function. These changes caused some
rearrangement in the bnx2x.h file.

A set of accessories macros were added to make access to the bnx2x
structure more readable

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:33:01 -07:00
Rusty Russell
961ccddd59 sched: add new API sched_setscheduler_nocheck: add a flag to control access checks
Hidehiro Kawai noticed that sched_setscheduler() can fail in
stop_machine: it calls sched_setscheduler() from insmod, which can
have CAP_SYS_MODULE without CAP_SYS_NICE.

Two cases could have failed, so are changed to sched_setscheduler_nocheck:
  kernel/softirq.c:cpu_callback()
	- CPU hotplug callback
  kernel/stop_machine.c:__stop_machine_run()
	- Called from various places, including modprobe()

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: sugita <yumiko.sugita.yf@hitachi.com>
Cc: Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 22:57:56 +02:00
Alok Kataria
3da757daf8 x86: use cpu_khz for loops_per_jiffy calculation
On the x86 platform we can use the value of tsc_khz computed during tsc
calibration to calculate the loops_per_jiffy value. Its very important
to keep the error in lpj values to minimum as any error in that may
result in kernel panic in check_timer. In virtualization environment, On
a highly overloaded host the guest delay calibration may sometimes
result in errors beyond the ~50% that timer_irq_works can handle,
resulting in the guest panicking.

Does some formating changes to lpj_setup code to now have a single
printk to print the bogomips value.

We do this only for the boot processor because the AP's can have
different base frequencies or the BIOS might boot a AP at a different
frequency.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Daniel Hecht <dhecht@vmware.com>
Cc: Tim Mann <mann@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Cc: Sahil Rihan <srihan@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 22:51:33 +02:00
Abhishek Sagar
ecea656d1d ftrace: freeze kprobe'd records
Let records identified as being kprobe'd be marked as "frozen". The trouble
with records which have a kprobe installed on their mcount call-site is
that they don't get updated. So if such a function which is currently being
traced gets its tracing disabled due to a new filter rule (or because it
was added to the notrace list) then it won't be updated and continue being
traced. This patch allows scanning of all frozen records during tracing to
check if they should be traced.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 22:10:58 +02:00
Abhishek Sagar
785656a41f kprobes: enable clean usage of get_kprobe
Allow clean use of get_kprobe() outside of core kprobe code. Ftrace makes use
of get_kprobe to identify probes installed on mcount call-sites.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: jkenisto@us.ibm.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 22:10:57 +02:00
Abhishek Sagar
395a59d0f8 ftrace: store mcount address in rec->ip
Record the address of the mcount call-site. Currently all archs except sparc64
record the address of the instruction following the mcount call-site. Some
general cleanups are entailed. Storing mcount addresses in rec->ip enables
looking them up in the kprobe hash table later on to check if they're kprobe'd.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: davem@davemloft.net
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 22:10:56 +02:00
Kevin Coffman
d00953a53e gss_krb5: create a define for token header size and clean up ptr location
cleanup:
Document token header size with a #define instead of open-coding it.

Don't needlessly increment "ptr" past the beginning of the header
which makes the values passed to functions more understandable and
eliminates the need for extra "krb5_hdr" pointer.

Clean up some intersecting  white-space issues flagged by checkpatch.pl.

This leaves the checksum length hard-coded at 8 for DES.  A later patch
cleans that up.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:47:25 -04:00
Alan Cox
36c7343b4e tty_driver: Update required method documentation
Some of the requirement rules are now more relaxed. Also correct a
contradiction in the previous update

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-23 10:36:47 -07:00
Miklos Szeredi
8837abcab3 nfsd: rename MAY_ flags
Rename nfsd_permission() specific MAY_* flags to NFSD_MAY_* to make it
clear, that these are not used outside nfsd, and to avoid name and
number space conflicts with the VFS.

[comment from hch: rename MAY_READ, MAY_WRITE and MAY_EXEC as well]

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:50 -04:00
Jeff Layton
a75c5d01e4 sunrpc: remove sv_kill_signal field from svc_serv struct
Since we no longer make any distinction between shutdown signals with
nfsd, then it becomes easier to just standardize on a particular signal
to use to bring it down (SIGINT, in this case).

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:49 -04:00
Jeff Layton
9867d76ca1 knfsd: convert knfsd to kthread API
This patch is rather large, but I couldn't figure out a way to break it
up that would remain bisectable. It does several things:

- change svc_thread_fn typedef to better match what kthread_create expects
- change svc_pool_map_set_cpumask to be more kthread friendly. Make it
  take a task arg and and get rid of the "oldmask"
- have svc_set_num_threads call kthread_create directly
- eliminate __svc_create_thread

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:49 -04:00
Neil Brown
bedbdd8bad knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.
This removes the BKL from the RPC service creation codepath. The BKL
really isn't adequate for this job since some of this info needs
protection across sleeps.

Also, add some comments to try and clarify how the locking should work
and to make it clear that the BKL isn't necessary as long as there is
adequate locking between tasks when touching the svc_serv fields.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:49 -04:00
Benny Halevy
0d169ca136 nfsd: eliminate unused nfs4_callback.cb_stat
The cb_stat member of struct nfs4_callback is unused
since commit ff7d9756 nfsd: use static memory for callback program and stats

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:48 -04:00
Benny Halevy
a5e561fee6 nfsd: eliminate unused nfs4_callback.cb_program
The cb_program member of struct nfs4_callback unused
since commit ff7d9756 nfsd: use static memory for callback program and stats

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:48 -04:00
J. Bruce Fields
7c11337d9d nfsd: remove three unused NFS4_ACE_* defines
These flag bits aren't used by either the protocol or our
implementation, so I don't know why they were here.

Thanks to Johann Dahm for running across these.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Johann Dahm <jdahm@umich.edu>
2008-06-23 13:02:48 -04:00
Denis V. Lunev
f9f48ec72b [patch 4/4] flock: remove unused fields from file_lock_operations
fl_insert and fl_remove are not used right now in the kernel. Remove them.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23 11:52:30 -04:00
Jan Engelhardt
20d4fdc1a7 [patch 2/4] fs: make struct file arg to d_path const
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23 11:52:30 -04:00
Ingo Molnar
1de8644cc7 Merge branch 'linus' into sched/devel 2008-06-23 11:30:23 +02:00
Ingo Molnar
1e74f9cbbb Merge branch 'linus' into core/rcu 2008-06-23 11:29:11 +02:00
Ingo Molnar
f34bfb1bee Merge branch 'linus' into tracing/ftrace 2008-06-23 11:11:42 +02:00