Commit Graph

37110 Commits

Author SHA1 Message Date
Trond Myklebust
b944dba31d NFSv4: Sanity check the server reply in _nfs4_server_capabilities
We don't want to be setting capabilities and/or requesting attributes
that are not appropriate for the NFSv4 minor version.

- Ensure that we clear the NFS_CAP_SECURITY_LABEL capability when appropriate
- Ensure that we limit the attribute bitmasks to the mounted_on_fileid
  attribute and less for NFSv4.0
- Ensure that we limit the attribute bitmasks to suppattr_exclcreat and
  less for NFSv4.1
- Ensure that we limit it to change_sec_label or less for NFSv4.2

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04 16:42:52 -05:00
Trond Myklebust
f3f5a0f8cc NFSv4.2: Fix a mismatch between Linux labeled NFS and the NFSv4.2 spec
In the spec, the security label attribute id is '80', which means that
it should be bit number 80-64 == 16 in the 3rd word of the bitmap.

Fixes: 4488cc96c5: NFS: Add NFSv4.2 protocol constants
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Steve Dickson <steved@redhat.com>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04 15:42:10 -05:00
Trond Myklebust
c698dbf9fe Merge branch 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into linux-next
Pull fs-cache fixes from David Howells:

Can you pull these commits to fix an issue with NFS whereby caching can be
enabled on a file that is open for writing by subsequently opening it for
reading.  This can be made to crash by opening it for writing again if you're
quick enough.

The gist of the patchset is that the cookie should be acquired at inode
creation only and subsequently enabled and disabled as appropriate (which
dispenses with the backing objects when they're not needed).

The extra synchronisation that NFS does can then be dispensed with as it is
thenceforth managed by FS-Cache.

Could you send these on to Linus?

This likely will need fixing also in CIFS and 9P also once the FS-Cache
changes are upstream.  AFS and Ceph are probably safe.

* 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open()
  FS-Cache: Provide the ability to enable/disable cookies
  FS-Cache: Add use/unuse/wake cookie wrappers
2013-10-28 19:36:46 -04:00
Weston Andros Adamson
4d4b69dd84 NFS: add support for multiple sec= mount options
This patch adds support for multiple security options which can be
specified using a colon-delimited list of security flavors (the same
syntax as nfsd's exports file).

This is useful, for instance, when NFSv4.x mounts cross SECINFO
boundaries. With this patch a user can use "sec=krb5i,krb5p"
to mount a remote filesystem using krb5i, but can still cross
into krb5p-only exports.

New mounts will try all security options before failing.  NFSv4.x
SECINFO results will be compared against the sec= flavors to
find the first flavor in both lists or if no match is found will
return -EPERM.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:38:02 -04:00
Weston Andros Adamson
0f5f49b8b3 NFS: cache parsed auth_info in nfs_server
Cache the auth_info structure in nfs_server and pass these values to submounts.

This lays the groundwork for supporting multiple sec= options.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:37:43 -04:00
Weston Andros Adamson
a3f73c27af NFS: separate passed security flavs from selected
When filling parsed_mount_data, store the parsed sec= mount option in
the new struct nfs_auth_info and the chosen flavor in selected_flavor.

This patch lays the groundwork for supporting multiple sec= options.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:36:58 -04:00
Chuck Lever
44c9993384 NFS: Add method to detect whether an FSID is still on the server
Introduce a mechanism for probing a server to determine if an FSID
is present or absent.

The on-the-wire compound is different between minor version 0 and 1.
Minor version 0 appends a RENEW operation to identify which client
ID is probing.  Minor version 1 has a SEQUENCE operation in the
compound which effectively carries the same information.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:30:03 -04:00
Chuck Lever
c9fdeb280b NFS: Add basic migration support to state manager thread
Migration recovery and state recovery must be serialized, so handle
both in the state manager thread.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:24:40 -04:00
Chuck Lever
ce6cda1845 NFS: Add a super_block backpointer to the nfs_server struct
NFS_SB() returns the pointer to an nfs_server struct, given a
pointer to a super_block.  But we have no way to go back the other
way.

Add a super_block backpointer field so that, given an nfs_server
struct, it is easy to get to the filesystem's root dentry.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:24:26 -04:00
Chuck Lever
b03d735b4c NFS: Add method to retrieve fs_locations during migration recovery
The nfs4_proc_fs_locations() function is invoked during referral
processing to perform a GETATTR(fs_locations) on an object's parent
directory in order to discover the target of the referral.  It
performs a LOOKUP in the compound, so the client needs to know the
parent's file handle a priori.

Unfortunately this function is not adequate for handling migration
recovery.  We need to probe fs_locations information on an FSID, but
there's no parent directory available for many operations that
can return NFS4ERR_MOVED.

Another subtlety: recovering from NFS4ERR_LEASE_MOVED is a process
of walking over a list of known FSIDs that reside on the server, and
probing whether they have migrated.  Once the server has detected
that the client has probed all migrated file systems, it stops
returning NFS4ERR_LEASE_MOVED.

A minor version zero server needs to know what client ID is
requesting fs_locations information so it can clear the flag that
forces it to continue returning NFS4ERR_LEASE_MOVED.  This flag is
set per client ID and per FSID.  However, the client ID is not an
argument of either the PUTFH or GETATTR operations.  Later minor
versions have client ID information embedded in the compound's
SEQUENCE operation.

Therefore, by convention, minor version zero clients send a RENEW
operation in the same compound as the GETATTR(fs_locations), since
RENEW's one argument is a clientid4.  This allows a minor version
zero server to identify correctly the client that is probing for a
migration.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:24:00 -04:00
Trond Myklebust
40b00b6b17 SUNRPC: Add a helper to switch the transport of an rpc_clnt
Add an RPC client API to redirect an rpc_clnt's transport from a
source server to a destination server during a migration event.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: forward ported to 3.12 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:21:32 -04:00
Trond Myklebust
99875249bf NFSv4: Ensure that we disable the resend timeout for NFSv4
The spec states that the client should not resend requests because
the server will disconnect if it needs to drop an RPC request.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-01 18:22:11 -04:00
Trond Myklebust
8a19a0b6cb SUNRPC: Add RPC task and client level options to disable the resend timeout
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-01 18:22:11 -04:00
Trond Myklebust
90051ea774 SUNRPC: Clean up - convert xprt_prepare_transmit to return a bool
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-01 18:22:11 -04:00
Linus Torvalds
f927318840 NFS client bugfixes for 3.12
- Stable fix for Oopses in the pNFS files layout driver
 - Fix a regression when doing a non-exclusive file create on NFSv4.x
 - NFSv4.1 security negotiation fixes when looking up the root filesystem
 - Fix a memory ordering issue in the pNFS files layout driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSSfNNAAoJEGcL54qWCgDybGYQAJGm4/vd7/rWZ49KIjGFGkFo
 sCt0UOK6Y6ALhUOIlIreXsQ+Iwn9aAoIIRgx8UwnB+hO6PGnSyFuJZZx1KE8V2kj
 6JlE5FbsWV+3uFQzNJQsNcoj7NZMzIRZT7x+7QansBOdSQjgQc3ig2sAMWREZjn8
 GxMOl8FNRrnP8gRom30ZScgMp1YDM8J1ql80S/nbxh2NOLBsvgg9VapzJhhqkMyl
 b7WKX4Qbg4AeSaxIAIrIwcZ7L2YS09JGC40VSybQARs0/7J8fjOZPs7CmrUCoB5F
 DmT5vfEC4+dqDf8PMyoFVfxK5ua5Sb/FGQmagYYa8bSgY7Uq03akYI++co+4PZU1
 f3SN6CSvVffzGMdXAhUupOZQbkKvKFxR2MTGy8s7dxdkQudd4RioYPDmLfCHlbmb
 VY5kFh/Duqso1FCrcfvZoC88ElrWUz5yoVzZyECOEwCs1wjI6bjmGdSqCSbU75Lm
 Z0XOAn1cStwFvGwCbGZPUzlvueji3coDdCFPBXAOFHzisLYoo/Lxenw7l5D1qM5b
 02iZllcIo340vw8wxHZxVebecFo33P90X1gjv0HQQkV/6EeNgq4D47SWTPxRq3Ai
 Dl9MFjTPl51oseDLrH6I/hBvcqjksB1M1+WjifT0bCIi3Y0HAea2U0wgweHS3vAd
 QHqIpIJxNHDjPBMDWEZW
 =ScfI
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Stable fix for Oopses in the pNFS files layout driver
 - Fix a regression when doing a non-exclusive file create on NFSv4.x
 - NFSv4.1 security negotiation fixes when looking up the root
   filesystem
 - Fix a memory ordering issue in the pNFS files layout driver

* tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Give "flavor" an initial value to fix a compile warning
  NFSv4.1: try SECINFO_NO_NAME flavs until one works
  NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
  NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
  NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method
2013-09-30 17:10:26 -07:00
Linus Torvalds
522d6d38f8 Merge branch 'akpm' (fixes from Andrew Morton)
Merge misc fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
  pidns: fix free_pid() to handle the first fork failure
  ipc,msg: prevent race with rmid in msgsnd,msgrcv
  ipc/sem.c: update sem_otime for all operations
  mm/hwpoison: fix the lack of one reference count against poisoned page
  mm/hwpoison: fix false report on 2nd attempt at page recovery
  mm/hwpoison: fix test for a transparent huge page
  mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
  block: change config option name for cmdline partition parsing
  mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
  mm: avoid reinserting isolated balloon pages into LRU lists
  arch/parisc/mm/fault.c: fix uninitialized variable usage
  include/asm-generic/vtime.h: avoid zero-length file
  nilfs2: fix issue with race condition of competition between segments for dirty blocks
  Documentation/kernel-parameters.txt: replace kernelcore with Movable
  mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
  kernel/kmod.c: check for NULL in call_usermodehelper_exec()
  ipc/sem.c: synchronize the proc interface
  ipc/sem.c: optimize sem_lock()
  ipc/sem.c: fix race in sem_lock()
  mm/compaction.c: periodically schedule when freeing pages
  ...
2013-09-30 14:32:32 -07:00
Rafael Aquini
117aad1e9e mm: avoid reinserting isolated balloon pages into LRU lists
Isolated balloon pages can wrongly end up in LRU lists when
migrate_pages() finishes its round without draining all the isolated
page list.

The same issue can happen when reclaim_clean_pages_from_list() tries to
reclaim pages from an isolated page list, before migration, in the CMA
path.  Such balloon page leak opens a race window against LRU lists
shrinkers that leads us to the following kernel panic:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
  IP: [<ffffffff810c2625>] shrink_page_list+0x24e/0x897
  PGD 3cda2067 PUD 3d713067 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 0 PID: 340 Comm: kswapd0 Not tainted 3.12.0-rc1-22626-g4367597 #87
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  RIP: shrink_page_list+0x24e/0x897
  RSP: 0000:ffff88003da499b8  EFLAGS: 00010286
  RAX: 0000000000000000 RBX: ffff88003e82bd60 RCX: 00000000000657d5
  RDX: 0000000000000000 RSI: 000000000000031f RDI: ffff88003e82bd40
  RBP: ffff88003da49ab0 R08: 0000000000000001 R09: 0000000081121a45
  R10: ffffffff81121a45 R11: ffff88003c4a9a28 R12: ffff88003e82bd40
  R13: ffff88003da0e800 R14: 0000000000000001 R15: ffff88003da49d58
  FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000067d9000 CR3: 000000003ace5000 CR4: 00000000000407b0
  Call Trace:
    shrink_inactive_list+0x240/0x3de
    shrink_lruvec+0x3e0/0x566
    __shrink_zone+0x94/0x178
    shrink_zone+0x3a/0x82
    balance_pgdat+0x32a/0x4c2
    kswapd+0x2f0/0x372
    kthread+0xa2/0xaa
    ret_from_fork+0x7c/0xb0
  Code: 80 7d 8f 01 48 83 95 68 ff ff ff 00 4c 89 e7 e8 5a 7b 00 00 48 85 c0 49 89 c5 75 08 80 7d 8f 00 74 3e eb 31 48 8b 80 18 01 00 00 <48> 8b 74 0d 48 8b 78 30 be 02 00 00 00 ff d2 eb
  RIP  [<ffffffff810c2625>] shrink_page_list+0x24e/0x897
   RSP <ffff88003da499b8>
  CR2: 0000000000000028
  ---[ end trace 703d2451af6ffbfd ]---
  Kernel panic - not syncing: Fatal exception

This patch fixes the issue, by assuring the proper tests are made at
putback_movable_pages() & reclaim_clean_pages_from_list() to avoid
isolated balloon pages being wrongly reinserted in LRU lists.

[akpm@linux-foundation.org: clarify awkward comment text]
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30 14:31:02 -07:00
Mark Brown
3abd38db6d Merge remote-tracking branch 'regulator/fix/doc' into regulator-linus 2013-09-30 12:04:30 +01:00
Linus Torvalds
c23c2234de Char / Misc driver fixes for 3.12-rc3
Here are some HyperV and MEI driver fixes for 3.12-rc3.  They resolve some
 issues that people have been reporting for them.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.21 (GNU/Linux)
 
 iEYEABECAAYFAlJIgcwACgkQMUfUDdst+ynRGwCgoIIGIXG2RTVaTDm8wezRERGA
 ZDoAoKZe5Bec2CSqydcPNZWYg4ATTMjn
 =6r+r
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-3.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some HyperV and MEI driver fixes for 3.12-rc3.  They resolve
  some issues that people have been reporting for them"

* tag 'char-misc-3.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Drivers: hv: vmbus: Terminate vmbus version negotiation on timeout
  Drivers: hv: util: Correctly support ws2008R2 and earlier
  mei: cancel stall timers in mei_reset
  mei: bus: stop wait for read during cl state transition
  mei: make me client counters less error prone
2013-09-29 13:44:48 -07:00
Linus Torvalds
aeebc26457 Merge branch 'lockref' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 lockref enablement from Heiko Carstens:
 "Enabling the new lockless lockref variant on s390 would have been
  trivial until Tony Luck added a cpu_relax() call into the
  CMPXCHG_LOOP(), with commit d472d9d98b ("lockref: Relax in cmpxchg
  loop")

  As already mentioned cpu_relax() is very expensive on s390 since it
  yields() the current virtual cpu.  So we are talking of several
  thousand cycles.  Considering this enabling the lockless lockref
  variant would contradict the intention of the new semantics.  And also
  some quick measurements show performance regressions of 50% and more.

  Simply removing the cpu_relax() call again seems also not very
  desireable since Waiman Long reported that for some workloads the call
  improved performance by 5%."

* 'lockref' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: enable ARCH_USE_CMPXCHG_LOCKREF
  lockref: use arch_mutex_cpu_relax() in CMPXCHG_LOOP()
  mutex: replace CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX with simple ifdef
2013-09-28 12:36:19 -07:00
Linus Torvalds
f2e98aa830 DeviceTree fixes for 3.12
Clean-up to fix some warnings for !OF builds and spelling fixes in docs
 
 - Clean-up openrisc prom.h
 - Fix warnings caused by of_irq.h ifdefs
 - Spelling fix for Synopsys
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJSQk1CAAoJEMhvYp4jgsXilvEH/RztxbKNiscz7TYg3BDEKMaY
 Wg029sdygyhl+Xb78FaS3Qwt98pK2JSWqzoNv+Tv4tfksGeLWVpe6HZlGvE/PEhb
 2gUppV/ILyFSH8NfeaPkfyqdqr2XpCkcpkyGgYz7jQzNti5si448LC4oJR4tcXbo
 FyKguRPpTtCwDsG86fPpQgc3fHK+C7W0oQu+PCB0DEk2ApMR05rGpPiMEYsKlj56
 FE9n0DMZ+hfMCUivraVTyHB9Xjo4/kTv9s88hkjwkli+N3UrIUgOGsQpNt27QIFP
 F95roRtVOmgRbgtMAFaEy7nfd+3QuVfonjfYNJzxpRHWew7LxFcVEwWRjb0zaKQ=
 =KDHt
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull DeviceTree fixes from Rob Herring:
 "Clean-up to fix some warnings for !OF builds and spelling fixes in
  docs:

   - Clean-up openrisc prom.h
   - Fix warnings caused by of_irq.h ifdefs
   - Spelling fix for Synopsys"

* tag 'devicetree-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dts: Fix misspelling of Synopsys
  of: clean-up ifdefs in of_irq.h
  openrisc: clean-up prom.h
2013-09-28 11:57:26 -07:00
Heiko Carstens
083986e824 mutex: replace CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX with simple ifdef
Linus suggested to replace

 #ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
 #define arch_mutex_cpu_relax() cpu_relax()
 #endif

with just a simple

  #ifndef arch_mutex_cpu_relax
  # define arch_mutex_cpu_relax() cpu_relax()
  #endif

to get rid of CONFIG_HAVE_CPU_RELAX_SIMPLE. So architectures can
simply define arch_mutex_cpu_relax if they want an architecture
specific function instead of having to add a select statement in
their Kconfig in addition.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-28 12:46:21 +02:00
David Howells
f1fe29b4a0 NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open()
Use i_writecount to control whether to get an fscache cookie in nfs_open() as
NFS does not do write caching yet.  I *think* this is the cause of a problem
encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL
pointer dereference because cookie->def is NULL:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
PGD 0
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP
Modules linked in: ...
CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1
Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012
task: ffff8804203460c0 ti: ffff880420346640
RIP: 0010:[<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
RSP: 0018:ffff8801053af878 EFLAGS: 00210286
RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8
RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780
RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440
R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0
Stack:
...
Call Trace:
[<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70
[<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90
[<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90
[<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620
[<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0
[<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70
[<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40
[<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70
[<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120
[<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0
[<ffffffff81357f78>] nfs_setattr+0xc8/0x150
[<ffffffff8122d95b>] notify_change+0x1cb/0x390
[<ffffffff8120a55b>] do_truncate+0x7b/0xc0
[<ffffffff8121f96c>] do_last+0xa4c/0xfd0
[<ffffffff8121ffbc>] path_openat+0xcc/0x670
[<ffffffff81220a0e>] do_filp_open+0x4e/0xb0
[<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0
[<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50
[<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24

The code at the instruction pointer was disassembled:

> (gdb) disas __fscache_uncache_page
> Dump of assembler code for function __fscache_uncache_page:
> ...
> 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax
> 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax)
> 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237>

These instructions make up:

	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);

That cmpb is the faulting instruction (%rax is 0).  So cookie->def is NULL -
which presumably means that the cookie has already been at least partway
through __fscache_relinquish_cookie().

What I think may be happening is something like a three-way race on the same
file:

	PROCESS 1	PROCESS 2	PROCESS 3
	===============	===============	===============
	open(O_TRUNC|O_WRONLY)
			open(O_RDONLY)
					open(O_WRONLY)
	-->nfs_open()
	-->nfs_fscache_set_inode_cookie()
	nfs_fscache_inode_lock()
	nfs_fscache_disable_inode_cookie()
	__fscache_relinquish_cookie()
	nfs_inode->fscache = NULL
	<--nfs_fscache_set_inode_cookie()

			-->nfs_open()
			-->nfs_fscache_set_inode_cookie()
			nfs_fscache_inode_lock()
			nfs_fscache_enable_inode_cookie()
			__fscache_acquire_cookie()
			nfs_inode->fscache = cookie
			<--nfs_fscache_set_inode_cookie()
	<--nfs_open()
	-->nfs_setattr()
	...
	...
	-->nfs_invalidate_page()
	-->__nfs_fscache_invalidate_page()
	cookie = nfsi->fscache
					-->nfs_open()
					-->nfs_fscache_set_inode_cookie()
					nfs_fscache_inode_lock()
					nfs_fscache_disable_inode_cookie()
					-->__fscache_relinquish_cookie()
	-->__fscache_uncache_page(cookie)
	<crash>
					<--__fscache_relinquish_cookie()
					nfs_inode->fscache = NULL
					<--nfs_fscache_set_inode_cookie()

What is needed is something to prevent process #2 from reacquiring the cookie
- and I think checking i_writecount should do the trick.

It's also possible to have a two-way race on this if the file is opened
O_TRUNC|O_RDONLY instead.

Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-27 18:40:25 +01:00
David Howells
94d30ae90a FS-Cache: Provide the ability to enable/disable cookies
Provide the ability to enable and disable fscache cookies.  A disabled cookie
will reject or ignore further requests to:

	Acquire a child cookie
	Invalidate and update backing objects
	Check the consistency of a backing object
	Allocate storage for backing page
	Read backing pages
	Write to backing pages

but still allows:

	Checks/waits on the completion of already in-progress objects
	Uncaching of pages
	Relinquishment of cookies

Two new operations are provided:

 (1) Disable a cookie:

	void fscache_disable_cookie(struct fscache_cookie *cookie,
				    bool invalidate);

     If the cookie is not already disabled, this locks the cookie against other
     dis/enablement ops, marks the cookie as being disabled, discards or
     invalidates any backing objects and waits for cessation of activity on any
     associated object.

     This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
     but it reinitialises the cookie such that it can be reenabled.

     All possible failures are handled internally.  The caller should consider
     calling fscache_uncache_all_inode_pages() afterwards to make sure all page
     markings are cleared up.

 (2) Enable a cookie:

	void fscache_enable_cookie(struct fscache_cookie *cookie,
				   bool (*can_enable)(void *data),
				   void *data)

     If the cookie is not already enabled, this locks the cookie against other
     dis/enablement ops, invokes can_enable() and, if the cookie is not an
     index cookie, will begin the procedure of acquiring backing objects.

     The optional can_enable() function is passed the data argument and returns
     a ruling as to whether or not enablement should actually be permitted to
     begin.

     All possible failures are handled internally.  The cookie will only be
     marked as enabled if provisional backing objects are allocated.

A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
is then contingent on i_writecount <= 0.  can_enable() checks for a race
between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
handling and allows us to get rid of open(O_RDONLY) accidentally introducing
caching to an inode that's open for writing already.

One operation has its API modified:

 (3) Acquire a cookie.

	struct fscache_cookie *fscache_acquire_cookie(
		struct fscache_cookie *parent,
		const struct fscache_cookie_def *def,
		void *netfs_data,
		bool enable);

     This now has an additional argument that indicates whether the requested
     cookie should be enabled by default.  It doesn't need the can_enable()
     function because the caller must prevent multiple calls for the same netfs
     object and it doesn't need to take the enablement lock because no one else
     can get at the cookie before this returns.

Signed-off-by: David Howells <dhowells@redhat.com
2013-09-27 18:40:25 +01:00
David Howells
8fb883f3e3 FS-Cache: Add use/unuse/wake cookie wrappers
Add wrapper functions for dealing with cookie->n_active:

 (*) __fscache_use_cookie() to increment it.

 (*) __fscache_unuse_cookie() to decrement and test against zero.

 (*) __fscache_wake_unused_cookie() to wake up anyone waiting for it to reach
     zero.

The second and third are split so that the third can be done after cookie->lock
has been released in case the waiter wakes up whilst we're still holding it and
tries to get it.

We will need to wake-on-zero once the cookie disablement patch is applied
because it will then be possible to see n_active become zero without the cookie
being relinquished.

Also move the cookie usement out of fscache_attr_changed_op() and into
fscache_attr_changed() and the operation struct so that cookie disablement
will be able to track it.

Whilst we're at it, only increment n_active if we're about to do
fscache_submit_op() so that we don't have to deal with undoing it if anything
earlier fails.  Possibly this should be moved into fscache_submit_op() which
could look at FSCACHE_OP_UNUSE_COOKIE.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-27 18:40:25 +01:00
K. Y. Srinivasan
3a4916050b Drivers: hv: util: Correctly support ws2008R2 and earlier
The current code does not correctly negotiate the version numbers for the util
driver when hosted on earlier hosts. The version numbers presented by this
driver were not compatible with the version numbers supported by Windows Server
2008. Fix this problem.

I would like to thank Olaf Hering (ohering@suse.com) for identifying the problem.

Reported-by: Olaf Hering <ohering@suse.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26 14:20:21 -07:00
Trond Myklebust
5bc2afc2b5 NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method
Determine if we've created a new file by examining the directory change
attribute and/or the O_EXCL flag.

This fixes a regression when doing a non-exclusive create of a new file.
If the FILE_CREATED flag is not set, the atomic_open() command will
perform full file access permissions checks instead of just checking
for MAY_OPEN.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-26 10:20:18 -04:00
Linus Torvalds
e93dd910b9 A set of device-mapper fixes for 3.12.
A few fixes for dm-snapshot, a 32 bit fix for dm-stats, a couple error
 handling fixes for dm-multipath.  A fix for the thin provisioning target
 to not expose non-zero discard limits if discards are disabled.
 
 Lastly, add two DM module parameters which allow users to tune the
 emergency memory reserves that DM mainatins per device -- this helps fix
 a long-standing issue for dm-multipath.  The conservative default
 reserve for request-based dm-multipath devices (256) has proven
 problematic for users with many multipathed SCSI devices but relatively
 little memory.  To responsibly select a smaller value users should use
 the new nr_bios tracepoint info (via commit 75afb352 "block: Add nr_bios
 to block_rq_remap tracepoint") to determine the peak number of bios
 their workloads create.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSQMVHAAoJEMUj8QotnQNaOXgIAJS6/XJKMoHfiDJ9M+XD34rZ
 Uyr9TEnubX3DKCRBiY23MUcCQn3fx6BjCGv5/c8L4jQFIuLyDi2yatqpwXcbGSJh
 G/S/y6u0Axek+ew7TS80OFop4nblW6MoKnoh9/4N55Ofa+1WvKM4ERUGjHGbauyS
 TxmLQPToCFPLYRIOZ+imd6hQuIZ1+FFdJFvi7kY9O6Llx2sLD6fWi1iruBd/Da2H
 ByMX3biGN45mSpcBzRbSC/FkJ9CRIvT9n82BDPS0o3Tllt8NaVlEDaovB7h4ncc0
 bFuT2Z3Q38B9uZ8Lj0bqdGzv3kXMLCkLo6WhWjyUt84hmDPAzRpBwt60jUqWyZs=
 =bjVp
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device-mapper fixes from Mike Snitzer:
 "A few fixes for dm-snapshot, a 32 bit fix for dm-stats, a couple error
  handling fixes for dm-multipath.  A fix for the thin provisioning
  target to not expose non-zero discard limits if discards are disabled.

  Lastly, add two DM module parameters which allow users to tune the
  emergency memory reserves that DM mainatins per device -- this helps
  fix a long-standing issue for dm-multipath.  The conservative default
  reserve for request-based dm-multipath devices (256) has proven
  problematic for users with many multipathed SCSI devices but
  relatively little memory.  To responsibly select a smaller value users
  should use the new nr_bios tracepoint info (via commit 75afb352
  "block: Add nr_bios to block_rq_remap tracepoint") to determine the
  peak number of bios their workloads create"

* tag 'dm-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: add reserved_bio_based_ios module parameter
  dm: add reserved_rq_based_ios module parameter
  dm: lower bio-based mempool reservation
  dm thin: do not expose non-zero discard limits if discards disabled
  dm mpath: disable WRITE SAME if it fails
  dm-snapshot: fix performance degradation due to small hash size
  dm snapshot: workaround for a false positive lockdep warning
  dm stats: fix possible counter corruption on 32-bit systems
  dm mpath: do not fail path on -ENOSPC
2013-09-25 15:12:46 -07:00
Rob Herring
b0b8c960ff of: clean-up ifdefs in of_irq.h
Much of of_irq.h is needlessly ifdef'ed. Clean this up and minimize the
amount ifdef'ed code. This fixes some  build warnings when CONFIG_OF
is not enabled (seen on i386 and x86_64):

include/linux/of_irq.h:82:7: warning: 'struct device_node' declared inside parameter list [enabled by default]
include/linux/of_irq.h:82:7: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
include/linux/of_irq.h:87:47: warning: 'struct device_node' declared inside parameter list [enabled by default]

Compile tested on i386, sparc and arm.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
2013-09-24 21:12:32 -05:00
Andrew Morton
0608f43da6 revert "memcg, vmscan: integrate soft reclaim tighter with zone shrinking code"
Revert commit 3b38722efd ("memcg, vmscan: integrate soft reclaim
tighter with zone shrinking code")

I merged this prematurely - Michal and Johannes still disagree about the
overall design direction and the future remains unclear.

Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 17:00:26 -07:00
Andrew Morton
b1aff7fcf8 revert "vmscan, memcg: do softlimit reclaim also for targeted reclaim"
Revert commit a5b7c87f92 ("vmscan, memcg: do softlimit reclaim also
for targeted reclaim")

I merged this prematurely - Michal and Johannes still disagree about the
overall design direction and the future remains unclear.

Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 17:00:26 -07:00
Andrew Morton
694fbc0fe7 revert "memcg: enhance memcg iterator to support predicates"
Revert commit de57780dc6 ("memcg: enhance memcg iterator to support
predicates")

I merged this prematurely - Michal and Johannes still disagree about the
overall design direction and the future remains unclear.

Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 17:00:26 -07:00
Michal Hocko
9809b18fcf watchdog: update watchdog_thresh properly
watchdog_tresh controls how often nmi perf event counter checks per-cpu
hrtimer_interrupts counter and blows up if the counter hasn't changed
since the last check.  The counter is updated by per-cpu
watchdog_hrtimer hrtimer which is scheduled with 2/5 watchdog_thresh
period which guarantees that hrtimer is scheduled 2 times per the main
period.  Both hrtimer and perf event are started together when the
watchdog is enabled.

So far so good.  But...

But what happens when watchdog_thresh is updated from sysctl handler?

proc_dowatchdog will set a new sampling period and hrtimer callback
(watchdog_timer_fn) will use the new value in the next round.  The
problem, however, is that nobody tells the perf event that the sampling
period has changed so it is ticking with the period configured when it
has been set up.

This might result in an ear ripping dissonance between perf and hrtimer
parts if the watchdog_thresh is increased.  And even worse it might lead
to KABOOM if the watchdog is configured to panic on such a spurious
lockup.

This patch fixes the issue by updating both nmi perf even counter and
hrtimers if the threshold value has changed.

The nmi one is disabled and then reinitialized from scratch.  This has
an unpleasant side effect that the allocation of the new event might
fail theoretically so the hard lockup detector would be disabled for
such cpus.  On the other hand such a memory allocation failure is very
unlikely because the original event is deallocated right before.

It would be much nicer if we just changed perf event period but there
doesn't seem to be any API to do that right now.  It is also unfortunate
that perf_event_alloc uses GFP_KERNEL allocation unconditionally so we
cannot use on_each_cpu() and do the same thing from the per-cpu context.
The update from the current CPU should be safe because
perf_event_disable removes the event atomically before it clears the
per-cpu watchdog_ev so it cannot change anything under running handler
feet.

The hrtimer is simply restarted (thanks to Don Zickus who has pointed
this out) if it is queued because we cannot rely it will fire&adopt to
the new sampling period before a new nmi event triggers (when the
treshold is decreased).

[akpm@linux-foundation.org: the UP version of __smp_call_function_single ended up in the wrong place]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 17:00:25 -07:00
Linus Torvalds
68cf8d0c72 Merge branch 'for-3.12/core' of git://git.kernel.dk/linux-block
Pull block IO fixes from Jens Axboe:
 "After merge window, no new stuff this time only a collection of neatly
  confined and simple fixes"

* 'for-3.12/core' of git://git.kernel.dk/linux-block:
  cfq: explicitly use 64bit divide operation for 64bit arguments
  block: Add nr_bios to block_rq_remap tracepoint
  If the queue is dying then we only call the rq->end_io callout. This leaves bios setup on the request, because the caller assumes when the blk_execute_rq_nowait/blk_execute_rq call has completed that the rq->bios have been cleaned up.
  bio-integrity: Fix use of bs->bio_integrity_pool after free
  blkcg: relocate root_blkg setting and clearing
  block: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
  block: trace all devices plug operation
2013-09-22 15:00:11 -07:00
Jun'ichi Nomura
75afb35299 block: Add nr_bios to block_rq_remap tracepoint
Adding the number of bios in a remapped request to 'block_rq_remap'
tracepoint.

Request remapper clones bios in a request to track the completion
status of each bio. So the number of bios can be useful information
for investigation.

Related discussions:
  http://www.redhat.com/archives/dm-devel/2013-August/msg00084.html
  http://www.redhat.com/archives/dm-devel/2013-September/msg00024.html

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-09-21 13:57:47 -06:00
Mike Snitzer
f84cb8a46a dm mpath: disable WRITE SAME if it fails
Workaround the SCSI layer's problematic WRITE SAME heuristics by
disabling WRITE SAME in the DM multipath device's queue_limits if an
underlying device disabled it.

The WRITE SAME heuristics, with both the original commit 5db44863b6
("[SCSI] sd: Implement support for WRITE SAME") and the updated commit
66c28f971 ("[SCSI] sd: Update WRITE SAME heuristics"), default to enabling
WRITE SAME(10) even without successfully determining it is supported.
After the first failed WRITE SAME the SCSI layer will disable WRITE SAME
for the device (by setting sdkp->device->no_write_same which results in
'max_write_same_sectors' in device's queue_limits to be set to 0).

When a device is stacked ontop of such a SCSI device any changes to that
SCSI device's queue_limits do not automatically propagate up the stack.
As such, a DM multipath device will not have its WRITE SAME support
disabled.  This causes the block layer to continue to issue WRITE SAME
requests to the mpath device which causes paths to fail and (if mpath IO
isn't configured to queue when no paths are available) it will result in
actual IO errors to the upper layers.

This fix doesn't help configurations that have additional devices
stacked ontop of the mpath device (e.g. LVM created linear DM devices
ontop).  A proper fix that restacks all the queue_limits from the bottom
of the device stack up will need to be explored if SCSI will continue to
use this model of optimistically allowing op codes and then disabling
them after they fail for the first time.

Before this patch:

EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
device-mapper: multipath: XXX snitm debugging: failing WRITE SAME IO with error=-121
end_request: critical target error, dev dm-6, sector 528
dm-6: WRITE SAME failed. Manually zeroing.
device-mapper: multipath: Failing path 8:112.
end_request: I/O error, dev dm-6, sector 4616
dm-6: WRITE SAME failed. Manually zeroing.
end_request: I/O error, dev dm-6, sector 4616
end_request: I/O error, dev dm-6, sector 5640
end_request: I/O error, dev dm-6, sector 6664
end_request: I/O error, dev dm-6, sector 7688
end_request: I/O error, dev dm-6, sector 524288
Buffer I/O error on device dm-6, logical block 65536
lost page write due to I/O error on dm-6
JBD2: Error -5 detected when updating journal superblock for dm-6-8.
end_request: I/O error, dev dm-6, sector 524296
Aborting journal on device dm-6-8.
end_request: I/O error, dev dm-6, sector 524288
Buffer I/O error on device dm-6, logical block 65536
lost page write due to I/O error on dm-6
JBD2: Error -5 detected when updating journal superblock for dm-6-8.

# cat /sys/block/sdh/queue/write_same_max_bytes
0
# cat /sys/block/dm-6/queue/write_same_max_bytes
33553920

After this patch:

EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
device-mapper: multipath: XXX snitm debugging: WRITE SAME I/O failed with error=-121
end_request: critical target error, dev dm-6, sector 528
dm-6: WRITE SAME failed. Manually zeroing.

# cat /sys/block/sdh/queue/write_same_max_bytes
0
# cat /sys/block/dm-6/queue/write_same_max_bytes
0

It should be noted that WRITE SAME support wasn't enabled in DM
multipath until v3.10.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org # 3.10+
2013-09-20 10:36:34 -04:00
Linus Torvalds
b75ff5e84b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) If the local_df boolean is set on an SKB we have to allocate a
    unique ID even if IP_DF is set in the ipv4 headers, from Ansis
    Atteka.

 2) Some fixups for the new chipset support that went into the sfc
    driver, from Ben Hutchings.

 3) Because SCTP bypasses a good chunk of, and actually duplicates, the
    logic of the ipv6 output path, some IPSEC things don't get done
    properly.  Integrate SCTP better into the ipv6 output path so that
    these problems are fixed and such issues don't get missed in the
    future either.  From Daniel Borkmann.

 4) Fix skge regressions added by the DMA mapping error return checking
    added in v3.10, from Mikulas Patocka.

 5) Kill some more IRQF_DISABLED references, from Michael Opdenacker.

 6) Fix races and deadlocks in the bridging code, from Hong Zhiguo.

 7) Fix error handling in tun_set_iff(), in particular don't leak
    resources.  From Jason Wang.

 8) Prevent format-string injection into xen-netback driver, from Kees
    Cook.

 9) Fix regression added to netpoll ARP packet handling, in particular
    check for the right ETH_P_ARP protocol code.  From Sonic Zhang.

10) Try to deal with AMD IOMMU errors when using r8169 chips, from
    Francois Romieu.

11) Cure freezes due to recent changes in the rt2x00 wireless driver,
    from Stanislaw Gruszka.

12) Don't do SPI transfers (which can sleep) in interrupt context in
    cw1200 driver, from Solomon Peachy.

13) Fix LEDs handling bug in 5720 tg3 chips already handled for 5719.
    From Nithin Sujir.

14) Make xen_netbk_count_skb_slots() count the actual number of slots
    that will be used, taking into consideration packing and other
    issues that the transmit path will run into.  From David Vrabel.

15) Use the correct maximum age when calculating the bridge
    message_age_timer, from Chris Healy.

16) Get rid of memory leaks in mcs7780 IRDA driver, from Alexey
    Khoroshilov.

17) Netfilter conntrack extensions were converted to RCU but are not
    always freed properly using kfree_rcu().  Fix from Michal Kubecek.

18) VF reset recovery not being done correctly in qlcnic driver, from
    Manish Chopra.

19) Fix inverted test in ATM nicstar driver, from Andy Shevchenko.

20) Missing workqueue destroy in cxgb4 error handling, from Wei Yang.

21) Internal switch not initialized properly in bgmac driver, from Rafał
    Miłecki.

22) Netlink messages report wrong local and remote addresses in IPv6
    tunneling, from Ding Zhi.

23) ICMP redirects should not generate socket errors in DCCP and SCTP.
    We're still working out how this should be handled for RAW and UDP
    sockets.  From Daniel Borkmann and Duan Jiong.

24) We've had several bugs wherein the network namespace's loopback
    device gets accessed after it is free'd, NULL it out so that we can
    catch these problems more readily.  From Eric W Biederman.

25) Fix regression in TCP RTO calculations, from Neal Cardwell.

26) Fix too early free of xen-netback network device when VIFs still
    exist.  From Paul Durrant.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  netconsole: fix a deadlock with rtnl and netconsole's mutex
  netpoll: fix NULL pointer dereference in netpoll_cleanup
  skge: fix broken driver
  ip: generate unique IP identificator if local fragmentation is allowed
  ip: use ip_hdr() in __ip_make_skb() to retrieve IP header
  xen-netback: Don't destroy the netdev until the vif is shut down
  net:dccp: do not report ICMP redirects to user space
  cnic: Fix crash in cnic_bnx2x_service_kcq()
  bnx2x, cnic, bnx2i, bnx2fc: Fix bnx2i and bnx2fc regressions.
  vxlan: Avoid creating fdb entry with NULL destination
  tcp: fix RTO calculated from cached RTT
  drivers: net: phy: cicada.c: clears warning Use #include <linux/io.h> instead of <asm/io.h>
  net loopback: Set loopback_dev to NULL when freed
  batman-adv: set the TAG flag for the vid passed to BLA
  netfilter: nfnetlink_queue: use network skb for sequence adjustment
  net: sctp: rfc4443: do not report ICMP redirects to user space
  net: usb: cdc_ether: use usb.h macros whenever possible
  net: usb: cdc_ether: fix checkpatch errors and warnings
  net: usb: cdc_ether: Use wwan interface for Telit modules
  ip6_tunnels: raddr and laddr are inverted in nl msg
  ...
2013-09-19 13:57:28 -05:00
Linus Torvalds
e9ff04dd94 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
 "These fix several bugs with RBD from 3.11 that didn't get tested in
  time for the merge window: some error handling, a use-after-free, and
  a sequencing issue when unmapping and image races with a notify
  operation.

  There is also a patch fixing a problem with the new ceph + fscache
  code that just went in"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  fscache: check consistency does not decrement refcount
  rbd: fix error handling from rbd_snap_name()
  rbd: ignore unmapped snapshots that no longer exist
  rbd: fix use-after free of rbd_dev->disk
  rbd: make rbd_obj_notify_ack() synchronous
  rbd: complete notifies before cleaning up osd_client and rbd_dev
  libceph: add function to ensure notifies are complete
2013-09-19 12:50:37 -05:00
Linus Torvalds
9d2cd7048b Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "An NTP related lockup fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Fix HRTICK related deadlock from ntp lock changes
2013-09-18 11:24:49 -05:00
Linus Torvalds
62d228b8c6 Merge branch 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Gleb Natapov.

* 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: set "blocked by NMI" flag if EPT violation happens during IRET from NMI
  kvm: free resources after canceling async_pf
  KVM: nEPT: reset PDPTR register cache on nested vmentry emulation
  KVM: mmu: allow page tables to be in read-only slots
  KVM: x86 emulator: emulate RETF imm
2013-09-17 22:20:30 -04:00
Linus Torvalds
84fca9f38c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
 "Fixes for CVE-2013-2897, CVE-2013-2895, CVE-2013-2897, CVE-2013-2894,
  CVE-2013-2893, CVE-2013-2891, CVE-2013-2890, CVE-2013-2889.

  All the bugs are triggerable only by specially crafted evil-on-purpose
  HW devices.  Fixes by Kees Cook and Benjamin Tissoires"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: lenovo-tpkbd: fix leak if tpkbd_probe_tp fails
  HID: multitouch: validate indexes details
  HID: logitech-dj: validate output report details
  HID: validate feature and input report details
  HID: lenovo-tpkbd: validate output report details
  HID: LG: validate HID output report details
  HID: steelseries: validate output report details
  HID: sony: validate HID output report details
  HID: zeroplus: validate output report details
  HID: provide a helper for validating hid reports
2013-09-17 21:54:05 -04:00
David S. Miller
61c5923a2f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes for you net tree,
mostly targeted to ipset, they are:

* Fix ICMPv6 NAT due to wrong comparison, code instead of type, from
  Phil Oester.

* Fix RCU race in conntrack extensions release path, from Michal Kubecek.

* Fix missing inversion in the userspace ipset test command match if
  the nomatch option is specified, from Jozsef Kadlecsik.

* Skip layer 4 protocol matching in ipset in case of IPv6 fragments,
  also from Jozsef Kadlecsik.

* Fix sequence adjustment in nfnetlink_queue due to using the netlink
  skb instead of the network skb, from Gao feng.

* Make sure we cannot swap of sets with different layer 3 family in
  ipset, from Jozsef Kadlecsik.

* Fix possible bogus matching in ipset if hash sets with net elements
  are used, from Oliver Smith.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-17 20:22:53 -04:00
Randy Dunlap
7c2330f1af regulator: fix fatal kernel-doc error
Fix fatal kernel-doc error in <linux/regulator/driver.h>:

Error(include/linux/regulator/driver.h:52): cannot understand prototype: 'struct regulator_linear_range '

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
[Rewrote first line -- broonie]
Signed-off-by: Mark Brown <broonie@linaro.org>
2013-09-17 11:49:25 +01:00
Paolo Bonzini
ba6a354154 KVM: mmu: allow page tables to be in read-only slots
Page tables in a read-only memory slot will currently cause a triple
fault because the page walker uses gfn_to_hva and it fails on such a slot.

OVMF uses such a page table; however, real hardware seems to be fine with
that as long as the accessed/dirty bits are set.  Save whether the slot
is readonly, and later check it when updating the accessed and dirty bits.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-17 12:52:31 +03:00
Linus Torvalds
a4ae54f90e Merge branch 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer code update from Thomas Gleixner:
 - armada SoC clocksource overhaul with a trivial merge conflict
 - Minor improvements to various SoC clocksource drivers

* 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: armada-370-xp: Add detailed clock requirements in devicetree binding
  clocksource: armada-370-xp: Get reference fixed-clock by name
  clocksource: armada-370-xp: Replace WARN_ON with BUG_ON
  clocksource: armada-370-xp: Fix device-tree binding
  clocksource: armada-370-xp: Introduce new compatibles
  clocksource: armada-370-xp: Use CLOCKSOURCE_OF_DECLARE
  clocksource: armada-370-xp: Simplify TIMER_CTRL register access
  clocksource: armada-370-xp: Use BIT()
  ARM: timer-sp: Set dynamic irq affinity
  ARM: nomadik: add dynamic irq flag to the timer
  clocksource: sh_cmt: 32-bit control register support
  clocksource: em_sti: Convert to devm_* managed helpers
2013-09-16 16:10:26 -04:00
Jozsef Kadlecsik
0f1799ba1a netfilter: ipset: Consistent userspace testing with nomatch flag
The "nomatch" commandline flag should invert the matching at testing,
similarly to the --return-nomatch flag of the "set" match of iptables.
Until now it worked with the elements with "nomatch" flag only. From
now on it works with elements without the flag too, i.e:

 # ipset n test hash:net
 # ipset a test 10.0.0.0/24 nomatch
 # ipset t test 10.0.0.1
 10.0.0.1 is NOT in set test.
 # ipset t test 10.0.0.1 nomatch
 10.0.0.1 is in set test.

 # ipset a test 192.168.0.0/24
 # ipset t test 192.168.0.1
 192.168.0.1 is in set test.
 # ipset t test 192.168.0.1 nomatch
 192.168.0.1 is NOT in set test.

 Before the patch the results were

 ...
 # ipset t test 192.168.0.1
 192.168.0.1 is in set test.
 # ipset t test 192.168.0.1 nomatch
 192.168.0.1 is in set test.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16 20:35:55 +02:00
Joseph Gasparakis
35e4237973 vxlan: Fix sparse warnings
This patch fixes sparse warnings when incorrectly handling the port number
and using int instead of unsigned int iterating through &vn->sock_list[].
Keeping the port as __be16 also makes things clearer wrt endianess.
Also, it was pointed out that vxlan_get_rx_port() had unnecessary checks
which got removed.

Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-15 22:18:13 -04:00
Linus Torvalds
0375ec5899 SCSI misc on 20130915
This patch set is a set of driver updates (megaraid_sas, fnic, lpfc, ufs,
 hpsa) we also have a couple of bug fixes (sd out of bounds and ibmvfc error
 handling) and the first round of esas2r checker fixes and finally the much
 anticipated big endian additions for megaraid_sas.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJSNheiAAoJEDeqqVYsXL0MueMIAKD1kaB0oooRawE1+0vpKmyV
 eE2M6trA8ofTeq0z1eNfRsVMkRsUuG9exW0CKS2z6mHiWwQ/zGbqT7ukveW+dMi3
 mjKD0yO5ODk6bohWX/LiwZ6NGZSwC0dbIacXNy5ZsXKEizqwo1Jcc7qC/0AWn+o7
 WpIL48XLPH0HqjQZ3dvgC6TWeFZOn9cKOWvQQq0S3ENALOx/eLZ+C7VrJLx5Magv
 myNOUkTLzdlYglQfjaNO6et98k2oHTrzKwH7U2X6U75q7L8Pkj4RbNzce/Ge301V
 u+R1w+BlbeTPdHopTBoTJupsvqDYBZxVwS7rr8nhSvfKduQppHnN6jX8yR4XNeM=
 =RG3j
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull misc SCSI driver updates from James Bottomley:
 "This patch set is a set of driver updates (megaraid_sas, fnic, lpfc,
  ufs, hpsa) we also have a couple of bug fixes (sd out of bounds and
  ibmvfc error handling) and the first round of esas2r checker fixes and
  finally the much anticipated big endian additions for megaraid_sas"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (47 commits)
  [SCSI] fnic: fnic Driver Tuneables Exposed through CLI
  [SCSI] fnic: Kernel panic while running sh/nosh with max lun cfg
  [SCSI] fnic: Hitting BUG_ON(io_req->abts_done) in fnic_rport_exch_reset
  [SCSI] fnic: Remove QUEUE_FULL handling code
  [SCSI] fnic: On system with >1.1TB RAM, VIC fails multipath after boot up
  [SCSI] fnic: FC stat param seconds_since_last_reset not getting updated
  [SCSI] sd: Fix potential out-of-bounds access
  [SCSI] lpfc 8.3.42: Update lpfc version to driver version 8.3.42
  [SCSI] lpfc 8.3.42: Fixed issue of task management commands having a fixed timeout
  [SCSI] lpfc 8.3.42: Fixed inconsistent spin lock usage.
  [SCSI] lpfc 8.3.42: Fix driver's abort loop functionality to skip IOs already getting aborted
  [SCSI] lpfc 8.3.42: Fixed failure to allocate SCSI buffer on PPC64 platform for SLI4 devices
  [SCSI] lpfc 8.3.42: Fix WARN_ON when driver unloads
  [SCSI] lpfc 8.3.42: Avoided making pci bar ioremap call during dual-chute WQ/RQ pci bar selection
  [SCSI] lpfc 8.3.42: Fixed driver iocbq structure's iocb_flag field running out of space
  [SCSI] lpfc 8.3.42: Fix crash on driver load due to cpu affinity logic
  [SCSI] lpfc 8.3.42: Fixed logging format of setting driver sysfs attributes hard to interpret
  [SCSI] lpfc 8.3.42: Fixed back to back RSCNs discovery failure.
  [SCSI] lpfc 8.3.42: Fixed race condition between BSG I/O dispatch and timeout handling
  [SCSI] lpfc 8.3.42: Fixed function mode field defined too small for not recognizing dual-chute mode
  ...
2013-09-15 17:41:30 -04:00
Linus Torvalds
bff157b3ad Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull SLAB update from Pekka Enberg:
 "Nothing terribly exciting here apart from Christoph's kmalloc
  unification patches that brings sl[aou]b implementations closer to
  each other"

* 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
  slab: Use correct GFP_DMA constant
  slub: remove verify_mem_not_deleted()
  mm/sl[aou]b: Move kmallocXXX functions to common code
  mm, slab_common: add 'unlikely' to size check of kmalloc_slab()
  mm/slub.c: beautify code for removing redundancy 'break' statement.
  slub: Remove unnecessary page NULL check
  slub: don't use cpu partial pages on UP
  mm/slub: beautify code for 80 column limitation and tab alignment
  mm/slub: remove 'per_cpu' which is useless variable
2013-09-15 07:15:06 -04:00
Linus Torvalds
3711d86a2d a trivial writeback fix
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSMwgMAAoJECvKgwp+S8JaiJUP/RGA98MkWnl5eio9mG5eEbF/
 DC6bP5UOzPo+6oZbwH4LTc4EB04q728SSOU1nG6q1yfuSF0I1Kzt/Um6aS3P5wdk
 okyYW1SjieE0xpmfQpvMEX6TZ7L/FpYjAg47GI0TaJMUdKRmJK0fkZ22hfv6uJzr
 PMVmdJKKgxs85usrn4JyNY93xpKZgncJVuwpfFF1k9oSNIXHAk7OxT7JWj51UdqP
 k/L/HXNhT3MRVvsjyqURHMIXfqRvqcgn47LAkM/IYVdgaFkpLPvwp8RZr/CcKr7U
 KqJsQqqegRyoQ73yqgWXGAGLLXujKllsfKLu/d0vtqY2J4z6lHKTcRGpAGCDyH+3
 bLe4hk+/d+Tz0xBSPaHryy/4yiQ4O+h9rLZCwGdxMX1duoqvThL9S8fLoUkrNBai
 OU7cd4iWPlCmiquATjk0bgthCcKw3wlg+rsiSzUcaO3JbdwTp8P45Mie0ZtZ5jpa
 UcczrT6osOAAswoEPMMeySQ+BVLewSPwmYKaETniYXB5Bb/IHkliX1MkXnA1D9bI
 DNijgB2g2561BVhdkDHf2q8D4Cbrq6UhK7plATB90DB7bwNaAxmtRVJ3zDaQGKOM
 VWBbloNf5QcodshEttj9ZLko7JNF/DjNOcNomb5ZtzY+EGzMksUHBUMPld3yOcna
 LTNApshhbx92MemJ02FC
 =FB22
 -----END PGP SIGNATURE-----

Merge tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback fix from Wu Fengguang:
 "A trivial writeback fix"

* tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Do not sort b_io list only because of block device inode
2013-09-13 23:06:40 -04:00