This patch (as1295) fixes a recently-added bug in the USB serial core.
If certain kinds of errors occur during probing, the core may call a
serial driver's release method without previously calling the attach
method. This causes some drivers (io_ti in particular) to perform an
invalid memory access.
The patch adds a new flag to keep track of whether or not attach has
been called.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
CC: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since the beginnings in aafe4dbed0
("asm-generic: add generic versions of common headers") the generic
version of <asm/hardirq.h> defined __softirq_pending as unsigned long.
Which is different from other architectures for no apparent good reason
and was causing the following warning:
kernel/time/tick-sched.c: In function 'tick_nohz_stop_sched_tick':
kernel/time/tick-sched.c:261: warning: format '%02x' expects type 'unsigned int', but argument 2 has type 'long unsigned int'
Reported and initial patch by Wu Zhangjin <wuzhangjin@gmail.com>.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
[ Arnd points out that we really should make sure parisc and alpha are
ok with this, since they have also been converted to use the generic
hardirq.h file. But neither seems to use it, although parisc does
build a IRQSTAT_SIRQ_PEND #define into asm-offsets - but that also
appears unused.. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Driver for Texas Instruments TPA6130A2 stereo headphone
amplifier.
The driver provides playback gain control and also pre-defined
DAPM_HP widgets and DAPM routings for power management.
The DAPM_HP widget names are:
"TPA6130A2 Headphone Left"
"TPA6130A2 Headphone Right"
From soc machine drivers to use with the tpa6130a2 amplifier,
the tpa6130a2_add_controls has to be called, which adds the alsa
controls and the DAPM routing needed for the tpa6130a2.
After that the machine driver can connect the codec's output
with 'TPA6130A2 Left' and 'TPA6130A2 Right':
{"TPA6130A2 Left", NULL, "CODEC LEFT OUT"},
{"TPA6130A2 Right", NULL, "CODEC RIGHT OUT"},
Internally the left and right channels are powered separately.
When none of the channels are needed the amplifier is powered
down:
hard power: valid GPIO number is passed within platform data
soft power: Using the software shutdown of the amplifier
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@nokia.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
(This patch fixes bug of commit f7734fdf61
title "make TLLAO option for NA packets configurable")
When the IPV6 conf is used, the function sysctl_set_parent is called and the
array addrconf_sysctl is used as a parameter of the function.
The above patch added new conf "force_tllao" into the array addrconf_sysctl,
but the size of the array was not modified, the static allocated size is
DEVCONF_MAX + 1 but the real size is DEVCONF_MAX + 2, so the problem is
that the function sysctl_set_parent accessed wrong address.
I got the following information.
Call Trace:
[<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
[<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
[<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
[<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
[<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
[<ffffffff810622d5>] __register_sysctl_paths+0xde/0x272
[<ffffffff8110892d>] ? __kmalloc_track_caller+0x16e/0x180
[<ffffffffa00cfac3>] ? __addrconf_sysctl_register+0xc5/0x144 [ipv6]
[<ffffffff8141f2c9>] register_net_sysctl_table+0x48/0x4b
[<ffffffffa00cfaf5>] __addrconf_sysctl_register+0xf7/0x144 [ipv6]
[<ffffffffa00cfc16>] addrconf_init_net+0xd4/0x104 [ipv6]
[<ffffffff8139195f>] setup_net+0x35/0x82
[<ffffffff81391f6c>] copy_net_ns+0x76/0xe0
[<ffffffff8107ad60>] create_new_namespaces+0xf0/0x16e
[<ffffffff8107afee>] copy_namespaces+0x65/0x9f
[<ffffffff81056dff>] copy_process+0xb2c/0x12c3
[<ffffffff810576e1>] do_fork+0x14b/0x2d2
[<ffffffff8107ac4e>] ? up_read+0xe/0x10
[<ffffffff81438e73>] ? do_page_fault+0x27a/0x2aa
[<ffffffff8101044b>] sys_clone+0x28/0x2a
[<ffffffff81011fb3>] stub_clone+0x13/0x20
[<ffffffff81011c72>] ? system_call_fastpath+0x16/0x1b
And the information of IPV6 in .config is as following.
IPV6 in .config:
CONFIG_IPV6=m
CONFIG_IPV6_PRIVACY=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
CONFIG_IPV6_MIP6=m
CONFIG_IPV6_SIT=m
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
CONFIG_IPV6_MROUTE=y
CONFIG_IPV6_PIMSM_V2=y
# CONFIG_IP_VS_IPV6 is not set
CONFIG_NF_CONNTRACK_IPV6=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
I confirmed this patch fixes this problem.
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
pata_atp867x: add Power Management support
pata_atp867x: PIO support fixes
pata_atp867x: clarifications in timings calculations and cable detection
pata_atp867x: fix it to not claim MWDMA support
libata: fix incorrect link online check during probe
ahci: filter FPDMA non-zero offset enable for Aspire 3810T
libata: make gtf_filter per-dev
libata: implement more acpi filtering options
libata: cosmetic updates
ahci: display all AHCI 1.3 HBA capability flags (v2)
pata_ali: trivial fix of a very frequent spelling mistake
ahci: disable 64bit DMA by default on SB600s
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: fix requeue_pi key imbalance
futex: Fix typo in FUTEX_WAIT/WAKE_BITSET_PRIVATE definitions
rcu: Place root rcu_node structure in separate lockdep class
rcu: Make hot-unplugged CPU relinquish its own RCU callbacks
rcu: Move rcu_barrier() to rcutree
futex: Move exit_pi_state() call to release_mm()
futex: Nullify robust lists after cleanup
futex: Fix locking imbalance
panic: Fix panic message visibility by calling bust_spinlocks(0) before dying
rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
rcu: Clean up code based on review feedback from Josh Triplett, part 4
rcu: Clean up code based on review feedback from Josh Triplett, part 3
rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
rcu: Clean up code to address Ingo's checkpatch feedback
rcu: Clean up code based on review feedback from Josh Triplett, part 2
rcu: Clean up code based on review feedback from Josh Triplett
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: user local buffer variable for trace branch tracer
tracing: fix warning on kernel/trace/trace_branch.c andtrace_hw_branches.c
ftrace: check for failure for all conversions
tracing: correct module boundaries for ftrace_release
tracing: fix transposed numbers of lock_depth and preempt_count
trace: Fix missing assignment in trace_ctxwake_*
tracing: Use free_percpu instead of kfree
tracing: Check total refcount before releasing bufs in profile_enable failure
* 'sparc-perf-events-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
mm, perf_event: Make vmalloc_user() align base kernel virtual address to SHMLBA
perf_event: Provide vmalloc() based mmap() backing
* 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_events: Make ABI definitions available to userspace
perf tools: elf_sym__is_function() should accept "zero" sized functions
tracing/syscalls: Use long for syscall ret format and field definitions
perf trace: Update eval_flag() flags array to match interrupt.h
perf trace: Remove unused code in builtin-trace.c
perf: Propagate term signal to child
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (24 commits)
drm/radeon/kms: fix vline register for second head.
drm/r600: avoid assigning vb twice in blit code
drm/radeon: use list_for_each_entry instead of list_for_each
drm/radeon/kms: Fix AGP support for R600/RV770 family (v2)
drm/radeon/kms: Fallback to non AGP when acceleration fails to initialize (v2)
drm/radeon/kms: Fix RS600/RV515/R520/RS690 IRQ
drm/radeon: Fix setting of bits
drm/ttm: fix refcounting in ttm global code.
drm/fb: add more correct 8/16/24/32 bpp fb support.
drm/fb: add setcmap and fix 8-bit support.
drm/radeon/kms: respect single crtc cards, only create one crtc. (v2)
drm: Delete the DRM_DEBUG_KMS in drm_mode_cursor_ioctl
drm/radeon/kms: add support for "Surround View"
drm/radeon/kms: Fix irq handling on AVIVO hw
drm/radeon/kms: R600/RV770 remove dead code and print message for wrong BIOS
drm/radeon/kms: Fix R600/RV770 disable acceleration path
drm/radeon/kms: Fix R600/RV770 startup path & reset
drm/radeon/kms: Fix R600 write back buffer
drm/radeon/kms: Remove old init path as no hw use it anymore
drm/radeon/kms: Convert RS600 to new init path
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
ethoc: limit the number of buffers to 128
ethoc: use system memory as buffer
ethoc: align received packet to make IP header at word boundary
ethoc: fix buffer address mapping
ethoc: fix typo to compute number of tx descriptors
au1000_eth: Duplicate test of RX_OVERLEN bit in update_rx_stats()
netxen: Fix Unlikely(x) > y
pasemi_mac: ethtool get settings fix
add maintainer for network drop monitor kernel service
tg3: Fix phylib locking strategy
rndis_host: support ETHTOOL_GPERMADDR
ipv4: arp_notify address list bug
gigaset: add kerneldoc comments
gigaset: correct debugging output selection
gigaset: improve error recovery
gigaset: fix device ERROR response handling
gigaset: announce if built with debugging
gigaset: handle isoc frame errors more gracefully
gigaset: linearize skb
gigaset: fix reject/hangup handling
...
TI HECC (High End CAN Controller) module is found on many TI devices. It
has 32 hardware mailboxes with full implementation of CAN protocol 2.0B
with bus speeds up to 1Mbps. Specifications of the module are available
on TI web <http://www.ti.com>
Signed-off-by: Anant Gole <anantgole@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP_HTABLE_SIZE was initialy defined to 128, which is a bit small for
several setups.
4000 active UDP sockets -> 32 sockets per chain in average. An
incoming frame has to lookup all sockets to find best match, so long
chains hurt latency.
Instead of a fixed size hash table that cant be perfect for every
needs, let UDP stack choose its table size at boot time like tcp/ip
route, using alloc_large_system_hash() helper
Add an optional boot parameter, uhash_entries=x so that an admin can
force a size between 256 and 65536 if needed, like thash_entries and
rhash_entries.
dmesg logs two new lines :
[ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
[ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)
Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non
debugging spinlocks.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
conflict in radeon since new init path merged with vga arb code.
Conflicts:
drivers/gpu/drm/radeon/radeon.h
drivers/gpu/drm/radeon/radeon_asic.h
drivers/gpu/drm/radeon/radeon_device.c
Fix build error introduced in commit fa857afcf - ipv6 sit: 6rd
(IPv6 Rapid Deployment) Support. Struct in6_addr is the issue.
I'm only seeing this on x86_64 systems, not on 32-bit with same
IPv6 config options, so it could be there's a missing forward
declaration somewhere, but including the correct header file
fixes the problem too.
CC [M] net/ipv6/ip6_tunnel.o
In file included from net/ipv6/ip6_tunnel.c:31:
include/linux/if_tunnel.h:59: error: field ‘prefix’ has incomplete type
make[2]: *** [net/ipv6/ip6_tunnel.o] Error 1
make[1]: *** [net/ipv6] Error 2
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's useful to provide firmware and hardware version to user space and have a
generic interface to retrieve them. Users can provide the version information
in bug reports etc.
Add fields for firmware and hardware version to struct wiphy.
(Dropped nl80211 bits for now and modified remaining bits in favor of
ethtool. -- JWL)
Cc: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Refactor wext to
* split out iwpriv handling
* split out iwspy handling
* split out procfs support
* allow cfg80211 to have wireless extensions compat code
w/o CONFIG_WIRELESS_EXT
After this, drivers need to
- select WIRELESS_EXT - for wext support
- select WEXT_PRIV - for iwpriv support
- select WEXT_SPY - for iwspy support
except cfg80211 -- which gets new hooks in wext-core.c
and can then get wext handlers without CONFIG_WIRELESS_EXT.
Wireless extensions procfs support is auto-selected
based on PROC_FS and anything that requires the wext core
(i.e. WIRELESS_EXT or CFG80211_WEXT).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Linux keeps scan results up to 15 seconds. This can be a problem for fast
moving clients: they get back stale data. But if the kernel reports the age
of the BSS items, then user-space can simply weed out old entries by itself.
Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the module is about the unload we release its call records.
The ftrace_release function was given wrong values representing
the module core boundaries, thus not releasing its call records.
Plus making ftrace_release function module specific.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1254934835-363-3-git-send-email-jolsa@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This quirk will disable fast back to back transfer on the secondary bus
segment of the TI Bridge.
Signed-off-by: Gabe Black <gabe.black@ni.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Looks like a typo, FUTEX_WAKE_BITS should be FUTEX_WAIT_BITSET.
Signed-off-by: Anton Blanchard <anton@samba.org>
LKML-Reference: <20091007001358.GE16073@kryten>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The data center bridging ops structure can be const
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All usages of structure net_proto_ops should be declared const.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Friday 02 October 2009 20:53:51 you wrote:
> This is good although I would have shortened the name.
Ah, I knew I forgot something :) Here is v4.
tavi
>From 24d96d825b9fa832b22878cc6c990d5711968734 Mon Sep 17 00:00:00 2001
From: Octavian Purdila <opurdila@ixiacom.com>
Date: Fri, 2 Oct 2009 00:51:15 +0300
Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs
Neighbor advertisements responding to unicast neighbor solicitations
did not include the target link-layer address option. This patch adds
a new sysctl option (disabled by default) which controls whether this
option should be sent even with unicast NAs.
The need for this arose because certain routers expect the TLLAO in
some situations even as a response to unicast NS packets.
Moreover, RFC 2461 recommends sending this to avoid a race condition
(section 4.4, Target link-layer address)
Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After updating firmware stored in flash, users may wish to reset the
relevant hardware and start the new firmware immediately. This should
not be completely automatic as it may be disruptive.
A selective reset may also be useful for debugging or diagnostics.
This adds a separate reset operation which takes flags indicating the
components to be reset. Drivers are allowed to reset only a subset of
those requested, and must indicate the actual subset. This allows the
use of generic component masks and some future expansion.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski a écrit :
>
>
> Hmm... So you made me to do some "real" work here, and guess what?:
> there is one serious checkpatch warning! ;-) Plus, this new parameter
> should be added to the function description. Otherwise:
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>
> Thanks,
> Jarek P.
>
> PS: I guess full "Don't" would show we really mean it...
Okay :) Here is the last round, before the night !
Thanks again
[RFC] pkt_sched: gen_estimator: Don't report fake rate estimators
We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
is running.
# tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
one (because no estimator is active)
After this patch, tc command output is :
$ tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
We add a parameter to gnet_stats_copy_rate_est() function so that
it can use gen_estimator_active(bstats, r), as suggested by Jarek.
This parameter can be NULL if check is not necessary, (htb for
example has a mandatory rate estimator)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
deploy IPv6 unicast service to IPv4 sites to which it provides
customer premise equipment. Like 6to4, it utilizes stateless IPv6 in
IPv4 encapsulation in order to transit IPv4-only network
infrastructure. Unlike 6to4, a 6rd service provider uses an IPv6
prefix of its own in place of the fixed 6to4 prefix.
With this option enabled, the SIT driver offers 6rd functionality by
providing additional ioctl API to configure the IPv6 Prefix for in
stead of static 2002::/16 for 6to4.
Original patch was done by Alexandre Cassen <acassen@freebox.fr>
based on old Internet-Draft.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When routing daemon wants to enable forwarding of multicast traffic it
performs something like:
struct vifctl vc = {
.vifc_vifi = 1,
.vifc_flags = 0,
.vifc_threshold = 1,
.vifc_rate_limit = 0,
.vifc_lcl_addr = ip, /* <--- ip address of physical
interface, e.g. eth0 */
.vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
};
setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));
This leads (in the kernel) to calling vif_add() function call which
search the (physical) device using assigned IP address:
dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);
The current API (struct vifctl) does not allow to specify an
interface other way than using it's IP, and if there are more than a
single interface with specified IP only the first one will be found.
The attached patch (against 2.6.30.4) allows to specify an interface
by its index, instead of IP address:
struct vifctl vc = {
.vifc_vifi = 1,
.vifc_flags = VIFF_USE_IFINDEX, /* NEW */
.vifc_threshold = 1,
.vifc_rate_limit = 0,
.vifc_lcl_ifindex = if_nametoindex("eth0"), /* NEW */
.vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
};
setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));
Signed-off-by: Ilia K. <mail4ilia@gmail.com>
=== modified file 'include/linux/mroute.h'
Signed-off-by: David S. Miller <davem@davemloft.net>
An incoming datagram must bring into cpu cache *lot* of cache lines,
in particular : (other parts omitted (hash chains, ip route cache...))
On 32bit arches :
offsetof(struct sock, sk_rcvbuf) =0x30 (read)
offsetof(struct sock, sk_lock) =0x34 (rw)
offsetof(struct sock, sk_sleep) =0x50 (read)
offsetof(struct sock, sk_rmem_alloc) =0x64 (rw)
offsetof(struct sock, sk_receive_queue)=0x74 (rw)
offsetof(struct sock, sk_forward_alloc)=0x98 (rw)
offsetof(struct sock, sk_callback_lock)=0xcc (rw)
offsetof(struct sock, sk_drops) =0xd8 (read if we add dropcount support, rw if frame dropped)
offsetof(struct sock, sk_filter) =0xf8 (read)
offsetof(struct sock, sk_socket) =0x138 (read)
offsetof(struct sock, sk_data_ready) =0x15c (read)
We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
with no fasync() structures. (socket->fasync_list ptr is probably already in cache
because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)
This avoids one cache line load per incoming packet for common cases (no fasync())
We can leave (or even move in a future patch) sk->sk_socket in a cold location
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit a9327cac44 added seperate read
and write statistics of in_flight requests. And exported the number
of read and write requests in progress seperately through sysfs.
But Corrado Zoccolo <czoccolo@gmail.com> reported getting strange
output from "iostat -kx 2". Global values for service time and
utilization were garbage. For interval values, utilization was always
100%, and service time is higher than normal.
So this was reverted by commit 0f78ab9899
The problem was in part_round_stats_single(), I missed the following:
if (now == part->stamp)
return;
- if (part->in_flight) {
+ if (part_in_flight(part)) {
__part_stat_add(cpu, part, time_in_queue,
part_in_flight(part) * (now - part->stamp));
__part_stat_add(cpu, part, io_ticks, (now - part->stamp));
With this chunk included, the reported regression gets fixed.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
--
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Sometimes it is desirable to have a mux which does not reflect any
direct register configuration but which will instead only have an
effect implicitly (for example, as a result of changing which parts
of the device are powered up). Provide a virtual mux for this purpose.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
The sign info used for filters in the kernel is also useful to
applications that process the trace stream. Add it to the format
files and make it available to userspace.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: rostedt@goodmis.org
Cc: lizf@cn.fujitsu.com
Cc: hch@infradead.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1254809398-8078-2-git-send-email-tzanussi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some architectures such as Sparc, ARM and MIPS (basically
everything with flush_dcache_page()) need to deal with dcache
aliases by carefully placing pages in both kernel and user maps.
These architectures typically have to use vmalloc_user() for this.
However, on other architectures, vmalloc() is not needed and has
the downsides of being more restricted and slower than regular
allocations.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: David Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1254830228.21044.272.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now we have the capabilities of the sending process available,
use them to enforce CAP_SYS_ADMIN.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The original driver was written with the KEY() macro defined as (col,
row) instead of (row, col) as defined by the matrix keypad
infrastructure. So the keymap was defined accordingly. Since the
driver that was merged upstream uses the matrix keypad infrastructure,
modify the keymap accordingly.
While we are at it, fix the comments in twl4030.h and define
PERSISTENT_KEY as (r,c) instead of (c, r)
Tested on a RX51 (N900) device.
Signed-off-by: Amit Kucheria <amit.kucheria@verdurent.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Add ->gtf_filter to ata_device and set it to ata_acpi_gtf_filter when
initializing ata_link. This is to allow quirks which apply different
gtf filters.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Currently libata-acpi can only filter DIPM among SATA feature enables
via _GTF. This patch adds the capability to filter out FPDMA non-zero
offset, in-order guarantee and auto-activation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
We're about to add more SATA_* and ATA_ACPI_FILTER_* constants.
Reformat them in preparation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The previous patches had some unwanted side effects, I've fixed
the lack of 32bpp working, and fixed up 16bpp so it should also work.
this also adds the interface to allow the driver to set a preferred
console depth so for example low memory rn50 can set it to 8bpp.
It also catches 24bpp on cards that can't do it and forces 32bpp.
Tested on r100/r600/i945.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Whitespace fixes, updated comments, and trivial code movement.
o Fix whitespace error in RCU_HEAD_INIT()
o Move "So where is rcu_write_lock()" comment so that it does
not come between the rcu_read_unlock() header comment and
the rcu_read_unlock() definition.
o Move the module_param statements for blimit, qhimark, and
qlowmark to immediately follow the corresponding
definitions.
o In __rcu_offline_cpu(), move the assignment to rdp_me
inside the "if" statement, given that rdp_me is not used
outside of that "if" statement.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12541491931164-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Accumulating one tick at a time works well unless we're using NOHZ.
Then it can be an issue, since we may have to run through the loop
a few thousand times, which can increase timer interrupt caused
latency.
The current solution was to accumulate in half-second intervals
with NOHZ. This kept the number of loops down, however it did
slightly change how we make NTP adjustments. While not an issue
with NTPd users, as NTPd makes adjustments over a longer period of
time, other adjtimex() users have noticed the half-second
granularity with which we can apply frequency changes to the clock.
For instance, if a application tries to apply a 100ppm frequency
correction for 20ms to correct a 2us offset, with NOHZ they either
get no correction, or a 50us correction.
Now, there will always be some granularity error for applying
frequency corrections. However with users sensitive to this error
have seen a 50-500x increase with NOHZ compared to running without
NOHZ.
So I figured I'd try another approach then just simply increasing
the interval. My approach is to consume the time interval
logarithmically. This reduces the number of times through the loop
needed keeping latency down, while still preserving the original
granularity error for adjtimex() changes.
Further, this change allows us to remove the xtime_cache code
(patch to follow), as xtime is always within one tick of the
current time, instead of the half-second updates it saw before.
An earlier version of this patch has been shipping to x86 users in
the RedHat MRG releases for awhile without issue, but I've reworked
this version to be even more careful about avoiding possible
overflows if the shift value gets too large.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1254525473.7741.88.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It was briefly introduced to allow CFQ to to delayed scheduling,
but we ended up removing that feature again. So lets kill the
function and export, and just switch CFQ back to the normal work
schedule since it is now passing in a '0' delay from all call
sites.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
For various purposes including a wireless extensions
bugfix, we need to hook into the netdev creation before
before netdev_register_kobject(). This will also ease
doing the dev type assignment that Marcel was working
on for cfg80211 drivers w/o touching them all.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for usbnet based devices like CDC-Ether to indicate that they
are actually mobile broadband devices. In that case use wwan%d as default
interface name.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following user-space program fails to compile:
#include <linux/socket.h>
#include <sys/socket.h>
int main() { return 0; }
The reason is that <linux/socket.h> tests __GLIBC__ to decide whether it
should define various structures and macros that are now defined for
user-space by <sys/socket.h>, but __GLIBC__ is not defined if no libc
headers have yet been included.
It seems safe to drop support for libc 5 now.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Bastian Blank <waldi@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently dirty a cache line to update tunnel device stats
(tx_packets/tx_bytes). We better use the txq->tx_bytes/tx_packets
counters that already are present in cpu cache, in the cache
line shared with txq->_xmit_lock
This patch extends IPTUNNEL_XMIT() macro to use txq pointer
provided by the caller.
Also &tunnel->dev->stats can be replaced by &dev->stats
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The FIB algorithim for IPV4 is set at compile time, but kernel goes through
the overhead of function call indirection at runtime. Save some
cycles by turning the indirect calls to direct calls to either
hash or trie code.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Ancilliary data to better represent loss information
I've had a few requests recently to provide more detail regarding frame loss
during an AF_PACKET packet capture session. Specifically the requestors want to
see where in a packet sequence frames were lost, i.e. they want to see that 40
frames were lost between frames 302 and 303 in a packet capture file. In order
to do this we need:
1) The kernel to export this data to user space
2) The applications to make use of it
This patch addresses item (1). It does this by doing the following:
A) Anytime we drop a frame for which we would increment po->stats.tp_drops, we
also no increment a stats called po->stats.tp_gap.
B) Every time we successfully enqueue a frame to sk_receive_queue, we record the
value of po->stats.tp_gap in skb->mark. skb->cb would nominally be the place to
record this, but since all the space there is used up, we're overloading
skb->mark. Its safe to do since any enqueued packet is guaranteed to be
unshared at this point, and skb->mark isn't used for anything else in the rx
path to the application. After we record tp_gap in the skb, we zero
po->stats.tp_gap. This allows us to keep a counter of the number of frames lost
between any two enqueued packets
C) When the application goes to dequeue a frame from the packet socket, we look
at skb->mark for that frame. If it is non-zero, we add a cmsg chunk to the
msghdr of level SOL_PACKET and type PACKET_GAPDATA. Its a 32 bit integer that
represents the number of frames lost between this packet and the last previous
frame received.
Note there is a chance that if there is frame loss after a receive, and then the
socket is closed, some gap data might be lost. This is covered by the use of
the PACKET_AUXDATA socket option, which gives total loss data. With a bit of
math, the final gap can be determined that way.
I've tested this patch myself, and it works well.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
include/linux/if_packet.h | 2 ++
net/packet/af_packet.c | 33 +++++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
The in-tree implementations have all been converted to
get_sset_count().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for the setcmap api and fixes the 8bpp
support at least on radeon hardware. It adds a new load_lut
hook which can be called once the color map is setup.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Also add single crtc for RN50 chips.
changes in v2:
fix vblank init to respect single crtc flag
fix r100 mode bandwidth to respect single crtc flag
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (41 commits)
Revert "Seperate read and write statistics of in_flight requests"
cfq-iosched: don't delay async queue if it hasn't dispatched at all
block: Topology ioctls
cfq-iosched: use assigned slice sync value, not default
cfq-iosched: rename 'desktop' sysfs entry to 'low_latency'
cfq-iosched: implement slower async initiate and queue ramp up
cfq-iosched: delay async IO dispatch, if sync IO was just done
cfq-iosched: add a knob for desktop interactiveness
Add a tracepoint for block request remapping
block: allow large discard requests
block: use normal I/O path for discard requests
swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL
fs/bio.c: move EXPORT* macros to line after function
Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
cciss: fix build when !PROC_FS
block: Do not clamp max_hw_sectors for stacking devices
block: Set max_sectors correctly for stacking devices
cciss: cciss_host_attr_groups should be const
cciss: Dynamically allocate the drive_info_struct for each logical drive.
cciss: Add usage_count attribute to each logical drive in /sys
...
While writing the manpage I noticed some shortcomings in the
current interface.
- Define symbolic names for all the different values
- Boundary check the kill mode values
- For symmetry add a get interface too. This allows library
code to get/set the current state.
- For consistency define a PR_MCE_KILL_DEFAULT value
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Not all users of the topology information want to use libblkid. Provide
the topology information through bdev ioctls.
Also clarify sector size comments for existing BLK ioctls.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This slowly ramps up the async queue depth based on the time
passed since the sync IO, and doesn't allow async at all until
a sync slice period has passed.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits)
cnic: Fix NETDEV_UP event processing.
uvesafb/connector: Disallow unpliviged users to send netlink packets
pohmelfs/connector: Disallow unpliviged users to configure pohmelfs
dst/connector: Disallow unpliviged users to configure dst
dm/connector: Only process connector packages from privileged processes
connector: Removed the destruct_data callback since it is always kfree_skb()
connector/dm: Fixed a compilation warning
connector: Provide the sender's credentials to the callback
connector: Keep the skb in cn_callback_data
e1000e/igb/ixgbe: Don't report an error if devices don't support AER
net: Fix wrong sizeof
net: splice() from tcp to pipe should take into account O_NONBLOCK
net: Use sk_mark for routing lookup in more places
sky2: irqname based on pci address
skge: use unique IRQ name
IPv4 TCP fails to send window scale option when window scale is zero
net/ipv4/tcp.c: fix min() type mismatch warning
Kconfig: STRIP: Remove stale bits of STRIP help text
NET: mkiss: Fix typo
tg3: Remove prev_vlan_tag from struct tx_ring_info
...
This patch contains changes that allow iscsi_session_setup
to allocate private space for LLD's
Signed-off-by: Jayamohan Kallickal <jayamohank@serverengines.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For automated testing it is useful to have the option to turn
the warnings on copy_from_user() etc checks into errors:
In function ‘copy_from_user’,
inlined from ‘fd_copyin’ at drivers/block/floppy.c:3080,
inlined from ‘fd_ioctl’ at drivers/block/floppy.c:3503:
linux/arch/x86/include/asm/uaccess_32.h:213:
error: call to ‘copy_from_user_overflow’ declared with attribute error:
copy_from_user buffer size is not provably correct
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091002075050.4e9f7641@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Disks formatted with DIF Type 2 reject READ/WRITE 6/10/12/16 commands
when protection is enabled. Only the 32-byte variants are supported.
Implement support for issusing 32-byte READ/WRITE and enable Type 2
drives in the protection type detection logic.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
So far we have only issued DIF commands if CONFIG_BLK_DEV_INTEGRITY is
enabled. However, communication between initiator and target should be
independent of protection information DMA. There are DIF-only host
adapters coming out that will be able to take advantage of this.
Move the relevant DIF bits to sd.c.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The checksum format is orthogonal to whether the protection information
is being passed on beyond the HBA or not. It is perfectly valid to use
a non-T10 CRC with WRITE_STRIP and READ_INSERT.
Consequently it no longer makes sense to explicitly refer to the
conversion in the protection operation. Update sd_dif and lpfc
accordingly.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Ihab Hamadi <Ihab.Hamadi@Emulex.Com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Complete the early_initcall() API by making it available in modules
too. To be used by the EDAC/MCE code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <20091002132321.GC28682@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch clean up/fixes for memcg's uncharge soft limit path.
Problems:
Now, res_counter_charge()/uncharge() handles softlimit information at
charge/uncharge and softlimit-check is done when event counter per memcg
goes over limit. Now, event counter per memcg is updated only when
memory usage is over soft limit. Here, considering hierarchical memcg
management, ancesotors should be taken care of.
Now, ancerstors(hierarchy) are handled in charge() but not in uncharge().
This is not good.
Prolems:
1. memcg's event counter incremented only when softlimit hits. That's bad.
It makes event counter hard to be reused for other purpose.
2. At uncharge, only the lowest level rescounter is handled. This is bug.
Because ancesotor's event counter is not incremented, children should
take care of them.
3. res_counter_uncharge()'s 3rd argument is NULL in most case.
ops under res_counter->lock should be small. No "if" sentense is better.
Fixes:
* Removed soft_limit_xx poitner and checks in charge and uncharge.
Do-check-only-when-necessary scheme works enough well without them.
* make event-counter of memcg incremented at every charge/uncharge.
(per-cpu area will be accessed soon anyway)
* All ancestors are checked at soft-limit-check. This is necessary because
ancesotor's event counter may never be modified. Then, they should be
checked at the same time.
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The asm-generic/gpio.h header uses the might_sleep() macro but doesn't
include the header for it, so any source code that might include
linux/gpio.h before linux/kernel.h can easily lead to a build failure.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since 2.6.31 now has request-based device-mapper, it's useful to have
a tracepoint for request-remapping as well as bio-remapping.
This patch adds a tracepoint for request-remapping, trace_block_rq_remap().
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Currently we set the bio size to the byte equivalent of the blocks to
be trimmed when submitting the initial DISCARD ioctl. That means it
is subject to the max_hw_sectors limitation of the HBA which is
much lower than the size of a DISCARD request we can support.
Add a separate max_discard_sectors tunable to limit the size for discard
requests.
We limit the max discard request size in bytes to 32bit as that is the
limit for bio->bi_size. This could be much larger if we had a way to pass
that information through the block layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
prepare_discard_fn() was being called in a place where memory allocation
was effectively impossible. This makes it inappropriate for all but
the most trivial translations of Linux's DISCARD operation to the block
command set. Additionally adding a payload there makes the ownership
of the bio backing unclear as it's now allocated by the device driver
and not the submitter as usual.
It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether
the queue supports discard operations or not. blkdev_issue_discard now
allocates a one-page, sector-length payload which is the right thing
for the common ATA and SCSI implementations.
The mtd implementation of prepare_discard_fn() is replaced with simply
checking for the request being a discard.
Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx>
which did the prepare_discard_fn but not the different payload allocation
yet.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Since 2.6.31 now has request-based device-mapper, it's useful to have
a tracepoint for request-remapping as well as bio-remapping.
This patch adds a tracepoint for request-remapping, trace_block_rq_remap().
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Currently we set the bio size to the byte equivalent of the blocks to
be trimmed when submitting the initial DISCARD ioctl. That means it
is subject to the max_hw_sectors limitation of the HBA which is
much lower than the size of a DISCARD request we can support.
Add a separate max_discard_sectors tunable to limit the size for discard
requests.
We limit the max discard request size in bytes to 32bit as that is the
limit for bio->bi_size. This could be much larger if we had a way to pass
that information through the block layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
prepare_discard_fn() was being called in a place where memory allocation
was effectively impossible. This makes it inappropriate for all but
the most trivial translations of Linux's DISCARD operation to the block
command set. Additionally adding a payload there makes the ownership
of the bio backing unclear as it's now allocated by the device driver
and not the submitter as usual.
It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether
the queue supports discard operations or not. blkdev_issue_discard now
allocates a one-page, sector-length payload which is the right thing
for the common ATA and SCSI implementations.
The mtd implementation of prepare_discard_fn() is replaced with simply
checking for the request being a discard.
Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx>
which did the prepare_discard_fn but not the different payload allocation
yet.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add a general per-cpu notifier that is called whenever the kernel is
about to return to userspace. The notifier uses a thread_info flag
and existing checks, so there is no impact on user return or context
switch fast paths.
This will be used initially to speed up KVM task switching by lazily
updating MSRs.
Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1253342422-13811-1-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
In order to support multiple codecs on the same system in the debugfs
the directory hierarchy need to be changed by adding directory per codec
under the asoc direcorty:
debugfs/asoc/{dev_name(socdev->dev)}-{codec->name}/codec_reg
/dapm_pop_time
/dapm/{widgets}
With the original implementation only the debugfs files are only
created for the first codec, other codecs loaded later would fail to
create the debugfs files (since they are already exist).
Furthermore in this situation any of the codecs has been removed, would
cause the debugfs entries to disappear, regardless if the codec, which
created them are still loaded (the one which loaded first).
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@nokia.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
A previous patch added the buffer size check to copy_from_user().
One of the things learned from analyzing the result of the previous
patch is that in general, gcc is really good at proving that the
code contains sufficient security checks to not need to do a
runtime check. But that for those cases where gcc could not prove
this, there was a relatively high percentage of real security
issues.
This patch turns the case of "gcc cannot prove" into a compile time
warning, as long as a sufficiently new gcc is in use that supports
this. The objective is that these warnings will trigger developers
checking new cases out before a security hole enters a linux kernel
release.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Cc: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <20090930130523.348ae6c4@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The conversion solves the problem that firmware size was set to 64KB
while non PnP cards have 128KB firmware files.
An additional firmware initialization code has been moved from the OSS
driver.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.
Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix time encoding with extra epoch bits
ext4: Add a stub for mpage_da_data in the trace header
jbd2: Use tracepoints for history file
ext4: Use tracepoints for mb_history trace file
ext4, jbd2: Drop unneeded printks at mount and unmount time
ext4: Handle nested ext4_journal_start/stop calls without a journal
ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode
ext4: Avoid updating the inode table bh twice in no journal mode
ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first
ext4: async direct IO for holes and fallocate support
ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O
ext4: Split uninitialized extents for direct I/O
ext4: release reserved quota when block reservation for delalloc retry
ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
ext4: Fix hueristic which avoids group preallocation for closed files
ext4: Use ext4_msg() for ext4_da_writepage() errors
ext4: Update documentation about quota mount options
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM / yenta: Fix cardbus suspend/resume regression
PM / PCMCIA: Drop second argument of pcmcia_socket_dev_suspend()
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
sony-laptop: re-read the rfkill state when resuming from suspend
sony-laptop: check for rfkill hard block at load time
wext: add back wireless/ dir in sysfs for cfg80211 interfaces
wext: Add bound checks for copy_from_user
mac80211: improve/fix mlme messages
cfg80211: always get BSS
iwlwifi: fix 3945 ucode info retrieval after failure
iwlwifi: fix memory leak in command queue handling
iwlwifi: fix debugfs buffer handling
cfg80211: don't set privacy w/o key
cfg80211: wext: don't display BSSID unless associated
net: Add explicit bound checks in net/socket.c
bridge: Fix double-free in br_add_if.
isdn: fix netjet/isdnhdlc build errors
atm: dereference of he_dev->rbps_virt in he_init_group()
ax25: Add missing dev_put in ax25_setsockopt
Revert "sit: stateless autoconf for isatap"
net: fix double skb free in dcbnl
net: fix nlmsg len size for skb when error bit is set.
net: fix vlan_get_size to include vlan_flags size
...
* 'drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (25 commits)
drm/radeon/kms: Convert R520 to new init path and associated cleanup
drm/radeon/kms: Convert RV515 to new init path and associated cleanup
drm: fix radeon DRM warnings when !CONFIG_DEBUG_FS
drm: fix drm_fb_helper warning when !CONFIG_MAGIC_SYSRQ
drm/r600: fix memory leak introduced with 64k malloc avoidance fix.
drm/kms: make fb helper work for all drivers.
drm/radeon/r600: fix offset handling in CS parser
drm/radeon/kms/r600: fix forcing pci mode on agp cards
drm/radeon/kms: fix for the extra pages copying.
drm/radeon/kms/r600: add support for vline relocs
drm/radeon/kms: fix some bugs in vline reloc
drm/radeon/kms/r600: clamp vram to aperture size
drm/kms: protect against fb helper not being created.
drm/r600: get values from the passed in IB not the copy.
drm: create gitignore file for radeon
drm/radeon/kms: remove unneeded master create/destroy functions.
drm/kms: start adding command line interface using fb.
fb: change rules for global rules match.
drm/radeon/kms: don't require up to 64k allocations. (v2)
drm/radeon/kms: enable dac load detection by default.
...
Trivial conflicts in drivers/gpu/drm/radeon/radeon_asic.h due to adding
'->vga_set_state' function pointers.
The tracepoint ext4_da_write_pages has a struct mpage_da_data*
parameter, but that struct is only defined in fs/ext4/ext4.h. This
patch adds a forward declaration for that struct, so this tracepoint
header can still be used by tools like SystemTap.
This is a continuation of the fix in commit 3661d286.
http://sourceware.org/bugzilla/show_bug.cgi?id=10703
Signed-off-by: Josh Stone <jistone@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The /proc/fs/jbd2/<dev>/history was maintained manually; by using
tracepoints, we can get all of the existing functionality of the /proc
file plus extra capabilities thanks to the ftrace infrastructure. We
save memory as a bonus.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.
By ripping out the mb_history code and replacing it with ftrace
tracepoints, and we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
pcmcia_socket_dev_suspend() doesn't use its second argument, so it
may be dropped safely.
This change is necessary for the subsequent yenta suspend/resume fix.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org
The move away from having drivers assign wireless handlers,
in favour of making cfg80211 assign them, broke the sysfs
registration (the wireless/ dir went missing) because the
handlers are now assigned only after registration, which is
too late.
Fix this by special-casing cfg80211-based devices, all
of which are required to have an ieee80211_ptr, in the
sysfs code, and also using get_wireless_stats() to have
the same values reported as in procfs.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Tested-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Work around problems in the writeback code to force out writebacks in
larger chunks than just 4mb, which is just too small. This also works
around limitations in the ext4 block allocator, which can't allocate
more than 2048 blocks at a time. So we need to defeat the round-robin
characteristics of the writeback code and try to write out as many
blocks in one inode before allowing the writeback code to move on to
another inode. We add a a new per-filesystem tunable,
max_writeback_mb_bump, which caps this to a default of 128mb per
inode.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Currently we are calling the bkl tracepoint callbacks just before the
bkl lock/unlock operations, ie the tracepoint call is not inside a
lock_kernel() function but inside a lock_kernel() macro. Hence the
bkl trace event header must be included from smp_lock.h. This raises
some nasty circular header dependencies:
linux/smp_lock.h -> trace/events/bkl.h -> trace/define_trace.h
-> trace/ftrace.h -> linux/ftrace_event.h -> linux/hardirq.h
-> linux/smp_lock.h
This results in incomplete event declarations, spurious event
definitions and other kind of funny behaviours.
This is hardly fixable without ugly workarounds. So instead, we push
the file name, line number and function name as lock_kernel()
parameters, so that we only deal with the trace event header from
lib/kernel_lock.c
This adds two parameters to lock_kernel() and unlock_kernel() but
it should be fine wrt to performances because this pair dos not seem
to be called in fast paths.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Add DAI format definition for PDM interfaces.
Signed-off-by: Misael Lopez Cruz <x0052729@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
This initialises the fb helper with the connector helper,
so that the fb cmdline code works for intel as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The following commit made console open fails while booting:
commit b50989dc44
Author: Alan Cox <alan@linux.intel.com>
Date: Sat Sep 19 13:13:22 2009 -0700
tty: make the kref destructor occur asynchronously
Due to tty release routines run in a workqueue now, error like the
following will be reported while booting:
INIT open /dev/console Input/output error
It also causes hibernation regression to appear as reported at
http://bugzilla.kernel.org/show_bug.cgi?id=14229
The reason is that now there's latency issue with closing, but when
we open a "closing not finished" tty, -EIO will be returned.
Fix it as per the following Alan's suggestion:
Fun but it's actually not a bug and the fix is wrong in itself as
the port may be closing but not yet being destructed, in which case
it seems to do the wrong thing. Opening a tty that is closing (and
could be closing for long periods) is supposed to return -EIO.
I suspect a better way to deal with this and keep the old console
timing is to split tty->shutdown into two functions.
tty->shutdown() - called synchronously just before we dump the tty
onto the waitqueue for destruction
tty->cleanup() - called when the destructor runs.
We would then do the shutdown part which can occur in IRQ context
fine, before queueing the rest of the release (from tty->magic = 0
... the end) to occur asynchronously
The USB update in -next would then need a call like
if (tty->cleanup)
tty->cleanup(tty);
at the top of the async function and the USB shutdown to be split
between shutdown and cleanup as the USB resource cleanup and final
tidy cannot occur synchronously as it needs to sleep.
In other words the logic becomes
final kref put
make object unfindable
async
clean it up
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
[ rjw: Rebased on top of 2.6.31-git, reworked the changelog. ]
Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
[ Changed serial naming to match new rules, dropped tty_shutdown as per
comments from Alan Stern - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code
But leave TTM code alone, something is fishy there with global vm_ops
being used.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 645069299a.
While the code does not actually break anything, it does not completely follow
RFC5214 yet. After talking back with Fred L. Templin, I agree that completing the
ISATAP specific RS/RA code, would pollute the kernel a lot with code that is better
implemented in userspace.
The kernel should not send RS packages for ISATAP at all.
Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Acked-by: Fred L. Templin <Fred.L.Templin@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlight:
backlight: new driver for ADP5520/ADP5501 MFD PMICs
backlight: extend event support to also support poll()
backlight/eeepc-laptop: Update the backlight state when we change brightness
backlight/acpi: Update the backlight state when we change brightness
backlight: Allow drivers to update the core, and generate events on changes
backlight: switch to da903x driver to dev_pm_ops
backlight: Add support for the Avionic Design Xanthos backlight device.
backlight: spi driver for LMS283GF05 LCD
backlight: move hp680-bl's probe function to .devinit.text
backlight: Add support for new Apple machines.
backlight: mbp_nvidia_bl: add support for MacBookAir 1,1
backlight: Add WM831x backlight driver
Trivial conflicts due to '#ifdef CONFIG_PM' differences in
drivers/video/backlight/da903x_bl.c
* remove asm/atomic.h inclusion from kref.h -- not needed, linux/types.h
is enough for atomic_t
* remove linux/kref.h inclusion from files which do not need it.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
modules, tracing: Remove stale struct marker signature from module_layout()
tracing/workqueue: Use %pf in workqueue trace events
tracing: Fix a comment and a trivial format issue in tracepoint.h
tracing: Fix failure path in ftrace_regex_open()
tracing: Fix failure path in ftrace_graph_write()
tracing: Check the return value of trace_get_user()
tracing: Fix off-by-one in trace_get_user()
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (32 commits)
ACPI: i2c-scmi: don't use acpi_device_uid()
ACPI: simplify building device HID/CID list
ACPI: remove acpi_device_uid() and related stuff
ACPI: remove acpi_device.flags.hardware_id
ACPI: remove acpi_device.flags.compatible_ids
ACPI: maintain a single list of _HID and _CID IDs
ACPI: make sure every acpi_device has an ID
ACPI: use acpi_device_hid() when possible
ACPI: fix synthetic HID for \_SB_
ACPI: handle re-enumeration, when acpi_devices might already exist
ACPI: factor out device type and status checking
ACPI: add acpi_bus_get_status_handle()
ACPI: use acpi_walk_namespace() to enumerate devices
ACPI: identify device tree root by null parent pointer, not ACPI_BUS_TYPE
ACPI: enumerate namespace before adding functional fixed hardware devices
ACPI: convert acpi_bus_scan() to operate on an acpi_handle
ACPI: add acpi_bus_get_parent() and remove "parent" arguments
ACPI: remove unnecessary argument checking
ACPI: remove redundant "type" arguments
ACPI: remove acpi_device_set_context() "type" argument
...
gcc (4.x) supports the __builtin_object_size() builtin, which
reports the size of an object that a pointer point to, when known
at compile time. If the buffer size is not known at compile time, a
constant -1 is returned.
This patch uses this feature to add a sanity check to
copy_from_user(); if the target buffer is known to be smaller than
the copy size, the copy is aborted and a WARNing is emitted in
memory debug mode.
These extra checks compile away when the object size is not known,
or if both the buffer size and the copy length are constants.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <20090926143301.2c396b94@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Sometimes we only want to write pages from a specific super_block,
so allow that to be passed in.
This fixes a problem with commit 56a131dcf7
causing writeback on all super_blocks on a bdi, where we only really
want to sync a specific sb from writeback_inodes_sb().
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Nobody uses acpi_device_uid(), so this patch removes it.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Every acpi_device has at least one ID (if there's no _HID or _CID, we
give it a synthetic or default ID). So there's no longer a need to
check whether an ID exists; we can just use it.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
We now keep a single list of IDs that includes both the _HID and any
_CIDs. We no longer need to keep track of whether the device has a _CID.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
There's no need to treat _HID and _CID differently. Keeping them in
a single list makes code that uses the IDs a little simpler because it
can just traverse the list rather than checking "do we have a HID?",
"do we have any CIDs?"
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Reviewed-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Add acpi_bus_get_status_handle() so we can get the status of a namespace
object before building a struct acpi_device.
This removes a use of "device->flags.dynamic_status", a cached indicator of
whether _STA exists. It seems simpler and more reliable to just evaluate
_STA and catch AE_NOT_FOUND errors.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
We can identify the root of the ACPI device tree by the fact that it
has no parent. This is simpler than passing around ACPI_BUS_TYPE_SYSTEM
and will help remove special treatment of the device tree root.
Currently, we add the root by hand with ACPI_BUS_TYPE_SYSTEM. If we
traverse the tree treating the root as just another device and use
acpi_get_type(), the root shows up as ACPI_TYPE_DEVICE.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Most uses of the ACPI bus device_type (ACPI_BUS_TYPE_DEVICE,
ACPI_BUS_TYPE_POWER, etc) are during device initialization, but
we do need it later for notify handler installation, since that
is different for fixed hardware devices vs. namespace devices.
This patch saves the device_type in the acpi_device structure,
so we can check that rather than comparing against the _HID string.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (94 commits)
genetlink: fix netns vs. netlink table locking (2)
3c59x: Get rid of "Trying to free already-free IRQ"
tunnel: eliminate recursion field
ems_pci: fix size of CAN controllers BAR mapping for CPC-PCI v2
net: fix htmldocs sunrpc, clnt.c
Phonet: error on broadcast sending (unimplemented)
Phonet: fix race for port number in concurrent bind()
pktgen: better scheduler friendliness
pktgen: T_TERMINATE flag is unused
ipv4: check optlen for IP_MULTICAST_IF option
ath9k: Initialize txgain and rxgain for newer AR9287 chipsets.
iwlagn: fix panic in iwl{5000,4965}_rx_reply_tx
ath9k: Fix RFKILL bugs
drivers/net/wireless: Use usb_endpoint_dir_out
cfg80211: don't overwrite privacy setting
wl12xx: fix kconfig/link errors
rt2x00: fix the definition of rt2x00crypto_rx_insert_iv
iwlwifi: reduce noise when skb allocation fails
iwlwifi: do not send sync command while holding spinlock
mac80211: fix DTIM setting
...
[note this requires an fb patch posted to linux-fbdev-devel already]
This uses the normal video= command line option to control the kms
output setup at boot time. It is used to override the autodetection
done by kms.
video= normally takes a framebuffer as the first parameter, in kms
it will take a connector name, DVI-I-1, or LVDS-1 etc. If no output
connector is specified the mode string will apply to all connectors.
The mode specification used will match down the probed modes, and if
no mode is found it will add a CVT mode that matches.
video=1024x768 - all connectors match a 1024x768 mode or add a CVT on
video=VGA-1:1024x768, VGA-1 connector gets mode only.
The same strings as used in current fb modedb.c are used, except I've
added three more letters, e, D, d, e = enable, D = enable Digital,
d = disable, which allow a connector to be forced into a certain state.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The NOMMU fallback for is_vmalloc_or_module_addr() should be static inline,
not just static, in linux/mm.h.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The old RW_DATA_SECTION had INIT_TASK_DATA (which was
more-than-PAGE_SIZE-aligned), followed by a bunch of small alignment
stuff, followed by more PAGE_SIZE-aligned stuff, so you wasted memory
in the middle of .data re-aligning back up to PAGE_SIZE.
This patch sorts the sections by alignment requirements, which should
pack them essentially optimally.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Why macros are always wrong:
mm/mmap.c: In function 'do_mmap_pgoff':
mm/mmap.c:953: warning: unused variable 'user'
also, move a couple of struct forward-decls outside `#ifdef
CONFIG_HUGETLB_PAGE' - it's pointless and frequently harmful to make these
conditional (eg, this patch needed `struct user_struct').
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar to commit d136f1bd36,
there's a bug when unregistering a generic netlink family,
which is caught by the might_sleep() added in that commit:
BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183
in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod
2 locks held by rmmod/1510:
#0: (genl_mutex){+.+.+.}, at: [<ffffffff8138283b>] genl_unregister_family+0x2b/0x130
#1: (rcu_read_lock){.+.+..}, at: [<ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120
Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444
Call Trace:
[<ffffffff81044ff9>] __might_sleep+0x119/0x150
[<ffffffff81380501>] netlink_table_grab+0x21/0x100
[<ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60
[<ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120
[<ffffffff81382866>] genl_unregister_family+0x56/0x130
[<ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211]
[<ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211]
Fix in the same way by grabbing the netlink table lock
before doing rcu_read_lock().
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems recursion field from "struct ip_tunnel" is not anymore needed.
recursion prevention is done at the upper level (in dev_queue_xmit()),
since we use HARD_TX_LOCK protection for tunnels.
This avoids a cache line ping pong on "struct ip_tunnel" : This structure
should be now mostly read on xmit and receive paths.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we ever implement this, then we can stop returning an error.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (57 commits)
drm/i915: Handle ERESTARTSYS during page fault
drm/i915: Warn before mmaping a purgeable buffer.
drm/i915: Track purged state.
drm/i915: Remove eviction debug spam
drm/i915: Immediately discard any backing storage for uneeded objects
drm/i915: Do not mis-classify clean objects as purgeable
drm/i915: Whitespace correction for madv
drm/i915: BUG_ON page refleak during unbind
drm/i915: Search harder for a reusable object
drm/i915: Clean up evict from list.
drm/i915: Add tracepoints
drm/i915: framebuffer compression for GM45+
drm/i915: split display functions by chip type
drm/i915: Skip the sanity checks if the current relocation is valid
drm/i915: Check that the relocation points to within the target
drm/i915: correct FBC update when pipe base update occurs
drm/i915: blacklist Acer AspireOne lid status
ACPI: make ACPI button funcs no-ops if not built in
drm/i915: prevent FIFO calculation overflows on 32 bits with high dotclocks
drm/i915: intel_display.c handle latency variable efficiently
...
Fix up trivial conflicts in drivers/gpu/drm/i915/{i915_dma.c|i915_drv.h}
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
module: don't call percpu_modfree on NULL pointer.
module: fix memory leak when load fails after srcversion/version allocated
module: preferred way to use MODULE_AUTHOR
param: allow whitespace as kernel parameter separator
module: reduce string table for loaded modules (v2)
module: reduce symbol table for loaded modules (v2)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
truncate: use new helpers
truncate: new helpers
fs: fix overflow in sys_mount() for in-kernel calls
fs: Make unload_nls() NULL pointer safe
freeze_bdev: grab active reference to frozen superblocks
freeze_bdev: kill bd_mount_sem
exofs: remove BKL from super operations
fs/romfs: correct error-handling code
vfs: seq_file: add helpers for data filling
vfs: remove redundant position check in do_sendfile
vfs: change sb->s_maxbytes to a loff_t
vfs: explicitly cast s_maxbytes in fiemap_check_ranges
libfs: return error code on failed attr set
seq_file: return a negative error code when seq_path_root() fails.
vfs: optimize touch_time() too
vfs: optimization for touch_atime()
vfs: split generic_forget_inode() so that hugetlbfs does not have to copy it
fs/inode.c: add dev-id and inode number for debugging in init_special_inode()
libfs: make simple_read_from_buffer conventional
For the longest time now we've been using multiple MODULE_AUTHOR()
statements when a module has more than one author, but the comment here
disagrees.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Luciano Coelho <luciano.coelho@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also remove all parts of the string table (referenced by the symbol
table) that are not needed for kallsyms use (i.e. which were only
referenced by symbols discarded by the previous patch, or not
referenced at all for whatever reason).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Discard all symbols not interesting for kallsyms use: absolute,
section, and in the common case (!KALLSYMS_ALL) data ones.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* 'for-linus' of git://neil.brown.name/md: (97 commits)
md: raid-1/10: fix RW bits manipulation
md: remove unnecessary memset from multipath.
md: report device as congested when suspended
md: Improve name of threads created by md_register_thread
md: remove sparse warnings about lock context.
md: remove sparse waring "symbol xxx shadows an earlier one"
async_tx/raid6: add missing dma_unmap calls to the async fail case
ioat3: fix uninitialized var warnings
drivers/dma/ioat/dma_v2.c: fix warnings
raid6test: fix stack overflow
ioat2: clarify ring size limits
md/raid6: cleanup ops_run_compute6_2
md/raid6: eliminate BUG_ON with side effect
dca: module load should not be an error message
ioat: driver version 4.0
dca: registering requesters in multiple dca domains
async_tx: remove HIGHMEM64G restriction
dmaengine: sh: Add Support SuperH DMA Engine driver
dmaengine: Move all map_sg/unmap_sg for slave channel to its client
fsldma: Add DMA_SLAVE support
...
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...
Because the binfmt is not different between threads in the same process,
it can be moved from task_struct to mm_struct. And binfmt moudle is
handled per mm_struct instead of task_struct.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When unaligned accesses are required for uncompressing a kernel (such as
for LZO decompression on ARM in a patch that follows), including
<linux/kernel.h> causes issues as it brings in a lot of things that are
not available in the decompression environment.
linux/kernel.h brings at least:
extern int console_printk[];
extern const char hex_asc[];
which causes errors at link-time as they are not available when
compiling the pre-boot environement. There are also a few others:
arch/arm/boot/compressed/misc.o: In function `valid_user_regs':
arch/arm/include/asm/ptrace.h:158: undefined reference to `elf_hwcap'
arch/arm/boot/compressed/misc.o: In function `console_silent':
include/linux/kernel.h:292: undefined reference to `console_printk'
arch/arm/boot/compressed/misc.o: In function `console_verbose':
include/linux/kernel.h:297: undefined reference to `console_printk'
arch/arm/boot/compressed/misc.o: In function `pack_hex_byte':
include/linux/kernel.h:360: undefined reference to `hex_asc'
arch/arm/boot/compressed/misc.o: In function `hweight_long':
include/linux/bitops.h:45: undefined reference to `hweight32'
arch/arm/boot/compressed/misc.o: In function `__cmpxchg_local_generic':
include/asm-generic/cmpxchg-local.h:21: undefined reference to `wrong_size_cmpxchg'
include/asm-generic/cmpxchg-local.h:42: undefined reference to `wrong_size_cmpxchg'
arch/arm/boot/compressed/misc.o: In function `__xchg':
arch/arm/include/asm/system.h:309: undefined reference to `__bad_xchg'
However, those files apparently use nothing from <linux/kernel.h>, all
they need is the declaration of types such as u32 or u64, so
<linux/types.h> should be enough
Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
->ioctx_lock and ->ioctx_list are used only under CONFIG_AIO.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The forward decls for some kernel types are only needed by the code behind
__KERNEL__, so don't bleed these types to userspace.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's unused.
It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.
It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__fatal_signal_pending inlines to one instruction on x86, probably two
instructions on other machines. It takes two longer x86 instructions just
to call it and test its return value, not to mention the function itself.
On my random x86_64 config, this saved 70 bytes of text (59 of those being
__fatal_signal_pending itself).
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to direct the SIGIO signal to a particular thread of a
multi-threaded application we cannot, like suggested by the manpage, put a
TID into the regular fcntl(F_SETOWN) call. It will still be send to the
whole process of which that thread is part.
Since people do want to properly direct SIGIO we introduce F_SETOWN_EX.
The need to direct SIGIO comes from self-monitoring profiling such as with
perf-counters. Perf-counters uses SIGIO to notify that new sample data is
available. If the signal is delivered to the same task that generated the
new sample it can augment that data by inspecting the task's user-space
state right after it returns from the kernel. This is esp. convenient
for interpreted or virtual machine driven environments.
Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex
as argument:
struct f_owner_ex {
int type;
pid_t pid;
};
Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: stephane eranian <eranian@googlemail.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce do_send_sig_info() and convert group_send_sig_info(),
send_sig_info(), do_send_specific() to use this helper.
Hopefully it will have more users soon, it allows to specify
specific/group behaviour via "bool group" argument.
Shaves 80 bytes from .text.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sys_delete_module() can set MODULE_STATE_GOING after
search_binary_handler() does try_module_get(). In this case
set_binfmt()->try_module_get() fails but since none of the callers
check the returned error, the task will run with the wrong old
->binfmt.
The proper fix should change all ->load_binary() methods, but we can
rely on fact that the caller must hold a reference to binfmt->module
and use __module_get() which never fails.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This changes tracehook_notify_jctl() so it's called with the siglock held,
and changes its argument and return value definition. These clean-ups
make it a better fit for what new tracing hooks need to check.
Tracing needs the siglock here, held from the time TASK_STOPPED was set,
to avoid potential SIGCONT races if it wants to allow any blocking in its
tracing hooks.
This also folds the finish_stop() function into its caller
do_signal_stop(). The function is short, called only once and only
unconditionally. It aids readability to fold it in.
[oleg@redhat.com: do not call tracehook_notify_jctl() in TASK_STOPPED state]
[oleg@redhat.com: introduce tracehook_finish_jctl() helper]
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The bug is old, it wasn't cause by recent changes.
Test case:
static void *tfunc(void *arg)
{
int pid = (long)arg;
assert(ptrace(PTRACE_ATTACH, pid, NULL, NULL) == 0);
kill(pid, SIGKILL);
sleep(1);
return NULL;
}
int main(void)
{
pthread_t th;
long pid = fork();
if (!pid)
pause();
signal(SIGCHLD, SIG_IGN);
assert(pthread_create(&th, NULL, tfunc, (void*)pid) == 0);
int r = waitpid(-1, NULL, __WNOTHREAD);
printf("waitpid: %d %m\n", r);
return 0;
}
Before the patch this program hangs, after this patch waitpid() correctly
fails with errno == -ECHILD.
The problem is, __ptrace_detach() reaps the EXIT_ZOMBIE tracee if its
->real_parent is our sub-thread and we ignore SIGCHLD. But in this case
we should wake up other threads which can sleep in do_wait().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement reclaim from groups over their soft limit
Permit reclaim from memory cgroups on contention (via the direct reclaim
path).
memory cgroup soft limit reclaim finds the group that exceeds its soft
limit by the largest number of pages and reclaims pages from it and then
reinserts the cgroup into its correct place in the rbtree.
Add additional checks to mem_cgroup_hierarchical_reclaim() to detect long
loops in case all swap is turned off. The code has been refactored and
the loop check (loop < 2) has been enhanced for soft limits. For soft
limits, we try to do more targetted reclaim. Instead of bailing out after
two loops, the routine now reclaims memory proportional to the size by
which the soft limit is exceeded. The proportion has been empirically
determined.
[akpm@linux-foundation.org: build fix]
[kamezawa.hiroyu@jp.fujitsu.com: fix softlimit css refcnt handling]
[nishimura@mxp.nes.nec.co.jp: refcount of the "victim" should be decremented before exiting the loop]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Organize cgroups over soft limit in a RB-Tree
Introduce an RB-Tree for storing memory cgroups that are over their soft
limit. The overall goal is to
1. Add a memory cgroup to the RB-Tree when the soft limit is exceeded.
We are careful about updates, updates take place only after a particular
time interval has passed
2. We remove the node from the RB-Tree when the usage goes below the soft
limit
The next set of patches will exploit the RB-Tree to get the group that is
over its soft limit by the largest amount and reclaim from it, when we
face memory contention.
[hugh.dickins@tiscali.co.uk: CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_PREEMPT=y fails to boot]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an interface to allow get/set of soft limits. Soft limits for memory
plus swap controller (memsw) is currently not supported. Resource
counters have been enhanced to support soft limits and new type
RES_SOFT_LIMIT has been added. Unlike hard limits, soft limits can be
directly set and do not need any reclaim or checks before setting them to
a newer value.
Kamezawa-San raised a question as to whether soft limit should belong to
res_counter. Since all resources understand the basic concepts of hard
and soft limits, it is justified to add soft limits here. Soft limits are
a generic resource usage feature, even file system quotas support soft
limits.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the memory cgroup to remove the overhead associated with accounting
all pages in the root cgroup. As a side-effect, we can no longer set a
memory hard limit in the root cgroup.
A new flag to track whether the page has been accounted or not has been
added as well. Flags are now set atomically for page_cgroup,
pcg_default_flags is now obsolete and removed.
[akpm@linux-foundation.org: fix a few documentation glitches]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alter the ss->can_attach and ss->attach functions to be able to deal with
a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a
pre-patch to cgroup-procs-writable.patch.)
Currently, new mode of the attach function can only tell the subsystem
about the old cgroup of the threadgroup leader. No subsystem currently
needs that information for each thread that's being moved, but if one were
to be added (for example, one that counts tasks within a group) this bit
would need to be reworked a bit to tell the subsystem the right
information.
[hidave.darkstar@gmail.com: fix build]
Signed-off-by: Ben Blum <bblum@google.com>
Signed-off-by: Paul Menage <menage@google.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changes css_set freeing mechanism to be under RCU
This is a prepatch for making the procs file writable. In order to free the
old css_sets for each task to be moved as they're being moved, the freeing
mechanism must be RCU-protected, or else we would have to have a call to
synchronize_rcu() for each task before freeing its old css_set.
Signed-off-by: Ben Blum <bblum@google.com>
Signed-off-by: Paul Menage <menage@google.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previously there was the problem in which two processes from different pid
namespaces reading the tasks or procs file could result in one process
seeing results from the other's namespace. Rather than one pidlist for
each file in a cgroup, we now keep a list of pidlists keyed by namespace
and file type (tasks versus procs) in which entries are placed on demand.
Each pidlist has its own lock, and that the pidlists themselves are passed
around in the seq_file's private pointer means we don't have to touch the
cgroup or its master list except when creating and destroying entries.
Signed-off-by: Ben Blum <bblum@google.com>
Signed-off-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct cgroup used to have a bunch of fields for keeping track of the
pidlist for the tasks file. Those are now separated into a new struct
cgroup_pidlist, of which two are had, one for procs and one for tasks.
The way the seq_file operations are set up is changed so that just the
pidlist struct gets passed around as the private data.
Interface example: Suppose a multithreaded process has pid 1000 and other
threads with ids 1001, 1002, 1003:
$ cat tasks
1000
1001
1002
1003
$ cat cgroup.procs
1000
$
Signed-off-by: Ben Blum <bblum@google.com>
Signed-off-by: Paul Menage <menage@google.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following series adds a "cgroup.procs" file to each cgroup that
reports unique tgids rather than pids, and allows all threads in a
threadgroup to be atomically moved to a new cgroup.
The subsystem "attach" interface is modified to support attaching whole
threadgroups at a time, which could introduce potential problems if any
subsystem were to need to access the old cgroup of every thread being
moved. The attach interface may need to be revised if this becomes the
case.
Also added is functionality for read/write locking all CLONE_THREAD
fork()ing within a threadgroup, by means of an rwsem that lives in the
sighand_struct, for per-threadgroup-ness and also for sharing a cacheline
with the sighand's atomic count. This scheme should introduce no extra
overhead in the fork path when there's no contention.
The final patch reveals potential for a race when forking before a
subsystem's attach function is called - one potential solution in case any
subsystem has this problem is to hang on to the group's fork mutex through
the attach() calls, though no subsystem yet demonstrates need for an
extended critical section.
This patch:
Revert
commit 096b7fe012
Author: Li Zefan <lizf@cn.fujitsu.com>
AuthorDate: Wed Jul 29 15:04:04 2009 -0700
Commit: Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Wed Jul 29 19:10:35 2009 -0700
cgroups: fix pid namespace bug
This is in preparation for some clashing cgroups changes that subsume the
original commit's functionaliy.
The original commit fixed a pid namespace bug which Ben Blum fixed
independently (in the same way, but with different code) as part of a
series of patches. I played around with trying to reconcile Ben's patch
series with Li's patch, but concluded that it was simpler to just revert
Li's, given that Ben's patch series contained essentially the same fix.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are many similar code in kernel for one object: convert time between
calendar time and broken-down time.
Here is some source I found:
fs/ncpfs/dir.c
fs/smbfs/proc.c
fs/fat/misc.c
fs/udf/udftime.c
fs/cifs/netmisc.c
net/netfilter/xt_time.c
drivers/scsi/ips.c
drivers/input/misc/hp_sdc_rtc.c
drivers/rtc/rtc-lib.c
arch/ia64/hp/sim/boot/fw-emu.c
arch/m68k/mac/misc.c
arch/powerpc/kernel/time.c
arch/parisc/include/asm/rtc.h
...
We can make a common function for this type of conversion, At least we
can get following benefit:
1: Make kernel simple and unify
2: Easy to fix bug in converting code
3: Reduce clone of code in future
For example, I'm trying to make ftrace display walltime,
this patch will make me easy.
This code is based on code from glibc-2.6
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add two events lock_kernel and unlock_kernel() to trace the bkl uses.
This opens the door for userspace tools to perform statistics about
the callsites that use it, dependencies with other locks (by pairing
the trace with lock events), use with recursivity and so on...
The {__reacquire,release}_kernel_lock() events are not traced because
these are called from schedule, thus the sched events are sufficient
to trace them.
Example of a trace:
hald-addon-stor-4152 [000] 165.875501: unlock_kernel: depth: 0, fs/block_dev.c:1358 __blkdev_put()
hald-addon-stor-4152 [000] 167.832974: lock_kernel: depth: 0, fs/block_dev.c:1167 __blkdev_get()
How to get the callsites that acquire it recursively:
cd /debug/tracing/events/bkl
echo "lock_depth > 0" > filter
firefox-4951 [001] 206.276967: unlock_kernel: depth: 1, fs/reiserfs/super.c:575 reiserfs_dirty_inode()
You can also filter by file and/or line.
v2: Use of FILTER_PTR_STRING attribute for files and lines fields to
make them traceable.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Introduce new truncate helpers truncate_pagecache and inode_newsize_ok.
vmtruncate is also consolidated from mm/memory.c and mm/nommu.c and
into mm/truncate.c.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently we held s_umount while a filesystem is frozen, despite that we
might return to userspace and unlock it from a different process. Instead
grab an active reference to keep the file system busy and add an explicit
check for frozen filesystems in remount and reject the remount instead
of blocking on s_umount.
Add a new get_active_super helper to super.c for use by freeze_bdev that
grabs an active reference to a superblock from a given block device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we have the freeze count there is not much reason for bd_mount_sem
anymore. The actual freeze/thaw operations are serialized using the
bd_fsfreeze_mutex, and the only other place we take bd_mount_sem is
get_sb_bdev which tries to prevent mounting a filesystem while the block
device is frozen. Instead of add a check for bd_fsfreeze_count and
return -EBUSY if a filesystem is frozen. While that is a change in user
visible behaviour a failing mount is much better for this case rather
than having the mount process stuck uninterruptible for a long time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add two helpers that allow access to the seq_file's own buffer, but
hide the internal details of seq_files.
This allows easier implementation of special purpose filling
functions. It also cleans up some existing functions which duplicated
the seq_file logic.
Make these inline functions in seq_file.h, as suggested by Al.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
sb->s_maxbytes is supposed to indicate the maximum size of a file that can
exist on the filesystem. It's declared as an unsigned long long.
Even if a filesystem has no inherent limit that prevents it from using
every bit in that unsigned long long, it's still problematic to set it to
anything larger than MAX_LFS_FILESIZE. There are places in the kernel
that cast s_maxbytes to a signed value. If it's set too large then this
cast makes it a negative number and generally breaks the comparison.
Change s_maxbytes to be loff_t instead. That should help eliminate the
temptation to set it too large by making it a signed value.
Also, add a warning for couple of releases to help catch filesystems that
set s_maxbytes too large. Eventually we can either convert this to a
BUG() or just remove it and in the hope that no one will get it wrong now
that it's a signed value.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Mandeep Singh Baines <msb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hugetlbfs needs to do special things instead of truncate_inode_pages().
Currently, it copied generic_forget_inode() except for
truncate_inode_pages() call which is asking for trouble (the code there
isn't trivial). So create a separate function generic_detach_inode()
which does all the list magic done in generic_forget_inode() and call
it from hugetlbfs_forget_inode().
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
cpumask: Move deprecated functions to end of header.
cpumask: remove unused deprecated functions, avoid accusations of insanity
cpumask: use new-style cpumask ops in mm/quicklist.
cpumask: use mm_cpumask() wrapper: x86
cpumask: use mm_cpumask() wrapper: um
cpumask: use mm_cpumask() wrapper: mips
cpumask: use mm_cpumask() wrapper: mn10300
cpumask: use mm_cpumask() wrapper: m32r
cpumask: use mm_cpumask() wrapper: arm
cpumask: Use accessors for cpu_*_mask: um
cpumask: Use accessors for cpu_*_mask: powerpc
cpumask: Use accessors for cpu_*_mask: mips
cpumask: Use accessors for cpu_*_mask: m32r
cpumask: remove arch_send_call_function_ipi
cpumask: arch_send_call_function_ipi_mask: s390
cpumask: arch_send_call_function_ipi_mask: powerpc
cpumask: arch_send_call_function_ipi_mask: mips
cpumask: arch_send_call_function_ipi_mask: m32r
cpumask: arch_send_call_function_ipi_mask: alpha
cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
...
* remove asm/atomic.h inclusion from linux/utsname.h --
not needed after kref conversion
* remove linux/utsname.h inclusion from files which do not need it
NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
due to some personality stuff it _is_ needed -- cowardly leave ELF-related
headers and files alone.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The new ones have pretty kerneldoc. Move the old ones to the end to
avoid confusing people.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: benh@kernel.crashing.org
We're not forcing removal of the old cpu_ functions, but we might as
well delete the now-unused ones.
Especially CPUMASK_ALLOC and friends. I actually got a phone call (!)
from a hacker who thought I had introduced them as the new cpumask
API. He seemed bewildered that I had lost all taste.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: benh@kernel.crashing.org
By 7be23e278f, mask field was deleted by irqaction. However, it was not
deleted from comment.
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Up until 1.1.83, the primitive human tribes used struct sigaction for
interrupts. The sa_mask field was overloaded to hold a pointer to the
name.
When someone created the new "struct irqaction" they carried across
the "mask" field as a kind of ancestor worship: the fact that it was
unused makes clear its spiritual significance.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(Thanks to Al Viro for reminding me of this, via Ingo)
CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so:
#define CPU_MASK_ALL (cpumask_t) { { ... } }
Taking the address of such a temporary is questionable at best,
unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added
CPU_MASK_ALL_PTR:
#define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)
Which formalizes this practice. One day gcc could bite us over this
usage (though we seem to have gotten away with it so far).
Now all callers are removed, we kill it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (30 commits)
Use macros for .data.page_aligned section.
Use macros for .bss.page_aligned section.
Use new __init_task_data macro in arch init_task.c files.
kbuild: Don't define ALIGN and ENTRY when preprocessing linker scripts.
arm, cris, mips, sparc, powerpc, um, xtensa: fix build with bash 4.0
kbuild: add static to prototypes
kbuild: fail build if recordmcount.pl fails
kbuild: set -fconserve-stack option for gcc 4.5
kbuild: echo the record_mcount command
gconfig: disable "typeahead find" search in treeviews
kbuild: fix cc1 options check to ensure we do not use -fPIC when compiling
checkincludes.pl: add option to remove duplicates in place
markup_oops: use modinfo to avoid confusion with underscored module names
checkincludes.pl: provide usage helper
checkincludes.pl: close file as soon as we're done with it
ctags: usability fix
kernel hacking: move STRIP_ASM_SYMS from General
gitignore usr/initramfs_data.cpio.bz2 and usr/initramfs_data.cpio.lzma
kbuild: Check if linker supports the -X option
kbuild: introduce ld-option
...
Fix trivial conflict in scripts/basic/fixdep.c
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Propagate 'fsc' mount option through automounts
sunrpc/rpc_pipe: fix kernel-doc notation
sunrpc: xdr_xcode_hyper helpers cannot presume 64-bit alignment
NFS: Add nfs_alloc_parsed_mount_data
NFS/RPC: fix problems with reestablish_timeout and related code.
NFS: Get rid of the NFS_MOUNT_VER3 and NFS_MOUNT_TCP flags
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: Update documentation to add fscache related bits
9p: Add fscache support to 9p
9p: Fix the incorrect update of inode size in v9fs_file_write()
9p: Use the i_size_[read, write]() macros instead of using inode->i_size directly.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
SELinux: do not destroy the avc_cache_nodep
KEYS: Have the garbage collector set its timer for live expired keys
tpm-fixup-pcrs-sysfs-file-update
creds_are_invalid() needs to be exported for use by modules:
include/linux/cred.h: fix build
Fix trivial BUILD_BUG_ON-induced conflicts in drivers/char/tpm/tpm.c
Fix new kernel-doc warnings in serial_core.[hc] files.
Warning(include/linux/serial_core.h:485): No description found for parameter 'uport'
Warning(include/linux/serial_core.h:485): Excess function parameter 'port' description in 'uart_handle_dcd_change'
Warning(include/linux/serial_core.h:511): No description found for parameter 'uport'
Warning(include/linux/serial_core.h:511): Excess function parameter 'port' description in 'uart_handle_cts_change'
Warning(drivers/serial/serial_core.c:2437): No description found for parameter 'uport'
Warning(drivers/serial/serial_core.c:2437): Excess function parameter 'port' description in 'uart_add_one_port'
Warning(drivers/serial/serial_core.c:2509): No description found for parameter 'uport'
Warning(drivers/serial/serial_core.c:2509): Excess function parameter 'port' description in 'uart_remove_one_port'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds a persistent, read-only caching facility for
9p clients using the FS-Cache caching backend.
When the fscache facility is enabled, each inode is associated
with a corresponding vcookie which is an index into the FS-Cache
indexing tree. The FS-Cache indexing tree is indexed at 3 levels:
- session object associated with each mount.
- inode/vcookie
- actual data (pages)
A cache tag is chosen randomly for each session. These tags can
be read off /sys/fs/9p/caches and can be passed as a mount-time
parameter to re-attach to the specified caching session.
Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
mips allmodconfig:
include/linux/cred.h: In function `creds_are_invalid':
include/linux/cred.h:187: error: `PAGE_SIZE' undeclared (first use in this function)
include/linux/cred.h:187: error: (Each undeclared identifier is reported only once
include/linux/cred.h:187: error: for each function it appears in.)
Fixes
commit b6dff3ec5e
Author: David Howells <dhowells@redhat.com>
AuthorDate: Fri Nov 14 10:39:16 2008 +1100
Commit: James Morris <jmorris@namei.org>
CommitDate: Fri Nov 14 10:39:16 2008 +1100
CRED: Separate task security context from task_struct
I think.
It's way too large to be inlined anyway.
Dunno if this needs an EXPORT_SYMBOL() yet.
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
These issues identified during an old-fashioned face-to-face code
review extending over many hours.
o Add comments for tricky parts of code, and correct comments
that have passed their sell-by date.
o Get rid of the vestiges of rcu_init_sched(), which is no
longer needed now that PREEMPT_RCU is gone.
o Move the #include of rcutree_plugin.h to the end of
rcutree.c, which means that, rather than having a random
collection of forward declarations, the new set of forward
declarations document the set of plugins. The new home for
this #include also allows __rcu_init_preempt() to move into
rcutree_plugin.h.
o Fix rcu_preempt_check_callbacks() to be static.
Suggested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12537246443924-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra <peterz@infradead.org>
These issues identified during an old-fashioned face-to-face code
review extended over many hours.
o Bury various forms of the "rsp->completed == rsp->gpnum"
comparison into an rcu_gp_in_progress() function, which has
the beneficial side-effect of forcing consistent use of
ACCESS_ONCE().
o Replace hand-coded arithmetic with DIV_ROUND_UP().
o Bury several "!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x01])"
instances into an rcu_preempted_readers() function, as this
expression indicates that there are no readers blocked
within RCU read-side critical sections blocking the current
grace period. (Though there might well be similar readers
blocking the next grace period.)
o Remove a dangling rcu_restart_cpu() declaration that has
been dangling for almost 20 minor releases of the kernel.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12537246442687-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.infradead.org/battery-2.6:
power_supply: Add driver for the PMU on WM831x PMICs
ds2760_battery: Fix integer overflow for time_to_empty_now
wm97xx_battery: Convert to dev_pm_ops
wm97xx_battery: Use irq to detect charger state
wm97xx_battery: Use platform_data
wm97xx-core: Pass platform_data to battery
ds2760_battery: implement set_charged() feature
power_supply: get_by_name and set_charged functionality
power_supply: EXPORT_SYMBOL cleanups
ds2760_battery: add current_accum module parameter
ds2760_battery: handle full_active_uAh == 0 case correctly
ds2760_battery: add rated_capacity module parameter
ds2760_battery: export more features
ds2760_battery: delay power supply registration
wm8350_power: Implement charge type property
power_supply: Add a charge_type property, and use it for olpc driver
olpc_battery: Add an 'error' sysfs device that displays raw errors
Revert "power: remove POWER_SUPPLY_PROP_CAPACITY_LEVEL"
* git://git.infradead.org/mtd-2.6: (58 commits)
mtd: jedec_probe: add PSD4256G6V id
mtd: OneNand support for Nomadik 8815 SoC (on NHK8815 board)
mtd: nand: driver for Nomadik 8815 SoC (on NHK8815 board)
m25p80: Add Spansion S25FL129P serial flashes
jffs2: Use SLAB_HWCACHE_ALIGN for jffs2_raw_{dirent,inode} slabs
mtd: sh_flctl: register sh_flctl using platform_driver_probe()
mtd: nand: txx9ndfmc: transfer 512 byte at a time if possible
mtd: nand: fix tmio_nand ecc correction
mtd: nand: add __nand_correct_data helper function
mtd: cfi_cmdset_0002: add 0xFF intolerance for M29W128G
mtd: inftl: fix fold chain block number
mtd: jedec: fix compilation problem with I28F640C3B definition
mtd: nand: fix ECC Correction bug for SMC ordering for NDFC driver
mtd: ofpart: Check availability of reg property instead of name property
driver/Makefile: Initialize "mtd" and "spi" before "net"
mtd: omap: adding DMA mode support in nand prefetch/post-write
mtd: omap: add support for nand prefetch-read and post-write
mtd: add nand support for w90p910 (v2)
mtd: maps: add mtd-ram support to physmap_of
mtd: pxa3xx_nand: add single-bit error corrections reporting
...
* git://git.infradead.org/iommu-2.6: (23 commits)
intel-iommu: Disable PMRs after we enable translation, not before
intel-iommu: Kill DMAR_BROKEN_GFX_WA option.
intel-iommu: Fix integer wrap on 32 bit kernels
intel-iommu: Fix integer overflow in dma_pte_{clear_range,free_pagetable}()
intel-iommu: Limit DOMAIN_MAX_PFN to fit in an 'unsigned long'
intel-iommu: Fix kernel hang if interrupt remapping disabled in BIOS
intel-iommu: Disallow interrupt remapping if not all ioapics covered
intel-iommu: include linux/dmi.h to use dmi_ routines
pci/dmar: correct off-by-one error in dmar_fault()
intel-iommu: Cope with yet another BIOS screwup causing crashes
intel-iommu: iommu init error path bug fixes
intel-iommu: Mark functions with __init
USB: Work around BIOS bugs by quiescing USB controllers earlier
ia64: IOMMU passthrough mode shouldn't trigger swiotlb init
intel-iommu: make domain_add_dev_info() call domain_context_mapping()
intel-iommu: Unify hardware and software passthrough support
intel-iommu: Cope with broken HP DC7900 BIOS
iommu=pt is a valid early param
intel-iommu: double kfree()
intel-iommu: Kill pointless intel_unmap_single() function
...
Fixed up trivial include lines conflict in drivers/pci/intel-iommu.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6: (41 commits)
regulator: Add some brief design documentation
regulator: fix voltage range in da9034 ldo12
regulator/driver: be more specific in nanodoc for is_enabled
regulator/lp3971: drop unnecessary initialization
regulator: drop 'default n'
regulator: fix typos
regulator: fix calculation of voltage range in da9034_set_ldo12_voltage()
regulator: update a filename in documentation
drivers/regulator/Kconfig: fix typo (s/Usersapce/Userspace/) in REGULATOR_USERSPACE_CONSUMER description
REGULATOR Handle positive returncode from enable
regulator: tps650xx - build fixes for x86_64
Fix some regulator documentation
Regulator: Adding TPS65023 and TPS6507x in Kconfig and Makefile
Regulator: Add TPS6507x regulator driver
Regulator: Add TPS65023 regulator driver
regulator: userspace: use sysfs_create_group
regulator: Add GPIO enable control to fixed voltage regulator driver
Regulator: Implement list_voltage for pcf50633 regulator driver.
regulator: regulator_enable() permission checking
regulator: Push locking for regulator_is_enabled() out
...
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (119 commits)
ACPI: don't pass handle for fixed hardware notifications
ACPI: remove null pointer checks in deferred execution path
ACPI: simplify deferred execution path
acerhdf: additional BIOS versions
acerhdf: convert to dev_pm_ops
acerhdf: fix fan control for AOA150 model
thermal: add missing Kconfig dependency
acpi: switch /proc/acpi/{debug_layer,debug_level} to seq_file
hp-wmi: fix rfkill memory leak on unload
ACPI: remove unnecessary #ifdef CONFIG_DMI
ACPI: linux/acpi.h should not include linux/dmi.h
hwmon driver for ACPI 4.0 power meters
topstar-laptop: add new driver for hotkeys support on Topstar N01
thinkpad_acpi: fix rfkill memory leak on unload
thinkpad-acpi: report brightness events when required
thinkpad-acpi: don't poll by default any of the reserved hotkeys
thinkpad-acpi: Fix procfs hotkey reset command
thinkpad-acpi: deprecate hotkey_bios_mask
thinkpad-acpi: hotkey poll fixes
thinkpad-acpi: be more strict when detecting a ThinkPad
...
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
i2c: Clearly mark ACPI drivers as such
i2c: Add driver for SMBus Control Method Interface
i2c-pnx: Correct use of request_region/request_mem_region
MAINTAINERS: Add maintainer for AT24 and PCA9564/PCA9665
i2c-piix4: Add AMD SB900 SMBus device ID
i2c/chips: Remove deprecated pcf8574 driver
i2c/chips: Remove deprecated pca9539 driver
i2c/chips: Remove deprecated pcf8575 driver
gpio/pcf857x: Copy i2c_device_id from old pcf8574 driver
i2c/scx200_acb: Provide more information on bus errors
i2c: Provide compatibility links for i2c adapters
i2c: Convert i2c adapters to bus devices
i2c: Convert i2c clients to a device type
i2c/tsl2550: Use combined SMBus transactions
i2c-taos-evm: Switch echo off to improve performance
i2c: Drop unused i2c_driver.id field
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (142 commits)
USB: Fix sysfs paths in documentation
USB: skeleton: fix coding style issues.
USB: O_NONBLOCK in read path of skeleton
USB: make usb-skeleton honor O_NONBLOCK in write path
USB: skel_read really sucks royally
USB: Add hub descriptor update hook for xHCI
USB: xhci: Support USB hubs.
USB: xhci: Set multi-TT field for LS/FS devices under hubs.
USB: xhci: Set route string for all devices.
USB: xhci: Fix command wait list handling.
USB: xhci: Change how xHCI commands are handled.
USB: xhci: Refactor input device context setup.
USB: xhci: Endpoint representation refactoring.
USB: gadget: ether needs to select CRC32
USB: fix USBTMC get_capabilities success handling
USB: fix missing error check in probing
USB: usbfs: add USBDEVFS_URB_BULK_CONTINUATION flag
USB: support for autosuspend in sierra while online
USB: ehci-dbgp,ehci: Allow dbpg to work with suspend/resume
USB: ehci-dbgp,documentation: Documentation updates for ehci-dbgp
...
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
lguest: don't force VIRTIO_F_NOTIFY_ON_EMPTY
lguest: cleanup for map_switcher()
lguest: use PGDIR_SHIFT for PAE code to allow different PAGE_OFFSET
lguest: use set_pte/set_pmd uniformly for real page table entries
lguest: move panic notifier registration to its expected place.
virtio_blk: add support for cache flush
virtio: add virtio IDs file
virtio: get rid of redundant VIRTIO_ID_9P definition
virtio: make add_buf return capacity remaining
virtio_pci: minor MSI-X cleanups
For this system call user space passes a signed long length parameter,
while the kernel side takes an unsigned long parameter and converts it
later to signed long again.
This has led to bugs in compat wrappers see e.g. dd90bbd5 "powerpc: Add
compat_sys_truncate". The s390 compat wrapper for this functions is
broken as well since it also performs zero extension instead of sign
extension for the length parameter.
In addition if hpa comes up with an automated way of generating
compat wrappers it would generate a wrong one here.
So change the length parameter from unsigned long to long.
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'x86/orig_ax' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
x86: ptrace: set TS_COMPAT when 32-bit ptrace sets orig_eax>=0
x86: ptrace: do not sign-extend orig_ax on write
x86: syscall_get_nr returns int
asm-generic: syscall_get_nr returns int
bitfields should be unsigned.
This fixes sparse noise:
error: dubious one-bit signed bitfield
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the previous version, return values in ioctl() function have been
modified.
[akpm@linux-foundation.org: simplify lcd_disable_raster()]
Signed-off-by: Sudhakar Rajashekhara <sudhakar.raj@ti.com>
Signed-off-by: Pavel Kiryukhin <pkiryukhin@ru.mvista.com>
Signed-off-by: Steve Chen <schen@mvista.com>
Acked-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add LCD controller (LCDC) driver for TI's DA8xx/OMAP-L1xx architecture.
LCDC specifications can be found at http://www.ti.com/litv/pdf/sprufm0a.
LCDC on DA8xx consists of two independent controllers, the Raster
Controller and the LCD Interface Display Driver (LIDD) controller. LIDD
further supports character and graphic displays.
This patch adds support for the graphic display (Sharp LQ035Q3DG01) found
on the DA830 based EVM. The EVM details can be found at:
http://support.spectrumdigital.com/boards/dskda830/revc/.
Signed-off-by: Sudhakar Rajashekhara <sudhakar.raj@ti.com>
Signed-off-by: Pavel Kiryukhin <pkiryukhin@ru.mvista.com>
Signed-off-by: Steve Chen <schen@mvista.com>
Acked-by: Krzysztof Helt <krzysztof.h1@wp.pl>
DESC
davinci-fb-frame-buffer-driver-for-ti-da8xx-omap-l1xx-fix
EDESC
From: Andrew Morton <akpm@linux-foundation.org>
fix kconfig indenting
Cc: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Pavel Kiryukhin <pkiryukhin@ru.mvista.com>
Cc: Steve Chen <schen@mvista.com>
Cc: Sudhakar Rajashekhara <sudhakar.raj@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Brownell <david-b@pacbell.net>
Cc: Samuel Ortiz <sameo@openedhand.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A GPIO driver for the Freescale MC33880 High/Low side switch
Signed-off-by: Richard Röjfors <richard.rojfors.ext@mocean-labs.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 926b663ce8 (gpiolib: allow GPIOs to
be named) already provides naming on the chip level. This patch provides
more flexibility by allowing multiple names where ever in sysfs on a per
GPIO basis.
Adapted from David Brownell's comments on a similar concept:
http://lkml.org/lkml/2009/4/20/203.
[randy.dunlap@oracle.com: fix build for CONFIG_GENERIC_GPIO=n]
Signed-off-by: Jani Nikula <ext-jani.1.nikula@nokia.com>
Acked-by: David Brownell <david-b@pacbell.net>
Cc: Daniel Silverstone <dsilvers@simtec.co.uk>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Support two new half-duplex SPI implementation restrictions, for links
that talk to TX-only or RX-only devices. (Existing half-duplex flavors
support both transfer directions, just not at the same time.)
Move spi_async() into the spi.c core, and stop inlining it. Then make
that function perform error checks and reject messages that demand more
than the underlying controller can support.
Based on a patch from Marek Szyprowski which did this only for the
bitbanged GPIO driver.
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes it consistent with other buses (platform, i2c, vio, ...). I'm
not sure why we use the prefixes, but there must be a reason.
This was easy enough to do it, and I did it.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Samuel Ortiz <sameo@openedhand.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Acked-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With this patch spi drivers can use standard spi_driver.id_table and
MODULE_DEVICE_TABLE() mechanisms to bind against the devices. Just like
we do with I2C drivers.
This is useful when a single driver supports several variants of devices
but it is not possible to detect them in run-time (like non-JEDEC chips
probing in drivers/mtd/devices/m25p80.c), and when platform_data usage is
overkill.
This patch also makes life a lot easier on OpenFirmware platforms, since
with OF we extensively use proper device IDs in modaliases.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add missing kernel-doc notation in spi.h for struct spi_master:
Warning(include/linux/spi/spi.h:289): No description found for parameter 'mode_bits'
Warning(include/linux/spi/spi.h:289): No description found for parameter 'flags'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
initramfs userspace likes to use this magic number.
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: maximilian attems <max@stro.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some archs define MODULED_VADDR/MODULES_END which is not in VMALLOC area.
This is handled only in x86-64. This patch make it more generic. And we
can use vread/vwrite to access the area. Fix it.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt <benh@kernel.crashing.org> pointed out that vmemmap
range is not included in KCORE_RAM, KCORE_VMALLOC ....
This adds KCORE_VMEMMAP if SPARSEMEM_VMEMMAP is used. By this, vmemmap
can be readable via /proc/kcore
Because it's not vmalloc area, vread/vwrite cannot be used. But the range
is static against the memory layout, this patch handles vmemmap area by
the same scheme with physical memory.
This patch assumes SPARSEMEM_VMEMMAP range is not in VMALLOC range. It's
correct now.
[akpm@linux-foundation.org: fix typo]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally, walk_memory_resource() was introduced to traverse all memory
of "System RAM" for detecting memory hotplug/unplug range. For doing so,
flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
memory hotplug.
But for using other purpose, /proc/kcore, this may includes some firmware
area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the
check strict to find out busy "System RAM".
Note: PPC64 keeps their own walk_memory_resouce(), which walk through
ppc64's lmb informaton. Because old kclist_add() is called per lmb, this
patch makes no difference in behavior, finally.
And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
Because pfn_valid() just show "there is memmap or not* and cannot be used
for "there is physical memory or not", this function is useful in generic
to scan physical memory range.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Presently, kclist_add() only eats start address and size as its arguments.
Considering to make kclist dynamically reconfigulable, it's necessary to
know which kclists are for System RAM and which are not.
This patch add kclist types as
KCORE_RAM
KCORE_VMALLOC
KCORE_TEXT
KCORE_OTHER
This "type" is used in a patch following this for detecting KCORE_RAM.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset is for /proc/kcore. With this,
- many per-arch hooks are removed.
- /proc/kcore will know really valid physical memory area.
- /proc/kcore will be aware of memory hotplug.
- /proc/kcore will be architecture independent i.e.
if an arch supports CONFIG_MMU, it can use /proc/kcore.
(if the arch uses usual memory layout.)
This patch:
/proc/kcore uses its own list handling codes. It's better to use
generic list codes.
No changes in logic. just clean up.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A patch to give a better overview of the userland application stack usage,
especially for embedded linux.
Currently you are only able to dump the main process/thread stack usage
which is showed in /proc/pid/status by the "VmStk" Value. But you get no
information about the consumed stack memory of the the threads.
There is an enhancement in the /proc/<pid>/{task/*,}/*maps and which marks
the vm mapping where the thread stack pointer reside with "[thread stack
xxxxxxxx]". xxxxxxxx is the maximum size of stack. This is a value
information, because libpthread doesn't set the start of the stack to the
top of the mapped area, depending of the pthread usage.
A sample output of /proc/<pid>/task/<tid>/maps looks like:
08048000-08049000 r-xp 00000000 03:00 8312 /opt/z
08049000-0804a000 rw-p 00001000 03:00 8312 /opt/z
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
a7d12000-a7d13000 ---p 00000000 00:00 0
a7d13000-a7f13000 rw-p 00000000 00:00 0 [thread stack: 001ff4b4]
a7f13000-a7f14000 ---p 00000000 00:00 0
a7f14000-a7f36000 rw-p 00000000 00:00 0
a7f36000-a8069000 r-xp 00000000 03:00 4222 /lib/libc.so.6
a8069000-a806b000 r--p 00133000 03:00 4222 /lib/libc.so.6
a806b000-a806c000 rw-p 00135000 03:00 4222 /lib/libc.so.6
a806c000-a806f000 rw-p 00000000 00:00 0
a806f000-a8083000 r-xp 00000000 03:00 14462 /lib/libpthread.so.0
a8083000-a8084000 r--p 00013000 03:00 14462 /lib/libpthread.so.0
a8084000-a8085000 rw-p 00014000 03:00 14462 /lib/libpthread.so.0
a8085000-a8088000 rw-p 00000000 00:00 0
a8088000-a80a4000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2
a80a4000-a80a5000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2
a80a5000-a80a6000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2
afaf5000-afb0a000 rw-p 00000000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
Also there is a new entry "stack usage" in /proc/<pid>/{task/*,}/status
which will you give the current stack usage in kb.
A sample output of /proc/self/status looks like:
Name: cat
State: R (running)
Tgid: 507
Pid: 507
.
.
.
CapBnd: fffffffffffffeff
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 0
Stack usage: 12 kB
I also fixed stack base address in /proc/<pid>/{task/*,}/stat to the base
address of the associated thread stack and not the one of the main
process. This makes more sense.
[akpm@linux-foundation.org: fs/proc/array.c now needs walk_page_range()]
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Especially with the PM framework, those are quite handy to have in driver
code too.
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Normally writes to SDIO function 0 outside the vendor specific CCCR
registers are prohibited.
To support embedded devices that require writes to SDIO function 0 outside
this range (e.g. TI WL127x embedded sdio wifi device),
MMC_QUIRK_LENIENT_FN0 is introduced.
A card quirks field is added to `struct mmc_card' to support non-standard
devices (e.g. embedded sdio devices).
[akpm@linux-foundation.org: code in C, not cpp!]
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support to disconnect the pull-up resistor on CD/DAT[3] (pin 1)
of the card. This may be desired on certain setups of boards,
controllers and embedded sdio devices which do not need the card's
pull-up. As a result, card detection is disabled and power is saved.
[akpm@linux-foundation.org: simplify sdio_disable_cd() a bit]
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Acked-by: Matt Fleming <matt@console-pimps.org>
Cc: Ian Molton <ian@mnementh.co.uk>
Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
Cc: Philip Langdale <philipl@overt.org>
Cc: Pierre Ossman <pierre@ossman.eu>
Cc: David Vrabel <david.vrabel@csr.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to the standard, the SWITCH command should be followed by a
SEND_STATUS command to check for errors.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Acked-by: Matt Fleming <matt@console-pimps.org>
Cc: Ian Molton <ian@mnementh.co.uk>
Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
Cc: Pierre Ossman <pierre@ossman.eu>
Cc: Philip Langdale <philipl@overt.org>
Cc: "Madhusudhan" <madhu.cr@ti.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for the new MMC command SLEEP_AWAKE.
Signed-off-by: Jarkko Lavinen <jarkko.lavinen@nokia.com>
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Acked-by: Matt Fleming <matt@console-pimps.org>
Cc: Ian Molton <ian@mnementh.co.uk>
Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
Cc: Pierre Ossman <pierre@ossman.eu>
Cc: Philip Langdale <philipl@overt.org>
Cc: "Madhusudhan" <madhu.cr@ti.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Power can be saved by powering off cards that are not in use. This is
similar to suspend / resume except it is under the control of the driver,
and does not require any power management support. It can only be used
when the driver can monitor whether the card is removed, otherwise it is
unsafe. This is possible because, unlike suspend, the driver still
receives card detect and / or cover switch interrupts.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Acked-by: Matt Fleming <matt@console-pimps.org>
Cc: Ian Molton <ian@mnementh.co.uk>
Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
Cc: Pierre Ossman <pierre@ossman.eu>
Cc: Philip Langdale <philipl@overt.org>
Cc: "Madhusudhan" <madhu.cr@ti.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eMMC's are not removable, so unsafe resume is OK always.
To permit this a new host capability MMC_CAP_NONREMOVABLE has been added
and suspend / resume updated accordingly.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Acked-by: Matt Fleming <matt@console-pimps.org>
Cc: Ian Molton <ian@mnementh.co.uk>
Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
Cc: Pierre Ossman <pierre@ossman.eu>
Cc: Philip Langdale <philipl@overt.org>
Cc: "Madhusudhan" <madhu.cr@ti.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This change allows the MMC host to be claimed in situations where the host
may or may not have already been claimed. Also 'mmc_try_claim_host()' is
now exported.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Acked-by: Matt Fleming <matt@console-pimps.org>
Cc: Ian Molton <ian@mnementh.co.uk>
Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
Cc: Pierre Ossman <pierre@ossman.eu>
Cc: Philip Langdale <philipl@overt.org>
Cc: "Madhusudhan" <madhu.cr@ti.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MMC hosts that support power saving can use the 'enable' and 'disable'
methods to exit and enter power saving states. An explanation of their
use is provided in the comments added to include/linux/mmc/host.h.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Acked-by: Matt Fleming <matt@console-pimps.org>
Cc: Ian Molton <ian@mnementh.co.uk>
Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
Cc: Pierre Ossman <pierre@ossman.eu>
Cc: Philip Langdale <philipl@overt.org>
Cc: "Madhusudhan" <madhu.cr@ti.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some ports (like the Blackfin arch) have a discontiguous memory map which
means there may be text or data that falls outside of the standard range
of the start/end text/data symbols. Creating some helper functions allows
these non-standard ports to declare these regions without adversely
affecting anyone else.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I tend to use a 'D' debugging macro a lot during debugging. When I define
it before includes I often get conflicts with kmap_types.h's use of 'D'
too. It's not very nice when a global include pollutes the name space
like this.
Rename the kmap_types.h D to KMAP_D. It is only used temporarily in the
header so has no effect on anything else.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
abs() will truncate the input if is it outside the 2^32 range. Fix that
by assuming `long' input.
This might generate worse code in the common case.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Split the anonfd interface into a bare file pointer creation one, and a
file pointer creation plus install one.
There are cases, like the usage of eventfds inside other kernel
interfaces, where the file pointer created by anonfd needs to be used
inside the initialization of other structures.
As it is right now, as soon as anon_inode_getfd() returns, the kenrle can
race with userspace closing the newly installed file descriptor.
This patch, while keeping the old anon_inode_getfd(), introduces a new
anon_inode_getfile() (whose services are reused in anon_inode_getfd())
that allows to split the file creation phase and the fd install one.
Once all the kernel structures are initialized, the code can call the
proper fd_install().
Gregory manifested the need for something like this inside KVM.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: James Morris <jmorris@namei.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Gregory Haskins <ghaskins@novell.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc permitting variable length arrays makes the current construct used for
BUILD_BUG_ON() useless, as that doesn't produce any diagnostic if the
controlling expression isn't really constant. Instead, this patch makes
it so that a bit field gets used here. Consequently, those uses where the
condition isn't really constant now also need fixing.
Note that in the gfp.h, kmemcheck.h, and virtio_config.h cases
MAYBE_BUILD_BUG_ON() really just serves documentation purposes - even if
the expression is compile time constant (__builtin_constant_p() yields
true), the array is still deemed of variable length by gcc, and hence the
whole expression doesn't have the intended effect.
[akpm@linux-foundation.org: make arch/sparc/include/asm/vio.h compile]
[akpm@linux-foundation.org: more nonsensical assertions in tpm.c..]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Cc: Mimi Zohar <zohar@us.ibm.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using the type bool (instead of int) for the __print_once flag in the
printk_once() macro matches the intent of the code better, and allows the
compiler to generate smaller code; eg a typical callsite with gcc 4.3.3 on
i386:
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-6 (-6)
function old new delta
static.__print_once 4 1 -3
get_cpu_vendor 146 143 -3
Saving 6 bytes of object size per callsite by slightly improving the
readability of the source seems like a win to me.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The act of a process becoming a session leader is a useful signal to a
supervising init daemon such as Upstart.
While a daemon will normally do this as part of the process of becoming a
daemon, it is rare for its children to do so. When the children do, it is
nearly always a sign that the child should be considered detached from the
parent and not supervised along with it.
The poster-child example is OpenSSH; the per-login children call setsid()
so that they may control the pty connected to them. If the primary daemon
dies or is restarted, we do not want to consider the per-login children
and want to respawn the primary daemon without killing the children.
This patch adds a new PROC_SID_EVENT and associated structure to the
proc_event event_data union, it arranges for this to be emitted when the
special PIDTYPE_SID pid is set.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Scott James Remnant <scott@ubuntu.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make all seq_operations structs const, to help mitigate against
revectoring user-triggerable function pointers.
This is derived from the grsecurity patch, although generated from scratch
because it's simpler than extracting the changes from there.
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch can remove spinlock from struct call_function_data, the
reasons are below:
1: add a new interface for cpumask named cpumask_test_and_clear_cpu(),
it can atomically test and clear specific cpu, we can use it instead
of cpumask_test_cpu() and cpumask_clear_cpu() and no need data->lock
to protect those in generic_smp_call_function_interrupt().
2: in smp_call_function_many(), after csd_lock() return, the current's
cfd_data is deleted from call_function list, so it not have race
between other cpus, then cfs_data is only used in
smp_call_function_many() that must disable preemption and not from
a hardware interrupthandler or from a bottom half handler to call,
only the correspond cpu can use it, so it not have race in current
cpu, no need cfs_data->lock to protect it.
3: after 1 and 2, cfs_data->lock is only use to protect cfs_data->refs in
generic_smp_call_function_interrupt(), so we can define cfs_data->refs
to atomic_t, and no need cfs_data->lock any more.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
[akpm@linux-foundation.org: use atomic_dec_return()]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move various magic-number definitions into magic.h.
Signed-off-by: Nick Black <dank@qemfd.net>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When syslog is not possible, at the same time there's no serial/net
console available, it will be hard to read the printk messages. For
example oops/panic/warning messages in shutdown phase.
Add a printk delay feature, we can make each printk message delay some
milliseconds.
Setting the delay by proc/sysctl interface: /proc/sys/kernel/printk_delay
The value range from 0 - 10000, default value is 0
[akpm@linux-foundation.org: fix a few things]
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
of the form
include/net/inet_sock.h:208: warning: ISO C90 forbids mixed declarations and code
Cc: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch (as1283) adds a new flag, USBDEVFS_URB_BULK_CONTINUATION,
to usbfs. It is intended for userspace libraries such as libusb and
openusb. When they have to break up a single usbfs bulk transfer into
multiple URBs, they will set the flag on all but the first URB of the
series.
If an error other than an unlink occurs, the kernel will automatically
cancel all the following URBs for the same endpoint and refuse to
accept new submissions, until an URB is encountered that is not marked
as a BULK_CONTINUATION. Such an URB would indicate the start of a new
transfer or the presence of an older library, so the kernel returns to
normal operation.
This enables libraries to delimit bulk transfers correctly, even in
the presence of early termination as indicated by short packets.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On some EHCI usb debug controllers, the EHCI debug device will fail to
be seen after a port reset, after a warm reset. Two options exist to
get the device to initialize correctly.
Option 1 is to unplug and plug in the device.
Option 2 is to use the EHCI port test to get the usb debug device to
start talking again. At that point the debug controller port reset
will succeed.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
CC: dbrownell@users.sourceforge.net
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the EHCI debug port is initialized and in use, the EHCI host
controller driver must follow two rules.
1) If the EHCI host driver issues a controller reset, the debug
controller driver re-initialization must get called after the reset
is completed.
2) The EHCI host driver should ignore any requests to the physical
EHCI debug port when the EHCI debug port is in use.
The code to check for the debug port was moved from ehci_pci_reinit()
to ehci_pci_setup because it must get called prior to ehci_reset()
which will clear the debug port registers.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: dbrownell@users.sourceforge.net
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch implements several changes:
1) Improve the capability to debug the dbgp driver
The dbgp_ehci_status() was added in a number of places to report
the critical ehci registers to diagnose the cause of a failure of
the ehci-dbgp driver.
2) Capability to survive the host controller initialization
The dbgp_external_startup(), dbgp_not_safe, and dbgp_phys_port were
added so as to allow the ehci-dbgp to re-initialize after the ehci
host controller is reset by the standard host controller driver.
This same routine is common for the early startup or
re-initialization.
This resulted in the need to move some of the initialization code
out of the __init section because the ehci driver has the
possibility to be loaded later on as a kernel module.
3) Stability improvements for device initialization
The device enumeration from 0 to 127 has the possibility to fail
the first time after a warm reset on some older EHCI debug
controllers. The enumeration will be tried up to 3 times to
account for this failure case.
The dbg_wait_until_complete() was changed to wait up to 250 ms
before failing which only comes into play during device
initialization. The maximum delay will never get hit during the
course of normal operation of the driver, unless the device got
unplugged or there was a ehci controller failure, in which case the
dbgp device driver will shut itself down.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: dbrownell@users.sourceforge.net
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Move the dbgp early printk driver in advance of refactoring and adding
new code, so the changes to this code are tracked separately from the
move of the code.
The drivers/usb/early directory will be the location of the current
and future early usb code for driving usb devices prior initializing
the standard interrupt driven USB drivers.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When do_output_char() attempts to write a carriage return/line feed sequence,
it first checks to see how much buffer room is available. If there are at least
two characters free, it will write the carriage return/line feed with two calls
to tty_put_char(). It calls the tty_operation functions write() for devices that
don't support the tty_operations function put_char(). If the USB generic serial
device's write URB is not in use, it will return the buffer size when asked how
much room is available. The write() of the carriage return will cause it to mark
the write URB busy, so the subsequent write() of the line feed will be ignored.
This patch uses the kfifo infrastructure to implement a write FIFO that
accurately returns the amount of space available in the buffer.
Signed-off-by: David VomLehn <dvomlehn@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
include/linux/usb/audio.h is exported to userspace,
so part of this file that is for internal kernel
usage need to be guarded with ifdef __KERNEL__.
This way make headers_install will stript it out.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Platform device support was merged earlier, but support for boards to
customize the devflags aspect of the controller was not. We want this on
Blackfin systems to control the bus width, but might as well expose all of
the fields while we're at it.
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The Intel Moorestown EHCI controller supports non-standard HOSTPCx register
extension. This register controls the LPM behaviour and controls the behaviour
of each USB port.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Alek Du <alek.du@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1260) changes the pm_usage_cnt field in struct
usb_interface from an int to an atomic_t. This is so that drivers can
invoke the usb_autopm_get_interface_async() and
usb_autopm_put_interface_async() routines without locking and without
fear of corrupting the pm_usage_cnt value.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1258) implements a feature that users have been asking
for: It gives programs the ability to "claim" a port on a hub, via a
new usbfs ioctl. A device plugged into a "claimed" port will not be
touched by the kernel beyond the immediate necessities of
initialization and enumeration.
In particular, when a device is plugged into a "claimed" port, the
kernel will not select and install a configuration. And when a config
is installed by usbfs or sysfs, the kernel will not probe any drivers
for any of the interfaces. (However the kernel will fetch various
string descriptors during enumeration. One could argue that this
isn't really necessary, but the strings are exported in sysfs.)
The patch does not guarantee exclusive access to these devices; it is
still possible for more than one program to open the device file
concurrently. Programs are responsible for coordinating access among
themselves.
A demonstration program showing how to use the new interface can be
found in an attachment to
http://marc.info/?l=linux-usb&m=124345857431452&w=2
The patch also makes a small simplification to the hub driver,
replacing a bunch of more-or-less useless variants of "out of memory"
with a single message.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Those functions are used only used to fill the set/get members of
usb_audio_control. It doesn't make much sense to inline them.
Signed-off-by: Laurent Pinchart <laurent.pinchart@skynet.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
linux/usb/audio.h is a public header file that includes definitions
exported to userspace. To avoid namespace clashes, prefix all macro
definitions with UAC_. Existing macros and structures prefixed with
USB_AC_ and USB_AS_ are renamed for consistency.
Signed-off-by: Laurent Pinchart <laurent.pinchart@skynet.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
USB_SUBCLASS_VENDOR_SPEC is common to several USB classes and as such belongs
to usb/ch9.h.
Signed-off-by: Laurent Pinchart <laurent.pinchart@skynet.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
And use the new definitions in the USB Audio Class gadget driver.
Signed-off-by: Laurent Pinchart <laurent.pinchart@skynet.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes crashes when usbmon attempts to access GART aperture.
The old code attempted to take a bus address and convert it into a
virtual address, which clearly was impossible on systems with actual
IOMMUs. Let us not persist in this foolishness, and use transfer_buffer
in all cases instead.
I think downsides are negligible. The ones I see are:
- A driver may pass an address of one buffer down as transfer_buffer,
and entirely different entity mapped for DMA, resulting in misleading
output of usbmon. Note, however, that PIO based controllers would
do transfer the same data that usbmon sees here.
- Out of tree drivers may crash usbmon if they store garbage in
transfer_buffer. I inspected the in-tree drivers, and clarified
the documentation in comments.
- Drivers that use get_user_pages will not be possible to monitor.
I only found one driver with this problem (drivers/staging/rspiusb).
- Same happens with with usb_storage transferring from highmem, but
it works fine on 64-bit systems, so I think it's not a concern.
At least we don't crash anymore.
Why didn't we do this in 2.6.10? That's because back in those days
it was popular not to fill in transfer_buffer, so almost all
traffic would be invisible (e.g. all of HID was like that).
But now, the tree is almost 100% PIO friendly, so we can do the
right thing at last.
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Recent qemu has added a VIRTIO_BLK_F_FLUSH flag to advertise that the
virtual disk has a volatile write cache that needs to be flushed. In case
we see this feature implement tell the Linux block layer about the fact
and use the new VIRTIO_BLK_T_FLUSH to flush the cache when required. This
allows for an correct and simple implementation of write barriers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Virtio IDs are spread all over the tree which makes assigning new IDs
bothersome. Putting them together should make the process less error-prone.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This API change means that virtio_net can tell how much capacity
remains for buffers. It's necessarily fuzzy, since
VIRTIO_RING_F_INDIRECT_DESC means we can fit any number of descriptors
in one, *if* we can kmalloc.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
Only 32 bits of system call number are meaningful, so make the
specification for syscall_get_nr() be to return int, not long.
Signed-off-by: Roland McGrath <roland@redhat.com>
In order to correctly prevent the invalid reuse of a purged buffer, we
need to track such events and warn the user before something bad
happens.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Usbnet framework assumes USB hardware doesn't handle zero length
packets, but SMSC LAN95xx requires these to be sent for correct
operation.
This patch fixes an easily reproducible tx lockup when sending a frame
that results in exactly 512 bytes in a USB transmission (e.g. a UDP
frame with 458 data bytes, due to IP headers and our USB headers). It
adds an extra flag to usbnet for the hardware driver to indicate that
it can handle and requires the zero length packets.
This patch should not affect other usbnet users, please also consider
for -stable.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using %pf instead of %pF supresses printing of the function offset
which will always be 0 in the case of worklet functions.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Zhaolei <zhaolei@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20090922024033.GB31801@kryten>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Fix the tracepoint documentation path in tracepoints headers and
a misaligned tabulation.
Signed-off-by: Li Hong <lihong.hi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
LKML-Reference: <3a3680030909220300h7cf18849q4d4702b9d4feaa67@mail.gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
This moves the mmci platform data definition struct away from
arch/arm/include/asm/mach/mmc.h into the more proper place among
the other primecells in include/linux/amba/mmci.h and at the same
time renames it to "mmci.h", and also the struct in this file
confusingly named mmc_platform_data has been renamed
mmci_platform_data for clarity.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* 'for-2.6.32' of git://linux-nfs.org/~bfields/linux: (68 commits)
nfsd4: nfsv4 clients should cross mountpoints
nfsd: revise 4.1 status documentation
sunrpc/cache: avoid variable over-loading in cache_defer_req
sunrpc/cache: use list_del_init for the list_head entries in cache_deferred_req
nfsd: return success for non-NFS4 nfs4_state_start
nfsd41: Refactor create_client()
nfsd41: modify nfsd4.1 backchannel to use new xprt class
nfsd41: Backchannel: Implement cb_recall over NFSv4.1
nfsd41: Backchannel: cb_sequence callback
nfsd41: Backchannel: Setup sequence information
nfsd41: Backchannel: Server backchannel RPC wait queue
nfsd41: Backchannel: Add sequence arguments to callback RPC arguments
nfsd41: Backchannel: callback infrastructure
nfsd4: use common rpc_cred for all callbacks
nfsd4: allow nfs4 state startup to fail
SUNRPC: Defer the auth_gss upcall when the RPC call is asynchronous
nfsd4: fix null dereference creating nfsv4 callback client
nfsd4: fix whitespace in NFSPROC4_CLNT_CB_NULL definition
nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel
sunrpc/cache: simplify cache_fresh_locked and cache_fresh_unlocked.
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
trivial: fix typo in aic7xxx comment
trivial: fix comment typo in drivers/ata/pata_hpt37x.c
trivial: typo in kernel-parameters.txt
trivial: fix typo in tracing documentation
trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
trivial: remove unnecessary semicolons
trivial: Fix duplicated word "options" in comment
trivial: kbuild: remove extraneous blank line after declaration of usage()
trivial: improve help text for mm debug config options
trivial: doc: hpfall: accept disk device to unload as argument
trivial: doc: hpfall: reduce risk that hpfall can do harm
trivial: SubmittingPatches: Fix reference to renumbered step
trivial: fix typos "man[ae]g?ment" -> "management"
trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
trivial: fix missing printk space in amd_k7_smp_check
trivial: fix typo s/ketymap/keymap/ in comment
trivial: fix typo "to to" in multiple files
trivial: fix typos in comments s/DGBU/DBGU/
...
Decouple kernel.h from ratelimit.h: the global declaration of
printk's ratelimit_state is not needed, and it leads to messy
circular dependencies due to ratelimit.h's (new) adding of a
spinlock_types.h include.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The shutdown method is used by the winbond cir driver to setup the
hardware for wake-from-S5.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: David Härdeman <david@hardeman.nu>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This offers a way for platforms to define flags and thresholds for the
free-fall/wakeup functions of the lis302d chips.
More registers needed to be seperated as they are specific to the
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bit 0x80 in CTRL_REG3 is an ACTIVE_LOW rather than an ACTIVE_HIGH
function, I got that wrong during my last change.
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FLEX_ARRAY_INIT(element_size, total_nr_elements) cannot determine if
either parameter is valid, so flex arrays which are statically allocated
with this interface can easily become corrupted or reference beyond its
allocated memory.
This removes FLEX_ARRAY_INIT() as a struct flex_array initializer since no
initializer may perform the required checking. Instead, the array is now
defined with a new interface:
DEFINE_FLEX_ARRAY(name, element_size, total_nr_elements)
This may be prefixed with `static' for file scope.
This interface includes compile-time checking of the parameters to ensure
they are valid. Since the validity of both element_size and
total_nr_elements depend on FLEX_ARRAY_BASE_SIZE and FLEX_ARRAY_PART_SIZE,
the kernel build will fail if either of these predefined values changes
such that the array parameters are no longer valid.
Since BUILD_BUG_ON() requires compile time constants, several of the
static inline functions that were once local to lib/flex_array.c had to be
moved to include/linux/flex_array.h.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new function to the flex_array API:
int flex_array_shrink(struct flex_array *fa)
This function will free all unused second-level pages. Since elements are
now poisoned if they are not allocated with __GFP_ZERO, it's possible to
identify parts that consist solely of unused elements.
flex_array_shrink() returns the number of pages freed.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Newly initialized flex_array's and/or flex_array_part's are now poisoned
with a new poison value, FLEX_ARRAY_FREE. It's value is similar to
POISON_FREE used in the various slab allocators, but is different to
distinguish between flex array's poisoned kmem and slab allocator poisoned
kmem.
This will allow us to identify flex_array_part's that only contain free
elements (and free them with an addition to the flex_array API). This
could also be extended in the future to identify `get' uses on elements
that have not been `put'.
If __GFP_ZERO is passed for a part's gfp mask, the poisoning is avoided.
These elements are considered to be in-use since they have been
initialized.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new function to the flex_array API:
int flex_array_clear(struct flex_array *fa,
unsigned int element_nr)
This function will zero the element at element_nr in the flex_array.
Although this is equivalent to using flex_array_put() and passing a
pointer to zero'd memory, flex_array_clear() does not require such a
pointer to memory that would most likely need to be allocated on the
caller's stack which could be significantly large depending on
element_size.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the menu idle governor which balances power savings, energy efficiency
and performance impact.
The reason for a reworked governor is that there have been serious
performance issues reported with the existing code on Nehalem server
systems.
To show this I'm sure Andrew wants to see benchmark results:
(benchmark is "fio", "no cstates" is using "idle=poll")
no cstates current linux new algorithm
1 disk 107 Mb/s 85 Mb/s 105 Mb/s
2 disks 215 Mb/s 123 Mb/s 209 Mb/s
12 disks 590 Mb/s 320 Mb/s 585 Mb/s
In various power benchmark measurements, no degredation was found by our
measurement&diagnostics team. Obviously a small percentage more power was
used in the "fio" benchmark, due to the much higher performance.
While it would be a novel idea to describe the new algorithm in this
commit message, I cheaped out and described it in comments in the code
instead.
[changes since first post: spelling fixes from akpm, review feedback,
folded menu-tng into menu.c]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anyone who wants to do copy to/from user from a kernel thread, needs
use_mm (like what fs/aio has). Move that into mm/, to make reusing and
exporting easier down the line, and make aio use it. Next intended user,
besides aio, will be vhost-net.
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a flag for mmap that will be used to request a huge page region that
will look like anonymous memory to userspace. This is accomplished by
using a file on the internal vfsmount. MAP_HUGETLB is a modifier of
MAP_ANONYMOUS and so must be specified with it. The region will behave
the same as a MAP_ANONYMOUS region using small pages.
[akpm@linux-foundation.org: fix arch definitions of MAP_HUGETLB]
Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a flag for mmap that will be used to request a huge page region that
will look like anonymous memory to user space. This is accomplished by
using a file on the internal vfsmount. MAP_HUGETLB is a modifier of
MAP_ANONYMOUS and so must be specified with it. The region will behave
the same as a MAP_ANONYMOUS region using small pages.
The patch also adds the MAP_STACK flag, which was previously defined only
on some architectures but not on others. Since MAP_STACK is meant to be a
hint only, architectures can define it without assigning a specific
meaning to it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: David Rientjes <rientjes@google.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset adds a flag to mmap that allows the user to request that an
anonymous mapping be backed with huge pages. This mapping will borrow
functionality from the huge page shm code to create a file on the kernel
internal mount and use it to approximate an anonymous mapping. The
MAP_HUGETLB flag is a modifier to MAP_ANONYMOUS and will not work without
both flags being preset.
A new flag is necessary because there is no other way to hook into huge
pages without creating a file on a hugetlbfs mount which wouldn't be
MAP_ANONYMOUS.
To userspace, this mapping will behave just like an anonymous mapping
because the file is not accessible outside of the kernel.
This patchset is meant to simplify the programming model. Presently there
is a large chunk of boiler platecode, contained in libhugetlbfs, required
to create private, hugepage backed mappings. This patch set would allow
use of hugepages without linking to libhugetlbfs or having hugetblfs
mounted.
Unification of the VM code would provide these same benefits, but it has
been resisted each time that it has been suggested for several reasons: it
would break PAGE_SIZE assumptions across the kernel, it makes page-table
abstractions really expensive, and it does not provide any benefit on
architectures that do not support huge pages, incurring fast path
penalties without providing any benefit on these architectures.
This patch:
There are two means of creating mappings backed by huge pages:
1. mmap() a file created on hugetlbfs
2. Use shm which creates a file on an internal mount which essentially
maps it MAP_SHARED
The internal mount is only used for shared mappings but there is very
little that stops it being used for private mappings. This patch extends
hugetlbfs_file_setup() to deal with the creation of files that will be
mapped MAP_PRIVATE on the internal hugetlbfs mount. This extended API is
used in a subsequent patch to implement the MAP_HUGETLB mmap() flag.
Signed-off-by: Eric Munson <ebmunson@us.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CONFIG_SHMEM off gives you (ramfs masquerading as) tmpfs, even when
CONFIG_TMPFS is off: that's a little anomalous, and I'd intended to make
more sense of it by removing CONFIG_TMPFS altogether, always enabling its
code when CONFIG_SHMEM; but so many defconfigs have CONFIG_SHMEM on
CONFIG_TMPFS off that we'd better leave that as is.
But there is no point in asking for CONFIG_TMPFS if CONFIG_SHMEM is off:
make TMPFS depend on SHMEM, which also prevents TMPFS_POSIX_ACL
shmem_acl.o being pointlessly built into the kernel when SHMEM is off.
And a selfish change, to prevent the world from being rebuilt when I
switch between CONFIG_SHMEM on and off: the only CONFIG_SHMEM in the
header files is mm.h shmem_lock() - give that a shmem.c stub instead.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__get_user_pages() has been taking its own GUP flags, then processing
them into FOLL flags for follow_page(). Though oddly named, the FOLL
flags are more widely used, so pass them to __get_user_pages() now.
Sorry, VM flags, VM_FAULT flags and FAULT_FLAGs are still distinct.
(The patch to __get_user_pages() looks peculiar, with both gup_flags
and foll_flags: the gup_flags remain constant; but as before there's
an exceptional case, out of scope of the patch, in which foll_flags
per page have FOLL_WRITE masked off.)
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
follow_hugetlb_page() shouldn't be guessing about the coredump case
either: pass the foll_flags down to it, instead of just the write bit.
Remove that obscure huge_zeropage_ok() test. The decision is easy,
though unlike the non-huge case - here vm_ops->fault is always set.
But we know that a fault would serve up zeroes, unless there's
already a hugetlbfs pagecache page to back the range.
(Alternatively, since hugetlb pages aren't swapped out under pressure,
you could save more dump space by arguing that a page not yet faulted
into this process cannot be relevant to the dump; but that would be
more surprising.)
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The "FOLL_ANON optimization" and its use_zero_page() test have caused
confusion and bugs: why does it test VM_SHARED? for the very good but
unsatisfying reason that VMware crashed without. As we look to maybe
reinstating anonymous use of the ZERO_PAGE, we need to sort this out.
Easily done: it's silly for __get_user_pages() and follow_page() to
be guessing whether it's safe to assume that they're being used for
a coredump (which can take a shortcut snapshot where other uses must
handle a fault) - just tell them with GUP_FLAGS_DUMP and FOLL_DUMP.
get_dump_page() doesn't even want a ZERO_PAGE: an error suits fine.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In preparation for the next patch, add a simple get_dump_page(addr)
interface for the CONFIG_ELF_CORE dumpers to use, instead of calling
get_user_pages() directly. They're not interested in errors: they
just want to use holes as much as possible, to save space and make
sure that the data is aligned where the headers said it would be.
Oh, and don't use that horrid DUMP_SEEK(off) macro!
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following two patches remove searching in the page allocator fast-path
by maintaining multiple free-lists in the per-cpu structure. At the time
the search was introduced, increasing the per-cpu structures would waste a
lot of memory as per-cpu structures were statically allocated at
compile-time. This is no longer the case.
The patches are as follows. They are based on mmotm-2009-08-27.
Patch 1 adds multiple lists to struct per_cpu_pages, one per
migratetype that can be stored on the PCP lists.
Patch 2 notes that the pcpu drain path check empty lists multiple times. The
patch reduces the number of checks by maintaining a count of free
lists encountered. Lists containing pages will then free multiple
pages in batch
The patches were tested with kernbench, netperf udp/tcp, hackbench and
sysbench. The netperf tests were not bound to any CPU in particular and
were run such that the results should be 99% confidence that the reported
results are within 1% of the estimated mean. sysbench was run with a
postgres background and read-only tests. Similar to netperf, it was run
multiple times so that it's 99% confidence results are within 1%. The
patches were tested on x86, x86-64 and ppc64 as
x86: Intel Pentium D 3GHz with 8G RAM (no-brand machine)
kernbench - No significant difference, variance well within noise
netperf-udp - 1.34% to 2.28% gain
netperf-tcp - 0.45% to 1.22% gain
hackbench - Small variances, very close to noise
sysbench - Very small gains
x86-64: AMD Phenom 9950 1.3GHz with 8G RAM (no-brand machine)
kernbench - No significant difference, variance well within noise
netperf-udp - 1.83% to 10.42% gains
netperf-tcp - No conclusive until buffer >= PAGE_SIZE
4096 +15.83%
8192 + 0.34% (not significant)
16384 + 1%
hackbench - Small gains, very close to noise
sysbench - 0.79% to 1.6% gain
ppc64: PPC970MP 2.5GHz with 10GB RAM (it's a terrasoft powerstation)
kernbench - No significant difference, variance well within noise
netperf-udp - 2-3% gain for almost all buffer sizes tested
netperf-tcp - losses on small buffers, gains on larger buffers
possibly indicates some bad caching effect.
hackbench - No significant difference
sysbench - 2-4% gain
This patch:
Currently the per-cpu page allocator searches the PCP list for pages of
the correct migrate-type to reduce the possibility of pages being
inappropriate placed from a fragmentation perspective. This search is
potentially expensive in a fast-path and undesirable. Splitting the
per-cpu list into multiple lists increases the size of a per-cpu structure
and this was potentially a major problem at the time the search was
introduced. These problem has been mitigated as now only the necessary
number of structures is allocated for the running system.
This patch replaces a list search in the per-cpu allocator with one list
per migrate type. The potential snag with this approach is when bulk
freeing pages. We round-robin free pages based on migrate type which has
little bearing on the cache hotness of the page and potentially checks
empty lists repeatedly in the event the majority of PCP pages are of one
type.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, OOM logic callflow is here.
__out_of_memory()
select_bad_process() for each task
badness() calculate badness of one task
oom_kill_process() search child
oom_kill_task() kill target task and mm shared tasks with it
example, process-A have two thread, thread-A and thread-B and it have very
fat memory and each thread have following oom_adj and oom_score.
thread-A: oom_adj = OOM_DISABLE, oom_score = 0
thread-B: oom_adj = 0, oom_score = very-high
Then, select_bad_process() select thread-B, but oom_kill_task() refuse
kill the task because thread-A have OOM_DISABLE. Thus __out_of_memory()
call select_bad_process() again. but select_bad_process() select the same
task. It mean kernel fall in livelock.
The fact is, select_bad_process() must select killable task. otherwise
OOM logic go into livelock.
And root cause is, oom_adj shouldn't be per-thread value. it should be
per-process value because OOM-killer kill a process, not thread. Thus
This patch moves oomkilladj (now more appropriately named oom_adj) from
struct task_struct to struct signal_struct. it naturally prevent
select_bad_process() choose wrong task.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For mem_cgroup, shrink_zone() may call shrink_list() with nr_to_scan=1, in
which case shrink_list() _still_ calls isolate_pages() with the much
larger SWAP_CLUSTER_MAX. It effectively scales up the inactive list scan
rate by up to 32 times.
For example, with 16k inactive pages and DEF_PRIORITY=12, (16k >> 12)=4.
So when shrink_zone() expects to scan 4 pages in the active/inactive list,
the active list will be scanned 4 pages, while the inactive list will be
(over) scanned SWAP_CLUSTER_MAX=32 pages in effect. And that could break
the balance between the two lists.
It can further impact the scan of anon active list, due to the anon
active/inactive ratio rebalance logic in balance_pgdat()/shrink_zone():
inactive anon list over scanned => inactive_anon_is_low() == TRUE
=> shrink_active_list()
=> active anon list over scanned
So the end result may be
- anon inactive => over scanned
- anon active => over scanned (maybe not as much)
- file inactive => over scanned
- file active => under scanned (relatively)
The accesses to nr_saved_scan are not lock protected and so not 100%
accurate, however we can tolerate small errors and the resulted small
imbalanced scan rates between zones.
Cc: Rik van Riel <riel@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is being done by allowing boot time allocations to specify that they
may want a sub-page sized amount of memory.
Overall this seems more consistent with the other hash table allocations,
and allows making two supposedly mm-only variables really mm-only
(nr_{kernel,all}_pages).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sizing of memory allocations shouldn't depend on the number of physical
pages found in a system, as that generally includes (perhaps a huge amount
of) non-RAM pages. The amount of what actually is usable as storage
should instead be used as a basis here.
Some of the calculations (i.e. those not intending to use high memory)
should likely even use (totalram_pages - totalhigh_pages).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make page_has_private() return a true boolean value and remove the double
negations from the two callsites using it for arithmetic.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
page_is_file_cache() has been used for both boolean checks and LRU
arithmetic, which was always a bit weird.
Now that page_lru_base_type() exists for LRU arithmetic, make
page_is_file_cache() a real predicate function and adjust the
boolean-using callsites to drop those pesky double negations.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of abusing page_is_file_cache() for LRU list index arithmetic, add
another helper with a more appropriate name and convert the non-boolean
users of page_is_file_cache() accordingly.
This new helper gives the LRU base type a page is supposed to live on,
inactive anon or inactive file.
[hugh.dickins@tiscali.co.uk: convert del_page_from_lru() also]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kzalloc mempool zeros items when they are initially allocated, but
does not rezero used items that are returned to the pool. Consequently
mempool_alloc()s may return non-zeroed memory.
Since there are/were only two in-tree users for
mempool_create_kzalloc_pool(), and 'fixing' this in a way that will
re-zero used (but not new) items before first use is non-trivial, just
remove it.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The page allocation trace event reports that a page was successfully
allocated but it does not specify where it came from. When analysing
performance, it can be important to distinguish between pages coming from
the per-cpu allocator and pages coming from the buddy lists as the latter
requires the zone lock to the taken and more data structures to be
examined.
This patch adds a trace event for __rmqueue reporting when a page is being
allocated from the buddy lists. It distinguishes between being called to
refill the per-cpu lists or whether it is a high-order allocation.
Similarly, this patch adds an event to catch when the PCP lists are being
drained a little and pages are going back to the buddy lists.
This is trickier to draw conclusions from but high activity on those
events could explain why there were a large number of cache misses on a
page-allocator-intensive workload. The coalescing and splitting of
buddies involves a lot of writing of page metadata and cache line bounces
not to mention the acquisition of an interrupt-safe lock necessary to
enter this path.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Ming Chun <macli@brc.ubc.ca>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fragmentation avoidance depends on being able to use free pages from lists
of the appropriate migrate type. In the event this is not possible,
__rmqueue_fallback() selects a different list and in some circumstances
change the migratetype of the pageblock. Simplistically, the more times
this event occurs, the more likely that fragmentation will be a problem
later for hugepage allocation at least but there are other considerations
such as the order of page being split to satisfy the allocation.
This patch adds a trace event for __rmqueue_fallback() that reports what
page is being used for the fallback, the orders of relevant pages, the
desired migratetype and the migratetype of the lists being used, whether
the pageblock changed type and whether this event is important with
respect to fragmentation avoidance or not. This information can be used
to help analyse fragmentation avoidance and help decide whether
min_free_kbytes should be increased or not.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Ming Chun <macli@brc.ubc.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds trace events for the allocation and freeing of pages,
including the freeing of pagevecs. Using the events, it will be known
what struct page and pfns are being allocated and freed and what the call
site was in many cases.
The page alloc tracepoints be used as an indicator as to whether the
workload was heavily dependant on the page allocator or not. You can make
a guess based on vmstat but you can't get a per-process breakdown.
Depending on the call path, the call_site for page allocation may be
__get_free_pages() instead of a useful callsite. Instead of passing down
a return address similar to slab debugging, the user should enable the
stacktrace and seg-addr options to get a proper stack trace.
The pagevec free tracepoint has a different usecase. It can be used to
get a idea of how many pages are being dumped off the LRU and whether it
is kswapd doing the work or a process doing direct reclaim.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Ming Chun <macli@brc.ubc.ca>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function free_cold_page() has no callers so delete it.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just as the swapoff system call allocates many pages of RAM to various
processes, perhaps triggering OOM, so "echo 2 >/sys/kernel/mm/ksm/run"
(unmerge) is liable to allocate many pages of RAM to various processes,
perhaps triggering OOM; and each is normally run from a modest admin
process (swapoff or shell), easily repeated until it succeeds.
So treat unmerge_and_remove_all_rmap_items() in the same way that we treat
try_to_unuse(): generalize PF_SWAPOFF to PF_OOM_ORIGIN, and bracket both
with that, to ask the OOM killer to kill them first, to prevent them from
spawning more and more OOM kills.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few cleanups, given the munlock fix: the comment on ksm_test_exit() no
longer applies, and it can be made private to ksm.c; there's no more
reference to mmu_gather or tlb.h, and mmap.c doesn't need ksm.h.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rawhide users have reported hang at startup when cryptsetup is run: the
same problem can be simply reproduced by running a program int main() {
mlockall(MCL_CURRENT | MCL_FUTURE); return 0; }
The problem is that exit_mmap() applies munlock_vma_pages_all() to
clean up VM_LOCKED areas, and its current implementation (stupidly)
tries to fault in absent pages, for example where PROT_NONE prevented
them being faulted in when mlocking. Whereas the "ksm: fix oom
deadlock" patch, knowing there's a race by which KSM might try to fault
in pages after exit_mmap() had finally zapped the range, backs out of
such faults doing nothing when its ksm_test_exit() notices mm_users 0.
So revert that part of "ksm: fix oom deadlock" which moved the
ksm_exit() call from before exit_mmap() to the middle of exit_mmap();
and remove those ksm_test_exit() checks from the page fault paths, so
allowing the munlocking to proceed without interference.
ksm_exit, if there are rmap_items still chained on this mm slot, takes
mmap_sem write side: so preventing KSM from working on an mm while
exit_mmap runs. And KSM will bail out as soon as it notices that
mm_users is already zero, thanks to its internal ksm_test_exit checks.
So that when a task is killed by OOM killer or the user, KSM will not
indefinitely prevent it from running exit_mmap to release its memory.
This does break a part of what "ksm: fix oom deadlock" was trying to
achieve. When unmerging KSM (echo 2 >/sys/kernel/mm/ksm), and even
when ksmd itself has to cancel a KSM page, it is possible that the
first OOM-kill victim would be the KSM process being faulted: then its
memory won't be freed until a second victim has been selected (freeing
memory for the unmerging fault to complete).
But the OOM killer is already liable to kill a second victim once the
intended victim's p->mm goes to NULL: so there's not much point in
rejecting this KSM patch before fixing that OOM behaviour. It is very
much more important to allow KSM users to boot up, than to haggle over
an unlikely and poorly supported OOM case.
We also intend to fix munlocking to not fault pages: at which point
this patch _could_ be reverted; though that would be controversial, so
we hope to find a better solution.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Justin M. Forbes <jforbes@redhat.com>
Acked-for-now-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's a now-obvious deadlock in KSM's out-of-memory handling:
imagine ksmd or KSM_RUN_UNMERGE handling, holding ksm_thread_mutex,
trying to allocate a page to break KSM in an mm which becomes the
OOM victim (quite likely in the unmerge case): it's killed and goes
to exit, and hangs there waiting to acquire ksm_thread_mutex.
Clearly we must not require ksm_thread_mutex in __ksm_exit, simple
though that made everything else: perhaps use mmap_sem somehow?
And part of the answer lies in the comments on unmerge_ksm_pages:
__ksm_exit should also leave all the rmap_item removal to ksmd.
But there's a fundamental problem, that KSM relies upon mmap_sem to
guarantee the consistency of the mm it's dealing with, yet exit_mmap
tears down an mm without taking mmap_sem. And bumping mm_users won't
help at all, that just ensures that the pages the OOM killer assumes
are on their way to being freed will not be freed.
The best answer seems to be, to move the ksm_exit callout from just
before exit_mmap, to the middle of exit_mmap: after the mm's pages
have been freed (if the mmu_gather is flushed), but before its page
tables and vma structures have been freed; and down_write,up_write
mmap_sem there to serialize with KSM's own reliance on mmap_sem.
But KSM then needs to be careful, whenever it downs mmap_sem, to
check that the mm is not already exiting: there's a danger of using
find_vma on a layout that's being torn apart, or writing into page
tables which have been freed for reuse; and even do_anonymous_page
and __do_fault need to check they're not being called by break_ksm
to reinstate a pte after zap_pte_range has zapped that page table.
Though it might be clearer to add an exiting flag, set while holding
mmap_sem in __ksm_exit, that wouldn't cover the issue of reinstating
a zapped pte. All we need is to check whether mm_users is 0 - but
must remember that ksmd may detect that before __ksm_exit is reached.
So, ksm_test_exit(mm) added to comment such checks on mm->mm_users.
__ksm_exit now has to leave clearing up the rmap_items to ksmd,
that needs ksm_thread_mutex; but shift the exiting mm just after the
ksm_scan cursor so that it will soon be dealt with. __ksm_enter raise
mm_count to hold the mm_struct, ksmd's exit processing (exactly like
its processing when it finds all VM_MERGEABLEs unmapped) mmdrop it,
similar procedure for KSM_RUN_UNMERGE (which has stopped ksmd).
But also give __ksm_exit a fast path: when there's no complication
(no rmap_items attached to mm and it's not at the ksm_scan cursor),
it can safely do all the exiting work itself. This is not just an
optimization: when ksmd is not running, the raised mm_count would
otherwise leak mm_structs.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KSM will need to identify its kernel merged pages unambiguously, and
/proc/kpageflags will probably like to do so too.
Since KSM will only be substituting anonymous pages, statistics are best
preserved by making a PageKsm page a special PageAnon page: one with no
anon_vma.
But KSM then needs its own page_add_ksm_rmap() - keep it in ksm.h near
PageKsm; and do_wp_page() must COW them, unlike singly mapped PageAnons.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
page_dup_rmap(), used on each mapped page when forking, was originally
just an inline atomic_inc of mapcount. 2.6.22 added CONFIG_DEBUG_VM
out-of-line checks to it, which would need to be ever-so-slightly
complicated to allow for the PageKsm() we're about to define.
But I think these checks never caught anything. And if it's coding errors
we're worried about, such checks should be in page_remove_rmap() too, not
just when forking; whereas if it's pagetable corruption we're worried
about, then they shouldn't be limited to CONFIG_DEBUG_VM.
Oh, just revert page_dup_rmap() to an inline atomic_inc of mapcount.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch presents the mm interface to a dummy version of ksm.c, for
better scrutiny of that interface: the real ksm.c follows later.
When CONFIG_KSM is not set, madvise(2) reject MADV_MERGEABLE and
MADV_UNMERGEABLE with EINVAL, since that seems more helpful than
pretending that they can be serviced. But when CONFIG_KSM=y, accept them
even if KSM is not currently running, and even on areas which KSM will not
touch (e.g. hugetlb or shared file or special driver mappings).
Like other madvices, report ENOMEM despite success if any area in the
range is unmapped, and use EAGAIN to report out of memory.
Define vma flag VM_MERGEABLE to identify an area on which KSM may try
merging pages: leave it to ksm_madvise() to decide whether to set it.
Define mm flag MMF_VM_MERGEABLE to identify an mm which might contain
VM_MERGEABLE areas, to minimize callouts when forking or exiting.
Based upon earlier patches by Chris Wright and Izik Eidus.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The out-of-tree KSM used ioctls on fds cloned from /dev/ksm to register a
memory area for merging: we prefer now to use an madvise(2) interface.
This patch just defines MADV_MERGEABLE (to tell KSM it may merge pages in
this area found identical to pages in other mergeable areas) and
MADV_UNMERGEABLE (to undo that).
Most architectures use asm-generic, but alpha, mips, parisc, xtensa need
their own definitions: included here for mmotm convenience, but we'll
probably want to split this and feed pieces to arch maintainers.
Based upon earlier patches by Chris Wright and Izik Eidus.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KSM is a linux driver that allows dynamicly sharing identical memory pages
between one or more processes.
Unlike tradtional page sharing that is made at the allocation of the
memory, ksm do it dynamicly after the memory was created. Memory is
periodically scanned; identical pages are identified and merged.
The sharing is made in a transparent way to the processes that use it.
Ksm is highly important for hypervisors (kvm), where in production
enviorments there might be many copys of the same data data among the host
memory. This kind of data can be: similar kernels, librarys, cache, and
so on.
Even that ksm was wrote for kvm, any userspace application that want to
use it to share its data can try it.
Ksm may be useful for any application that might have similar (page
aligment) data strctures among the memory, ksm will find this data merge
it to one copy, and even if it will be changed and thereforew copy on
writed, ksm will merge it again as soon as it will be identical again.
Another reason to consider using ksm is the fact that it might simplify
alot the userspace code of application that want to use shared private
data, instead that the application will mange shared area, ksm will do
this for the application, and even write to this data will be allowed
without any synchinization acts from the application.
Ksm was designed to be a loadable module that doesn't change the VM code
of linux.
This patch:
The set_pte_at_notify() macro allows setting a pte in the shadow page
table directly, instead of flushing the shadow page table entry and then
getting vmexit to set it. It uses a new change_pte() callback to do so.
set_pte_at_notify() is an optimization for kvm, and other users of
mmu_notifiers, for COW pages. It is useful for kvm when ksm is used,
because it allows kvm not to have to receive vmexit and only then map the
ksm page into the shadow page table, but instead map it directly at the
same time as Linux maps the page into the host page table.
Users of mmu_notifiers who don't implement new mmu_notifier_change_pte()
callback will just receive the mmu_notifier_invalidate_page() callback.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By the time PG_mlocked is cleared in the page freeing path, nobody else is
looking at our page->flags anymore.
It is thus safe to make the test-and-clear non-atomic and thereby removing
an unnecessary and expensive operation from a hotpath.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
global_lru_pages() / zone_lru_pages() can be used in two ways:
- to estimate max reclaimable pages in determine_dirtyable_memory()
- to calculate the slab scan ratio
When swap is full or not present, the anon lru lists are not reclaimable
and also won't be scanned. So the anon pages shall not be counted in both
usage scenarios. Also rename to _reclaimable_pages: now they are counting
the possibly reclaimable lru pages.
It can greatly (and correctly) increase the slab scan rate under high
memory pressure (when most file pages have been reclaimed and swap is
full/absent), thus reduce false OOM kills.
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: David Howells <dhowells@redhat.com>
Cc: "Li, Ming Chun" <macli@brc.ubc.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the system is running a heavy load of processes then concurrent reclaim
can isolate a large number of pages from the LRU. /proc/vmstat and the
output generated for an OOM do not show how many pages were isolated.
This has been observed during process fork bomb testing (mstctl11 in LTP).
This patch shows the information about isolated pages.
Reproduced via:
-----------------------
% ./hackbench 140 process 1000
=> OOM occur
active_anon:146 inactive_anon:0 isolated_anon:49245
active_file:79 inactive_file:18 isolated_file:113
unevictable:0 dirty:0 writeback:0 unstable:0 buffer:39
free:370 slab_reclaimable:309 slab_unreclaimable:5492
mapped:53 shmem:15 pagetables:28140 bounce:0
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recently we encountered OOM problems due to memory use of the GEM cache.
Generally a large amuont of Shmem/Tmpfs pages tend to create a memory
shortage problem.
We often use the following calculation to determine the amount of shmem
pages:
shmem = NR_ACTIVE_ANON + NR_INACTIVE_ANON - NR_ANON_PAGES
however the expression does not consider isolated and mlocked pages.
This patch adds explicit accounting for pages used by shmem and tmpfs.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The amount of memory allocated to kernel stacks can become significant and
cause OOM conditions. However, we do not display the amount of memory
consumed by stacks.
Add code to display the amount of memory used for stacks in /proc/meminfo.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Free huges pages from nodes in round robin fashion in an attempt to keep
[persistent a.k.a static] hugepages balanced across nodes
New function free_pool_huge_page() is modeled on and performs roughly the
inverse of alloc_fresh_huge_page(). Replaces dequeue_huge_page() which
now has no callers, so this patch removes it.
Helper function hstate_next_node_to_free() uses new hstate member
next_to_free_nid to distribute "frees" across all nodes with huge pages.
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In my test, 128M memory is hot added, but zone's pcp batch is 0, which is
an obvious error. When pages are onlined, zone pcp should be updated
accordingly.
[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Yakui Zhao <yakui.zhao@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make use of the compiler's typechecking on !CONFIG_SWAP as well.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PERF_EVENT_FORK always outputs the time field, so update the header
to reflect this.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090922123424.GD19453@kryten>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Document the possibility that is_enabled may also return with negative
errorcodes.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Fix a couple of typos I found while working with this subsystem.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Now fixed regulators that have their enable pin connected to a GPIO line
can use the fixed regulator driver for regulator enable/disable control.
The GPIO number and polarity information is passed through platform data.
GPIO enable control is achieved using gpiolib.
Signed-off-by: Roger Quadros <ext-roger.quadros@nokia.com>
Reviewed-by: Philipp Zabel <philipp.zabel@gmail.com>
Reviewed-by: Felipe Balbi <felipe.balbi@nokia.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Simplify checking of support for voltage ranges by providing an API which
wraps the existing count and list operations.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Some consumers require complete control of the regulator and can't
tolerate sharing it with other consumers, most commonly because they need
to have the regulator actually disabled so can't have other consumers
forcing it on. This new regulator_get_exclusive() API call allows these
consumers to explicitly request this, documenting the assumptions that
they are making.
In order to simplify coding of such consumers the use count for regulators
they request is forced to match the enabled state of the regulator when
it is requested. This is not possible for consumers which can share
regulators due to the need to keep track of the ownership of use counts.
A new API call is used rather than an additional argument to the existing
regulator_get() in order to avoid merge headaches with driver code in
other trees.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
BUCK3 is the new component in DA9035. So there're three BUCKs in DA9035.
And there're two BUCKs in DA9034.
Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
This allows machine drivers to build without ifdefs if they have
full constraints. Suggested by machine drivers contributed by
Haojian Zhuang <haojian.zhuang@gmail.com>.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Follow the approach suggested by Russell King and implemented by him in
the clkdev API and allow consumer device supply mapings to be set up
using the dev_name() for the consumer instead of the struct device.
In order to avoid making existing machines instabuggy and creating merge
issues the use of struct device is still supported for the time being.
This resolves problems working with buses such as I2C which make the
struct device available late providing that the final device name is
known, which is the case for most embedded systems with fixed setups.
Consumers must still use the struct device when calling regulator_get().
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
I'd like to use printk_ratelimit() in atomic context, but that's
not possible right now due to the spinlock usage this commit
introduced more than a year ago:
717115e: printk ratelimiting rewrite
As a first step push the lock into the ratelimit state structure.
This allows us to deal with locking failures to be considered as an
event related to that state being too busy.
Also clean up the code a bit (without changing functionality):
- tidy up the definitions
- clean up the code flow
This also shrinks the code a tiny bit:
text data bss dec hex filename
264 0 4 268 10c ratelimit.o.before
255 0 0 255 ff ratelimit.o.after
( Whole-kernel data size got a bit larger, because we have
two ratelimit-state data structures right now. )
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Whether or not the sparse warning
warning: do-while statement is not a compound statement
is justified or not in this case, it is annoying and trivial to fix.
[vegard.nossum@gmail.com: title and cleanup]
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Certain hardware will send us events when the backlight brightness
changes. Add a function to update the value in the core, and
additionally send a uevent so that userspace can pop up appropriate
UI. The uevents are flagged depending on whether the update originated
in the kernel or from userspace, making it easier to only display UI
at the appropriate time.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
* 'perfcounters-rename-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Tidy up after the big rename
perf: Do the big rename: Performance Counters -> Performance Events
perf_counter: Rename 'event' to event_id/hw_event
perf_counter: Rename list_entry -> group_entry, counter_list -> group_list
Manually resolved some fairly trivial conflicts with the tracing tree in
include/trace/ftrace.h and kernel/trace/trace_syscalls.c.
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: Fix whitespace inconsistencies
rcu: Fix thinko, actually initialize full tree
rcu: Apply results of code inspection of kernel/rcutree_plugin.h
rcu: Add WARN_ON_ONCE() consistency checks covering state transitions
rcu: Fix synchronize_rcu() for TREE_PREEMPT_RCU
rcu: Simplify rcu_read_unlock_special() quiescent-state accounting
rcu: Add debug checks to TREE_PREEMPT_RCU for premature grace periods
rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down
rcutorture: Occasionally delay readers enough to make RCU force_quiescent_state
rcu: Initialize multi-level RCU grace periods holding locks
rcu: Need to update rnp->gpnum if preemptable RCU is to be reliable
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter, powerpc, sparc: Fix compilation after perf_counter_overflow() change
perf_counter: x86: Fix PMU resource leak
perf util: SVG performance improvements
perf util: Make the timechart SVG width dynamic
perf timechart: Show the duration of scheduler delays in the SVG
perf timechart: Show the name of the waker/wakee in timechart
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Simplify sys_sched_rr_get_interval() system call
sched: Fix potential NULL derference of doms_cur
sched: Fix raciness in runqueue_is_locked()
sched: Re-add lost cpu_allowed check to sched_fair.c::select_task_rq_fair()
sched: Remove unneeded indentation in sched_fair.c::place_entity()
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (133 commits)
drm/vgaarb: add VGA arbitration support to the drm and kms.
drm/radeon: some r420s have a CP race with the DMA engine.
drm/radeon/r600/kms: rv670 is not DCE3
drm/radeon/kms: r420 idle after programming GA_ENHANCE
drm/radeon/kms: more fixes to rv770 suspend/resume path.
drm/radeon/kms: more alignment for rv770.c with r600.c
drm/radeon/kms: rv770 blit init called too late.
drm/radeon/kms: move around new init path code to avoid posting at init
drm/radeon/r600: fix some issues with suspend/resume.
drm/radeon/kms: disable VGA rendering engine before taking over VRAM
drm/radeon/kms: Move radeon_get_clock_info() call out of radeon_clocks_init().
drm/radeon/kms: add initial connector properties
drm/radeon/kms: Use surfaces for scanout / cursor byte swapping on big endian.
drm/radeon/kms: don't fail if we fail to init GPU acceleration
drm/r600/kms: fixup number of loops per blit calculation.
drm/radeon/kms: reprogram format in set base.
drm/radeon: avivo chips have no separate int bit for display
drm/radeon/r600: don't do interrupts
drm: fix _DRM_GEM addmap error message
drm: update crtc x/y when only fb changes
...
Fixed up trivial conflicts in firmware/Makefile due to network driver
(cxgb3) and drm (mga/r128/radeon) firmware being listed next to each
other.
* complete support for ak4113
* based on code for ak4114 and ak4117
Signed-off-by: Pavel Hofman <pavel.hofman@ivitera.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
* complete support for ak4620
* codec regs listed in proc for all codecs/chips
* adding total regs for each codec
* fixing nb. of steps in input attenuation controls
Signed-off-by: Pavel Hofman <pavel.hofman@ivitera.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
* the previous version had a typo - values of AK4114_OPS10-12 were
identical with AK4114_OPS00-02
* Since no cards actually use this feature, the bug was not identified earlier
Signed-off-by: Pavel Hofman <pavel.hofman@ivitera.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This is patch to change ftp site of the libcap source.
"ftp://linux.kernel.org" address does not exist.
Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
- provide compatibility Kconfig entry for existing PERF_COUNTERS .config's
- provide courtesy copy of old perf_counter.h, for user-space projects
- small indentation fixups
- fix up MAINTAINERS
- fix small x86 printout fallout
- fix up small PowerPC comment fallout (use 'counter' as in register)
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is in preparation of the big rename, but also makes sense
in a standalone way: 'list_entry' is a bad name as we already
have a list_entry() in list.h.
Also, the 'counter list' is too vague, it doesnt tell us the
purpose of that list.
Clarify these names to show that it's all about the group
hiearchy.
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
By removing the need for it to know details of scheduling classes.
This allows PlugSched to define orthogonal scheduling classes.
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <06d1b89ee15a0eef82d7.1253496713@mudlark.pw.nest>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 5622f295 ("x86, perf_counter, bts: Optimize BTS overflow
handling") removed the regs field from struct perf_sample_data and
added a regs parameter to perf_counter_overflow(). This breaks the
build on powerpc (and Sparc) as reported by Sachin Sant:
arch/powerpc/kernel/perf_counter.c: In function 'record_and_restart':
arch/powerpc/kernel/perf_counter.c:1165: error: unknown field 'regs' specified in initializer
This adjusts arch/powerpc/kernel/perf_counter.c to correspond with the
new struct perf_sample_data and perf_counter_overflow().
[ v2: also fix Sparc, Markus Metzger <markus.t.metzger@intel.com> ]
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: benh@kernel.crashing.org
Cc: linuxppc-dev@ozlabs.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19127.8400.376239.586120@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
VGA arb requires DRM support for non-kms drivers, to turn on/off
irqs when disabling the mem/io regions.
VGA arb requires KMS support for GPUs where we can turn off VGA
decoding. Currently we know how to do this for intel and radeon
kms drivers, which allows them to be removed from the arbiter.
This patch comes from Fedora rawhide kernel.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Adding a reference to <linux/linkage.h> to x86's <asm/cache.h> causes
the x86 linker script to have syntax errors, because the ALIGN and
ENTRY keywords get redefined to the assembly implementations of those.
One could fix this by adjusting the include structure, but I think any
solution based on that approach would be fragile.
Currently, it is impossible when writing a header to do something
different for assembly files and linker scripts, even though there are
clearly cases where one wants them to define macros differently for
the two (ENTRY being an excellent example).
So I think the right solution here is to introduce a new preprocessor
definition, called LINKER_SCRIPT that is set along with __ASSEMBLY__
for linker scripts, and to use that to not define ALIGN and ENTRY in
linker scripts.
I suspect we'll find other uses for this mechanism in
the future.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (79 commits)
USB serial: update the console driver
usb-serial: straighten out serial_open
usb-serial: add missing tests and debug lines
usb-serial: rename subroutines
usb-serial: fix termios initialization logic
usb-serial: acquire references when a new tty is installed
usb-serial: change logic of serial lookups
usb-serial: put subroutines in logical order
usb-serial: change referencing of port and serial structures
tty: Char: mxser, use THRE for ASPP_OQUEUE ioctl
tty: Char: mxser, add support for CP112UL
uartlite: support shared interrupt lines
tty: USB: serial/mct_u232, fix tty refcnt
tty: riscom8, fix tty refcnt
tty: riscom8, fix shutdown declaration
TTY: fix typos
tty: Power: fix suspend vt regression
tty: vt: use printk_once
tty: handle VT specific compat ioctls in vt driver
n_tty: move echoctl check and clean up logic
...
* 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (58 commits)
perf_counter: Fix perf_copy_attr() pointer arithmetic
perf utils: Use a define for the maximum length of a trace event
perf: Add timechart help text and add timechart to "perf help"
tracing, x86, cpuidle: Move the end point of a C state in the power tracer
perf utils: Be consistent about minimum text size in the svghelper
perf timechart: Add "perf timechart record"
perf: Add the timechart tool
perf: Add a SVG helper library file
tracing, perf: Convert the power tracer into an event tracer
perf: Add a sample_event type to the event_union
perf: Allow perf utilities to have "callback" options without arguments
perf: Store trace event name/id pairs in perf.data
perf: Add a timestamp to fork events
sched_clock: Make it NMI safe
perf_counter: Fix up swcounter throttling
x86, perf_counter, bts: Optimize BTS overflow handling
perf sched: Add --input=file option to builtin-sched.c
perf trace: Sample timestamp and cpu when using record flag
perf tools: Increase MAX_EVENT_LENGTH
perf tools: Fix memory leak in read_ftrace_printk()
...
runqueue_is_locked() is unavoidably racy due to a poor interface design.
It does
cpu = get_cpu()
ret = some_perpcu_thing(cpu);
put_cpu(cpu);
return ret;
Its return value is unreliable.
Fix.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <200909191855.n8JItiko022148@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix the following 'make includecheck' warning:
include/linux/ftrace.h: linux/sched.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1247068321.4382.102.camel@ht.satnam>
fix the following 'make includecheck' warning:
include/linux/page_cgroup.h: linux/swap.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
fix the following 'make includecheck' warning:
include/linux/aio.h: linux/aio_abi.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: bcrl@kvack.org
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1247068254.4382.101.camel@ht.satnam>
fix the following 'make includecheck' warning:
include/drm/drm_memory.h: linux/vmalloc.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1247068169.4382.99.camel@ht.satnam>
Acked-by: Dave Airlie <airlied@redhat.com>
fix the following 'make includecheck' warning:
include/acpi/acpi_bus.h: linux/device.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1247068058.4382.96.camel@ht.satnam>
Acked-by: Len Brown <len.brown@intel.com>
Split nand_correct_data() into two part, a pure calculation function
and a wrapper for mtd interface.
The tmio_nand driver can implement its ecc.correct function easily
using this __nand_correct_data helper.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Acked-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Acked-by: Vimal Singh <vimalsingh@ti.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The VT specific compat_ioctl handlers are the only ones
in common code that require the BKL. Moving them into
the vt driver lets us remove the BKL from the other handlers
and cleans up the code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Various drivers have hacks to mangle termios structures. This stems from
the fact there is no nice setup hook for configuring the termios settings
when the port is created
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The serial layer for some reason uses different defines for the special
case close delays and then conditionally switches to/from the normal ones
in the ioctls.
Remove this rather pointless abstraction
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This little helper is now tty_port specific and useful generally so move it
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
They cover essentially the same stuff and we can therefore fold it into the
tty_port one.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fortunately the serial layer was designed to use the same flag values but
with different names. It has its own SUSPENDED flag which is a free slot in
the ASYNC flags so we allocate it in the ASYNC flags instead.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We moved this into uart_state, now move the fields out of the separate
structure and kill it off.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
And indeed none of them use it. Clean this up as it will make moving to a
standard open method rather easier.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
X and other graphical interfaces need to be able to flip to a console
and lock it into graphics mode without races.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In the past someone gratuitiously borrowed chunks of kernel internal vt
code and dumped them in kernel/power. They have all sorts of deep relations
with the vt code so put them in the vt tree instead
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is needed and requested in various forms for ConsoleKit, screenblank
handling and the like so do the job with a single interface. Also build the
interface so that unlike VT_WAITACTIVE and friends it won't miss events.
FIXME: Should this be a waitactive ioctl or a new device file you can poll
and read events from. We need the code anyway to fix up the existing broken
wait for console switch logic but the ConsoleKit people would prefer the
new device to the ioctl we have here
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Now we are extracting out methods for shutdown and the like we can add a
proper tty_port_close method that knows all the innards of the tty closing
process and hides the lot from the caller.
At some point in the future this will be paired with a similar open()
helper and the drivers can stick to hardware management.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is currently no provision for passing IRQ trigger flags for
serial IRQs with triggering requirements (such as GPIO IRQs)
This patch adds irqflags to plat_serial8250_port that can be passed
from board file to reqest_irq() of 8250 driver
Changes are backward compatible with boards passing UPF_SHARE_IRQ flag
Tested on Zoom2 board that has IRQF_TRIGGER_RISING requirement for 8250 irq
[Moved new flag to end to fix bugs in the original with the old_serial array
-- Alan]
Signed-off-by: Vikram Pandita <vikram.pandita@ti.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently kfifo cannot be used by parts of the kernel that use "const"
properly as kfifo itself does not use const for passed data blocks which
are indeed const.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add helpers for io operations, so that we can eliminate huge
amount of supporting code. It is now centralized in those
helpers and used values are precomputed in the init phase.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't fetch firmware address and recompute channel control on each
port access. Precompute the values on init and use them later all
the time.
The same for board control.
This simplify code and improves readability.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This allows subsytems to provide devtmpfs with non-default permissions
for the device node. Instead of the default mode of 0600, null, zero,
random, urandom, full, tty, ptmx now have a mode of 0666, which allows
non-privileged processes to access standard device nodes in case no
other userspace process applies the expected permissions.
This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complain.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds the new mode NAND_ECC_HW_OOB_FIRST in the nand code to
support 4-bit ECC on TI DaVinci devices with large page (up to 2KiB) NAND
chips. This ECC mode is similar to NAND_ECC_HW, with the exception of
read_page API that first reads the OOB area, reads the data in chunks,
feeds the ECC from OOB area to the ECC hw engine and perform any
correction on the data as per the ECC status reported by the engine.
"ECC_HW_OOB_FIRST" name suggested by Thomas Gleixner
Reviewed-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Sneha Narnakaje <nsnehaprabha@ti.com>
Signed-off-by: Sandeep Paulraj <s-paulraj@ti.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch adds a new "page" parameter to all NAND read_page/read_page_raw
APIs. The read_page API for the new mode ECC_HW_OOB_FIRST requires the
page information to send the READOOB command and read the OOB area before
the data area.
Reviewed-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Sneha Narnakaje <nsnehaprabha@ti.com>
Signed-off-by: Sandeep Paulraj <s-paulraj@ti.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Remove the ARM dependency from the generic "onenand" platform device
driver. This change makes the driver useful for other architectures as
well. Needed for the SuperH kfr2r09 board.
Apart from the obvious Kconfig bits, the most important change is the move
away from ARM specific includes and platform data. Together with this
change the only in-tree board code gets an update, and the driver name is
also changed gracefully break potential out of tree drivers.
The driver is also updated to allow NULL as platform data together with a
few changes to make use of resource_size() and dev_name().
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Kyungmin Park <kmpark@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch converts the existing power tracer into an event tracer,
so that power events (C states and frequency changes) can be
tracked via "perf".
This also removes the perl script that was used to demo the tracer;
its functionality is being replaced entirely with timechart.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090912130542.6d314860@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf timechart needs to know when a process forked, in order to be
able to visualize properly when tasks start.
This patch adds a time field to the event structure, and fills it
in appropriately.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090912130341.51ad2de2@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
acpi_osi_setup() does not depend on CONFIG_DMI
acpi_dmi_osi_linux()'s definition doesn't depend on CONFIG_DMI either
Signed-off-by: Len Brown <len.brown@intel.com>
This is an initial driver for Analog Devices ADV7180 Video Decoder.
So far it only supports query standard.
[akpm@linux-foundation.org: remove unneeded cast]
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Richard Röjfors <richard.rojfors.ext@mocean-labs.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Douglas Schilling Landgraf <dougsland@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
In ISDB-S, time-devision duplex is used to multiplexing several waves
in the same frequency. Each wave is identified by its own transport
stream ID, or TS ID. We need to provide some way to specify this ID
from user applications to handle ISDB-S frontends.
This code has been tested with the Earthsoft PT1 driver.
[mchehab@infradead.org: Fix merge conflicts with isdbt and rename the new parameter to DTV_ISDBS_TS_ID]
Signed-off-by: HIRANO Takahito <hiranotaka@zng.info>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Warn when the desired device node number is already in use, except when
the new video_register_device_no_warn function is called since in some
use-cases that warning is not relevant.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
video_register_device_index is never actually called, instead the
stream index number is always calculated automatically.
This patch removes this function and simplifies the internal get_index
function since that can now always just return the first free index.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Rewrite v4l2_i2c_new_subdev as a simplified version of v4l2_i2c_new_subdev_cfg
and remove v4l2_i2c_new_probed_subdev and v4l2_i2c_new_probed_subdev_addr.
This simplifies this API substantially.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This makes the soc-camera interface for V4L2 subdevices thinner yet. Handle
gain and exposure internally in each driver just like all other controls.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Remove unneeded soc-camera operations, this also makes the soc-camera API to
v4l2 subdevices thinner.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The initial soc-camera scaling and cropping implementation turned out to be
incompliant with the V4L2 API, e.g., it expected the user to specify cropping
in output window pixels, instead of input window pixels. This patch converts
the soc-camera core and all drivers to comply with the standard.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Use v4l2_subdev_call() instead of v4l2_device_call_until_err() in all host
drivers and in soc-camera core.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Remove set_crop soc-camera device method and switch to s_crop from v4l2-subdev
video operations. Also extend non-i2c drivers to also hold a pointer to their
v4l2-subdev instance in control device driver-data, i.e., in
dev_get_drvdata((struct device *)to_soc_camera_control(icd))
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The move of format translation initialisation into soc_camera_open() was
temporary for the soc-camera as platform driver intermediate step, put it back
into soc_camera_probe(). Also add a .put_formats() method to
soc_camera_host_ops to free any resources host driver might have allocated in
.get_formats().
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Switch to using struct v4l2_rect in struct soc_camera_device for uniformity and
simplicity.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Until now soc-camera only supported client (sensor) controls. This patch
enables camera-host drivers to implement their own controls too.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add a new V4L2_CID_BAND_STOP_FILTER integer control, which either switches the
band-stop filter off, or sets it to a certain strength.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Convert the soc-camera framework to use the v4l2-(sub)dev API. Start using
v4l2-subdev operations. Only a part of the interface between the
soc_camera core, soc_camera host drivers on one side and soc_camera device
drivers on the other side is replaced so far. The rest of the interface
will be replaced in incremental steps, and will require extensions and,
possibly, modifications to the v4l2-subdev code.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Convert soc-camera core and all drivers to platform device API. We already
converted platforms to register a platform device for each soc-camera client,
now we remove the compatibility code and switch completely to the new scheme.
This is a preparatory step for the v4l2-subdev conversion.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add a struct device pointer to struct soc_camera_platform_info and let the user
(ap325rxa) pass it down to soc_camera_platform.c in its .add_device() method.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
soc_camera_platform.c is only used by y SuperH ap325rxa board. This patch
converts soc_camera_platform.c and its users for the soc-camera platform-
device conversion and also extends soc-camera core to handle non-I2C cameras.
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This is a new module added for vpss library functions that are
used for configuring vpss system module. All video drivers will
include vpss.h header file and call functions defined in this
module to configure vpss system module.
Reviewed by: Hans Verkuil <hverkuil@xs4all.nl>
Reviewed by: Laurent Pinchart <laurent.pinchart@skynet.be>
Reviewed by: Alexey Klimov <klimov.linux@gmail.com>
Signed-off-by: Muralidharan Karicheri <m-karicheri2@ti.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This is the hw module for DM644x CCDC. This registers with the
vpfe capture driver and provides a set of hw_ops to configure
CCDC for a specific decoder device connected to the VPFE.
Reviewed by: Hans Verkuil <hverkuil@xs4all.nl>
Reviewed by: Laurent Pinchart <laurent.pinchart@skynet.be>
Signed-off-by: Muralidharan Karicheri <m-karicheri2@ti.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Adds ccdc hw module for DM355 CCDC. This registers with the bridge
driver a set of hw_ops for configuring the CCDC for a specific
decoder device connected to vpfe.
Reviewed by: Hans Verkuil <hverkuil@xs4all.nl>
Reviewed by: Laurent Pinchart <laurent.pinchart@skynet.be>
Reviewed by: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Muralidharan Karicheri <m-karicheri2@ti.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This the vpfe capture bridge driver for doing video
capture on DM355 and DM6446 evms. The ccdc hw modules register with the
driver and are used for configuring the CCD Controller for a specific
decoder interface. The driver also registers the sub devices required
for a specific evm. More than one sub devices can be registered.
This allows driver to switch dynamically to capture video from
any sub device that is registered. Currently only one sub device
(tvp5146) is supported. But in future this driver is expected
to do capture from sensor devices such as Micron's MT9T001, MT9T031
and MT9P031 etc. The driver currently supports MMAP based IO.
Reviewed by: Laurent Pinchart <laurent.pinchart@skynet.be>
Reviewed by: Alexey Klimov <klimov.linux@gmail.com>
Signed-off-by: Muralidharan Karicheri <m-karicheri2@ti.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
SAA7164: Remove the SAA7164 bus id, no longer required.
Signed-off-by: Steven Toth <stoth@kernellabs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add support for NXP TDA18271 as a standalone tuner, allowing the use of
analog demodulators other than the Philips/NXP TDA829x.
Signed-off-by: Michael Krufky <mkrufky@kernellabs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This patch increments the DVB-API to version 5.1 in order to reflect the addition of ISDB-T and ISDB-Tsb on Linux' DVB-API.
Changes in detail:
- added a small document to describe how to use the API to tune to an ISDB-T or ISDB-Tsb channel
- added necessary fields to dtv_frontend_cache
- added a smarter clear-cache function which resets all fields of the dtv_frontend_cache
- added a TRANSMISSION_MODE_4K to fe_transmit_mode_t
Signed-off-by: Olivier Grenie <olgrenie@dibcom.fr>
Signed-off-by: Patrick Boettcher <pboettcher@dibcom.fr>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The definition of the SPI clock phase for the Motorola mode of
the PL022 driver was incorrect: the spec had been interpreted as
data being recieved on rising or falling edge of the clocks while
the correct interpretation is that data can be recieved on the
first or second edge transition, falling or rising depending on
the polarity setting.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Now that the last users of markers have migrated to the event
tracer we can kill off the (now orphan) support code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090917173527.GA1699@lst.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Draining the BTS buffer on a buffer overflow interrupt takes too
long resulting in a kernel lockup when tracing the kernel.
Restructure perf_counter sampling into sample creation and sample
output.
Prepare a single reference sample for BTS sampling and update the
from and to address fields when draining the BTS buffer. Drain the
entire BTS buffer between a single perf_output_begin() /
perf_output_end() pair.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090915130023.A16204@sedona.ch.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (64 commits)
ext4: Update documentation about quota mount options
ext4: replace MAX_DEFRAG_SIZE with EXT_MAX_BLOCK
ext4: Fix the alloc on close after a truncate hueristic
ext4: Add a tracepoint for ext4_alloc_da_blocks()
ext4: store EXT4_EXT_MIGRATE in i_state instead of i_flags
ext4: limit block allocations for indirect-block files to < 2^32
ext4: Fix different block exchange issue in EXT4_IOC_MOVE_EXT
ext4: Add null extent check to ext_get_path
ext4: Replace BUG_ON() with ext4_error() in move_extents.c
ext4: Replace get_ext_path macro with an inline funciton
ext4: Fix include/trace/events/ext4.h to work with Systemtap
ext4: Fix initalization of s_flex_groups
ext4: Always set dx_node's fake_dirent explicitly.
ext4: Fix async commit mode to be safe by using a barrier
ext4: Don't update superblock write time when filesystem is read-only
ext4: Clarify the locking details in mballoc
ext4: check for need init flag in ext4_mb_load_buddy
ext4: move ext4_mb_init_group() function earlier in the mballoc.c
ext4: Make non-journal fsync work properly
ext4: Assure that metadata blocks are written during fsync in no journal mode
...
Remove net/genetlink.h inclusion, now sched.c won't be recompiled
because of some networking changes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] sizeof cleanup
[WATCHDOG] wdt_pci: fix printk and variable type
[WATCHDOG] wdt_pci - use pci_request_region
[WATCHDOG] ar7_wdt: Fix error handling during probe.
[WATCHDOG] ar7_wdt: convert to become a platform driver
[WATCHDOG] fix book E watchdog to take WDIOC_SETTIMEOUT arg in seconds
[WATCHDOG] davinci: use clock framework for timer frequency
[WATCHDOG] Use DIV_ROUND_UP() macro in the coh901327 WDT
[WATCHDOG] Add support for WM831x watchdog
[WATCHDOG] Add watchdog driver for NUC900
[WATCHDOG] add SBC-FITPC2 watchdog driver
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (262 commits)
sh: mach-ecovec24: Add user debug switch support
sh: Kill off unused se_skipped in alignment trap notification code.
sh: Wire up HAVE_SYSCALL_TRACEPOINTS.
video: sh_mobile_lcdcfb: use both register sets for display panning
video: sh_mobile_lcdcfb: implement display panning
sh: Fix up sh7705 flush_dcache_page() build.
sh: kfr2r09: document the PLL/FLL <-> RF relationship.
sh: mach-ecovec24: need asm/clock.h.
sh: mach-ecovec24: deassert usb irq on boot.
sh: Add KEYSC support for EcoVec24
sh: add kycr2_delay for sh_keysc
sh: cpufreq: Include CPU id in info messages.
sh: multi-evt support for SH-X3 proto CPU.
sh: clkfwk: remove bogus set_bus_parent() from SH7709.
sh: Fix the indication point of the liquid crystal of AP-325RXA(AP3300)
sh: Add EcoVec24 romImage defconfig
sh: USB disable process is needed if romImage boot for EcoVec24
sh: EcoVec24: add HIZA setting for LED
sh: EcoVec24: write MAC address in boot
sh: Add romImage support for EcoVec24
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: add fusectl interface to max_background
fuse: limit user-specified values of max background requests
fuse: use drop_nlink() instead of direct nlink manipulation
fuse: document protocol version negotiation
fuse: make the number of max background requests and congestion threshold tunable
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
ext3: Flush disk caches on fsync when needed
ext3: Add locking to ext3_do_update_inode
ext3: Fix possible deadlock between ext3_truncate() and ext3_get_blocks()
jbd: Annotate transaction start also for journal_restart()
jbd: Journal block numbers can ever be only 32-bit use unsigned int for them
ext3: Update MAINTAINERS for ext3 and JBD
JBD: round commit timer up to avoid uncommitted transaction
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Fix NULL ptr regression in powernow-k8
[CPUFREQ] Create a blacklist for processors that should not load the acpi-cpufreq module.
[CPUFREQ] Powernow-k8: Enable more than 2 low P-states
[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)
[CPUFREQ] ondemand - Use global sysfs dir for tuning settings
[CPUFREQ] Introduce global, not per core: /sys/devices/system/cpu/cpufreq
[CPUFREQ] Bail out of cpufreq_add_dev if the link for a managed CPU got created
[CPUFREQ] Factor out policy setting from cpufreq_add_dev
[CPUFREQ] Factor out interface creation from cpufreq_add_dev
[CPUFREQ] Factor out symlink creation from cpufreq_add_dev
[CPUFREQ] cleanup up -ENOMEM handling in cpufreq_add_dev
[CPUFREQ] Reduce scope of cpu_sys_dev in cpufreq_add_dev
[CPUFREQ] update Doc for cpuinfo_cur_freq and scaling_cur_freq
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits)
time: Prevent 32 bit overflow with set_normalized_timespec()
clocksource: Delay clocksource down rating to late boot
clocksource: clocksource_select must be called with mutex locked
clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
timers: Drop a function prototype
clocksource: Resolve cpu hotplug dead lock with TSC unstable
timer.c: Fix S/390 comments
timekeeping: Fix invalid getboottime() value
timekeeping: Fix up read_persistent_clock() breakage on sh
timekeeping: Increase granularity of read_persistent_clock(), build fix
time: Introduce CLOCK_REALTIME_COARSE
x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown
clocksource: Avoid clocksource watchdog circular locking dependency
clocksource: Protect the watchdog rating changes with clocksource_mutex
clocksource: Call clocksource_change_rating() outside of watchdog_lock
timekeeping: Introduce read_boot_clock
timekeeping: Increase granularity of read_persistent_clock()
timekeeping: Update clocksource with stop_machine
timekeeping: Add timekeeper read_clock helper functions
timekeeping: Move NTP adjusted clock multiplier to struct timekeeper
...
Fix trivial conflict due to MIPS lemote -> loongson renaming.
The WM831x series of devices provide a watchdog with configurable
behaviour on timer expiry.
Currently this driver support refreshes via a register or GPIO line and
autonomous refreshes from a hardware source (eg, a clock).
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
The MELPAS MCS-5000 is the touchscreen controller. The overview of this
controller can see at the following website:
http://www.melfas.com/product/product01.asp?k_r=eng_
This driver is tested on s3c6410 NCP board and supports only the i2c
interface.
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Automatically turn off leds and sound effects as part of suspend
process and restore led state, sounds and repeat rate at resume.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
The serio ports on i8042 are not completely isolated; while we provide
enough locking to ensure proper serialization when accessing control
and data registers AUX and KBD ports can still have an effect on each
other on PS/2 protocol level. The most prominent effect is that
issuing a command for the device connected to one port may cause
abort of the command currently executing by the device connected to
another port.
Since i8042 nor serio subsystem are not aware of the details of the
PS/2 protocol (length of the commands and their replies and so on) the
locking should be done on libps2 level by adding special handling when
we see that we are dealing with serio port on i8042.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Userspace can query if acceleration is working or not true get
info ioctl and could fallback to software if for some reason
kernel failed to initialize KMS. This should allow to give a
working KMS setup in all case (even with non functionning accel).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Currently the trace event profile buffer is allocated in the stack. But
this may be too much for the stack, as the events can have large
statically defined field size and can also grow with dynamic arrays.
Allocate two per cpu buffer for all profiled events. The first cpu
buffer is used to host every non-nmi context traces. It is protected
by disabling the interrupts while writing and committing the trace.
The second buffer is reserved for nmi. So that there is no race between
them and the first buffer.
The whole write/commit section is rcu protected because we release
these buffers while deactivating the last profiling trace event.
v2: Move the buffers from trace_event to be global, as pointed by
Steven Rostedt.
v3: Fix the syscall events to handle the profiling buffer races
by disabling interrupts, now that the buffers are globals.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Factorize the events enabling accounting in a common tracing core
helper. This reduces the size of the profile_enable() and
profile_disable() callbacks for each trace events.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINE
sched: Stop buddies from hogging the system
sched: Add new wakeup preemption mode: WAKEUP_RUNNING
sched: Fix TASK_WAKING & loadaverage breakage
sched: Disable wakeup balancing
sched: Rename flags to wake_flags
sched: Clean up the load_idx selection in select_task_rq_fair
sched: Optimize cgroup vs wakeup a bit
sched: x86: Name old_perf in a unique way
sched: Implement a gentler fair-sleepers feature
sched: Add SD_PREFER_LOCAL
sched: Add a few SYNC hint knobs to play with
sched: Fix sync wakeups again
sched: Add WF_FORK
sched: Rename sync arguments
sched: Rename select_task_rq() argument
sched: Feature to disable APERF/MPERF cpu_power
x86: sched: Provide arch implementations using aperf/mperf
x86: Add generic aperf/mperf code
x86: Move APERF/MPERF into a X86_FEATURE
...
Fix up trivial conflict in arch/x86/include/asm/processor.h due to
nearby addition of amd_get_nb_id() declaration from the EDAC merge.
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (44 commits)
vsnprintf: remove duplicate comment of vsnprintf
softirq: add BLOCK_IOPOLL to softirq_to_name
oprofile: fix oprofile regression: select RING_BUFFER_ALLOW_SWAP
tracing: switch function prints from %pf to %ps
vsprintf: add %ps that is the same as %pS but is like %pf
tracing: Fix minor bugs for __unregister_ftrace_function_probe
tracing: remove notrace from __kprobes annotation
tracing: optimize global_trace_clock cachelines
MAINTAINERS: Update tracing tree details
ftrace: document function and function graph implementation
tracing: make testing syscall events a separate configuration
tracing: remove some unused macros
ftrace: add compile-time check on F_printk()
tracing: fix F_printk() typos
tracing: have TRACE_EVENT macro use __flags to not shadow parameter
tracing: add static to generated TRACE_EVENT functions
ring-buffer: typecast cmpxchg to fix PowerPC warning
tracing: add filter event logic to special, mmiotrace and boot tracers
tracing: remove trace_event_types.h
tracing: use the new trace_entries.h to create format files
...
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata: Add pata_atp867x driver for Artop/Acard ATP867X controllers
pata_amd: do not filter out valid modes in nv_mode_filter
sata_promise: update reset code
sata_promise: disable hotplug on 1st gen chips
libata: fix spurious WARN_ON_ONCE() on port freeze
ahci: restore pci_intx() handling
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (66 commits)
be2net: fix some cmds to use mccq instead of mbox
atl1e: fix 2.6.31-git4 -- ATL1E 0000:03:00.0: DMA-API: device driver frees DMA
pkt_sched: Fix qstats.qlen updating in dump_stats
ipv6: Log the affected address when DAD failure occurs
wl12xx: Fix print_mac() conversion.
af_iucv: fix race when queueing skbs on the backlog queue
af_iucv: do not call iucv_sock_kill() twice
af_iucv: handle non-accepted sockets after resuming from suspend
af_iucv: fix race in __iucv_sock_wait()
iucv: use correct output register in iucv_query_maxconn()
iucv: fix iucv_buffer_cpumask check when calling IUCV functions
iucv: suspend/resume error msg for left over pathes
wl12xx: switch to %pM to print the mac address
b44: the poll handler b44_poll must not enable IRQ unconditionally
ipv6: Ignore route option with ROUTER_PREF_INVALID
bonding: make ab_arp select active slaves as other modes
cfg80211: fix SME connect
rc80211_minstrel: fix contention window calculation
ssb/sdio: fix printk format warnings
p54usb: add Zcomax XG-705A usbid
...
The earlier approach required two scheduling-clock ticks to note an
preemptable-RCU quiescent state in the situation in which the
scheduling-clock interrupt is unlucky enough to always interrupt an
RCU read-side critical section.
With this change, the quiescent state is instead noted by the
outermost rcu_read_unlock() immediately following the first
scheduling-clock tick, or, alternatively, by the first subsequent
context switch. Therefore, this change also speeds up grace
periods.
Suggested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
LKML-Reference: <12528585111945-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yakui pointed out that we don't properly no-op the ACPI button routines
if the button driver isn't built in. This will cause problems if ACPI
is disabled, so provide stub functions in that case.
Reported-by: ykzhao <yakui.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Similar to the madvise() concept, the application may wish to mark some
data as volatile. That is in the event of memory pressure the kernel is
free to discard such buffers safe in the knowledge that the application
can recreate them on demand, and is simply using these as a cache.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This is a new pata driver for ARTOP 867X 64bit 4-channel UDMA133 ATA ctrls.
Based on the Atp867 data sheet rev 1.2, Acard, and in part on early ide codes
from Eric Uhrhane <ericu@google.com>.
Signed-off-by: John(Jung-Ik) Lee <jilee@google.com>
Reviewed-by: Grant Grundler <grundler@google.com>
Reviewed-by: Gwendal Gringo <gwendal@google.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Currently we wake the mmap() consumer once every PAGE_SIZE of data
and/or once event wakeup_events when specified.
For high speed sampling this results in too many wakeups wrt. the
buffer size, hence change this.
We move the default wakeup limit to 1/4-th the buffer size, and
provide for means to manually specify this limit.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With BLOCK_IOPOLL_SOFTIRQ added, softirq_to_name[] and
show_softirq_name() needs to be updated.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AB20398.8070209@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Remove reference to vgaarb.c and replace it with a comment about the
arbiter itself.
Reported-by: Tiago Vignatti <tiago.vignatti@nokia.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Today's linux-next build (sparc64_defconfig) failed like this:
In file included from arch/sparc/kernel/sys_sparc32.c:32:
include/linux/nfsd/nfsd.h: In function 'nfs4_state_start':
include/linux/nfsd/nfsd.h:177: error: no return statement in function returning non-void
Caused by commit 29ab23cc5d ("nfsd4: allow
nfs4 state startup to fail"). Please, if you add code that depends on a
CONFIG option, build with that option enabled and disabled.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Instead of hand rolling our own variant.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
HID core registers input, hidraw and hiddev devices, but leaves
unregistering it up to the individual driver, which is not really nice.
Let's move all the logic to the core.
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Reported-by: Brian Rogers <brian@xyzw.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Create a new wakeup preemption mode, preempt towards tasks that run
shorter on avg. It sets next buddy to be sure we actually run the task
we preempted for.
Test results:
root@twins:~# while :; do :; done &
[1] 6537
root@twins:~# while :; do :; done &
[2] 6538
root@twins:~# while :; do :; done &
[3] 6539
root@twins:~# while :; do :; done &
[4] 6540
root@twins:/home/peter# ./latt -c4 sleep 4
Entries: 48 (clients=4)
Averages:
------------------------------
Max 4750 usec
Avg 497 usec
Stdev 737 usec
root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features
root@twins:/home/peter# ./latt -c4 sleep 4
Entries: 48 (clients=4)
Averages:
------------------------------
Max 14 usec
Avg 5 usec
Stdev 3 usec
Disabled by default - needs more testing.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <new-submission>
This adds support for the regulators found in the AB3100
Mixed-Signal IC.
It further also defines platform data for the ST-Ericsson
U300 platform and extends the AB3100 MFD driver so that
platform/board data with regulation constraints and an init
function can be passed down all the way from the board to
the regulators.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The TWL4030/5030 family of multifunction devices allows board-specific
control of the the various regulators, clock and reset lines through
'scripts' that are loaded into its memory. This allows for Dynamic Power
Switching (DPS).
Implement board-independent core support for DPS that is then used by
board-specific code to load custom DPS scripts.
Signed-off-by: Amit Kucheria <amit.kucheria@verdurent.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This driver provides the core Freescale MC13783 support. It
registers the client platform_devices and provides access
to the A/D converter.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This adds the _interruptible suffix to the AB3100 accessor
functions on par with mutex_lock_interruptible() that's used
for blocking simultaneous calls to the AB3100 acessor functions.
Since these accesses are slow on a 100kHz I2C bus and may line
up waiting for the mutex, we need to handle interruption by
system shutdown or kill signals and may just as well denote that
in the function names.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The WM831x series of devices provide three types of LDO:
- General purpose LDOs supporting voltages from 0.9-3.3V
- High performance analogue LDOs supporting voltages from 1-3.5V
- Very low power consumption LDOs intended to support always on
functionality.
This patch adds support for all three kinds of LDO. Each regulator
is probed as an individual platform device with resources used to
provide the register map location of the regulator. Mixed hardware
and software control of regulators is not current supported.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The WM831x series of devices all have 3 DC-DC buck convertors. This
driver implements software control for these regulators via the
regulator API. Use with split hardware/software control of individual
regulators is not supported, though regulators not controlled by
software may be controlled via the hardware control interfaces.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This is useful for implementing get_status() in terms of get_mode().
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The WM831x series of PMICs support control of initial power on
through the ON pin on the device with soft control of the pin
at other times. Represent this to userspace as KEY_POWER.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Add support for the GPIO pins on the WM831x. No direct support is
currently supplied for configuring non-gpiolib functionality such
as pull configuration and alternate functions, soft configuration
of these will be provided in a future patch.
Currently use of these pins as interrupts is not supported due to
the ongoing issues with generic irq not support interrupt controllers
on interrupt driven buses. Users can directly request the interrupts
with the wm831x-specific APIs currently provided if required.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The current settings which can be used with the WM831x current sinks
can't easily be mapped between register values and currents at run
time without a lookup table since the values scale logarithmically
to match the way the human eye interprets brightness. This lookup
table is inclided in the core since several drivers need to use it.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The WM831x series of devices use OTP (One Time Programmable, a type
of PROM) to store system configuration. At run time this data is
visible via registers.
Currently the only explicitly supported feature is that the unique
ID provided by every WM831x device is exported to user space via
sysfs. Other configuration data may be read by system-specific
code in the pre_init() and post_init() platform data operations.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The WM831x backlight driver requires at least the specification of the
current sink to use and a maximum current to allow them to function and
will actively interfere with other users of the regulators it uses if
misconfigured so only register the subdevice for it if this platform
data has been supplied.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The WM831x contains an auxiliary ADC with a number of switchable
inputs which is used to monitor some of the voltages and
temperatures in the system and has some external inputs which can be
used for machine specific purposes. Provide an API allowing drivers
to read values from the ADC.
An internal reference voltage is provided to allow callibration of
the ADC. This is used to calibrate the device at startup.
The hardware also supports continuous readings and digital comparators.
These are not yet supported by the driver.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The WM831x includes an interrupt controller managing interrupts for
the various functions on the chip. This patch adds support for the
core interrupt block on the device.
Ideally this would be supported by genirq, particularly for the
GPIOs, but currently genirq is unable to cope with controllers on
interrupt driven buses so we cut'n'paste the generic interface.
Once genirq is able to cope chips like this it should be a case
of filing the prefixes off the code and redoing wm831x-irq.c to
move over.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The WM831x series of devices are register compatible processor power
management subsystems, providing regulator and power path management
facilities along with other services like watchdog, RTC and touch
panel controllers.
This patch adds very basic support, providing basic single register
I2C access, handling of the security key and registration of the
devices.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Provide basic support for MFDs having multiple cells of a given
type with different IDs by adding an id to the mfd_cell structure
and then adding that to the id passed in to mfd_add_devices().
As it stands this approach requires that MFDs using this feature
deal with ensuring that there aren't any ID collisions resulting
from multiple MFDs of the same type being instantiated. This needs
to happen with the existing code too, but with this approach there
is a knock on effect on the IDs for non-duplicated devices.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Vibrator will be accessed via the pcap-regulator driver, no need to expose its
bits in the header file.
Signed-off-by: Daniel Ribeiro <drwyrm@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Using the default kernel "events" workqueue causes problems with
synchronous adc readings if initiated from some task on the same
workqueue.
I had a deadlock trying to use pcf50633_adc_sync_read from a
power_supply class driver because the reading was initiated from the
workqueue and it waited for the irq processing to complete (to get the
result) and that was put on the same workqueue.
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This driver provides reporting of the status supply voltage rails
of the WM835x series of PMICs via the hwmon API.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Provides an atomic set_bits functions, as needed by the pcap-regulator
driver.
Signed-off-by: Daniel Ribeiro <drwyrm@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Some TS controller bits are on the same register as the ADC control, save
TS specific bits and export a set_ts_bits function so the TS driver can set
it with the adc_mutex lock held.
Signed-off-by: Daniel Ribeiro <drwyrm@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Export an irq_to_pcap function to get pcap irq number, for the keypad driver.
Signed-off-by: Daniel Ribeiro <drwyrm@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
It does not make sense to store block number for journal as unsigned long
since they can be only 32-bit (because of on-disk format limitation). So
change in-memory structures and variables to use unsigned int instead.
Signed-off-by: Jan Kara <jack@suse.cz>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
debugfs: Modify default debugfs directory for debugging pktcdvd.
debugfs: Modified default dir of debugfs for debugging UHCI.
debugfs: Change debugfs directory of IWMC3200
debugfs: Change debuhgfs directory of trace-events-sample.h
debugfs: Fix mount directory of debugfs by default in events.txt
hpilo: add poll f_op
hpilo: add interrupt handler
hpilo: staging for interrupt handling
driver core: platform_device_add_data(): use kmemdup()
Driver core: Add support for compatibility classes
uio: add generic driver for PCI 2.3 devices
driver-core: move dma-coherent.c from kernel to driver/base
mem_class: fix bug
mem_class: use minor as index instead of searching the array
driver model: constify attribute groups
UIO: remove 'default n' from Kconfig
Driver core: Add accessor for device platform data
Driver core: move dev_get/set_drvdata to drivers/base/dd.c
Driver core: add new device to bus's list before probing
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pcmcia-2.6:
pcmcia: document return value of pcmcia_loop_config
pcmcia: dtl1_cs: fix pcmcia_loop_config logic
pcmcia: drop non-existant includes
pcmcia: disable prefetch/burst for OZ6933
pcmcia: fix incorrect argument order to list_add_tail()
pcmcia: drivers/pcmcia/pcmcia_resource.c: Remove unnecessary semicolons
pcmcia: Use phys_addr_t for physical addresses
pcmcia: drivers/pcmcia: Make static
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (75 commits)
PCI hotplug: clean up acpi_run_hpp()
PCI hotplug: acpiphp: use generic pci_configure_slot()
PCI hotplug: shpchp: use generic pci_configure_slot()
PCI hotplug: pciehp: use generic pci_configure_slot()
PCI hotplug: add pci_configure_slot()
PCI hotplug: clean up acpi_get_hp_params_from_firmware() interface
PCI hotplug: acpiphp: don't cache hotplug_params in acpiphp_bridge
PCI hotplug: acpiphp: remove superfluous _HPP/_HPX evaluation
PCI: Clear saved_state after the state has been restored
PCI PM: Return error codes from pci_pm_resume()
PCI: use dev_printk in quirk messages
PCI / PCIe portdrv: Fix pcie_portdrv_slot_reset()
PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle
PCI Hotplug: acpiphp: find bridges the easy way
PCI: pcie portdrv: remove unused variable
PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support
ACPI PM: Replace wakeup.prepared with reference counter
PCI PM: Introduce device flag wakeup_prepared
PCI / ACPI PM: Rework some debug messages
PCI PM: Simplify PCI wake-up code
...
Fixed up conflict in arch/powerpc/kernel/pci_64.c due to OF device tree
scanning having been moved and merged for the 32- and 64-bit cases. The
'needs_freset' initialization added in 6e19314cc ("PCI/powerpc: support
PCIe fundamental reset") is now in arch/powerpc/kernel/pci_of_scan.c.
Sysbench thinks SD_BALANCE_WAKE is too agressive and kbuild doesn't
really mind too much, SD_BALANCE_NEWIDLE picks up most of the
slack.
On a dual socket, quad core, dual thread nehalem system:
sysbench (--num_threads=16):
SD_BALANCE_WAKE-: 13982 tx/s
SD_BALANCE_WAKE+: 15688 tx/s
kbuild (-j16):
SD_BALANCE_WAKE-: 47.648295846 seconds time elapsed ( +- 0.312% )
SD_BALANCE_WAKE+: 47.608607360 seconds time elapsed ( +- 0.026% )
(same within noise)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
bdi_start_writeback() is currently split into two paths, one for
WB_SYNC_NONE and one for WB_SYNC_ALL. Add bdi_sync_writeback()
for WB_SYNC_ALL writeback and let bdi_start_writeback() handle
only WB_SYNC_NONE.
Push down the writeback_control allocation and only accept the
parameters that make sense for each function. This cleans up
the API considerably.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Now that bdi_writeback_all() no longer handles integrity writeback,
it doesn't have to block anymore. This means that we can switch
bdi_list reader side protection to RCU.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
We do this automatically in get_sb_bdev() from the set_bdev_super()
callback. Filesystems that have their own private backing_dev_info
must assign that in ->fill_super().
Note that ->s_bdi assignment is required for proper writeback!
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It has been unused since it was introduced in:
commit 520808bf20e90fdbdb320264ba7dd5cf9d47dcac
Author: Andrew Morton <akpm@osdl.org>
Date: Fri May 21 00:46:17 2004 -0700
[PATCH] block device layer: separate backing_dev_info infrastructure
So lets just kill it.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Impact: optional, useful for debugging
Add a new madvice sub command to inject poison for some
pages in a process' address space. This is useful for
testing the poison page handling.
This patch can allow root to tie up large amounts of memory.
I got feedback from container developers and they didn't see any
problem.
v2: Use write flag for get_user_pages to make sure to always get
a fresh page
v3: Don't request write mapping (Fengguang Wu)
v4: Move MADV_* number to avoid conflict with KSM (Hugh Dickins)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Add the high level memory handler that poisons pages
that got corrupted by hardware (typically by a two bit flip in a DIMM
or a cache) on the Linux level. The goal is to prevent everyone
from accessing these pages in the future.
This done at the VM level by marking a page hwpoisoned
and doing the appropriate action based on the type of page
it is.
The code that does this is portable and lives in mm/memory-failure.c
To quote the overview comment:
High level machine check handler. Handles pages reported by the
hardware as being corrupted usually due to a 2bit ECC memory or cache
failure.
This focuses on pages detected as corrupted in the background.
When the current CPU tries to consume corruption the currently
running process can just be killed directly instead. This implies
that if the error cannot be handled for some reason it's safe to
just ignore it because no corruption has been consumed yet. Instead
when that happens another machine check will happen.
Handles page cache pages in various states. The tricky part
here is that we can access any page asynchronous to other VM
users, because memory failures could happen anytime and anywhere,
possibly violating some of their assumptions. This is why this code
has to be extremely careful. Generally it tries to use normal locking
rules, as in get the standard locks, even if that means the
error handling takes potentially a long time.
Some of the operations here are somewhat inefficient and have non
linear algorithmic complexity, because the data structures have not
been optimized for this case. This is in particular the case
for the mapping from a vma to a process. Since this case is expected
to be rare we hope we can get away with this.
There are in principle two strategies to kill processes on poison:
- just unmap the data and wait for an actual reference before
killing
- kill as soon as corruption is detected.
Both have advantages and disadvantages and should be used
in different situations. Right now both are implemented and can
be switched with a new sysctl vm.memory_failure_early_kill
The default is early kill.
The patch does some rmap data structure walking on its own to collect
processes to kill. This is unusual because normally all rmap data structure
knowledge is in rmap.c only. I put it here for now to keep
everything together and rmap knowledge has been seeping out anyways
Includes contributions from Johannes Weiner, Chris Mason, Fengguang Wu,
Nick Piggin (who did a lot of great work) and others.
Cc: npiggin@suse.de
Cc: riel@redhat.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
This allows processes to override their early/late kill
behaviour on hardware memory errors.
Typically applications which are memory error aware is
better of with early kill (see the error as soon
as possible), all others with late kill (only
see the error when the error is really impacting execution)
There's a global sysctl, but this way an application
can set its specific policy.
We're using two bits, one to signify that the process
stated its intention and that
I also made the prctl future proof by enforcing
the unused arguments are 0.
The state is inherited to children.
Note this makes us officially run out of process flags
on 32bit, but the next patch can easily add another field.
Manpage patch will be supplied separately.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Truncating metadata pages is not safe right now before
we haven't audited all file systems.
To enable truncation only for data address space define
a new address_space callback error_remove_page.
This is used for memory_failure.c memory error handling.
This can be then set to truncate_inode_page()
This patch just defines the new operation and adds documentation.
Callers and users come in followon patches.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Add a simple way to invalidate a single page
This is just a refactoring of the truncate.c code.
Originally from Fengguang, modified by Andi Kleen.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Extract out truncate_inode_page() out of the truncate path so that
it can be used by memory-failure.c
[AK: description, headers, fix typos]
v2: Some white space changes from Fengguang Wu
Signed-off-by: Andi Kleen <ak@linux.intel.com>
When a page has the poison bit set replace the PTE with a poison entry.
This causes the right error handling to be done later when a process runs
into it.
v2: add a new flag to not do that (needed for the memory-failure handler
later) (Fengguang)
v3: remove unnecessary is_migration_entry() test (Fengguang, Minchan)
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
try_to_unmap currently has multiple modi (migration, munlock, normal unmap)
which are selected by magic flag variables. The logic is not very straight
forward, because each of these flag change multiple behaviours (e.g.
migration turns off aging, not only sets up migration ptes etc.)
Also the different flags interact in magic ways.
A later patch in this series adds another mode to try_to_unmap, so
this becomes quickly unmanageable.
Replace the different flags with a action code (migration, munlock, munmap)
and some additional flags as modifiers (ignore mlock, ignore aging).
This makes the logic more straight forward and allows easier extension
to new behaviours. Change all the caller to declare what they want to
do.
This patch is supposed to be a nop in behaviour. If anyone can prove
it is not that would be a bug.
Cc: Lee.Schermerhorn@hp.com
Cc: npiggin@suse.de
Signed-off-by: Andi Kleen <ak@linux.intel.com>
- Add a new VM_FAULT_HWPOISON error code to handle_mm_fault. Right now
architectures have to explicitely enable poison page support, so
this is forward compatible to all architectures. They only need
to add it when they enable poison page support.
- Add poison page handling in swap in fault code
v2: Add missing delayacct_clear_flag (Hidehiro Kawai)
v3: Really use delayacct_clear_flag (Hidehiro Kawai)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Add new SIGBUS codes for reporting machine checks as signals. When
the hardware detects an uncorrected ECC error it can trigger these
signals.
This is needed for telling KVM's qemu about machine checks that happen to
guests, so that it can inject them, but might be also useful for other programs.
I find it useful in my test programs.
This patch merely defines the new types.
- Define two new si_codes for SIGBUS. BUS_MCEERR_AO and BUS_MCEERR_AR
* BUS_MCEERR_AO is for "Action Optional" machine checks, which means that some
corruption has been detected in the background, but nothing has been consumed
so far. The program can ignore those if it wants (but most programs would
already get killed)
* BUS_MCEERR_AR is for "Action Required" machine checks. This happens
when corrupted data is consumed or the application ran into an area
which has been known to be corrupted earlier. These require immediate
action and cannot just returned to. Most programs would kill themselves.
- They report the address of the corruption in the user address space
in si_addr.
- Define a new si_addr_lsb field that reports the extent of the corruption
to user space. That's currently always a (small) page. The user application
cannot tell where in this page the corruption happened.
AK: I plan to write a man page update before anyone asks.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Memory migration uses special swap entry types to trigger special actions on
page faults. Extend this mechanism to also support poisoned swap entries, to
trigger poison handling on page faults. This allows follow-on patches to
prevent processes from faulting in poisoned pages again.
v2: Fix overflow in MAX_SWAPFILES (Fengguang Wu)
v3: Better overflow fix (Hidehiro Kawai)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Needed for later patch that walks rmap entries on its own.
This used to be very frowned upon, but memory-failure.c does
some rather specialized rmap walking and rmap has been stable
for quite some time, so I think it's ok now to export it.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Hardware poisoned pages need special handling in the VM and shouldn't be
touched again. This requires a new page flag. Define it here.
The page flags wars seem to be over, so it shouldn't be a problem
to get a new one.
v2: Add TestSetHWPoison (suggested by Johannes Weiner)
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Use uX rather than uintX_t types for consistency.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And turn it on for NUMA and MC domains. This improves
locality in balancing decisions by keeping up to
capacity amount of tasks local before looking for idle
CPUs. (and twice the capacity if SD_POWERSAVINGS_BALANCE
is set.)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When ftrace had issues with NMIs, it was needed to annotate all
the areas that kprobes had issues with notrace. Now that ftrace is
NMI safe, the functions that limit ftrace from tracing are just a
small few.
Kprobes is too big of a set for ftrace not to trace. Remove the
coupling.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Follows the model used by the NFS client. Setup the RPC prepare and done
function pointers so that we can populate the sequence information if
minorversion == 1. rpc_run_task() is then invoked directly just like
existing NFS client operations do.
nfsd4_cb_prepare() determines if the sequence information needs to be setup.
If the slot is in use, it adds itself to the wait queue.
nfsd4_cb_done() wakes anyone sleeping on the callback channel wait queue
after our RPC reply has been received. It also sets the task message
result pointer to NULL to clearly indicate we're done using it.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[define and initialize cl_cb_seq_nr here]
[pulled out unused defintion of nfsd4_cb_done]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
RPC callback requests will wait on this wait queue if the backchannel
is out of slots.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Follow the model we use in the client. Make the sequence arguments
part of the regular RPC arguments. None of the callbacks that are
soon to be implemented expect results that need to be passed back
to the caller, so we don't define a separate RPC results structure.
For session validation, the cb_sequence decoding will use a pointer
to the sequence arguments that are part of the RPC argument.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[define struct nfsd4_cb_sequence here]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Keep the xprt used for create_session in cl_cb_xprt.
Mark cl_callback.cb_minorversion = 1 and remember
the client provided cl_callback.cb_prog rpc program number.
Use it to probe the callback path.
Use the client's network address to initialize as the
callback's address as expected by the xprt creation
routines.
Define xdr sizes and code nfs4_cb_compound header to be able
to send a null callback rpc.
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[get callback minorversion from fore channel's]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: change bc_sock to bc_xprt]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pulled definition for cl_cb_xprt]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: set up backchannel's cb_addr]
[moved rpc_create_args init to "nfsd: modify nfsd4.1 backchannel to use new xprt class"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Callbacks are always made using the machine's identity, so we can use a
single auth_generic credential shared among callbacks to all clients and
let the rpc code take care of the rest.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Otherwise, the upcall is going to be synchronous, which may not be what the
caller wants...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Physical addresses are currently represented as int or long types.
However, this does not work for processors like the PPC440EPx, which
is a 32-bit processor with a 36-bit address space. This patch uses
the phys_addr_t type, which correctly holds a 36-bit address on
this processor.
Signed-off-by: Steven A. Falco <sfalco@harris.com>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-next-2.6:
ide: fixup for fujitsu disk
ide: convert to ->proc_fops
at91_ide: remove headers specific for at91sam9263
IDE: palm_bk3710: convert clock usage after clkdev conversion
ide: fix races in handling of user-space SET XFER commands
ide: allow ide_dev_read_id() to be called from the IRQ context
ide: ide-taskfile.c fix style problems
drivers/ide/ide-cd.c: Use DIV_ROUND_CLOSEST
ide-tape: fix handling of postponed rqs
ide-tape: convert to ide_debug_log macro
ide-tape: fix debug call
ide: Fix annoying warning in ide_pio_bytes().
IDE: Save a call to PageHighMem()
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits)
powerpc/nvram: Enable use Generic NVRAM driver for different size chips
powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline
powerpc/ps3: Workaround for flash memory I/O error
powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
powerpc/perf_counters: Reduce stack usage of power_check_constraints
powerpc: Fix bug where perf_counters breaks oprofile
powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
powerpc/irq: Improve nanodoc
powerpc: Fix some late PowerMac G5 with PCIe ATI graphics
powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
powerpc/book3e: Add missing page sizes
powerpc/pseries: Fix to handle slb resize across migration
powerpc/powermac: Thermal control turns system off too eagerly
powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
powerpc/405ex: support cuImage via included dtb
powerpc/405ex: provide necessary fixup function to support cuImage
powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC
powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.
powerpc/44x: Update Arches defconfig
powerpc/44x: Update Arches dts
...
Fix up conflicts in drivers/char/agp/uninorth-agp.c
Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
very early at kernel initialization, before any driver-core device
is registered. Every device with a major/minor will provide a
device node in devtmpfs.
Devtmpfs can be changed and altered by userspace at any time,
and in any way needed - just like today's udev-mounted tmpfs.
Unmodified udev versions will run just fine on top of it, and will
recognize an already existing kernel-created device node and use it.
The default node permissions are root:root 0600. Proper permissions
and user/group ownership, meaningful symlinks, all other policy still
needs to be applied by userspace.
If a node is created by devtmps, devtmpfs will remove the device node
when the device goes away. If the device node was created by
userspace, or the devtmpfs created node was replaced by userspace, it
will no longer be removed by devtmpfs.
If it is requested to auto-mount it, it makes init=/bin/sh work
without any further userspace support. /dev will be fully populated
and dynamic, and always reflect the current device state of the kernel.
With the commonly used dynamic device numbers, it solves the problem
where static devices nodes may point to the wrong devices.
It is intended to make the initial bootup logic simpler and more robust,
by de-coupling the creation of the inital environment, to reliably run
userspace processes, from a complex userspace bootstrap logic to provide
a working /dev.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Tested-By: Harald Hoyer <harald@redhat.com>
Tested-By: Scott James Remnant <scott@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When turning class devices into bus devices, we may need to
temporarily add links in sysfs so that user-space applications
are not confused. This is done by adding the following API:
* Functions to register and unregister compatibility classes.
These appear in sysfs at the same location as regular classes, but
instead of class devices, they contain links to bus devices.
* Functions to create and delete such links. Additionally, the caller
can optionally pass a target device to which a "device" link should
point (typically that would be the device's parent), to fully emulate
the original class device.
The i2c subsystem will be the first user of this API, as i2c adapters
are being converted from class devices to bus devices.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
This adds a generic uio driver that can bind to any PCI device. First
user will be virtualization where a qemu userspace process needs to give
guest OS access to the device.
Interrupts are handled using the Interrupt Disable bit in the PCI
command register and Interrupt Status bit in the PCI status register.
All devices compliant to PCI 2.3 (circa 2002) and all compliant PCI
Express devices should support these bits. Driver detects this support,
and won't bind to devices which do not support the Interrupt Disable Bit
in the command register.
It's expected that more features of interest to virtualization will be
added to this driver in the future. Possibilities are: mmap for device
resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Let attribute group vectors be declared "const". We'd
like to let most attribute metadata live in read-only
sections... this is a start.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For consistency with driver data provide a dev_get_platdata() accessor
for reading the platform data from a device.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
No one should directly access the driver_data field, so remove the field
and make it private. We dynamically create the private field now if it
is needed, to handle drivers that call get/set before they are
registered with the driver core.
Also update the copyright notices on these files while we are there.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
powerpc64: convert to dynamic percpu allocator
sparc64: use embedding percpu first chunk allocator
percpu: kill lpage first chunk allocator
x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
percpu: update embedding first chunk allocator to handle sparse units
percpu: use group information to allocate vmap areas sparsely
vmalloc: implement pcpu_get_vm_areas()
vmalloc: separate out insert_vmalloc_vm()
percpu: add chunk->base_addr
percpu: add pcpu_unit_offsets[]
percpu: introduce pcpu_alloc_info and pcpu_group_info
percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
percpu: add @align to pcpu_fc_alloc_fn_t
percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
percpu: drop @static_size from first chunk allocators
percpu: generalize first chunk allocator selection
percpu: build first chunk allocators selectively
percpu: rename 4k first chunk allocator to page
percpu: improve boot messages
percpu: fix pcpu_reclaim() locking
...
Fix trivial conflict as by Tejun Heo in kernel/sched.c
Due to problems at cam.org, my nico@cam.org email address is no longer
valid. FRom now on, nico@fluxnic.net should be used instead.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, pat: Fix cacheflush address in change_page_attr_set_clr()
mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set
x86: Fix earlyprintk=dbgp for machines without NX
x86, pat: Sanity check remap_pfn_range for RAM region
x86, pat: Lookup the protection from memtype list on vm_insert_pfn()
x86, pat: Add lookup_memtype to get the current memtype of a paddr
x86, pat: Use page flags to track memtypes of RAM pages
x86, pat: Generalize the use of page flag PG_uncached
x86, pat: Add rbtree to do quick lookup in memtype tracking
x86, pat: Add PAT reserve free to io_mapping* APIs
x86, pat: New i/f for driver to request memtype for IO regions
x86, pat: ioremap to follow same PAT restrictions as other PAT users
x86, pat: Keep identity maps consistent with mmaps even when pat_disabled
x86, mtrr: make mtrr_aps_delayed_init static bool
x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled
x86: Fix an incorrect argument of reserve_bootmem()
x86: Fix system crash when loading with "reservetop" parameter
* 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, intel_txt: clean up the impact on generic code, unbreak non-x86
x86, intel_txt: Handle ACPI_SLEEP without X86_TRAMPOLINE
x86, intel_txt: Fix typos in Kconfig help
x86, intel_txt: Factor out the code for S3 setup
x86, intel_txt: tboot.c needs <asm/fixmap.h>
intel_txt: Force IOMMU on for Intel TXT launch
x86, intel_txt: Intel TXT Sx shutdown support
x86, intel_txt: Intel TXT reboot/halt shutdown support
x86, intel_txt: Intel TXT boot support
* 'agp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
agp/intel: remove restore in resume
agp: fix uninorth build
intel-agp: Set dma mask for i915
agp: kill phys_to_gart() and gart_to_phys()
intel-agp: fix sglist allocation to avoid vmalloc()
intel-agp: Move repeated sglist free into separate function
agp: Switch agp_{un,}map_page() to take struct page * argument
agp: tidy up handling of scratch pages w.r.t. DMA API
intel_agp: Use PCI DMA API correctly on chipsets new enough to have IOMMU
agp: Add generic support for graphics dma remapping
agp: Switch mask_memory() method to take address argument again, not page
Avoid the cache buddies from biasing the time distribution away
from fork()ers. Normally the next buddy will be the preferred
scheduling target, but this makes fork()s prefer to run the new
child, whereas we prefer to run the parent, since that will
generate more work.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In order to extend the functions to have more than 1 flag (sync),
rename the argument to flags, and explicitly define a WF_ space for
individual flags.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In order to be able to rename the sync argument, we need to rename
the current flag argument.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
APERF/MPERF support for cpu_power.
APERF/MPERF is arch defined to be a relative scale of work capacity
per logical cpu, this is assumed to include SMT and Turbo mode.
APERF/MPERF are specified to both reset to 0 when either counter
wraps, which is highly inconvenient, since that'll give a blimp
when that happens. The manual specifies writing 0 to the counters
after each read, but that's 1) too expensive, and 2) destroys the
possibility of sharing these counters with other users, so we live
with the blimp - the other existing user does too.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If we're looking to place a new task, we might as well find the
idlest position _now_, not 1 tick ago.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make the idle balancer more agressive, to improve a
x264 encoding workload provided by Jason Garrett-Glaser:
NEXT_BUDDY NO_LB_BIAS
encoded 600 frames, 252.82 fps, 22096.60 kb/s
encoded 600 frames, 250.69 fps, 22096.60 kb/s
encoded 600 frames, 245.76 fps, 22096.60 kb/s
NO_NEXT_BUDDY LB_BIAS
encoded 600 frames, 344.44 fps, 22096.60 kb/s
encoded 600 frames, 346.66 fps, 22096.60 kb/s
encoded 600 frames, 352.59 fps, 22096.60 kb/s
NO_NEXT_BUDDY NO_LB_BIAS
encoded 600 frames, 425.75 fps, 22096.60 kb/s
encoded 600 frames, 425.45 fps, 22096.60 kb/s
encoded 600 frames, 422.49 fps, 22096.60 kb/s
Peter pointed out that this is better done via newidle_idx,
not via LB_BIAS, newidle balancing should look for where
there is load _now_, not where there was load 2 ticks ago.
Worst-case latencies are improved as well as no buddies
means less vruntime spread. (as per prior lkml discussions)
This change improves kbuild-peak parallelism as well.
Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1253011667.9128.16.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CPU level should have WAKE_AFFINE, whereas ALLNODES is dubious.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When merging select_task_rq_fair() and sched_balance_self() we lost
the use of wake_idx, restore that and set them to 0 to make wake
balancing more aggressive.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The problem with wake_idle() is that is doesn't respect things like
cpu_power, which means it doesn't deal well with SMT nor the recent
RT interaction.
To cure this, it needs to do what sched_balance_self() does, which
leads to the possibility of merging select_task_rq_fair() and
sched_balance_self().
Modify sched_balance_self() to:
- update_shares() when walking up the domain tree,
(it only called it for the top domain, but it should
have done this anyway), which allows us to remove
this ugly bit from try_to_wake_up().
- do wake_affine() on the smallest domain that contains
both this (the waking) and the prev (the wakee) cpu for
WAKE invocations.
Then use the top-down balance steps it had to replace wake_idle().
This leads to the dissapearance of SD_WAKE_BALANCE and
SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE.
SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.
Touch all topology bits to replace the old with new SD flags --
platforms might need re-tuning, enabling SD_BALANCE_WAKE
conditionally on a NUMA distance seems like a good additional
feature, magny-core and small nehalem systems would want this
enabled, systems with slow interconnects would not.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We're going to want to drop rq->lock in try_to_wake_up() for a
longer period of time, however we also want to deal with concurrent
waking of the same task, which is currently handled by holding
rq->lock.
So introduce a new TASK state, namely TASK_WAKING, which indicates
someone is already waking the task (other wakers will fail p->state
& state).
We also keep preemption disabled over the whole ttwu().
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rather ugly patch to fully place the sched_balance_self() code
inside the fair class.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After the recent mq change there is the new select_queue qdisc class
method used in tc_modify_qdisc, but it works OK only for direct child
qdiscs of mq qdisc. Grandchildren always get the first tx queue, which
would give wrong qdisc_root etc. results (e.g. for sch_htb as child of
sch_prio). This patch fixes it by using parent's dev_queue for such
grandchildren qdiscs. The select_queue method's return type is changed
BTW.
With feedback from: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parse RxRPC security index 5 type keys (Kerberos 5 tokens).
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow add_key() and KEYCTL_INSTANTIATE to accept key payloads in XDR form as
described by openafs-1.4.10/src/auth/afs_token.xg. This provides a way of
passing kaserver, Kerberos 4, Kerberos 5 and GSSAPI keys from userspace, and
allows for future expansion.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Declare the security index constants symbolically rather than just referring
to them numerically.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct socket has a 16 bit hole that triggers kmemcheck warnings.
As suggested by Ingo, use kmemcheck annotations
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes commit e36b9d16c6. The approach
there is to call dev_close()/dev_open() whenever the device type is changed in
order to remap the device IP multicast addresses to HW multicast addresses.
This approach suffers from 2 drawbacks:
*. It assumes tha the device is UP when calling dev_close(), or otherwise
dev_close() has no affect. It is worth to mention that initscripts (Redhat)
and sysconfig (Suse) doesn't act the same in this matter.
*. dev_close() has other side affects, like deleting entries from the routing
table, which might be unnecessary.
The fix here is to directly remap the IP multicast addresses to HW multicast
addresses for a bonding device that changes its type, and nothing else.
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was once upon time so that snd_sthresh was a 16-bit quantity.
...That has not been true for long period of time. I run across
some ancient compares which still seem to trust such legacy.
Put all that magic into a single place, I hopefully found all
of them.
Compile tested, though linking of allyesconfig is ridiculous
nowadays it seems.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
set_normalized_timespec() nsec argument is of type long. The recent
timekeeping changes of ktime_get_ts() feed
ts->tv_nsec + tomono.tv_nsec + nsecs
to set_normalized_timespec(). On 32 bit machines that sum can be
larger than (1 << 31) and therefor result in a negative value which
screws up the result completely.
Make the nsec argument of set_normalized_timespec() s64 to fix the
problem at hand. This also prevents similar problems for future users
of set_normalized_timespec().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Carsten Emde <carsten.emde@osadl.org>
LKML-Reference: <new-submission>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
* 'for-linus3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
SELinux: inline selinux_is_enabled in !CONFIG_SECURITY_SELINUX
KEYS: Fix garbage collector
KEYS: Unlock tasklist when exiting early from keyctl_session_to_parent
CRED: Allow put_cred() to cope with a NULL groups list
SELinux: flush the avc before disabling SELinux
SELinux: seperate avc_cache flushing
Creds: creds->security can be NULL is selinux is disabled
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (23 commits)
at_hdmac: Rework suspend_late()/resume_early()
PM: Reset transition_started at dpm_resume_noirq
PM: Update kerneldoc comments in drivers/base/power/main.c
PM: Add convenience macro to make switching to dev_pm_ops less error-prone
hp-wmi: Switch driver to dev_pm_ops
floppy: Switch driver to dev_pm_ops
PM: Trivial fixes
PM / Hibernate / Memory hotplug: Always use for_each_populated_zone()
PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2)
PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
PM/Hibernate: Rework shrinking of memory
PM: Fix typo in label name s/Platofrm_finish/Platform_finish/
PM: Run-time PM platform device bus support
PM: Introduce core framework for run-time PM of I/O devices (rev. 17)
Driver Core: Make PM operations a const pointer
PM: Remove platform device suspend_late()/resume_early() V2
USB: Rework musb suspend()/resume_early()
I2C: Rework i2c-s3c2410 suspend_late()/resume() V2
I2C: Rework i2c-pxa suspend_late()/resume_early()
DMA: Rework txx9dmac suspend_late()/resume_early()
...
Fix trivial conflict in drivers/base/platform.c (due to same
constification patch being merged in both sides, along with some other
PM work in the PM branch)
Using relative pathnames in #include statements interacts badly with
SystemTap, since the fs/ext4/*.h header files are not packaged up as
part of a distribution kernel's header files. Since systemtap doesn't
use TP_fast_assign(), we can use a blind structure definition and then
make sure the needed header files are defined before the ext4 source
files #include the trace/events/ext4.h header file.
https://bugzilla.redhat.com/show_bug.cgi?id=512478
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Without this patch building a kernel emits millions of warning like:
include/linux/selinux.h:92: warning: ?selinux_is_enabled? defined but not used
When it is build without CONFIG_SECURITY_SELINUX. This is harmless, but
the function should be inlined, so it gets compiled out.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (52 commits)
Input: bcm5974 - silence uninitialized variables warnings
Input: wistron_btns - add keymap for AOpen 1557
Input: psmouse - use boolean type
Input: i8042 - use platform_driver_probe
Input: i8042 - use boolean type where it makes sense
Input: i8042 - try disabling and re-enabling AUX port at close
Input: pxa27x_keypad - allow modifying keymap from userspace
Input: sunkbd - fix formatting
Input: i8042 - bypass AUX IRQ delivery test on laptops
Input: wacom_w8001 - simplify querying logic
Input: atkbd - allow setting force-release bitmap via sysfs
Input: w90p910_keypad - move a dereference below a NULL test
Input: add twl4030_keypad driver
Input: matrix-keypad - add function to build device keymap
Input: tosakbd - fix cleaning up KEY_STROBEs after error
Input: joydev - validate axis/button maps before clobbering current ones
Input: xpad - add USB ID for the drumkit controller from Rock Band
Input: w90p910_keypad - rename driver name to match platform
Input: add new driver for Sentelic Finger Sensing Pad
Input: psmouse - allow defining read-only attributes
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: completely remove apple mightymouse from blacklist
HID: support larger reports than 64 bytes in hiddev
HID: local function should be static
HID: ignore Philips IEEE802.15.4 RF Dongle
HID: ignore all recent SoundGraph iMON devices
HID: fix memory leak on error patch in debug code
HID: fix overrun in quirks initialization
HID: Drop NULL test on list_entry result
HID: driver for Twinhan USB 6253:0100 remote control
HID: adding __init/__exit macros to module init/exit functions
HID: add rumble support for Thrustmaster Dual Trigger 3-in-1
HID: ntrig tool separation and pen usages
HID: Avoid double spin_lock_init on usbhid->lock
HID: add force feedback support for Logitech WingMan Formula Force GP
HID: Support new variants of Samsung USB IR receiver (0419:0001)
HID: fix memory leak on error path in debug code
HID: fix debugfs build with !CONFIG_DEBUG_FS
HID: use debugfs for events/reports dumping
HID: use debugfs for report dumping descriptor
* 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits)
block: use blkdev_issue_discard in blk_ioctl_discard
Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads
block: don't assume device has a request list backing in nr_requests store
block: Optimal I/O limit wrapper
cfq: choose a new next_req when a request is dispatched
Seperate read and write statistics of in_flight requests
aoe: end barrier bios with EOPNOTSUPP
block: trace bio queueing trial only when it occurs
block: enable rq CPU completion affinity by default
cfq: fix the log message after dispatched a request
block: use printk_once
cciss: memory leak in cciss_init_one()
splice: update mtime and atime on files
block: make blk_iopoll_prep_sched() follow normal 0/1 return convention
cfq-iosched: get rid of must_alloc flag
block: use interrupts disabled version of raise_softirq_irqoff()
block: fix comment in blk-iopoll.c
block: adjust default budget for blk-iopoll
block: fix long lines in block/blk-iopoll.c
block: add blk-iopoll, a NAPI like approach for block devices
...
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (209 commits)
[SCSI] fix oops during scsi scanning
[SCSI] libsrp: fix memory leak in srp_ring_free()
[SCSI] libiscsi, bnx2i: make bound ep check common
[SCSI] libiscsi: add completion function for drivers that do not need pdu processing
[SCSI] scsi_dh_rdac: changes for rdac debug logging
[SCSI] scsi_dh_rdac: changes to collect the rdac debug information during the initialization
[SCSI] scsi_dh_rdac: move the init code from rdac_activate to rdac_bus_attach
[SCSI] sg: fix oops in the error path in sg_build_indirect()
[SCSI] mptsas : Bump version to 3.04.12
[SCSI] mptsas : FW event thread and scsi mid layer deadlock in SYNCHRONIZE CACHE command
[SCSI] mptsas : Send DID_NO_CONNECT for pending IOs of removed device
[SCSI] mptsas : PAE Kernel more than 4 GB kernel panic
[SCSI] mptsas : NULL pointer on big endian systems causing Expander not to tear off
[SCSI] mptsas : Sanity check for phyinfo is added
[SCSI] scsi_dh_rdac: Add support for Sun StorageTek ST2500, ST2510 and ST2530
[SCSI] pmcraid: PMC-Sierra MaxRAID driver to support 6Gb/s SAS RAID controller
[SCSI] qla2xxx: Update version number to 8.03.01-k6.
[SCSI] qla2xxx: Properly delete rports attached to a vport.
[SCSI] qla2xxx: Correct various NPIV issues.
[SCSI] qla2xxx: Correct qla2x00_eh_wait_on_command() to wait correctly.
...
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (257 commits)
[ARM] Update mach-types
ARM: 5636/1: Move vendor enum to AMBA include
ARM: Fix pfn_valid() for sparse memory
[ARM] orion5x: Add LaCie NAS 2Big Network support
[ARM] pxa/sharpsl_pm: zaurus c3000 aka spitz: fix resume
ARM: 5686/1: at91: Correct AC97 reset line in at91sam9263ek board
ARM: 5640/1: This patch modifies the support of AC97 on the at91sam9263 ek board
ARM: 5689/1: Update default config of HP Jornada 700-series machines
ARM: 5691/1: fix cache aliasing issues between kmap() and kmap_atomic() with highmem
ARM: 5688/1: ks8695_serial: disable_irq() lockup
ARM: 5687/1: fix an oops with highmem
ARM: 5684/1: Add nuc960 platform to w90x900
ARM: 5683/1: Add nuc950 platform to w90x900
ARM: 5682/1: Add cpu.c and dev.c and modify some files of w90p910 platform
ARM: 5626/1: add suspend/resume functions to amba-pl011 serial driver
ARM: 5625/1: fix hard coded 4K resource size in amba bus detection
MMC: MMCI: convert realview MMC to use gpiolib
ARM: 5685/1: Make MMCI driver compile without gpiolib
ARM: implement highpte
ARM: Show FIQ in /proc/interrupts on CONFIG_FIQ
...
Fix up trivial conflict in arch/arm/kernel/signal.c.
It was due to the TIF_NOTIFY_RESUME addition in commit d0420c83f ("KEYS:
Extend TIF_NOTIFY_RESUME to (almost) all architectures") and follow-ups.
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (202 commits)
MAINTAINERS: update KVM entry
KVM: correct error-handling code
KVM: fix compile warnings on s390
KVM: VMX: Check cpl before emulating debug register access
KVM: fix misreporting of coalesced interrupts by kvm tracer
KVM: x86: drop duplicate kvm_flush_remote_tlb calls
KVM: VMX: call vmx_load_host_state() only if msr is cached
KVM: VMX: Conditionally reload debug register 6
KVM: Use thread debug register storage instead of kvm specific data
KVM guest: do not batch pte updates from interrupt context
KVM: Fix coalesced interrupt reporting in IOAPIC
KVM guest: fix bogus wallclock physical address calculation
KVM: VMX: Fix cr8 exiting control clobbering by EPT
KVM: Optimize kvm_mmu_unprotect_page_virt() for tdp
KVM: Document KVM_CAP_IRQCHIP
KVM: Protect update_cr8_intercept() when running without an apic
KVM: VMX: Fix EPT with WP bit change during paging
KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors
KVM: x86 emulator: Add adc and sbb missing decoder flags
KVM: Add missing #include
...
console_print() is an old legacy interface mostly unused in the entire
kernel tree. It's best to clean up its existing use and let developers
use their own implementation of it as they feel fit.
Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the generic pci_configure_slot() rather than the acpiphp-specific
decode_hpp() and program_hpp().
Unlike the previous acpiphp-specific code, pci_configure_slot() programs
PCIe settings when an _HPX method provides them, so acpiphp-managed PCIe
devices can now be configured.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch adds a new pci_configure_slot() function that programs the
PCI bus characteristics for a newly-added device. This is based on
code in pciehp_pci.c, but should be generic enough to be used by pciehp,
shpchp, and acpiphp.
The hotplug_params struct and the program_hpp_typeX() functions are based
on the ACPI definitions, but they aren't really ACPI-specific, and there's
no alternate implementation, so I don't see the need to abstract them yet.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Reviewed-by: Alex Chiang <achiang@hp.com>
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
slub: fix slab_pad_check()
slub: release kobject if sysfs_create_group failed in sysfs_slab_add
SLUB: fix ARCH_KMALLOC_MINALIGN cases 64 and 256
SLUB: Fix some coding style issues
SLUB: Drop write permission to /proc/slabinfo
slab: remove duplicate kmem_cache_init_late() declarations
slub: change kmem_cache->align to record the real alignment
slub: use size and objsize orders to disable debug flags
slub: add option to disable higher order debugging slabs
This patch makes acpi_get_hp_params_from_firmware() take a
pci_dev rather than a pci_bus and makes it return a standard
int errno rather than acpi_status.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Remove long removed "inet_protocol_base" declaration.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since my commits introducing netns awareness into
genetlink we can get this problem:
BUG: scheduling while atomic: modprobe/1178/0x00000002
2 locks held by modprobe/1178:
#0: (genl_mutex){+.+.+.}, at: [<ffffffff8135ee1a>] genl_register_mc_grou
#1: (rcu_read_lock){.+.+..}, at: [<ffffffff8135eeb5>] genl_register_mc_g
Pid: 1178, comm: modprobe Not tainted 2.6.31-rc8-wl-34789-g95cb731-dirty #
Call Trace:
[<ffffffff8103e285>] __schedule_bug+0x85/0x90
[<ffffffff81403138>] schedule+0x108/0x588
[<ffffffff8135b131>] netlink_table_grab+0xa1/0xf0
[<ffffffff8135c3a7>] netlink_change_ngroups+0x47/0x100
[<ffffffff8135ef0f>] genl_register_mc_group+0x12f/0x290
because I overlooked that netlink_table_grab() will
schedule, thinking it was just the rwlock. However,
in the contention case, that isn't actually true.
Fix this by letting the code grab the netlink table
lock first and then the RCU for netns protection.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'osync_cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
fsync: wait for data writeout completion before calling ->fsync
vfs: Remove generic_osync_inode() and sync_page_range{_nolock}()
fat: Opencode sync_page_range_nolock()
pohmelfs: Use new syncing helper
xfs: Convert sync_page_range() to simple filemap_write_and_wait_range()
ocfs2: Update syncing after splicing to match generic version
ntfs: Use new syncing helpers and update comments
ext4: Remove syncing logic from ext4_file_write
ext3: Remove syncing logic from ext3_file_write
ext2: Update comment about generic_osync_inode
vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode
vfs: Rename generic_file_aio_write_nolock
ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock
pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolock
vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write()
vfs: Export __generic_file_aio_write() and add some comments
vfs: Introduce filemap_fdatawait_range
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
GFS2: Whitespace fixes
GFS2: Remove unused sysfs file
GFS2: Be extra careful about deallocating inodes
GFS2: Remove no_formal_ino generating code
GFS2: Rename eattr.[ch] as xattr.[ch]
GFS2: Clean up of extended attribute support
GFS2: Add explanation of extended attr on-disk format
GFS2: Add "-o errors=panic|withdraw" mount options
GFS2: jumping to wrong label?
GFS2: free disk inode which is deleted by remote node -V2
GFS2: Add a document explaining GFS2's uevents
GFS2: Add sysfs link to device
GFS2: Replace assertion with proper error handling
GFS2: Improve error handling in inode allocation
GFS2: Add some more info to uevents
GFS2: Add online uevent to GFS2
In a number of cases, the .suspend, .freeze, .poweroff and .resume,
.thaw, .restore functions are identical. However, they all need to be
assigned to avoid regressionsm as the previous code called .suspend
resp. .resume in all those cases. SIMPLE_DEV_PM_OPS helps to deal
with this case.
[rjw: Changed the name of the macro and added the comment explaining its
purpose.]
Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
netxen: update copyright
netxen: fix tx timeout recovery
netxen: fix file firmware leak
netxen: improve pci memory access
netxen: change firmware write size
tg3: Fix return ring size breakage
netxen: build fix for INET=n
cdc-phonet: autoconfigure Phonet address
Phonet: back-end for autoconfigured addresses
Phonet: fix netlink address dump error handling
ipv6: Add IFA_F_DADFAILED flag
net: Add DEVTYPE support for Ethernet based devices
mv643xx_eth.c: remove unused txq_set_wrr()
ucc_geth: Fix hangs after switching from full to half duplex
ucc_geth: Rearrange some code to avoid forward declarations
phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
drivers/net/phy: introduce missing kfree
drivers/net/wan: introduce missing kfree
net: force bridge module(s) to be GPL
Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
...
Fixed up trivial conflicts:
- arch/x86/include/asm/socket.h
converted to <asm-generic/socket.h> in the x86 tree. The generic
header has the same new #define's, so that works out fine.
- drivers/net/tun.c
fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
switched over to using 'tun->socket.sk' instead of the redundantly
available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
to the TUN driver") which added a new 'tun->sk' use.
Noted in 'next' by Stephen Rothwell.
acpi_pci_detect_ejectable() goes through effort to convert its
struct pci_bus arg to an acpi_handle, but every time we use this
interface, we already have the handle available.
So let's just use the handle instead of converting back and forth.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Tested-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The generated functions of TRACE_EVENT uses "flags" in one of the
sub macros which shadows a parameter in the outside macro.
Simple fix is to make the submacro use __flags instead.
Discovered by sparse.
Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Introduce new function for generic inode syncing (vfs_fsync_range) and use
it from fsync() path. Introduce also new helper for syncing after a sync
write (generic_write_sync) using the generic function.
Use these new helpers for syncing from generic VFS functions. This makes
O_SYNC writes to block devices acquire i_mutex for syncing. If we really
care about this, we can make block_fsync() drop the i_mutex and reacquire
it before it returns.
CC: Evgeniy Polyakov <zbr@ioremap.net>
CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
CC: Anton Altaparmakov <aia21@cantab.net>
CC: linux-ntfs-dev@lists.sourceforge.net
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
generic_file_aio_write_nolock() is now used only by block devices and raw
character device. Filesystems should use __generic_file_aio_write() in case
generic_file_aio_write() doesn't suit them. So rename the function to
blkdev_aio_write() and move it to fs/blockdev.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Rename __generic_file_aio_write_nolock() to __generic_file_aio_write(), add
comments to write helpers explaining how they should be used and export
__generic_file_aio_write() since it will be used by some filesystems.
CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
This simple helper saves some filesystems conversion from byte offset
to page numbers and also makes the fdata* interface more complete.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
* 'x86-percpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, percpu: Collect hot percpu variables into one cacheline
x86, percpu: Fix DECLARE/DEFINE_PER_CPU_PAGE_ALIGNED()
x86, percpu: Add 'percpu_read_stable()' interface for cacheable accesses
Some of the generated functions used in the TRACE_EVENT macros are
not declared static, but they are not global.
Discovered by sparse.
Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard,
the only difference between the two is that blkdev_issue_discard needs to
send a barrier discard request and blk_ioctl_discard a non-barrier one,
and blk_ioctl_discard needs to wait on the request. To facilitates this
add a flags argument to blkdev_issue_discard to control both aspects of the
behaviour. This will be very useful later on for using the waiting
funcitonality for other callers.
Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The commands are conceptually writes, and in the case of IDE and SCSI
commands actually are writes. They were only reads because we thought
that would interact better with the elevators. Now the elevators know
about discard requests, that advantage no longer exists.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Implement blk_limits_io_opt() and make blk_queue_io_opt() a wrapper
around it. DM needs this to avoid poking at the queue_limits directly.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Currently, there is a single in_flight counter measuring the number of
requests in the request_queue. But some monitoring tools would like to
know how many read requests and write requests are in progress. Split the
current in_flight counter into two seperate counters for read and write.
This information is exported as a sysfs attribute, as changing the
currently available stat files would break the existing tools.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
When do_balance() balances the tree, a trick is performed to
provide the ability for other tree writers/readers to check whether
do_balance() is executing concurrently (requires CONFIG_REISERFS_CHECK).
This is done to protect concurrent accesses to the tree. The trick
is the following:
When do_balance is called, a unique global variable called cur_tb
takes a pointer to the current tree to be rebalanced.
Once do_balance finishes its work, cur_tb takes the NULL value.
Then, concurrent tree readers/writers just have to check the value
of cur_tb to ensure do_balance isn't executing concurrently.
If it is, then it proves that schedule() occured on do_balance(),
which then relaxed the bkl that protected the tree.
Now that the bkl has be turned into a mutex, this check is still
fine even though do_balance() becomes preemptible: the write lock
will not be automatically released on schedule(), so the tree is
still protected.
But this is only fine if we have a single reiserfs mountpoint.
Indeed, because the bkl is a global lock, it didn't allowed
concurrent executions between a tree reader/writer in a mount point
and a do_balance() on another tree from another mountpoint.
So assuming all these readers/writers weren't supposed to be
reentrant, the current check now sometimes detect false positives with
the current per-superblock mutex which allows this reentrancy.
This patch keeps the concurrent tree accesses check but moves it
per superblock, so that only trees from a same mount point are
checked to be not accessed concurrently.
[ Impact: fix spurious panic while running several reiserfs mount-points ]
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
While searching a pathname, an inode mutex can be acquired
in do_lookup() which calls reiserfs_lookup() which in turn
acquires the write lock.
On the other side reiserfs_fill_super() can acquire the write_lock
and then call reiserfs_lookup_privroot() which can acquire an
inode mutex (the root of the mount point).
So we theoretically risk an AB - BA lock inversion that could lead
to a deadlock.
As for other lock dependencies found since the bkl to mutex
conversion, the fix is to use reiserfs_mutex_lock_safe() which
drops the lock dependency to the write lock.
[ Impact: fix a possible deadlock with reiserfs ]
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The goal of fs_changed() is to check whether the tree changed during a
schedule(). This is a BKL legacy.
A recent patch added an explicit unconditional release/reacquire of the
write lock around the cond_resched() called inside fs_changed.
But it's wasteful to unconditionally do that, we are creating superfluous
lock contention in !TIF_NEED_RESCHED case.
This patch manage that by calling reiserfs_cond_resched() from fs_changed()
which only releases the lock if we are going to reschedule.
[ Impact: inject less lock contention and tree job retries ]
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Usually, when we call cond_resched(), we want the write lock
to be released and then reacquired once we return from scheduling.
Not only does it follow the previous bkl based locking scheme, but
it also let other waiters to get the lock.
But if we aren't going to reschedule(), such as in !TIF_NEED_RESCHED
case, it's useless to release the lock. Worse, if we release and reacquire
the lock whereas it is not needed, we create useless contentions. Also
if someone takes the lock while we are modifying or reading the tree,
there are good chances we'll have to retry our operation, eg if the
block we were seeeking has moved.
So this patch introduces a helper which only unlock the write lock
if we are going to schedule.
[ Impact: prepare to inject less lock contention and less tree operation attempts ]
Reported-by: Andi Kleen <andi@firstfloor.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
fs_changed() is a macro used by reiserfs to check whether its tree has been
rebalanced. It has been designed to check parallel changes on the tree after
calling a sleeping function, which released the Bkl.
fs_changed() also calls cond_resched(), so that if rescheduling is needed,
we are in the best place to do that, since we check if the tree has changed
just after (because of the bkl release on schedule()).
Even if we are not anymore using the Bkl, we still want to release the lock
while we reschedule, so that other waiters for the lock can acquire it safely,
because of the following __fs_changed() check.
[ Impact: release the reiserfs write lock when it is not needed ]
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Sometimes we don't want to recursively hold the per superblock write
lock because we want to be sure it is actually released when we come
to sleep.
This patch introduces the necessary tools for that.
reiserfs_write_lock_once() does the same job than reiserfs_write_lock()
except that it won't try to acquire recursively the lock if the current
task already owns it. Also the lock_depth before the call of this function
is returned.
reiserfs_write_unlock_once() unlock only if reiserfs_write_lock_once()
returned a depth equal to -1, ie: only if it actually locked.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
LKML-Reference: <1239680065-25013-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch is an attempt to remove the Bkl based locking scheme from
reiserfs and is intended.
It is a bit inspired from an old attempt by Peter Zijlstra:
http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html
The bkl is heavily used in this filesystem to prevent from
concurrent write accesses on the filesystem.
Reiserfs makes a deep use of the specific properties of the Bkl:
- It can be acqquired recursively by a same task
- It is released on the schedule() calls and reacquired when schedule() returns
The two properties above are a roadmap for the reiserfs write locking so it's
very hard to simply replace it with a common mutex.
- We need a recursive-able locking unless we want to restructure several blocks
of the code.
- We need to identify the sites where the bkl was implictly relaxed
(schedule, wait, sync, etc...) so that we can in turn release and
reacquire our new lock explicitly.
Such implicit releases of the lock are often required to let other
resources producer/consumer do their job or we can suffer unexpected
starvations or deadlocks.
So the new lock that replaces the bkl here is a per superblock mutex with a
specific property: it can be acquired recursively by a same task, like the
bkl.
For such purpose, we integrate a lock owner and a lock depth field on the
superblock information structure.
The first axis on this patch is to turn reiserfs_write_(un)lock() function
into a wrapper to manage this mutex. Also some explicit calls to
lock_kernel() have been converted to reiserfs_write_lock() helpers.
The second axis is to find the important blocking sites (schedule...(),
wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit
release of the write lock on these locations before blocking. Then we can
safely wait for those who can give us resources or those who need some.
Typically this is a fight between the current writer, the reiserfs workqueue
(aka the async commiter) and the pdflush threads.
The third axis is a consequence of the second. The write lock is usually
on top of a lock dependency chain which can include the journal lock, the
flush lock or the commit lock. So it's dangerous to release and trying to
reacquire the write lock while we still hold other locks.
This is fine with the bkl:
T1 T2
lock_kernel()
mutex_lock(A)
unlock_kernel()
// do something
lock_kernel()
mutex_lock(A) -> already locked by T1
schedule() (and then unlock_kernel())
lock_kernel()
mutex_unlock(A)
....
This is not fine with a mutex:
T1 T2
mutex_lock(write)
mutex_lock(A)
mutex_unlock(write)
// do something
mutex_lock(write)
mutex_lock(A) -> already locked by T1
schedule()
mutex_lock(write) -> already locked by T2
deadlock
The solution in this patch is to provide a helper which releases the write
lock and sleep a bit if we can't lock a mutex that depend on it. It's another
simulation of the bkl behaviour.
The last axis is to locate the fs callbacks that are called with the bkl held,
according to Documentation/filesystem/Locking.
Those are:
- reiserfs_remount
- reiserfs_fill_super
- reiserfs_put_super
Reiserfs didn't need to explicitly lock because of the context of these callbacks.
But now we must take care of that with the new locking.
After this patch, reiserfs suffers from a slight performance regression (for now).
On UP, a high volume write with dd reports an average of 27 MB/s instead
of 30 MB/s without the patch applied.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Bron Gondwana <brong@fastmail.fm>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
__validate_process_creds should check if selinux is actually enabled before
running tests on the selinux portion of the credentials struct.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
[sunrpc: change idle timeout value for the backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This allows more precise tracking of how the scheduler accounts
(and acts upon) a task having spent N nanoseconds of CPU time.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The patch adds an interface to set the relationship between audio
channel number and slot number. The interface should be really useful
because audio channel n doesn't always use slot n in all platforms. And
for some devices, the relationship even can change with sound mode
switch in 2.1,3.1,4.1,5.1,6.1,7.1 etc.
Signed-off-by: Barry Song <21cnbao@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
This patch increases the max string used by predicates to
handle KSYM_SYMBOL_LEN.
Also moves an include to look nicer.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
__start_mcount_loc[] is unused after init, yet occupies RAM forever
as part of .rodata. 152kiB is typical on a 64-bit architecture. Instead,
__start_mcount_loc should be in the interval [__init_begin, __init_end)
so that the space is reclaimed after init.
__start_mcount_loc[] is generated during the load portion
of kernel build, and is used only by ftrace_init(). ftrace_init is declared
'__init' and is in .init.text, which is freed after init.
__start_mcount_loc is placed into .rodata by a call to MCOUNT_REC inside
the RO_DATA macro of include/asm-generic/vmlinux.lds.h. The array *is*
read-only, but more importantly it is not used after init. So the call to
MCOUNT_REC should be moved from RO_DATA to INIT_DATA.
This patch has been tested on x86_64 with CONFIG_DEBUG_PAGEALLOC=y
which verifies that the address range never is accessed after init.
Signed-off-by: John Reiser <jreiser@BitWagon.com>
LKML-Reference: <4A6DF0B6.7080402@bitwagon.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Booting 2.6.31 and executing
echo 1 >/sys/kernel/debug/tracing/events/enable
leads to
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c032a583>] ftrace_raw_event_block_bio_bounce+0x4b/0xb9
Apparently,
bio = bio_map_user(q, NULL, uaddr, len, reading, gfp_mask);
is called in block/blk-map.c:58 where bio->bi_bdev in set to NULL and
still is NULL when an attempt is made to evaluate bio->bi_bdev->bd_dev
in include/trace/events/block.h:189.
The tracepoint should ensure bio->bi_bdev is not dereferenced, if NULL.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
LKML-Reference: <4AAAC9B1.9060505@osadl.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
V4L2_FMT_FLAG_EMULATED 0x0002 This format is not native to the device but
emulated through software (usually libv4l2), where possible try to use a
native format instead for better performance.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Currently, V4L uses a scancode table whose index is the scancode and
the value is the keycode. While this works, it has some drawbacks:
1) It requires that the scancode to be at the range 00-7f;
2) keycodes should be masked on 7 bits in order for it to work;
3) due to the 7 bits approach, sometimes it is not possible to replace
the default keyboard to another one with a different encoding rule;
4) it is different than what is done with dvb-usb approach;
5) it requires a typedef for it to work. This is not a recommended
Linux CodingStyle.
This patch is part of a larger series of IR changes. It basically
replaces the IR_KEYTAB_TYPE tables by a structured table:
struct ir_scancode {
u16 scancode;
u32 keycode;
};
This is very close to what dvb does. So, a further integration with DVB
code will be easy.
While we've changed the tables, for now, the IR keycode handling is still
based on the old approach.
The only notable effect is the redution of about 35% of the ir-common
module size:
text data bss dec hex filename
6721 29208 4 35933 8c5d old/ir-common.ko
5756 18040 4 23800 5cf8 new/ir-common.ko
In thesis, we could be using above u8 for scancode, reducing even more the size
of the module, but defining it as u16 is more convenient, since, on dvb, each
scancode has up to 16 bits, and we currently have a few troubles with rc5, as their
scancodes are defined with more than 8 bits.
This patch itself shouldn't be doing any functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[mchehab@redhat.com: Fix a few wrong IR keymaps]
Signed-off-by: Shine Liu <shinel@foxmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This patch adds files to control si4713 devices.
Internal functions to control device properties
and initialization procedures are into these files.
Also, a v4l2 subdev interface is also exported.
This way other drivers can use this as v4l2 i2c subdevice.
Signed-off-by: Eduardo Valentin <eduardo.valentin@nokia.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This patch adds files which creates the radio interface
for si4713 FM transmitter (modulator) devices.
In order to do the real access to device registers, this
driver uses the v4l2 subdev interface exported by si4713 i2c driver.
Signed-off-by: Eduardo Valentin <eduardo.valentin@nokia.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This patch adds a new class of extended controls. This class
is intended to support FM Radio Modulators properties such as:
rds, audio limiters, audio compression, pilot tone generation,
tuning power levels and preemphasis properties.
Signed-off-by: Eduardo Valentin <eduardo.valentin@nokia.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The upcoming RDS encoder needs support for string controls. This patch
implements the core implementation.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add support for the remote control that comes with the Cinergy Hybrid T USB XS
Thanks to Jelle de Jong for providing sample hardware to test with.
Cc: Jelle de Jong <jelledejong@powercraft.nl>
Signed-off-by: Devin Heitmueller <dheitmueller@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This patch augments the init data passed by bridge drivers to
ir-kbd-i2c, so that the ir_type can be set explicitly, and so
ir-kbd-i2c internal get_key functions can be reused without
requiring symbols from ir-kbd-i2c in the bridge driver.
Signed-off-by: Andy Walls <awalls@radix.net>
Reviewed-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Douglas Schilling Landgraf <dougsland@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add capabilities to describe an FM transmitter device.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
DMX_ADD_PID allows to add multiple PIDs to a transport stream filter
previously set up with DMX_SET_PES_FILTER and output=DMX_OUT_TSDEMUX_TAP.
DMX_REMOVE_PID is used to drop a PID from a filter.
These ioctls are to be used by readers of /dev/dvb/adapterX/demuxY. They
may be called at any time, i.e. before or after the first filter on the
shared file descriptor was started.
They make it possible to record multiple services without the need to de-
or re-multiplex TS packets.
To accomplish this, dmxdev_filter->feed.ts has been converted to a list
of struct dmxdev_feeds, each containing a PID value and a pointer to a
struct dmx_ts_feed.
Signed-off-by: Andreas Oberritter <obi@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
To make UVC constants accessible by a future UVC gadget driver, move them from
drivers/media/video/uvc/uvcvideo.h to include/linux/usb/video.h.
Signed-off-by: Laurent Pinchart <laurent.pinchart@skynet.be>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
With the changes this file suffered along the time, things got a little disorganized.
In particular, V4L2_PIX_FMT_YVYU were shown as a device-specific format, instead of
yet another variant of YUV.
There's no functional change on this patch. It just adds some comments and reorder
fourcc formats to their proper places.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
bnx2i currently has a check for if a ep is properly bound, so if
iscsi_queuecommand/xmit_task is called while there is no ep
we will not queue IO.
be2iscsi sends IO from queuecommand/xmit_task like how bnx2i does
and needs a similar test. This patch has us just use the suspend_bit
test for this.
When ep_poll has succeeed iscsid will call conn_bind, the LLD will
then call iscsi_conn_bind which will clear the suspend bit.
When ep_disconnect is called (or if there is a conn error) we set
the suspend bit. For the ep_disconnect case I am adding a helper
in this patch that will take the session lock to make sure
iscsi_queuecommand/xmit_task is not running and it will set
the suspend bit.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Jayamohan Kallickal <jayamohank@serverengines.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
beiscsi does not need the iscsi scsi cmd processing. It does not
even get this info on the completion path. This adds a function
to just update the sequencing numbers and complete a task.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Jayamohan Kallickal <jayamohank@serverengines.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
fw_card_get, fw_card_put, fw_card_release are currently not exported for
use outside the firewire-core. Move their definitions/ declarations
from the subsystem header file to the core header file.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This moves the primecell vendor enum definition inside vic.c
out to linux/amba/bus.h where it belongs and replace any
occurances of specific vendor ID:s with the respective enums
instead.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (87 commits)
NFSv4: Disallow 'mount -t nfs4 -overs=2' and 'mount -t nfs4 -overs=3'
NFS: Allow the "nfs" file system type to support NFSv4
NFS: Move details of nfs4_get_sb() to a helper
NFS: Refactor NFSv4 text-based mount option validation
NFS: Mount option parser should detect missing "port="
NFS: out of date comment regarding O_EXCL above nfs3_proc_create()
NFS: Handle a zero-length auth flavor list
SUNRPC: Ensure that sunrpc gets initialised before nfs, lockd, etc...
nfs: fix compile error in rpc_pipefs.h
nfs: Remove reference to generic_osync_inode from a comment
SUNRPC: cache must take a reference to the cache detail's module on open()
NFS: Use the DNS resolver in the mount code.
NFS: Add a dns resolver for use with NFSv4 referrals and migration
SUNRPC: Fix a typo in cache_pipefs_files
nfs: nfs4xdr: optimize low level decoding
nfs: nfs4xdr: get rid of READ_BUF
nfs: nfs4xdr: simplify decode_exchange_id by reusing decode_opaque_inline
nfs: nfs4xdr: get rid of COPYMEM
nfs: nfs4xdr: introduce decode_sessionid helper
nfs: nfs4xdr: introduce decode_verifier helper
...
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (25 commits)
pata_rz1000: use printk_once
ahci: kill @force_restart and refine CLO for ahci_kick_engine()
pata_cs5535: add pci id for AMD based CS5535 controllers
ahci: Add AMD SB900 SATA/IDE controller device IDs
drivers/ata: use resource_size
sata_fsl: Defer non-ncq commands when ncq commands active
libata: add SATA PMP revision information for spec 1.2
libata: fix off-by-one error in ata_tf_read_block()
ahci: Gigabyte GA-MA69VM-S2 can't do 64bit DMA
ahci: make ahci_asus_m2a_vm_32bit_only() quirk more generic
dmi: extend dmi_get_year() to dmi_get_date()
dmi: fix date handling in dmi_get_year()
libata: unbreak TPM filtering by reorganizing ata_scsi_pass_thru()
sata_sis: convert to slave_link
sata_sil24: always set protocol override for non-ATAPI data commands
libata: Export AHCI capabilities
libata: Delegate nonrot flag setting to SCSI
[libata] Add pata_rdc driver for RDC ATA devices
drivers/ata: Remove unnecessary semicolons
libata: remove spindown skipping and warning
...
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (105 commits)
ring-buffer: only enable ring_buffer_swap_cpu when needed
ring-buffer: check for swapped buffers in start of committing
tracing: report error in trace if we fail to swap latency buffer
tracing: add trace_array_printk for internal tracers to use
tracing: pass around ring buffer instead of tracer
tracing: make tracing_reset safe for external use
tracing: use timestamp to determine start of latency traces
tracing: Remove mentioning of legacy latency_trace file from documentation
tracing/filters: Defer pred allocation, fix memory leak
tracing: remove users of tracing_reset
tracing: disable buffers and synchronize_sched before resetting
tracing: disable update max tracer while reading trace
tracing: print out start and stop in latency traces
ring-buffer: disable all cpu buffers when one finds a problem
ring-buffer: do not count discarded events
ring-buffer: remove ring_buffer_event_discard
ring-buffer: fix ring_buffer_read crossing pages
ring-buffer: remove unnecessary cpu_relax
ring-buffer: do not swap buffers during a commit
ring-buffer: do not reset while in a commit
...
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
sched: Fix sched::sched_stat_wait tracepoint field
sched: Disable NEW_FAIR_SLEEPERS for now
sched: Keep kthreads at default priority
sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
sched: Turn off child_runs_first
sched: Ensure that a child can't gain time over it's parent after fork()
sched: enable SD_WAKE_IDLE
sched: Deal with low-load in wake_affine()
sched: Remove short cut from select_task_rq_fair()
sched: Turn on SD_BALANCE_NEWIDLE
sched: Clean up topology.h
sched: Fix dynamic power-balancing crash
sched: Remove reciprocal for cpu_power
sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
sched: Try to deal with low capacity
sched: Scale down cpu_power due to RT tasks
sched: Implement dynamic cpu_power
sched: Add smt_gain
sched: Update the cpu_power sum during load-balance
sched: Add SD_PREFER_SIBLING
...
* 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
perf tools: Avoid unnecessary work in directory lookups
perf stat: Clean up statistics calculations a bit more
perf stat: More advanced variance computation
perf stat: Use stddev_mean in stead of stddev
perf stat: Remove the limit on repeat
perf stat: Change noise calculation to use stddev
x86, perf_counter, bts: Do not allow kernel BTS tracing for now
x86, perf_counter, bts: Correct pointer-to-u64 casts
x86, perf_counter, bts: Fail if BTS is not available
perf_counter: Fix output-sharing error path
perf trace: Fix read_string()
perf trace: Print out in nanoseconds
perf tools: Seek to the end of the header area
perf trace: Fix parsing of perf.data
perf trace: Sample timestamps as well
perf_counter: Introduce new (non-)paranoia level to allow raw tracepoint access
perf trace: Sample the CPU too
perf tools: Work around strict aliasing related warnings
perf tools: Clean up warnings list in the Makefile
perf tools: Complete support for dynamic strings
...
* 'irq-threaded-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: Do not mask oneshot edge type interrupts
genirq: Support nested threaded irq handling
genirq: Add buslock support
genirq: Add oneshot support
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits)
rcu: Move end of special early-boot RCU operation earlier
rcu: Changes from reviews: avoid casts, fix/add warnings, improve comments
rcu: Create rcutree plugins to handle hotplug CPU for multi-level trees
rcu: Remove lockdep annotations from RCU's _notrace() API members
rcu: Add #ifdef to suppress __rcu_offline_cpu() warning in !HOTPLUG_CPU builds
rcu: Add CPU-offline processing for single-node configurations
rcu: Add "notrace" to RCU function headers used by ftrace
rcu: Remove CONFIG_PREEMPT_RCU
rcu: Merge preemptable-RCU functionality into hierarchical RCU
rcu: Simplify rcu_pending()/rcu_check_callbacks() API
rcu: Use debugfs_remove_recursive() simplify code.
rcu: Merge per-RCU-flavor initialization into pre-existing macro
rcu: Fix online/offline indication for rcudata.csv trace file
rcu: Consolidate sparse and lockdep declarations in include/linux/rcupdate.h
rcu: Renamings to increase RCU clarity
rcu: Move private definitions from include/linux/rcutree.h to kernel/rcutree.h
rcu: Expunge lingering references to CONFIG_CLASSIC_RCU, optimize on !SMP
rcu: Delay rcu_barrier() wait until beginning of next CPU-hotunplug operation.
rcu: Fix typo in rcu_irq_exit() comment header
rcu: Make rcupreempt_trace.c look at offline CPUs
...
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (59 commits)
x86/gart: Do not select AGP for GART_IOMMU
x86/amd-iommu: Initialize passthrough mode when requested
x86/amd-iommu: Don't detach device from pt domain on driver unbind
x86/amd-iommu: Make sure a device is assigned in passthrough mode
x86/amd-iommu: Align locking between attach_device and detach_device
x86/amd-iommu: Fix device table write order
x86/amd-iommu: Add passthrough mode initialization functions
x86/amd-iommu: Add core functions for pd allocation/freeing
x86/dma: Mark iommu_pass_through as __read_mostly
x86/amd-iommu: Change iommu_map_page to support multiple page sizes
x86/amd-iommu: Support higher level PTEs in iommu_page_unmap
x86/amd-iommu: Remove old page table handling macros
x86/amd-iommu: Use 2-level page tables for dma_ops domains
x86/amd-iommu: Remove bus_addr check in iommu_map_page
x86/amd-iommu: Remove last usages of IOMMU_PTE_L0_INDEX
x86/amd-iommu: Change alloc_pte to support 64 bit address space
x86/amd-iommu: Introduce increase_address_space function
x86/amd-iommu: Flush domains if address space size was increased
x86/amd-iommu: Introduce set_dte_entry function
x86/amd-iommu: Add a gneric version of amd_iommu_flush_all_devices
...
In some cases, the network device driver knows what layer-3 address the
device should have. This adds support for the Phonet stack to
automatically request from the driver and add that address to the
network device.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFA_F_DADFAILED flag to denote an IPv6 address that has
failed Duplicate Address Detection, that way tools like
/sbin/ip can be more informative.
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
inet6 2001:db8::1/64 scope global tentative dadfailed
valid_lft forever preferred_lft forever
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Ethernet framing is used for a lot of devices these days. Most
prominent are WiFi and WiMAX based devices. However for userspace
application it is important to classify these devices correctly and
not only see them as Ethernet devices. The daemons like HAL, DeviceKit
or even NetworkManager with udev support tries to do the classification
in userspace with a lot trickery and extra system calls. This is not
good and actually reaches its limitations. Especially since the kernel
does know the type of the Ethernet device it is pretty stupid.
To solve this problem the underlying device type needs to be set and
then the value will be exported as DEVTYPE via uevents and available
within udev.
# cat /sys/class/net/wlan0/uevent
DEVTYPE=wlan
INTERFACE=wlan0
IFINDEX=5
This is similar to subsystems like USB and SCSI that distinguish
between hosts, devices, disks, partitions etc.
The new SET_NETDEV_DEVTYPE() is a convenience helper to set the actual
device type. All device types are free form, but for convenience the
same strings as used with RFKILL are choosen.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the call direction is a reply, copy the xid and call direction into the
req->rq_private_buf.head[0].iov_base otherwise rpc_verify_header returns
rpc_garbage.
Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[get rid of CONFIG_NFSD_V4_1]
[sunrpc: refactoring of svc_tcp_recvfrom]
[nfsd41: sunrpc: create common send routine for the fore and the back channels]
[nfsd41: sunrpc: Use free_page() to free server backchannel pages]
[nfsd41: sunrpc: Document server backchannel locking]
[nfsd41: sunrpc: remove bc_connect_worker()]
[nfsd41: sunrpc: Define xprt_server_backchannel()[
[nfsd41: sunrpc: remove bc_close and bc_init_auto_disconnect dummy functions]
[nfsd41: sunrpc: eliminate unneeded switch statement in xs_setup_tcp()]
[nfsd41: sunrpc: Don't auto close the server backchannel connection]
[nfsd41: sunrpc: Remove unused functions]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: change bc_sock to bc_xprt]
[nfsd41: sunrpc: move struct rpc_buffer def into a common header file]
[nfsd41: sunrpc: use rpc_sleep in bc_send_request so not to block on mutex]
[removed cosmetic changes]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[sunrpc: add new xprt class for nfsv4.1 backchannel]
[sunrpc: v2.1 change handling of auto_close and init_auto_disconnect operations for the nfsv4.1 backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
[reverted more cosmetic leftovers]
[got rid of xprt_server_backchannel]
[separated "nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Cc: Trond Myklebust <trond.myklebust@netapp.com>
[sunrpc: change idle timeout value for the backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Acked-by: Trond Myklebust <trond.myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (102 commits)
crypto: sha-s390 - Fix warnings in import function
crypto: vmac - New hash algorithm for intel_txt support
crypto: api - Do not displace newly registered algorithms
crypto: ansi_cprng - Fix module initialization
crypto: xcbc - Fix alignment calculation of xcbc_tfm_ctx
crypto: fips - Depend on ansi_cprng
crypto: blkcipher - Do not use eseqiv on stream ciphers
crypto: ctr - Use chainiv on raw counter mode
Revert crypto: fips - Select CPRNG
crypto: rng - Fix typo
crypto: talitos - add support for 36 bit addressing
crypto: talitos - align locks on cache lines
crypto: talitos - simplify hmac data size calculation
crypto: mv_cesa - Add support for Orion5X crypto engine
crypto: cryptd - Add support to access underlaying shash
crypto: gcm - Use GHASH digest algorithm
crypto: ghash - Add GHASH digest algorithm for GCM
crypto: authenc - Convert to ahash
crypto: api - Fix aligned ctx helper
crypto: hmac - Prehash ipad/opad
...
* 'writeback' of git://git.kernel.dk/linux-2.6-block:
writeback: check for registered bdi in flusher add and inode dirty
writeback: add name to backing_dev_info
writeback: add some debug inode list counters to bdi stats
writeback: get rid of pdflush completely
writeback: switch to per-bdi threads for flushing data
writeback: move dirty inodes from super_block to backing_dev_info
writeback: get rid of generic_sync_sb_inodes() export
Add ID 99 for PXA3xx frame buffers and report it in the pxa frame buffer
conditionally, depending on a new flag in struct pxafb_mach_info.
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: linux-fbdev-devel@lists.sourceforge.net
Cc: Dennis Oliver Kropp <dok@directfb.org>
Cc: Sven Neumann <s.neumann@raumfeld.com>
Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
* 'kmemleak' of git://linux-arm.org/linux-2.6:
kmemleak: Improve the "Early log buffer exceeded" error message
kmemleak: fix sparse warning for static declarations
kmemleak: fix sparse warning over overshadowed flags
kmemleak: move common painting code together
kmemleak: add clear command support
kmemleak: use bool for true/false questions
kmemleak: Do no create the clean-up thread during kmemleak_disable()
kmemleak: Scan all thread stacks
kmemleak: Don't scan uninitialized memory when kmemcheck is enabled
kmemleak: Ignore the aperture memory hole on x86_64
kmemleak: Printing of the objects hex dump
kmemleak: Do not report alloc_bootmem blocks as leaks
kmemleak: Save the stack trace for early allocations
kmemleak: Mark the early log buffer as __initdata
kmemleak: Dump object information on request
kmemleak: Allow rescheduling during an object scanning
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (57 commits)
binfmt_elf: fix PT_INTERP bss handling
TPM: Fixup boot probe timeout for tpm_tis driver
sysfs: Add labeling support for sysfs
LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information.
VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx.
KEYS: Add missing linux/tracehook.h #inclusions
KEYS: Fix default security_session_to_parent()
Security/SELinux: includecheck fix kernel/sysctl.c
KEYS: security_cred_alloc_blank() should return int under all circumstances
IMA: open new file for read
KEYS: Add a keyctl to install a process's session keyring on its parent [try #6]
KEYS: Extend TIF_NOTIFY_RESUME to (almost) all architectures [try #6]
KEYS: Do some whitespace cleanups [try #6]
KEYS: Make /proc/keys use keyid not numread as file position [try #6]
KEYS: Add garbage collection for dead, revoked and expired keys. [try #6]
KEYS: Flag dead keys to induce EKEYREVOKED [try #6]
KEYS: Allow keyctl_revoke() on keys that have SETATTR but not WRITE perm [try #6]
KEYS: Deal with dead-type keys appropriately [try #6]
CRED: Add some configurable debugging [try #6]
selinux: Support for the new TUN LSM hooks
...
BIOS clear DMAR table INTR_REMAP flag to disable interrupt remapping. Current
kernel only check interrupt remapping(IR) flag in DRHD's extended capability
register to decide interrupt remapping support or not. But IR flag will not
change when BIOS disable/enable interrupt remapping.
When user disable interrupt remapping in BIOS or BIOS often defaultly disable
interrupt remapping feature when BIOS is not mature.Though BIOS disable
interrupt remapping but intr_remapping_supported function will always report
to OS support interrupt remapping if VT-d2 chipset populated. On this
cases, kernel will continue enable interrupt remapping and result kernel panic.
This bug exist on almost all platforms with interrupt remapping support.
This patch add DMAR table INTR_REMAP flag check before enable interrupt
remapping.
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The userstack trace required the recording of the tgid entry.
Unfortunately, it was added to the generic entry where it wasted
4 bytes of every entry and was only used by one entry.
This patch moves it out of the generic field and moves it into the
only user (userstack_entry).
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Test results here look good, and on big OLTP runs it has also shown
to significantly increase cycles attributed to the database and
cause a performance boost.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This borrows some code from NAPI and implements a polled completion
mode for block devices. The idea is the same as NAPI - instead of
doing the command completion when the irq occurs, schedule a dedicated
softirq in the hopes that we will complete more IO when the iopoll
handler is invoked. Devices have a budget of commands assigned, and will
stay in polled mode as long as they continue to consume their budget
from the iopoll softirq handler. If they do not, the device is set back
to interrupt completion mode.
This patch holds the core bits for blk-iopoll, device driver support
sold separately.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Instead of just checking whether this device uses block layer
tagging, we can improve the detection by looking at the maximum
queue depth it has reached. If that crosses 4, then deem it a
queuing device.
This is important on high IOPS devices, since plugging hurts
the performance there (it can be as much as 10-15% of the sys
time).
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Failfast has characteristics from other attributes. When issuing,
executing and successuflly completing requests, failfast doesn't make
any difference. It only affects how a request is handled on failure.
Allowing requests with different failfast settings to be merged cause
normal IOs to fail prematurely while not allowing has performance
penalties as failfast is used for read aheads which are likely to be
located near in-flight or to-be-issued normal IOs.
This patch introduces the concept of 'mixed merge'. A request is a
mixed merge if it is merge of segments which require different
handling on failure. Currently the only mixable attributes are
failfast ones (or lack thereof).
When a bio with different failfast settings is added to an existing
request or requests of different failfast settings are merged, the
merged request is marked mixed. Each bio carries failfast settings
and the request always tracks failfast state of the first bio. When
the request fails, blk_rq_err_bytes() can be used to determine how
many bytes can be safely failed without crossing into an area which
requires further retrials.
This allows request merging regardless of failfast settings while
keeping the failure handling correct.
This patch only implements mixed merge but doesn't enable it. The
next one will update SCSI to make use of mixed merge.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Niel Lambrechts <niel.lambrechts@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
bio and request use the same set of failfast bits. This patch makes
the following changes to simplify things.
* enumify BIO_RW* bits and reorder bits such that BIOS_RW_FAILFAST_*
bits coincide with __REQ_FAILFAST_* bits.
* The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV
but the matching is useless anyway. init_request_from_bio() is
responsible for setting FAILFAST bits on FS requests and non-FS
requests never use BIO_RW_AHEAD. Drop the code and comment from
blk_rq_bio_prep().
* Define REQ_FAILFAST_MASK which is OR of all FAILFAST bits and
simplify FAILFAST flags handling in init_request_from_bio().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Also a debugging aid. We want to catch dirty inodes being added to
backing devices that don't do writeback.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This enables us to track who does what and print info. Its main use
is catching dirty inodes on the default_backing_dev_info, so we can
fix that up.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This gets rid of pdflush for bdi writeout and kupdated style cleaning.
pdflush writeout suffers from lack of locality and also requires more
threads to handle the same workload, since it has to work in a
non-blocking fashion against each queue. This also introduces lumpy
behaviour and potential request starvation, since pdflush can be starved
for queue access if others are accessing it. A sample ffsb workload that
does random writes to files is about 8% faster here on a simple SATA drive
during the benchmark phase. File layout also seems a LOT more smooth in
vmstat:
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42
0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44
1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37
0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58
0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34
0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37
0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44
0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38
0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41
0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45
where vanilla tends to fluctuate a lot in the creation phase:
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36
1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51
0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40
0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37
1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41
0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49
0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36
1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43
0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39
1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45
1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34
0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54
A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
SSD based writeback test on XFS performs over 20% better as well, with
the throughput being very stable around 1GB/sec, where pdflush only
manages 750MB/sec and fluctuates wildly while doing so. Random buffered
writes to many files behave a lot better as well, as does random mmap'ed
writes.
A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This is a first step at introducing per-bdi flusher threads. We should
have no change in behaviour, although sb_has_dirty_inodes() is now
ridiculously expensive, as there's no easy way to answer that question.
Not a huge problem, since it'll be deleted in subsequent patches.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This adds two new exported functions:
- writeback_inodes_sb(), which only attempts to writeback dirty inodes on
this super_block, for WB_SYNC_NONE writeout.
- sync_inodes_sb(), which writes out all dirty inodes on this super_block
and also waits for the IO to complete.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Conflicts:
kernel/trace/trace_export.c
kernel/trace/trace_kprobe.c
Merge reason: This topic branch lacks an important
build fix in tracing/core:
0dd7b74787:
tracing: Fix double CPP substitution in TRACE_EVENT_FN
that prevents from multiple tracepoint headers inclusion crashes.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Some drivers need to know when a lid event occurs and get the current
status. This can be useful for when a platform firmware clobbers some
hardware state at lid time, and a driver needs to restore things when
the lid is opened again.
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
When an RSCN indicates changes to individual remote ports,
don't blindly log them out and then back in. Instead, determine
whether they're still in the directory, by doing GPN_ID.
If that is successful, call login, which will send ADISC and reverify,
otherwise, call logoff. Perhaps we should just delete the rport,
not send LOGO, but it seems safer.
Also, fix a possible issue where if a mix of records in the RSCN
cause us to queue disc_ports for disc_single and then we decide
to do full rediscovery, we leak memory for those disc_ports queued.
So, go through the list of disc_ports even if doing full discovery.
Free the disc_ports in any case. If any of the disc_single() calls
return error, do a full discovery.
The ability to fill in GPN_ID requests was added to fc_ct_fill().
For this, it needs the FC_ID to be passed in as an arg.
The did parameter for fc_elsct_send() is used for that, since the
actual D_DID will always be 0xfffffc for all CT requests so far.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
When rport_login is called on an rport that is already thought
to be logged in, use ADISC. If that fails, redo PLOGI.
This is less disruptive after fabric changes that don't affect
the state of the target.
Implement the sending of ADISC via fc_els_fill.
Add ADISC state to the rport state machine. This is entered from READY
and returns to READY after successful completion. If it fails, the rport
is either logged off and deleted or re-does PLOGI.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Improve lport and rport debug messages to indicate whether
the response is LS_ACC, LS_RJT, closed, or timeout.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This moves the remote port lookup for incoming ELS requests into
fc_rport.c, in preparation for handing PLOGI and LOGO from
unknown rports.
This changes the arg to rport_recv_req from an rdata to an lport.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Currently these values are initialized by the callers. This was exposed
by a later patch that adds PLOGI request support. The patch failed to
initialize the new remote port's roles and it caused problems. This patch
has the rport_create routine initialize the identifiers and then the
callers can override them with real values.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
On some switches, an empty zone causes GPN_FT to be rejected
with reason 9 (unable) explanation 7 (FC-4 types not registered),
which causes discovery to be retried endlessly. Treat this as
just an empty response and consider discovery complete.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
When receiving an RSCN, do not log off all rports. This is
extremely disruptive. If, after the GPN_FT response, some
rports haven't been listed, delete them.
Add field disc_id to structs fc_rport_priv and fc_disc.
disc_id is an arbitrary serial number used to identify the
rports found by the latest discovery. This eliminates the need
to go through the rport list when restarting discovery.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Delete unused disc->delay element.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There was no need to have the discovery status stored in struct fc_disc.
Change fc_disc_done() to take the discovery status as an argument
and just pass it on to the discovery callback.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Don't create a "dummy" remote port to go with fc_rport_priv.
Make the rport truly optional by allocating fc_rport_priv separately
and not requiring a dummy rport to be there if we haven't yet done
fc_remote_port_add().
The fc_rport_libfc_priv remains as a structure attached to the
rport for I/O purposes.
Be sure to hold references on rdata when the lock is dropped in
fc_rport_work().
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Remote ports will become READY more than once after
ADISC is implemented in a later patch.
The event callback that has been called "CREATED" will mean "READY".
Rename it now in preparation for those changes.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Allow a struct fc_rport_priv to have no fc_rport associated with it.
This sets up to remove the need for "rogue" rports.
Add a few fields to fc_rport_priv that are needed before the fc_rport
is created. These are the ids, maxframe_size, classes, and rport pointer.
Remove the macro PRIV_TO_RPORT(). Just use rdata->rport where appropriate.
To take the place of the get_device()/put_device ops that were used to
hold both the rport and rdata, add a reference count to rdata structures
using kref. When kref_get decrements the refcount to zero, a new template
function releasing the rdata should be called. This will take care of
freeing the rdata and releasing the hold on the rport (for now). After
subsequent patches make the rport truly optional, this release function
will simply free the rdata.
Remove the simple inline function fc_rport_set_name(), which becomes
semanticly ambiguous otherwise. The caller will set the port_name and
node_name in the rdata->Ids, which will later be copied to the rport
when it its created.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
tt.elsct_send is used by both FCP and by the rport state machine.
After further patches, these two modules will use different
structures for the remote port.
So, change elsct_send to use the FC_ID instead of the fc_rport_priv
as its argument. It currently only uses the FC_ID anyway.
For CT requests the destination FC_ID is still implicitly 0xfffffc.
After further patches the did arg on CT requests will be used to
specify the FC_ID being inquired about for GPN_ID or other queries.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The rport and discovery modules deal with remote ports
before fc_remote_port_add() can be done, because the
full set of rport identifiers is not known at early stages.
In preparation for splitting the fc_rport/fc_rport_priv allocation,
make fc_rport_priv the primary interface for the remote port and
discovery engines.
The FCP / SCSI layers still deal with fc_rport and
fc_rport_libfc_priv, however.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
These macros introduce extra undesirable semicolons that keep
them from being used in expressions, and they don't protect
against being passed an expression.
Add parens and remove the semicolons.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The interface for lport->tt.rport_create() takes a fc_disc_port arg,
which is unnatural for most calls. The only reason for this was
to avoid passing in the local port as an argument, but otherwise
added to complexity.
Simplify by just using lport and fc_rport_identifiers.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
While the I/O and LLD interfaces use fc_rport_libfc_priv, the
disc and rport interfaces will use fc_rport_priv, which will
be separately allocated.
Change the disc and rport usage of fc_rport_libfc_priv to fc_rport_priv.
Use #define temporarily to make both names equivalent until a
subsequent patch splits them.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch enables DCA support on multiple-IOH/multiple-IIO architectures.
It modifies dca module by replacing single dca_providers list
with dca_domains list, each domain containing separate list of providers.
This approach lets dca driver manage multiple domains, i.e. sets of providers
and requesters mapped back to the same PCI root complex device.
The driver takes care to register each requester to a provider
from the same domain.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
* topic/tlv-minmax:
ALSA: usb-audio - Correct bogus volume dB information
ALSA: usb-audio - Use the new TLV_DB_MINMAX type
ALSA: Add new TLV types for dBwith min/max
* topic/soundcore-preclaim:
sound: make OSS device number claiming optional and schedule its removal
sound: request char-major-* module aliases for missing OSS devices
chrdev: implement __[un]register_chrdev()
* topic/asoc: (226 commits)
ASoC: au1x: PSC-AC97 bugfixes
ASoC: Fix WM835x Out4 capture enumeration
ASoC: Remove unuused hw_read_t
ASoC: fix pxa2xx-ac97.c breakage
ASoC: Fully specify DC servo bits to update in wm_hubs
ASoC: Debugged improper setting of PLL fields in WM8580 driver
ASoC: new board driver to connect bfin-5xx with ad1836 codec
ASoC: OMAP: Add functionality to set CLKR and FSR sources in McBSP DAI
ASoC: davinci: i2c device creation moved into board files
ASoC: Don't reconfigure WM8350 FLL if not needed
ASoC: Fix s3c-i2s-v2 build
ASoC: Make platform data optional for TLV320AIC3x
ASoC: Add S3C24xx dependencies for Simtec machines
ASoC: SDP3430: Fix TWL GPIO6 pin mux request
ASoC: S3C platform: Fix s3c2410_dma_started() called at improper time
ARM: OMAP: McBSP: Merge two functions into omap_mcbsp_start/_stop
ASoC: OMAP: Fix setup of XCCR and RCCR registers in McBSP DAI
OMAP: McBSP: Use textual values in DMA operating mode sysfs files
ARM: OMAP: DMA: Add support for DMA channel self linking on OMAP1510
ASoC: Select core DMA when building for S3C64xx
...
kvm_para.h contains userspace interface and so
should be exported.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
So far unprivileged guest callers running in ring 3 can issue, e.g., MMU
hypercalls. Normally, such callers cannot provide any hand-crafted MMU
command structure as it has to be passed by its physical address, but
they can still crash the guest kernel by passing random addresses.
To close the hole, this patch considers hypercalls valid only if issued
from guest ring 0. This may still be relaxed on a per-hypercall base in
the future once required.
Cc: stable@kernel.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Now KVM allow guest to modify guest's physical address of EPT's identity mapping page.
(change from v1, discard unnecessary check, change ioctl to accept parameter
address rather than value)
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from
interface between general code and arch code. kvm_arch_vcpu_runnable()
checks for interrupts instead.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest. Host userspace can register any
arbitrary IO address with a corresponding eventfd and then pass the eventfd
to a specific end-point of interest for handling.
Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.
However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc). For these
patterns, the synchronous call is particularly expensive since we really
only want to simply get our notification transmitted asychronously and
return as quickly as possible. All the sychronous infrastructure to ensure
proper data-dependencies are met in the normal IO case are just unecessary
overhead for signalling. This adds additional computational load on the
system, as well as latency to the signalling path.
Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd. This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and perhipheral components.
To test this theory, we built a test-harness called "doorbell". This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled. It supports signalling
from either an eventfd, or an ioctl().
We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl(). The other is direct via
ioeventfd.
You can download this test harness here:
ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2
The measured results are as follows:
qemu-mmio: 110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio: 367300 iops, 2.72us rtt
I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy. However, for now we
can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
and -350ns for HC, we get:
qemu-pio: 153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt
these are just for fun, for now, until I can gather more data.
Here is a graph for your convenience:
http://developer.novell.com/wiki/images/7/76/Iofd-chart.png
The conclusion to draw is that we save about 4us by skipping the userspace
hop.
--------------------
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Today kvm_io_bus_regsiter_dev() returns void and will internally BUG_ON
if it fails. We want to create dynamic MMIO/PIO entries driven from
userspace later in the series, so we need to enhance the code to be more
robust with the following changes:
1) Add a return value to the registration function
2) Fix up all the callsites to check the return code, handle any
failures, and percolate the error up to the caller.
3) Add an unregister function that collapses holes in the array
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When kvm is in hpet_legacy_mode, the hpet is providing the timer
interrupt and the pit should not be. So in legacy mode, the pit timer
is destroyed, but the *state* of the pit is maintained. So if kvm or
the guest tries to modify the state of the pit, this modification is
accepted, *except* that the timer isn't actually started. When we exit
hpet_legacy_mode, the current state of the pit (which is up to date
since we've been accepting modifications) is used to restart the pit
timer.
The saved_mode code in kvm_pit_load_count temporarily changes mode to
0xff in order to destroy the timer, but then restores the actual
value, again maintaining "current" state of the pit for possible later
reenablement.
[avi: add some reserved storage in the ioctl; make SET_PIT2 IOW]
[marcelo: fix memory corruption due to reserved storage]
Signed-off-by: Beth Kon <eak@us.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add tracepoint in msi/ioapic/pic set_irq() functions,
in IPI sending and in the point where IRQ is placed into
apic's IRR.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This changes bus accesses to use high-level kvm_io_bus_read/kvm_io_bus_write
functions. in_range now becomes unused so it is removed from device ops in
favor of read/write callbacks performing range checks internally.
This allows aliasing (mostly for in-kernel virtio), as well as better error
handling by making it possible to pass errors up to userspace.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Use slots_lock to protect device list on the bus. slots_lock is already
taken for read everywhere, so we only need to take it for write when
registering devices. This is in preparation to removing in_range and
kvm->lock around it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Change kvm_vcpu_is_bsp to use vcpu_id instead of bsp_vcpu pointer, which
is only initialized at the end of kvm_vm_ioctl_create_vcpu.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This allows use of the powerful ftrace infrastructure.
See Documentation/trace/ for usage information.
[avi, stephen: various build fixes]
[sheng: fix control register breakage]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear
in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled.
[avi: s/largepages_disabled/largepages_enabled/ to avoid negative logic]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Archs are free to use vcpu_id as they see fit. For x86 it is used as
vcpu's apic id. New ioctl is added to configure boot vcpu id that was
assumed to be 0 till now.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Protect irq injection/acking data structures with a separate irq_lock
mutex. This fixes the following deadlock:
CPU A CPU B
kvm_vm_ioctl_deassign_dev_irq()
mutex_lock(&kvm->lock); worker_thread()
-> kvm_deassign_irq() -> kvm_assigned_dev_interrupt_work_handler()
-> deassign_host_irq() mutex_lock(&kvm->lock);
-> cancel_work_sync() [blocked]
[gleb: fix ia64 path]
Reported-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduce irq_lock, and use to protect ioapic data structures.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Changing s390 code in kvm_arch_vcpu_load/put come across this header
declarations. They are complete duplicates, not even useful forward
declarations as nothing using it is in between (maybe it was that in
the past).
This patch removes the two dispensable lines.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We only trap one page for MSI-X entry now, so it's 4k/(128/8) = 256 entries at
most.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The in-kernel speaker emulation is only a dummy and also unneeded from
the performance point of view. Rather, it takes user space support to
generate sound output on the host, e.g. console beeps.
To allow this, introduce KVM_CREATE_PIT2 which controls in-kernel
speaker port emulation via a flag passed along the new IOCTL. It also
leaves room for future extensions of the PIT configuration interface.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
KVM provides a complete virtual system environment for guests, including
support for injecting interrupts modeled after the real exception/interrupt
facilities present on the native platform (such as the IDT on x86).
Virtual interrupts can come from a variety of sources (emulated devices,
pass-through devices, etc) but all must be injected to the guest via
the KVM infrastructure. This patch adds a new mechanism to inject a specific
interrupt to a guest using a decoupled eventfd mechnanism: Any legal signal
on the irqfd (using eventfd semantics from either userspace or kernel) will
translate into an injected interrupt in the guest at the next available
interrupt window.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The related MSRs are emulated. MCE capability is exported via
extension KVM_CAP_MCE and ioctl KVM_X86_GET_MCE_CAP_SUPPORTED. A new
vcpu ioctl command KVM_X86_SETUP_MCE is used to setup MCE emulation
such as the mcg_cap. MCE is injected via vcpu ioctl command
KVM_X86_SET_MCE. Extended machine-check state (MCG_EXT_P) and CMCI are
not implemented.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When new child qdiscs are attached to the mq qdisc, they are actually
attached as root qdiscs to the device queues. The lock selection for
new estimators incorrectly picks the root lock of the existing and
to be replaced qdisc, which results in a use-after-free once the old
qdisc has been destroyed.
Mark mq qdisc instances with a new flag and treat qdiscs attached to
mq as children similar to regular root qdiscs.
Additionally prevent estimators from being attached to the mq qdisc
itself since it only updates its byte and packet counters during dumps.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces three new hooks. The inode_getsecctx hook is used to get
all relevant information from an LSM about an inode. The inode_setsecctx is
used to set both the in-core and on-disk state for the inode based on a context
derived from inode_getsecctx.The final hook inode_notifysecctx will notify the
LSM of a change for the in-core state of the inode in question. These hooks are
for use in the labeled NFS code and addresses concerns of how to set security
on an inode in a multi-xattr LSM. For historical reasons Stephen Smalley's
explanation of the reason for these hooks is pasted below.
Quote Stephen Smalley
inode_setsecctx: Change the security context of an inode. Updates the
in core security context managed by the security module and invokes the
fs code as needed (via __vfs_setxattr_noperm) to update any backing
xattrs that represent the context. Example usage: NFS server invokes
this hook to change the security context in its incore inode and on the
backing file system to a value provided by the client on a SETATTR
operation.
inode_notifysecctx: Notify the security module of what the security
context of an inode should be. Initializes the incore security context
managed by the security module for this inode. Example usage: NFS
client invokes this hook to initialize the security context in its
incore inode to the value provided by the server for the file when the
server returned the file's attributes to the client.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
This factors out the part of the vfs_setxattr function that performs the
setting of the xattr and its notification. This is needed so the SELinux
implementation of inode_setsecctx can handle the setting of the xattr while
maintaining the proper separation of layers.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
The wakeup.prepared flag is used for marking devices that have the
wake-up power already enabled, so that the wake-up power is not
enabled twice in a row for the same device. This assumes, however,
that device wake-up power will only be enabled once, while the device
is being prepared for a system-wide sleep transition, and the second
attempt is made by acpi_enable_wakeup_device_prep().
With the upcoming PCI wake-up rework this assumption will not hold
any more for PCI bridges and the root bridge whose wake-up power
may be enabled as a result of wake-up enable propagation from other
devices (eg. add-on devices that are not associated with any GPEs).
Thus, there may be many attempts to enable wake-up power on a PCI
bridge or the root bridge during a system power state transition
and it's better to replace wakeup.prepared with a reference counter.
Reviewed-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Introduce a new PCI device flag, wakeup_prepared, to prevent PCI
wake-up preparation code from being executed twice in a row for the
same device and for the same purpose.
Reviewed-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
In general a BIOS may goof or we may hotplug in a hotplug controller.
In either case the kernel needs to reserve resources for plugging
in more devices in the future instead of creating a minimal resource
assignment.
We already do this for cardbus bridges I am just adding a variant
for pcie bridges.
v2: Make testing for pcie hotplug bridges based on a flag.
So far we only set the flag for pcie but a header_quirk
could easily be added for the non-standard pci hotplug
bridges.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Separate out pci_add_dynid() from store_new_id() and export it so that
in-kernel code can add PCI IDs dynamically. As the function will be
available regardless of HOTPLUG, put it and pull pci_free_dynids()
outside of CONFIG_HOTPLUG.
This will be used by pci-stub to initialize initial IDs via module
param.
While at it, remove bogus get_driver() failure check.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add support for PCI-E 5.0 GT/s in max_bus_speed and cur_bus_speed.
Reviewed-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Fix some warnings reported in linux-next + also cleanup some
comment errors noticed by Pekka Paalanen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This is the first of three patches that implement a bit field that PCI
Express device drivers can use to indicate they need a fundamental reset
during error recovery.
By default, the EEH framework on powerpc does what's known as a "hot
reset" during recovery of a PCI Express device. We've found a case
where the device needs a "fundamental reset" to recover properly. The
current PCI error recovery and EEH frameworks do not support this
distinction.
The attached patch (courtesy of Richard Lary) adds a bit field to
pci_dev that indicates whether the device requires a fundamental reset
during recovery.
These patches supersede the previously submitted patch that implemented
a fundamental reset bit field.
Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Richard Lary <rlary@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Background:
Graphic devices are accessed through ranges in I/O or memory space. While most
modern devices allow relocation of such ranges, some "Legacy" VGA devices
implemented on PCI will typically have the same "hard-decoded" addresses as
they did on ISA. For more details see "PCI Bus Binding to IEEE Std 1275-1994
Standard for Boot (Initialization Configuration) Firmware Revision 2.1"
Section 7, Legacy Devices.
The Resource Access Control (RAC) module inside the X server currently does
the task of arbitration when more than one legacy device co-exists on the same
machine. But the problem happens when these devices are trying to be accessed
by different userspace clients (e.g. two server in parallel). Their address
assignments conflict. Therefore an arbitration scheme _outside_ of the X
server is needed to control the sharing of these resources. This document
introduces the operation of the VGA arbiter implemented for Linux kernel.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Tiago Vignatti <tiago.vignatti@nokia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
IDs should generally only be added to pci_ids.h when they're shared
across several files in the tree. IDs that are just used by a single
driver should be defined in the driver instead.
Perhaps documenting this is a good idea to prevent things being moved there,
as it still seems to be happening judging from the git log.
(based on discussion w/gregkh and others).
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some devices allow an individual function to be reset without affecting
other functions in the same device: that's what pci_reset_function does.
For devices that have this support, expose reset attribite in sysfs.
This is useful e.g. for virtualization, where a qemu userspace
process wants to reset the device when the guest is reset,
to emulate machine reboot as closely as possible.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We can simplify ACPI drivers if we can tell whether a handle is an
ACPI PCI root or not.
Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This was #define'd as 0 on all platforms, so let's get rid of it.
This change makes pci_scan_slot() slightly easier to read.
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Luck <tony.luck@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Set child_runs_first default to off.
It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs
get preempted by child tasks, reducing parallelism.
Note, this patch might make existing races in user
applications more prominent than before - so breakages
might be bisected to this commit.
Child-runs-first is broken on SMP to begin with, and we
already had it off briefly in v2.6.23 so most of the
offenders ought to be fixed. Would be nice not to revert
this commit but fix those apps finally ...
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
[ made the sysctl independent of CONFIG_SCHED_DEBUG, in case
people want to work around broken apps. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add support for communicating with a Sonics Silicon Backplane through a
SDIO interface, as found in the Nintendo Wii WLAN daughter card.
The Nintendo Wii WLAN card includes a custom Broadcom 4318 chip with
a SDIO host interface.
Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fix apparent thinko related to RTM_DELADDRLABEL, introduced by commit
2a8cc6c890 ("[IPV6] ADDRCONF: Support
RFC3484 configurable address selection policy table.").
Signed-off-by: Tushar Gohad <tgohad@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Version 20090903.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
There are cases where full date information is required instead of
just the year. Add month and day parsing to dmi_get_year() and rename
it to dmi_get_date().
As the original function only required '/' followed by any number of
parseable characters at the end of the string, keep that behavior to
avoid upsetting existing users.
The new function takes dates of format [mm[/dd]]/yy[yy]. Year, month
and date are checked to be in the ranges of [1-9999], [1-12] and
[1-31] respectively and any invalid or out-of-range component is
returned as zero.
The dummy implementation is updated accordingly but the return value
is updated to indicate field not found which is consistent with how
other dummy functions behave.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The tx_list attribute of struct dma_async_tx_descriptor is common to
most, but not all dma driver implementations. None of the upper level
code (dmaengine/async_tx) uses it, so allow drivers to implement it
locally if they need it. This saves sizeof(struct list_head) bytes for
drivers that do not manage descriptors with a linked list (e.g.: ioatdma
v2,3).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jasper Forest introduces raid offload support via ioat3.2 support. When
raid offload is enabled two (out of 8 channels) will report raid5/raid6
offload capabilities. The remaining channels will only report ioat3.0
capabilities (memcpy).
Signed-off-by: Tom Picard <tom.s.picard@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Some engines have transfer size and address alignment restrictions. Add
a per-operation alignment property to struct dma_device that the async
routines and dmatest can use to check alignment capabilities.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Channel switching is problematic for some dmaengine drivers as the
architecture precludes separating the ->prep from ->submit. In these
cases the driver can select ASYNC_TX_DISABLE_CHANNEL_SWITCH to modify
the async_tx allocator to only return channels that support all of the
required asynchronous operations.
For example MD_RAID456=y selects support for asynchronous xor, xor
validate, pq, pq validate, and memcpy. When
ASYNC_TX_DISABLE_CHANNEL_SWITCH=y any channel with all these
capabilities is marked DMA_ASYNC_TX allowing async_tx_find_channel() to
quickly locate compatible channels with the guarantee that dependency
chains will remain on one channel. When
ASYNC_TX_DISABLE_CHANNEL_SWITCH=n async_tx_find_channel() may select
channels that lead to operation chains that need to cross channel
boundaries using the async_tx channel switch capability.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Some engines optimize operation by reading ahead in the descriptor chain
such that descriptor2 may start execution before descriptor1 completes.
If descriptor2 depends on the result from descriptor1 then a fence is
required (on descriptor2) to disable this optimization. The async_tx
api could implicitly identify dependencies via the 'depend_tx'
parameter, but that would constrain cases where the dependency chain
only specifies a completion order rather than a data dependency. So,
provide an ASYNC_TX_FENCE to explicitly identify data dependencies.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When mounting an "nfs" type file system, recognize "v4," "vers=4," or
"nfsvers=4" mount options, and convert the file system to "nfs4" under
the covers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[trondmy: fixed up binary mount code so it sets the 'version' field too]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Some chips with complex internal supply (particularly clocking)
arragements may have multiple options for some of the supply
connections. Since these don't affect user-visible audio routing
the expectation would be that they would be managed automatically
by one of the drivers.
Support these users by allowing routes to have a connected function
which is queried before the connectedness of the path is checked as
normal. Currently this is only done for supplies, other widgets
could be supported but are not currently since the expectation for
them is that audio routing will be under the control of userspace.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
shmfs wants purely standard POSIX ACL semantics, so we can use the new
generic VFS layer POSIX ACL checking rather than cooking our own
'permission()' function.
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is stage one in flattening out the callchains for the common
permission testing. Rather than have most filesystem implement their
own inode->i_op->permission function that just calls back down to the
VFS layers 'generic_permission()' with the per-filesystem ACL checking
function, the filesystem can just expose its 'check_acl' function
directly, and let the VFS layer do everything for it.
This is all just preparatory - no filesystem actually enables this yet.
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add appropriate const prefix to char * arguments in proc helper functions.
Also fixed the caller side to be proper const pointers.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Re-export snd_pcm_format_name() function to be used outside the PCM core.
As a first example, usbaudio is changed to use it now again.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This was a non-trivial merge with some patches sent to Linus
in drm-fixes.
Conflicts:
drivers/gpu/drm/radeon/r300.c
drivers/gpu/drm/radeon/radeon_asic.h
drivers/gpu/drm/radeon/rs600.c
drivers/gpu/drm/radeon/rs690.c
drivers/gpu/drm/radeon/rv515.c
Now that SD_WAKE_IDLE doesn't make pipe-test suck anymore,
enable it by default for MC, CPU and NUMA domains.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ADd support for the SPI part of LMS283GF05 LCD. The LCD uses SPI for
initialization and powerdown sequences. No further defails are specified
in the datasheet about the initialization/powerdown sequence, just the
magic numbers that have to be sent over SPI bus. This LCD can be found in
the Aeronix Zipit Z2 handheld.
Signed-off-by: Marek Vasut <marek.vasut@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
The struct snd_monitor_file is used locally only in sound/core/init.c,
thus it should be moved there from the public sound/core.h.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The WM831x devices feature two software controlled status LEDs with
hardware assisted blinking.
The device can also autonomously control the LEDs based on a selection
of sources. This can be configured at boot time using either platform
data or the chip OTP. A sysfs file in the style of that for triggers
allowing the control source to be configured at run time. Triggers
can't be used here since they can't depend on the implementation details
of a specific LED type.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
Fix the default security_session_to_parent() in linux/security.h to have a
body.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
This is a brute force removal of the wierd slave interface done for
DLCI -> SDLA transmit. Before it was using non-standard return values
and freeing skb in caller. This changes it to using normal return
values, and freeing in the callee. Luckly only one driver pair was
doing this. Not tested on real hardware, in fact I wonder if this
driver pair is even being used by any users.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a function that can be used to add the default mode for the output device
without EDID.
It will add the default mode that meets with the requirements of given
hdisplay/vdisplay limit.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Conflicts:
arch/Kconfig
kernel/trace/trace.h
Merge reason: resolve the conflicts, plus adopt to the new
ring-buffer APIs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make security_cred_alloc_blank() return int, not void, when CONFIG_SECURITY=n.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
This patch adds a classful dummy scheduler which can be used as root qdisc
for multiqueue devices and exposes each device queue as a child class.
This allows to address queues individually and graft them similar to regular
classes. Additionally it presents an accumulated view of the statistics of
all real root qdiscs in the dummy root.
Two new callbacks are added to the qdisc_ops and qdisc_class_ops:
- cl_ops->select_queue selects the tx queue number for new child classes.
- qdisc_ops->attach() overrides root qdisc device grafting to attach
non-shared qdiscs to the queues.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It will be used in a following patch by the multiqueue qdisc.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the multiqueue integration with the qdisc API suffers from
a few problems:
- with multiple queues, all root qdiscs use the same handle. This means
they can't be exposed to userspace in a backwards compatible fashion.
- all API operations always refer to queue number 0. Newly created
qdiscs are automatically shared between all queues, its not possible
to address individual queues or restore multiqueue behaviour once a
shared qdisc has been attached.
- Dumps only contain the root qdisc of queue 0, in case of non-shared
qdiscs this means the statistics are incomplete.
This patch reintroduces dev->qdisc, which points to the (single) root qdisc
from userspace's point of view. Currently it either points to the first
(non-shared) default qdisc, or a qdisc shared between all queues. The
following patches will introduce a classful dummy qdisc, which will be used
as root qdisc and contain the per-queue qdiscs as children.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
dm snapshot: fix on disk chunk size validation
dm exception store: split set_chunk_size
dm snapshot: fix header corruption race on invalidation
dm snapshot: refactor zero_disk_area to use chunk_io
dm log: userspace add luid to distinguish between concurrent log instances
dm raid1: do not allow log_failure variable to unset after being set
dm log: remove incorrect field from userspace table output
dm log: fix userspace status output
dm stripe: expose correct io hints
dm table: add more context to terse warning messages
dm table: fix queue_limit checking device iterator
dm snapshot: implement iterate devices
dm multipath: fix oops when request based io fails when no paths
Tom Horsley reports that his debugger hangs when it tries to read
/proc/pid_of_tracee/maps, this happens since
"mm_for_maps: take ->cred_guard_mutex to fix the race with exec"
04b836cbf19e885f8366bccb2e4b0474346c02d
commit in 2.6.31.
But the root of the problem lies in the fact that do_execve() path calls
tracehook_report_exec() which can stop if the tracer sets PT_TRACE_EXEC.
The tracee must not sleep in TASK_TRACED holding this mutex. Even if we
remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(),
another task doing PTRACE_ATTACH should not hang until it is killed or the
tracee resumes.
With this patch do_execve() does not use ->cred_guard_mutex directly and
we do not hold it throughout, instead:
- introduce prepare_bprm_creds() helper, it locks the mutex
and calls prepare_exec_creds() to initialize bprm->cred.
- install_exec_creds() drops the mutex after commit_creds(),
and thus before tracehook_report_exec()->ptrace_stop().
or, if exec fails,
free_bprm() drops this mutex when bprm->cred != NULL which
indicates install_exec_creds() was not called.
Reported-by: Tom Horsley <tom.horsley@att.net>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cancel_delayed_work() has to use del_timer_sync() to guarantee the timer
function is not running after return. But most users doesn't actually
need this, and del_timer_sync() has problems: it is not useable from
interrupt, and it depends on every lock which could be taken from irq.
Introduce __cancel_delayed_work() which calls del_timer() instead.
The immediate reason for this patch is
http://bugzilla.kernel.org/show_bug.cgi?id=13757
but hopefully this helper makes sense anyway.
As for 13757 bug, actually we need requeue_delayed_work(), but its
semantics are not yet clear.
Merge this patch early to resolves cross-tree interdependencies between
input and infiniband.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
More and more devices feature PLLs and FLLs with the ability to select
between multiple input clocks. In order to better support these devices
a new argument, source, has been added to the set_pll() configuration
API. Using set_clkdiv() is often difficult due to the need to stop the
PLL/FLL before any reconfiguration can be done.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Unlike on some other architectures ino_t is an unsigned int on s390.
So add an explicit cast to avoid lots of compile warnings:
In file included from include/trace/ftrace.h:285,
from include/trace/define_trace.h:61,
from include/trace/events/ext4.h:711,
from fs/ext4/super.c:50:
include/trace/events/ext4.h: In function 'ftrace_raw_output_ext4_free_inode':
include/trace/events/ext4.h:12: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
1. Updates fcoe_rcv() to queue incoming frames to the fcoe per
cpu thread on which this frame's exch was originated and simply
use current cpu for request exch not originated by initiator.
It is redundant to add this code under CONFIG_SMP, so removes
CONFIG_SMP uses around this code.
2. Updates fc_exch_em_alloc, fc_exch_delete, fc_exch_find to use
per cpu exch pools, here fc_exch_delete is rename of older
fc_exch_mgr_delete_ep since ep/exch are now deleted in pools
of EM and so brief new name is sufficient and better name.
Updates these functions to map exch id to their index into exch
pool using fc_cpu_mask, fc_cpu_order and EM min_xid.
This mapping is as per detailed explanation about this in
last patch and basically this is just as lower fc_cpu_mask
bits of exch id as cpu number and upper bit sum of EM min_xid
and exch index in pool.
Uses pool next_index to keep track of exch allocation from
pool along with pool_max_index as upper bound of exches array
in pool.
3. Adds exch pool ptr to fc_exch to free exch to its pool in
fc_exch_delete.
4. Updates fc_exch_mgr_reset to reset all exch pools of an EM,
this required adding fc_exch_pool_reset func to reset exches
in pool and then have fc_exch_mgr_reset call fc_exch_pool_reset
for each pool within each EM for a lport.
5. Removes no longer needed exches array, em_lock, next_xid, and
total_exches from struct fc_exch_mgr, these are not needed after
use of per cpu exch pool, also removes not used max_read,
last_read from struct fc_exch_mgr.
6. Updates locking notes for exch pool lock with fc_exch lock and
uses pool lock in exch allocation, lookup and reset.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Adds per cpu exch pool for these reasons:-
1. Currently an EM instance is shared across all cpus to manage
all exches for all cpus. This required em_lock across all
cpus for an exch alloc, free, lookup and reset each frame
and that made em_lock expensive, so instead having per cpu
exch pool with their own per cpu pool lock will likely reduce
locking contention in fast path for an exch alloc, free and
lookup.
2. Per cpu exch pool will likely improve cache hit ratio since
all frames of an exch will be processed on the same cpu on
which exch originated.
This patch is only prep work to help in keeping complexity of next
patch low, so this patch only sets up per cpu exch pool and related
helper funcs to be used by next patch. The next patch fully makes
use of per cpu exch pool in all code paths ie. tx, rx and reset.
Divides per EM exch id range equally across all cpus to setup per
cpu exch pool. This division is such that lower bits of exch id
carries cpu number info on which exch originated, later a simple
bitwise AND operation on exch id of incoming frame with fc_cpu_mask
retrieves cpu number info to direct all frames to same cpu on which
exch originated. This required a global fc_cpu_mask and fc_cpu_order
initialized to max possible cpus number nr_cpu_ids rounded up to 2's
power, this will be used in mapping exch id and exch ptr array
index in pool during exch allocation, find or reset code paths.
Adds a check in fc_exch_mgr_alloc() to ensure specified min_xid
lower bits are zero since these bits are used to carry cpu info.
Adds and initializes struct fc_exch_pool with all required fields
to manage exches in pool.
Allocates per cpu struct fc_exch_pool with memory for exches array
for range of exches per pool. The exches array memory is followed
by struct fc_exch_pool.
Adds fc_exch_ptr_get/set() helper functions to get/set exch ptr in
pool exches array at specified array index.
Increases default FCOE_MAX_XID to 0x0FFF from 0x07EF, so that more
exches are available per cpu after above described exch id range
division across all cpus to each pool.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
If a target closed the connection, we will detect it in the
state_changed or data_ready callout. This adds a new conn
error value to use for this problem, so it is not confused
with when the initiator throws a conn error and drops
the connection.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Since the ability to swap the cpu buffers adds a small overhead to
the recording of a trace, we only want to add it when needed.
Only the irqsoff and preemptoff tracers use this feature, and both are
not recommended for production kernels. This patch disables its use
when neither irqsoff nor preemptoff is configured.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The latency tracers (irqsoff and wakeup) can swap trace buffers
on the fly. If an event is happening and has reserved data on one of
the buffers, and the latency tracer swaps the global buffer with the
max buffer, the result is that the event may commit the data to the
wrong buffer.
This patch changes the API to the trace recording to be recieve the
buffer that was used to reserve a commit. Then this buffer can be passed
in to the commit.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This shrinks the size of struct sctp_association a little.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
This patch introduces a new sysctl option to make IPv4 Address Scoping
configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.
In networking environments where DNAT rules in iptables prerouting
chains convert destination IP's to link-local/private IP addresses,
SCTP connections fail to establish as the INIT chunk is dropped by the
kernel due to address scope match failure.
For example to support overlapping IP addresses (same IP address with
different vlan id) a Layer-5 application listens on link local IP's,
and there is a DNAT rule that maps the destination IP to a link local
IP. Such applications never get the SCTP INIT if the address-scoping
draft is strictly followed.
This sysctl configuration allows SCTP to function in such
unconventional networking environments.
Sysctl options:
0 - Disable IPv4 address scoping draft altogether
1 - Enable IPv4 address scoping (default, current behavior)
2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
3 - Enable address scoping but allow IPv4 link local address in init/init-ack
Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
We had a bug that we never stored the user-defined value for
MAXSEG when setting the value on an association. Thus future
PMTU events ended up re-writing the frag point and increasing
it past user limit. Additionally, when setting the option on
the socket/endpoint, we effect all current associations, which
is against spec.
Now, we store the user 'maxseg' value along with the computed
'frag_point'. We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when its
set.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
SCTP will delay the last part of a large write due to NAGLE, if that
part is smaller then MTU. Since we are doing large writes, we might
as well send the last portion now instead of waiting untill the next
large write happens. The small portion will be sent as is regardless,
so it's better to not delay it.
This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com>
and Doug Graham <dgraham@nortel.com>. Many thanks go out to them.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
SCTP has a problem that when small chunks are used, it is possible
to exhaust the receiver buffer without fully closing receive window.
This happens due to all overhead that we have account for with small
messages. To fix this, when receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion. When application starts
reading data and freeing up recevie buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
mtu at a time. This worked well in testing and under stress produced
rather even recovery.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Currenlty, sctp breaks up user messages into fragments and
sends each fragment to the lower layer by itself. This means
that for each fragment we go all the way down the stack
and back up. This also discourages bundling of multiple
fragments when they can fit into a sigle packet (ex: due
to user setting a low fragmentation threashold).
We introduce a new command SCTP_CMD_SND_MSG and hand the
whole message down state machine. The state machine and
the side-effect parser will cork the queue, add all chunks
from the message to the queue, and then un-cork the queue
thus causing the chunks to get transmitted.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
If a socket has a lot of association that are in the process of
of being closed/aborted, it is possible for a remote to establish
new associations during the time period that the old ones are shutting
down. If this was a result of a close() call, there will be no socket
and will cause a memory leak. We'll prevent this by setting the
socket state to CLOSING and disallow new associations when in this state.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
This patch removes an unused union definition (sctp_cmsg_data_t)
from include/net/sctp/user.h.
Signed-off-by: Rami Rosen <rosenrami@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Device-mapper userspace logs (like the clustered log) are
identified by a universally unique identifier (UUID). This
identifier is used to associate requests from the kernel to
a specific log in userspace. The UUID must be unique everywhere,
since multiple machines may use this identifier when communicating
about a particular log, as is the case for cluster logs.
Sometimes, device-mapper/LVM may re-use a UUID. This is the
case during pvmoves, when moving from one segment of an LV
to another, or when resizing a mirror, etc. In these cases,
a new log is created with the same UUID and loaded in the
"inactive" slot. When a device-mapper "resume" is issued,
the "live" table is deactivated and the new "inactive" table
becomes "live". (The "inactive" table can also be removed
via a device-mapper 'clear' command.)
The above two issues were colliding. More than one log was being
created with the same UUID, and there was no way to distinguish
between them. So, sometimes the wrong log would be swapped
out during the exchange.
The solution is to create a locally unique identifier,
'luid', to go along with the UUID. This new identifier is used
to determine exactly which log is being referenced by the kernel
when the log exchange is made. The identifier is not
universally safe, but it does not need to be, since
create/destroy/suspend/resume operations are bound to a specific
machine; and these are the operations that make up the exchange.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Set sensible I/O hints for striped DM devices in the topology
infrastructure added for 2.6.31 for userspace tools to
obtain via sysfs.
Add .io_hints to 'struct target_type' to allow the I/O hints portion
(io_min and io_opt) of the 'struct queue_limits' to be set by each
target and implement this for dm-stripe.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The WM831x PMICs provide power path management from three sources:
a wall supply, USB and a battery with integrated charger. They also
provide an additional backup supply with integrated for maintaining
always on functionality such as the RTC and monitoring of power
switches.
After some initial configuration at startup the device operates
autonomously, the driver simply provides reporting of the current
state.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
This patch converts the wm97xx-battery driver to use platform_data
supplied by ac97 bus.
Signed-off-by: Marek Vasut <marek.vasut@gmail.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
The function ring_buffer_event_discard can be used on any item in the
ring buffer, even after the item was committed. This function provides
no safety nets and is very race prone.
An item may be safely removed from the ring buffer before it is committed
with the ring_buffer_discard_commit.
Since there are currently no users of this function, and because this
function is racey and error prone, this patch removes it altogether.
Note, removing this function also allows the counters to ignore
all discarded events (patches will follow).
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Ingo Molnar reported the following kmemcheck warning when running both
kmemleak and kmemcheck enabled:
PM: Adding info for No Bus:vcsa7
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory
(f6f6e1a4)
d873f9f600000000c42ae4c1005c87f70000000070665f666978656400000000
i i i i u u u u i i i i i i i i i i i i i i i i i i i i i u u u
^
Pid: 3091, comm: kmemleak Not tainted (2.6.31-rc7-tip #1303) P4DC6
EIP: 0060:[<c110301f>] EFLAGS: 00010006 CPU: 0
EIP is at scan_block+0x3f/0xe0
EAX: f40bd700 EBX: f40bd780 ECX: f16b46c0 EDX: 00000001
ESI: f6f6e1a4 EDI: 00000000 EBP: f10f3f4c ESP: c2605fcc
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: e89a4844 CR3: 30ff1000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff4ff0 DR7: 00000400
[<c110313c>] scan_object+0x7c/0xf0
[<c1103389>] kmemleak_scan+0x1d9/0x400
[<c1103a3c>] kmemleak_scan_thread+0x4c/0xb0
[<c10819d4>] kthread+0x74/0x80
[<c10257db>] kernel_thread_helper+0x7/0x3c
[<ffffffff>] 0xffffffff
kmemleak: 515 new suspected memory leaks (see
/sys/kernel/debug/kmemleak)
kmemleak: 42 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
The problem here is that kmemleak will scan partially initialized
objects that makes kmemcheck complain. Fix that up by skipping
uninitialized memory regions when kmemcheck is enabled.
Reported-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Re-organize the flag settings so that it's visible at a glance
which sched-domains flags are set and which not.
With the new balancer code we'll need to re-tune these details
anyway, so make it cleaner to make fewer mistakes down the
road ;-)
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the function can_free_echo_skb to the CAN
device interface to allow upcoming drivers to release echo
skb's in case of error.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define ECC status for 4-bit ECC status
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Its a source of fail, also, now that cpu_power is dynamical,
its a waste of time.
before:
<idle>-0 [000] 132.877936: find_busiest_group: avg_load: 0 group_load: 8241 power: 1
after:
bash-1689 [001] 137.862151: find_busiest_group: avg_load: 10636288 group_load: 10387 power: 1
[ v2: build fix from From: Andreas Herrmann ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <20090901083826.425896304@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Keep an average on the amount of time spend on RT tasks and use
that fraction to scale down the cpu_power for regular tasks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <20090901083826.287778431@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The idea is that multi-threading a core yields more work
capacity than a single thread, provide a way to express a
static gain for threads.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <20090901083826.073345955@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Do the placement thing using SD flags.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <20090901083825.897028974@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pack aligned things together into a special section to minimize
padding holes.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <4AA035C0.9070202@goop.org>
[ queued up in tip:x86/asm because it depends on this commit:
x86/i386: Make sure stack-protector segment base is cache aligned ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This fixed a lockdep warning which appeared when doing stress
memory tests over NFS:
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
mount_root => nfs_root_data => tcp_close => lock sk_lock =>
tcp_send_fin => alloc_skb_fclone => page reclaim
David raised a concern that if the allocation fails in tcp_send_fin(), and it's
GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
for the allocation to succeed.
But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could
loop endlessly under memory pressure.
CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: David S. Miller <davem@davemloft.net>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to flash a firmware image to a device using ethtool.
The driver gets the filename of the firmware image and flashes the image
using the request firmware path.
The region "on the chip" to be flashed can be specified by an option.
It is upto the device driver to enumerate the region number passed by ethtool,
to the region to be flashed.
The default behavior is to flash all the regions on the chip.
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vlan devices are currently not multi-queue capable.
We can do that with a new rtnl_link_ops method,
get_tx_queues(), called from rtnl_create_link()
This new method gets num_tx_queues/real_num_tx_queues
from real device.
register_vlan_device() is also handled.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a keyctl to install a process's session keyring onto its parent. This
replaces the parent's session keyring. Because the COW credential code does
not permit one process to change another process's credentials directly, the
change is deferred until userspace next starts executing again. Normally this
will be after a wait*() syscall.
To support this, three new security hooks have been provided:
cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in
the blank security creds and key_session_to_parent() - which asks the LSM if
the process may replace its parent's session keyring.
The replacement may only happen if the process has the same ownership details
as its parent, and the process has LINK permission on the session keyring, and
the session keyring is owned by the process, and the LSM permits it.
Note that this requires alteration to each architecture's notify_resume path.
This has been done for all arches barring blackfin, m68k* and xtensa, all of
which need assembly alteration to support TIF_NOTIFY_RESUME. This allows the
replacement to be performed at the point the parent process resumes userspace
execution.
This allows the userspace AFS pioctl emulation to fully emulate newpag() and
the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to
alter the parent process's PAG membership. However, since kAFS doesn't use
PAGs per se, but rather dumps the keys into the session keyring, the session
keyring of the parent must be replaced if, for example, VIOCSETTOK is passed
the newpag flag.
This can be tested with the following program:
#include <stdio.h>
#include <stdlib.h>
#include <keyutils.h>
#define KEYCTL_SESSION_TO_PARENT 18
#define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0)
int main(int argc, char **argv)
{
key_serial_t keyring, key;
long ret;
keyring = keyctl_join_session_keyring(argv[1]);
OSERROR(keyring, "keyctl_join_session_keyring");
key = add_key("user", "a", "b", 1, keyring);
OSERROR(key, "add_key");
ret = keyctl(KEYCTL_SESSION_TO_PARENT);
OSERROR(ret, "KEYCTL_SESSION_TO_PARENT");
return 0;
}
Compiled and linked with -lkeyutils, you should see something like:
[dhowells@andromeda ~]$ keyctl show
Session Keyring
-3 --alswrv 4043 4043 keyring: _ses
355907932 --alswrv 4043 -1 \_ keyring: _uid.4043
[dhowells@andromeda ~]$ /tmp/newpag
[dhowells@andromeda ~]$ keyctl show
Session Keyring
-3 --alswrv 4043 4043 keyring: _ses
1055658746 --alswrv 4043 4043 \_ user: a
[dhowells@andromeda ~]$ /tmp/newpag hello
[dhowells@andromeda ~]$ keyctl show
Session Keyring
-3 --alswrv 4043 4043 keyring: hello
340417692 --alswrv 4043 4043 \_ user: a
Where the test program creates a new session keyring, sticks a user key named
'a' into it and then installs it on its parent.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Add garbage collection for dead, revoked and expired keys. This involved
erasing all links to such keys from keyrings that point to them. At that
point, the key will be deleted in the normal manner.
Keyrings from which garbage collection occurs are shrunk and their quota
consumption reduced as appropriate.
Dead keys (for which the key type has been removed) will be garbage collected
immediately.
Revoked and expired keys will hang around for a number of seconds, as set in
/proc/sys/kernel/keys/gc_delay before being automatically removed. The default
is 5 minutes.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Add a config option (CONFIG_DEBUG_CREDENTIALS) to turn on some debug checking
for credential management. The additional code keeps track of the number of
pointers from task_structs to any given cred struct, and checks to see that
this number never exceeds the usage count of the cred struct (which includes
all references, not just those from task_structs).
Furthermore, if SELinux is enabled, the code also checks that the security
pointer in the cred struct is never seen to be invalid.
This attempts to catch the bug whereby inode_has_perm() faults in an nfsd
kernel thread on seeing cred->security be a NULL pointer (it appears that the
credential struct has been previously released):
http://www.kerneloops.org/oops.php?number=252883
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
This patch adds VMAC (a fast MAC) support into crypto framework.
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The function block inet_connect_sock_af_ops contains no data
make it constant.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add 3 schedstat tracepoints to help account for wait-time,
sleep-time and iowait-time.
They can also be used as a perf-counter source to profile tasks
on these clocks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <new-submission>
[ build fix for the !CONFIG_SCHEDSTATS case ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For counting how long an application has been waiting for
(disk) IO, there currently is only the HZ sample driven
information available, while for all other counters in this
class, a high resolution version is available via
CONFIG_SCHEDSTATS.
In order to make an improved bootchart tool possible, we also
need a higher resolution version of the iowait time.
This patch below adds this scheduler statistic to the kernel.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4A64B813.1080506@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A misconfiguration by the firmware of the U4 PCIe bridge on PowerMac G5
with the U4 bridge (latest generations, may also affect the iMac G5
"iSight") is causing us to re-assign the PCI BARs of the video card,
which can get it out of sync with the firmware, thus breaking offb.
This works around it by fixing up the bridge configuration properly
at boot time. It also fixes a bug where the firmware provides us with
an incorrect set of accessible regions in the device-tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge reason: bump from rc5 to rc8, but also pick up TP_perf_assign()
API, a patch will be queued that depends on it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/kernel/reboot.c
security/Kconfig
Merge reason: resolve the conflicts, bump up from rc3 to rc8.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use NFSD_SLOT_CACHE_SIZE size buffers for sessions DRC instead of holding nfsd
pages in cache.
Connectathon testing has shown that 1024 bytes for encoded compound operation
responses past the sequence operation is sufficient, 512 bytes is a little too
small. Set NFSD_SLOT_CACHE_SIZE to 1024.
Allocate memory for the session DRC in the CREATE_SESSION operation
to guarantee that the memory resource is available for caching responses.
Allocate each slot individually in preparation for slot table size negotiation.
Remove struct nfsd4_cache_entry and helper functions for the old page-based
DRC.
The iov_len calculation in nfs4svc_encode_compoundres is now always
correct. Replay is now done in nfsd4_sequence under the state lock, so
the session ref count is only bumped on non-replay. Clean up the
nfs4svc_encode_compoundres session logic.
The nfsd4_compound_state statp pointer is also not used.
Remove nfsd4_set_statp().
Move useful nfsd4_cache_entry fields into nfsd4_slot.
Signed-off-by: Andy Adamson <andros@netapp.com
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
By using the requested ca_maxresponsesize_cached * ca_maxresponses to bound
a forechannel drc request size, clients can tailor a session to usage.
For example, an I/O session (READ/WRITE only) can have a much smaller
ca_maxresponsesize_cached (for only WRITE compound responses) and a lot larger
ca_maxresponses to service a large in-flight data window.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move tboot.h from asm to linux to fix the build errors of intel_txt
patch on non-X86 platforms. Remove the tboot code from generic code
init/main.c and kernel/cpu.c.
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
These are full of unresolved problems, mainly that conversions don't
work 1-1 from hrtimers to tasklet_hrtimers because unlike hrtimers
tasklets can't be killed from softirq context.
And when a qdisc gets reset, that's exactly what we need to do here.
We'll work this out in the net-next-2.6 tree and if warranted we'll
backport that work to -stable.
This reverts the following 3 changesets:
a2cb6a4dd4
("pkt_sched: Fix bogon in tasklet_hrtimer changes.")
38acce2d79
("pkt_sched: Convert CBQ to tasklet_hrtimer.")
ee5f9757ea
("pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer")
Signed-off-by: David S. Miller <davem@davemloft.net>
->read_proc, ->write_proc are going away, ->proc_fops should be used instead.
The only tricky place is IDENTIFY handling: if for some reason
taskfile_lib_get_identify() fails, buffer _is_ changed and at least
first byte is overwritten. Emulate old behaviour with returning
that first byte to userspace and reporting length=1 despite overall -E.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These tables are never modified at runtime. Move to read-only
section.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch affects the retransmits_timed_out() function.
Changes:
1) Variables have more meaningful names
2) retransmits_timed_out() has an introductionary comment.
3) Small coding style changes.
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>