If the accessed bit is not set, the guest has never accessed this page
(at least through this spte), so there's no need to mark the page
accessed. This provides more accurate data for the eviction algortithm.
Noted by Andrea Arcangeli.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Allow the Linux memory manager to reclaim memory in the kvm shadow cache.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Unify slots_lock acquision around vcpu_run(). This is simpler and less
error-prone.
Also fix some callsites that were not grabbing the lock properly.
[avi: drop slots_lock while in guest mode to avoid holding the lock
for indefinite periods]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
MSR Bitmap controls whether the accessing of an MSR causes VM Exit.
Eliminating exits on automatically saved and restored MSRs yields a
small performance gain.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds functionality to detect if the kernel runs under the KVM
hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This
allows drivers to skip device detection if the systems runs non-virtualized.
We also define a preferred console to avoid having the ttyS0, which is a line
mode only console.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds the virtualization submenu and the kvm option to the kernel
config. It also defines HAVE_KVM for 64bit kernels.
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch introduces interpretation of some diagnose instruction intercepts.
Diagnose is our classic architected way of doing a hypercall. This patch
features the following diagnose codes:
- vm storage size, that tells the guest about its memory layout
- time slice end, which is used by the guest to indicate that it waits
for a lock and thus cannot use up its time slice in a useful way
- ipl functions, which a guest can use to reset and reboot itself
In order to implement ipl functions, we also introduce an exit reason that
causes userspace to perform various resets on the virtual machine. All resets
are described in the principles of operation book, except KVM_S390_RESET_IPL
which causes a reboot of the machine.
Acked-by: Martin Schwidefsky <martin.schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch introduces in-kernel handling of _some_ sigp interprocessor
signals (similar to ipi).
kvm_s390_handle_sigp() decodes the sigp instruction and calls individual
handlers depending on the operation requested:
- sigp sense tries to retrieve information such as existence or running state
of the remote cpu
- sigp emergency sends an external interrupt to the remove cpu
- sigp stop stops a remove cpu
- sigp stop store status stops a remote cpu, and stores its entire internal
state to the cpus lowcore
- sigp set arch sets the architecture mode of the remote cpu. setting to
ESAME (s390x 64bit) is accepted, setting to ESA/S390 (s390, 31 or 24 bit) is
denied, all others are passed to userland
- sigp set prefix sets the prefix register of a remote cpu
For implementation of this, the stop intercept indication starts to get reused
on purpose: a set of action bits defines what to do once a cpu gets stopped:
ACTION_STOP_ON_STOP really stops the cpu when a stop intercept is recognized
ACTION_STORE_ON_STOP stores the cpu status to lowcore when a stop intercept is
recognized
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch introduces in-kernel handling of some intercepts for privileged
instructions:
handle_set_prefix() sets the prefix register of the local cpu
handle_store_prefix() stores the content of the prefix register to memory
handle_store_cpu_address() stores the cpu number of the current cpu to memory
handle_skey() just decrements the instruction address and retries
handle_stsch() delivers condition code 3 "operation not supported"
handle_chsc() same here
handle_stfl() stores the facility list which contains the
capabilities of the cpu
handle_stidp() stores cpu type/model/revision and such
handle_stsi() stores information about the system topology
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch contains the s390 interrupt subsystem (similar to in kernel apic)
including timer interrupts (similar to in-kernel-pit) and enabled wait
(similar to in kernel hlt).
In order to achieve that, this patch also introduces intercept handling
for instruction intercepts, and it implements load control instructions.
This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both
the vm file descriptors and the vcpu file descriptors. In case this ioctl is
issued against a vm file descriptor, the interrupt is considered floating.
Floating interrupts may be delivered to any virtual cpu in the configuration.
The following interrupts are supported:
SIGP STOP - interprocessor signal that stops a remote cpu
SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
(stopped) remote cpu
INT EMERGENCY - interprocessor interrupt, usually used to signal need_reshed
and for smp_call_function() in the guest.
PROGRAM INT - exception during program execution such as page fault, illegal
instruction and friends
RESTART - interprocessor signal that starts a stopped cpu
INT VIRTIO - floating interrupt for virtio signalisation
INT SERVICE - floating interrupt for signalisations from the system
service processor
struct kvm_s390_interrupt, which is submitted as ioctl parameter when injecting
an interrupt, also carrys parameter data for interrupts along with the interrupt
type. Interrupts on s390 usually have a state that represents the current
operation, or identifies which device has caused the interruption on s390.
kvm_s390_handle_wait() does handle waitpsw in two flavors: in case of a
disabled wait (that is, disabled for interrupts), we exit to userspace. In case
of an enabled wait we set up a timer that equals the cpu clock comparator value
and sleep on a wait queue.
[christian: change virtio interrupt to 0x2603]
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This path introduces handling of sie intercepts in three flavors: Intercepts
are either handled completely in-kernel by kvm_handle_sie_intercept(),
or passed to userspace with corresponding data in struct kvm_run in case
kvm_handle_sie_intercept() returns -ENOTSUPP.
In case of partial execution in kernel with the need of userspace support,
kvm_handle_sie_intercept() may choose to set up struct kvm_run and return
-EREMOTE.
The trivial intercept reasons are handled in this patch:
handle_noop() just does nothing for intercepts that don't require our support
at all
handle_stop() is called when a cpu enters stopped state, and it drops out to
userland after updating our vcpu state
handle_validity() faults in the cpu lowcore if needed, or passes the request
to userland
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch contains the port of Qumranet's kvm kernel module to IBM zSeries
(aka s390x, mainframe) architecture. It uses the mainframe's virtualization
instruction SIE to run virtual machines with up to 64 virtual CPUs each.
This port is only usable on 64bit host kernels, and can only run 64bit guest
kernels. However, running 31bit applications in guest userspace is possible.
The following source files are introduced by this patch
arch/s390/kvm/kvm-s390.c similar to arch/x86/kvm/x86.c, this implements all
arch callbacks for kvm. __vcpu_run calls back into
sie64a to enter the guest machine context
arch/s390/kvm/sie64a.S assembler function sie64a, which enters guest
context via SIE, and switches world before and after that
include/asm-s390/kvm_host.h contains all vital data structures needed to run
virtual machines on the mainframe
include/asm-s390/kvm.h defines kvm_regs and friends for user access to
guest register content
arch/s390/kvm/gaccess.h functions similar to uaccess to access guest memory
arch/s390/kvm/kvm-s390.h header file for kvm-s390 internals, extended by
later patches
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The SIE instruction on s390 uses the 2nd half of the page table page to
virtualize the storage keys of a guest. This patch offers the s390_enable_sie
function, which reorganizes the page tables of a single-threaded process to
reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses
dup_mm to create a new mm with reorganized page tables. The old mm is freed
and the process has now a page status extended field after every page table.
Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.
This patch has a small common code hit, namely making dup_mm non-static.
Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
review feedback. Now we do have the prototype for dup_mm in
include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now
call task_lock() to prevent race against ptrace modification of mm_users.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This emulates the x86 hardware task switch mechanism in software, as it is
unsupported by either vmx or svm. It allows operating systems which use it,
like freedos, to run as kvm guests.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
When mmu_set_spte() checks if a page related to spte should be release as
dirty or clean, it check if the shadow pte was writeble, but in case
rmap_write_protect() is called called it is possible for shadow ptes that were
writeble to become readonly and therefor mmu_set_spte will release the pages
as clean.
This patch fix this issue by marking the page as dirty inside
rmap_write_protect().
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
If we populate a shadow pte due to a fault (and not speculatively due to a
pte write) then we can set the accessed bit on it, as we know it will be
set immediately on the next guest instruction. This saves a read-modify-write
operation.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch writes 0 (actually, what really matters is that the
LSB is cleared) to the system time msr before shutting down
the machine for kexec.
Without it, we can have a random memory location being written
when the guest comes back
It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c)
and crash_shutdown, used in the path of crash_kexec() (kexec.c)
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
it will allow external users to call it. It is mainly
useful for routines that will override its machine_ops
field for its own special purposes, but want to call the
normal shutdown routine after they're done
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch a llows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.
Don't report the feature if two dimensional paging is enabled.
[avi:
- guest/host split
- fix 32-bit truncation issues
- adjust to mmu_op
- adjust to ->release_*() renamed
- add ->release_pud()]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.
Don't report the feature if two dimensional paging is enabled.
[avi:
- one mmu_op hypercall instead of one per op
- allow 64-bit gpa on hypercall
- don't pass host errors (-ENOMEM) to guest]
[akpm: warning fix on i386]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The patch moves the PIT model from userspace to kernel, and increases
the timer accuracy greatly.
[marcelo: make last_injected_time per-guest]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Most Intel hosts have a stable tsc, and playing with the offset only
reduces accuracy. By limiting tsc offset adjustment only to forward updates,
we effectively disable tsc offset adjustment on these hosts.
Signed-off-by: Avi Kivity <avi@qumranet.com>
In the current inject_page_fault path KVM only checks if there is another PF
pending and injects a DF then. But it has to check for a pending DF too to
detect a shutdown condition in the VCPU. If this is not detected the VCPU goes
to a PF -> DF -> PF loop when it should triple fault. This patch detects this
condition and handles it with an KVM_SHUTDOWN exit to userspace.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Create large pages mappings if the guest PTE's are marked as such and
the underlying memory is hugetlbfs backed. If the largepage contains
write-protected pages, a large pte is not used.
Gives a consistent 2% improvement for data copies on ram mounted
filesystem, without NPT/EPT.
Anthony measures a 4% improvement on 4-way kernbench, with NPT.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Mark zapped root pagetables as invalid and ignore such pages during lookup.
This is a problem with the cr3-target feature, where a zapped root table fools
the faulting code into creating a read-only mapping. The result is a lockup
if the instruction can't be emulated.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
In two case statements, use the ever popular 'i' instead of index:
arch/x86/kvm/x86.c:1063:7: warning: symbol 'index' shadows an earlier one
arch/x86/kvm/x86.c:1000:9: originally declared here
arch/x86/kvm/x86.c:1079:7: warning: symbol 'index' shadows an earlier one
arch/x86/kvm/x86.c:1000:9: originally declared here
Make it static.
arch/x86/kvm/x86.c:1945:24: warning: symbol 'emulate_ops' was not declared. Should it be static?
Drop the return statements.
arch/x86/kvm/x86.c:2878:2: warning: returning void-valued expression
arch/x86/kvm/x86.c:2944:2: warning: returning void-valued expression
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Fixes sparse warning as well.
arch/x86/kvm/svm.c:69:15: warning: symbol 'iopm_base' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Nesting __emulate_2op_nobyte inside__emulate_2op produces many shadowed
variable warnings on the internal variable _tmp used by both macros.
Change the outer macro to use __tmp.
Avoids a sparse warning like the following at every call site of __emulate_2op
arch/x86/kvm/x86_emulate.c:1091:3: warning: symbol '_tmp' shadows an earlier one
arch/x86/kvm/x86_emulate.c:1091:3: originally declared here
[18 more warnings suppressed]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Replaces open-coded mask calculation in macros.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.
We do use it, however, for fine-grained adjustment.
Other than that, time comes from the host.
[randy dunlap: add missing include]
[randy dunlap: disallow on Voyager or Visual WS]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writing to it periodically.
The area is binary compatible with xen, as we use the same shadow_info
structure.
[marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME]
[avi: save full value of the msr, even if enable bit is clear]
[avi: clear previous value of time_page]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch implements the Last Branch Record Virtualization (LBRV) feature of
the AMD Barcelona and Phenom processors into the kvm-amd module. It will only
be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So
there is no increased world switch overhead when the guest doesn't use these
MSRs.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch changes the kvm-amd module to allocate the SVM MSR permission map
per VCPU instead of a global map for all VCPUs. With this we have more
flexibility allowing specific guests to access virtualized MSRs. This is
required for LBR virtualization.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Change the parameter of the init_vmcb() function in the kvm-amd module from
struct vmcb to struct vcpu_svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Looking at Intel Volume 3b, page 148, table 20-11 and noticed
that the field name is 'Deliver' not 'Deliever'. Attached patch changes
the define name and its user in vmx.c
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch contains the SVM architecture dependent changes for KVM to enable
support for the Nested Paging feature of AMD Barcelona and Phenom processors.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch contains the changes to the KVM MMU necessary for support of the
Nested Paging feature in AMD Barcelona and Phenom Processors.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The load_pdptrs() function is required in the SVM module for NPT support.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The mapping function for the nonpaging case in the softmmu does basically the
same as required for Nested Paging. Make this function generic so it can be
used for both.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The generic x86 code has to know if the specific implementation uses Nested
Paging. In the generic code Nested Paging is called Two Dimensional Paging
(TDP) to avoid confusion with (future) TDP implementations of other vendors.
This patch exports the availability of TDP to the generic x86 code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
To disable the use of the Nested Paging feature even if it is available in
hardware this patch adds a module parameter. Nested Paging can be disabled by
passing npt=0 to the kvm_amd module.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Let SVM detect if the Nested Paging feature is available on the hardware.
Disable it to keep this patch series bisectable.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
By moving the SVM feature detection from the each_cpu code to the hardware
setup code it runs only once. As an additional advance the feature check is now
available earlier in the module setup process.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch makes the EFER register accessible on a 32bit KVM host. This is
necessary to boot 32 bit PAE guests under SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
To allow access to the EFER register in 32bit KVM the EFER specific code has to
be exported to the x86 generic code. This patch does this in a backwards
compatible manner.
[avi: add check for EFER-less hosts]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch aligns the bits the guest can set in the EFER register with the
features in the host processor. Currently it lets EFER.NX disabled if the
processor does not support it and enables EFER.LME and EFER.LMA only for KVM on
64 bit hosts.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch give the SVM and VMX implementations the ability to add some bits
the guest can set in its EFER register.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
To allow TLB entries to be retained across VM entry and VM exit, the VMM
can now identify distinct address spaces through a new virtual-processor ID
(VPID) field of the VMCS.
[avi: drop vpid_sync_all()]
[avi: add "cc" to asm constraints]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Currently an mmio guest pte is encoded in the shadow pagetable as a
not-present trapping pte, with the SHADOW_IO_MARK bit set. However
nothing is ever done with this information, so maintaining it is a
useless complication.
This patch moves the check for mmio to before shadow ptes are instantiated,
so the shadow code is never invoked for ptes that reference mmio. The code
is simpler, and with future work, can be made to handle mmio concurrently.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Certain x86 instructions use bits 3:5 of the byte following the opcode as an
opcode extension, with the decode sometimes depending on bits 6:7 as well.
Add support for this in the main decoding table rather than an ad-hock
adaptation per opcode.
Signed-off-by: Avi Kivity <avi@qumranet.com>
A guest partial guest pte write will leave shadow_trap_nonpresent_pte
in spte, which generates a vmexit at the next guest access through that pte.
This patch improves this by reading the full guest pte in advance and thus
being able to update the spte and eliminate the vmexit.
This helps pae guests which use two 32-bit writes to set a single 64-bit pte.
[truncation fix by Eric]
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
I must have disabled this due to other bugs which were fixed over
time. And this is needed in order for child devices of "pmu"
to get proper resource values.
Signed-off-by: David S. Miller <davem@davemloft.net>
It's completely superfluous, CONFIG_COMPAT is sufficient.
What this used to be is an umbrella for enabling code shared
by all 32-bit compat binary support types. But with the
removal of SunOS and Solaris support, the only one left is
Linux 32-bit ELF.
Update defconfig.
Signed-off-by: David S. Miller <davem@davemloft.net>
Refer to chip as "SPARC" throughout.
Say 32-bit SPARC and 64-bit SPARC rather than mentioning specific
chips such like UltraSPARC, as appropriate.
Remove non-sense help text referring to things that will never appear
on a SPARC system, such as EISA busses etc.
Use "help" instead of "--help--"
Signed-off-by: David S. Miller <davem@davemloft.net>
Kernel bugzilla 10273
As reported by Jos van der Ende, ever since commit
5a606b72a4 ("[SPARC64]: Do not ACK an
INO if it is disabled or inprogress.") sun4u interrupts
can get stuck.
What this changset did was add the following conditional to
the various IRQ chip ->enable() handlers on sparc64:
if (unlikely(desc->status & (IRQ_DISABLED|IRQ_INPROGRESS)))
return;
which is correct, however it means that special care is needed
in the ->enable() method.
Specifically we must put the interrupt into IDLE state during
an enable, or else it might never be sent out again.
Setting the INO interrupt state to IDLE resets the state machine,
the interrupt input to the INO is retested by the hardware, and
if an interrupt is being signalled by the device, the INO
moves back into TRANSMIT state, and an interrupt vector is sent
to the cpu.
The two sun4v IRQ chip handlers were already doing this properly,
only sun4u got it wrong.
Signed-off-by: David S. Miller <davem@davemloft.net>
OK, so 25-mm1 gave a lockdep error which made me look into this.
The first thing that I noticed was the horrible mess; the second thing I
saw was hacks like: 71e93d1561
The problem is that arch idle routines are somewhat inconsitent with
their IRQ state handling and instead of fixing _that_, we go paper over
the problem.
So the thing I've tried to do is set a standard for idle routines and
fix them all up to adhere to that. So the rules are:
idle routines are entered with IRQs disabled
idle routines will exit with IRQs enabled
Nearly all already did this in one form or another.
Merge the 32 and 64 bit bits so they no longer have different bugs.
As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
irq-enable.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
so will disable that feature by default, and only enable that via
pci=check_enable_amd_mmconf or for system match with dmi table.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
normally BIOSes assign io/mmio range to different HT links without
overlapping, even same node same link should get non overlapping
entries.
but Rafael L. Wysocki's buggy BIOS creates a link with overlapping
entries for mmio and io:
node 0 link 0: io port [1000, ffffff]
node 0 link 0: mmio [e0000000, efffffff]
node 0 link 0: mmio [a0000, bffff]
node 0 link 0: mmio [80000000, ffffffff]
try to merge them and we will get:
bus: [00, ff] on node 0 link 0
bus: 00 index 0 io port: [0, ffff]
bus: 00 index 1 mmio: [80000000, fcffffffff]
bus: 00 index 2 mmio: [a0000, bffff]
so later we will reduce the chance to assign used resource to
unassigned device.
Reported-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mingo@elte.hu: split from "x86_64: get boot_cpu_id as early for k8_scan_nodes]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
if there's only one root bus there's no need to split resources.
This patch fixes the issue described at:
http://lkml.org/lkml/2008/4/10/304
Reported-and-bisected-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
some bioses give same range to mmconf for fam10h msr, and mmio for node/link.
fam10h msr will overide mmio for node/link.
so we can not assign range to devices under node/link for unassigned resources.
this patch will take range out from the mmio for node/link
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
scan AMD opteron io/mmio routing to make sure every pci root bus get correct
resource range. Thus later pci scan could assign correct resource to device
with unassigned resource.
this can fix a system without _CRS for multi pci root bus.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
... so we use the same code with Quad core cpu as old opteron.
This patch is useful when acpi=off or _PXM is not there in DSDT.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently, on an amd k8 system with multi ht chains, the numa_node of
pci devices under /sys/devices/pci0000:80/* is always 0, even if that
chain is on node 1 or 2 or 3.
Workaround: pcibus_to_node(bus) is used when we want to get the node that
pci_device is on.
In struct device, we already have numa_node member, and we could use
dev_to_node()/set_dev_node() to get and set numa_node in the device.
set_dev_node is called in pci_device_add() with pcibus_to_node(bus),
and pcibus_to_node uses bus->sysdata for nodeid.
The problem is when pci_add_device is called, bus->sysdata is not assigned
correct nodeid yet. The result is that numa_node will always be 0.
pcibios_scan_root and pci_scan_root could take sysdata. So we need to get
mp_bus_to_node mapping before these two are called, and thus
get_mp_bus_to_node could get correct node for sysdata in root bus.
In scanning of the root bus, all child busses will take parent bus sysdata.
So all pci_device->dev.numa_node will be assigned correctly and automatically.
Later we could use dev_to_node(&pci_dev->dev) to get numa_node, and we
could also could make other bus specific device get the correct numa_node
too.
This is an updated version of pci_sysdata and Jeff's pci_domain patch.
[ mingo@elte.hu: build fix ]
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
doesn't need to check if it is type1 or type2, we can use raw_pci_ops
directly.
also make pci_direct_conf1 static again.
anyway is there system with type 2 and mmconf support?
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
mmconfig is only used to access extended configuration space.
so don't need to reject MFG that only have one entry and only handle bus0.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
some BIOS only let AMD fam 10h handle bus0, and nvidia mcp55/ck804
to handle other buses. at that case MCFG will cover all over them.
but with acpi=off, we can not use MCFG. this patch will double check
the busnbits, and if it is less handling 256 bues, and acpi=off
will forcely reset the mmconf in msr, so we still use mmconf in above case.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
so even booting kernel with acpi=off or even MCFG is not there, we still can
use MMCONFIG.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Greg KH <greg@kroah.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
So we can use MMCONF when MMCONF is not set by BIOS
using TOP_MEM2 msr to get memory top, and try to scan fam10h mmio routing to
make sure the range is not conflicted with some prefetch MMIO that is above 4G.
(current only LinuxBIOS assign 64 bit mmio above 4G for some co-processor)
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
reuse pci_cfg_space_size but skip check pci express and pci-x CAP ID.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Patch
"x86: validate against ACPI motherboard resources"
changed the mmconf init sequence, and init MMCONF late in acpi_init.
here change it back to old sequence:
1. check hostbridge in early
2. check MCFG with e820 in early
3. if all fail, will check MCFg with acpi _CRS in acpi_init
So we can make MCONF working again when acpi=off is set if hostbridge
support that.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For x86_64, need to free pci_mmcfg_virt, and iounmap some pointers
when MMCONF is not reserved in E820 or acpi _CRS and get rejected.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This path adds validation of the MMCONFIG table against the ACPI reserved
motherboard resources. If the MMCONFIG table is found to be reserved in
ACPI, we don't bother checking the E820 table. The PCI Express firmware
spec apparently tells BIOS developers that reservation in ACPI is required
and E820 reservation is optional, so checking against ACPI first makes
sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though
it is perfectly functional, the existing check needlessly disables MMCONFIG
in these cases.
In order to do this, MMCONFIG setup has been split into two phases. If PCI
configuration type 1 is not available then MMCONFIG is enabled early as
before. Otherwise, it is enabled later after the ACPI interpreter is
enabled, since we need to be able to execute control methods in order to
check the ACPI reserved resources. Presently this is just triggered off
the end of ACPI interpreter initialization.
There are a few other behavioral changes here:
- Validate all MMCONFIG configurations provided, not just the first one.
- Validate the entire required length of each configuration according to
the provided ending bus number is reserved, not just the minimum required
allocation.
- Validate that the area is reserved even if we read it from the chipset
directly and not from the MCFG table. This catches the case where the
BIOS didn't set the location properly in the chipset and has mapped it
over other things it shouldn't have.
This also cleans up the MMCONFIG initialization functions so that they
simply do nothing if MMCONFIG is not compiled in.
Based on an original patch by Rajesh Shah from Intel.
[akpm@linux-foundation.org: many fixes and cleanups]
Signed-off-by: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andi Kleen <ak@suse.de>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3:
x86_64/mm: check and print vmemmap allocation continuous
x86_64: fix setup_node_bootmem to support big mem excluding with memmap
x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
mm: allow reserve_bootmem() cross nodes
mm: offset align in alloc_bootmem()
mm: fix alloc_bootmem_core to use fast searching for all nodes
mm: make mem_map allocation continuous
On big systems with lots of memory, don't print out too much during
bootup, and make it easy to find if it is continuous.
on 256G 8 sockets system will get
[ffffe20000000000-ffffe20002bfffff] PMD -> [ffff810001400000-ffff810003ffffff] on node 0
[ffffe2001c700000-ffffe2001c7fffff] potential offnode page_structs
[ffffe20002c00000-ffffe2001c7fffff] PMD -> [ffff81000c000000-ffff8100255fffff] on node 0
[ffffe20038700000-ffffe200387fffff] potential offnode page_structs
[ffffe2001c800000-ffffe200387fffff] PMD -> [ffff810820200000-ffff81083c1fffff] on node 1
[ffffe20040000000-ffffe2007fffffff] PUD ->ffff811027a00000 on node 2
[ffffe20038800000-ffffe2003fffffff] PMD -> [ffff811020200000-ffff8110279fffff] on node 2
[ffffe20054700000-ffffe200547fffff] potential offnode page_structs
[ffffe20040000000-ffffe200547fffff] PMD -> [ffff811027c00000-ffff81103c3fffff] on node 2
[ffffe20070700000-ffffe200707fffff] potential offnode page_structs
[ffffe20054800000-ffffe200707fffff] PMD -> [ffff811820200000-ffff81183c1fffff] on node 3
[ffffe20080000000-ffffe200bfffffff] PUD ->ffff81202fa00000 on node 4
[ffffe20070800000-ffffe2007fffffff] PMD -> [ffff812020200000-ffff81202f9fffff] on node 4
[ffffe2008c700000-ffffe2008c7fffff] potential offnode page_structs
[ffffe20080000000-ffffe2008c7fffff] PMD -> [ffff81202fc00000-ffff81203c3fffff] on node 4
[ffffe200a8700000-ffffe200a87fffff] potential offnode page_structs
[ffffe2008c800000-ffffe200a87fffff] PMD -> [ffff812820200000-ffff81283c1fffff] on node 5
[ffffe200c0000000-ffffe200ffffffff] PUD ->ffff813037a00000 on node 6
[ffffe200a8800000-ffffe200bfffffff] PMD -> [ffff813020200000-ffff8130379fffff] on node 6
[ffffe200c4700000-ffffe200c47fffff] potential offnode page_structs
[ffffe200c0000000-ffffe200c47fffff] PMD -> [ffff813037c00000-ffff81303c3fffff] on node 6
[ffffe200c4800000-ffffe200e07fffff] PMD -> [ffff813820200000-ffff81383c1fffff] on node 7
instead of a very long print out...
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
typical case: four sockets system, every node has 4g ram, and we are using:
memmap=10g$4g
to mask out memory on node1 and node2
when numa is enabled, early_node_mem is used to get node_data and node_bootmap.
if it can not get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get buff from previous nodes.
so check it and print out some info about it.
need to move early_res_to_bootmem into every setup_node_bootmem.
and it takes range that node has. otherwise alloc_bootmem could return addr
that reserved early.
depends on "mm: make reserve_bootmem can crossed the nodes".
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
"mm: make reserve_bootmem can crossed the nodes" provides new
reserve_bootmem(), let reserve_bootmem_generic() use that.
reserve_bootmem_generic() is used to reserve initramdisk, so this way
we can make sure even when bootloader or kexec load ranges cross the
node memory boundaries, reserve_bootmem still works.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3:
x86, bitops: select the generic bitmap search functions
x86: include/asm-x86/pgalloc.h/bitops.h: checkpatch cleanups - formatting only
x86: finalize bitops unification
x86, UML: remove x86-specific implementations of find_first_bit
x86: optimize find_first_bit for small bitmaps
x86: switch 64-bit to generic find_first_bit
x86: generic versions of find_first_(zero_)bit, convert i386
bitops: use __fls for fls64 on 64-bit archs
generic: implement __fls on all 64-bit archs
generic: introduce a generic __fls implementation
x86: merge the simple bitops and move them to bitops.h
x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
x86, uml: fix uml with generic find_next_bit for x86
x86: change x86 to use generic find_next_bit
uml: Kconfig cleanup
uml: fix build error
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootparam:
x86, boot: Document for linked list of struct setup_data
x86, boot: export linked list of struct setup_data via debugfs
x86, boot: add linked list of struct setup_data
x86, boot: add free_early to early reservation machanism
Export linked list of struct setup_data via debugfs.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds a field of 64-bit physical pointer to NULL terminated
single linked list of struct setup_data to real-mode kernel
header. This is used as a more extensible boot parameters passing
mechanism.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add free_early to early reservation mechanism - this way early bootup
failure paths can stop wasting memory.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
disable /dev/mem mmap of RAM with PAT. It makes things safer and
eliminates aliasing. A future improvement would be to avoid the
range_is_allowed duplication.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce GENERIC_FIND_FIRST_BIT and GENERIC_FIND_NEXT_BIT in
lib/Kconfig, defaulting to off. An arch that wants to use the
generic implementation now only has to use a select statement
to include them.
I added an always-y option (X86_CPU) to arch/x86/Kconfig.cpu
and used that to select the generic search functions. This
way ARCH=um SUBARCH=i386 automatically picks up the change
too, and arch/um/Kconfig.i386 can therefore be simplified a
bit. ARCH=um SUBARCH=x86_64 does things differently, but
still compiles fine. It seems that a "def_bool y" always
wins over a "def_bool n"?
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86 has been switched to the generic versions of find_first_bit
and find_first_zero_bit, but the original versions were retained.
This patch just removes the now unused x86-specific versions.
also update UML.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Switch x86_64 to generic find_first_bit. The x86_64-specific
implementation is not removed.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Generic versions of __find_first_bit and __find_first_zero_bit
are introduced as simplified versions of __find_next_bit and
__find_next_zero_bit. Their compilation and use are guarded by
a new config variable GENERIC_FIND_FIRST_BIT.
The generic versions of find_first_bit and find_first_zero_bit
are implemented in terms of the newly introduced __find_first_bit
and __find_first_zero_bit.
This patch does not remove the i386-specific implementation,
but it does switch i386 to use the generic functions by setting
GENERIC_FIND_FIRST_BIT=y for X86_32.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some of those can be written in such a way that the same
inline assembly can be used to generate both 32 bit and
64 bit code.
For ffs and fls, x86_64 unconditionally used the cmov
instruction and i386 unconditionally used a conditional
branch over a mov instruction. In the current patch I
chose to select the version based on the availability
of the cmov instruction instead. A small detail here is
that x86_64 did not previously set CONFIG_X86_CMOV=y.
Improved comments for ffs, ffz, fls and variations.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The versions with inline assembly are in fact slower on the machines I
tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
crashing the benchmark.
Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
1...512, for each possible bitmap with one bit set, for each possible
offset: find the position of the first bit starting at offset. If you
follow ;). Times include setup of the bitmap and checking of the
results.
Athlon Xeon Opteron 32/64bit
x86-specific: 0m3.692s 0m2.820s 0m3.196s / 0m2.480s
generic: 0m2.622s 0m1.662s 0m2.100s / 0m1.572s
If the bitmap size is not a multiple of BITS_PER_LONG, and no set
(cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
value outside of the range [0, size]. The generic version always returns
exactly size. The generic version also uses unsigned long everywhere,
while the x86 versions use a mishmash of int, unsigned (int), long and
unsigned long.
Using the generic version does give a slightly bigger kernel, though.
defconfig: text data bss dec hex filename
x86-specific: 4738555 481232 626688 5846475 5935cb vmlinux (32 bit)
generic: 4738621 481232 626688 5846541 59360d vmlinux (32 bit)
x86-specific: 5392395 846568 724424 6963387 6a40bb vmlinux (64 bit)
generic: 5392458 846568 724424 6963450 6a40fa vmlinux (64 bit)
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pointed out by Linus: arch/um/Kconfig.x86_64 should
include arch/x86/Kconfig.cpu instead of defining those
symbols itself.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/um/os-Linux/helper.c: In function 'run_helper':
arch/um/os-Linux/helper.c:73: error: 'PATH_MAX' undeclared (first use in this function)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
x86 PAT: decouple from nonpromisc devmem
x86 PAT: tone down debugging messages
Linus pointed it out that PAT should not depend on NONPROMISC_DEVMEM.
Also make PAT non-default.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Rothwell reported that linux-next did not build on powerpc64.
make optimized inlining dependent on architecture opt-in.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
add CONFIG_OPTIMIZE_INLINING=y.
allow gcc to optimize the kernel image's size by uninlining
functions that have been marked 'inline'. Previously gcc was
forced by Linux to always-inline these functions via a gcc
attribute:
#define inline inline __attribute__((always_inline))
Especially when the user has already selected
CONFIG_OPTIMIZE_FOR_SIZE=y this can make a huge difference in
kernel image size (using a standard Fedora .config):
text data bss dec hex filename
5613924 562708 3854336 10030968 990f78 vmlinux.before
5486689 562708 3854336 9903733 971e75 vmlinux.after
that's a 2.3% text size reduction (!).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes section mismatch warnings in unlock_ExtINT_logic().
WARNING: arch/x86/kernel/built-in.o(.text+0x14a92): Section mismatch in reference from the function unlock_ExtINT_logic()
to the function .init.text:find_isa_irq_pin()
The function unlock_ExtINT_logic() references
the function __init find_isa_irq_pin().
This is often because unlock_ExtINT_logic lacks a __init
annotation or the annotation of find_isa_irq_pin is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix following warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x12cc9): Section mismatch in reference from the function unlock_ExtINT_logic()
unlock_ExtINT_logic() is only used by __init check_timer(). Annotate unlock_ExtINT_logic() witch __init.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix folowing warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x10799): Section mismatch in reference from the function uniq_ioapic_id()
uniq_ioapic_id() is only used by __init mp_register_ioapic(). Annotate uniq_ioapic_id() with __init.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes section mismatch warnings of __cpuinit
setup_trampoline() on 32-bit host.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
get_bios_ebda() exists in asm/rio.h and asm/bios_ebda.h.
This patch removes the one in asm/rio.h.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the magic number in the third argment of div_sc().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the magic number in the second argument of clocksource_hz2mult()
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
memset and NULL check after alloc_bootmem() are unnecessary.
Because it returns zeroed memory and it never return NULL.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use bitmap library for pin_programmed rather than reinvent
bitmaps.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove duplicate code by using MP_intsrc_info() in mpparse.c
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use BUILD_BUG_ON() instead of compile-time error technique with
extern non-exsistent function.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that there are no more special cases in sys32_ptrace, we
can convert to using the generic compat_sys_ptrace entry point.
The sys32_ptrace function gets simpler and becomes compat_arch_ptrace.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This removes the special-case handling for PTRACE_GETSIGINFO
and PTRACE_SETSIGINFO from x86_64's sys32_ptrace. The generic
compat_ptrace_request code handles these.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This lifts the set_fs(USER_DS) call for signal handler setup out of the
three places copying the same code into the one place that calls them
all. There is no change in what it does.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This lifts the code diddling the TF and DF bits for signal handler setup
out of the several places copying the same code into the one place that
calls them all. There is no change in what it does.
I also separated the recently-added DF bit clearing from the TF diddling.
The compiler turns them back into one instruction anyway. The tossing
in of DF to the same line of code with no new comments was a bit more
arcane than seems wise.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is claimed that NexGen CPUs were never shipped:
http://lkml.org/lkml/2008/4/20/179
Also, the kernel support for these chips has been broken for
a long time, the code intended to support NexGen thereby being
essentially dead.
As an outcome of the discussion that can be found using the URL
above, this patch removes the NexGen support altogether.
The changes in this patch survived a defconfig build for i386, a
couple of successful randconfig builds, as well as a runtime test,
which consisted in booting a 32-bit x86 box up to the shell prompt.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In arch/x86/kernel/setup_64.c, the standard_io_resources array
is needlessly defined as global. This patch makes this variable
static.
This patch was successfully build-tested using the defconfig
for x86_64. Runtime test was performed by booting a 64-bit x86
box up to the shell prompt.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are no users for the function amd_init_cpu() defined in
arch/x86/kernel/cpu/amd.c. This patch removes this routine.
This patch was build-tested using defconfigs for i386 and x86_64,
and a few randconfig instances. Runtime tests were performed by
booting 32- and 64-bit x86 boxen up to the shell prompt.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
At least on my Barcelona, I see MCE log entries after cold boot caused
by BIOS not properly clearing the respective registers. Therefore, this
patch extends the workaround to families 0x10 and 0x11 (the latter just
for completeness, I have nothing to verify this against).
At the same time, provide a way to make these entries visible via the
'mce=bootlog' command line option even on these machines.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
.. since it uses ILL_BADSTK (which is meaningless in the context of
SIGSEGV).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There apparently was an unnoticed conflict between an earlier patch to
this file and mine (d1e084746b), which
I noticed only now. I suppose a change like the one below (untested) is
needed; I didn't get any response on a confirmation request for this from
the submitter of the first patch.
The issue is the writing of the 'checkbit' member at the end of
setup_intel_arch_watchdog(), which my patch made go to intel_arch_wd_ops
rather than wd_ops.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Two prior changes resulted in the "ecx" clobber being lost.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus reported these excessive debug printouts:
> Overlap at 0xe0300000-0xe0400000
> Overlap at 0xe0300000-0xe0380000
> Overlap at 0xe0300000-0xe0400000
> Overlap at 0xe0300000-0xe0400000
> Overlap at 0xe0300000-0xe0400000
> Overlap at 0xe0300000-0xe0400000
> Overlap at 0xe0300000-0xe0400000
turn that into a pr_debug().
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (49 commits)
[POWERPC] Add zImage.iseries to arch/powerpc/boot/.gitignore
[POWERPC] bootwrapper: fix build error on virtex405-head.S
[POWERPC] 4xx: Fix 460GT support to not enable FPU
[POWERPC] 4xx: Add NOR FLASH entries to Canyonlands and Glacier dts
[POWERPC] Xilinx: of_serial support for Xilinx uart 16550.
[POWERPC] Xilinx: boot support for Xilinx uart 16550.
[POWERPC] celleb: Add support for PCI Express
[POWERPC] celleb: Move miscellaneous files for Beat
[POWERPC] celleb: Move a file for SPU on Beat
[POWERPC] celleb: Move files for Beat mmu and iommu
[POWERPC] celleb: Move files for Beat hvcall interfaces
[POWERPC] celleb: Move the SCC related code for celleb
[POWERPC] celleb: Move the files for celleb base support
[POWERPC] celleb: Consolidate io-workarounds code
[POWERPC] cell: Generalize io-workarounds code
[POWERPC] Add CONFIG_PPC_PSERIES_DEBUG to enable debugging for platforms/pseries
[POWERPC] Convert from DBG() to pr_debug() in platforms/pseries/
[POWERPC] Register udbg console early on pseries LPAR
[POWERPC] Mark udbg console as CON_ANYTIME, ie. callable early in boot
[POWERPC] Set udbg_console index to 0
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat:
generic: add ioremap_wc() interface wrapper
/dev/mem: make promisc the default
pat: cleanups
x86: PAT use reserve free memtype in mmap of /dev/mem
x86: PAT phys_mem_access_prot_allowed for dev/mem mmap
x86: PAT avoid aliasing in /dev/mem read/write
devmem: add range_is_allowed() check to mmap of /dev/mem
x86: introduce /dev/mem restrictions with a config option
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-xen-next: (52 commits)
xen: add balloon driver
xen: allow compilation with non-flat memory
xen: fold xen_sysexit into xen_iret
xen: allow set_pte_at on init_mm to be lockless
xen: disable preemption during tlb flush
xen pvfb: Para-virtual framebuffer, keyboard and pointer driver
xen: Add compatibility aliases for frontend drivers
xen: Module autoprobing support for frontend drivers
xen blkfront: Delay wait for block devices until after the disk is added
xen/blkfront: use bdget_disk
xen: Make xen-blkfront write its protocol ABI to xenstore
xen: import arch generic part of xencomm
xen: make grant table arch portable
xen: replace callers of alloc_vm_area()/free_vm_area() with xen_ prefixed one
xen: make include/xen/page.h portable moving those definitions under asm dir
xen: add resend_irq_on_evtchn() definition into events.c
Xen: make events.c portable for ia64/xen support
xen: move events.c to drivers/xen for IA64/Xen support
xen: move features.c from arch/x86/xen/features.c to drivers/xen
xen: add missing definitions in include/xen/interface/vcpu.h which ia64/xen needs
...
Add some autogenerated files to various .gitignore files
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up the codepath, remove alignment restrictions and do sanity
checking of the end result, to make sure we patched the right site.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kernel_text_address returns true even for modules which is not wanted
in text_poke. Use core_kernel_text instead.
This is a regression introduced in e587cadd8f
which caused occasionaly crashes after suspend/resume.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: pageexec@freemail.hu
CC: H. Peter Anvin <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The actual warning is
arch/arm/mach-ns9xxx/irq.c:65:6: warning: symbol 'handle_prio_irq' was not declared. Should it be static?
Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
When I copy-adapted handle_level_irq I skipped note_interrupt because
I considered it unimportant. If I had understand its importance I would
have saved myself some ours of debugging.
Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
When an irq is reported all lower prio irqs are masked until the current
irq is acked. So never leave handle_prio_irq without acking.
desc->status & IRQ_INPROGRESS should never become true because the current
irq is masked until it is acked, too.
Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Otherwise all sorts of bad things can happen, including
spurious softlockup reports.
Other platforms have this same bug, in one form or
another, just don't see the issue because they
don't sleep as long as sparc64 can in NOHZ.
Thanks to some brilliant debugging by Peter Zijlstra.
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no real reason we can't support sparsemem/discontigmem, so do so.
This is mostly useful to support hotplug memory.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
xen_sysexit and xen_iret were doing essentially the same thing. Rather
than having a separate implementation for xen_sysexit, we can just strip
the stack back to an iret frame and jump into xen_iret. This removes
a lot of code and complexity - specifically, another critical region.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The usual pagetable locking protocol doesn't seem to apply to updates
to init_mm, so don't rely on preemption being disabled in xen_set_pte_at
on init_mm.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Various places in the kernel flush the tlb even though preemption doens't
guarantee the tlb flush is happening on any particular CPU. In many cases
this doesn't seem to matter, so don't make a fuss about it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
split out x86 specific part from grant-table.c and
allow ia64/xen specific initialization.
ia64/xen grant table is based on pseudo physical address
(guest physical address) unlike x86/xen. On ia64 init_mm
doesn't map identity straight mapped area.
ia64/xen specific grant table initialization is necessary.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
move arch/x86/xen/events.c undedr drivers/xen to share codes
with x86 and ia64. And minor adjustment to compile.
ia64/xen also uses events.c
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ia64/xen also uses it too. Move it into common place so that
ia64/xen can share the code.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use jmp rather than call for the iret fixup, so its consistent with
the sysexit fixup, and it simplifies the stack (which is already
complex).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mask MCE/MCA out of cpu caps. Its harmless to leave them there, but
it does prevent the kernel from starting an unnecessary thread.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If an event comes in while events are currently being processed, then
just increment the counter and have the outer event loop reprocess the
pending events. This prevents unbounded recursion on heavy event
loads (of course massive event storms will cause infinite loops).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
retrigger_dynirq() was incomplete, and didn't properly set the event
to be pending again. It doesn't seem to actually get used.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Xen supports the notion of a debug interrupt which can be triggered
from the console. For now this is implemented to show pending events,
masks and each CPU's pending event set.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
64-bit Xen supports sysenter for 32-bit guests, so support its
use. (sysenter is faster than int $0x80 in 32-on-64.)
sysexit is still not supported, so we fake it up using iret.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
All pagetables need fundamentally the same setup and destruction, so
just use the same code for everything.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make KERNEL_PGD_PTRS common, as previously it was only being defined
for 32-bit.
There are a couple of follow-on changes from this:
- KERNEL_PGD_PTRS was being defined in terms of USER_PGD_PTRS. The
definition of USER_PGD_PTRS doesn't really make much sense on x86-64,
since it can have two different user address-space configurations.
I renamed USER_PGD_PTRS to KERNEL_PGD_BOUNDARY, which is meaningful
for all of 32/32, 32/64 and 64/64 process configurations.
- USER_PTRS_PER_PGD was also defined and was being used for similar
purposes. Converting its users to KERNEL_PGD_BOUNDARY left it
completely unused, and so I removed it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We can fold the essentially common pte functions together now.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
pte_t always contains a "pte" field for the whole pte value, so make
use of it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name
of the appropriate pagetable level structure.
[ x86.git merge work by Mark McLoughlin <markmc@redhat.com> ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Common definitions for 3-level pagetable functions.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Common definitions for 2-level pagetable functions.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add a common arch/x86/mm/pgtable.c file for common pagetable functions.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Convert asm-x86/pgalloc_64.h from macros into functions (#include hell
prevents __*_free_tlb from being inline, but they're probably a bit
big to inline anyway).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use reserve_memtype and free_memtype wrappers for /dev/mem mmaps. The memtype
is slightly complicated here, given that we have to support existing X mappings.
We fallback on UC_MINUS for that.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce phys_mem_access_prot_allowed(), which checks whether the mapping
is possible, without any conflicts and returns success or failure based on that.
phys_mem_access_prot() by itself does not allow failure case. This ability
to return error is needed for PAT where we may have aliasing conflicts.
x86 setup __HAVE_PHYS_MEM_ACCESS_PROT and move x86 specific code out of
/dev/mem into arch specific area.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add xlate and unxlate around /dev/mem read/write. This sets up the mapping
that can be used for /dev/mem read and write without aliasing worries.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch introduces a restriction on /dev/mem: Only non-memory can be
read or written unless the newly introduced config option is set.
The X server needs access to /dev/mem for the PCI space, but it doesn't need
access to memory; both the file permissions and SELinux permissions of /dev/mem
just make X effectively super-super powerful. With the exception of the
BIOS area, there's just no valid app that uses /dev/mem on actual memory.
Other popular users of /dev/mem are rootkits and the like.
(note: mmap access of memory via /dev/mem was already not allowed since
a really long time)
People who want to use /dev/mem for kernel debugging can enable the config
option.
The restrictions of this patch have been in the Fedora and RHEL kernels for
at least 4 years without any problems.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
virtex405-head.S is an assembler file, not a C file; therefore BOOTAFLAGS
is the correct place to set the needed -mcpu=405 flag.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The AMCC 460GT doesn't have an FPU so let's not enable support for it.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This patch adds default NOR entries to the AMCC Canyonlands (460EX)
and Glacier (460GT) dts files.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The Xilinx 16550 uart core is not a standard 16550 because it uses
word-based addressing rather than byte-based adressing. With
additional properties it is compatible with the open firmware
'ns16550' compatible binding.
This code updates the ns16550 driver to use the reg-offset property
so that the Xilinx UART 16550 can be used with it. The reg-shift
was already being handled.
Signed-off-by: John Linn <john.linn@xilinx.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (85 commits)
Blackfin char driver for Blackfin on-chip OTP memory (v3)
Blackfin Serial Driver: fix bug - use mod_timer to replace only add_timer.
Blackfin Serial Driver: the uart break anomaly has been given its own number, so switch to it
Blackfin Serial Driver: use BFIN_UART_NR_PORTS to help SIR driver in uart port.
Blackfin Serial Driver: Fix bug - kernel hangs when accessing uart 0 on bf537 when booting u-boot and linux on uart 1
Blackfin Serial Driver: punt unused lsr variable
Blackfin Serial Driver: Enable IR function when user application (irattach /dev/ttyBFx -s) call TIOCSETD ioctl with line discipline N_IRDA
[Blackfin] arch: add include/boot .gitignore files
[Blackfin] arch: Functional power management support: Add support for cpu frequency scaling
[Blackfin] arch: Functional power management support: Remove broken cpu frequency scaling drivers
[Blackfin] arch: Equalize include files: Add PLL_DIV Masks
[Blackfin] arch: Add a warning about the value of CLKIN.
[Blackfin] arch: take DDR DEVWD into consideration as well for BF548
[Blackfin] arch: Remove the circular buffering mechanism for exceptions
[Blackfin] arch: lose unnecessary dependency on CONFIG_BFIN_ICACHE for MPU
[Blackfin] arch: fix bug - before assign new channel to the map register, need clear the bits first.
[Blackfin] arch: add Blackfin on-chip SIR IrDA driver support
[Blackfin] arch: BF54x memsizes are in mbits, not mbytes
[Blackfin] arch: try to remove condition that causes double fault, by checking current before it gets dereferenced
[Blackfin] arch: Update anomaly list.
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: (23 commits)
sparc: sunzilog uart order
[SPARC64]: Detect trap frames in stack backtraces.
[SPARC64]: %l6 trap return handling no longer necessary.
[SPARC64]: Use trap type stored in pt_regs to handle syscall restart.
[SPARC64]: Store magic cookie and trap type in pt_regs.
[SPARC64]: PROM debug console can be CON_ANYTIME.
sparc64: cleanup after SunOS/Solaris binary emulation removal
sparc: cleanup after SunOS binary emulation removal
[SPARC64]: Add NUMA support.
[SPARC64]: Allocate TSB node-local.
[SPARC64]: NUMA device infrastructure.
[SPARC64]: Kill pci_iommu_table_init() declaration.
[SPARC64]: Once we have the boot cmdline, call parse_early_param()
[SPARC64]: Remove unused asm-sparc64/numnodes.h
[SPARC64]: Decrease SECTION_SIZE_BITS to 30.
[SPARC64]: Initialize MDESC earlier and use lmb_alloc()
[SPARC64]: Use lmb_alloc() for PROM device tree.
[SPARC64]: Call real_setup_per_cpu_areas() earlier and use lmb_alloc().
[SPARC64]: Fully use LMB information in bootmem_init().
[SPARC64]: Start using LMB information in bootmem_init().
...
OSF/1 brk(2) was broken by following one-liner in sys_brk()
(commit 4cc6028d40):
- if (brk < mm->end_code)
+ if (brk < mm->start_brk)
goto out;
The problem is that osf_set_program_attributes()
does update mm->end_code, but not mm->start_brk,
which still contains inappropriate value left from
binary loader, so brk() always fails.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Legacy IDE resources were never properly allocated on most
alpha platforms, so IDE expectedly stopped working after
commit 10f000a2fd (generic
pci_enable_resources).
Always allocate "fixed" PCI resources before doing anything else;
remove Cypress IDE quirk, as it's a generic problem which is
handled in common PCI probe code.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The recent irq cleanups for arch/arm/mach-integrator/time.c and
drivers/char/mwave/tp3780i.c changed the request_irq() dev_id
parameter, but neglected to change the matching free_irq() parameter,
thus creating a bug upon irq de-registration.
Given that the impetus for the changes is not yet accepted upstream,
it is best to revert the irq cleanups.
Mostly. A comment is added to time.c to reduce future confusion,
of type that led to my time.c cleanup in the first place.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This adds support for PCI Express port on Celleb. I/O space of this
PCI Express port is not mapped in memory space. So we use the
io-workaround mechanism to make accesses indirect.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves miscellaneous files for Beat into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves SPU support code on Beat into platforms/cell/.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves files for mmu and iommu on Beat into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves files for Beat hvcall interfaces into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the SCC (Super Companion Chip) related code for celleb
into platforms/cell/.
All files in this patch are used by celleb-beat and celleb-native
commonly.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the base code for celleb support into platforms/cell/.
All files in this patch are used by celleb-beat and celleb-native
commonly.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now, we can use generic io-workarounds mechanism and the workaround
code for spider-pci. This changes Celleb PCI code to use spider-pci
code.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This splits cell io-workaround code into spider-pci dependent code and
a generic part, and also moves io-workarounds initialization into
cell_setup_phb.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a DEBUG config setting which turns on all (most) of the debugging
under platforms/pseries.
To have this take effect we need to remove all the #undef DEBUG's, in
various files. We leave the #undef DEBUG in platforms/pseries/lpar.c,
as this enables debugging printks from the low-level hash table routines,
and tends to make your system unusable. If you want those enabled you
still have to turn them on by hand.
Also some of the RAS code has a DEBUG block which causes a functional
change, so I've keyed this off a different (non-existant) debug #define.
This is only enabled if you have PPC_EARLY_DEBUG enabled also.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In pseries/lpar.c, fix some printf specifier mismatches, and add
a newline to one printk.
In pseries/rtasd.c add "rtasd" to some messages to make it clear
where they're coming from.
In pseries/scanlog.c remove the hand-rolled runtime debugging support
in there. This file has been largely unchanged for eons, if we need to
debug it in future we can recompile.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On pseries LPAR we can call the udbg routines, and the udbg console very
early. So mark the udbg console as safe to call early in boot, and register
the udbg console as soon as the udbg routines are hooked up.
This allows platforms/pseries code to use printk() and pr_debug() rather
than needing to call udbg_printf() directly for early debugging. This is
nice because a) it's standard, b) it goes via the printk buffer, and c)
you can get printk time stamps.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The udbg console should be safe to call basically at any time after boot.
It does not need any per-cpu resources or for the cpu to be online, as
long as there is a udbg_putc routine hooked up it should work. So mark it
as CON_ANYTIME.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Because the udbg_console has CON_ENABLED set, it's possible that when we
register it with the console code the index won't be set. This leads to
slightly confusing boot messages like:
[ 0.000000] console [udbg-1] enabled
We could remove CON_ENABLED, but we don't want to do that, we always
want the udbg console to be activated, even if the user specified some
other console on the command line.
The simplest fix seems to be just to set the index to 0 by hand. There
is no issue with duplicate udbg consoles, as we guard against registering
multiple times in register_early_udbg_console().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the required functionality to fill in all pacas at runtime.
With NR_CPUS=1024
text data bss dec hex filename
137 1704032 0 1704169 1a00e9 arch/powerpc/kernel/paca.o :Before
121 1179744 524288 1704153 1a00d9 arch/powerpc/kernel/paca.o :After
Also remove unneeded #includes from arch/powerpc/kernel/paca.c
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently all iSeries secondary CPUs spin directly on the cpu_start
field in their paca. Make them spin on the global
__secondary_hold_spinloop until after the pacas have been initialised.
As Stephen Rothwell points out, this works at the moment because
__secondary_hold_spinloop is being set already, but iSeries isn't
looking at it :)
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* Removed get_msr(), get_srr0(), and get_srr1() - not used anywhere
* Use STACK_FRAME_OVERHEAD instead of magic number
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace two open-coded occurences of the of_get_next_parent() logic.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As BenH said the other day, it is an "accident" that prom_init.o is
linked with the rest of the kernel. The truth is a little more
subtle, prom_init isn't truly bootloader, it does access kernel data
in a few places.
What we can do is discourage people from adding new code that accesses
data outside of prom_init. And hence this patch; from the script:
# This script checks prom_init.o to see what external symbols it
# is using, if it finds symbols not in the whitelist it returns
# an error. The point of this is to discourage people from
# intentionally or accidentally adding new code to prom_init.c
# which has side effects on other parts of the kernel.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* Removed TI_EXECDOMAIN define as its not used anywhere
* Use STACK_INT_FRAME_SIZE to allow common define of INT_FRAME_SIZE
* Define TI_CPU on both ppc32 & ppc64 (removes an ifdef).
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use (31-THREAD_SHIFT) to get to thread_info from stack pointer. This makes
the code a bit easier to read and more robust if we ever change THREAD_SHIFT.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove the inclusion of asm-offsets.h from stacktrace.c. It isn't
supposed to be included in C code and it causes problems with multiple
definitions of things.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The fixmap code from x86 allows us to have compile time virtual addresses
that we change the physical addresses of at run time.
This is useful for applications like kmap_atomic, PCI config that is done
via direct memory map, kexec/kdump.
We got ride of CONFIG_HIGHMEM_START as we can now determine a more optimal
location for PKMAP_BASE based on where the fixmap addresses start and
working back from there.
Additionally, the kmap code in asm-powerpc/highmem.h always had debug
enabled. Moved to using CONFIG_DEBUG_HIGHMEM to determine if we should
have the extra debug checking.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Added support to allow an 85xx kernel to be run from a non-zero physical
address (useful for cooperative asymmetric multiprocessing situations and
kdump). The support can be configured at compile time by setting
CONFIG_PAGE_OFFSET, CONFIG_KERNEL_START, and CONFIG_PHYSICAL_START as
desired.
Alternatively, the kernel build can set CONFIG_RELOCATABLE. Setting this
config option causes the kernel to determine at runtime the physical
addresses of CONFIG_PAGE_OFFSET and CONFIG_KERNEL_START. If
CONFIG_RELOCATABLE is set, then CONFIG_PHYSICAL_START has no meaning.
However, CONFIG_PHYSICAL_START will always be used to set the LOAD program
header physical address field in the resulting ELF image.
Currently we are limited to running at a physical address that is a
multiple of 256M. This is due to how we map TLBs to cover
lowmem. This should be fixed to allow 64M or maybe even 16M alignment
in the future. It is considered an error to try and run a kernel at a
non-aligned physical address.
All the magic for this support is accomplished by proper initialization
of the kernel memory subsystem and use of ARCH_PFN_OFFSET.
The use of ARCH_PFN_OFFSET only affects normal memory and not IO mappings.
ioremap uses map_page and isn't affected by ARCH_PFN_OFFSET.
/dev/mem continues to allow access to any physical address in the system
regardless of how CONFIG_PHYSICAL_START is set.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 0119536cd3 added an assembly
version of strncmp to PowerPC. However, it changed a common header
file between arch/ppc and arch/powerpc without adding strncmp to
arch/ppc. This fixes that omission so that arch/ppc links again.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The MPSC driver and prpmc2800.dts have been modified to use property
'cell-index' as the serial port number, but the early serial console
driver for the mv64x60 has not been modified to use this new property.
This fixes it.
Signed-off-by: Remi Machet (rmachet@slac.stanford.edu)
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If one of the devices of the mv64x60 init fails, the remaining
devices are not initialized.
This changes the code to display an error and continue the
initialization.
Signed-off-by: Remi Machet (rmachet@slac.stanford.edu)
Signed-off-by: Paul Mackerras <paulus@samba.org>
I2C parameters freq_m and freq_n are assigned defaults in the code,
but if properties for those parameters are not found in the open
firmware description the init routine returns an error and doesn't
create the platform device.
This changes the code so that it doesn't return an error if the
properties are not found but instead uses the default values.
Signed-off-by: Remi Machet (rmachet@slac.stanford.edu)
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The powerpc kernel stacks need to be naturally aligned, as they
contain the thread info at the bottom, which is obtained by
clearing the low bits of the stack pointer.
However, when using 64K pages, the stack is smaller than a page,
so we use kmalloc to allocate it, but that doesn't provide the
alignment guarantee we need.
It appeared to work so far... until one enables SLUB debugging
which then returns unaligned pointers. Ooops...
This fixes it by using a slab cache with enforced alignment. It
relies on my previous patch that adds a thread_info_cache_init()
callback.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
os-area.c requires routines declared in linux/of.h, so should include it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
numa.c requires routines declared in linux/of.h, so should include it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that we have a magic cookie in the pt_regs, we can
properly detect trap frames in stack bactraces.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we indicate the "restart system call" in the
trap type field of pt_regs->magic, we don't need to
set the %l6 boolean in all of the trap return paths.
And we therefore don't need to pass it to do_notify_resume().
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we can check the trap type directly, we don't need the
funny restart_syscall indication from the trap return paths.
Signed-off-by: David S. Miller <davem@davemloft.net>
The proc-*.S files have the _prefetch_abort pointer placed at the end
of the processor structure but the cpu-multi32.h defines it in the
second position. The patch also fixes the support for XSC3 and the
MMU-less CPUs (740, 7tdmi, 940, 946 and 9tdmi).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This sets us up for several simplifications and facilities:
1) The magic cookie lets us identify trap frames more
accurately in stack backtraces.
2) The trap type lets us simplify all of the "are we in
a syscall" state management and checks.
3) We can now see if a task off the cpu is sleeping in
a system call or not. In fact, we can see what
trap it is sleeping in whatever the type. The utrace
guys will use this.
Based upon some discussions with Roland McGrath.
Signed-off-by: David S. Miller <davem@davemloft.net>
The following cleanups are now possible:
- arch/sparc64/kernel/entry.S:ret_sys_call no longer has to be global
- arch/sparc64/kernel/sparc64_ksyms.c:
remove no longer used prototypes
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following cleanups are now possible:
- arch/sparc/kernel/entry.S:ret_sys_call no longer has to be global
- arch/sparc/kernel/signal.c:sys_sigpause() can be removed
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there is only code to parse NUMA attributes on
sun4v/niagara systems, but later on we will add such parsing
for older systems.
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to do it like this before we can move the PROM and MDESC device
tree code over to using lmb_alloc().
Signed-off-by: David S. Miller <davem@davemloft.net>
Call lmb_add() on available regions, and call lmb_reserve()
on the main kernel image and the ramdisk (if any).
Signed-off-by: David S. Miller <davem@davemloft.net>
And add some comments explaining all of the quirks involved in
the way the bootloader provides this information.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit e4cc58944c, as
requested by Roland McGrath, because compat_ptrace_request (added in
commit e16b278164, "ptrace:
compat_ptrace_request siginfo") now handles this case.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Move XPC and XPNET from arch/ia64/sn/kernel to drivers/misc/sgi-xp.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
- remove unused 'irq' argument from pfm_do_interrupt_handler()
- remove pointless cast to void*
- add KERN_xxx prefix to printk()
- remove braces around singleton C statement
- in tioce_provider.c, start tioce_dma_consistent() and
tioce_error_intr_handler() function declarations in column 0
This change's main purpose is to prepare for the patchset in
jgarzik/misc-2.6.git#irq-remove, that explores removal of the
never-used 'irq' argument in each interrupt handler.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There are many notify_die() and almost all take same style with
ia64_mca_spin(). This patch defines macros and replace them all,
to reduce lines and to improve readability.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There are 3 hooks in MCA handler, but this DIE_MCA_MONARCH_PROCESS
event does not notified other than for the first monarch.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While testing with CONFIG_VIRT_CPU_ACCOUNTING=y, I found that
I occasionally get very huge system time in some threads.
So I dug the issue and finally noticed that it was caused
because of an interrupt which interrupt in the following window:
> [arch/ia64/kernel/entry.S: (!CONFIG_PREEMPT && CONFIG_VIRT_CPU_ACCOUNTING)]
>
> ENTRY(ia64_leave_syscall)
> :
> (pUStk) rsm psr.i
> cmp.eq pLvSys,p0=r0,r0 // pLvSys=1: leave from syscall
> (pUStk) cmp.eq.unc p6,p0=r0,r0 // p6 <- pUStk
> .work_processed_syscall:
> adds r2=PT(LOADRS)+16,r12
> (pUStk) mov.m r22=ar.itc // fetch time at leave
> adds r18=TI_FLAGS+IA64_TASK_SIZE,r13
> ;;
> <<< window: from here >>>
> (p6) ld4 r31=[r18] // load current_thread_info()->flags
> ld8 r19=[r2],PT(B6)-PT(LOADRS)
> adds r3=PT(AR_BSPSTORE)+16,r12
> ;;
> mov r16=ar.bsp
> ld8 r18=[r2],PT(R9)-PT(B6)
> (p6) and r15=TIF_WORK_MASK,r31 // any work other than TIF_SYSCALL_TRACE?
> ;;
> ld8 r23=[r3],PT(R11)-PT(AR_BSPSTORE)
> (p6) cmp4.ne.unc p6,p0=r15, r0 // any special work pending?
> (p6) br.cond.spnt .work_pending_syscall
> ;;
> ld8 r9=[r2],PT(CR_IPSR)-PT(R9)
> ld8 r11=[r3],PT(CR_IIP)-PT(R11)
> (pNonSys) break 0 // bug check: we shouldn't be here if pNonSys is TRUE!
> ;;
> invala
> <<< window: to here >>>
> rsm psr.i | psr.ic // turn off interrupts and interruption collection
If pUStk is true, it means we are going to return user mode, hence we fetch
ar.itc to get time at leave from system.
It seems that it is not possible to interrupt the window if pUStk is true,
because interrupts are disabled early. And also disabling interrupt makes
sense because it is safe for referring current_thread_info()->flags.
However interrupting the window while pUStk is true was possible.
The route was:
ia64_trace_syscall
-> .work_pending_syscall_end
-> .work_processed_syscall
Only in case entering the window from this route, interrupts are enabled
during in the window even if pUStk is true. I suppose interrupts must be
disabled here anyway if pUStk is true.
I'm not sure but afraid that what kind of bad effect were there, other
than crazy system time which I found.
FYI, there was a commit 6f6d75825d that
points out a bug at same point(exit of ia64_trace_syscall) in 2006.
It can be said that there was an another bug.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial: (24 commits)
DOC: A couple corrections and clarifications in USB doc.
Generate a slightly more informative error msg for bad HZ
fix typo "is" -> "if" in Makefile
ext*: spelling fix prefered -> preferred
DOCUMENTATION: Use newer DEFINE_SPINLOCK macro in docs.
KEYS: Fix the comment to match the file name in rxrpc-type.h.
RAID: remove trailing space from printk line
DMA engine: typo fixes
Remove unused MAX_NODES_SHIFT
MAINTAINERS: Clarify access to OCFS2 development mailing list.
V4L: Storage class should be before const qualifier (sn9c102)
V4L: Storage class should be before const qualifier
sonypi: Storage class should be before const qualifier
intel_menlow: Storage class should be before const qualifier
DVB: Storage class should be before const qualifier
arm: Storage class should be before const qualifier
ALSA: Storage class should be before const qualifier
acpi: Storage class should be before const qualifier
firmware_sample_driver.c: fix coding style
MAINTAINERS: Add ati_remote2 driver
...
Fixed up trivial conflicts in firmware_sample_driver.c
This patch removes the no longer used export of kmap_atomic_to_page.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
[HWRNG] omap: Minor updates
[CRYPTO] kconfig: Ordering cleanup
[CRYPTO] all: Clean up init()/fini()
[CRYPTO] padlock-aes: Use generic setkey function
[CRYPTO] aes: Export generic setkey
[CRYPTO] api: Make the crypto subsystem fully modular
[CRYPTO] cts: Add CTS mode required for Kerberos AES support
[CRYPTO] lrw: Replace all adds to big endians variables with be*_add_cpu
[CRYPTO] tcrypt: Change the XTEA test vectors
[CRYPTO] tcrypt: Shrink the tcrypt module
[CRYPTO] tcrypt: Change the usage of the test vectors
[CRYPTO] api: Constify function pointer tables
[CRYPTO] aes-x86-32: Remove unused return code
[CRYPTO] tcrypt: Shrink speed templates
[CRYPTO] tcrypt: Group common speed templates
[CRYPTO] sha512: Rename sha512 to sha512_generic
[CRYPTO] sha384: Hardware acceleration for s390
[CRYPTO] sha512: Hardware acceleration for s390
[CRYPTO] s390: Generic sha_update and sha_final
[CRYPTO] api: Switch to proc_create()
* 'irq-cleanups-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6:
[ISDN] minor irq handler cleanups
drivers/char: minor irq handler cleanups
[PPC] minor irq handler cleanups
[BLACKFIN] minor irq handler cleanups
[SPARC] minor irq handler cleanups
ARM minor irq handler cleanup: avoid passing unused info to irq
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (202 commits)
[POWERPC] Fix compile breakage for 64-bit UP configs
[POWERPC] Define copy_siginfo_from_user32
[POWERPC] Add compat handler for PTRACE_GETSIGINFO
[POWERPC] i2c: Fix build breakage introduced by OF helpers
[POWERPC] Optimize fls64() on 64-bit processors
[POWERPC] irqtrace support for 64-bit powerpc
[POWERPC] Stacktrace support for lockdep
[POWERPC] Move stackframe definitions to common header
[POWERPC] Fix device-tree locking vs. interrupts
[POWERPC] Make pci_bus_to_host()'s struct pci_bus * argument const
[POWERPC] Remove unused __max_memory variable
[POWERPC] Simplify xics direct/lpar irq_host setup
[POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
[POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
[POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
[POWERPC] Use asm-generic/bitops/find.h in bitops.h
[POWERPC] 83xx: mpc8315 - fix USB UTMI Host setup
[POWERPC] 85xx: Fix the size of qe muram for MPC8568E
[POWERPC] 86xx: mpc86xx_hpcn - Temporarily accept old dts node identifier.
[POWERPC] 86xx: mark functions static, other minor cleanups
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
SCSI: convert struct class_device to struct device
DRM: remove unused dev_class
IB: rename "dev" to "srp_dev" in srp_host structure
IB: convert struct class_device to struct device
memstick: convert struct class_device to struct device
driver core: replace remaining __FUNCTION__ occurrences
sysfs: refill attribute buffer when reading from offset 0
PM: Remove destroy_suspended_device()
Firmware: add iSCSI iBFT Support
PM: Remove legacy PM (fix)
Kobject: Replace list_for_each() with list_for_each_entry().
SYSFS: Explicitly include required header file slab.h.
Driver core: make device_is_registered() work for class devices
PM: Convert wakeup flag accessors to inline functions
PM: Make wakeup flags available whenever CONFIG_PM is set
PM: Fix misuse of wakeup flag accessors in serial core
Driver core: Call device_pm_add() after bus_add_device() in device_add()
PM: Handle device registrations during suspend/resume
block: send disk "change" event for rescan_partitions()
sysdev: detect multiple driver registrations
...
Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits)
sched: build fix
sched: better rt-group documentation
sched: features fix
sched: /debug/sched_features
sched: add SCHED_FEAT_DEADLINE
sched: debug: show a weight tree
sched: fair: weight calculations
sched: fair-group: de-couple load-balancing from the rb-trees
sched: fair-group scheduling vs latency
sched: rt-group: optimize dequeue_rt_stack
sched: debug: add some debug code to handle the full hierarchy
sched: fair-group: SMP-nice for group scheduling
sched, cpuset: customize sched domains, core
sched, cpuset: customize sched domains, docs
sched: prepatory code movement
sched: rt: multi level group constraints
sched: task_group hierarchy
sched: fix the task_group hierarchy for UID grouping
sched: allow the group scheduler to have multiple levels
sched: mix tasks and groups
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (77 commits)
x86: UV startup of slave cpus
x86: integrate pci-dma.c
x86: don't do dma if mask is NULL.
x86: return conditional to mmu
x86: remove kludge from x86_64
x86: unify gfp masks
x86: retry allocation if failed
x86: don't try to allocate from DMA zone at first
x86: use a fallback dev for i386
x86: use numa allocation function in i386
x86: remove virt_to_bus in pci-dma_64.c
x86: adjust dma_free_coherent for i386
x86: move bad_dma_address
x86: isolate coherent mapping functions
x86: move dma_coherent functions to pci-dma.c
x86: merge iommu initialization parameters
x86: merge dma_supported
x86: move pci fixup to pci-dma.c
x86: move x86_64-specific to common code.
x86: move initialization functions to pci-dma.c
...
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (27 commits)
sh: Fix up L2 cache probe.
sh: Fix up SH-4A part probe.
sh: Add support for SH7723 CPU subtype.
sh: Fix up SH7763 build.
sh: Add migor_ts support to MigoR
sh: Add rs5c732b RTC support to MigoR
sh: Add I2C support to MigoR
sh: Add I2C platform data to sh7722
sh: MigoR NAND flash support using gen_flash
sh: MigoR NOR flash support using physmap-flash
sh: Fix up mach-types formatting from merge damage.
sh: r7780rp: Hook up the I2C and SMBus platform devices.
sh: Use phyical addresses for MigoR smc91x resources
sh: Use physical addresses for sh7722 USBF resources
sh: Add MigoR header file
Fix sh_keysc double free
sh: Fix up __access_ok() check for nommu.
sh: Allow optimized clear/copy page routines to be used on SH-2.
sh: Hook up the rest of the SH7770 serial ports.
sh: Add support for Solution Engine SH7721 board
...
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the
beginning of the declaration specifiers in a declaration is an
obsolescent feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
603 CPUs have the same issue that some 750 CPUs have in that they can crash
in funny ways if a store from an FPU register instruction is executed on a
register that has never been initialized since power on. This patch fixes
it by making sure all FP registers have been properly initialized at kernel
boot.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Use the generic pci_enable_resources() instead of the arch-specific code.
Unlike this arch-specific code, the generic version:
- checks PCI_NUM_RESOURCES (11), not DEVICE_COUNT_RESOURCE (12), resources
- skips resources that have neither IORESOURCE_IO nor IORESOURCE_MEM set
- skips ROM resources unless IORESOURCE_ROM_ENABLE is set
- checks for resource collisions with "!r->parent"
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use the generic pci_enable_resources() instead of the arch-specific code.
Unlike this arch-specific code, the generic version:
- checks PCI_NUM_RESOURCES (11), not 6, resources
- skips resources that have neither IORESOURCE_IO nor IORESOURCE_MEM set
- skips ROM resources unless IORESOURCE_ROM_ENABLE is set
- checks for resource collisions with "!r->parent", not IORESOURCE_UNSET
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use the generic pci_enable_resources() instead of the arch-specific code.
The generic version is functionally equivalent, but uses dev_printk.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use the generic pci_enable_resources() instead of the arch-specific code.
Unlike this arch-specific code, the generic version:
- does not check for a NULL dev pointer
- skips resources that have neither IORESOURCE_IO nor IORESOURCE_MEM set
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use the generic pci_enable_resources() instead of the arch-specific code.
Unlike this arch-specific code, the generic version:
- skips resources unless requested in "mask"
- skips ROM resources unless IORESOURCE_ROM_ENABLE is set
- checks for resource collisions with "!r->parent"
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use the generic pci_enable_resources() instead of the arch-specific code.
Unlike this arch-specific code, the generic version:
- checks for resource collisions with "!r->parent"
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The "pci=routeirq" option was added in 2004, and I don't get any valid
reports anymore. The option is still mentioned in kernel-parameters.txt.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The PCI bus names included in /proc/iomem and /proc/ioports are
of the form 'PCI Bus #XX' where XX is the bus number. This patch
changes the naming to 'PCI Bus XXXX:YY' where XXXX is the domain
number and YY is the bus number. For example, PCI bus 14 in
domain 0 will show as 'PCI Bus 0000:14' instead of 'PCI Bus #14'.
This change makes the naming consistent with other architectures
such as ia64 where multiple PCI domain support has been around
longer.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This function was obviously never being used since early 2.5 days as any
device that it would try to remove would never really be removed from
the system due to the PCI device list being held in the driver core, not
the general list of PCI devices.
As we have not had a single report of a problem here in 4 years, I think
it's safe to remove now.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This lets us check if the device is really added to the driver core or
not, which is what we need when walking some of the bus lists. The flag
is there in anticipation of getting rid of the other PCI device list,
which is what we used to check in this situation.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We currently keep 2 lists of PCI devices in the system, one in the
driver core, and one all on its own. This second list is sorted at boot
time, in "BIOS" order, to try to remain compatible with older kernels
(2.2 and earlier days). There was also a "nosort" option to turn this
sorting off, to remain compatible with even older kernel versions, but
that just ends up being what we have been doing from 2.5 days...
Unfortunately, the second list of devices is not really ever used to
determine the probing order of PCI devices or drivers[1]. That is done
using the driver core list instead. This change happened back in the
early 2.5 days.
Relying on BIOS ording for the binding of drivers to specific device
names is problematic for many reasons, and userspace tools like udev
exist to properly name devices in a persistant manner if that is needed,
no reliance on the BIOS is needed.
Matt Domsch and others at Dell noticed this back in 2006, and added a
boot option to sort the PCI device lists (both of them) in a
breadth-first manner to help remain compatible with the 2.4 order, if
needed for any reason. This option is not going away, as some systems
rely on them.
This patch removes the sorting of the internal PCI device list in "BIOS"
mode, as it's not needed at all anymore, and hasn't for many years.
I've also removed the PCI flags for this from some other arches that for
some reason defined them, but never used them.
This should not change the ordering of any drivers or device probing.
[1] The old-style pci_get_device and pci_find_device() still used this
sorting order, but there are very few drivers that use these functions,
as they are deprecated for use in this manner. If for some reason, a
driver rely on the order and uses these functions, the breadth-first
boot option will resolve any problem.
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This isn't needed, we can just walk the devices in bus order with no
problems at all, as we really want to remove pci_get_device_reverse from
the kernel tree.
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The return parameter isn't used remove it.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Exploit the System z10 hardware acceleration for SHA384.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Exploit the System z10 hardware acceleration for SHA512.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The sha_{update|final} functions are similar for every sha variant.
Since that is error-prone and redundant replace these functions by
a shared generic implementation for s390.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- whitespace cleanups
- remove pointless prototype (uses always follow func implementation)
- 'irq' argument is used here purely as a local variable. rename
argument to 'dummy' and define 'irq' as local to make this plain.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
- mark timer_interrupt() static
- sparc_floppy_request_irq() prototype should use irq_handler_t
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
arch/arm/mach-lh7a40x/arch-kev7a400.c: In function `kev7a400_cpld_handler':
arch/arm/mach-lh7a40x/arch-kev7a400.c:80: error: structure has no member named `handle'
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Define the copy_siginfo_from_user32 entry point for powerpc, so
that generic CONFIG_COMPAT code can call it. We already had the
code rolled into compat_sys_rt_sigqueueinfo, this just moves it
out into the canonical function that other arch's define.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Current versions of gdb require a working implementation of
PTRACE_GETSIGINFO for proper watchpoint support. Since struct siginfo
contains pointers it must be converted when passed to a 32-bit debugger.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
After 2.6.24 there was a plan to make the PM core acquire all device
semaphores during a suspend/hibernation to protect itself from
concurrent operations involving device objects. That proved to be
too heavy-handed and we found a better way to achieve the goal, but
before it happened, we had introduced the functions
device_pm_schedule_removal() and destroy_suspended_device() to allow
drivers to "safely" destroy a suspended device and we had adapted some
drivers to use them. Now that these functions are no longer necessary,
it seems reasonable to remove them and modify their users to use the
normal device unregistration instead.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add /sysfs/firmware/ibft/[initiator|targetX|ethernetX] directories along with
text properties which export the the iSCSI Boot Firmware Table (iBFT)
structure.
What is iSCSI Boot Firmware Table? It is a mechanism for the iSCSI tools to
extract from the machine NICs the iSCSI connection information so that they
can automagically mount the iSCSI share/target. Currently the iSCSI
information is hard-coded in the initrd. The /sysfs entries are read-only
one-name-and-value fields.
The usual set of data exposed is:
# for a in `find /sys/firmware/ibft/ -type f -print`; do echo -n "$a: "; cat $a; done
/sys/firmware/ibft/target0/target-name: iqn.2007.com.intel-sbx44:storage-10gb
/sys/firmware/ibft/target0/nic-assoc: 0
/sys/firmware/ibft/target0/chap-type: 0
/sys/firmware/ibft/target0/lun: 00000000
/sys/firmware/ibft/target0/port: 3260
/sys/firmware/ibft/target0/ip-addr: 192.168.79.116
/sys/firmware/ibft/target0/flags: 3
/sys/firmware/ibft/target0/index: 0
/sys/firmware/ibft/ethernet0/mac: 00:11:25:9d:8b:01
/sys/firmware/ibft/ethernet0/vlan: 0
/sys/firmware/ibft/ethernet0/gateway: 192.168.79.254
/sys/firmware/ibft/ethernet0/origin: 0
/sys/firmware/ibft/ethernet0/subnet-mask: 255.255.252.0
/sys/firmware/ibft/ethernet0/ip-addr: 192.168.77.41
/sys/firmware/ibft/ethernet0/flags: 7
/sys/firmware/ibft/ethernet0/index: 0
/sys/firmware/ibft/initiator/initiator-name: iqn.2007-07.com:konrad.initiator
/sys/firmware/ibft/initiator/flags: 3
/sys/firmware/ibft/initiator/index: 0
For full details of the IBFT structure please take a look at:
ftp://ftp.software.ibm.com/systems/support/system_x_pdf/ibm_iscsi_boot_firmware_table_v1.02.pdf
[akpm@linux-foundation.org: fix build]
Signed-off-by: Konrad Rzeszutek <konradr@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Peter Jones <pjones@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Source drivers/uio/Kconfig to make UIO available in menuconfig if ARCH=arm.
Signed-off-by: Hans J Koch <hjk@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds in the indirect call to pm_power_off(), as is done in
other architectures (e.g. ARM).
Tested on NGW100, with custom board with GPIO control over main DC
power.
Signed-off-by: Peter Ma <pma@mediamatech.com>
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
This patch is a take two of adding full functionality to PLL1 on
AT32AP7000. This allows board-specific code and drivers to configure
and enable PLL1. This is useful when precise control over the
frequency of e.g. a genclock is needed and requested by users for the
ABDAC device.
The patch is based upon previous patches from both Haavard Skinnemoen
and David Brownell.
Signed-off-by: Hans-Christian Egtvedt <hcegtvedt@atmel.com>
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
This combines three patches from David Brownell:
* avr32: tclib support
* avr32: simplify clocksources
* avr32: Turn count/compare into a oneshot clockevent device
Register both TC blocks (instead of just the first one) so that
the AT32/AT91 tclib code will pick them up (instead of just the
avr32-only PIT-style clocksource).
Rename the first one and its resources appropriately.
More cleanups to the cycle counter clocksource code
- Disable all the weak symbol magic; remove the AVR32-only TCB-based
clocksource code (source and header).
- Mark the __init code properly.
- Don't forget to report IRQF_TIMER.
- Make the system work properly with this clocksource, by preventing
use of the CPU "idle" sleep state in the idle loop when it's used.
Package the avr32 count/compare timekeeping support as a oneshot
clockevent device, so it supports NO_HZ and high res timers.
This means it also supports plugging in other clockevent devices
and clocksources.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Create a new file, pm-at32ap700x.S, in mach-at32ap and move the CPU
idle sleep code there. Make it possible to disable the sleep code.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Move the only thing that was actually implemented and used in
asm/intc.h, intc_get_pending(), into asm/irq.h and delete asm/intc.h
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Start cleaning up the AVR32 clocksource mess, starting with the cycle
counter clocksource: remove unneeded pseudo-RTC (just inline that
call to mktime) and associated build warning, and unused sysdev.
Add comment about the problem using the cycle counter register,
and adjust the clocksource rating accordingly. Later patches can
make this usable again (by disabling use of the idle state and
providing a proper clocksource without the weak binding hacks)
and move towards TCB-based clockevent support (including high
resolution timers) that's shared between AT91 and AVR32.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
New-style I2C drivers require that motherboard-mounted I2C devices are
registered with the I2C core, typically at arch_initcall time. This
can be done nice and neat by passing the struct i2c_board_info[]
through at32_add_device_twi just like we do for the SPI board info.
While we've got the hood up, remove a duplicate declaration of
at32_add_device_twi() in board.h.
[hskinnemoen@atmel.com: add missing i2c_board_info forward-declaration]
Signed-Off-By: Ben Nizette <bn@niasdigital.com>
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
* Here is a simple patch to use an allocated array of cpumasks to
represent cpumask_of_cpu() instead of constructing one on the stack.
It's based on the Kconfig option "HAVE_CPUMASK_OF_CPU_MAP" which is
currently only set for x86_64 SMP. Otherwise the the existing
cpumask_of_cpu() is used but has been changed to produce an lvalue
so a pointer to it can be used.
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Modify sched_affinity functions to pass cpumask_t variables by reference
instead of by value.
* Use new set_cpus_allowed_ptr function.
Depends on:
[sched-devel]: sched: add new set_cpus_allowed_ptr function
Cc: Paul Jackson <pj@sgi.com>
Cc: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Use new set_cpus_allowed_ptr() function added by previous patch,
which instead of passing the "newly allowed cpus" cpumask_t arg
by value, pass it by pointer:
-int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
+int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
* Cleanup uses of CPU_MASK_ALL.
* Collapse other NR_CPUS changes to arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
Use pointers to cpumask_t arguments whenever possible.
Depends on:
[sched-devel]: sched: add new set_cpus_allowed_ptr function
Cc: Len Brown <len.brown@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE,
NODE_MASK_ALL to reduce stack requirements for large NR_CPUS
and MAXNODES counts.
* In some cases, the cpumask variable was initialized but then overwritten
with another value. This is the case for changes like this:
- cpumask_t oldmask = CPU_MASK_ALL;
+ cpumask_t oldmask;
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Move large array "struct bootnode nodes" from stack to _initdata
section to reduce amount of stack space required.
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change the following arrays sized by NR_CPUS to be PERCPU variables:
static struct op_msrs cpu_msrs[NR_CPUS];
static unsigned long saved_lvtpc[NR_CPUS];
Also some minor complaints from checkpatch.pl fixed.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
All changes were transparent except for:
static void nmi_shutdown(void)
{
+ struct op_msrs *msrs = &__get_cpu_var(cpu_msrs);
nmi_enabled = 0;
on_each_cpu(nmi_cpu_shutdown, NULL, 0, 1);
unregister_die_notifier(&profile_exceptions_nb);
- model->shutdown(cpu_msrs);
+ model->shutdown(msrs);
free_msrs();
}
The existing code passed a reference to cpu 0's instance of struct op_msrs
to model->shutdown, whilst the other functions are passed a reference to
<this cpu's> instance of a struct op_msrs. This seemed to be a bug to me
even though as long as cpu 0 and <this cpu> are of the same type it would
have the same effect...?
Cc: Philippe Elie <phil.el@wanadoo.fr>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Change the following static arrays sized by NR_CPUS to
per_cpu data variables:
_cpuid4_info *cpuid4_info[NR_CPUS];
_index_kobject *index_kobject[NR_CPUS];
kobject * cache_kobject[NR_CPUS];
* Remove the local NR_CPUS array with a kmalloc'd region in
show_shared_cpu_map().
Also some minor complaints from checkpatch.pl fixed.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes smpboot.c so that it can start slave cpus running
in UV non-unique apicid mode. The SIPI must be sent using a UV-specific
mechanism.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The code in pci-dma_{32,64}.c are now sufficiently
close to each other. We merge them in pci-dma.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
if the device hasn't provided a mask, abort allocation.
Note that we're using a fallback device now, so it does not cover
the case of a NULL device: just drivers passing NULL masks around.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Just return our allocation if we don't have an mmu. For i386, where this patch
is being applied, we never have. So our goal is just to have the code to look like
x86_64's.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The claim is that i386 does it. Just it does not.
So remove it.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use the same gfp masks for x86_64 and i386.
It involves using HIGHMEM or DMA32 where necessary, for the sake
of code compatibility, (no real effect), and using the NORETRY
mask for i386.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch puts in the code to retry allocation in case it fails. By its
own, it does not make much sense but making the code look like x86_64.
But later patches in this series will make we try to allocate from
zones other than DMA first, which will possibly fail.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If we fail, we'll loop into the allocation again,
and then allocate in the DMA zone.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We can use a fallback dev for cases of a NULL device being passed (mostly ISA)
This comes from x86_64 implementation.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We can do it here to, in the same way x86_64 does.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
virt_to_bus() is deprecated according to the docs, and moreover,
won't return the right thing in i386 if we're dealing with high memory mappings.
So we make our allocation function return a page, and then use page_address() (for
virtual addr) and page_to_phys() (for physical addr) instead.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We call unmap_single, if available.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It goes to pci-dma.c, and is removed from the arch-specific files.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
i386 implements the declare coherent memory API, and x86_64 does not
it is reflected in pieces of dma_alloc_coherent and dma_free_coherent.
Those pieces are isolated in separate functions, that are declared
as empty macros in x86_64. This way we can make the code the same.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
They are placed in an ifdef, since they are i386 specific
the structure definition goes to dma-mapping.h.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
we merge the iommu initialization parameters in pci-dma.c
Nice thing, that both architectures at least recognize the same
parameters.
usedac i386 parameter is marked for deprecation
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The code for both arches are very similar, so this patch merge them.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
via_no_dac provides a fixup that is the same for both
architectures. Move it to pci-dma.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch moves the bootmem functions, that are largely
x86_64-specific into pci-dma.c. The code goes inside an ifdef.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
initcalls that triggers the various possibiities for
dma subsys are moved to pci-dma.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
merge pci-base_32.c and pci-nommu_64.c into pci-nommu.c
Their code were made the same, so now they can be merged.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move dma_ops structure definition to pci-dma.c, where it
belongs.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is done to get the code closer to x86_64.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In the very same way i386 do, we use WARN_ON functions
in map_simple and map_sg.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To make the code usable in i386, where we have high memory mappings,
we drop te virt_to_bus(sg_virt()) construction in favour of sg_phys.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds flush_write_buffers() in some functions of pci-nommu_64.c
They are added anywhere i386 would also have it. This is not a problem
for x86_64, since flush_rite_buffers() an nop for it.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch implements mapping_error for pci-nommu_64.c.
It takes care to keep the same compatible behaviour it already
had. Although this file is not (yet) used for i386, we introduce
the i386 version here. Again, care is taken, even at the expense of
an ifdef, to keep the same behaviour inconditionally.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This functions are now called conditionally on their
existence in the struct. So just delete them, instead
of keeping an empty implementation.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch introduces pci-dma.c, a common file for pci dma
between i386 and x86_64. As a start, dma_set_mask() is the same
between architectures, and is placed there.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ERROR: "dma_supported" [drivers/ssb/ssb.ko] undefined!
ERROR: "dma_set_mask" [drivers/scsi/qla2xxx/qla2xxx.ko] undefined!
ERROR: "dma_set_mask" [drivers/scsi/aic7xxx/aic7xxx.ko] undefined!
ERROR: "dma_set_mask" [drivers/scsi/aic7xxx/aic79xx.ko] undefined!
ERROR: "dma_supported" [drivers/net/pcnet32.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/saa7134/saa7134.ko] undefined!
ERROR: "dma_set_mask" [drivers/media/video/meye.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx88/cx8802.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx88/cx8800.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx88/cx88-alsa.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx23885/cx23885.ko] undefined!
They just need to be exported like on x86_64.
dma_supported() and dma_set_mask() were previously inlined,
but are now moved to pci-dma_32.c.
Since they're used by various drivers, they need to be
exported.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We provide a map_error function in pci-base_32.c to make
sure i386 keeps with the same behaviour it used to.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It's initially 0, since we don't expect any DMA there.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is the way x86_64 does, so this make them equal. They have
to be extern now in the header, and the extern definition is moved to
the common dma-mapping.h header.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
the old i386 implementation is moved to pci-base_32.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
i386 base does not need it, so it gets an empty function.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
That's already the name of the game for x86_64. For i386,
we add a pci-base_32.c, that will hold the default operations.
The function call itself goes through dma-mapping.h , the common
header
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
a system with 256 GB of RAM, when NUMA is disabled crashes the
following way:
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Cannot allocate aperture memory hole (ffff8101c0000000,65536K)
Kernel panic - not syncing: Not enough memory for aperture
Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33
Call Trace:
[<ffffffff84037c62>] panic+0xb2/0x190
[<ffffffff840381fc>] ? release_console_sem+0x7c/0x250
[<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90
[<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50
[<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680
[<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310
[<ffffffff84506a2f>] ? _etext+0x0/0x1
[<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40
[<ffffffff847ac795>] mem_init+0x45/0x1a0
[<ffffffff8479ff35>] start_kernel+0x295/0x380
[<ffffffff8479f1c2>] _sinittext+0x1c2/0x230
the root cause is : memmap PMD is too big,
[ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0
almost near 4G..., and vmemmap_alloc_block will use up the ram under 4G.
solution will be:
1. make memmap allocation get memory above 4G...
2. reserve some dma32 range early before we try to set up memmap for all.
and release that before pci_iommu_alloc, so gart or swiotlb could get some
range under 4g limit for sure.
the patch is using method 2.
because method1 may need more code to handle SPARSEMEM and SPASEMEM_VMEMMAP
will get
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Memory: 264245736k/268959744k available (8484k kernel code, 4187464k reserved, 4004k data, 724k init)
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For example, If the physical address layout on a two node system with 8 GB
memory is something like:
node 0: 0-2GB, 4-6GB
node 1: 2-4GB, 6-8GB
Current kernels fail to boot/detect this NUMA topology.
ACPI SRAT tables can expose such a topology which needs to be supported.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The 64-bit vDSO's sources are compiled with -g0 for no good reason.
Using -g when enabled lets their separate debug files be used at
runtime via build ID matching, same as we can see 32-bit vDSO's
assembly sources.
Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Only allocate the FPU area when the application actually uses FPU, i.e., in the
first lazy FPU trap. This could save memory for non-fpu using apps.
for example: on my system after boot, there are around 300 processes, with
only 17 using FPU.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Split the FPU save area from the task struct. This allows easy migration
of FPU context, and it's generally cleaner. It also allows the following
two optimizations:
1) only allocate when the application actually uses FPU, so in the first
lazy FPU trap. This could save memory for non-fpu using apps. Next patch
does this lazy allocation.
2) allocate the right size for the actual cpu rather than 512 bytes always.
Patches enabling xsave/xrstor support (coming shortly) will take advantage
of this.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
this function doesnt just 'find' the max_pfn - it also has
other side-effects such as registering sparse memory maps.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We already catch most of the TSC problems by sanity checks, but there
is a subtle bug which has been in the code forever. This can cause
time jumps in the range of hours.
This was reported in:
http://lkml.org/lkml/2007/8/23/96
and
http://lkml.org/lkml/2008/3/31/23
I was able to reproduce the problem with a gettimeofday loop test on a
dual core and a quad core machine which both have sychronized
TSCs. The TSCs seems not to be perfectly in sync though, but the
kernel is not able to detect the slight delta in the sync check. Still
there exists an extremly small window where this delta can be observed
with a real big time jump. So far I was only able to reproduce this
with the vsyscall gettimeofday implementation, but in theory this
might be observable with the syscall based version as well.
CPU 0 updates the clock source variables under xtime/vyscall lock and
CPU1, where the TSC is slighty behind CPU0, is reading the time right
after the seqlock was unlocked.
The clocksource reference data was updated with the TSC from CPU0 and
the value which is read from TSC on CPU1 is less than the reference
data. This results in a huge delta value due to the unsigned
subtraction of the TSC value and the reference value. This algorithm
can not be changed due to the support of wrapping clock sources like
pm timer.
The huge delta is converted to nanoseconds and added to xtime, which
is then observable by the caller. The next gettimeofday call on CPU1
will show the correct time again as now the TSC has advanced above the
reference value.
To prevent this TSC specific wreckage we need to compare the TSC value
against the reference value and return the latter when it is larger
than the actual TSC value.
I pondered to mark the TSC unstable when the readout is smaller than
the reference value, but this would render an otherwise good and fast
clocksource unusable without a real good reason.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements the PR_GET_TSC and PR_SET_TSC prctl()
commands on the x86 platform (both 32 and 64 bit.) These
commands control the ability to read the timestamp counter
from userspace (the RDTSC instruction.)
While the RDTSC instuction is a useful profiling tool,
it is also the source of some non-determinism in ring-3.
For deterministic replay applications it is useful to be
able to trap and emulate (and record the outcome of) this
instruction.
This patch uses code earlier used to disable the timestamp
counter for the SECCOMP framework. A side-effect of this
patch is that the SECCOMP environment will now also disable
the timestamp counter on x86_64 due to the addition of the
TIF_NOTSC define on this platform.
The code which enables/disables the RDTSC instruction during
context switches is in the __switch_to_xtra function, which
already handles other unusual conditions, so normal
performance should not have to suffer from this change.
Signed-off-by: Erik Bosman <ejbosman@cs.vu.nl>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This annotates NMI functions with notrace. Some tracers may be able
to live with this, but some cannot. The safest is to turn it off,
it's not particularly interesting anyway.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
- noexec32 is on by default for years already
- add noexec32 to kernel-parameters and fix noexec typo in there
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cleanup: change the _end in compressed vmlinux_64.lds.
also change _heap to _ebss that is not needed.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The kernel decompressor wrapper uses memory located beyond the
end of the image. This might lead to hard to debug problems,
but even if it can be proven to be safe, it is at the very
least unclean. I don't see any advantages either, unless you
count it not being zeroed out as an advantage. This patch
moves the boot-heap area to the bss segment.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix printk formats in x86/mm/ioremap.c:
next-20080410/arch/x86/mm/ioremap.c:137: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'resource_size_t'
next-20080410/arch/x86/mm/ioremap.c:188: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'resource_size_t'
next-20080410/arch/x86/mm/ioremap.c:188: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'long unsigned int'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fix section mismatch warnings which occurs on my x86_64 box while compiling
linux-next-20080410:
Warning messages:
WARNING: arch/x86/kernel/built-in.o(.text+0x7bc2): Section mismatch in reference from the function bad_addr() to the
variable .init.data:early_res
The function bad_addr() references
the variable __initdata early_res.
This is often because bad_addr lacks a __initdata
annotation or the annotation of early_res is wrong.
WARNING: arch/x86/kernel/built-in.o(.text+0x7c3b): Section mismatch in reference from the function bad_addr_size() to
the variable .init.data:early_res
The function bad_addr_size() references
the variable __initdata early_res.
This is often because bad_addr_size lacks a __initdata
annotation or the annotation of early_res is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I've made a small investigation about vm86.h inclusion rules and it
looks like everything is more or less ok.
Files that rely on asm/vm86.h symbols are:
- kprobes.c
- process_32.c
- signal_32.c
- traps_32.c
- vm86_32.c
File process_32.c includes vm86.h explicitly. We can remove that
include and it won't break anything.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove old comments that include the old arch/i386 directory.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ramdisk is reserved via reserve_early in x86_64_start_kernel,
later early_res_to_bootmem() will convert to reservation in bootmem.
so don't need to reserve that again.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make x86 EFI code works when EFI_PAGE_SHIFT != PAGE_SHIFT. The
memrage_efi_to_native() provided in this patch can be used on other
EFI platform such as IA64 too.
This patch has been tested on Intel x86_64 platform with EFI 64/32
firmware.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Platform-specific code for Phytec's phyCORE-PXA270 platform
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch adds a driver for the Quick Capture Interface on the PXA270.
It is based on the original driver from Intel, but has been re-worked
multiple times since then, now it also supports the V4L2 API.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix the pm sys device .name initialiser which was
missed when updating the last patch submission.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch adds functions to set clkout rate for Samsung S3C2410
This patch supersedes 4884/1, that contained an error
Comments from Ben Dooks:
Note, looks like this needs to be applied before 4882/1
Signed-off-by: Davide Rizzo <davide@elpa.it>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Adds support for the generic GPIO lib to the EP93xx family. The gpio
handling code has been moved from core.c to a new file called gpio.c.
The GPIO based IRQ code has not been changed.
Signed-off-by: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Initialise PXA devices before platform initialisation, so that
platforms can parent devices to these.
Acked-by: eric miao <ymiao3@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This adds support for two more leds:
the wlan one (found in SL-6000W and SL-6000L) and
the blutooth one (found in SL-6000W).
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Now that scoop gpio's are converted to generic_gpio,
tosascoop_device and tosascoop_jc_device don't have
to be exported.
Also make tosa_gpio_* static
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Shut up sparse warnings by making GPIO_IRQ_MASK unisgned
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Set up the IRQ line for the WM9713 device on the Zylonite.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Now as the scoop pins are covered by the generic gpio API,
we can use leds-gpio driver instead of special leds-tosa.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Convert set/reset_scoop_gpio to generic gpio calls.
This patch depends on the pxaficp_ir hooks patch.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The SPI information got placed in the middle of the SMC91x data.
Lets move it up a few lines so that we keep related things grouped
together.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
TOSA_GPIO_ON_KEY can't wakeup the device. But the board
provides TOSA_GPIO_POWERON which is OR of (on_ac) and (on_button).
Use it for wake up.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Although the GPIO alternate functions should be correctly set
by the bootloader, configure them here to be sure.
To save power, FFUART/BTUART/STUART are left unconfigured (output, low)
until they are needed by pxaficp or the magician GSM chipset driver.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch enables LEDs and the 1-wire bus (connected to
a DS2760 battery monitor) on the magician.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add a call to pxa_set_i2c_info() to force i2c registration
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Clean up all pins configuration to use currently proposed MFP table
schema.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Acked-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
All magician devices I've encountered so far have featured the Toppoly
TD028STEB1 display, so the Samsung LTP280QV support is untested.
The power-on sequence is not correct because pxafb doesn't yet support
enabling the LCD controller in the middle of the it.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This enables rootfs on StrataFlash if the bootloader supplies the
partition list.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
needed for power management (audio, BT, charging, GSM, LCD, SD), GSM, flash and SD operation and audio routing.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Since recent PXA changes the (non-power-)I2C bus has to be explicitly
enabled from board initialisation code.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The PXA3xx will not suspend if there are no wakeup sources configured.
Print a diagnostic message to make it easier for the user to see what's
happening.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Mainstone has the primary I2C bus exposed for use on plugin modules.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch implements support for Gumstix-F flash, udc and mci. Fixes since the last time are:
- Steve Sakoman as maintainer
- cleanup for udc and mci setup
Signed-off-by: Jaya Kumar <jayakumar.lkml@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This is partial because mainstone's keypad is really special, some of
the keys like '1', '2', ... are actually connected to two row/column
juntions, thus pressing '1' is equivalent to pressing 'A' & 'H'.
This is really brain damanged since it makes distinguishing between
pressing '1' and multiple keys pressing of 'A' & 'H' difficult.
So these special keys are not supported for the time being.
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
also update the clk definitions in pxa27x and pxa3xx.
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
NOTE: currently don't know if the key code of KEY_SUSPEND is fit for
such usage.
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Changes include:
1. rename MFP_LPM_WAKEUP_ENABLE into MFP_LPM_CAN_WAKEUP to indicate
the board capability of this pin to wakeup the system
2. add gpio_set_wake() and keypad_set_wake() to allow dynamically
enable/disable wakeup from GPIOs and keypad GPIO
* these functions are currently kept in mfp-pxa2xx.c due to their
dependency to the MFP configuration
3. pxa2xx_mfp_config() only gives early warning if MFP_LPM_CAN_WAKEUP
is set on incorrect pins
So that the GPIO's wakeup capability is now decided by the following:
a) processor's capability: (only those GPIOs which have dedicated
bits within PWER/PRER/PFER can wakeup the system), this is
initialized by pxa{25x,27x}_init_mfp()
b) board design decides:
- whether the pin is designed to wakeup the system (some of
the GPIOs are configured as other functions, which is not
intended to be a wakeup source), by OR'ing the pin config
with MFP_LPM_CAN_WAKEUP
- which edge the pin is designed to wakeup the system, this
may depends on external peripherals/connections, which is
totally board specific; this is indicated by MFP_LPM_EDGE_*
c) the corresponding device's (most likely the gpio_keys.c) wakeup
attribute:
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
1. the following code to configure PGSRx is no way portable and
intuitive:
- PGSR0 = 0x00008800;
- PGSR1 = 0x00000002;
- PGSR2 = 0x0001FC00;
- PGSR3 = 0x00001F81;
this is removed as low power state has already been encoded in
the pin configuration definitions.
Note: there is no specific reason for some of the GPIOs to drive
high in low power mode as indicated by the above setting, those
bits are ignored, and the result is validated to work.
2. the following code to configure GPIO wakeup is removed as this
is now totally handled by pxa2xx_mfp_config():
- PWER = 0xC0000002;
- PRER = 0x00000002;
- PFER = 0x00000002;
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pin configuration on pxa{25x,27x} has now separated from generic GPIO
into dedicated mfp-pxa2xx.c by this patch. The name "mfp" is borrowed
from pxa3xx and is used here to alert the difference between the two
concepts: pin configuration and generic GPIOs. A GPIO can be called
a "GPIO" _only_ when the corresponding pin is configured so.
A pin configuration on pxa{25x,27x} is composed of:
- alternate function selection (or pin mux as commonly called)
- low power state or sleep state
- wakeup enabling from low power mode
The following MFP_xxx bit definitions in mfp.h are re-used:
- MFP_PIN(x)
- MFP_AFx
- MFP_LPM_DRIVE_{LOW, HIGH}
- MFP_LPM_EDGE_*
Selecting alternate function on pxa{25x, 27x} involves configuration
of GPIO direction register GPDRx, so a new bit and MFP_DIR_{IN, OUT}
are introduced. And pin configurations are defined by the following
two macros:
- MFP_CFG_IN : for input alternate functions
- MFP_CFG_OUT : for output alternate functions
Every configuration should provide a low power state if it configured
as output using MFP_CFG_OUT(). As a general guideline, the low power
state should be decided to minimize the overall power dissipation. As
an example, it is better to drive the pin as high level in low power
mode if the GPIO is configured as an active low chip select.
Pins configured as GPIO are defined by MFP_CFG_IN(). This is to avoid
side effects when it is firstly configured as output. The actual
direction of the GPIO is configured by gpio_direction_{input, output}
Wakeup enabling on pxa{25x, 27x} is actually GPIO based wakeup, thus
the device based enable_irq_wake() mechanism is not applicable here.
E.g. invoking enable_irq_wake() with a GPIO IRQ as in the following
code to enable OTG wakeup is by no means portable and intuitive, and
it is valid _only_ when GPIO35 is configured as USB_P2_1:
enable_irq_wake( gpio_to_irq(35) );
To make things worse, not every GPIO is able to wakeup the system.
Only a small number of them can, on either rising or falling edge,
or when level is high (for keypad GPIOs).
Thus, another new bit is introduced to indicate that the GPIO will
wakeup the system:
- MFP_LPM_WAKEUP_ENABLE
The following macros can be used in platform code, and be OR'ed to
the GPIO configuration to enable its wakeup:
- WAKEUP_ON_EDGE_{RISE, FALL, BOTH}
- WAKEUP_ON_LEVEL_HIGH
The WAKEUP_ON_LEVEL_HIGH is used for keypad GPIOs _only_, there is
no edge settings for those GPIOs.
These WAKEUP_ON_* flags OR'ed on wrong GPIOs will be ignored in case
that platform code author is careless enough.
The tradeoff here is that the wakeup source is fully determined by
the platform configuration, instead of enable_irq_wake().
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
two reasons:
1. GPIO namings and their mode definitions are conceptually not part
of the PXA register definitions
2. this is actually a temporary move in the transition of PXA2xx to
use MFP-alike APIs (as what PXA3xx is now doing), so that legacy
code will still work and new code can be added in step by step
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
MFP configurations after resume should be done before the GPIO registers
are restored. Move the mfp sysdev registeration to the same place where
GPIO and IRQ sysdev(s) are registered to better control the order.
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The main issue here is that pxa3xx does not have GAFRx registers,
access directly to these registers should be avoided for pxa3xx:
1. introduce __gpio_is_occupied() to indicate the GAFRx and GPDRx
registers are already configured on pxa{25x,27x} while returns
0 always on pxa3xx
2. pxa_gpio_mode(gpio | GPIO_IN) is replaced directly with assign-
ment of GPDRx, the side effect of this change is that the pin
_must_ be configured before use, pxa_gpio_irq_type() will not
change the pin to GPIO, as this restriction is sane, esp. with
the new MFP framework
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
To further clean up the GPIO and IRQ structure:
1. pxa_init_irq_gpio() and pxa_init_gpio() combines into a single
function pxa_init_gpio()
2. assignment of set_wake merged into pxa_init_{irq,gpio}() as
an argument
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This makes the code better organized and simplified a bit. The change
will lose a bit of performance when performing IRQ ack/mask/unmask,but
that's not too much after checking the result binary.
This patch also removes the ugly #ifdef CONFIG_PXA27x .. #endif by
carefully not to access those pxa{27x,3xx} specific registers, this
is done by keeping an internal IRQ number variable. The pxa-regs.h
is also modified so registers for IRQ > PXA_IRQ(31) are made public
even if CONFIG_PXA{27x,3xx} isn't defined (for pxa25x's sake)
The incorrect assumption in the original code that internal irq starts
from 0 is also corrected by comparing with PXA_IRQ(0).
"struct sys_device" for the IRQ are reduced into one single device on
pxa{27x,3xx}.
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
by
1. wrapping long lines and making comments tidy
2. using IRQ_TYPE_* instead of migration macros __IRQT_*
3. introduce a pr_debug() for the commented printk(KERN_DEBUG ...)
stuff
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
by:
1. introduce dedicated pxa_{mask,unmask}_low_gpio()
2. remove set_irq_chip(IRQ_GPIO_2_x, ...) which has already been
initialized in pxa_init_irq()
3. introduce dedicated pxa_init_gpio_set_wake()
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
1. As David Brownell suggests, using ffs() is going to make the loop
a bit faster (by avoiding unnecessary shift and iteration)
2. Russell suggested find_{first,next}_bit() being used with the
gedr[] array
Signed-off-by: eric miao <eric.miao@marvell.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The AC97 clock rate on PXA3xx is generated with a configurable divider
from sys_pll.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Expose control of the PXA3xx 13MHz CLK_POUT pin via the clock API
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
If we fail to boot due to an unsupported processor ID, print the
processor ID as part of the failure message.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
U-Boot puts an image at the load address specified in the uImage
header before jumping to the entry point.
In the CONFIG_ZBOOT_ROM case ZBOOT_ROM_TEXT is the right load
address.
Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
[NET]: Fix and allocate less memory for ->priv'less netdevices
[IPV6]: Fix dangling references on error in fib6_add().
[NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
[PKT_SCHED]: Fix datalen check in tcf_simp_init().
[INET]: Uninline the __inet_inherit_port call.
[INET]: Drop the inet_inherit_port() call.
SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
[netdrvr] forcedeth: internal simplifications; changelog removal
phylib: factor out get_phy_id from within get_phy_device
PHY: add BCM5464 support to broadcom PHY driver
cxgb3: Fix __must_check warning with dev_dbg.
tc35815: Statistics cleanup
natsemi: fix MMIO for PPC 44x platforms
[TIPC]: Cleanup of TIPC reference table code
[TIPC]: Optimized initialization of TIPC reference table
[TIPC]: Remove inlining of reference table locking routines
e1000: convert uint16_t style integers to u16
ixgb: convert uint16_t style integers to u16
sb1000.c: make const arrays static
sb1000.c: stop inlining largish static functions
...
By default, this option was selected by the platform Kconfig. This
patch adds "depends on" to L2X0 so that it can be enabled/disabled
manually.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch enables the building of Linux for the PB1176 platform.
Signed-off-by: Bahadir Balban <bahadir.balban@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch adds the base files for the PB1176 platform support.
Signed-off-by: Bahadir Balban <bahadir.balban@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch adds the resource and device definitions for the compact
flash.
Signed-off-by: Bahadir Balban <bahadir.balban@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch adds the PB11MPCore support to the corresponding Kconfig
and Makefile to enable building.
Signed-off-by: Bahadir Balban <bahadir.balban@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch adds the base files for the PB11MPCore platform support.
Signed-off-by: Bahadir Balban <bahadir.balban@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The upcoming PB11MPCore and PB1176 have different memory maps and some
of the definitions in platform.h are no longer common. This patch
moves them to the board-eb.h file and updates their usage in
realview_eb.c.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since the PB1176 has different UART base addresses, this patch moves
the definitions form platorm.h to board-eb.h. It also modifies
uncompress.h to detect the platform type at run-time.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch moves the timer definitions from platform.h into board-eb.h
as they are different on PB11MPCore and PB1176. It also adds
timerX_va_base variables in core.c which are set by the
realview_eb_timer_init function before invoking realview_timer_init.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch moves the patch definitions into board-eb.h and
realview_eb.c (from core.c) as they are different on the PB11MPCore
and PB1176 platforms.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This is in preparation for the RealView PB11MPCore and PB1176 patches
which have different base addresses for the GIC.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
RealView/EB revD platform comes with the SMSC LAN9118 Ethernet
chip. This patch allows either the smc91x or the smc911x drivers to be
used with the RealView/EB platform.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch moves the SCU initialisation from __v6_setup to the
smp_prepare_cpus() function as it relies on platform-specific
settings. Changes to get_core_count() are mainly for allowing cleaner
code with the upcoming PB11MPCore patches.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch implements Thumb-2 application support in Linux. Original
implementation by Paul Brook with fixes for VFP and Neon by Catalin
Marinas.
Signed-off-by: Paul Brook <paul@codesourcery.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch adds a prefetch abort handler similar to the data abort one
and renames the latter for consistency. Initial implementation by Paul
Brook with some renaming by Catalin Marinas.
Signed-off-by: Paul Brook <paul@codesourcery.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
do not return a -EINVAL when mmap()-ing PCI holes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (137 commits)
[SCSI] iscsi: bidi support for iscsi_tcp
[SCSI] iscsi: bidi support at the generic libiscsi level
[SCSI] iscsi: extended cdb support
[SCSI] zfcp: Fix error handling for blocked unit for send FCP command
[SCSI] zfcp: Remove zfcp_erp_wait from slave destory handler to fix deadlock
[SCSI] zfcp: fix 31 bit compile warnings
[SCSI] bsg: no need to set BSG_F_BLOCK bit in bsg_complete_all_commands
[SCSI] bsg: remove minor in struct bsg_device
[SCSI] bsg: use better helper list functions
[SCSI] bsg: replace kobject_get with blk_get_queue
[SCSI] bsg: takes a ref to struct device in fops->open
[SCSI] qla1280: remove version check
[SCSI] libsas: fix endianness bug in sas_ata
[SCSI] zfcp: fix compiler warning caused by poking inside new semaphore (linux-next)
[SCSI] aacraid: Do not describe check_reset parameter with its value
[SCSI] aacraid: Fix down_interruptible() to check the return value
[SCSI] sun3_scsi_vme: add MODULE_LICENSE
[SCSI] st: rename flush_write_buffer()
[SCSI] tgt: use KMEM_CACHE macro
[SCSI] initio: fix big endian problems for auto request sense
...
TF_MASK is no longer defined, use X86_EFLAGS_TF.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SH7723 is the first hard silicon to implement the L2, and unsurprisingly,
does the precise inverse of what the specification alleges. XOR the
URAM/L2 size bits to get back in line with the existing parsing logic.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The SH-4A series probe we were relying on doesn't work any more on the
newer parts, bump this up to use CVR.CHIP instead so we have consistent
behaviour across all of the parts, which is what this should have been
testing in the first place.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add support for the migor_ts touch panel to the MigoR board.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add platform data for the SuperH Mobile I2C block to sh7722.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add NAND flash support to the MigoR board by giving board specific data
to the gen_nand platform driver.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add NOR flash support to the MigoR board by giving board specific data
to the physmap-flash platform driver.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Use physical addresses and change resource name of MigoR ethernet chip.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Use physical addresses and change resource name to follow data sheet.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch adds a MigoR specific header file. We may want to use a cpu
specific header file instead, but this will do for now.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Presently these are restricted to SH-3 and SH-4, so we reorder the
ifdefs a bit to let other parts use these also.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add support for Solution Engine SH7721 board(MS7721RP01).
Signed-off-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add KEYSC platform data for the Solution Engine 7722 board.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add KEYSC platform data for the sh7722 MigoR board.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
Remove DEBUG_SEMAPHORE from Kconfig
Improve semaphore documentation
Simplify semaphore implementation
Add down_timeout and change ACPI to use it
Introduce down_killable()
Generic semaphore implementation
Add semaphore.h to kernel_lock.c
Fix quota.h includes
This adds the low level irq tracing hooks to the powerpc architecture
needed to enable full lockdep functionality.
This is partly based on Johannes Berg's initial version. I removed
the asm trampoline that isn't needed (thus improving performance) and
modified all sorts of bits and pieces, reworking most of the assembly,
etc...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds stacktrace support for powerpc, which will be needed for
lockdep.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves various definitions used all over the place to parse stack
frames to ptrace.h so only one definition is needed.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Lockdep found out that we can occasionally take the device-tree
lock for reading from softirq time (from rtas_token called
by the rtas real time clock code called by the NTP code),
while we take it occasionally for writing without masking
interrupts. The combination of those two can thus deadlock.
While some of those cases of interrupt read lock could be fixed
(such as caching the RTAS tokens) I figured that taking the
lock for writing is so rare (device-tree modification) that we
may as well penalize that case and allow reading from interrupts.
Thus, this turns all the writers to take the lock with irqs
masked to avoid the situation.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove the __max_memory variable, as it is not referenced anywhere
in the tree besides some code in arch/ppc.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The xics code currently has a direct and lpar variant of
xics_host_map, the only difference being which irq_chip they use. If
we remember which irq_chip we're using we can combine these two
routines. That also allows us to have a single irq_host_ops instead
of two.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
pseries_mpic_init_IRQ() implements the same logic as the xics code did to
find the i8259 cascade irq. Now that we've pulled that logic out into
pseries_setup_i8259_cascade() we can use it in the mpic code.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove the xics references from xics_setup_8259_cascade(), and merge the
good bits from the almost identical logic in pseries_mpic_init_IRQ().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The code in xics.c to setup the i8259 cascaded irq handler is not really
xics specific, so move it into setup.c - we will clean this up further in
a subsequent patch.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
IDE PMAC host driver and all IDE PCI host drivers use pci_enable_device()
nowadays so the following quirk in pmac_pcibios_after_init() can be removed.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add special cases for pplus and prep to ide_default_{irq,io_base}()
(+ FIXMEs about the need to use IDE platform host driver instead).
* Remove no longer needed ppc_ide_md and struct ide_machdep_calls.
* Then remove <linux/ide.h> include from:
- arch/powerpc/kernel/setup_32.c
- arch/ppc/kernel/ppc_ksyms.c
- arch/ppc/kernel/setup.c
- arch/ppc/platforms/pplus.c
- arch/ppc/platforms/prep_setup.c
There should be no functional changes caused by this patch.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Call ide_init_default_irq() for pplus in init_ide_data().
* Remove no longer needed pplus_ide_init_hwif_ports().
There should be no functional changes caused by this patch.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add IDE_HFLAG_FORCE_LEGACY_IRQS host flag for Motorola-Sandpoint platform
to sl82c105 host driver.
* Disable ide_generic host driver in arch/ppc/configs/sandpoint_defconfig
and enable sl82c105 one.
* Remove ppc_ide_md hooks from arch/ppc/platforms/sandpoint.c - no need for
them (sl82c105 host driver takes care of all this setup).
* Then remove no longer needed <linux/ide.h> include.
* Also update arch/ppc/platforms/sandpoint.h.
Unfortunately (unlike lopec's case) sl82c105 host driver was not enabled
in defconfing so there is a funcionality change.
[ Not a big deal since sl82c105 is superior over ide_generic. ]
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add IDE_HFLAG_FORCE_LEGACY_IRQS host flag for Motorola-LoPEC platform
to sl82c105 host driver.
* Remove ppc_ide_md hooks from arch/ppc/platforms/lopec.c - no need for
them (sl82c105 host driver takes care of all this setup).
* Then remove no longer needed <linux/ide.h> include.
Looking at arch/ppc/configs/lopec_defconfig:
...
CONFIG_IDE_GENERIC=y
CONFIG_BLK_DEV_IDEPCI=y
# CONFIG_IDEPCI_SHARE_IRQ is not set
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_BLK_DEV_GENERIC=y
# CONFIG_BLK_DEV_OPTI621 is not set
CONFIG_BLK_DEV_SL82C105=y
...
there should be no functional changes unless somebody preferred to disable
sl82c105 host driver and use only ide_generic one (but why would anybody
want to do such thing :-).
PS It seems that lopec_defconfig hasn't been updated for ages but if somebody
is going to do it please look into disabling IDE_GENERIC and BLK_DEV_GENERIC
config options. Thanks.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Initialize IDE ports in mpc8xx_ide_probe().
* Remove m8xx_ide_init() and ppc_ide_md hooks - no need for them
(IDE mpc8xx host driver takes care of all this setup).
* Remove needless 'if (irq)' and 'if (data_port >= MAX_HWIFS)' checks
from m8xx_ide_init_hwif_ports().
* Remove 'ctrl_port' and 'irq' arguments from m8xx_ide_init_hwif_ports().
* Rename m8xx_ide_init_hwif_ports() to m8xx_ide_init_ports().
* Add __init tag to m8xx_ide_init_ports().
This patch fixes hwif->irq always being overriden to 0 (== auto-probe, is
this even working on PPC?) because of ide_init_default_irq() call in ide.c.
There should be no other functional changes.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Vitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add pmac_ide_init_ports() helper and use it instead of
pmac_ide_init_hwif_ports().
* Remove ppc_ide_md hooks - no need for them
(IDE pmac host driver takes care of all this setup).
* Then remove no longer needed <linux/ide.h> include
from arch/powerpc/platforms/powermac/pmac.h.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
There are no "default" IDE ports on PPC4xx so ppc4xx_ide_init_hwif_ports() is
unnecessary, remove it. Also remove no longer needed <linux/ide.h> include.
There should be no functional changes caused by this patch.
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Also remove now not needed <linux/ide.h> include.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
kgdb core fixes:
- Check to see that mm->mmap_cache is not null before calling
flush_cache_range(), else on arch=ARM it will cause a fatal
fault.
- Breakpoints should only be restored if they are in the BP_ACTIVE
state.
- Fix a typo in comments to "kgdb_register_io_module"
x86 kgdb fixes:
- Fix the x86 arch handler such that on a kill or detach that the
appropriate cleanup on the single stepping flags gets run.
- Add in the DIE_NMIWATCHDOG call for x86_64
- Touch the nmi watchdog before returning the system to normal
operation after performing any kind of kgdb operation, else
the possibility exists to trigger the watchdog.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add HW breakpoints into the arch specific portion of x86 kgdb. In the
current x86 kernel.org kernels HW breakpoints are changed out in lazy
fashion because there is no infrastructure around changing them when
changing to a kernel task or entering the kernel mode via a system
call. This lazy approach means that if a user process uses HW
breakpoints the kgdb will loose out. This is an acceptable trade off
because the developer debugging the kernel is assumed to know what is
going on system wide and would be aware of this trade off.
There is a minor bug fix to the kgdb core so as to correctly call the
hw breakpoint functions with a valid value from the enum.
There is also a minor change to the x86_64 startup code when using
early HW breakpoints. When the debugger is connected, the cpu startup
code must not zero out the HW breakpoint registers or you cannot hit
the breakpoints you are interested in, in the first place.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes the hang regression with kgdb when the NMI interrupt
comes in while the master core is returning from an exception.
Adjust the NMI logic such that KGDB will not stop NMI exceptions from
occurring by in general returning NOTIFY_DONE. It is not possible to
distinguish the debug NMI sync vs the normal NMI apic interrupt so
kgdb needs to catch the unknown NMI if it the debugger was previously
active on one of the cpus.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
simplified and streamlined kgdb support on x86, both 32-bit and 64-bit,
based on patch from:
Subject: kgdb: core-lite
From: Jason Wessel <jason.wessel@windriver.com>
[ and countless other authors - see the patch for details. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Add the TinCanTools Hammer board to list of supported machines in the
arch/arm/mach-s3c2410 directory, as well as a default config entry. the
mach-tct_hammer.c file initializes basic i/o, clocks, irqs, as well as
the mtd flash layout if enabled in the kernel configuration.
Signed-off-by: David Anders <danders@amltd.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
There seems to be some problem with at-least the S3C2440 and
bus traffic during an reset. It is unlikely, but still possible
that the system will hang in such a way that the watchdog cannot
get the system out of the state it is in.
Change to making the code that calls the watchdog reset run from
cached memory so that instruction fetches have quiesced before the
watchdog fires.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
All current Simtec designs source the DCLK outputs from
the UPLL. This means the DCLK's parent must be set to UPLL
so that anything enabling and disabling an UPLL sourced
clock does not shutdown the DCLK due to missing open counts.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix the name of the S3C2412_CLKDIVN_ARMDIVN define.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Change GPA21 to output over reset so that nRSTOUT is not
asserted whilst suspended.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move wakeup code to .c, so that video mode setting code can be shared
between boot and wakeup. Remove nasty assembly code in 64-bit case by
re-using trampoline code. Stack setup was fixed to clear high 16bits
of %esp, maybe that fixes some machines.
.c code sharing and morse code was done H. Peter Anvin, Sam Ravnborg
reviewed kbuild related stuff, and it seems okay to him. Rafael did
some cleanups.
[rjw:
* Made the patch stop breaking compilation on x86-32
* Added arch/x86/kernel/acpi/sleep.h
* Got rid of compiler warnings in arch/x86/kernel/acpi/sleep.c
* Fixed 32-bit compilation on x86-64 systems
* Added include/asm-x86/trampoline.h and fixed the non-SMP
compilation on 64-bit x86
* Removed arch/x86/kernel/acpi/sleep_32.c which was not used
* Fixed some breakage caused by the integration of smpboot.c done
under us in the meantime]
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this patch fixes section mismatch warnings (on x86_64 host) in setup_trampoline(),
which was referencing __initdata variables trampoline_data and trampoline_end.
Warning messages:
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x2b6a): Section mismatch in reference from the function setup_trampoline()
to the variable .init.data:trampoline_data
The function __cpuinit setup_trampoline() references
a variable __initdata trampoline_data.
If trampoline_data is only used by setup_trampoline then
annotate trampoline_data with a matching annotation.
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x2b71): Section mismatch in reference from the function setup_trampoline()
to the variable .init.data:trampoline_end
The function __cpuinit setup_trampoline() references
a variable __initdata trampoline_end.
If trampoline_end is only used by setup_trampoline then
annotate trampoline_end with a matching annotation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes mismatch warnings in smp_checks() (in arch/x86/kernel/smpboot.c):
WARNING: arch/x86/kernel/built-in.o(.text+0x11922): Section mismatch in reference from the function smp_checks()
to the variable .cpuinit.data:smp_b_stepping
The function smp_checks() references
the variable __cpuinitdata smp_b_stepping.
This is often because smp_checks lacks a __cpuinitdata
annotation or the annotation of smp_b_stepping is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > Make sure that we clear the "shutdown status flag" in the CMOS
> > register after each CPU is brought up. This fixes a problem where the
> > "shutdown status flag" may remain set when a CPU is brought up after
> > booting.
>
> btw., what problem does this result in, exactly?
The shutdown status flag set to "0xA", corresponds to "JMP double word
request without INT init".
This JMP at reboot time is at an unintended location. And results in
Triple faults in our case.
Though this error at reboot can be safely ignored in a VM environment,
am not sure what the effect would be on a physical system. May be it
will result in a triple fault and an eventual hardware reset thus
masking this BUG in the kernel.
This fix just makes sure that we reset that status flag after
initialization is done.
Fix paranoia about using BIOS quickboot mechanism.
Make sure that we clear the "shutdown status flag" in the CMOS register
after each CPU is brought up. This fixes a problem where the "shutdown
status flag" may remain set when a CPU is brought up after booting.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Dan Arai <arai@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use cpumask_of_cpu() rather than the pair of cpus_clear() and cpu_set().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
No need to clear the memory allocated by alloc_bootmem().
It is already filled with zero.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove duplicate code by using ioapic_read_entry() and ioapic_write_entry()
in io_apic_{32,64}.c
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If one can find an ack pending pin, there is no need to check
the rest of them.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In arch/x86/boot/compressed/misc.c, the variable vidmem is
the only variable that ends up in de data segment. It's also
superfluous, because the first thing the code does is:
if (RM_SCREEN_INFO.orig_video_mode == 7) {
vidmem = (char *) 0xb0000;
vidport = 0x3b4;
} else {
vidmem = (char *) 0xb8000;
vidport = 0x3d4;
}
This patch removes the initialisation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
without this patch:
VOYAGER:
kernel/built-in.o: In function `crash_kexec':
(.text+0x28588): undefined reference to `machine_crash_shutdown'
VISWS:
kernel/built-in.o: In function `crash_kexec':
/next-20080401/kernel/kexec.c:1074: undefined reference to `machine_crash_shutdown'
make[1]: *** [.tmp_vmlinux1] Error 1
because arch/x86/kernel/reboot.c isn't built since CONFIG_X86_BIOS_REBOOT=n,
so machine_crash_shutdown() isn't available.
This patch does seem a small bit odd since the KEXEC help text says that
kexec is independent of the system firmware.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I've now noticed that the machine I call MPENTIUM4 for 32-bit kernels
is called MPSC for 64-bit kernels, and in that case it still doesn't
get the P6 NOPs it ought to. hpa explains that MK8 should still be
excluded, so it's just a matter of including MPSC along with MPENTIUM4.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We should call for kfree if only we really need it.
Though it's safe to call kfree with NULL pointer passed
in this code we've already tested the pointer and can
eliminate the call
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu pointed out a bug in the previous patches,
fix double-shift of apicid.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cleanup references to the early cpu maps for the non-SMP configuration
and remove some functions called for SMP configurations only.
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
UV supports really big systems. So big, in fact, that the APICID register
does not contain enough bits to contain an APICID that is unique across all
cpus.
The UV BIOS supports 3 APICID modes:
- legacy mode. This mode uses the old APIC mode where
APICID is in bits [31:24] of the APICID register.
- x2apic mode. This mode is whitebox-compatible. APICIDs
are unique across all cpus. Standard x2apic APIC operations
(Intel-defined) can be used for IPIs. The node identifier
fits within the Intel-defined portion of the APICID register.
- x2apic-uv mode. In this mode, the APICIDs on each node have
unique IDs, but IDs on different node are not unique. For example,
if each mode has 32 cpus, the APICIDs on each node might be
0 - 31. Every node has the same set of IDs.
The UV hub is used to route IPIs/interrupts to the correct node.
Traditional APIC operations WILL NOT WORK.
In x2apic-uv mode, the ACPI tables all contain a full unique ID (note:
exact bit layout still changing but the following is close):
nnnnnnnnnnlc0cch
n = unique node number
l = socket number on board
c = core
h = hyperthread
Only the "lc0cch" bits are written to the APICID register. The remaining bits are
supplied by having the get_apic_id() function "OR" the extra bits into the value
read from the APICID register. (Hmmm.. why not keep the ENTIRE APICID register
in per-cpu data....)
The x2apic-uv mode is recognized by the MADT table containing:
oem_id = "SGI"
oem_table_id = "UV-X"
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add kernel support for new ACPI "sapic" tables that contain 16-bit APICIDs.
This patch simply adds parsing of an optional SAPIC table if present.
Otherwise, the traditional local APIC table is used.
Note: the SAPIC table is not a new ACPI table - it exists on other architectures
but is not currently recognized by x86_64.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Increase the number of bits in an apicid from 8 to 32.
By default, MP_processor_info() gets the APICID from the
mpc_config_processor structure. However, this structure limits
the size of APICID to 8 bits. This patch allows the caller of
MP_processor_info() to optionally pass a larger APICID that will
be used instead of the one in the mpc_config_processor struct.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add functions that can be used to determine if an x86_64
system is a SGI "UV" system. UV systems come in 3 types and
are identified by the OEM ID in the MADT.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce a function to read the local APIC_ID.
This change is in preparation for additional changes to
the APICID functions that will come in a later patch.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch renames VM_MASK to X86_VM_MASK (which
in turn defined as alias to X86_EFLAGS_VM) to better
distinguish from virtual memory flags. We can't just
use X86_EFLAGS_VM instead because it is also used
for conditional compilation
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The memory resource is also used for main memory, and we need it to
allocate physical addresses for memory hotplug. Knobbling io space is
enough to get the job done anyway.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
xen does not use the global cpu_initialized mask, but rather,
a specific one. So we change its name so it won't conflict with the upcoming
movement of cpu_initialized_mask from smp_64.h to smp_32.h.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Report when microcode was successfully updated. It used to be there but
now with DEBUG unset it becomes very silent. Also some cosmetic fixes.
Signed-off-by: Ben Castricum <lk08@bencastricum.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Upcoming 64 bit processors from Centaur can use sysenter.
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Jesse Ahrens <jahrens@centtech.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
By including processor-flags.h we are allowed to use predefined
macroses instead of keeping own ones
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On AMD SMM protected memory is part of the address map, but handled
internally like an MTRR. That leads to large pages getting split
internally which has some performance implications. Check for the
AMD TSEG MSR and split the large page mapping on that area
explicitely if it is part of the direct mapping.
There is also SMM ASEG, but it is in the first 1MB and already covered by
the earlier split first page patch.
Idea for this came from an earlier patch by Andreas Herrmann
On a RevF dual Socket Opteron system kernbench shows a clear
improvement from this:
(together with the earlier patches in this series, especially the
split first 2MB patch)
[lower is better]
no split stddev split stddev delta
Elapsed Time 87.146 (0.727516) 84.296 (1.09098) -3.2%
User Time 274.537 (4.05226) 273.692 (3.34344) -0.3%
System Time 34.907 (0.42492) 34.508 (0.26832) -1.1%
Percent CPU 322.5 (38.3007) 326.5 (44.5128) +1.2%
=> About 3.2% improvement in elapsed time for kernbench.
With GB pages on AMD Fam1h the impact of splitting is much higher of course,
since it would split two full GB pages (together with the first
1MB split patch) instead of two 2MB pages. I could not benchmark
a clear difference in kernbench on gbpages, so I kept it disabled
for that case
That was only limited benchmarking of course, so if someone
was interested in running more tests for the gbpages case
that could be revisited (contributions welcome)
I didn't bother implementing this for 32bit because it is very
unlikely the 32bit lowmem mapping overlaps into the TSEG near 4GB
and the 2MB low split is already handled for both.
[ mingo@elte.hu: do it on gbpages kernels too, there's no clear reason
why it shouldnt help there. ]
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Intel recommends to not use large pages for the first 1MB
of the physical memory because there are fixed size MTRRs there
which cause splitups in the TLBs.
On AMD doing so is also a good idea.
The implementation is a little different between 32bit and 64bit.
On 32bit I just taught the initial page table set up about this
because it was very simple to do. This also has the advantage
that the risk of a prefetch ever seeing the page even
if it only exists for a short time is minimized.
On 64bit that is not quite possible, so use set_memory_4k() a little
later (in check_bugs) instead.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a new function to force split large pages into 4k pages.
This is needed for some followup optimizations.
I had to add a new field to cpa_data to pass down the information
that try_preserve_large_page should not run.
Right now no set_page_4k() because I didn't need it and all the
specialized users I have in mind would be more comfortable with
pure addresses. I also didn't export it because it's unlikely
external code needs it.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When end_pfn is not aligned to 2MB (or 1GB) then the kernel might
map more memory than end_pfn. Account this in max_pfn_mapped.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Even on 32bit 2MB pages can map more memory than is in the true
max_low_pfn if end_pfn is not highmem and not aligned to 2MB.
Add a end_pfn_map similar to x86-64 that accounts for this
fact. This is important for code that really needs to know about
all mapping aliases.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: andreas.herrmann3@amd.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently they are in .text.head because the rest of head_64.S.
.text.head is not removed as init data, but the early exception handlers
should be because they are not needed after early boot of the BP.
So move them over.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The early exception handlers are currently set up using a macro
recursion. There is only one user left. Replace the macro with a
standard loop in place.
Noop patch, just a cleanup.
[ tglx@linutronix.de: simplified ]
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All of early setup runs with interrupts disabled, so there is no
need to set up early exception handlers for vectors >= 32
This saves some minor text size.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Ingo Molnar (mingo@elte.hu) wrote:
>
> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
>
> > The shadow vmap for DEBUG_RODATA kernel text modification uses
> > virt_to_page to get the pages from the pointer address.
> >
> > However, I think vmalloc_to_page would be required in case the page is
> > used for modules.
> >
> > Since only the core kernel text is marked read-only, use
> > kernel_text_address() to make sure we only shadow map the core kernel
> > text, not modules.
>
> actually, i think we should mark module text readonly too.
>
Yes, but in the meantime, the x86 tree would need this patch to make
kprobes work correctly on modules.
I suspect that without this fix, with the enhanced hotplug and kprobes
patch, kprobes will use text_poke to insert breakpoints in modules
(vmalloced pages used), which will map the wrong pages and corrupt
random kernel locations instead of updating the correct page.
Work that would write protect the module pages should clearly be done,
but it can come in a later time. We have to make sure we interact
correctly with the page allocation debugging, as an example.
Here is the patch against x86.git 2.6.25-rc5 :
The shadow vmap for DEBUG_RODATA kernel text modification uses virt_to_page to
get the pages from the pointer address.
However, I think vmalloc_to_page would be required in case the page is used for
modules.
Since only the core kernel text is marked read-only, use kernel_text_address()
to make sure we only shadow map the core kernel text, not modules.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
vSMP detection: access pci config space early in boot to detect if the
system is a vSMPowered box, and cache the result in a flag, so that
is_vsmp_box() retrieves the value of the flag always.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The sysenter path tries to enable interrupts immediately. Unfortunately
this doesn't work in a paravirt environment, because not enough kernel
state has been set up at that point (namely, pointing %fs to the kernel
percpu data segment). To fix this, defer ENABLE_INTERRUPTS until after
the kernel state has been set up.
Unfortunately this means that we're running with interrupts disabled
for a while without calling the IRQ tracing code, but that can't be
called without setting up %fs either.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch does clean up relocate_kernel_(32|64).S a bit by getting rid
of local PAGE_ALIGNED macro. We should use well-known PAGE_SIZE instead
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Allow the maximum number of nodes in an x86_64 system to
be configurable. This patch does NOT change the default value
but allows the value to be a config option.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/math-emu/reg_ld_str.c:380: warning: 'l[0]' may be used uninitialized in this function
arch/x86/math-emu/reg_ld_str.c:380: warning: 'l[1]' may be used uninitialized in this function
I can't actually spot the bug here. There's one obvious place, but fixing
that didn't shut the warning up.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/math-emu/fpu_entry.c:555: warning: 'entry_sel_off.empty' is used uninitialized in this function
Presumably it's harmless, but I'll sleep better at night knowing that we
initialised it.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make the PAT related printks in ioremap pr_debug.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Bug fixes for reserve_memtype() call in __ioremap and pci_mmap_page_range().
If reserve_memtype returns non-zero, then it is an error and subsequent free is
not required. Requested and returned prot value check should be done when
reserve_memtype returns success.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
make known_pat_cpu to think amd k8 and fam10h is ok too.
also make tom2 below to be WRBACK
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix double help section in PAT Kconfig. Thanks to Randy Dunlap for catching
this bug.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Adds debug prints at critical code. Adds enough info in dmesg to allow us to
do effective first round of analysis of any issues that may result due to PAT
patch series.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce ioremap_wc for wc remap.
(generic wrapper is in a later patch)
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a set_memory_wc interface(), similar to set_memory_uc interface.
Callers has to call set_memory_uc, set_memory_wb and
set_memory_wc, set_memory_wb as pairs.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add reserve_memtype and free_memtype wrapper for pci_mmap_page_range. Free
is called on unmap, but identity map continues to be mapped as per
pci_mmap_page_range request, until next request for the same region calls
ioremap_change_attr(), which will go through without conflict. This way of
mapping is identical to one used in ioremap/iounmap.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use reserve_memtype and free_memtype interfaces in set_memory_uc/set_memory_wb
interfaces to avoid aliasing.
Usage model of set_memory_uc and set_memory_wb is for RAM memory and users
will first call set_memory_uc and call set_memory_wb after use to reset the
attribute.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use reserve_memtype and free_memtype interfaces in ioremap/iounmap to avoid
aliasing.
If there is an existing alias for the region, inherit the memory type from
the alias. If there are conflicting aliases for the entire region, then fail
ioremap.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make ioremap_change_attr() non-static and use prot_val in place of ioremap_mode.
This interface is used in subsequent PAT patches.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Sets up pat_init() infrastructure.
PAT MSR has following setting.
PAT
|PCD
||PWT
|||
000 WB _PAGE_CACHE_WB
001 WC _PAGE_CACHE_WC
010 UC- _PAGE_CACHE_UC_MINUS
011 UC _PAGE_CACHE_UC
We are effectively changing WT from boot time setting to WC.
UC_MINUS is used to provide backward compatibility to existing /dev/mem
users(X).
reserve_memtype and free_memtype are new interfaces for maintaining alias-free
mapping. It is currently implemented in a simple way with a linked list and
not optimized. reserve and free tracks the effective memory type, as a result
of PAT and MTRR setting rather than what is actually requested in PAT.
pat_init piggy backs on mtrr_init as the rules for setting both pat and mtrr
are same.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Initializing to zero is generally bad idea, I hope it is right for
__init data, too.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
do simple memtest after init_memory_mapping
use find_e820_area_size to find all ram range that is not reserved.
and do some simple bits test to find some bad ram.
if find some bad ram, use reserve_early to exclude that range.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After an experimental cleanup of <linux/percpu.h>, these files were
exposed as invoking kmalloc() without including <linux/slab.h>.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I was trying to get the address of instruction to be executed
next after the kprobed instruction. But regs->eip in post_handler()
contains value which is useless to the user. It's pre-corrected value.
This value is difficult to use without access to resume_execution(), which
is not exported anyway.
I moved the invocation of post_handler() to *after* resume_execution().
Now regs->eip contains meaningful value in post_handler().
I do not think this change breaks any backward-compatibility.
To make meaning of the old value, post_handler() would need access to
resume_execution() which is not exported. I have difficulty to believe
that previous, uncorrected, regs->eip can be meaningfully used in
post_handler().
Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use force_sig in handle_vm86_trap like other machine traps do.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The PT_DTRACE flag is meaningless and obsolete.
Don't touch it.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The previous "x86_64 ia32 ptrace vs -ENOSYS" fix only covered
the int $0x80 system call entries. This does the same fix
for the sysenter and syscall instruction paths.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we're stopped at syscall entry tracing, ptrace can change the %rax
value from -ENOSYS to something else. If no system call is actually made
because the syscall number (now in orig_rax) is bad, then we now always
reset %rax to -ENOSYS again.
This changes it to leave the return value alone after entry tracing.
That way, the %rax value set by ptrace is there to be seen in user mode
(or in syscall exit tracing). This is consistent with what the 32-bit
kernel does.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we're stopped at syscall entry tracing, ptrace can change the %eax
value from -ENOSYS to something else. If no system call is actually made
because the syscall number (now in orig_eax) is bad, then the %eax value
set by ptrace should be returned to the user. But, instead it gets reset
to -ENOSYS again. This is a regression from the native 32-bit kernel.
This change fixes it by leaving the return value alone after entry tracing.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch removes the write-only timer_uses_ioapic_pin_0
(gsi can't be <= 15 in the line of it's fake usage in mpparse_32.c).
Spotted by the GNU C compiler.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Indicate TSCs are unreliable as time sources if the platform is
a multi chassi ScaleMP vSMPowered machine.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Re-arrange set_vsmp_pv_ops so that pv_ops are set only if
the platform has capability to support paravirtualized irq ops
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- Fix the the build breakage when PARAVIRT is defined
but PCI is not
This fixes problem reported at:
http://marc.info/?l=linux-kernel&m=120525966600698&w=2
- Make is_vsmp_box() available even when PARAVIRT is not defined.
This is needed to determine if tsc's are reliable as a time source
even when PARAVIRT is not defined.
- split vsmp_init to use is_vsmp_box() and set_vsmp_pv_ops()
set_vsmp_pv_ops will do nothing if PCI is not enabled in the config.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
is_vsmp_box() currently does not work on vSMPowered systems, as pci cfg
space is not read correctly -- This patch fixes it.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the last leftovers from the files. Move the ones
that are still used to the files they belong, the others
that grep can't reach, simply throw away.
Merge comments ontop of file and that's it: smpboot integrated
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They are i386 specific (the x86_64 definitions live
elsewhere, and should remain there), so are enclosed around
an ifdef
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this is the last remaining function in smpboot_32.c
Since it is i386 specific, move it around an ifdef to
smpboot.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the previous changes, code for native_smp_prepare_cpus()
in i386 and x86_64 now look very similar. merge them into
smpboot.c. Minor differences are inside ifdef
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86_64 has two nr_ioapics = 0 statements. In 32-bit, it can be done
too. We do it through the smpboot_clear_io_apic() inline function,
to cope with subarchitectures (visws) that does not compile mpparse in
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They are mostly inocuous. APIC_INTEGRATED will expand to 1,
check_phys_apicid_present is checking for the same thing it was before,
etc. But the code is identical to i386 now, and will allow us to
integrate it.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This test exists in x86_64 and also applies to i386. So we add it
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
An APIC test is moved, and code is replaced by the mach-default
already defined function (smpboot_setup_io_apic).
setup_portio_remap() is added, but it is a nop in mach-default.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add function calls to native_smp_prepare_cpus in i386
to match x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch get rid of smp_boot_cpus(), since it does not
boot any cpu anymore. Its code is split in a way to make
it closer to x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
if smp configuration is not found at all, hook into 0.
This is done to match x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They look similar enough, and are merged. Only difference
(zap_low_mapping for i386) is inside ifdef
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
it is practically the same between arches now, so it is
moved to smpboot.c. Minor differences (gdt initialization)
live inside an ifdef
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It now looks the same between architectures, so we
merge it in smpboot.c. Minor differences goes inside
an ifdef
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a very large patch, because it depends on a lot
of auxiliary static functions. But they all have been modified
to the point that they're sufficiently close now. So they're just
merged in smpboot.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is to match i386. The former name was cuter,
but the current is more meaningful and more general,
since cpu_id can be a logical id.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
voyager would conflict with it, but the types are ultimately
compatible. So remove the extern definition from voyager_smp.c
in favour of the common one
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move map_cpu_to_logical_apicid() and unmap_cpu_to_logical_apicid()
to smpboot.c. They take together all the bunch of static functions
they rely upon
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that we boot cpus here, callin_map has this meaning (same
as x86_64)
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
wakeup_secondary_via_INIT => wakeup_secondary_cpu.
This is to match i386, where init is not always used.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After the inclusion, a lot of files needs fixing for conflicts,
some of them in the headers themselves, to accomodate for both
i386 and x86_64 versions.
[ mingo@elte.hu: build fix ]
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch provides minor adjustments for do_boot_cpus
in both architectures to allow for integration
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We do it to make it close to x86_64. The later needs it,
otherwise the nmi watchdog can get into the scene and kill us
with a hammer.
Enabling irqs here used to trigger a bug in i386. This is because
time irq handling relies upon structures that are only initialized
after smp initcalls (More precisely, it will find
per_cpu(hrtimer_bases, cpu)->cb_pending list not initialized and crash)
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It splits setup_local_APIC in two, providing a function corresponding
to the ending part of it. As a side effect, smp_callin looks the same
between i386 and x86_64.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
it is a little bit more complicated than x86_64 due to erratas and
other stuff, but its existance will ease integration
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We introduce empty macros just to make them look like the same
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use a new worker, with help of the create_idle struct
to fork the idle thread. We now have two workers, the first
of them triggered by __smp_prepare_cpu. But the later is
going away soon.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After all the infrastructure work, we're now prepared
to boot the cpus from cpu_up, and not from prepare_cpus.
So the difference between cold boot and hotplug is effectively
over, and the functions are used to the purposes they're meant to.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It was okay when cpus were cold booted before this point.
But with the new state machine, they will not have arrived to
the trampoline yet. zapping low mappings will have the bad effect
of breaking it completely after paging enablement
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Only call schedule_work if keventd is already running.
This is already the way x86_64 does
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
it is redundant, since it is already done by set_cpu_sibling_map()
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Two more files goes away. nmi_64.h and nmi_32.h gives birth
to nmi.h
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We move it to apic_32.c, since it's irq related anyway,
and only called from that file.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We do it and also fix conflicts, which makes x86_64 automatically
closer to i386
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Do it and also fix conflicts, which automatically makes
x86_64 look closer to i386
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this patch allows x86_64 to use subarch mach_ headers
in practice, since x86_64 does not have any subarch, it
will use mach_default. But it will allow for substantially
less code duplication
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the cpu count is changed accordingly: now, what matters is
online cpus.
Also, we add those functions for x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impressing friends is a very important thing.
Do it in a separate function to make it even more
explicit, and ease integration.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is the way x86_64 does, and complement the already
present patch that does the bios cpu to apicid mapping here
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We fill the per-cpu (or array) that maps
bios cpu id to apicid in mpparse_32.c, the way x86_64 does
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We use the same routing as x86_64, moved now to setup.c.
Just with a few ifdefs inside.
Note that this routing uses prefill_possible_map().
It has the very nice side effect of allowing hotplugging of
cpus that are marked as present but disabled by acpi bios.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this will serve as a reference as to whether or not to
use the per_cpu variables in mpparse. Done the same way
as x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This mapping already exists in x86_64, just provide it for
i386
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We have already removed the only condition that could fail here.
so just don't test for any return value
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Do tests before do_boot_cpu in native_cpu_up for i386.
Tests are a little bit broader than originally, and are the
same as x86_64. Test for smp_callin is not applicable right now
and is deferred.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Isolate all sanity checking in a smp_sanity_check()
function as x86_64 does.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
goal is to have i386 and x86_64 closer, so we
add barriers to match
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch does not change the behaviour of x86_64, since APIC_INTEGRATED
is always defined as (1). But the code now matches exactly i386 version
(well, this part of the code, at least)
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is done so we call setup_secondary_clock() in the same place x86_64
does. A separate patch for this is appearantly not needed. But clock
initialization is such a delicate thing, that it's safer to do this way
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This matches x86_64 behaviour, which is a superior one IMHO
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
now that it is the same between arches, put it into smpboot.c
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Call it conditionally for secondary cpus. This behaviour
matches i386
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
provide two specialized identify_secondary_cpu() and identify_boot_cpu()
routines for x86_64. Although not strictly needed, they are functionally
correct, and will ease integration with i386
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The split of smp_store_cpu_info in a quirks-only part
will ease integration with x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is used to match i386. The definition for the non-paravirt
case is moved to smp.h instead of smp_32.h
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch replaces apic_read() for apic_read_around()
and apic_write for apic_write_around() in smpboot_64.c
We do it to have a common usage between x86_64 and i386.
In the former, it will always simply expand to apic_write
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add loglevel facilities to printks in __inquire_remote_apic.
the levels are the ones to match x86_64 ones.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
change some variables' types in __inquire_remote_apic to
match x86_64
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix coding style in pci-dma_64.c and add stubs for documentation. I
hope someone fills the rest, I understand maybe off and soft...
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>