Commit Graph

8320 Commits

Author SHA1 Message Date
Linus Torvalds
e80fb7e52f Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Set correct normal_prio and prio values in sched_fork()
2009-10-08 12:07:24 -07:00
Linus Torvalds
f17f36bb1c Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: user local buffer variable for trace branch tracer
  tracing: fix warning on kernel/trace/trace_branch.c andtrace_hw_branches.c
  ftrace: check for failure for all conversions
  tracing: correct module boundaries for ftrace_release
  tracing: fix transposed numbers of lock_depth and preempt_count
  trace: Fix missing assignment in trace_ctxwake_*
  tracing: Use free_percpu instead of kfree
  tracing: Check total refcount before releasing bufs in profile_enable failure
2009-10-08 12:06:09 -07:00
Linus Torvalds
b924f9599d Merge branch 'sparc-perf-events-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sparc-perf-events-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  mm, perf_event: Make vmalloc_user() align base kernel virtual address to SHMLBA
  perf_event: Provide vmalloc() based mmap() backing
2009-10-08 12:05:50 -07:00
Linus Torvalds
b9d40b7b1e Merge branch 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_events: Make ABI definitions available to userspace
  perf tools: elf_sym__is_function() should accept "zero" sized functions
  tracing/syscalls: Use long for syscall ret format and field definitions
  perf trace: Update eval_flag() flags array to match interrupt.h
  perf trace: Remove unused code in builtin-trace.c
  perf: Propagate term signal to child
2009-10-08 12:05:00 -07:00
Steven Rostedt
8f6e8a314a tracing: user local buffer variable for trace branch tracer
Just using the tr->buffer for the API to trace_buffer_lock_reserve
is not good enough. This is because the tr->buffer may change, and we
do not want to commit with a different buffer that we reserved from.

This patch uses a local variable to hold the buffer that was used to
reserve and commit with.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-10-07 21:53:41 -04:00
Zhenwen Xu
c8647b2872 tracing: fix warning on kernel/trace/trace_branch.c andtrace_hw_branches.c
fix warnings that caused the API change of trace_buffer_lock_reserve()
change files: kernel/trace/trace_hw_branch.c
              kernel/trace/trace_branch.c

Signed-off-by: Zhenwen Xu <helight.xu@gmail.com>
LKML-Reference: <20091008012146.GA4170@helight>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-10-07 21:52:03 -04:00
Steven Rostedt
3279ba37db ftrace: check for failure for all conversions
Due to legacy code from back when the dynamic tracer used a daemon,
only core kernel code was checking for failures. This is no longer
the case. We must check for failures any time we perform text modifications.

Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-10-07 17:22:24 -04:00
jolsa@redhat.com
e7247a15ff tracing: correct module boundaries for ftrace_release
When the module is about the unload we release its call records.
The ftrace_release function was given wrong values representing
the module core boundaries, thus not releasing its call records.

Plus making ftrace_release function module specific.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1254934835-363-3-git-send-email-jolsa@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-10-07 15:52:09 -04:00
Steven Rostedt
829b876dfc tracing: fix transposed numbers of lock_depth and preempt_count
The lock_depth and preempt_count numbers in the latency format is
transposed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-10-07 14:05:04 -04:00
Eero Nurkkala
fdc6f192e7 NOHZ: update idle state also when NOHZ is inactive
Commit f2e21c9610 had unfortunate side
effects with cpufreq governors on some systems.

If the system did not switch into NOHZ mode ts->inidle is not set when
tick_nohz_stop_sched_tick() is called from the idle routine. Therefor
all subsequent calls from irq_exit() to tick_nohz_stop_sched_tick()
fail to call tick_nohz_start_idle(). This results in bogus idle
accounting information which is passed to cpufreq governors.

Set the inidle flag unconditionally of the NOHZ active state to keep
the idle time accounting correct in any case.

[ tglx: Added comment and tweaked the changelog ]

Reported-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Eero Nurkkala <ext-eero.nurkkala@nokia.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: stable@kernel.org
LKML-Reference: <1254907901.30157.93.camel@eenurkka-desktop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-07 13:05:05 +02:00
Hiroshi Shimamoto
b0f56f1a63 trace: Fix missing assignment in trace_ctxwake_*
The state char variable S should be reassigned, if S == 0.

We are missing the state of the task that is going to sleep for the
context switch events (in the raw mode).

Fortunately the problem arises with the sched_switch/wake_up
tracers, not the sched trace events.

The formers are legacy now. But still, that was buggy.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4AC43118.6050409@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-06 14:28:24 +02:00
Peter Zijlstra
906010b213 perf_event: Provide vmalloc() based mmap() backing
Some architectures such as Sparc, ARM and MIPS (basically
everything with flush_dcache_page()) need to deal with dcache
aliases by carefully placing pages in both kernel and user maps.

These architectures typically have to use vmalloc_user() for this.

However, on other architectures, vmalloc() is not needed and has
the downsides of being more restricted and slower than regular
allocations.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: David Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1254830228.21044.272.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-06 14:21:50 +02:00
Tom Zanussi
ee949a86b3 tracing/syscalls: Use long for syscall ret format and field definitions
The syscall event definitions use long for the syscall exit ret
value, but unsigned long for the same thing in the format and field
definitions.  Change them all to long.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: rostedt@goodmis.org
Cc: lizf@cn.fujitsu.com
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1254808849-7829-4-git-send-email-tzanussi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-06 12:02:34 +02:00
Linus Torvalds
41cb6654eb Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tools: Run generate-cmdlist.sh properly
  perf_event: Clean up perf_event_init_task()
  perf_event: Fix event group handling in __perf_event_sched_*()
  perf timechart: Add a power-only mode
  perf top: Add poll_idle to the skip list
2009-10-05 12:04:41 -07:00
Linus Torvalds
e69a9ac596 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimer: Remove overly verbose "switch to high res mode" message
2009-10-05 12:04:16 -07:00
Linus Torvalds
0f26ec69f0 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kmemtrace: Fix up tracer registration
  tracing: Fix infinite recursion in ftrace_update_pid_func()
2009-10-05 12:03:43 -07:00
Peter Williams
f83f9ac263 sched: Set correct normal_prio and prio values in sched_fork()
normal_prio should be updated if policy changes from RT to
SCHED_MORMAL or if static_prio/nice is changed.

Some paths through sched_fork() ignore this requirement and may
result in normal_prio having an invalid value.

Fixing this issue allows the call to effective_prio() in
wake_up_new_task() to be removed.

Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <f8f46736fd4e7f090ac0.1253774830@mudlark.pw.nest>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-05 13:42:20 +02:00
Frederic Weisbecker
75fb4090b3 tracing: Use free_percpu instead of kfree
In the event->profile_enable() failure path, we release the per cpu
buffers using kfree which is wrong because they are per cpu pointers.
Although free_percpu only wraps kfree for now, that may change in the
future so lets use the correct way.

Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2009-10-05 10:57:56 +02:00
Frederic Weisbecker
fe8e5b5a60 tracing: Check total refcount before releasing bufs in profile_enable failure
When we call the profile_enable() callback of an event, we release the
shared perf event tracing buffers unconditionnaly in the failure path.
This is wrong because there may be other users of these. Then check the
total refcount before doing this.

Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2009-10-05 10:57:41 +02:00
Linus Torvalds
58e57fbd1c Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (41 commits)
  Revert "Seperate read and write statistics of in_flight requests"
  cfq-iosched: don't delay async queue if it hasn't dispatched at all
  block: Topology ioctls
  cfq-iosched: use assigned slice sync value, not default
  cfq-iosched: rename 'desktop' sysfs entry to 'low_latency'
  cfq-iosched: implement slower async initiate and queue ramp up
  cfq-iosched: delay async IO dispatch, if sync IO was just done
  cfq-iosched: add a knob for desktop interactiveness
  Add a tracepoint for block request remapping
  block: allow large discard requests
  block: use normal I/O path for discard requests
  swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL
  fs/bio.c: move EXPORT* macros to line after function
  Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
  cciss: fix build when !PROC_FS
  block: Do not clamp max_hw_sectors for stacking devices
  block: Set max_sectors correctly for stacking devices
  cciss: cciss_host_attr_groups should be const
  cciss: Dynamically allocate the drive_info_struct for each logical drive.
  cciss: Add usage_count attribute to each logical drive in /sys
  ...
2009-10-04 12:39:14 -07:00
KAMEZAWA Hiroyuki
4e649152cb memcg: some modification to softlimit under hierarchical memory reclaim.
This patch clean up/fixes for memcg's uncharge soft limit path.

Problems:
  Now, res_counter_charge()/uncharge() handles softlimit information at
  charge/uncharge and softlimit-check is done when event counter per memcg
  goes over limit. Now, event counter per memcg is updated only when
  memory usage is over soft limit. Here, considering hierarchical memcg
  management, ancesotors should be taken care of.

  Now, ancerstors(hierarchy) are handled in charge() but not in uncharge().
  This is not good.

  Prolems:
  1. memcg's event counter incremented only when softlimit hits. That's bad.
     It makes event counter hard to be reused for other purpose.

  2. At uncharge, only the lowest level rescounter is handled. This is bug.
     Because ancesotor's event counter is not incremented, children should
     take care of them.

  3. res_counter_uncharge()'s 3rd argument is NULL in most case.
     ops under res_counter->lock should be small. No "if" sentense is better.

Fixes:
  * Removed soft_limit_xx poitner and checks in charge and uncharge.
    Do-check-only-when-necessary scheme works enough well without them.

  * make event-counter of memcg incremented at every charge/uncharge.
    (per-cpu area will be accessed soon anyway)

  * All ancestors are checked at soft-limit-check. This is necessary because
    ancesotor's event counter may never be modified. Then, they should be
    checked at the same time.

Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:13 -07:00
KAMEZAWA Hiroyuki
3dece8347d cgroup: catch bad css refcnt at css_put
__css_put() doesn't check a bug as refcnt goes to minus.
I think it should be caught. This patch adds a check for it.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:12 -07:00
Alexey Dobriyan
828c09509b const: constify remaining file_operations
[akpm@linux-foundation.org: fix KVM]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:11 -07:00
Paul Mundt
3ae91c21dd module: fix up CONFIG_KALLSYMS=n build.
Starting from commit 4a4962263f "reduce
symbol table for loaded modules (v2)", the kernel/module.c build is broken
with CONFIG_KALLSYMS disabled.

  CC      kernel/module.o
kernel/module.c:1995: warning: type defaults to 'int' in declaration of 'Elf_Hdr'
kernel/module.c:1995: error: expected ';', ',' or ')' before '*' token
kernel/module.c: In function 'load_module':
kernel/module.c:2203: error: 'strmap' undeclared (first use in this function)
kernel/module.c:2203: error: (Each undeclared identifier is reported only once
kernel/module.c:2203: error: for each function it appears in.)
kernel/module.c:2239: error: 'symoffs' undeclared (first use in this function)
kernel/module.c:2239: error: implicit declaration of function 'layout_symtab'
kernel/module.c:2240: error: 'stroffs' undeclared (first use in this function)
make[1]: *** [kernel/module.o] Error 1
make: *** [kernel/module.o] Error 2

There are three different issues:

    - layout_symtab() takes a const Elf_Ehdr

    - layout_symtab() needs to return a value

    - symoffs/stroffs/strmap are referenced by the load_module() code
      despite being ifdefed out, which seems unnecessary given the noop
      behaviour of layout_symtab()/add_kallsyms() in the case of
      CONFIG_KALLSYMS=n.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:11 -07:00
Jun'ichi Nomura
b0da3f0dad Add a tracepoint for block request remapping
Since 2.6.31 now has request-based device-mapper, it's useful to have
a tracepoint for request-remapping as well as bio-remapping.
This patch adds a tracepoint for request-remapping, trace_block_rq_remap().

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01 21:19:34 +02:00
Zdenek Kabelac
48c0d4d4c0 Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
introduced in commit 1d54ad6da9.
Release kobject also in case the request_fn is NULL.

Problem was noticed via kmemleak backtrace when some sysfs entries were
note properly destroyed during  device removal:

unreferenced object 0xffff88001aa76640 (size 80):
  comm "lvcreate", pid 2120, jiffies 4294885144
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff  .........e......
    90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff  .f........S.....
  backtrace:
    [<ffffffff813f9cc6>] kmemleak_alloc+0x26/0x60
    [<ffffffff8111d693>] kmem_cache_alloc+0x133/0x1c0
    [<ffffffff81195891>] sysfs_new_dirent+0x41/0x120
    [<ffffffff81194b0c>] sysfs_add_file_mode+0x3c/0xb0
    [<ffffffff81197c81>] internal_create_group+0xc1/0x1a0
    [<ffffffff81197d93>] sysfs_create_group+0x13/0x20
    [<ffffffff810d8004>] blk_trace_init_sysfs+0x14/0x20
    [<ffffffff8123f45c>] blk_register_queue+0x3c/0xf0
    [<ffffffff812447e4>] add_disk+0x94/0x160
    [<ffffffffa00d8b08>] dm_create+0x598/0x6e0 [dm_mod]
    [<ffffffffa00de951>] dev_create+0x51/0x350 [dm_mod]
    [<ffffffffa00de823>] ctl_ioctl+0x1a3/0x240 [dm_mod]
    [<ffffffffa00de8f2>] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod]
    [<ffffffff81177bfd>] compat_sys_ioctl+0xcd/0x4f0
    [<ffffffff81036ed8>] sysenter_dispatch+0x7/0x2c
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01 21:15:46 +02:00
Paul Mundt
f9ac5a69ed kmemtrace: Fix up tracer registration
Commit ddc1637af2 ("kmemtrace: Print
binary output only if 'bin' option is set") ended up inverting the
error detection logic. register_tracer() returns 0 on success,
which this change caused to treat as an error, resulting in:

[    0.132000] Warning: could not register the kmem tracer

as well as bailing out of the initcall with an error value. This
restores the old logic.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <20090928075540.GD6668@linux-sh.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01 11:53:44 +02:00
Xiao Guangrong
27f9994c50 perf_event: Clean up perf_event_init_task()
While at it: we can traverse ctx->group_list to get all
group leader, it should be safe since we hold ctx->mutex.

Changlog v1->v2:

  - remove WARN_ON_ONCE() according to Peter Zijlstra's suggestion

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <4ABC5AF9.6060808@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01 09:30:44 +02:00
Xiao Guangrong
8c9ed8e14c perf_event: Fix event group handling in __perf_event_sched_*()
Paul Mackerras says:

 "Actually, looking at this more closely, it has to be a group
 leader anyway since it's at the top level of ctx->group_list.  In
 fact I see four places where we do:

  list_for_each_entry(event, &ctx->group_list, group_entry) {
	if (event == event->group_leader)
		...

 or the equivalent, three of which appear to have been introduced
 by afedadf2 ("perf_counter: Optimize sched in/out of counters")
 back in May by Peter Z.

 As far as I can see the if () is superfluous in each case (a
 singleton event will be a group of 1 and will have its
 group_leader pointing to itself)."

 [ See: http://marc.info/?l=linux-kernel&m=125361238901442&w=2 ]

And Peter Zijlstra points out this is a bugfix:

 "The intent was to call event_sched_{in,out}() for single event
  groups because that's cheaper than group_sched_{in,out}(),
  however..

  - as you noticed, I got the condition wrong, it should have read:

      list_empty(&event->sibling_list)

  - it failed to call group_can_go_on() which deals with ->exclusive.

  - it also doesn't call hw_perf_group_sched_in() which might break
    power."

 [ See: http://marc.info/?l=linux-kernel&m=125369523318583&w=2 ]

Changelog v1->v2:

 - Fix the title name according to Peter Zijlstra's suggestion

 - Remove the comments and WARN_ON_ONCE() as Peter Zijlstra's
   suggestion

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <4ABC5A55.7000208@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01 09:30:44 +02:00
Matt Fleming
33974093c0 tracing: Fix infinite recursion in ftrace_update_pid_func()
When CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST is enabled
__ftrace_trace_function contains the current trace function, not
ftrace_trace_function.

In ftrace_update_pid_func() we currently incorrectly assign the
value of ftrace_trace_function to __ftrace_trace_funcion before
returning.

Without this patch it is possible to execute an infinite recursion
whereby ftrace_test_stop_func() calls __ftrace_trace_function,
which was assigned ftrace_test_stop_func() in
ftrace_update_pid_func().

Signed-off-by: Matt Fleming <matthew.fleming@imgtec.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1254152581-18347-1-git-send-email-matt@console-pimps.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01 08:19:24 +02:00
Eric Dumazet
152f9d0710 sched_clock: Fix atomicity/continuity bug by using cmpxchg64()
Commit def0a9b257 (sched_clock: Make it NMI safe) assumed
cmpxchg() of 64bit values was available on X86_32.

That is not so - and causes some subtle scheduler misbehavior due
to incorrect timestamps off to up by ~4 seconds.

Two symptoms are known right now:

 - interactivity problems seen by Arjan: up to 600 msecs
   latencies instead of the expected 20-40 msecs. These
   latencies are very visible on the desktop.

 - incorrect CPU stats: occasionally too high percentages in 'top',
   and crazy CPU usage stats.

Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090930170754.0886ff2e@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-30 22:56:10 +02:00
Alexey Dobriyan
f0f37e2f77 const: mark struct vm_struct_operations
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-27 11:39:25 -07:00
Linus Torvalds
6f5071020d Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimer: Eliminate needless reprogramming of clock events device
2009-09-27 10:39:04 -07:00
Linus Torvalds
3b383767c4 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: Add memory barrier commentary to futex_wait_queue_me()
  futex: Fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me()
  futex: Correct futex_q woken state commentary
  futex: Make function kernel-doc commentary consistent
  futex: Correct queue_me and unqueue_me commentary
  futex: Correct futex_wait_requeue_pi() commentary
2009-09-26 10:15:53 -07:00
Linus Torvalds
179b9145d5 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clocksource: Resume clocksource without taking the clocksource mutex
2009-09-26 10:14:41 -07:00
Linus Torvalds
4187e7e9f1 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  modules, tracing: Remove stale struct marker signature from module_layout()
  tracing/workqueue: Use %pf in workqueue trace events
  tracing: Fix a comment and a trivial format issue in tracepoint.h
  tracing: Fix failure path in ftrace_regex_open()
  tracing: Fix failure path in ftrace_graph_write()
  tracing: Check the return value of trace_get_user()
  tracing: Fix off-by-one in trace_get_user()
2009-09-26 10:13:54 -07:00
Roland Dreier
d3f6302e7e hrtimer: Remove overly verbose "switch to high res mode" message
On big systems, printing <number of CPUs> copies of

    Switched to high resolution mode on CPU nnn

clutters up the kernel log for minimal gain.  Just get rid of them.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
LKML-Reference: <ada1vlw126s.fsf_-_@cisco.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-26 16:58:09 +02:00
David S. Miller
8b3f6af863 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
	drivers/staging/Kconfig
	drivers/staging/Makefile
	drivers/staging/cpc-usb/TODO
	drivers/staging/cpc-usb/cpc-usb_drv.c
	drivers/staging/cpc-usb/cpc.h
	drivers/staging/cpc-usb/cpc_int.h
	drivers/staging/cpc-usb/cpcusb.h
2009-09-24 15:13:11 -07:00
Martin Schwidefsky
89133f9350 clocksource: Resume clocksource without taking the clocksource mutex
git commit 75c5158f70 converted the clocksource spinlock to a
mutex. This causes the following BUG:

BUG: sleeping function called from invalid context at
kernel/mutex.c:280 in_atomic(): 0, irqs_disabled(): 1, pid: 2473,
name: pm-suspend 2 locks held by pm-suspend/2473:
 #0:  (&buffer->mutex){......}, at: [<ffffffff8115ab13>]
sysfs_write_file+0x3c/0x137
 #1:  (pm_mutex){......}, at: [<ffffffff810865b5>]
enter_state+0x39/0x130 Pid: 2473, comm: pm-suspend Not tainted 2.6.31
#1 Call Trace:
 [<ffffffff810792f0>] ? __debug_show_held_locks+0x22/0x24
 [<ffffffff8104a2ef>] __might_sleep+0x107/0x10b
 [<ffffffff8141fca9>] mutex_lock_nested+0x25/0x43
 [<ffffffff81073537>] clocksource_resume+0x1c/0x60
 [<ffffffff81072902>] timekeeping_resume+0x1e/0x1c8
 [<ffffffff812aee62>] __sysdev_resume+0x25/0xcf
 [<ffffffff812aef79>] sysdev_resume+0x6d/0xae
 [<ffffffff810864f8>] suspend_devices_and_enter+0x12b/0x1af
 [<ffffffff8108665b>] enter_state+0xdf/0x130
 [<ffffffff81085dc3>] state_store+0xb6/0xd3
 [<ffffffff81204c73>] kobj_attr_store+0x17/0x19
 [<ffffffff8115abd2>] sysfs_write_file+0xfb/0x137
 [<ffffffff811057d2>] vfs_write+0xae/0x10b
 [<ffffffff81208392>] ? __up_read+0x1a/0x7f
 [<ffffffff811058ef>] sys_write+0x4a/0x6e
 [<ffffffff81011b82>] system_call_fastpath+0x16/0x1b

clocksource_resume is called early in the resume process, there is
only one cpu, no processes are running and the interrupts are
disabled. It is therefore possible to resume the clocksources
without taking the clocksource mutex.

Reported-by: Xiaotian Feng <xtfeng@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Cc: Xiaotian Feng <xtfeng@gmail.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090924172952.49697825@mschwide.boeblingen.de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24 22:37:53 +02:00
Darren Hart
9beba3c54d futex: Add memory barrier commentary to futex_wait_queue_me()
The memory barrier semantics of futex_wait_queue_me() are
non-obvious. Add some commentary to try and clarify it.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090924185447.694.38948.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24 22:30:10 +02:00
Linus Torvalds
a6b49cb210 Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze: (24 commits)
  microblaze: Disable heartbeat/enable emaclite in defconfigs
  microblaze: Support simpleImage.dts make target
  microblaze: Fix _start symbol to physical address
  microblaze: Use LOAD_OFFSET macro to get correct LMA for all sections
  microblaze: Create the LOAD_OFFSET macro used to compute VMA vs LMA offsets
  microblaze: Copy ppc asm-compat.h for clean handling of constants in asm and C
  microblaze: Actually show KiB rather than pages in "Freeing initrd memory:"
  microblaze: Support ptrace syscall tracing.
  microblaze: Updated CPU version and FPGA family codes in PVR
  microblaze: Generate correct signal and siginfo for integer div-by-zero
  microblaze: Don't be noisy when userspace causes hardware exceptions
  microblaze: Remove ipc.h file which points to non-existing asm-generic file
  microblaze: Clear sticky FSR register after generating exception signals
  microblaze: Ensure CPU usermode is set on new userspace processes
  microblaze: Use correct kbuild variable KBUILD_CFLAGS
  microblaze: Save and restore msr in hw exception
  microblaze: Add architectural support for USB EHCI host controllers
  microblaze: Implement include/asm/syscall.h.
  microblaze: Improve checking mechanism for MSR instruction
  microblaze: Add checking mechanism for MSR instruction
  ...
2009-09-24 09:01:44 -07:00
Linus Torvalds
2c9871de0a Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  module: don't call percpu_modfree on NULL pointer.
  module: fix memory leak when load fails after srcversion/version allocated
  module: preferred way to use MODULE_AUTHOR
  param: allow whitespace as kernel parameter separator
  module: reduce string table for loaded modules (v2)
  module: reduce symbol table for loaded modules (v2)
2009-09-24 09:01:05 -07:00
Linus Torvalds
6d39b27f0a Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  lsm: Use a compressed IPv6 string format in audit events
  Audit: send signal info if selinux is disabled
  Audit: rearrange audit_context to save 16 bytes per struct
  Audit: reorganize struct audit_watch to save 8 bytes
2009-09-24 08:31:04 -07:00
Rusty Russell
ffa9f12a41 module: don't call percpu_modfree on NULL pointer.
The general one handles NULL, the static obsolescent
(CONFIG_HAVE_LEGACY_PER_CPU_AREA) one in module.c doesn't; Eric's
commit 720eba31 assumed it did, and various frobbings since then kept
that assumption.

All other callers in module.c all protect it with an if; this effectively
does the same as free_init is only goto if we fail percpu_modalloc().

Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
2009-09-25 00:32:59 +09:30
Rusty Russell
a263f7763c module: fix memory leak when load fails after srcversion/version allocated
Normally the twisty paths of sysfs will free the attributes, but not if
we fail before we hook it into sysfs (which is the last thing we do in
load_module).

(This sysfs code is a turd, no doubt there are other issues lurking too).

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
2009-09-25 00:32:59 +09:30
Peter Oberparleiter
26d052bfce param: allow whitespace as kernel parameter separator
Some boot mechanisms require that kernel parameters are stored in a
separate file which is loaded to memory without further processing
(e.g. the "Load from FTP" method on s390). When such a file contains
newline characters, the kernel parameter preceding the newline might
not be correctly parsed (due to the newline being stuck to the end of
the actual parameter value) which can lead to boot failures.

This patch improves kernel command line usability in such a situation
by allowing generic whitespace characters as separators between kernel
parameters.

Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25 00:32:58 +09:30
Jan Beulich
554bdfe5ac module: reduce string table for loaded modules (v2)
Also remove all parts of the string table (referenced by the symbol
table) that are not needed for kallsyms use (i.e. which were only
referenced by symbols discarded by the previous patch, or not
referenced at all for whatever reason).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25 00:32:57 +09:30
Jan Beulich
4a4962263f module: reduce symbol table for loaded modules (v2)
Discard all symbols not interesting for kallsyms use: absolute,
section, and in the common case (!KALLSYMS_ALL) data ones.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25 00:32:57 +09:30
Linus Torvalds
db16826367 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
  HWPOISON: Enable error_remove_page on btrfs
  HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
  HWPOISON: Add madvise() based injector for hardware poisoned pages v4
  HWPOISON: Enable error_remove_page for NFS
  HWPOISON: Enable .remove_error_page for migration aware file systems
  HWPOISON: The high level memory error handler in the VM v7
  HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
  HWPOISON: shmem: call set_page_dirty() with locked page
  HWPOISON: Define a new error_remove_page address space op for async truncation
  HWPOISON: Add invalidate_inode_page
  HWPOISON: Refactor truncate to allow direct truncating of page v2
  HWPOISON: check and isolate corrupted free pages v2
  HWPOISON: Handle hardware poisoned pages in try_to_unmap
  HWPOISON: Use bitmask/action code for try_to_unmap behaviour
  HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
  HWPOISON: Add poison check to page fault handling
  HWPOISON: Add basic support for poisoned pages in fault handler v3
  HWPOISON: Add new SIGBUS error codes for hardware poison signals
  HWPOISON: Add support for poison swap entries v2
  HWPOISON: Export some rmap vma locking to outside world
  ...
2009-09-24 07:53:22 -07:00
Hiroshi Shimamoto
801460d0cf task_struct cleanup: move binfmt field to mm_struct
Because the binfmt is not different between threads in the same process,
it can be moved from task_struct to mm_struct.  And binfmt moudle is
handled per mm_struct instead of task_struct.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:05 -07:00