summaryrefslogtreecommitdiff
path: root/tools/perf/bench
AgeCommit message (Collapse)Author
2020-05-28perf tools: Replace zero-length array with flexible-arrayGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gustavo A. R. Silva <gustavo@embeddedor.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200515172926.GA31976@embeddedor Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-28tools feature: Rename HAVE_EVENTFD to HAVE_EVENTFD_SUPPORTArnaldo Carvalho de Melo
To be consistent with other such auto-detected features. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anand K Mistry <amistry@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05perf bench: Add kallsyms parsingIan Rogers
Add a benchmark for kallsyms parsing. Example output: Running 'internals/kallsyms-parse' benchmark: Average kallsyms__parse took: 103.971 ms (+- 0.121 ms) Committer testing: Test Machine: AMD Ryzen 5 3600X 6-Core Processor [root@five ~]# perf bench internals kallsyms-parse # Running 'internals/kallsyms-parse' benchmark: Average kallsyms__parse took: 79.692 ms (+- 0.101 ms) [root@five ~]# perf stat -r5 perf bench internals kallsyms-parse # Running 'internals/kallsyms-parse' benchmark: Average kallsyms__parse took: 80.563 ms (+- 0.079 ms) # Running 'internals/kallsyms-parse' benchmark: Average kallsyms__parse took: 81.046 ms (+- 0.155 ms) # Running 'internals/kallsyms-parse' benchmark: Average kallsyms__parse took: 80.874 ms (+- 0.104 ms) # Running 'internals/kallsyms-parse' benchmark: Average kallsyms__parse took: 81.173 ms (+- 0.133 ms) # Running 'internals/kallsyms-parse' benchmark: Average kallsyms__parse took: 81.169 ms (+- 0.074 ms) Performance counter stats for 'perf bench internals kallsyms-parse' (5 runs): 8,093.54 msec task-clock # 0.999 CPUs utilized ( +- 0.14% ) 3,165 context-switches # 0.391 K/sec ( +- 0.18% ) 10 cpu-migrations # 0.001 K/sec ( +- 23.13% ) 744 page-faults # 0.092 K/sec ( +- 0.21% ) 34,551,564,954 cycles # 4.269 GHz ( +- 0.05% ) (83.33%) 1,160,584,308 stalled-cycles-frontend # 3.36% frontend cycles idle ( +- 1.60% ) (83.33%) 14,974,323,985 stalled-cycles-backend # 43.34% backend cycles idle ( +- 0.24% ) (83.33%) 58,712,905,705 instructions # 1.70 insn per cycle # 0.26 stalled cycles per insn ( +- 0.01% ) (83.34%) 14,136,433,778 branches # 1746.632 M/sec ( +- 0.01% ) (83.33%) 141,943,217 branch-misses # 1.00% of all branches ( +- 0.04% ) (83.33%) 8.1040 +- 0.0115 seconds time elapsed ( +- 0.14% ) [root@five ~]# Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200501221315.54715-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30perf bench: Add a multi-threaded synthesize benchmarkIan Rogers
By default this isn't run as it reads /proc and may not have access. For consistency, modify the single threaded benchmark to compute an average time per event. Committer testing: $ grep -m1 "model name" /proc/cpuinfo model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz $ grep "model name" /proc/cpuinfo | wc -l 8 $ $ perf bench internals synthesize -h # Running 'internals/synthesize' benchmark: Usage: perf bench internals synthesize <options> -I, --multi-iterations <n> Number of iterations used to compute multi-threaded average -i, --single-iterations <n> Number of iterations used to compute single-threaded average -M, --max-threads <n> Maximum number of threads in multithreaded bench -m, --min-threads <n> Minimum number of threads in multithreaded bench -s, --st Run single threaded benchmark -t, --mt Run multi-threaded benchmark $ $ perf bench internals synthesize -t # Running 'internals/synthesize' benchmark: Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 1 Average synthesis took: 65449.000 usec (+- 586.442 usec) Average num. events: 9405.400 (+- 0.306) Average time per event 6.959 usec Number of synthesis threads: 2 Average synthesis took: 37838.300 usec (+- 130.259 usec) Average num. events: 9501.800 (+- 20.469) Average time per event 3.982 usec Number of synthesis threads: 3 Average synthesis took: 48551.400 usec (+- 225.686 usec) Average num. events: 9544.000 (+- 0.000) Average time per event 5.087 usec Number of synthesis threads: 4 Average synthesis took: 29632.500 usec (+- 50.808 usec) Average num. events: 9544.000 (+- 0.000) Average time per event 3.105 usec Number of synthesis threads: 5 Average synthesis took: 33920.400 usec (+- 284.509 usec) Average num. events: 9544.000 (+- 0.000) Average time per event 3.554 usec Number of synthesis threads: 6 Average synthesis took: 27604.100 usec (+- 72.344 usec) Average num. events: 9548.000 (+- 0.000) Average time per event 2.891 usec Number of synthesis threads: 7 Average synthesis took: 25406.300 usec (+- 933.371 usec) Average num. events: 9545.500 (+- 0.167) Average time per event 2.662 usec Number of synthesis threads: 8 Average synthesis took: 24110.400 usec (+- 73.229 usec) Average num. events: 9551.000 (+- 0.000) Average time per event 2.524 usec $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22perf bench: Fix div-by-zero if runtime is zeroTommi Rantala
Fix div-by-zero if runtime is zero: $ perf bench futex hash --runtime=0 # Running 'futex/hash' benchmark: Run summary [PID 12090]: 4 threads, each operating on 1024 [private] futexes for 0 secs. Floating point exception (core dumped) Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200417132330.119407-4-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-16perf bench: Add event synthesis benchmarkIan Rogers
Event synthesis may occur at the start or end (tail) of a perf command. In system-wide mode it can scan every process in /proc, which may add seconds of latency before event recording. Add a new benchmark that times how long event synthesis takes with and without data synthesis. An example execution looks like: $ perf bench internals synthesize # Running 'internals/synthesize' benchmark: Average synthesis took: 168.253800 usec Average data synthesis took: 208.104700 usec Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200402154357.107873-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-06perf bench: Clear struct sigaction before sigaction() syscallTommi Rantala
Avoid garbage in sigaction structs used in sigaction() syscalls. Valgrind is complaining about it. Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200305083714.9381-4-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-06perf bench futex-wake: Restore thread count default to online CPU countTommi Rantala
Since commit 3b2323c2c1c4 ("perf bench futex: Use cpumaps") the default number of threads the benchmark uses got changed from number of online CPUs to zero: $ perf bench futex wake # Running 'futex/wake' benchmark: Run summary [PID 15930]: blocking on 0 threads (at [private] futex 0x558b8ee4bfac), waking up 1 at a time. [Run 1]: Wokeup 0 of 0 threads in 0.0000 ms [...] [Run 10]: Wokeup 0 of 0 threads in 0.0000 ms Wokeup 0 of 0 threads in 0.0004 ms (+-40.82%) Restore the old behavior by grabbing the number of online CPUs via cpu->nr: $ perf bench futex wake # Running 'futex/wake' benchmark: Run summary [PID 18356]: blocking on 8 threads (at [private] futex 0xb3e62c), waking up 1 at a time. [Run 1]: Wokeup 8 of 8 threads in 0.0260 ms [...] [Run 10]: Wokeup 8 of 8 threads in 0.0270 ms Wokeup 8 of 8 threads in 0.0419 ms (+-24.35%) Fixes: 3b2323c2c1c4 ("perf bench futex: Use cpumaps") Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200305083714.9381-3-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-03perf bench: Share some global variables to fix build with gcc 10Arnaldo Carvalho de Melo
Noticed with gcc 10 (fedora rawhide) that those variables were not being declared as static, so end up with: ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here make[4]: *** [/git/perf/tools/build/Makefile.build:145: /tmp/build/perf/bench/perf-in.o] Error 1 Prefix those with bench__ and add them to bench/bench.h, so that we can share those on the tools needing to access those variables from signal handlers. Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200303155811.GD13702@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-20perf env: Remove needless cpumap.h headerArnaldo Carvalho de Melo
Only a 'struct perf_cmp_map' forward allocation is necessary, fix the places that need the header but were getting it indirectly, by luck, from env.h. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-3sj3n534zghxhk7ygzeaqlx9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-20perf tools: Remove util.h from where it is not neededArnaldo Carvalho de Melo
Check that it is not needed and remove, fixing up some fallout for places where it was only serving to get something else. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-9h6dg6lsqe2usyqjh5rrues4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-20perf tools: Remove needless builtin.h include directivesArnaldo Carvalho de Melo
Now that builtin.h isn't included by any other header, we can check where it is really needed, i.e. we can remove it and be sure that it isn't being obtained indirectly. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-mn7jheex85iw9qo6tlv26hb2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-29perf tools: Remove perf.h from source files not needing itArnaldo Carvalho de Melo
With the movement of lots of stuff out of perf.h to other headers we ended up not needing it in lots of places, remove it from those places. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-c718m0sxxwp73lp9d8vpihb4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-29perf tools: Move everything related to sys_perf_event_open() to perf-sys.hArnaldo Carvalho de Melo
And remove unneeded include directives from perf-sys.h to prune the header dependency tree. Fixup the fallout in places where definitions were being used without the needed include directives that were being satisfied because they were in perf-sys.h. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-7b1zvugiwak4ibfa3j6ott7f@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-12Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo
To get closer to upstream and check if we need to sync more UAPI headers, pick up fixes for libbpf that prevent perf's container tests from completing successfuly, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-01perf bench numa: Fix cpu0 bindingJiri Olsa
Michael reported an issue with perf bench numa failing with binding to cpu0 with '-0' option. # perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd # Running 'numa/mem' benchmark: # Running main, "perf bench numa numa-mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd" binding to node 0, mask: 0000000000000001 => -1 perf: bench/numa.c:356: bind_to_memnode: Assertion `!(ret)' failed. Aborted (core dumped) This happens when the cpu0 is not part of node0, which is the benchmark assumption and we can see that's not the case for some powerpc servers. Using correct node for cpu0 binding. Reported-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20190801142642.28004-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29libperf: Add perf_cpu_map__new()/perf_cpu_map__read() functionsJiri Olsa
Moving the following functions from tools/perf: cpu_map__new() cpu_map__read() to libperf with the following names: perf_cpu_map__new() perf_cpu_map__read() Committer notes: Fixed up this one: tools/perf/arch/arm/util/cs-etm.c Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-44-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29perf cpu_map: Rename struct cpu_map to struct perf_cpu_mapJiri Olsa
Rename struct cpu_map to struct perf_cpu_map, so it could be part of libperf. Committer notes: Added fixes for arm64, provided by Jiri. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-09perf tools: Use zfree() where applicableArnaldo Carvalho de Melo
In places where the equivalent was already being done, i.e.: free(a); a = NULL; And in placs where struct members are being freed so that if we have some erroneous reference to its struct, then accesses to freed members will result in segfaults, which we can detect faster than use after free to areas that may still have something seemingly valid. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-jatyoofo5boc1bsvoig6bb6i@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-09tools lib: Adopt zalloc()/zfree() from tools/perfArnaldo Carvalho de Melo
Eroding a bit more the tools/perf/util/util.h hodpodge header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-02perf bench numa: Add define for RUSAGE_THREAD if not presentArnaldo Carvalho de Melo
While cross building perf to the ARC architecture on a fedora 30 host, we were failing with: CC /tmp/build/perf/bench/numa.o bench/numa.c: In function ‘worker_thread’: bench/numa.c:1261:12: error: ‘RUSAGE_THREAD’ undeclared (first use in this function); did you mean ‘SIGEV_THREAD’? getrusage(RUSAGE_THREAD, &rusage); ^~~~~~~~~~~~~ SIGEV_THREAD bench/numa.c:1261:12: note: each undeclared identifier is reported only once for each function it appears in [perfbuilder@60d5802468f6 perf]$ /arc_gnu_2019.03-rc1_prebuilt_uclibc_le_archs_linux_install/bin/arc-linux-gcc --version | head -1 arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 [perfbuilder@60d5802468f6 perf]$ Trying to reproduce a report by Vineet, I noticed that, with just cross-built zlib and numactl libraries, I ended up with the above failure. So, since RUSAGE_THREAD is available as a define, check for that and numactl libraries, I ended up with the above failure. So, since RUSAGE_THREAD is available as a define in the system headers, check if it is defined in the 'perf bench numa' sources and define it if not. Now it builds and I have to figure out if the problem reported by Vineet only takes place if we have libelf or some other library available. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: linux-snps-arc@lists.infradead.org Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Link: https://lkml.kernel.org/n/tip-2wb4r1gir9xrevbpq7qp0amk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-22Merge tag 'perf-core-for-mingo-5.1-20190321' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/core improvements and fixes from Arnaldo: BPF: Song Liu: - Add support for annotating BPF programs, using the PERF_RECORD_BPF_EVENT and PERF_RECORD_KSYMBOL recently added to the kernel and plugging binutils's libopcodes disassembly of BPF programs with the existing annotation interfaces in 'perf annotate', 'perf report' and 'perf top' various output formats (--stdio, --stdio2, --tui). perf list: Andi Kleen: - Filter metrics when using substring search. perf record: Andi Kleen: - Allow to limit number of reported perf.data files - Clarify help for --switch-output. perf report: Andi Kleen - Indicate JITed code better. - Show all sort keys in help output. perf script: Andi Kleen: - Support relative time. perf stat: Andi Kleen: - Improve scaling. General: Changbin Du: - Fix some mostly error path memory and reference count leaks found using gcc's ASan and UBSan. Vendor events: Mamatha Inamdar: - Remove P8 HW events which are not supported. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-03-19perf tools: Fix errors under optimization level '-Og'Changbin Du
Optimization level '-Og' offers a reasonable level of optimization while maintaining fast compilation and a good debugging experience. This patch tries to make it work. $ make DEBUG=1 EXTRA_CFLAGS='-Og' bench/epoll-ctl.c: In function ‘do_threads’: bench/epoll-ctl.c:274:9: error: ‘ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized] return ret; ^~~ ... Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190316080556.3075-4-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-05tools/: replace open encodings for NUMA_NO_NODEStephen Rothwell
This replaces all open encodings in tools with NUMA_NO_NODE. Also linux/numa.h is now needed for the perf build. [sfr@canb.auug.org.au: fix for replace open encodings for NUMA_NO_NODE] Link: http://lkml.kernel.org/r/20190108131141.730e9c4f@canb.auug.org.au Link: http://lkml.kernel.org/r/1545127933-10711-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: David Hildenbrand <david@redhat.com> Cc: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Cc: Jens Axboe <axboe@kernel.dk> [mtip32xx] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-21perf bench: Add epoll_ctl(2) benchmarkDavidlohr Bueso
Benchmark the various operations allowed for epoll_ctl(2). The idea is to concurrently stress a single epoll instance doing add/mod/del operations. Committer testing: # perf bench epoll ctl # Running 'epoll/ctl' benchmark: Run summary [PID 20344]: 4 threads doing epoll_ctl ops 64 file-descriptors for 8 secs. [thread 0] fdmap: 0x21a46b0 ... 0x21a47ac [ add: 1680960 ops; mod: 1680960 ops; del: 1680960 ops ] [thread 1] fdmap: 0x21a4960 ... 0x21a4a5c [ add: 1685440 ops; mod: 1685440 ops; del: 1685440 ops ] [thread 2] fdmap: 0x21a4c10 ... 0x21a4d0c [ add: 1674368 ops; mod: 1674368 ops; del: 1674368 ops ] [thread 3] fdmap: 0x21a4ec0 ... 0x21a4fbc [ add: 1677568 ops; mod: 1677568 ops; del: 1677568 ops ] Averaged 1679584 ADD operations (+- 0.14%) Averaged 1679584 MOD operations (+- 0.14%) Averaged 1679584 DEL operations (+- 0.14%) # Lets measure those calls with 'perf trace' to get a glympse at what this benchmark is doing in terms of syscalls: # perf trace -m32768 -s perf bench epoll ctl # Running 'epoll/ctl' benchmark: Run summary [PID 20405]: 4 threads doing epoll_ctl ops 64 file-descriptors for 8 secs. [thread 0] fdmap: 0x21764e0 ... 0x21765dc [ add: 1100480 ops; mod: 1100480 ops; del: 1100480 ops ] [thread 1] fdmap: 0x2176790 ... 0x217688c [ add: 1250176 ops; mod: 1250176 ops; del: 1250176 ops ] [thread 2] fdmap: 0x2176a40 ... 0x2176b3c [ add: 1022464 ops; mod: 1022464 ops; del: 1022464 ops ] [thread 3] fdmap: 0x2176cf0 ... 0x2176dec [ add: 705472 ops; mod: 705472 ops; del: 705472 ops ] Averaged 1019648 ADD operations (+- 11.27%) Averaged 1019648 MOD operations (+- 11.27%) Averaged 1019648 DEL operations (+- 11.27%) Summary of events: epoll-ctl (20405), 1264 events, 0.0% syscall calls total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- --------- --------- --------- --------- ------ eventfd2 256 9.514 0.001 0.037 5.243 68.00% clone 4 1.245 0.204 0.311 0.531 24.13% mprotect 66 0.345 0.002 0.005 0.021 7.43% openat 45 0.313 0.004 0.007 0.073 21.93% mmap 88 0.302 0.002 0.003 0.013 5.02% futex 4 0.160 0.002 0.040 0.140 83.43% sched_setaffinity 4 0.124 0.005 0.031 0.070 49.39% read 44 0.103 0.001 0.002 0.013 15.54% fstat 40 0.052 0.001 0.001 0.003 5.43% close 39 0.039 0.001 0.001 0.001 1.48% stat 9 0.034 0.003 0.004 0.006 7.30% access 3 0.023 0.007 0.008 0.008 4.25% open 2 0.021 0.008 0.011 0.013 22.60% getdents 4 0.019 0.001 0.005 0.009 37.15% write 2 0.013 0.004 0.007 0.009 38.48% munmap 1 0.010 0.010 0.010 0.010 0.00% brk 3 0.006 0.001 0.002 0.003 26.34% rt_sigprocmask 2 0.004 0.001 0.002 0.003 43.95% rt_sigaction 3 0.004 0.001 0.001 0.002 16.07% prlimit64 3 0.004 0.001 0.001 0.001 5.39% prctl 1 0.003 0.003 0.003 0.003 0.00% epoll_create 1 0.003 0.003 0.003 0.003 0.00% lseek 2 0.002 0.001 0.001 0.001 11.42% sched_getaffinity 1 0.002 0.002 0.002 0.002 0.00% arch_prctl 1 0.002 0.002 0.002 0.002 0.00% set_tid_address 1 0.001 0.001 0.001 0.001 0.00% getpid 1 0.001 0.001 0.001 0.001 0.00% set_robust_list 1 0.001 0.001 0.001 0.001 0.00% execve 1 0.000 0.000 0.000 0.000 0.00% epoll-ctl (20406), 1245480 events, 14.6% syscall calls total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- --------- --------- --------- --------- ------ epoll_ctl 619511 1034.927 0.001 0.002 6.691 0.67% nanosleep 3226 616.114 0.006 0.191 10.376 7.57% futex 2 11.336 0.002 5.668 11.334 99.97% set_robust_list 1 0.001 0.001 0.001 0.001 0.00% clone 1 0.000 0.000 0.000 0.000 0.00% epoll-ctl (20407), 1243151 events, 14.5% syscall calls total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- --------- --------- --------- --------- ------ epoll_ctl 618350 1042.181 0.001 0.002 2.512 0.40% nanosleep 3220 366.261 0.012 0.114 18.162 9.59% futex 4 5.463 0.001 1.366 5.427 99.12% set_robust_list 1 0.002 0.002 0.002 0.002 0.00% epoll-ctl (20408), 1801690 events, 21.1% syscall calls total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- --------- --------- --------- --------- ------ epoll_ctl 896174 1540.581 0.001 0.002 6.987 0.74% nanosleep 4667 783.393 0.006 0.168 10.419 7.10% futex 2 4.682 0.002 2.341 4.681 99.93% set_robust_list 1 0.002 0.002 0.002 0.002 0.00% clone 1 0.000 0.000 0.000 0.000 0.00% epoll-ctl (20409), 4254890 events, 49.8% syscall calls total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- --------- --------- --------- --------- ------ epoll_ctl 2116416 3768.097 0.001 0.002 9.956 0.41% nanosleep 11023 1141.778 0.006 0.104 9.447 4.95% futex 3 0.037 0.002 0.012 0.029 70.50% set_robust_list 1 0.008 0.008 0.008 0.008 0.00% madvise 1 0.005 0.005 0.005 0.005 0.00% clone 1 0.000 0.000 0.000 0.000 0.00% # Committer notes: Fix build on fedora:24-x-ARC-uClibc, debian:experimental-x-mips, debian:experimental-x-mipsel, ubuntu:16.04-x-arm and ubuntu:16.04-x-powerpc CC /tmp/build/perf/bench/epoll-ctl.o bench/epoll-ctl.c: In function 'init_fdmaps': bench/epoll-ctl.c:214:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare] for (i = 0; i < nfds; i+=inc) { ^ bench/epoll-ctl.c: In function 'bench_epoll_ctl': bench/epoll-ctl.c:377:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare] for (i = 0; i < nthreads; i++) { ^ bench/epoll-ctl.c:388:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare] for (i = 0; i < nthreads; i++) { ^ cc1: all warnings being treated as errors Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Jason Baron <jbaron@akamai.com> Link: http://lkml.kernel.org/r/20181106152226.20883-3-dave@stgolabs.net [ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ] [ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21perf bench: Add epoll parallel epoll_wait benchmarkDavidlohr Bueso
This program benchmarks concurrent epoll_wait(2) for file descriptors that are monitored with with EPOLLIN along various semantics, by a single epoll instance. Such conditions can be found when using single/combined or multiple queuing when load balancing. Each thread has a number of private, nonblocking file descriptors, referred to as fdmap. A writer thread will constantly be writing to the fdmaps of all threads, minimizing each threads's chances of epoll_wait not finding any ready read events and blocking as this is not what we want to stress. Full details in the start of the C file. Committer testing: # perf bench Usage: perf bench [<common options>] <collection> <benchmark> [<options>] # List of all available benchmark collections: sched: Scheduler and IPC benchmarks mem: Memory access benchmarks numa: NUMA scheduling and MM benchmarks futex: Futex stressing benchmarks epoll: Epoll stressing benchmarks all: All benchmarks # perf bench epoll # List of available benchmarks for collection 'epoll': wait: Benchmark epoll concurrent epoll_waits all: Run all futex benchmarks # perf bench epoll wait # Running 'epoll/wait' benchmark: Run summary [PID 19295]: 3 threads monitoring on 64 file-descriptors for 8 secs. [thread 0] fdmap: 0xdaa650 ... 0xdaa74c [ 328241 ops/sec ] [thread 1] fdmap: 0xdaa900 ... 0xdaa9fc [ 351695 ops/sec ] [thread 2] fdmap: 0xdaabb0 ... 0xdaacac [ 381423 ops/sec ] Averaged 353786 operations/sec (+- 4.35%), total secs = 8 # Committer notes: Fix the build on debian:experimental-x-mips, debian:experimental-x-mipsel and others: CC /tmp/build/perf/bench/epoll-wait.o bench/epoll-wait.c: In function 'writerfn': bench/epoll-wait.c:399:12: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] printinfo("exiting writer-thread (total full-loops: %ld)\n", iter); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~ bench/epoll-wait.c:86:31: note: in definition of macro 'printinfo' do { if (__verbose) { printf(fmt, ## arg); fflush(stdout); } } while (0) ^~~ cc1: all warnings being treated as errors Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Jason Baron <jbaron@akamai.com> <jbaron@akamai.com> Link: http://lkml.kernel.org/r/20181106152226.20883-2-dave@stgolabs.net Link: http://lkml.kernel.org/r/20181106182349.thdkpvshkna5vd7o@linux-r8p5> [ Applied above fixup as per Davidlohr's request ] [ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ] [ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21perf bench: Move HAVE_PTHREAD_ATTR_SETAFFINITY_NP into bench.hDavidlohr Bueso
Both futex and epoll need this call, and can cause build failure on systems that don't have it pthread_attr_setaffinity_np(). Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Jason Baron <jbaron@akamai.com> Link: http://lkml.kernel.org/r/20181109210719.pr7ohayuwqmfp2wl@linux-r8p5 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-30tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'Arnaldo Carvalho de Melo
To cope with the changes in: 12c89130a56a ("x86/asm/memcpy_mcsafe: Add write-protection-fault handling") 60622d68227d ("x86/asm/memcpy_mcsafe: Return bytes remaining") bd131544aa7e ("x86/asm/memcpy_mcsafe: Add labels for __memcpy_mcsafe() write fault handling") da7bc9c57eb0 ("x86/asm/memcpy_mcsafe: Remove loop unrolling") This needed introducing a file with a copy of the mcsafe_handle_tail() function, that is used in the new memcpy_64.S file, as well as a dummy mcsafe_test.h header. Testing it: $ nm ~/bin/perf | grep mcsafe 0000000000484130 T mcsafe_handle_tail 0000000000484300 T __memcpy_mcsafe $ $ perf bench mem memcpy # Running 'mem/memcpy' benchmark: # function 'default' (Default memcpy() provided by glibc) # Copying 1MB bytes ... 44.389205 GB/sec # function 'x86-64-unrolled' (unrolled memcpy() in arch/x86/lib/memcpy_64.S) # Copying 1MB bytes ... 22.710756 GB/sec # function 'x86-64-movsq' (movsq-based memcpy() in arch/x86/lib/memcpy_64.S) # Copying 1MB bytes ... 42.459239 GB/sec # function 'x86-64-movsb' (movsb-based memcpy() in arch/x86/lib/memcpy_64.S) # Copying 1MB bytes ... 42.459239 GB/sec $ This silences this perf tools build warning: Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S' Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-igdpciheradk3gb3qqal52d0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-25perf bench: Fix numa report output codeJiri Olsa
Currently we can hit following assert when running numa bench: $ perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0cm --thp 1 perf: bench/numa.c:1577: __bench_numa: Assertion `!(!(((wait_stat) & 0x7f) == 0))' failed. The assertion is correct, because we hit the SIGFPE in following line: Thread 2.2 "thread 0/0" received signal SIGFPE, Arithmetic exception. [Switching to Thread 0x7fffd28c6700 (LWP 11750)] 0x000.. in worker_thread (__tdata=0x7.. ) at bench/numa.c:1257 1257 td->speed_gbs = bytes_done / (td->runtime_ns / NSEC_PER_SEC) / 1e9; We don't check if the runtime is actually bigger than 1 second, and thus this might end up with zero division within FPU. Adding the check to prevent this. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180620094036.17278-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-07perf bench numa: Fix typo in optionsYisheng Xie
'R' means access the data via reads instead of writes, fix this typo. Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1524644707-11030-1-git-send-email-xieyisheng1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-12-27perf perf: Remove duplicate includesPravin Shedge
These duplicate includes have been found with scripts/checkincludes.pl but they have been removed manually to avoid removing false positives. Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1512582204-6493-1-git-send-email-pravin.shedge4linux@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-12-05perf bench futex: Sync waker threadsJames Yang
Waker threads in the futex wake-parallel benchmark are started by a loop using pthread_create(). However, there is no synchronization for when the waker threads wake the waiting threads. Comparison of the waker threads' measurement timestamps show they are not all running concurrently because older waker threads finish their task before newer waker threads even start. This patch uses a barrier to better synchronize the waker threads. Signed-off-by: James Yang <james.yang@arm.com Cc: Kim Phillips <kim.phillips@arm.com> Link: http://lkml.kernel.org/r/20171127042101.3659-4-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> [ Disable the wake-parallel test for systems without pthread_barrier_t ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-11-30perf bench futex: Use cpumapsDavidlohr Bueso
It was reported that the whole futex bench breaks when dealing with non-contiguously numbered cpus. $ echo 0 | sudo tee /sys/devices/system/cpu/cpu3/online $ ./perf bench futex all perf: pthread_create: Operation not permitted Run summary [PID 14934]: 7 threads, each .... James had implemented an approach with cpumaps that use an in house flavor. Instead of re-inventing the wheel, I've redone the patch such that we use the perf's util/cpumap.c interface instead. Applies to all futex benchmarks. Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Originally-from: James Yang <james.yang@arm.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Kim Phillips <kim.phillips@arm.com> Link: http://lkml.kernel.org/r/20171127042101.3659-2-dave@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-11-28perf bench numa: Fixup discontiguous/sparse numa nodesSatheesh Rajendran
Certain systems are designed to have sparse/discontiguous nodes. On such systems, 'perf bench numa' hangs, shows wrong number of nodes and shows values for non-existent nodes. Handle this by only taking nodes that are exposed by kernel to userspace. Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1edbcd353c009e109e93d78f2f46381930c340fe.1511368645.git.sathnaga@linux.vnet.ibm.com Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-19perf tools: Use __maybe_unused consistentlyArnaldo Carvalho de Melo
Instead of defining __unused or redefining __maybe_unused. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-4eleto5pih31jw1q4dypm9pf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19perf tools: Move extra string util functions to util/string2.hArnaldo Carvalho de Melo
Moving them from util.h, where they don't belong. Since libc already have string.h, name it slightly differently, as string2.h. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-eh3vz5sqxsrdd8lodoro4jrw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19perf tools: Including missing inttypes.h headerArnaldo Carvalho de Melo
Needed to use the PRI[xu](32,64) formatting macros. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-wkbho8kaw24q67dd11q0j39f@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19perf tools: Add include <linux/kernel.h> where ARRAY_SIZE() is usedArnaldo Carvalho de Melo
To pave the way for further cleanups where linux/kernel.h may stop being included in some header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-qqxan6tfsl6qx3l0v3nwgjvk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-27perf tools: Remove unused 'prefix' from builtin functionsArnaldo Carvalho de Melo
We got it from the git sources but never used it for anything, with the place where this would be somehow used remaining: static int run_builtin(struct cmd_struct *p, int argc, const char **argv) { prefix = NULL; if (p->option & RUN_SETUP) prefix = NULL; /* setup_perf_directory(); */ Ditch it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-uw5swz05vol0qpr32c5lpvus@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-06perf bench numa: Add more comment for -c optionJiri Olsa
Adding more commentary for -c/--show_convergence option, to explain how the convergence is defined. Before: -c, --show_convergence show convergence details Now: -c, --show_convergence convergence is reached when each process \ (all its threads) is running on a single NUMA node. Suggested--by: Jiri Hladky <jhladky@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Hladky <jhladky@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1488732011-27384-1-git-send-email-jolsa@kernel.org [ Rephrased a bit based on a IRC conversation with Jiri ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-03perf bench futex: Fix build on musl + clangArnaldo Carvalho de Melo
When building with clang on a musl libc system, Alpine Linux, we end up hitting a problem where memset() is used but its prototype is not present, add it to avoid this: bench/futex-wake.c:99:3: error: implicitly declaring library function 'memset' with type 'void *(void *, int, unsigned long)' [-Werror,-Wimplicit-function-declaration] CPU_ZERO(&cpu); ^ /usr/include/sched.h:127:23: note: expanded from macro 'CPU_ZERO' #define CPU_ZERO(set) CPU_ZERO_S(sizeof(cpu_set_t),set) ^ /usr/include/sched.h:110:30: note: expanded from macro 'CPU_ZERO_S' #define CPU_ZERO_S(size,set) memset(set,0,size) ^ bench/futex-wake.c:99:3: note: include the header <string.h> or explicitly provide a declaration for 'memset' Found while updating my test build containers to build perf with clang in more systems. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-jh10vaz2r98zl6gm5iau8prr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-03perf bench futex: Use __maybe_unusedArnaldo Carvalho de Melo
Instead of attributing a variable to itself to silence the compiler, use the attribute designed for that, avoiding this: In file included from bench/futex-hash.c:24: bench/futex.h:95:7: error: explicitly assigning value of variable of type 'pthread_attr_t *' to itself [-Werror,-Wself-assign] attr = attr; ~~~~ ^ ~~~~ bench/futex.h:96:13: error: explicitly assigning value of variable of type 'size_t' (aka 'unsigned long') to itself [-Werror,-Wself-assign] cpusetsize = cpusetsize; ~~~~~~~~~~ ^ ~~~~~~~~~~ bench/futex.h:97:9: error: explicitly assigning value of variable of type 'cpu_set_t *' (aka 'struct cpu_set_t *') to itself [-Werror,-Wself-assign] cpuset = cpuset; ~~~~~~ ^ ~~~~~~ That is only triggered when HAVE_PTHREAD_ATTR_SETAFFINITY_NP isn't set. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-14ws1d1elj2d5ej8g7cwdqau@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-14perf bench numa: Make sure dprintf() is not definedArnaldo Carvalho de Melo
When building with clang we get this error: bench/numa.c:46:9: error: 'dprintf' macro redefined [-Werror,-Wmacro-redefined] #define dprintf(x...) do { if (g && g->p.show_details >= 1) printf(x); } while (0) ^ /usr/include/bits/stdio2.h:145:12: note: previous definition is here # define dprintf(fd, ...) \ ^ CC /tmp/build/perf/tests/parse-no-sample-id-all.o 1 error generated. So, make sure it is undefined before using that name. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jakub Jelen <jjelen@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-f654o2svtrutamvxt7igwz74@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-14Revert "perf bench futex: Sanitize numeric parameters"Arnaldo Carvalho de Melo
This reverts commit 60758d6668b3e2fa8e5fd143d24d0425203d007e. Now that libsubcmd makes sure that OPT_UINTEGER options will not return negative values, we can revert this patch while addressing the problem it solved: # perf bench futex hash -t -4 # Running 'futex/hash' benchmark: Error: switch `t' expects an unsigned numerical value Usage: perf bench futex hash <options> -t, --threads <n> Specify amount of threads # perf bench futex hash -t-4 # Running 'futex/hash' benchmark: Error: switch `t' expects an unsigned numerical value Usage: perf bench futex hash <options> -t, --threads <n> Specify amount of threads # IMO it is more reasonable to flat out refuse to process a negative number than to silently turn it into an absolute value. This also helps in silencing clang's complaint about asking for an absolute value of an unsigned integer: bench/futex-hash.c:133:10: error: taking the absolute value of unsigned type 'unsigned int' has no effect [-Werror,-Wabsolute-value] nsecs = futexbench_sanitize_numeric(nsecs); ^ bench/futex.h:104:42: note: expanded from macro 'futexbench_sanitize_numeric' #define futexbench_sanitize_numeric(__n) abs((__n)) ^ bench/futex-hash.c:133:10: note: remove the call to 'abs' since unsigned values cannot be negative Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-2kl68v22or31vw643m2exz8x@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-09perf bench numa: Avoid possible truncation when using snprintf()Arnaldo Carvalho de Melo
Addressing this warning from gcc 7: CC /tmp/build/perf/bench/numa.o bench/numa.c: In function '__bench_numa': bench/numa.c:1582:42: error: '%d' directive output may be truncated writing between 1 and 10 bytes into a region of size between 8 and 17 [-Werror=format-truncation=] snprintf(tname, 32, "process%d:thread%d", p, t); ^~ bench/numa.c:1582:25: note: directive argument in the range [0, 2147483647] snprintf(tname, 32, "process%d:thread%d", p, t); ^~~~~~~~~~~~~~~~~~~~ In file included from /usr/include/stdio.h:939:0, from bench/../util/util.h:47, from bench/../builtin.h:4, from bench/numa.c:11: /usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output between 17 and 35 bytes into a destination of size 32 return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ __bos (__s), __fmt, __va_arg_pack ()); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Petr Holasek <pholasek@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-twa37vsfqcie5gwpqwnjuuz9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20perf bench futex: Fix lock-pi help stringDavidlohr Bueso
Obvious copy/paste typo from the requeue program. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dbueso@suse.de> Link: http://lkml.kernel.org/r/1481830584-30909-1-git-send-email-dave@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-25perf bench futex: Sanitize numeric parametersDavidlohr Bueso
This gets rid of oddities such as: perf bench futex hash -t -4 perf: calloc: Cannot allocate memory Runtime (and many more) are equally busted, i.e. run for bogus amounts of time. Just use the abs, instead of, for example errorring out. Committer note: After the patch: $ perf bench futex hash -t -4 # Running 'futex/hash' benchmark: Run summary [PID 10178]: 4 threads, each operating on 1024 [private] futexes for 10 secs. [thread 0] futexes: 0x34f9fa0 ... 0x34faf9c [ 4702208 ops/sec ] [thread 1] futexes: 0x34fb140 ... 0x34fc13c [ 4707020 ops/sec ] [thread 2] futexes: 0x34fc2e0 ... 0x34fd2dc [ 4711526 ops/sec ] [thread 3] futexes: 0x34fd480 ... 0x34fe47c [ 4709683 ops/sec ] Averaged 4707609 operations/sec (+- 0.04%), total secs = 10 $ Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1477342613-9938-3-git-send-email-dave@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-25perf bench futex: Avoid worker cacheline bouncingDavidlohr Bueso
Sebastian noted that overhead for worker thread ops (throughput) accounting was producing 'perf' to appear in the profiles, consuming a non-trivial (i.e. 13%) amount of CPU. This is due to cacheline bouncing due to the increment of w->ops. We can easily fix this by just working on a local copy and updating the actual worker once done running, and ready to show the program summary. There is no danger of the worker being concurrent, so we can trust that no stale value is being seen by another thread. This also gets rid of the unnecessary cache alignment hack; its not worth it. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/1477342613-9938-2-git-send-email-dave@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24perf bench futex: Cache align the worker structSebastian Andrzej Siewior
It popped up in perf testing that the worker consumes some amount of CPU. It boils down to the increment of `ops` which causes cache line bouncing between the individual threads. This patch aligns the struct by 256 bytes to ensure that not a cache line is shared among CPUs. 128 byte is the x86 worst case and grep says that L1_CACHE_SHIFT is set to 8 on s390. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161016190803.3392-1-bigeasy@linutronix.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>