summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2021-04-14perf report: Fix wrong LBR block sortingJin Yao
[ Upstream commit f2013278ae40b89cc27916366c407ce5261815ef ] When '--total-cycles' is specified, it supports sorting for all blocks by 'Sampled Cycles%'. This is useful to concentrate on the globally hottest blocks. 'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles But in current code, it doesn't use the cycles aggregation. Part of 'cycles' counting is possibly dropped for some overlap jumps. But for identifying the hot block, we always need the full cycles. # perf record -b ./triad_loop # perf report --total-cycles --stdio Before: # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ............................................................. ................. # 0.81% 793 4.32% 793 [setup-vdso.h:34 -> setup-vdso.h:40] ld-2.27.so 0.49% 480 0.87% 160 [native_write_msr+0 -> native_write_msr+16] [kernel.kallsyms] 0.48% 476 0.52% 95 [native_read_msr+0 -> native_read_msr+29] [kernel.kallsyms] 0.31% 303 1.65% 303 [nmi_restore+0 -> nmi_restore+37] [kernel.kallsyms] 0.26% 255 1.39% 255 [nohz_balance_exit_idle+75 -> nohz_balance_exit_idle+162] [kernel.kallsyms] 0.24% 234 1.28% 234 [end_repeat_nmi+67 -> end_repeat_nmi+83] [kernel.kallsyms] 0.23% 227 1.24% 227 [__irqentry_text_end+96 -> __irqentry_text_end+126] [kernel.kallsyms] 0.20% 194 1.06% 194 [native_set_debugreg+52 -> native_set_debugreg+56] [kernel.kallsyms] 0.11% 106 0.14% 26 [native_sched_clock+0 -> native_sched_clock+98] [kernel.kallsyms] 0.10% 97 0.53% 97 [trigger_load_balance+0 -> trigger_load_balance+67] [kernel.kallsyms] 0.09% 85 0.46% 85 [get-dynamic-info.h:102 -> get-dynamic-info.h:111] ld-2.27.so ... 0.00% 92.7K 0.02% 4 [triad_loop.c:64 -> triad_loop.c:65] triad_loop The hottest block '[triad_loop.c:64 -> triad_loop.c:65]' is not at the top of output. After: # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... .............................................................. ................. # 94.35% 92.7K 0.02% 4 [triad_loop.c:64 -> triad_loop.c:65] triad_loop 0.81% 793 4.32% 793 [setup-vdso.h:34 -> setup-vdso.h:40] ld-2.27.so 0.49% 480 0.87% 160 [native_write_msr+0 -> native_write_msr+16] [kernel.kallsyms] 0.48% 476 0.52% 95 [native_read_msr+0 -> native_read_msr+29] [kernel.kallsyms] 0.31% 303 1.65% 303 [nmi_restore+0 -> nmi_restore+37] [kernel.kallsyms] 0.26% 255 1.39% 255 [nohz_balance_exit_idle+75 -> nohz_balance_exit_idle+162] [kernel.kallsyms] 0.24% 234 1.28% 234 [end_repeat_nmi+67 -> end_repeat_nmi+83] [kernel.kallsyms] 0.23% 227 1.24% 227 [__irqentry_text_end+96 -> __irqentry_text_end+126] [kernel.kallsyms] 0.20% 194 1.06% 194 [native_set_debugreg+52 -> native_set_debugreg+56] [kernel.kallsyms] 0.11% 106 0.14% 26 [native_sched_clock+0 -> native_sched_clock+98] [kernel.kallsyms] 0.10% 97 0.53% 97 [trigger_load_balance+0 -> trigger_load_balance+67] [kernel.kallsyms] 0.09% 85 0.46% 85 [get-dynamic-info.h:102 -> get-dynamic-info.h:111] ld-2.27.so 0.08% 82 0.06% 11 [intel_pmu_drain_pebs_nhm+580 -> intel_pmu_drain_pebs_nhm+627] [kernel.kallsyms] 0.08% 77 0.42% 77 [lru_add_drain_cpu+0 -> lru_add_drain_cpu+133] [kernel.kallsyms] 0.08% 74 0.10% 18 [handle_pmi_common+271 -> handle_pmi_common+310] [kernel.kallsyms] 0.08% 74 0.40% 74 [get-dynamic-info.h:131 -> get-dynamic-info.h:157] ld-2.27.so 0.07% 69 0.09% 17 [intel_pmu_drain_pebs_nhm+432 -> intel_pmu_drain_pebs_nhm+468] [kernel.kallsyms] Now the hottest block is reported at the top of output. Fixes: b65a7d372b1a55db ("perf hist: Support block formats with compare/sort/display") Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210407024452.29988-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14perf inject: Fix repipe usageAdrian Hunter
[ Upstream commit 026334a3bb6a3919b42aba9fc11843db2b77fd41 ] Since commit 14d3d54052539a1e ("perf session: Try to read pipe data from file") 'perf inject' has started printing "PERFILE2h" when not processing pipes. The commit exposed perf to the possiblity that the input is not a pipe but the 'repipe' parameter gets used. That causes the printing because perf inject sets 'repipe' to true always. The 'repipe' parameter of perf_session__new() is used by 2 functions: - perf_file_header__read_pipe() - trace_report() In both cases, the functions copy data to STDOUT_FILENO when 'repipe' is true. Fix by setting 'repipe' to true only if the output is a pipe. Fixes: e558a5bd8b74aff4 ("perf inject: Work with files") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andrew Vagin <avagin@openvz.org> Link: http://lore.kernel.org/lkml/20210401103605.9000-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30perf synthetic events: Avoid write of uninitialized memory when generating ↵Ian Rogers
PERF_RECORD_MMAP* records [ Upstream commit 2a76f6de07906f0bb5f2a13fb02845db1695cc29 ] Account for alignment bytes in the zero-ing memset. Fixes: 1a853e36871b533c ("perf record: Allow specifying a pid to record") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210309234945.419254-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30perf auxtrace: Fix auxtrace queue conflictAdrian Hunter
[ Upstream commit b410ed2a8572d41c68bd9208555610e4b07d0703 ] The only requirement of an auxtrace queue is that the buffers are in time order. That is achieved by making separate queues for separate perf buffer or AUX area buffer mmaps. That generally means a separate queue per cpu for per-cpu contexts, and a separate queue per thread for per-task contexts. When buffers are added to a queue, perf checks that the buffer cpu and thread id (tid) match the queue cpu and thread id. However, generally, that need not be true, and perf will queue buffers correctly anyway, so the check is not needed. In addition, the check gets erroneously hit when using sample mode to trace multiple threads. Consequently, fix that case by removing the check. Fixes: e502789302a6 ("perf auxtrace: Add helpers for queuing AUX area tracing data") Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20210308151143.18338-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17perf report: Fix -F for branch & mem modesRavi Bangoria
commit 6740a4e70e5d1b9d8e7fe41fd46dd5656d65dadf upstream. perf report fails to add valid additional fields with -F when used with branch or mem modes. Fix it. Before patch: $ perf record -b $ perf report -b -F +srcline_from --stdio Error: Invalid --fields key: `srcline_from' After patch: $ perf report -b -F +srcline_from --stdio # Samples: 8K of event 'cycles' # Event count (approx.): 8784 ... Committer notes: There was an inversion: when looking at branch stack dimensions (keys) it was checking if the sort mode was 'mem', not 'branch'. Fixes: aa6b3c99236b ("perf report: Make -F more strict like -s") Reported-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20210304062958.85465-1-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17perf traceevent: Ensure read cmdlines are null terminated.Ian Rogers
commit 137a5258939aca56558f3a23eb229b9c4b293917 upstream. Issue detected by address sanitizer. Fixes: cd4ceb63438e9e28 ("perf util: Save pid-cmdline mapping into tracing header") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210226221431.1985458-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17perf build: Fix ccache usage in $(CC) when generating arch errno tableAntonio Terceiro
commit dacfc08dcafa7d443ab339592999e37bbb8a3ef0 upstream. This was introduced by commit e4ffd066ff440a57 ("perf: Normalize gcc parameter when generating arch errno table"). Assuming the first word of $(CC) is the actual compiler breaks usage like CC="ccache gcc": the script ends up calling ccache directly with gcc arguments, what fails. Instead of getting the first word, just remove from $(CC) any word that starts with a "-". This maintains the spirit of the original patch, while not breaking ccache users. Fixes: e4ffd066ff440a57 ("perf: Normalize gcc parameter when generating arch errno table") Signed-off-by: Antonio Terceiro <antonio.terceiro@linaro.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: He Zhe <zhe.he@windriver.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20210224130046.346977-1-antonio.terceiro@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04perf stat: Use nftw() instead of ftw()Paul Cercueil
commit a81fbb8771a3810a58d657763fde610bf2c33286 upstream. ftw() has been obsolete for about 12 years now. Committer notes: Further notes provided by the patch author: "NOTE: Not runtime-tested, I have no idea what I need to do in perf to test this. But at least it compiles now with my uClibc-based toolchain." I looked at the nftw()/ftw() man page and for the use made with cgroups in 'perf stat' the end result is equivalent. Fixes: bb1c15b60b98 ("perf stat: Support regex pattern in --for-each-cgroup") Signed-off-by: Paul Cercueil <paul@crapouillou.net> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: od@zcrc.me Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20210208181157.1324550-1-paul@crapouillou.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04perf test: Fix unaligned access in sample parsing testNamhyung Kim
[ Upstream commit c5c97cadd7ed13381cb6b4bef5c841a66938d350 ] The ubsan reported the following error. It was because sample's raw data missed u32 padding at the end. So it broke the alignment of the array after it. The raw data contains an u32 size prefix so the data size should have an u32 padding after 8-byte aligned data. 27: Sample parsing :util/synthetic-events.c:1539:4: runtime error: store to misaligned address 0x62100006b9bc for type '__u64' (aka 'unsigned long long'), which requires 8 byte alignment 0x62100006b9bc: note: pointer points here 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ #0 0x561532a9fc96 in perf_event__synthesize_sample util/synthetic-events.c:1539:13 #1 0x5615327f4a4f in do_test tests/sample-parsing.c:284:8 #2 0x5615327f3f50 in test__sample_parsing tests/sample-parsing.c:381:9 #3 0x56153279d3a1 in run_test tests/builtin-test.c:424:9 #4 0x56153279c836 in test_and_print tests/builtin-test.c:454:9 #5 0x56153279b7eb in __cmd_test tests/builtin-test.c:675:4 #6 0x56153279abf0 in cmd_test tests/builtin-test.c:821:9 #7 0x56153264e796 in run_builtin perf.c:312:11 #8 0x56153264cf03 in handle_internal_command perf.c:364:8 #9 0x56153264e47d in run_argv perf.c:408:2 #10 0x56153264c9a9 in main perf.c:538:3 #11 0x7f137ab6fbbc in __libc_start_main (/lib64/libc.so.6+0x38bbc) #12 0x561532596828 in _start ... SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use util/synthetic-events.c:1539:4 in Fixes: 045f8cd8542d ("perf tests: Add a sample parsing test") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210214091638.519643-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04perf intel-pt: Fix IPC with CYC thresholdAdrian Hunter
[ Upstream commit 6af4b60033e0ce0332fcdf256c965ad41942821a ] The code assumed every CYC-eligible packet has a CYC packet, which is not the case when CYC thresholds are used. Fix by checking if a CYC packet is actually present in that case. Fixes: 5b1dc0fd1da06 ("perf intel-pt: Add support for samples to contain IPC ratio") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210205175350.23817-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04perf intel-pt: Fix premature IPCAdrian Hunter
[ Upstream commit 20aa39708a5999b7921b27482a756766272286ac ] The code assumed a change in cycle count means accurate IPC. That is not correct, for example when sampling both branches and instructions, or at a FUP packet (which is not CYC-eligible) address. Fix by using an explicit flag to indicate when IPC can be sampled. Fixes: 5b1dc0fd1da06 ("perf intel-pt: Add support for samples to contain IPC ratio") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/20210205175350.23817-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04perf intel-pt: Fix missing CYC processing in PSBAdrian Hunter
[ Upstream commit 03fb0f859b45d1eb05c984ab4bd3bef67e45ede2 ] Add missing CYC packet processing when walking through PSB+. This improves the accuracy of timestamps that follow PSB+, until the next MTC. Fixes: 3d49807870f08 ("perf tools: Add new Intel PT packet definitions") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210205175350.23817-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04perf unwind: Set userdata for all __report_module() pathsDave Rigby
[ Upstream commit 4e1481445407b86a483616c4542ffdc810efb680 ] When locating the DWARF module for a given address, __find_debuginfo() requires a 'struct dso' passed via the userdata argument. However, this field is only set in __report_module() if the module is found in via dwfl_addrmodule(), not if it is found later via dwfl_report_elf(). Set userdata irrespective of how the DWARF module was found, as long as we found a module. Fixes: bf53fc6b5f41 ("perf unwind: Fix separate debug info files when using elfutils' libdw's unwinder") Signed-off-by: Dave Rigby <d.rigby@me.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211801 Acked-by: Jan Kratochvil <jan.kratochvil@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/linux-perf-users/20210218165654.36604-1-d.rigby@me.com/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04perf record: Fix continue profiling after draining the bufferYang Jihong
[ Upstream commit e16c2ce7c5ed5de881066c1fd10ba5c09af69559 ] Commit da231338ec9c0987 ("perf record: Use an eventfd to wakeup when done") uses eventfd() to solve a rare race where the setting and checking of 'done' which add done_fd to pollfd. When draining buffer, revents of done_fd is 0 and evlist__filter_pollfd function returns a non-zero value. As a result, perf record does not stop profiling. The following simple scenarios can trigger this condition: # sleep 10 & # perf record -p $! After the sleep process exits, perf record should stop profiling and exit. However, perf record keeps running. If pollfd revents contains only POLLERR or POLLHUP, perf record indicates that buffer is draining and need to stop profiling. Use fdarray_flag__nonfilterable() to set done eventfd to nonfilterable objects, so that evlist__filter_pollfd() does not filter and check done eventfd. Fixes: da231338ec9c0987 ("perf record: Use an eventfd to wakeup when done") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: zhangjinhao2@huawei.com Link: http://lore.kernel.org/lkml/20210205065001.23252-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04perf symbols: Fix return value when loading PE DSONicholas Fraser
[ Upstream commit 77771a97011fa9146ccfaf2983a3a2885dc57b6f ] The first time dso__load() was called on a PE file it always returned -1 error. This caused the first call to map__find_symbol() to always fail on a PE file so the first sample from each PE file always had symbol <unknown>. Subsequent samples succeed however because the DSO is already loaded. This fixes dso__load() to return 0 when successfully loading a DSO with libbfd. Fixes: eac9a4342e5447ca ("perf symbols: Try reading the symbol table with libbfd") Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Huw Davies <huw@codeweavers.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Song Liu <songliubraving@fb.com> Cc: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Ulrich Czekalla <uczekalla@codeweavers.com> Link: http://lore.kernel.org/lkml/1671b43b-09c3-1911-dbf8-7f030242fbf7@codeweavers.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04perf symbols: Use (long) for iterator for bfd symbolsDmitry Safonov
[ Upstream commit 96de68fff5ded8833bf5832658cb43c54f86ff6c ] GCC (GCC) 8.4.0 20200304 fails to build perf with: : util/symbol.c: In function 'dso__load_bfd_symbols': : util/symbol.c:1626:16: error: comparison of integer expressions of different signednes : for (i = 0; i < symbols_count; ++i) { : ^ : util/symbol.c:1632:16: error: comparison of integer expressions of different signednes : while (i + 1 < symbols_count && : ^ : util/symbol.c:1637:13: error: comparison of integer expressions of different signednes : if (i + 1 < symbols_count && : ^ : cc1: all warnings being treated as errors It's unlikely that the symtable will be that big, but the fix is an oneliner and as perf has CORE_CFLAGS += -Wextra, which makes build to fail together with CORE_CFLAGS += -Werror Fixes: eac9a4342e54 ("perf symbols: Try reading the symbol table with libbfd") Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Jacek Caban <jacek@codeweavers.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Link: http://lore.kernel.org/lkml/20210209145148.178702-1-dima@arista.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04perf vendor events arm64: Fix Ampere eMag event typoJohn Garry
[ Upstream commit 2bf797be81fa808f05f1a7a65916619132256a27 ] The "briefdescription" for event 0x35 has a typo - fix it. Fixes: d35c595bf005 ("perf vendor events arm64: Revise core JSON events for eMAG") Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxarm@openeuler.org Link: https://lore.kernel.org/r/1611835236-34696-2-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04perf tools: Fix DSO filtering when not finding a map for a sampled addressArnaldo Carvalho de Melo
[ Upstream commit c69bf11ad3d30b6bf01cfa538ddff1a59467c734 ] When we lookup an address and don't find a map we should filter that sample if the user specified a list of --dso entries to filter on, fix it. Before: $ perf script sleep 274800 2843.556162: 1 cycles:u: ffffffffbb26bff4 [unknown] ([unknown]) sleep 274800 2843.556168: 1 cycles:u: ffffffffbb2b047d [unknown] ([unknown]) sleep 274800 2843.556171: 1 cycles:u: ffffffffbb2706b2 [unknown] ([unknown]) sleep 274800 2843.556174: 6 cycles:u: ffffffffbb2b0267 [unknown] ([unknown]) sleep 274800 2843.556176: 59 cycles:u: ffffffffbb2b03b1 [unknown] ([unknown]) sleep 274800 2843.556180: 691 cycles:u: ffffffffbb26bff4 [unknown] ([unknown]) sleep 274800 2843.556189: 9160 cycles:u: 7fa9550eeaa3 __GI___tunables_init+0xf3 (/usr/lib64/ld-2.32.so) sleep 274800 2843.556312: 86937 cycles:u: 7fa9550e157b _dl_lookup_symbol_x+0x4b (/usr/lib64/ld-2.32.so) $ So we have some samples we somehow didn't find in a map for, if we now do: $ perf report --stdio --dso /usr/lib64/ld-2.32.so # dso: /usr/lib64/ld-2.32.so # # Total Lost Samples: 0 # # Samples: 8 of event 'cycles:u' # Event count (approx.): 96856 # # Overhead Command Symbol # ........ ....... ........................ # 89.76% sleep [.] _dl_lookup_symbol_x 9.46% sleep [.] __GI___tunables_init 0.71% sleep [k] 0xffffffffbb26bff4 0.06% sleep [k] 0xffffffffbb2b03b1 0.01% sleep [k] 0xffffffffbb2b0267 0.00% sleep [k] 0xffffffffbb2706b2 0.00% sleep [k] 0xffffffffbb2b047d $ After this patch we get the right output with just entries for the DSOs specified in --dso: $ perf report --stdio --dso /usr/lib64/ld-2.32.so # dso: /usr/lib64/ld-2.32.so # # Total Lost Samples: 0 # # Samples: 8 of event 'cycles:u' # Event count (approx.): 96856 # # Overhead Command Symbol # ........ ....... ........................ # 89.76% sleep [.] _dl_lookup_symbol_x 9.46% sleep [.] __GI___tunables_init $ # Fixes: 96415e4d3f5fdf9c ("perf symbols: Avoid unnecessary symbol loading when dso list is specified") Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210128131209.GD775562@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-21perf script: Fix overrun issue for dynamically-allocated PMU type numberJin Yao
When unpacking the event which is from dynamic PMU, the array output[OUTPUT_TYPE_MAX] may be overrun. For example, type number of SKL uncore_imc is 10, but OUTPUT_TYPE_MAX is 7 now (OUTPUT_TYPE_MAX = PERF_TYPE_MAX + 1). /* In builtin-script.c */ process_event() { unsigned int type = output_type(attr->type); if (output[type].fields == 0) return; } output[10] is overrun. Create a type OUTPUT_TYPE_OTHER for dynamic PMU events, then output_type(attr->type) will return OUTPUT_TYPE_OTHER here. Note that if PERF_TYPE_MAX ever changed, then there would be a conflict between old perf.data files that had a dynamicaliy allocated PMU number that would then be the same as a fixed PERF_TYPE. Example: # perf record --switch-events -C 0 -e "{cpu-clock,uncore_imc/data_reads/,uncore_imc/data_writes/}:SD" -a -- sleep 1 # perf script Before: swapper 0 [000] 1479253.987551: 277766 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.987797: 246709 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.988127: 329883 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.988273: 146393 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.988523: 249977 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.988877: 354090 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.989023: 145940 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.989383: 359856 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.989523: 140082 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) After: swapper 0 [000] 1397040.402011: 272384 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1397040.402011: 5396 uncore_imc/data_reads/: swapper 0 [000] 1397040.402011: 967 uncore_imc/data_writes/: swapper 0 [000] 1397040.402259: 249153 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1397040.402259: 7231 uncore_imc/data_reads/: swapper 0 [000] 1397040.402259: 1297 uncore_imc/data_writes/: swapper 0 [000] 1397040.402508: 249108 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1397040.402508: 5333 uncore_imc/data_reads/: swapper 0 [000] 1397040.402508: 1008 uncore_imc/data_writes/: Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20201209005828.21302-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-21perf metricgroup: Fix system PMU metricsJohn Garry
Joakim reports that getting "perf stat" for multiple system PMU metrics segfaults: $ perf stat -a -I 1000 -M imx8mm_ddr_write.all,imx8mm_ddr_write.all Segmentation fault $ While the same works without issue for a single metric. The logic in metricgroup__add_metric_sys_event_iter() is broken, in that add_metric() @m argument should be NULL for each new metric. Fix by not passing a holder for that, and rather make local in metricgroup__add_metric_sys_event_iter(). Fixes: be335ec28efa ("perf metricgroup: Support adding metrics for system PMUs") Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linuxarm@openeuler.org Link: https://lore.kernel.org/r/1611050655-44020-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-21perf metricgroup: Fix for metrics containing duration_timeJohn Garry
Metrics containing duration_time cause a segfault: $ perf stat -v -M L1D_Cache_Fill_BW sleep 1 Using CPUID GenuineIntel-6-3D-4 metric expr 64 * l1d.replacement / 1000000000 / duration_time for L1D_Cache_Fill_BW found event duration_time found event l1d.replacement adding {l1d.replacement}:W,duration_time l1d.replacement -> cpu/umask=0x1,(null)=0x1e8483,event=0x51/ Segmentation fault $ In commit c2337d67199a1ea1 ("perf metricgroup: Fix metrics using aliases covering multiple PMUs"), the logic in find_evsel_group() when iter'ing events was changed to not only select events in same group, but also for aliased PMUs. Checking whether events were for aliased PMUs was done by comparing the event PMU name. This was not safe for duration_time event, which has no associated PMU (and no PMU name), so fix by checking if the event PMU name is set also. Committer testing: Reproduced the bug, then, on a: $ grep -m1 ^'model name' /proc/cpuinfo model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz $ We now get: $ perf stat -M L1D_Cache_Fill_BW sleep 1 Performance counter stats for 'sleep 1': 4,141 l1d.replacement:u 1,001,285,107 ns duration_time:u 1.001285107 seconds time elapsed 0.000000000 seconds user 0.001119000 seconds sys $ Detais from -v: Using CPUID GenuineIntel-6-8E-A metric expr 64 * l1d.replacement / 1000000000 / duration_time for L1D_Cache_Fill_BW found event duration_time found event l1d.replacement adding {l1d.replacement}:W,duration_time l1d.replacement -> cpu/(null)=0x1e8483,umask=0x1,event=0x51/ Control descriptor is not initialized Warning: kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor samples Warning: kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor samples l1d.replacement:u: 4592 612201 612201 duration_time:u: 1001478621 1001478621 1001478621 Fixes: c2337d67199a1ea1 ("perf metricgroup: Fix metrics using aliases covering multiple PMUs") Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linuxarm@openeuler.org Link: https://lore.kernel.org/r/1611159518-226883-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15perf inject: Correct event attribute sizesAl Grant
When 'perf inject' reads a perf.data file from an older version of perf, it writes event attributes into the output with the original size field, but lays them out as if they had the size currently used. Readers see a corrupt file. Update the size field to match the layout. Signed-off-by: Al Grant <al.grant@foss.arm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20201124195818.30603-1-al.grant@arm.com Signed-off-by: Denis Nikitin <denik@chromium.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15perf intel-pt: Fix 'CPU too large' errorAdrian Hunter
In some cases, the number of cpus (nr_cpus_online) is confused with the maximum cpu number (nr_cpus_avail), which results in the error in the example below: Example on system with 8 cpus: Before: # echo 0 > /sys/devices/system/cpu/cpu2/online # ./perf record --kcore -e intel_pt// taskset --cpu-list 7 uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.147 MB perf.data ] # ./perf script --itrace=e Requested CPU 7 too large. Consider raising MAX_NR_CPUS 0x25908 [0x8]: failed to process type: 68 [Invalid argument] After: # ./perf script --itrace=e # Fixes: 8c7274691f0d ("perf machine: Replace MAX_NR_CPUS with perf_env::nr_cpus_online") Fixes: 7df4e36a4785 ("perf session: Replace MAX_NR_CPUS with perf_env::nr_cpus_online") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20210107174159.24897-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15perf stat: Take cgroups into account for shadow statsNamhyung Kim
As of now it doesn't consider cgroups when collecting shadow stats and metrics so counter values from different cgroups will be saved in a same slot. This resulted in incorrect numbers when those cgroups have different workloads. For example, let's look at the scenario below: cgroups A and C runs same workload which burns a cpu while cgroup B runs a light workload. $ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1 Performance counter stats for 'system wide': 3,958,116,522 cycles A 6,722,650,929 instructions A # 2.53 insn per cycle 1,132,741 cycles B 571,743 instructions B # 0.00 insn per cycle 4,007,799,935 cycles C 6,793,181,523 instructions C # 2.56 insn per cycle 1.001050869 seconds time elapsed When I run 'perf stat' with single workload, it usually shows IPC around 1.7. We can verify it (6,722,650,929.0 / 3,958,116,522 = 1.698) for cgroup A. But in this case, since cgroups are ignored, cycles are averaged so it used the lower value for IPC calculation and resulted in around 2.5. avg cycle: (3958116522 + 1132741 + 4007799935) / 3 = 2655683066 IPC (A) : 6722650929 / 2655683066 = 2.531 IPC (B) : 571743 / 2655683066 = 0.0002 IPC (C) : 6793181523 / 2655683066 = 2.557 We can simply compare cgroup pointers in the evsel and it'll be NULL when cgroups are not specified. With this patch, I can see correct numbers like below: $ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1 Performance counter stats for 'system wide': 4,171,051,687 cycles A 7,219,793,922 instructions A # 1.73 insn per cycle 1,051,189 cycles B 583,102 instructions B # 0.55 insn per cycle 4,171,124,710 cycles C 7,192,944,580 instructions C # 1.72 insn per cycle 1.007909814 seconds time elapsed Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210115071139.257042-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15perf stat: Introduce struct runtime_stat_dataNamhyung Kim
To pass more info to the saved_value in the runtime_stat, add a new struct runtime_stat_data. Currently it only has 'ctx' field but later patch will add more. Note that we intentionally pass 0 as ctx to clock-related events for compatibility. It was already there in a few places. So move the code into the saved_value_lookup() explicitly and add a comment. Suggested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210115071139.257042-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15perf test: Fix shadow stat test for non-bash shellsNamhyung Kim
It was using some bash-specific features and failed to parse when running with a different shell like below: root@kbl-ppc:~/kbl-ws/perf-dev/lck-9077/acme.tmp/tools/perf# ./perf test 83 -vv 83: perf stat metrics (shadow stat) test : --- start --- test child forked, pid 3922 ./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found ./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found ./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found (standard_in) 2: syntax error ./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found ./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found ./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found ./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found (standard_in) 2: syntax error ./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found ./tests/shell/stat+shadow_stat.sh: 45: ./tests/shell/stat+shadow_stat.sh: declare: not found test child finished with -1 ---- end ---- perf stat metrics (shadow stat) test: FAILED! Reported-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Laight <david.laight@aculab.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210114050609.1258820-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15perf bpf examples: Fix bpf.h header include directive in 5sec.c exampleArnaldo Carvalho de Melo
It was looking at bpf/bpf.h, which caused this problem: # perf trace -e tools/perf/examples/bpf/5sec.c /home/acme/git/perf/tools/perf/examples/bpf/5sec.c:42:10: fatal error: 'bpf/bpf.h' file not found #include <bpf/bpf.h> ^~~~~~~~~~~ 1 error generated. ERROR: unable to compile tools/perf/examples/bpf/5sec.c Hint: Check error message shown above. Hint: You can also pre-compile it into .o using: clang -target bpf -O2 -c tools/perf/examples/bpf/5sec.c with proper -I and -D options. event syntax error: 'tools/perf/examples/bpf/5sec.c' \___ Failed to load tools/perf/examples/bpf/5sec.c from source: Error when compiling BPF scriptlet # Change that to plain bpf.h, to make it work again: # perf trace -e tools/perf/examples/bpf/5sec.c sleep 5s 0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1776891872, rqtp: 5000000000) # perf trace -e tools/perf/examples/bpf/5sec.c/max-stack=16/ sleep 5s 0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1776891872, rqtp: 5000000000) hrtimer_nanosleep ([kernel.kallsyms]) common_nsleep ([kernel.kallsyms]) __x64_sys_clock_nanosleep ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) __clock_nanosleep_2 (/usr/lib64/libc-2.32.so) # perf trace -e tools/perf/examples/bpf/5sec.c sleep 4s # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf probe: Fix memory leak when synthesizing SDT probesArnaldo Carvalho de Melo
The argv_split() function must be paired with argv_free(), else we must keep a reference to the argv array received or do the freeing ourselves, in synthesize_sdt_probe_command() we were simply leaking that argv[] array. Fixes: 3b1f8311f6963cd1 ("perf probe: Add sdt probes arguments into the uprobe cmd string") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Truong <alexandre.truong@arm.com> Cc: Alexis Berlemont <alexis.berlemont@gmail.com> Cc: He Zhe <zhe.he@windriver.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201224135139.GF477817@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf stat aggregation: Add separate thread memberJames Clark
A separate field isn't strictly required. The core field could be re-used for thread IDs as a single field was used previously. But separating them will avoid confusion and catch potential errors where core IDs are read as thread IDs and vice versa. Also remove the placeholder id field which is now no longer used. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-13-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf stat aggregation: Add separate core memberJames Clark
Add core as a separate member so that it doesn't have to be packed into the int value. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-12-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf stat aggregation: Add separate die memberJames Clark
Add die as a separate member so that it doesn't have to be packed into the int value. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-11-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf stat aggregation: Add separate socket memberJames Clark
Add socket as a separate member so that it doesn't have to be packed into the int value. When the socket ID was larger than 8 bits the output appeared corrupted or incomplete. For example, here on ThunderX2 'perf stat' reports a socket of -1 and an invalid die number: ./perf stat -a --per-die The socket id number is too big. Performance counter stats for 'system wide': S-1-D255 128 687.99 msec cpu-clock # 57.240 CPUs utilized ... S36-D0 128 842.34 msec cpu-clock # 70.081 CPUs utilized ... And with --per-core there is an entry with an invalid core ID: ./perf stat record -a --per-core The socket id number is too big. Performance counter stats for 'system wide': S-1-D255-C65535 128 671.04 msec cpu-clock # 54.112 CPUs utilized ... S36-D0-C0 4 28.27 msec cpu-clock # 2.279 CPUs utilized ... This fixes the "Session topology" self test on ThunderX2. After this fix the output contains the correct socket and die IDs and no longer prints a warning about the size of the socket ID: ./perf stat --per-die -a Performance counter stats for 'system wide': S36-D0 128 169,869.39 msec cpu-clock # 127.501 CPUs utilized ... S3612-D0 128 169,733.05 msec cpu-clock # 127.398 CPUs utilized Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-10-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf stat aggregation: Add separate node memberJames Clark
Add node as a separate member so that it doesn't have to be packed into the int value. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-9-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf stat aggregation: Start using cpu_aggr_id in mapJames Clark
Use the new cpu_aggr_id struct in the cpu map instead of int so that it can store more data. No functional changes. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-8-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf cpumap: Drop in cpu_aggr_map structJames Clark
Replace usages of perf_cpu_map with cpu_aggr map in places that are involved with 'perf stat' aggregation. This will then later be changed to be a map of cpu_aggr_id rather than an int so that more data can be stored. No functional changes. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-7-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf cpumap: Add new map type for aggregationJames Clark
Currently this is a duplicate of perf_cpu_map so that it can be used as a drop in replacement. In a later commit it will be changed from a map of ints to use the new cpu_aggr_id struct. No functional changes. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-6-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf stat: Replace aggregation ID with a structJames Clark
Replace all occurences of the usage of int with the new struct cpu_aggr_id. No functional changes. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf cpumap: Add new struct for cpu aggregationJames Clark
This struct currently has only a single int member so that it can be used as a drop in replacement for the existing behaviour. Comparison and constructor functions have also been added that will replace usages of '==' and '= -1'. No functional changes. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf cpumap: Use existing allocator to avoid using mallocJames Clark
Use the existing allocator for perf_cpu_map to avoid use of raw malloc. This could cause an issue in later commits where the size of perf_cpu_map is changed. No functional changes. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf tests: Improve topology test to check all aggregation typesJames Clark
Improve the topology test to check all aggregation types. This is to lock down the behaviour before 'id' is changed into a struct in later commits. Committer testing: $ perf test topology 41: Session topology: Ok $ $ perf test -v topology 41: Session topology: --- start --- test child forked, pid 965552 templ file: /tmp/perf-test-mO7NtI Problems creating module maps, continuing anyway... CPU 0, core 0, socket 0 CPU 1, core 1, socket 0 CPU 2, core 2, socket 0 CPU 3, core 4, socket 0 CPU 4, core 5, socket 0 CPU 5, core 6, socket 0 CPU 6, core 8, socket 0 CPU 7, core 9, socket 0 CPU 8, core 10, socket 0 CPU 9, core 12, socket 0 CPU 10, core 13, socket 0 CPU 11, core 14, socket 0 CPU 12, core 0, socket 0 CPU 13, core 1, socket 0 CPU 14, core 2, socket 0 CPU 15, core 4, socket 0 CPU 16, core 5, socket 0 CPU 17, core 6, socket 0 CPU 18, core 8, socket 0 CPU 19, core 9, socket 0 CPU 20, core 10, socket 0 CPU 21, core 12, socket 0 CPU 22, core 13, socket 0 CPU 23, core 14, socket 0 test child finished with 0 ---- end ---- Session topology: Ok $ Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf tools: Update s390's syscall.tbl copy from the kernel sourcesTiezhu Yang
This silences the following tools/perf/ build warning: Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl' Just make them same: cp arch/s390/kernel/syscalls/syscall.tbl tools/perf/arch/s390/entry/syscalls/syscall.tbl Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xuefeng Li <lixuefeng@loongson.cn> Link: http://lore.kernel.org/lkml/1608278364-6733-5-git-send-email-yangtiezhu@loongson.cn [ There were updates after Tiezhu's post, so I just updated the copy ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf tools: Update powerpc's syscall.tbl copy from the kernel sourcesTiezhu Yang
This silences the following tools/perf/ build warning: Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl' Just make them same: cp arch/powerpc/kernel/syscalls/syscall.tbl tools/perf/arch/powerpc/entry/syscalls/syscall.tbl Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xuefeng Li <lixuefeng@loongson.cn> Link: http://lore.kernel.org/lkml/1608278364-6733-4-git-send-email-yangtiezhu@loongson.cn [ There were updates after Tiezhu's post, so I just updated the copy ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf s390: Move syscall.tbl check into check-headers.shTiezhu Yang
It is better to check syscall.tbl for s390 in check-headers.sh, it is similar with commit c9b51a017065 ("perf tools: Move syscall_64.tbl check into check-headers.sh"). Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xuefeng Li <lixuefeng@loongson.cn> Link: http://lore.kernel.org/lkml/1608278364-6733-3-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24perf powerpc: Move syscall.tbl check to check-headers.shTiezhu Yang
It is better to check syscall.tbl for powerpc in check-headers.sh, it is similar with commit c9b51a017065 ("perf tools: Move syscall_64.tbl check into check-headers.sh"). Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xuefeng Li <lixuefeng@loongson.cn> Link: http://lore.kernel.org/lkml/1608278364-6733-2-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24tools arch x86: Sync the msr-index.h copy with the kernel sourcesArnaldo Carvalho de Melo
To pick up the changes in: Fixes: 69372cf01290b958 ("x86/cpu: Add VM page flush MSR availablility as a CPUID feature") That cause these changes in tooling: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2020-12-21 09:09:05.593005003 -0300 +++ after 2020-12-21 09:12:48.436994802 -0300 @@ -21,7 +21,7 @@ [0x0000004f] = "PPIN", [0x00000060] = "LBR_CORE_TO", [0x00000079] = "IA32_UCODE_WRITE", - [0x0000008b] = "IA32_UCODE_REV", + [0x0000008b] = "AMD64_PATCH_LEVEL", [0x0000008C] = "IA32_SGXLEPUBKEYHASH0", [0x0000008D] = "IA32_SGXLEPUBKEYHASH1", [0x0000008E] = "IA32_SGXLEPUBKEYHASH2", @@ -286,6 +286,7 @@ [0xc0010114 - x86_AMD_V_KVM_MSRs_offset] = "VM_CR", [0xc0010115 - x86_AMD_V_KVM_MSRs_offset] = "VM_IGNNE", [0xc0010117 - x86_AMD_V_KVM_MSRs_offset] = "VM_HSAVE_PA", + [0xc001011e - x86_AMD_V_KVM_MSRs_offset] = "AMD64_VM_PAGE_FLUSH", [0xc001011f - x86_AMD_V_KVM_MSRs_offset] = "AMD64_VIRT_SPEC_CTRL", [0xc0010130 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_SEV_ES_GHCB", [0xc0010131 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_SEV", $ The new MSR has a pattern that wasn't matched to avoid a clash with IA32_UCODE_REV, change the regex to prefer the more relevant AMD_ prefixed ones to catch this new AMD64_VM_PAGE_FLUSH MSR. Which causes these parts of tools/perf/ to be rebuilt: CC /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o LD /tmp/build/perf/trace/beauty/tracepoints/perf-in.o LD /tmp/build/perf/trace/beauty/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf This addresses this perf tools build warning: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24tools headers UAPI: Update epoll_pwait2 affected filesArnaldo Carvalho de Melo
To pick the changes from: b0a0c2615f6f199a ("epoll: wire up syscall epoll_pwait2") That addresses these perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h' diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl' diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-22Merge tag 'kbuild-v5.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Use /usr/bin/env for shebang lines in scripts - Remove useless -Wnested-externs warning flag - Update documents - Refactor log handling in modpost - Stop building modules without MODULE_LICENSE() tag - Make the insane combination of 'static' and EXPORT_SYMBOL an error - Improve genksyms to handle _Static_assert() * tag 'kbuild-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: Documentation/kbuild: Document platform dependency practises Documentation/kbuild: Document COMPILE_TEST dependencies genksyms: Ignore module scoped _Static_assert() modpost: turn static exports into error modpost: turn section mismatches to error from fatal() modpost: change license incompatibility to error() from fatal() modpost: turn missing MODULE_LICENSE() into error modpost: refactor error handling and clarify error/fatal difference modpost: rename merror() to error() kbuild: don't hardcode depmod path kbuild: doc: document subdir-y syntax kbuild: doc: clarify the difference between extra-y and always-y kbuild: doc: split if_changed explanation to a separate section kbuild: doc: merge 'Special Rules' and 'Custom kbuild commands' sections kbuild: doc: fix 'List directories to visit when descending' section kbuild: doc: replace arch/$(ARCH)/ with arch/$(SRCARCH)/ kbuild: doc: update the description about kbuild Makefiles Makefile.extrawarn: remove -Wnested-externs warning tweewide: Fix most Shebang lines
2020-12-19perf mem: Factor out a function to generate sort orderKan Liang
Now, "--phys-data" is the only option which impacts the sort order. A simple "if else" is enough to handle the option. But there will be more options added, e.g. "--data-page-size", which also impact the sort order. The code will become too complex to be maintained. Divide the sort order string into several small pieces. The first piece is always the default sort string for LOAD/STORE. Appends the specific sort string if related option is applied. No functional change. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Link: http://lore.kernel.org/lkml/20201216185805.9981-4-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-19perf sort: Add sort option for data page sizeKan Liang
Add a new sort option "data_page_size" for --mem-mode sort. With this option applied, perf can sort and report by sample's data page size. Here is an example: perf report --stdio --mem-mode --sort=comm,symbol,phys_daddr,data_page_size # To display the perf.data header info, please use # --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 9K of event 'mem-loads:uP' # Total weight : 9028 # Sort order : comm,symbol,phys_daddr,data_page_size # # Overhead Command Symbol Data Physical # Address # Data Page Size # ........ ....... ............................ # ...................... ...................... # 11.19% dtlb [.] touch_buffer [.] 0x00000003fec82ea8 4K 8.61% dtlb [.] GetTickCount [.] 0x00000003c4f2c8a8 4K 4.52% dtlb [.] GetTickCount [.] 0x00000003fec82f58 4K 4.33% dtlb [.] __gettimeofday [.] 0x00000003fec82f48 4K 4.32% dtlb [.] GetTickCount [.] 0x00000003fec82f78 4K 4.28% dtlb [.] GetTickCount [.] 0x00000003fec82f50 4K 4.23% dtlb [.] GetTickCount [.] 0x00000003fec82f70 4K 4.11% dtlb [.] GetTickCount [.] 0x00000003fec82f68 4K 4.00% dtlb [.] Calibrate [.] 0x00000003fec82f98 4K 3.91% dtlb [.] Calibrate [.] 0x00000003fec82f90 4K 3.43% dtlb [.] touch_buffer [.] 0x00000003fec82e98 4K 3.42% dtlb [.] touch_buffer [.] 0x00000003fec82e90 4K 0.09% dtlb [.] DoDependentLoads [.] 0x000000036ea084c0 2M 0.08% dtlb [.] DoDependentLoads [.] 0x000000032b010b80 2M Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Link: http://lore.kernel.org/lkml/20201216185805.9981-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-19perf script: Support data page sizeKan Liang
Display the data page size if it is available and asked by the user: Can be configured by the user, for example: perf script --fields comm,event,phys_addr,data_page_size dtlb mem-loads:uP: 3fec82ea8 4K dtlb mem-loads:uP: 3fec82e90 4K dtlb mem-loads:uP: 3e23700a4 4K dtlb mem-loads:uP: 3fec82f20 4K dtlb mem-loads:uP: 3e23700a4 4K dtlb mem-loads:uP: 3b4211bec 4K dtlb mem-loads:uP: 382205dc0 2M dtlb mem-loads:uP: 36fa082c0 2M dtlb mem-loads:uP: 377607340 2M dtlb mem-loads:uP: 330010180 2M dtlb mem-loads:uP: 33200fd80 2M dtlb mem-loads:uP: 31b012b80 2M Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Link: http://lore.kernel.org/lkml/20201216185805.9981-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>