Contents
========

  1. How to use
  2. Technical description
  3. Changes to the slab allocators
  4. Problems
  5. Parameters
  6. Future enhancements


How to use (IMPORTANT)
======================

Always remember this: kmemcheck _will_ give false positives. So don't enable
it and spam the mailing list with its reports; you are not going to be heard,
and it will make people's skins thicker for when the real errors are found.

Instead, I encourage maintainers and developers to find errors in _their_
_own_ code. If you get a report that looks like a false positive, you can try
to figure out whether it's a real bug, work around it, or simply ignore it.
Most developers know their own code and will quickly and efficiently determine
the root cause of a kmemcheck report. This is therefore also the most
efficient way to work with kmemcheck.

If you still want to run kmemcheck to inspect others' code, the rule of thumb
should be: If it's not obvious (to you), don't tell us about it either. Most
likely the code is correct and you'll only waste our time. If you can work
out the error, please do send the maintainer a heads up and/or a patch, but
don't expect him/her to fix something that wasn't wrong in the first place.


Technical description
=====================

kmemcheck works by marking memory pages non-present. This means that whenever
somebody attempts to access the page, a page fault is generated. The page
fault handler notices that the page was in fact only hidden, and so it calls
on the kmemcheck code to make further investigations.

When the investigations are completed, kmemcheck "shows" the page by marking
it present (as it would be under normal circumstances). This way, the
interrupted code can continue as usual.

But after the instruction has been executed, we should hide the page again, so
that we can catch the next access too! To do this, kmemcheck makes use of a
debugging feature of the processor, namely single-stepping. When the processor
has finished the one instruction that generated the memory access, a debug
exception is raised. From here, we simply hide the page again and continue
execution, this time with the single-stepping feature turned off.
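
The hide/show/single-step cycle described above can be modelled in plain
userspace C. This is only a sketch of the control flow, not kmemcheck's code:
struct sim_page and the sim_* functions are hypothetical names invented for
the illustration, and the real work happens in the page fault and debug
exception handlers of the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one tracked page; not a kernel structure. */
struct sim_page {
	bool present;		/* false while the page is "hidden" */
	bool single_step;	/* true while one instruction is single-stepped */
	unsigned int faults;	/* number of accesses caught so far */
};

/* Page fault path: the page was only hidden, so inspect it, then show it. */
static void sim_page_fault(struct sim_page *p)
{
	assert(!p->present);
	p->faults++;		/* the "further investigations" would go here */
	p->present = true;	/* show the page so the access can proceed */
	p->single_step = true;	/* ask for a debug exception after one insn */
}

/* Debug exception path: the instruction is done, so hide the page again. */
static void sim_debug_exception(struct sim_page *p)
{
	assert(p->single_step);
	p->present = false;
	p->single_step = false;
}

/* One memory access by the interrupted code, from fault to re-hiding. */
static void sim_access(struct sim_page *p)
{
	if (!p->present)
		sim_page_fault(p);
	/* ... the faulting instruction executes here ... */
	if (p->single_step)
		sim_debug_exception(p);
}
```

In this model, every call to sim_access() on a hidden page costs one fault
and one debug exception, and the page is hidden again afterwards so the next
access is caught too.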


Changes to the slab allocators
==============================

kmemcheck requires some assistance from the memory allocator in order to work.
The memory allocator needs to

1. Tell kmemcheck about newly allocated pages and pages that are about to
   be freed. This allows kmemcheck to set up and tear down the shadow memory
   for the pages in question. The shadow memory stores the status of each byte
   in the allocation proper, e.g. whether it is initialized or uninitialized.
2. Tell kmemcheck which parts of memory should be marked uninitialized. There
   are actually a few more states, such as "not yet allocated" and "recently
   freed".
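
As an illustration of the shadow memory idea, the following userspace sketch
keeps one status byte per data byte of a page. The state names, the
SIM_PAGE_SIZE value and all sim_* helpers are assumptions made for the
example; the actual encoding used by kmemcheck may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-byte states; the names are illustrative only. */
enum sim_shadow {
	SIM_UNALLOCATED = 0,	/* "not yet allocated" */
	SIM_UNINITIALIZED,
	SIM_INITIALIZED,
	SIM_FREED,		/* "recently freed" */
};

#define SIM_PAGE_SIZE 4096	/* assumed page size for the sketch */

/* One shadow byte per data byte of the tracked page. */
struct sim_shadow_page {
	unsigned char state[SIM_PAGE_SIZE];
};

/* Set the state of len bytes starting at offset off. */
static void sim_mark(struct sim_shadow_page *s, size_t off, size_t len,
		     enum sim_shadow st)
{
	assert(off <= SIM_PAGE_SIZE && len <= SIM_PAGE_SIZE - off);
	memset(&s->state[off], st, len);
}

/* A write initializes the bytes it touches (simplified: no state check). */
static void sim_write(struct sim_shadow_page *s, size_t off, size_t len)
{
	sim_mark(s, off, len, SIM_INITIALIZED);
}

/* Return how many of the bytes touched by a read are not initialized. */
static size_t sim_read_check(const struct sim_shadow_page *s, size_t off,
			     size_t len)
{
	size_t bad = 0;

	assert(off <= SIM_PAGE_SIZE && len <= SIM_PAGE_SIZE - off);
	for (size_t i = 0; i < len; i++)
		if (s->state[off + i] != SIM_INITIALIZED)
			bad++;
	return bad;
}
```

An allocator hook would call something like sim_mark() with
SIM_UNINITIALIZED when an object is handed out and with SIM_FREED when it is
released; a non-zero result from sim_read_check() corresponds to a warning.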

If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
memory that can take page faults because of kmemcheck.

If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
request memory with the __GFP_NOTRACK flag. This does not prevent the page
faults from occurring, but it marks the object in question as being
initialized so that no warnings will ever be produced for this object.
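
The difference between the two flags can be summarized as a small decision
function. This is a userspace sketch only: SIM_SLAB_NOTRACK and
SIM_GFP_NOTRACK are stand-ins with arbitrary values for the real
SLAB_NOTRACK and __GFP_NOTRACK flags, and struct sim_tracking is a
hypothetical type used to state the outcome.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the real flags; the values are arbitrary. */
#define SIM_SLAB_NOTRACK 0x1u
#define SIM_GFP_NOTRACK  0x2u

/* What an allocation can expect from kmemcheck. */
struct sim_tracking {
	bool takes_faults;	/* accesses go through the page fault path */
	bool can_warn;		/* accesses may produce a warning */
};

static struct sim_tracking sim_alloc_policy(unsigned int cache_flags,
					    unsigned int gfp_flags)
{
	struct sim_tracking t = { .takes_faults = true, .can_warn = true };

	/* Only the two stand-in flags are understood by this sketch. */
	assert((cache_flags & ~SIM_SLAB_NOTRACK) == 0);
	assert((gfp_flags & ~SIM_GFP_NOTRACK) == 0);

	if (cache_flags & SIM_SLAB_NOTRACK) {
		/* The cache never returns memory that faults for kmemcheck. */
		t.takes_faults = false;
		t.can_warn = false;
	} else if (gfp_flags & SIM_GFP_NOTRACK) {
		/* Still faults, but the object counts as initialized. */
		t.can_warn = false;
	}
	return t;
}
```

The cache-level flag avoids the page fault overhead altogether, while the
per-allocation flag only silences warnings for that object.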

Currently, the SLAB and SLUB allocators are supported by kmemcheck.


Problems
========

The most prominent problem seems to be that of bit-fields. kmemcheck can only
track memory with byte granularity. Therefore, when gcc generates code to
access only one bit in a bit-field, there is really no way for kmemcheck to
know which of the other bits will be used or thrown away. Consequently, there
may be bogus warnings for bit-field accesses. There is some experimental
support to detect this automatically, though it is probably better to work
around this by explicitly initializing whole bit-fields at once.
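
One way to initialize a whole bit-field at once is to clear the containing
structure before touching individual bits. The sketch below assumes a
hypothetical struct sim_flags; clearing with memset() is just one option,
and assigning to a union member that overlays the bit-field storage would
serve the same purpose.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structure; the three fields share one storage unit. */
struct sim_flags {
	unsigned int ready : 1;
	unsigned int dirty : 1;
	unsigned int level : 4;
};

/* Initialize every byte of the bit-field storage in one go. */
static void sim_flags_init(struct sim_flags *f)
{
	assert(f != NULL);
	memset(f, 0, sizeof(*f));
}

static void sim_flags_set_ready(struct sim_flags *f)
{
	/*
	 * Typically compiled to a read-modify-write of the byte or word
	 * that holds the field, so the neighbouring bits are read as well.
	 */
	f->ready = 1;
}
```

With sim_flags_init() called first, the read-modify-write in
sim_flags_set_ready() only ever reads initialized bytes.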

Some allocations are used for DMA. As DMA doesn't go through the paging
mechanism, we have absolutely no way to detect DMA writes. This means that
spurious warnings may be seen on access to DMA memory. DMA allocations should
be annotated with the __GFP_NOTRACK flag or allocated from caches marked
SLAB_NOTRACK to work around this problem.


Parameters
==========

In addition to enabling CONFIG_KMEMCHECK before the kernel is compiled, the
parameter kmemcheck=1 must be passed to the kernel when it is started in order
to actually do the tracking. So by default, there is only a very small
(probably negligible) overhead for enabling the config option.

Similarly, kmemcheck may be turned on or off at run-time using, respectively:

	echo 1 > /proc/sys/kernel/kmemcheck

and

	echo 0 > /proc/sys/kernel/kmemcheck

Note that this is a lazy setting; once turned off, the old allocations will
still have to take a single page fault exception before tracking is turned off
for that particular page. Turning kmemcheck on will only enable tracking for
allocations made from that point onwards.

There is also a one-shot mode, in which only the first error is reported
before kmemcheck is disabled. This mode can be enabled by passing kmemcheck=2
to the kernel at boot, or running

	echo 2 > /proc/sys/kernel/kmemcheck

when the kernel is already running.
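
The three parameter values and the one-shot behaviour can be sketched as
follows. The enum values mirror the 0, 1 and 2 settings described above;
the enum itself and sim_report_error() are hypothetical names for this
userspace illustration and are not taken from the kmemcheck source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values accepted by the kmemcheck parameter, as described above. */
enum sim_mode {
	SIM_MODE_DISABLED = 0,
	SIM_MODE_ENABLED  = 1,
	SIM_MODE_ONESHOT  = 2,
};

/*
 * Hypothetical error hook. Returns true if a report would be emitted.
 * In one-shot mode the first report also switches tracking off.
 */
static bool sim_report_error(enum sim_mode *mode)
{
	assert(mode != NULL);
	if (*mode == SIM_MODE_DISABLED)
		return false;
	if (*mode == SIM_MODE_ONESHOT)
		*mode = SIM_MODE_DISABLED;
	return true;
}
```

In this model, a second error in one-shot mode goes unreported because the
first one has already disabled tracking.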


Future enhancements
===================

There is already some preliminary support for catching use-after-free errors.
What still needs to be done is delaying kfree() so that memory is not
reallocated immediately after freeing it. [Suggested by Pekka Enberg.]
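
A delayed free could, for instance, be built around a small FIFO of
recently freed pointers. The userspace sketch below shows only the general
technique, using malloc()/free() in place of the slab allocator; struct
sim_quarantine, SIM_QUARANTINE_SLOTS and the sim_* functions are all
hypothetical, and nothing here is taken from an existing kmemcheck
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SIM_QUARANTINE_SLOTS 4	/* arbitrary size for the sketch */

/* Hypothetical FIFO of freed-but-not-yet-released pointers. */
struct sim_quarantine {
	void *slot[SIM_QUARANTINE_SLOTS];
	size_t next;		/* slot holding the oldest entry */
	size_t released;	/* how many pointers were really freed */
};

/* Park ptr in the FIFO and release whatever entry it displaces. */
static void sim_delayed_free(struct sim_quarantine *q, void *ptr)
{
	void *oldest;

	assert(q != NULL);
	oldest = q->slot[q->next];
	q->slot[q->next] = ptr;
	q->next = (q->next + 1) % SIM_QUARANTINE_SLOTS;
	if (oldest != NULL) {
		free(oldest);
		q->released++;
	}
}

/* Release everything that is still parked. */
static void sim_quarantine_drain(struct sim_quarantine *q)
{
	assert(q != NULL);
	for (size_t i = 0; i < SIM_QUARANTINE_SLOTS; i++) {
		if (q->slot[i] != NULL) {
			free(q->slot[i]);
			q->slot[i] = NULL;
			q->released++;
		}
	}
}
```

While an object sits in the FIFO, its memory cannot be handed out again, so
a stale access would still hit memory in the "recently freed" state.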

It should be possible to support SMP systems by duplicating the page tables
for each processor in the system. This is probably extremely difficult,
however. [Suggested by Ingo Molnar.]

Support for instruction set extensions like XMM, SSE2, etc. still needs to be
added.