summaryrefslogtreecommitdiff
path: root/fs/notify/inotify
AgeCommit message (Collapse)Author
2009-11-25Merge remote branch 'sysctl/master'Stephen Rothwell
Conflicts: net/ipv6/addrconf.c
2009-11-18sysctl: Drop & in front of every proc_handler.Eric W. Biederman
For consistency drop & in front of every proc_handler. Explicity taking the address is unnecessary and it prevents optimizations like stubbing the proc_handlers to NULL. Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Joe Perches <joe@perches.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-12fsnotify: take inode->i_lock inside fsnotify_find_mark_entry()Andreas Gruenbacher
All callers to fsnotify_find_mark_entry() except one take and release inode->i_lock around the call. Take the lock inside fsnotify_find_mark_entry() instead. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-12inotify: rename mark_entry to just markEric Paris
rename anything in inotify that deals with mark_entry to just be mark. It makes a lot more sense. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-12fsnotify: rename fsnotify_find_mark_entry to fsnotify_find_markEric Paris
the _entry portion of fsnotify functions is useless. Drop it. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-12fsnotify: rename fsnotify_mark_entry to just fsnotify_markEric Paris
The name is long and it serves no real purpose. So rename fsnotify_mark_entry to just fsnotify_mark. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-12sysctl fs: Remove dead binary sysctl supportEric W. Biederman
Now that sys_sysctl is a generic wrapper around /proc/sys .ctl_name and .strategy members of sysctl tables are dead code. Remove them. Cc: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-06fsnotify: put inode specific fields in an fsnotify_mark in a unionEric Paris
The addition of marks on vfs mounts will be simplified if the inode specific parts of a mark and the vfsmnt specific parts of a mark are actually in a union so naming can be easy. This patch just implements the inode struct and the union. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-06fsnotify: include vfsmount in should_send_event when appropriateEric Paris
To ensure that a group will not duplicate events when it receives it based on the vfsmount and the inode should_send_event test we should distinguish those two cases. We pass a vfsmount to this function so groups can make their own determinations. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-06fsnotify: drop mask argument from fsnotify_alloc_groupEric Paris
Nothing uses the mask argument to fsnotify_alloc_group. This patch drops that argument. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-06fsnotify: fsnotify_obtain_group should be fsnotify_alloc_groupEric Paris
fsnotify_obtain_group was intended to be able to find an already existing group. Nothing uses that functionality. This just renames it to fsnotify_alloc_group so it is clear what it is doing. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-06fsnotify: remove group_num altogetherEric Paris
The original fsnotify interface has a group-num which was intended to be able to find a group after it was added. I no longer think this is a necessary thing to do and so we remove the group_num. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-06fsnotify: per group notification queue merge typesEric Paris
inotify only wishes to merge a new event with the last event on the notification fifo. fanotify is willing to merge any events including by means of bitwise OR masks of multiple events together. This patch moves the inotify event merging logic out of the generic fsnotify notification.c and into the inotify code. This allows each use of fsnotify to provide their own merge functionality. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-06fsnotify: include data in should_send callsEric Paris
fanotify is going to need to look at file->private_data to know if an event should be sent or not. This passes the data (which might be a file, dentry, inode, or none) to the should_send function calls so fanotify can get that information when available Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-06fsnotify: provide the data type to should_send_eventEric Paris
fanotify is only interested in event types which contain enough information to open the original file in the context of the fanotify listener. Since fanotify may not want to send events if that data isn't present we pass the data type to the should_send_event function call so fanotify can express its lack of interest. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-06inotify: use container_of instead of castingEric Paris
inotify_free_mark casts directly from an fsnotify_mark_entry to an inotify_inode_mark_entry. This works, but should use container_of instead for future proofing. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-06fsnotify: allow addition of duplicate fsnotify marksEric Paris
This patch allows a task to add a second fsnotify mark to an inode for the same group. This mark will be added to the end of the inode's list and this will never be found by the stand fsnotify_find_mark() function. This is useful if a user wants to add a new mark before removing the old one. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-11-06inotify: simplify the inotify idr handlingEric Paris
This patch moves all of the idr editing operations into their own idr functions. It makes it easier to prove locking correctness and to to understand the code flow. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-28inotify: update the group mask on mark additionEric Paris
Seperating the addition and update of marks in inotify resulted in a regression in that inotify never gets events. The inotify group mask is always 0. This mask should be updated any time a new mark is added. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-28inotify: fix length reporting and size checkingEric Paris
0db501bd0610ee0c0 introduced a regresion in that it now sends a nul terminator but the length accounting when checking for space or reporting to userspace did not take this into account. This corrects all of the rounding logic. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-28inotify: do not send a block of zeros when no pathname is availableBrian Rogers
When an event has no pathname, there's no need to pad it with a null byte and therefore generate an inotify_event sized block of zeros. This fixes a regression introduced by commit 0db501bd0610ee0c0aca84d927f90bcccd09e2bd where my system wouldn't finish booting because some process was being confused by this. Signed-off-by: Brian Rogers <brian@xyzw.org> Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-27inotify: Ensure we alwasy write the terminating NULL.Eric W. Biederman
Before the rewrite copy_event_to_user always wrote a terqminating '\0' byte to user space after the filename. Since the rewrite that terminating byte was skipped if your filename is exactly a multiple of event_size. Ouch! So add one byte to name_size before we round up and use clear_user to set userspace to zero like /dev/zero does instead of copying the strange nul_inotify_event. I can't quite convince myself len_to_zero will never exceed 16 and even if it doesn't clear_user should be more efficient and a more accurate reflection of what the code is trying to do. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-27inotify: fix locking around inotify watching in the idrEric Paris
The are races around the idr storage of inotify watches. It's possible that a watch could be found from sys_inotify_rm_watch() in the idr, but it could be removed from the idr before that code does it's removal. Move the locking and the refcnt'ing so that these have to happen atomically. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-27inotify: do not BUG on idr entries at inotify destructionEric Paris
If an inotify watch is left in the idr when an fsnotify group is destroyed this will lead to a BUG. This is not a dangerous situation and really indicates a programming bug and leak of memory. This patch changes it to use a WARN and a printk rather than killing people's boxes. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-27inotify: seperate new watch creation updating existing watchesEric Paris
There is nothing known wrong with the inotify watch addition/modification but this patch seperates the two code paths to make them each easy to verify as correct. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-17inotify: start watch descriptor count at 1Eric Paris
The inotify_add_watch man page specifies that inotify_add_watch() will return a non-negative integer. However, historically the inotify watches started at 1, not at 0. Turns out that the inotifywait program provided by the inotify-tools package doesn't properly handle a 0 watch descriptor. In 7e790dd5 we changed from starting at 1 to starting at 0. This patch starts at 1, just like in previous kernels, but also just like in previous kernels it's possible for it to wrap back to 0. This preserves the kernel functionality exactly like it was before the patch (neither method broke the spec) Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-17notify: unused event private raceEric Paris
inotify decides if private data it passed to get added to an event was used by checking list_empty(). But it's possible that the event may have been dequeued and the private event removed so it would look empty. The fix is to use the return code from fsnotify_add_notify_event rather than looking at the list. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-21inotify: use GFP_NOFS under potential memory pressureEric Paris
inotify can have a watchs removed under filesystem reclaim. ================================= [ INFO: inconsistent lock state ] 2.6.31-rc2 #16 --------------------------------- inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. khubd/217 [HC0[0]:SC0[0]:HE1:SE1] takes: (iprune_mutex){+.+.?.}, at: [<c10ba899>] invalidate_inodes+0x20/0xe3 {IN-RECLAIM_FS-W} state was registered at: [<c10536ab>] __lock_acquire+0x2c9/0xac4 [<c1053f45>] lock_acquire+0x9f/0xc2 [<c1308872>] __mutex_lock_common+0x2d/0x323 [<c1308c00>] mutex_lock_nested+0x2e/0x36 [<c10ba6ff>] shrink_icache_memory+0x38/0x1b2 [<c108bfb6>] shrink_slab+0xe2/0x13c [<c108c3e1>] kswapd+0x3d1/0x55d [<c10449b5>] kthread+0x66/0x6b [<c1003fdf>] kernel_thread_helper+0x7/0x10 [<ffffffff>] 0xffffffff Two things are needed to fix this. First we need a method to tell fsnotify_create_event() to use GFP_NOFS and second we need to stop using one global IN_IGNORED event and allocate them one at a time. This solves current issues with multiple IN_IGNORED on a queue having tail drop problems and simplifies the allocations since we don't have to worry about two tasks opperating on the IGNORED event concurrently. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-21fsnotify: use def_bool in kconfig instead of letting the user chooseEric Paris
fsnotify doens't give the user anything. If someone chooses inotify or dnotify it should build fsnotify, if they don't select one it shouldn't be built. This patch changes fsnotify to be a def_bool=n and makes everything else select it. Also fixes the issue people complained about on lwn where gdm hung because they didn't have inotify and they didn't get the inotify build option..... Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-21inotify: fix error paths in inotify_update_watchEric Paris
inotify_update_watch could leave things in a horrid state on a number of error paths. We could try to remove idr entries that didn't exist, we could send an IN_IGNORED to userspace for watches that don't exist, and a bit of other stupidity. Clean these up by doing the idr addition before we put the mark on the inode since we can clean that up on error and getting off the inode's mark list is hard. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-21inotify: do not leak inode marks in inotify_add_watchEric Paris
inotify_add_watch had a couple of problems. The biggest being that if inotify_add_watch was called on the same inode twice (to update or change the event mask) a refence was taken on the original inode mark by fsnotify_find_mark_entry but was not being dropped at the end of the inotify_add_watch call. Thus if inotify_rm_watch was called although the mark was removed from the inode, the refcnt wouldn't hit zero and we would leak memory. Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-21inotify: drop user watch count when a watch is removedEric Paris
The inotify rewrite forgot to drop the inotify watch use cound when a watch was removed. This means that a single inotify fd can only ever register a maximum of /proc/sys/fs/max_user_watches even if some of those had been freed. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-07-02fs/notify/inotify: decrement user inotify count on closeKeith Packard
The per-user inotify_devs value is incremented each time a new file is allocated, but never decremented. This led to inotify_init failing after a limited number of calls. Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2009-06-19inotify: inotify_destroy_mark_entry could get called twiceEric Paris
inotify_destroy_mark_entry could get called twice for the same mark since it is called directly in inotify_rm_watch and when the mark is being destroyed for another reason. As an example assume that the file being watched was just deleted so inotify_destroy_mark_entry would get called from the path fsnotify_inoderemove() -> fsnotify_destroy_marks_by_inode() -> fsnotify_destroy_mark_entry() -> inotify_destroy_mark_entry(). If this happened at the same time as userspace tried to remove a watch via inotify_rm_watch we could attempt to remove the mark from the idr twice and could thus double dec the ref cnt and potentially could be in a use after free/double free situation. The fix is to have inotify_rm_watch use the generic recursive safe fsnotify_destroy_mark_by_entry() so we are sure the inotify_destroy_mark_entry() function can only be called one. This patch also renames the function to inotify_ingored_remove_idr() so it is clear what is actually going on in the function. Hopefully this fixes: [ 20.342058] idr_remove called for id=20 which is not allocated. [ 20.348000] Pid: 1860, comm: udevd Not tainted 2.6.30-tip #1077 [ 20.353933] Call Trace: [ 20.356410] [<ffffffff811a82b7>] idr_remove+0x115/0x18f [ 20.361737] [<ffffffff8134259d>] ? _spin_lock+0x6d/0x75 [ 20.367061] [<ffffffff8111640a>] ? inotify_destroy_mark_entry+0xa3/0xcf [ 20.373771] [<ffffffff8111641e>] inotify_destroy_mark_entry+0xb7/0xcf [ 20.380306] [<ffffffff81115913>] inotify_freeing_mark+0xe/0x10 [ 20.386238] [<ffffffff8111410d>] fsnotify_destroy_mark_by_entry+0x143/0x170 [ 20.393293] [<ffffffff811163a3>] inotify_destroy_mark_entry+0x3c/0xcf [ 20.399829] [<ffffffff811164d1>] sys_inotify_rm_watch+0x9b/0xc6 [ 20.405850] [<ffffffff8100bcdb>] system_call_fastpath+0x16/0x1b Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Eric Paris <eparis@redhat.com> Tested-by: Peter Ziljlstra <peterz@infradead.org>
2009-06-11inotify/dnotify: should_send_event shouldn't match on FS_EVENT_ON_CHILDEric Paris
inotify and dnotify will both indicate that they want any event which came from a child inode. The fix is to mask off FS_EVENT_ON_CHILD when deciding if inotify or dnotify is interested in a given event. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-06-11inotify: reimplement inotify using fsnotifyEric Paris
Reimplement inotify_user using fsnotify. This should be feature for feature exactly the same as the original inotify_user. This does not make any changes to the in kernel inotify feature used by audit. Those patches (and the eventual removal of in kernel inotify) will come after the new inotify_user proves to be working correctly. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-06-11fsnotify: unified filesystem notification backendEric Paris
fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-05-06inotify: use GFP_NOFS in kernel_event() to work around a lockdep false-positiveWu Fengguang
There is what we believe to be a false positive reported by lockdep. inotify_inode_queue_event() => take inotify_mutex => kernel_event() => kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim => dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock The plan is to fix this via lockdep annotation, but that is proving to be quite involved. The patch flips the allocation over to GFP_NFS to shut the warning up, for the 2.6.30 release. Hopefully we will fix this for real in 2.6.31. I'll queue a patch in -mm to switch it back to GFP_KERNEL so we don't forget. ================================= [ INFO: inconsistent lock state ] 2.6.30-rc2-next-20090417 #203 --------------------------------- inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes: (&inode->inotify_mutex){+.+.?.}, at: [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0 {RECLAIM_FS-ON-W} state was registered at: [<ffffffff81079188>] mark_held_locks+0x68/0x90 [<ffffffff810792a5>] lockdep_trace_alloc+0xf5/0x100 [<ffffffff810f5261>] __kmalloc_node+0x31/0x1e0 [<ffffffff81130652>] kernel_event+0xe2/0x190 [<ffffffff81130826>] inotify_dev_queue_event+0x126/0x230 [<ffffffff8112f096>] inotify_inode_queue_event+0xc6/0x110 [<ffffffff8110444d>] vfs_create+0xcd/0x140 [<ffffffff8110825d>] do_filp_open+0x88d/0xa20 [<ffffffff810f6b68>] do_sys_open+0x98/0x140 [<ffffffff810f6c50>] sys_open+0x20/0x30 [<ffffffff8100c272>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff irq event stamp: 690455 hardirqs last enabled at (690455): [<ffffffff81564fe4>] _spin_unlock_irqrestore+0x44/0x80 hardirqs last disabled at (690454): [<ffffffff81565372>] _spin_lock_irqsave+0x32/0xa0 softirqs last enabled at (690178): [<ffffffff81052282>] __do_softirq+0x202/0x220 softirqs last disabled at (690157): [<ffffffff8100d50c>] call_softirq+0x1c/0x50 other info that might help us debug this: 2 locks held by kswapd0/380: #0: (shrinker_rwsem){++++..}, at: [<ffffffff810d0bd7>] shrink_slab+0x37/0x180 #1: (&type->s_umount_key#17){++++..}, at: [<ffffffff8110cfbf>] shrink_dcache_memory+0x11f/0x1e0 stack backtrace: Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203 Call Trace: [<ffffffff810789ef>] print_usage_bug+0x19f/0x200 [<ffffffff81018bff>] ? save_stack_trace+0x2f/0x50 [<ffffffff81078f0b>] mark_lock+0x4bb/0x6d0 [<ffffffff810799e0>] ? check_usage_forwards+0x0/0xc0 [<ffffffff8107b142>] __lock_acquire+0xc62/0x1ae0 [<ffffffff810f478c>] ? slob_free+0x10c/0x370 [<ffffffff8107c0a1>] lock_acquire+0xe1/0x120 [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0 [<ffffffff81562d43>] mutex_lock_nested+0x63/0x420 [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0 [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0 [<ffffffff81012fe9>] ? sched_clock+0x9/0x10 [<ffffffff81077165>] ? lock_release_holdtime+0x35/0x1c0 [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0 [<ffffffff8110c9dc>] dentry_iput+0xbc/0xe0 [<ffffffff8110cb23>] d_kill+0x33/0x60 [<ffffffff8110ce23>] __shrink_dcache_sb+0x2d3/0x350 [<ffffffff8110cffa>] shrink_dcache_memory+0x15a/0x1e0 [<ffffffff810d0cc5>] shrink_slab+0x125/0x180 [<ffffffff810d1540>] kswapd+0x560/0x7a0 [<ffffffff810ce160>] ? isolate_pages_global+0x0/0x2c0 [<ffffffff81065a30>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8107953d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff810d0fe0>] ? kswapd+0x0/0x7a0 [<ffffffff8106555b>] kthread+0x5b/0xa0 [<ffffffff8100d40a>] child_rip+0xa/0x20 [<ffffffff8100cdd0>] ? restore_args+0x0/0x30 [<ffffffff81065500>] ? kthread+0x0/0xa0 [<ffffffff8100d400>] ? child_rip+0x0/0x20 [eparis@redhat.com: fix audit too] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matt Mackall <mpm@selenic.com> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Eric Paris <eparis@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-27fs: avoid I_NEW inodesNick Piggin
To be on the safe side, it should be less fragile to exclude I_NEW inodes from inode list scans by default (unless there is an important reason to have them). Normally they will get excluded (eg. by zero refcount or writecount etc), however it is a bit fragile for list walkers to know exactly what parts of the inode state is set up and valid to test when in I_NEW. So along these lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc checks with them too -- this shouldn't be a problem should it?) Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-02-18inotify: fix GFP_KERNEL related deadlockIngo Molnar
Enhanced lockdep coverage of __GFP_NOFS turned up this new lockdep assert: [ 1093.677775] [ 1093.677781] ================================= [ 1093.680031] [ INFO: inconsistent lock state ] [ 1093.680031] 2.6.29-rc5-tip-01504-gb49eca1-dirty #1 [ 1093.680031] --------------------------------- [ 1093.680031] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. [ 1093.680031] kswapd0/308 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 1093.680031] (&inode->inotify_mutex){+.+.?.}, at: [<c0205942>] inotify_inode_is_dead+0x20/0x80 [ 1093.680031] {RECLAIM_FS-ON-W} state was registered at: [ 1093.680031] [<c01696b9>] mark_held_locks+0x43/0x5b [ 1093.680031] [<c016baa4>] lockdep_trace_alloc+0x6c/0x6e [ 1093.680031] [<c01cf8b0>] kmem_cache_alloc+0x20/0x150 [ 1093.680031] [<c040d0ec>] idr_pre_get+0x27/0x6c [ 1093.680031] [<c02056e3>] inotify_handle_get_wd+0x25/0xad [ 1093.680031] [<c0205f43>] inotify_add_watch+0x7a/0x129 [ 1093.680031] [<c020679e>] sys_inotify_add_watch+0x20f/0x250 [ 1093.680031] [<c010389e>] sysenter_do_call+0x12/0x35 [ 1093.680031] [<ffffffff>] 0xffffffff [ 1093.680031] irq event stamp: 60417 [ 1093.680031] hardirqs last enabled at (60417): [<c018d5f5>] call_rcu+0x53/0x59 [ 1093.680031] hardirqs last disabled at (60416): [<c018d5b9>] call_rcu+0x17/0x59 [ 1093.680031] softirqs last enabled at (59656): [<c0146229>] __do_softirq+0x157/0x16b [ 1093.680031] softirqs last disabled at (59651): [<c0106293>] do_softirq+0x74/0x15d [ 1093.680031] [ 1093.680031] other info that might help us debug this: [ 1093.680031] 2 locks held by kswapd0/308: [ 1093.680031] #0: (shrinker_rwsem){++++..}, at: [<c01b0502>] shrink_slab+0x36/0x189 [ 1093.680031] #1: (&type->s_umount_key#4){+++++.}, at: [<c01e6d77>] shrink_dcache_memory+0x110/0x1fb [ 1093.680031] [ 1093.680031] stack backtrace: [ 1093.680031] Pid: 308, comm: kswapd0 Not tainted 2.6.29-rc5-tip-01504-gb49eca1-dirty #1 [ 1093.680031] Call Trace: [ 1093.680031] [<c016947a>] valid_state+0x12a/0x13d [ 1093.680031] [<c016954e>] mark_lock+0xc1/0x1e9 [ 1093.680031] [<c016a5b4>] ? check_usage_forwards+0x0/0x3f [ 1093.680031] [<c016ab74>] __lock_acquire+0x2c6/0xac8 [ 1093.680031] [<c01688d9>] ? register_lock_class+0x17/0x228 [ 1093.680031] [<c016b3d3>] lock_acquire+0x5d/0x7a [ 1093.680031] [<c0205942>] ? inotify_inode_is_dead+0x20/0x80 [ 1093.680031] [<c08824c4>] __mutex_lock_common+0x3a/0x4cb [ 1093.680031] [<c0205942>] ? inotify_inode_is_dead+0x20/0x80 [ 1093.680031] [<c08829ed>] mutex_lock_nested+0x2e/0x36 [ 1093.680031] [<c0205942>] ? inotify_inode_is_dead+0x20/0x80 [ 1093.680031] [<c0205942>] inotify_inode_is_dead+0x20/0x80 [ 1093.680031] [<c01e6672>] dentry_iput+0x90/0xc2 [ 1093.680031] [<c01e67a3>] d_kill+0x21/0x45 [ 1093.680031] [<c01e6a46>] __shrink_dcache_sb+0x27f/0x355 [ 1093.680031] [<c01e6dc5>] shrink_dcache_memory+0x15e/0x1fb [ 1093.680031] [<c01b05ed>] shrink_slab+0x121/0x189 [ 1093.680031] [<c01b0d12>] kswapd+0x39f/0x561 [ 1093.680031] [<c01ae499>] ? isolate_pages_global+0x0/0x233 [ 1093.680031] [<c0157eae>] ? autoremove_wake_function+0x0/0x43 [ 1093.680031] [<c01b0973>] ? kswapd+0x0/0x561 [ 1093.680031] [<c0157daf>] kthread+0x41/0x82 [ 1093.680031] [<c0157d6e>] ? kthread+0x0/0x82 [ 1093.680031] [<c01043ab>] kernel_thread_helper+0x7/0x10 inotify_handle_get_wd() does idr_pre_get() which does a kmem_cache_alloc() without __GFP_FS - and is hence deadlockable under extreme MM pressure. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: MinChan Kim <minchan.kim@gmail.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-26inotify: clean up inotify_read and fix locking problemsVegard Nossum
If userspace supplies an invalid pointer to a read() of an inotify instance, the inotify device's event list mutex is unlocked twice. This causes an unbalance which effectively leaves the data structure unprotected, and we can trigger oopses by accessing the inotify instance from different tasks concurrently. The best fix (contributed largely by Linus) is a total rewrite of the function in question: On Thu, Jan 22, 2009 at 7:05 AM, Linus Torvalds wrote: > The thing to notice is that: > > - locking is done in just one place, and there is no question about it > not having an unlock. > > - that whole double-while(1)-loop thing is gone. > > - use multiple functions to make nesting and error handling sane > > - do error testing after doing the things you always need to do, ie do > this: > > mutex_lock(..) > ret = function_call(); > mutex_unlock(..) > > .. test ret here .. > > instead of doing conditional exits with unlocking or freeing. > > So if the code is written in this way, it may still be buggy, but at least > it's not buggy because of subtle "forgot to unlock" or "forgot to free" > issues. > > This _always_ unlocks if it locked, and it always frees if it got a > non-error kevent. Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Robert Love <rlove@google.com> Cc: <stable@kernel.org> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-14[CVE-2009-0029] System call wrappers part 29Heiko Carstens
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14[CVE-2009-0029] System call wrappers part 28Heiko Carstens
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-05inotify: fix type errors in interfacesMichael Kerrisk
The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes. For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is currently 'u32', it should be '__s32' . That is Robert's suggestion, and is consistent with the other declarations of watch descriptors in the kernel source, in particular, the inotify_event structure in include/linux/inotify.h: struct inotify_event { __s32 wd; /* watch descriptor */ __u32 mask; /* watch mask */ __u32 cookie; /* cookie to synchronize two events */ __u32 len; /* length (including nulls) of name */ char name[0]; /* stub for possible name */ }; The patch makes the changes needed for inotify_rm_watch(). Signed-off-by: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Robert Love <rlove@google.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31filesystem notification: create fs/notify to contain all fs notificationEric Paris
Creating a generic filesystem notification interface, fsnotify, which will be used by inotify, dnotify, and eventually fanotify is really starting to clutter the fs directory. This patch simply moves inotify and dnotify into fs/notify/inotify and fs/notify/dnotify respectively to make both current fs/ and future notification tidier. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>