path: root/fs/nfsd/netns.h
Age | Commit message | Author
2022-01-08 | NFSD: Rename boot verifier functions | Chuck Lever
Clean up: These functions handle what the specs call a write verifier, which in the Linux NFS server implementation is now divorced from the server's boot instance Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 | NFSD: Clean up the nfsd_net::nfssvc_boot field | Chuck Lever
There are two boot-time fields in struct nfsd_net: one called boot_time and one called nfssvc_boot. The latter is used only to form write verifiers, but its documenting comment declares:

    /* Time of server startup */

Since commit 27c438f53e79 ("nfsd: Support the server resetting the boot verifier"), this field can be reset at any time; it's no longer tied to server restart. So that comment is stale.

Also, according to pahole, struct timespec64 is 16 bytes long on x86_64. The nfssvc_boot field is used only to form a write verifier, which is 8 bytes long.

Let's clarify this situation by manufacturing an 8-byte verifier in nfsd_reset_boot_verifier() and storing only that in struct nfsd_net.

We're grabbing 128 bits of time, so compress all of those into a 64-bit verifier instead of throwing out the high-order bits. In the future, the siphash_key can be re-used for other hashed objects per-nfsd_net.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
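The compression step described above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct ts64` and `make_write_verifier()` are hypothetical names, and FNV-1a is swapped in as a simple unkeyed stand-in for the keyed siphash that the message mentions.

```c
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for struct timespec64: two 64-bit fields, 16 bytes. */
struct ts64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/*
 * Fold all 16 bytes of the timestamp into one 64-bit verifier, so that
 * the high-order bits contribute instead of being discarded.
 * FNV-1a is an assumption made for brevity; it is not what the kernel uses.
 */
static uint64_t make_write_verifier(const struct ts64 *now)
{
	unsigned char bytes[sizeof(*now)];
	uint64_t hash = 14695981039346656037ULL;	/* FNV-1a offset basis */
	size_t i;

	memcpy(bytes, now, sizeof(bytes));
	for (i = 0; i < sizeof(bytes); i++) {
		hash ^= bytes[i];
		hash *= 1099511628211ULL;		/* FNV-1a prime */
	}
	return hash;
}
```

Two timestamps that differ only above bit 31 of the seconds field still yield different verifiers here, which is the property a plain truncation would lose.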
2021-12-13 | NFSD: simplify per-net file cache management | NeilBrown
We currently have a 'laundrette' for closing cached files - a different work-item for each network-namespace. These 'laundrettes' (aka struct nfsd_fcache_disposal) are currently on a list, and are freed using rcu. The list is not necessary as we have a per-namespace structure (struct nfsd_net) which can hold a link to the nfsd_fcache_disposal. The use of kfree_rcu is also unnecessary as the cache is cleaned of all files associated with a given namespace, and no new files can be added, before the nfsd_fcache_disposal is freed. So add a '->fcache_disposal' link to nfsd_net, and discard the list management and rcu usage. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 | NFSD: simplify locking for network notifier. | NeilBrown
nfsd currently maintains an open-coded read/write semaphore (refcount and wait queue) for each network namespace to ensure the nfs service isn't shut down while the notifier is running. This is excessive. As there is unlikely to be contention between notifiers and they run without sleeping, a single spinlock is sufficient to avoid problems. Signed-off-by: NeilBrown <neilb@suse.de> [ cel: ensure nfsd_notifier_lock is static ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 | NFSD: Make it possible to use svc_set_num_threads_sync | NeilBrown
nfsd cannot currently use svc_set_num_threads_sync. It instead uses svc_set_num_threads which does *not* wait for threads to all exit, and has a separate mechanism (nfsd_shutdown_complete) to wait for completion.

The reason that nfsd is unlike other services is that nfsd threads can exit separately from svc_set_num_threads being called - they die on receipt of SIGKILL. Also, when the last thread exits, the service must be shut down (sockets closed). For this, the nfsd_mutex needs to be taken, and as that mutex needs to be held while svc_set_num_threads is called, the one cannot wait for the other.

This patch changes the nfsd thread so that it can drop the ref on the service without blocking on nfsd_mutex, so that svc_set_num_threads_sync can be used:
- if it can drop a non-last reference, it does that. This does not trigger shutdown and does not require a mutex. This will likely happen for all but the last thread signalled, and for all threads being shut down by nfsd_shutdown_threads()
- if it can get the mutex without blocking (trylock), it does that and then drops the reference. This will likely happen for the last thread killed by SIGKILL
- Otherwise there might be an unrelated task holding the mutex, possibly in another network namespace, or nfsd_shutdown_threads() might be just about to get a reference on the service, after which we can drop ours safely. We cannot conveniently get wakeup notifications on these events, and we are unlikely to need to, so we sleep briefly and check again.

With this we can discard nfsd_shutdown_complete and nfsd_complete_shutdown(), and switch to svc_set_num_threads_sync.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
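The three cases in that message can be modelled with C11 atomics. The sketch below is a hedged userspace illustration, not the kernel code: `struct service`, `service_init()`, `put_unless_last()` and `service_put_nonblocking()` are hypothetical names, and an `atomic_flag` stands in for nfsd_mutex.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the service: a reference count plus a mutex. */
struct service {
	atomic_int  refs;	/* references held on the service */
	atomic_flag mutex;	/* stand-in for nfsd_mutex; set means locked */
	bool        shut_down;	/* set once the last reference is dropped */
};

enum put_result { PUT_NOT_LAST, PUT_LAST_LOCKED, PUT_RETRY };

static void service_init(struct service *s, int refs)
{
	atomic_init(&s->refs, refs);
	atomic_flag_clear(&s->mutex);
	s->shut_down = false;
}

/* Case 1: drop a reference only if it is not the last one. Never blocks. */
static bool put_unless_last(struct service *s)
{
	int old = atomic_load(&s->refs);

	while (old > 1) {
		if (atomic_compare_exchange_weak(&s->refs, &old, old - 1))
			return true;
	}
	return false;
}

static enum put_result service_put_nonblocking(struct service *s)
{
	if (put_unless_last(s))
		return PUT_NOT_LAST;

	/* Case 2: trylock succeeded, so the last reference may be dropped. */
	if (!atomic_flag_test_and_set(&s->mutex)) {
		if (atomic_fetch_sub(&s->refs, 1) == 1)
			s->shut_down = true;
		atomic_flag_clear(&s->mutex);
		return PUT_LAST_LOCKED;
	}

	/* Case 3: mutex is busy; the caller sleeps briefly and tries again. */
	return PUT_RETRY;
}
```

A caller would loop on `PUT_RETRY` with a short sleep between attempts, which mirrors the "sleep briefly and check again" fallback in the message.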
2021-12-13 | SUNRPC: stop using ->sv_nrthreads as a refcount | NeilBrown
The use of sv_nrthreads as a general refcount results in clumsy code, as is seen by various comments needed to explain the situation. This patch introduces a 'struct kref' and uses that for reference counting, leaving sv_nrthreads to be a pure count of threads. The kref is managed particularly in svc_get() and svc_put(), and also nfsd_put(); svc_destroy() now takes a pointer to the embedded kref, rather than to the serv. nfsd allows the svc_serv to exist with ->sv_nrthreads being zero. This happens when a transport is created before the first thread is started. To support this, a 'keep_active' flag is introduced which holds a ref on the svc_serv. This is set when any listening socket is successfully added (unless there are running threads), and cleared when the number of threads is set. So when the last thread exits, the svc_serv will be destroyed. The use of 'keep_active' replaces previous code which checked if there were any permanent sockets. We no longer clear ->rq_server when nfsd() exits. This was done to prevent svc_exit_thread() from calling svc_destroy(). Instead we take an extra reference to the svc_serv to prevent svc_destroy() from being called. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
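The separation of the reference count from the thread count, together with the 'keep_active' flag, might look roughly like the following single-threaded model. All names (`struct serv`, `serv_get()`, `serv_put()`, `serv_socket_added()`, `serv_set_threads()`) are hypothetical, a plain `int` plays the role of the kref, and the assumption that each running thread holds one reference is part of the model rather than something the message states.

```c
#include <stdbool.h>

/* Hypothetical model; field names follow the commit message loosely. */
struct serv {
	int  refs;		/* plays the role of the struct kref */
	int  nrthreads;		/* pure count of running threads */
	bool keep_active;	/* holds one reference while no threads run */
	bool destroyed;
};

static void serv_get(struct serv *s)
{
	s->refs++;
}

static void serv_put(struct serv *s)
{
	if (--s->refs == 0)
		s->destroyed = true;
}

/* A listening socket was added: pin the service unless threads already run. */
static void serv_socket_added(struct serv *s)
{
	if (!s->keep_active && s->nrthreads == 0) {
		s->keep_active = true;
		serv_get(s);
	}
}

/* Setting the thread count clears keep_active and drops its reference. */
static void serv_set_threads(struct serv *s, int n)
{
	while (s->nrthreads < n) {
		s->nrthreads++;
		serv_get(s);	/* assumption: each thread holds a reference */
	}
	while (s->nrthreads > n) {
		s->nrthreads--;
		serv_put(s);
	}
	if (s->keep_active) {
		s->keep_active = false;
		serv_put(s);
	}
}
```

With this shape the thread count can sit at zero while the service stays alive, and the service is torn down only when the last reference goes away.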
2021-05-25 | NFSD: delay unmount source's export after inter-server copy completed. | Dai Ngo
Currently the source's export is mounted and unmounted on every inter-server copy operation. This patch is an enhancement to delay the unmount of the source export for a certain period of time to eliminate the mount and unmount overhead on subsequent copy operations. After a copy operation completes, a work entry is added to the delayed unmount list with an expiration time. This list is serviced by the laundromat thread to unmount the export of the expired entries. Each time the export is being used again, its expiration time is extended and the entry is re-inserted to the tail of the list. The unmount task and the mount operation of the copy request are synced to make sure the export is not unmounted while it's being used. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
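The delayed unmount list described above amounts to an expiry queue ordered by last use. A minimal array-based sketch follows; `struct delayed_umount`, `umount_list_touch()` and `umount_list_reap()` are hypothetical names, the fixed capacity is arbitrary, and the synchronisation between the reaper and an in-flight copy is deliberately not modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 8	/* arbitrary capacity for the sketch */

/* Hypothetical entry: one cached mount of a source export. */
struct delayed_umount {
	int      export_id;
	uint64_t expires;	/* time after which the mount may be dropped */
};

/* Entries are kept in order of last use; index 0 is the oldest. */
struct umount_list {
	struct delayed_umount e[MAX_ENTRIES];
	size_t n;
};

/* Record a use: extend the expiry and move the entry to the tail. */
static int umount_list_touch(struct umount_list *l, int export_id,
			     uint64_t now, uint64_t delay)
{
	size_t i, j;

	for (i = 0; i < l->n; i++)
		if (l->e[i].export_id == export_id)
			break;

	if (i == l->n) {
		if (l->n == MAX_ENTRIES)
			return -1;		/* list full */
		l->n++;
	} else {
		for (j = i; j + 1 < l->n; j++)	/* close the gap */
			l->e[j] = l->e[j + 1];
	}
	l->e[l->n - 1].export_id = export_id;
	l->e[l->n - 1].expires = now + delay;
	return 0;
}

/* Periodic pass (the laundromat's role): drop expired entries. */
static size_t umount_list_reap(struct umount_list *l, uint64_t now)
{
	size_t kept = 0, removed = 0, i;

	for (i = 0; i < l->n; i++) {
		if (l->e[i].expires <= now)
			removed++;	/* a real server would unmount here */
		else
			l->e[kept++] = l->e[i];
	}
	l->n = kept;
	return removed;
}
```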
2021-03-22 | nfsd: Ensure knfsd shuts down when the "nfsd" pseudofs is unmounted | Trond Myklebust
In order to ensure that knfsd threads don't linger once the nfsd pseudofs is unmounted (e.g. when the container is killed) we let nfsd_umount() shut down those threads and wait for them to exit. This also should ensure that we don't need to do a kernel mount of the pseudofs, since the thread lifetime is now limited by the lifetime of the filesystem. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 | nfsd: protect concurrent access to nfsd stats counters | Amir Goldstein
nfsd stats counters can be updated by concurrent nfsd threads without any protection. Convert some nfsd_stats and nfsd_net struct members to use percpu counters. The longest_chain* members of struct nfsd_net remain unprotected. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
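The idea behind per-CPU counters can be illustrated with one slot per updater. This is a simplified model, assuming each slot is written by exactly one CPU; the kernel's percpu_counter is more elaborate, and `struct percpu_counter_model`, `counter_add()` and `counter_sum()` are hypothetical names.

```c
#include <stdint.h>

#define NR_SLOTS 4	/* stand-in for the number of CPUs */

/* Each updater writes only its own slot, so updates never collide. */
struct percpu_counter_model {
	int64_t slot[NR_SLOTS];
};

static void counter_add(struct percpu_counter_model *c, unsigned int cpu,
			int64_t delta)
{
	c->slot[cpu % NR_SLOTS] += delta;
}

/* Readers pay the cost instead: the total is the sum over all slots. */
static int64_t counter_sum(const struct percpu_counter_model *c)
{
	int64_t total = 0;

	for (unsigned int i = 0; i < NR_SLOTS; i++)
		total += c->slot[i];
	return total;
}
```

The trade-off is cheap, contention-free updates on the hot path in exchange for a slower read, which suits statistics that are updated far more often than they are reported.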
2020-07-24 | nfsd: netns.h: delete a duplicated word | Randy Dunlap
Drop the repeated word "the" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-06-01 | nfsd4: make drc_slab global, not per-net | J. Bruce Fields
I made every global per-network-namespace instead. But perhaps doing that to this slab was a step too far.

The kmem_cache_create call in our net init method also seems to be responsible for this lockdep warning:

[ 45.163710] Unable to find swap-space signature
[ 45.375718] trinity-c1 (855): attempted to duplicate a private mapping with mremap. This is not supported.
[ 46.055744] futex_wake_op: trinity-c1 tries to shift op by -209; fix this program
[ 51.011723]
[ 51.013378] ======================================================
[ 51.013875] WARNING: possible circular locking dependency detected
[ 51.014378] 5.2.0-rc2 #1 Not tainted
[ 51.014672] ------------------------------------------------------
[ 51.015182] trinity-c2/886 is trying to acquire lock:
[ 51.015593] 000000005405f099 (slab_mutex){+.+.}, at: slab_attr_store+0xa2/0x130
[ 51.016190]
[ 51.016190] but task is already holding lock:
[ 51.016652] 00000000ac662005 (kn->count#43){++++}, at: kernfs_fop_write+0x286/0x500
[ 51.017266]
[ 51.017266] which lock already depends on the new lock.
[ 51.017266]
[ 51.017909]
[ 51.017909] the existing dependency chain (in reverse order) is:
[ 51.018497]
[ 51.018497] -> #1 (kn->count#43){++++}:
[ 51.018956]        __lock_acquire+0x7cf/0x1a20
[ 51.019317]        lock_acquire+0x17d/0x390
[ 51.019658]        __kernfs_remove+0x892/0xae0
[ 51.020020]        kernfs_remove_by_name_ns+0x78/0x110
[ 51.020435]        sysfs_remove_link+0x55/0xb0
[ 51.020832]        sysfs_slab_add+0xc1/0x3e0
[ 51.021332]        __kmem_cache_create+0x155/0x200
[ 51.021720]        create_cache+0xf5/0x320
[ 51.022054]        kmem_cache_create_usercopy+0x179/0x320
[ 51.022486]        kmem_cache_create+0x1a/0x30
[ 51.022867]        nfsd_reply_cache_init+0x278/0x560
[ 51.023266]        nfsd_init_net+0x20f/0x5e0
[ 51.023623]        ops_init+0xcb/0x4b0
[ 51.023928]        setup_net+0x2fe/0x670
[ 51.024315]        copy_net_ns+0x30a/0x3f0
[ 51.024653]        create_new_namespaces+0x3c5/0x820
[ 51.025257]        unshare_nsproxy_namespaces+0xd1/0x240
[ 51.025881]        ksys_unshare+0x506/0x9c0
[ 51.026381]        __x64_sys_unshare+0x3a/0x50
[ 51.026937]        do_syscall_64+0x110/0x10b0
[ 51.027509]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 51.028175]
[ 51.028175] -> #0 (slab_mutex){+.+.}:
[ 51.028817]        validate_chain+0x1c51/0x2cc0
[ 51.029422]        __lock_acquire+0x7cf/0x1a20
[ 51.029947]        lock_acquire+0x17d/0x390
[ 51.030438]        __mutex_lock+0x100/0xfa0
[ 51.030995]        mutex_lock_nested+0x27/0x30
[ 51.031516]        slab_attr_store+0xa2/0x130
[ 51.032020]        sysfs_kf_write+0x11d/0x180
[ 51.032529]        kernfs_fop_write+0x32a/0x500
[ 51.033056]        do_loop_readv_writev+0x21d/0x310
[ 51.033627]        do_iter_write+0x2e5/0x380
[ 51.034148]        vfs_writev+0x170/0x310
[ 51.034616]        do_pwritev+0x13e/0x160
[ 51.035100]        __x64_sys_pwritev+0xa3/0x110
[ 51.035633]        do_syscall_64+0x110/0x10b0
[ 51.036200]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 51.036924]
[ 51.036924] other info that might help us debug this:
[ 51.036924]
[ 51.037876] Possible unsafe locking scenario:
[ 51.037876]
[ 51.038556]        CPU0                    CPU1
[ 51.039130]        ----                    ----
[ 51.039676]   lock(kn->count#43);
[ 51.040084]                                lock(slab_mutex);
[ 51.040597]                                lock(kn->count#43);
[ 51.041062]   lock(slab_mutex);
[ 51.041320]
[ 51.041320] *** DEADLOCK ***
[ 51.041320]
[ 51.041793] 3 locks held by trinity-c2/886:
[ 51.042128] #0: 000000001f55e152 (sb_writers#5){.+.+}, at: vfs_writev+0x2b9/0x310
[ 51.042739] #1: 00000000c7d6c034 (&of->mutex){+.+.}, at: kernfs_fop_write+0x25b/0x500
[ 51.043400] #2: 00000000ac662005 (kn->count#43){++++}, at: kernfs_fop_write+0x286/0x500

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 3ba75830ce17 "drc containerization"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-03-16 | nfsd: set the server_scope during service startup | Scott Mayhew
Currently, nfsd4_encode_exchange_id() encodes the utsname nodename string in the server_scope field. In a multi-host container environment, if an nfsd container is restarted on a different host than it was originally running on, clients will see a server_scope mismatch and will not attempt to reclaim opens. Instead, set the server_scope while we're in a process context during service startup, so we get the utsname nodename of the current process and store that in nfsd_net. Signed-off-by: Scott Mayhew <smayhew@redhat.com> [bfields: fix up major_id too] Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2019-12-19 | nfsd: use boottime for lease expiry calculation | Arnd Bergmann
A couple of time_t variables are only used to track the state of the lease time and its expiration. The code correctly uses the 'time_after()' macro to make this work on 32-bit architectures even beyond year 2038, but the get_seconds() function and the time_t type itself are deprecated as they behave inconsistently between 32-bit and 64-bit architectures and often lead to code that is not y2038 safe. As a minor issue, using get_seconds() leads to problems with concurrent settimeofday() or clock_settime() calls, in the worst case timeout never triggering after the time has been set backwards. Change nfsd to use time64_t and ktime_get_boottime_seconds() here. This is clearly excessive, as boottime by itself means we never go beyond 32 bits, but it does mean we handle this correctly and consistently without having to worry about corner cases and should be no more expensive than the previous implementation on 64-bit architectures. The max_cb_time() function gets changed in order to avoid an expensive 64-bit division operation, but as the lease time is at most one hour, there is no change in behavior. Also do the same for server-to-server copy expiration time. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [bfields@redhat.com: fix up copy expiration] Signed-off-by: J. Bruce Fields <bfields@redhat.com>
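The wraparound-safe comparison that the message attributes to time_after() relies on a signed difference rather than a direct comparison. A minimal 32-bit sketch of that idea follows; `time_after32()` is a hypothetical helper written for this illustration, not the kernel macro itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a is later than b, even if the counter wrapped in between.
 * The unsigned subtraction wraps modulo 2^32; reinterpreting the result
 * as signed gives the correct ordering while the two values are less
 * than half the counter range apart.
 */
static bool time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}
```

For example, a value of 5 taken just after the counter wrapped compares as later than 0xFFFFFFF0, whereas a naive `a > b` would report the opposite.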
2019-12-19 | nfsd: make 'boot_time' 64-bit wide | Arnd Bergmann
The local boot time variable gets truncated to time_t at the moment, which can lead to slightly odd behavior on 32-bit architectures. Use ktime_get_real_seconds() instead of get_seconds() to always get a 64-bit result, and keep it that way wherever possible.

It still gets truncated in a few places:
- When assigning to cl_clientid.cl_boot, this is already documented and is only used as a unique identifier.
- In clients_still_reclaiming(), the truncation is to 'unsigned long' in order to use the 'time_before()' helper.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-09-10 | nfsd: Support the server resetting the boot verifier | Trond Myklebust
Add support to allow the server to reset the boot verifier in order to force clients to resend I/O after a timeout failure. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03 | nfsd: make client/ directory names small ints | J. Bruce Fields
We want clientid's on the wire to be randomized for reasons explained in ebd7c72c63ac "nfsd: randomize SETCLIENTID reply to help distinguish servers". But I'd rather have mostly small integers for the clients/ directory. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03 | nfsd: add nfsd/clients directory | J. Bruce Fields
I plan to expose some information about nfsv4 clients here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03 | nfsd: persist nfsd filesystem across mounts | J. Bruce Fields
Keep around one internal mount of the nfsd filesystem so that we can add stuff to it when clients come and go, regardless of whether anyone has it mounted. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03 | nfsd: note inadequate stats locking | J. Bruce Fields
After 89a26b3d295d "nfsd: split DRC global spinlock into per-bucket locks", there is no longer a single global spinlock to protect these stats. So, really we need to fix that. For now, at least fix the comment. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03 | nfsd4: drc containerization | J. Bruce Fields
The nfsd duplicate reply cache should not be shared between network namespaces. The most straightforward way to fix this is just to move every global in the code to per-net-namespace memory, so that's what we do. Still todo: sort out which members of nfsd_stats should be global and which per-net-namespace. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-05-21 | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 1 | Thomas Gleixner
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 51 franklin street fifth floor boston ma 02110 1301 usa this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option [no]_[pad]_[ctrl] any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 51 franklin street fifth floor boston ma 02110 1301 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 176 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154040.652910950@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-24 | nfsd: Allow containers to set supported nfs versions | Trond Myklebust
Support use of the --nfs-version/--no-nfs-version arguments to rpc.nfsd in containers. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24 | nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld | Scott Mayhew
When using nfsdcld for NFSv4 client tracking, track the number of RECLAIM_COMPLETE operations we receive from "known" clients to help in deciding if we can lift the grace period early (or whether we need to start a v4 grace period at all). Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-09-25 | NFSD introduce async copy feature | Olga Kornievskaia
Upon receiving a request for async copy, create a new kthread. If we get an asynchronous request, make sure to copy the needed arguments/state from the stack before starting the copy. Then start the thread and reply back to the client indicating the copy is asynchronous. nfsd_copy_file_range() is called in a loop until the total number of bytes requested has been copied. In case a failure happens in the middle, we ignore the error and return how much we copied so far. Once done, create a work item for the callback workqueue and send CB_OFFLOAD with the results. The lifetime of the copy stateid is bound to the vfs copy. This way we don't need to keep the nfsd_net structure for the callback. We could keep it around longer so that an OFFLOAD_STATUS that came late would still get results, but clients should be able to deal without that. We handle OFFLOAD_CANCEL by sending a signal to the copy thread and calling kthread_stop. A client should cancel any ongoing copies before calling DESTROY_CLIENT; if not, we return a CLIENT_BUSY error. If the client is destroyed for some other reason (lease expiration, or server shutdown), we must clean up any ongoing copies ourselves. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> [colin.king@canonical.com: fix leak in error case] [bfields@fieldses.org: remove signalling, merge patches] Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17 | nfsd4: extend reclaim period for reclaiming clients | J. Bruce Fields
If the client is only renewing state a little sooner than once a lease period, then it might not discover the server has restarted till close to the end of the grace period, and might run out of time to do the actual reclaim. Extend the grace period by a second each time we notice there are clients still trying to reclaim, up to a limit of another whole lease period. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
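The extension rule in that message (one extra second per check, capped at one additional lease period) can be written as a small decision function. This is a sketch under assumptions: `struct grace_state` and `grace_should_end()` are hypothetical names, and times are plain seconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one grace period; times in seconds. */
struct grace_state {
	int64_t nominal_end;	/* when grace would end with no extension */
	int64_t lease;		/* length of one lease period */
	int64_t end;		/* currently scheduled end of grace */
};

/*
 * Returns true if the grace period should end now. If clients are
 * still reclaiming, push the end out by one second instead, but never
 * past nominal_end plus one whole lease period.
 */
static bool grace_should_end(struct grace_state *g, int64_t now,
			     bool clients_reclaiming)
{
	int64_t hard_limit = g->nominal_end + g->lease;

	if (now < g->end)
		return false;
	if (clients_reclaiming && g->end < hard_limit) {
		g->end += 1;
		return false;
	}
	return true;
}
```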
2017-11-27 | race of nfsd inetaddr notifiers vs nn->nfsd_serv change | Vasily Averin
nfsd_inet[6]addr_event uses nn->nfsd_serv without taking nfsd_mutex, which can be changed during execution of notifiers and crash the host. Moreover if notifiers were enabled in one net namespace they are enabled in all other net namespaces, from creation until destruction. This patch allows notifiers to access nn->nfsd_serv only after the pointer is correctly initialized and delays cleanup until notifiers are no longer in use. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Tested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07 | nfds: avoid gettimeofday for nfssvc_boot time | Arnd Bergmann
do_gettimeofday() is deprecated and we should generally use time64_t based functions instead. In case of nfsd, all three users of nfssvc_boot only use the initial time as a unique token, and are not affected by it overflowing, so they are not affected by the y2038 overflow. This converts the structure to timespec64 anyway and adds comments to all uses, to document that we have thought about it and avoid having to look at it again. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-18 | netns: make struct pernet_operations::id unsigned int | Alexey Dobriyan
Make struct pernet_operations::id unsigned. There are 2 reasons to do so:

1) This field is really an index into a zero-based array and thus is an unsigned entity. Using a negative value is an out-of-bounds access by definition.

2) On x86_64, unsigned 32-bit data which are mixed with pointers via array indexing or offsets added to or subtracted from pointers are preferred to signed 32-bit data. An "int" being used as an array index needs to be sign-extended to 64-bit before being used.

    void f(long *p, int i)
    {
        g(p[i]);
    }

roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

MOVSX is a 3-byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero-extending by default.

Now, there is the net_generic() function which, you guessed it right, uses "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
        ...
        ptr = ng->ptr[id - 1];
        ...
    }

And this function is used a lot, so those sign extensions add up.

The patch snipes ~1730 bytes on an allyesconfig kernel (without all junk messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger. This is a seemingly random artefact of code generation with the register allocator being used differently. gcc decides that some variable needs to live in the new r8+ registers and every access now requires a REX prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be used, which is longer than [r8].

However, the overall balance is in the negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function                      old    new  delta
    nfsd4_lock                   3886   3959    +73
    tipc_link_build_proto_msg    1096   1140    +44
    mac80211_hwsim_new_radio     2776   2808    +32
    tipc_mon_rcv                 1032   1058    +26
    svcauth_gss_legacy_init      1413   1429    +16
    tipc_bcbase_select_primary    379    392    +13
    nfsd4_exchange_id            1247   1260    +13
    nfsd4_setclientid_confirm     782    793    +11
    ...
    put_client_renew_locked       494    480    -14
    ip_set_sockfn_get             730    716    -14
    geneve_sock_add               829    813    -16
    nfsd4_sequence_done           721    703    -18
    nlmclnt_lookup_host           708    686    -22
    nfsd4_lockt                  1085   1063    -22
    nfs_get_client               1077   1050    -27
    tcf_bpf_init                 1106   1076    -30
    nfsd4_encode_fattr           5997   5930    -67
    Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
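The difference between the two index types can be seen by compiling a pair of otherwise identical accessors and comparing the generated code. The functions below are a hedged illustration: `load_signed()`, `load_unsigned()` and `lookup_generic()` are hypothetical names, and `lookup_generic()` only mirrors the shape of the net_generic() excerpt quoted in the message.

```c
/* Signed index: on x86_64 the compiler has to sign-extend i before use. */
static long load_signed(const long *p, int i)
{
	return p[i];
}

/* Unsigned index: 32-bit operations already zero the upper register half. */
static long load_unsigned(const long *p, unsigned int i)
{
	return p[i];
}

/* Same lookup shape as the quoted net_generic(), with an unsigned id. */
static void *lookup_generic(void *const *ptrs, unsigned int id)
{
	return ptrs[id - 1];
}
```

Both accessors return the same value for any valid non-negative index; only the instruction sequence differs, which is where the size savings reported above come from.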
2016-10-24 | nfsd: move blocked lock handling under a dedicated spinlock | Jeff Layton
Bruce was hitting some lockdep warnings in testing, showing that we could hit a deadlock with the new CB_NOTIFY_LOCK handling, involving a rather complex situation involving four different spinlocks. The crux of the matter is that we end up taking the nn->client_lock in the lm_notify handler. The simplest fix is to just declare a new per-nfsd_net spinlock to protect the new CB_NOTIFY_LOCK structures. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-26 | nfsd: add a LRU list for blocked locks | Jeff Layton
It's possible for a client to call in on a lock that is blocked for a long time, but discontinue polling for it. A malicious client could even set a lock on a file, and then spam the server with failing lock requests from different lockowners that pile up in a DoS attack. Add the blocked lock structures to a per-net namespace LRU when hashing them, and timestamp them. If the lock request is not revisited after a lease period, we'll drop it under the assumption that the client is no longer interested. This also gives us a mechanism to clean up these objects at server shutdown time as well. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-23 | nfsd: recover: constify nfsd4_client_tracking_ops structures | Julia Lawall
The nfsd4_client_tracking_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 | nfsd: New counter for generating client confirm verifier | Kinglong Mee
If using clientid_counter, it seems possible that gen_confirm could generate the same verifier for the same client in some situations. Add a new counter for client confirm verifier to make sure gen_confirm generates a different verifier on each call for the same clientid. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 | nfsd: add some comments to the nfsd4 object definitions | Jeff Layton
Add some comments that describe what each of these objects is, and how they relate to one another. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 | nfsd: protect clid and verifier generation with client_lock | Jeff Layton
The clid counter is a global counter currently. Move it to be a per-net property so that it can be properly protected by the nn->client_lock instead of relying on the client_mutex. The verifier generator is also potentially racy if there are two simultaneous callers. Generate the verifier when we generate the clid value, so it's also created under the client_lock. With this, there's no need to keep two counters as they'd always be in sync anyway, so just use the clientid_counter for both. As Trond points out, what would be best is to eventually move this code to use IDR instead of the hash tables. That would also help ensure uniqueness, but that's probably best done as a separate project. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 | nfsd: Move the open owner hash table into struct nfs4_client | Trond Myklebust
Preparation for removing the client_mutex. Convert the open owner hash table into a per-client table and protect it using the nfs4_client->cl_lock spin lock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 | nfsd: NFSv4 lock-owners are not associated to a specific file | Trond Myklebust
Just like open-owners, lock-owners are associated with a name, a clientid and, in the case of minor version 0, a sequence id. There is no association to a file. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 | nfsd: add a new /proc/fs/nfsd/max_connections file | Jeff Layton
Currently, the maximum number of connections that nfsd will allow is based on the number of threads spawned. While this is fine for a default, there really isn't a clear relationship between the two. The number of threads corresponds to the number of concurrent requests that we want to allow the server to process at any given time. The connection limit corresponds to the maximum number of clients that we want to allow the server to handle. These are two entirely different quantities. Break the dependency on increasing threads in order to allow for more connections, by adding a new per-net parameter that can be set to a non-zero value. The default is still to base it on the number of threads, so there should be no behavior change for anyone who doesn't use it. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
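The fallback behaviour described above (an explicit per-net limit when non-zero, otherwise a value derived from the thread count) reduces to a one-line selection. In the sketch below `connection_limit()` is a hypothetical helper, and the `(nrthreads + 3) * 20` formula is an illustrative assumption for the thread-based default rather than something stated in the message.

```c
/*
 * Pick the connection limit for a service.
 * max_connections == 0 means "not configured": fall back to a value
 * derived from the thread count. The exact formula is an assumption
 * made for this sketch.
 */
static unsigned int connection_limit(unsigned int max_connections,
				     unsigned int nrthreads)
{
	if (max_connections != 0)
		return max_connections;
	return (nrthreads + 3) * 20;
}
```

Because zero keeps the old thread-based default, administrators who never write to the new file should see no change in behaviour.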
2014-01-03 | NFSD: Don't start lockd when only NFSv4 is running | Kinglong Mee
When starting without nfsv2 and nfsv3, nfsd does not need to start lockd (and certainly doesn't need to fail because lockd failed to register with the portmapper). Reported-by: Gareth Williams <gareth@garethwilliams.me.uk> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-04 | nfsd4: make del_recall_lru per-network-namespace | J. Bruce Fields
If nothing else this simplifies the nfs4_state_shutdown_net logic a tad. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 | nfsd: make NFSd service structure allocated per net | Stanislav Kinsbursky
This patch makes the main step in NFSd containerisation. There could be different approaches to making NFSd able to handle incoming RPC requests from different network namespaces. The two main options are:

1) Share NFSd kthreads between all network namespaces.
2) Create a separate pool of threads for each namespace.

While the first approach looks more flexible, the second one is simpler and non-racy. This patch implements the second option.

To make it possible to allocate separate pools of threads, we have to make it possible to allocate separate NFSd service structures per net.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 | nfsd: make NFSd service boot time per-net | Stanislav Kinsbursky
This is simple: an NFSd service can be started at different times in different network environments. So, its "boot time" has to be assigned per net. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 | nfsd: per-net NFSd up flag introduced | Stanislav Kinsbursky
This patch introduces a per-net "nfsd_net_up" boolean flag, which has the same purpose as the general "nfsd_up" flag: skip init or shutdown of per-net resources in case they are already initialized or shut down, respectively. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 | nfsd: make NFSv4 recovery client tracking options per net | Stanislav Kinsbursky
The pointer to the client tracking operations, client_tracking_ops, has to be containerized, because different environments can support different trackers (for example, the legacy tracker is currently not supported in a container). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 | NFSD: Clean up forgetting clients | Bryan Schumaker
I added in a generic for-each loop that takes a pass over the client_lru list for the current net namespace and calls some function. The next few patches will update other operations to use this function as well. A value of 0 still means "forget everything that is found". Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
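A generic walk of this kind takes a callback and a cap, with a cap of 0 meaning "no limit". The sketch below uses a plain array in place of the client_lru list; `struct client`, `client_fn`, `forget_client()` and `for_each_client()` are hypothetical names chosen for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client record; the real list holds nfs4_client objects. */
struct client {
	int id;
	int forgotten;
};

/* A callback returns how many items it acted on for this client. */
typedef uint64_t (*client_fn)(struct client *clp);

static uint64_t forget_client(struct client *clp)
{
	if (clp->forgotten)
		return 0;
	clp->forgotten = 1;
	return 1;
}

/*
 * Apply fn to each client in turn until max items have been handled.
 * max == 0 means "handle everything that is found".
 */
static uint64_t for_each_client(struct client *list, size_t n,
				uint64_t max, client_fn fn)
{
	uint64_t count = 0;

	for (size_t i = 0; i < n; i++) {
		count += fn(&list[i]);
		if (max != 0 && count >= max)
			break;
	}
	return count;
}
```

Other operations can reuse the same walker by passing a different callback, which is the reuse the message anticipates for the following patches.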
2012-11-28 | nfsd: make NFSv4 grace time per net | Stanislav Kinsbursky
Grace time is a part of NFSv4 state engine, which is constructed per network namespace. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd: make NFSv4 lease time per net | Stanislav Kinsbursky
Lease time is a part of NFSv4 state engine, which is constructed per network namespace. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd: recovery - make in_grace per net | Stanislav Kinsbursky
The in_grace flag is a part of the client tracking state, which is network namespace aware. So let's replace the global static variable with a per-net one. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd: recovery - make rec_file per net | Stanislav Kinsbursky
Opening and closing of this file is done in client tracking init and exit operations. Client tracking is done in network namespace context already. So let's make this file opened and closed per network context - this will simplify its management. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 | nfsd: make client_lock per net | Stanislav Kinsbursky
This lock protects the client lru list and session hash table, which are allocated per network namespace already. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 | nfsd: make laundromat network namespace aware | Stanislav Kinsbursky
This patch moves laundromat_work to nfsd per-net context, thus allowing to run multiple laundries. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>