summaryrefslogtreecommitdiff
path: root/drivers/android/binderfs.c
AgeCommit message (Collapse)Author
2020-03-23Merge 5.6-rc7 into char-misc-nextGreg Kroah-Hartman
We need the char/misc driver fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-19binderfs: port to new mount apiChristian Brauner
When I first wrote binderfs the new mount api had not yet landed. Now that it has been around for a little while and a bunch of filesystems have already been ported we should do so too. When Al sent his mount-api-conversion pr he requested that binderfs (and a few others) be ported separately. It's time we port binderfs. We can make use of the new option parser, get nicer infrastructure and it will be easier if we ever add any new mount options. This survives testing with the binderfs selftests: for i in `seq 1 1000`; do ./binderfs_test; done including the new stress tests I sent out for review today: TAP version 13 1..1 # selftests: filesystems/binderfs: binderfs_test # [==========] Running 3 tests from 1 test cases. # [ RUN ] global.binderfs_stress # [ XFAIL! ] Tests are not run as root. Skipping privileged tests # [==========] Running 3 tests from 1 test cases. # [ RUN ] global.binderfs_stress # [ OK ] global.binderfs_stress # [ RUN ] global.binderfs_test_privileged # [ OK ] global.binderfs_test_privileged # [ RUN ] global.binderfs_test_unprivileged # # Allocated new binder device with major 243, minor 4, and name my-binder # # Detected binder version: 8 # [==========] Running 3 tests from 1 test cases. # [ RUN ] global.binderfs_stress # [ OK ] global.binderfs_stress # [ RUN ] global.binderfs_test_privileged # [ OK ] global.binderfs_test_privileged # [ RUN ] global.binderfs_test_unprivileged # [ OK ] global.binderfs_test_unprivileged # [==========] 3 / 3 tests passed. # [ PASSED ] ok 1 selftests: filesystems/binderfs: binderfs_test Cc: Todd Kjos <tkjos@google.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200313153427.141789-1-christian.brauner@ubuntu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-11binderfs: use refcount for binder control devices tooChristian Brauner
Binderfs binder-control devices are cleaned up via binderfs_evict_inode too() which will use refcount_dec_and_test(). However, we missed to set the refcount for binderfs binder-control devices and so we underflowed when the binderfs instance got unmounted. Pretty obvious oversight and should have been part of the more general UAF fix. The good news is that having test cases (suprisingly) helps. Technically, we could detect that we're about to cleanup the binder-control dentry in binderfs_evict_inode() and then simply clean it up. But that makes the assumption that the binder driver itself will never make use of a binderfs binder-control device after the binderfs instance it belongs to has been unmounted and the superblock for it been destroyed. While it is unlikely to ever come to this let's be on the safe side. Performance-wise this also really doesn't matter since the binder-control device is only every really when creating the binderfs filesystem or creating additional binder devices. Both operations are pretty rare. Fixes: f0fe2c0f050d ("binder: prevent UAF for binderfs devices II") Link: https://lore.kernel.org/r/CA+G9fYusdfg7PMfC9Xce-xLT7NiyKSbgojpK35GOm=Pf9jXXrA@mail.gmail.com Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Todd Kjos <tkjos@google.com> Link: https://lore.kernel.org/r/20200311105309.1742827-1-christian.brauner@ubuntu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-03binder: prevent UAF for binderfs devices IIChristian Brauner
This is a necessary follow up to the first fix I proposed and we merged in 2669b8b0c79 ("binder: prevent UAF for binderfs devices"). I have been overly optimistic that the simple fix I proposed would work. But alas, ihold() + iput() won't work since the inodes won't survive the destruction of the superblock. So all we get with my prior fix is a different race with a tinier race-window but it doesn't solve the issue. Fwiw, the problem lies with generic_shutdown_super(). It even has this cozy Al-style comment: if (!list_empty(&sb->s_inodes)) { printk("VFS: Busy inodes after unmount of %s. " "Self-destruct in 5 seconds. Have a nice day...\n", sb->s_id); } On binder_release(), binder_defer_work(proc, BINDER_DEFERRED_RELEASE) is called which punts the actual cleanup operation to a workqueue. At some point, binder_deferred_func() will be called which will end up calling binder_deferred_release() which will retrieve and cleanup the binder_context attach to this struct binder_proc. If we trace back where this binder_context is attached to binder_proc we see that it is set in binder_open() and is taken from the struct binder_device it is associated with. This obviously assumes that the struct binder_device that context is attached to is _never_ freed. While that might be true for devtmpfs binder devices it is most certainly wrong for binderfs binder devices. So, assume binder_open() is called on a binderfs binder devices. We now stash away the struct binder_context associated with that struct binder_devices: proc->context = &binder_dev->context; /* binderfs stashes devices in i_private */ if (is_binderfs_device(nodp)) { binder_dev = nodp->i_private; info = nodp->i_sb->s_fs_info; binder_binderfs_dir_entry_proc = info->proc_log_dir; } else { . . . proc->context = &binder_dev->context; Now let's assume that the binderfs instance for that binder devices is shutdown via umount() and/or the mount namespace associated with it goes away. As long as there is still an fd open for that binderfs binder device things are fine. But let's assume we now close the last fd for that binderfs binder device. Now binder_release() is called and punts to the workqueue. Assume that the workqueue has quite a bit of stuff to do and doesn't get to cleaning up the struct binder_proc and the associated struct binder_context with it for that binderfs binder device right away. In the meantime, the VFS is killing the super block and is ultimately calling sb->evict_inode() which means it will call binderfs_evict_inode() which does: static void binderfs_evict_inode(struct inode *inode) { struct binder_device *device = inode->i_private; struct binderfs_info *info = BINDERFS_I(inode); clear_inode(inode); if (!S_ISCHR(inode->i_mode) || !device) return; mutex_lock(&binderfs_minors_mutex); --info->device_count; ida_free(&binderfs_minors, device->miscdev.minor); mutex_unlock(&binderfs_minors_mutex); kfree(device->context.name); kfree(device); } thereby freeing the struct binder_device including struct binder_context. Now the workqueue finally has time to get around to cleaning up struct binder_proc and is now trying to access the associate struct binder_context. Since it's already freed it will OOPs. Fix this by introducing a refounct on binder devices. This is an alternative fix to 51d8a7eca677 ("binder: prevent UAF read in print_binder_transaction_log_entry()"). Fixes: 3ad20fe393b3 ("binder: implement binderfs") Fixes: 2669b8b0c798 ("binder: prevent UAF for binderfs devices") Fixes: 03e2e07e3814 ("binder: Make transaction_log available in binderfs") Related : 51d8a7eca677 ("binder: prevent UAF read in print_binder_transaction_log_entry()") Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Todd Kjos <tkjos@google.com> Link: https://lore.kernel.org/r/20200303164340.670054-1-christian.brauner@ubuntu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04binder: Add binder_proc logging to binderfsHridya Valsaraju
Currently /sys/kernel/debug/binder/proc contains the debug data for every binder_proc instance. This patch makes this information also available in a binderfs instance mounted with a mount option "stats=global" in addition to debugfs. The patch does not affect the presence of the file in debugfs. If a binderfs instance is mounted at path /dev/binderfs, this file would be present at /dev/binderfs/binder_logs/proc. This change provides an alternate way to access this file when debugfs is not mounted. Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Hridya Valsaraju <hridya@google.com> Link: https://lore.kernel.org/r/20190903161655.107408-5-hridya@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04binder: Make transaction_log available in binderfsHridya Valsaraju
Currently, the binder transaction log files 'transaction_log' and 'failed_transaction_log' live in debugfs at the following locations: /sys/kernel/debug/binder/failed_transaction_log /sys/kernel/debug/binder/transaction_log This patch makes these files also available in a binderfs instance mounted with the mount option "stats=global". It does not affect the presence of these files in debugfs. If a binderfs instance is mounted at path /dev/binderfs, the location of these files will be as follows: /dev/binderfs/binder_logs/failed_transaction_log /dev/binderfs/binder_logs/transaction_log This change provides an alternate option to access these files when debugfs is not mounted. Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Hridya Valsaraju <hridya@google.com> Link: https://lore.kernel.org/r/20190903161655.107408-4-hridya@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04binder: Add stats, state and transactions filesHridya Valsaraju
The following binder stat files currently live in debugfs. /sys/kernel/debug/binder/state /sys/kernel/debug/binder/stats /sys/kernel/debug/binder/transactions This patch makes these files available in a binderfs instance mounted with the mount option 'stats=global'. For example, if a binderfs instance is mounted at path /dev/binderfs, the above files will be available at the following locations: /dev/binderfs/binder_logs/state /dev/binderfs/binder_logs/stats /dev/binderfs/binder_logs/transactions This provides a way to access them even when debugfs is not mounted. Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Hridya Valsaraju <hridya@google.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20190903161655.107408-3-hridya@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04binder: add a mount option to show global statsHridya Valsaraju
Currently, all binder state and statistics live in debugfs. We need this information even when debugfs is not mounted. This patch adds the mount option 'stats' to enable a binderfs instance to have binder debug information present in the same. 'stats=global' will enable the global binder statistics. In the future, 'stats=local' will enable binder statistics local to the binderfs instance. The two modes 'global' and 'local' will be mutually exclusive. 'stats=global' option is only available for a binderfs instance mounted in the initial user namespace. An attempt to use the option to mount a binderfs instance in another user namespace will return an EPERM error. Signed-off-by: Hridya Valsaraju <hridya@google.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20190903161655.107408-2-hridya@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04binder: Add default binder devices through binderfs when configuredHridya Valsaraju
Currently, since each binderfs instance needs its own private binder devices, every time a binderfs instance is mounted, all the default binder devices need to be created via the BINDER_CTL_ADD IOCTL. This patch aims to add a solution to automatically create the default binder devices for each binderfs instance that gets mounted. To achieve this goal, when CONFIG_ANDROID_BINDERFS is set, the default binder devices specified by CONFIG_ANDROID_BINDER_DEVICES are created in each binderfs instance instead of global devices being created by the binder driver. Co-developed-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Hridya Valsaraju <hridya@google.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Link: https://lore.kernel.org/r/20190808222727.132744-2-hridya@google.com Link: https://lore.kernel.org/r/20190904110704.8606-2-christian.brauner@ubuntu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04binder: Validate the default binderfs device names.Hridya Valsaraju
Length of a binderfs device name cannot exceed BINDERFS_MAX_NAME. This patch adds a check in binderfs_init() to ensure the same for the default binder devices that will be created in every binderfs instance. Co-developed-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Hridya Valsaraju <hridya@google.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Link: https://lore.kernel.org/r/20190808222727.132744-3-hridya@google.com Link: https://lore.kernel.org/r/20190904110704.8606-3-christian.brauner@ubuntu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-01binderfs: remove separate device_initcall()Christian Brauner
binderfs should not have a separate device_initcall(). When a kernel is compiled with CONFIG_ANDROID_BINDERFS register the filesystem alongside CONFIG_ANDROID_IPC. This use-case is especially sensible when users specify CONFIG_ANDROID_IPC=y, CONFIG_ANDROID_BINDERFS=y and ANDROID_BINDER_DEVICES="". When CONFIG_ANDROID_BINDERFS=n then this always succeeds so there's no regression potential for legacy workloads. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-30binderfs: respect limit on binder control creationChristian Brauner
We currently adhere to the reserved devices limit when creating new binderfs devices in binderfs instances not located in the inital ipc namespace. But it is still possible to rob the host instances of their 4 reserved devices by creating the maximum allowed number of devices in a single binderfs instance located in a non-initial ipc namespace and then mounting 4 separate binderfs instances in non-initial ipc namespaces. That happens because the limit is currently not respected for the creation of the initial binder-control device node. Block this nonsense by performing the same check in binderfs_binder_ctl_create() that we perform in binderfs_binder_device_create(). Fixes: 36bdf3cae09d ("binderfs: reserve devices for initial mount") Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22binderfs: switch from d_add() to d_instantiate()Christian Brauner
In a previous commit we switched from a d_alloc_name() + d_lookup() combination to setup a new dentry and find potential duplicates to the more idiomatic lookup_one_len(). As far as I understand, this also means we need to switch from d_add() to d_instantiate() since lookup_one_len() will create a new dentry when it doesn't find an existing one and add the new dentry to the hash queues. So we only need to call d_instantiate() to connect the dentry to the inode and turn it into a positive dentry. If we were to use d_add() we sure see stack traces like the following indicating that adding the same dentry twice over the same inode: [ 744.441889] CPU: 4 PID: 2849 Comm: landscape-sysin Not tainted 5.0.0-rc1-brauner-binderfs #243 [ 744.441889] Hardware name: Dell DCS XS24-SC2 /XS24-SC2 , BIOS S59_3C20 04/07/2011 [ 744.441889] RIP: 0010:__d_lookup_rcu+0x76/0x190 [ 744.441889] Code: 89 75 c0 49 c1 e9 20 49 89 fd 45 89 ce 41 83 e6 07 42 8d 04 f5 00 00 00 00 89 45 c8 eb 0c 48 8b 1b 48 85 db 0f 84 81 00 00 00 <44> 8b 63 fc 4c 3b 6b 10 75 ea 48 83 7b 08 00 74 e3 41 83 e4 fe 41 [ 744.441889] RSP: 0018:ffffb8c984e27ad0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 [ 744.441889] RAX: 0000000000000038 RBX: ffff9407ef770c08 RCX: ffffb8c980011000 [ 744.441889] RDX: ffffb8c984e27b54 RSI: ffffb8c984e27ce0 RDI: ffff9407e6689600 [ 744.441889] RBP: ffffb8c984e27b28 R08: ffffb8c984e27ba4 R09: 0000000000000007 [ 744.441889] R10: ffff9407e5c4f05c R11: 973f3eb9d84a94e5 R12: 0000000000000002 [ 744.441889] R13: ffff9407e6689600 R14: 0000000000000007 R15: 00000007bfef7a13 [ 744.441889] FS: 00007f0db13bb740(0000) GS:ffff9407f3b00000(0000) knlGS:0000000000000000 [ 744.441889] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 744.441889] CR2: 00007f0dacc51024 CR3: 000000032961a000 CR4: 00000000000006e0 [ 744.441889] Call Trace: [ 744.441889] lookup_fast+0x53/0x300 [ 744.441889] walk_component+0x49/0x350 [ 744.441889] ? inode_permission+0x63/0x1a0 [ 744.441889] link_path_walk.part.33+0x1bc/0x5a0 [ 744.441889] ? path_init+0x190/0x310 [ 744.441889] path_lookupat+0x95/0x210 [ 744.441889] filename_lookup+0xb6/0x190 [ 744.441889] ? __check_object_size+0xb8/0x1b0 [ 744.441889] ? strncpy_from_user+0x50/0x1a0 [ 744.441889] user_path_at_empty+0x36/0x40 [ 744.441889] ? user_path_at_empty+0x36/0x40 [ 744.441889] vfs_statx+0x76/0xe0 [ 744.441889] __do_sys_newstat+0x3d/0x70 [ 744.441889] __x64_sys_newstat+0x16/0x20 [ 744.441889] do_syscall_64+0x5a/0x120 [ 744.441889] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 744.441889] RIP: 0033:0x7f0db0ec2775 [ 744.441889] Code: 00 00 00 75 05 48 83 c4 18 c3 e8 26 55 02 00 66 0f 1f 44 00 00 83 ff 01 48 89 f0 77 30 48 89 c7 48 89 d6 b8 04 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 03 f3 c3 90 48 8b 15 e1 b6 2d 00 f7 d8 64 89 [ 744.441889] RSP: 002b:00007ffc36bc9388 EFLAGS: 00000246 ORIG_RAX: 0000000000000004 [ 744.441889] RAX: ffffffffffffffda RBX: 00007ffc36bc9300 RCX: 00007f0db0ec2775 [ 744.441889] RDX: 00007ffc36bc9400 RSI: 00007ffc36bc9400 RDI: 00007f0dad26f050 [ 744.441889] RBP: 0000000000c0bc60 R08: 0000000000000000 R09: 0000000000000001 [ 744.441889] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc36bc9400 [ 744.441889] R13: 0000000000000001 R14: 00000000ffffff9c R15: 0000000000c0bc60 Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22binderfs: drop lock in binderfs_binder_ctl_createChristian Brauner
The binderfs_binder_ctl_create() call is a no-op on subsequent calls and the first call is done before we unlock the suberblock. Hence, there is no need to take inode_lock() in there. Let's remove it. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22binderfs: kill_litter_super() before cleanupChristian Brauner
Al pointed out that first calling kill_litter_super() before cleaning up info is more correct since destroying info doesn't depend on the state of the dentries and inodes. That the opposite remains true is not guaranteed. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22binderfs: rework binderfs_binder_device_create()Christian Brauner
- switch from d_alloc_name() + d_lookup() to lookup_one_len(): Instead of using d_alloc_name() and then doing a d_lookup() with the allocated dentry to find whether a device with the name we're trying to create already exists switch to using lookup_one_len(). The latter will either return the existing dentry or a new one. - switch from kmalloc() + strscpy() to kmemdup(): Use a more idiomatic way to copy the name for the new dentry that userspace gave us. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22binderfs: rework binderfs_fill_super()Christian Brauner
Al pointed out that on binderfs_fill_super() error deactivate_locked_super() will call binderfs_kill_super() so all of the freeing and putting we currently do in binderfs_fill_super() is unnecessary and buggy. Let's simply return errors and let binderfs_fill_super() take care of cleaning up on error. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22binderfs: prevent renaming the control dentryChristian Brauner
- make binderfs control dentry immutable: We don't allow to unlink it since it is crucial for binderfs to be useable but if we allow to rename it we make the unlink trivial to bypass. So prevent renaming too and simply treat the control dentry as immutable. - add is_binderfs_control_device() helper: Take the opportunity and turn the check for the control dentry into a separate helper is_binderfs_control_device() since it's now used in two places. - simplify binderfs_rename(): Instead of hand-rolling our custom version of simple_rename() just dumb the whole function down to first check whether we're trying to rename the control dentry. If we do EPERM the caller and if not call simple_rename(). Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22binderfs: remove outdated commentChristian Brauner
The comment stems from an early version of that patchset and is just confusing now. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-18binderfs: fix error return code in binderfs_fill_super()Wei Yongjun
Fix to return a negative error code -ENOMEM from the new_inode() and d_make_root() error handling cases instead of 0, as done elsewhere in this function. Fixes: 849d540ddfcd ("binderfs: implement "max" mount option") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Christian Brauner <christian@brauner.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-12binderfs: handle !CONFIG_IPC_NS buildsChristian Brauner
kbuild reported a build faile in [1]. This is triggered when CONFIG_IPC_NS is not set. So let's make the use of init_ipc_ns conditional on CONFIG_IPC_NS being set. [1]: https://lists.01.org/pipermail/kbuild-all/2019-January/056903.html Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-11binderfs: reserve devices for initial mountChristian Brauner
The binderfs instance in the initial ipc namespace will always have a reserve of 4 binder devices unless explicitly capped by specifying a lower value via the "max" mount option. This ensures when binder devices are removed (on accident or on purpose) they can always be recreated without risking that all minor numbers have already been used up. Cc: Todd Kjos <tkjos@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-11binderfs: rename header to binderfs.hChristian Brauner
It doesn't make sense to call the header binder_ctl.h when its sole existence is tied to binderfs. So give it a sensible name. Users will far more easily remember binderfs.h than binder_ctl.h. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-11binderfs: implement "max" mount optionChristian Brauner
Since binderfs can be mounted by userns root in non-initial user namespaces some precautions are in order. First, a way to set a maximum on the number of binder devices that can be allocated per binderfs instance and second, a way to reserve a reasonable chunk of binderfs devices for the initial ipc namespace. A first approach as seen in [1] used sysctls similiar to devpts but was shown to be flawed (cf. [2] and [3]) since some aspects were unneeded. This is an alternative approach which avoids sysctls completely and instead switches to a single mount option. Starting with this commit binderfs instances can be mounted with a limit on the number of binder devices that can be allocated. The max=<count> mount option serves as a per-instance limit. If max=<count> is set then only <count> number of binder devices can be allocated in this binderfs instance. This allows to safely bind-mount binderfs instances into unprivileged user namespaces since userns root in a non-initial user namespace cannot change the mount option as long as it does not own the mount namespace the binderfs mount was created in and hence cannot drain the host of minor device numbers [1]: https://lore.kernel.org/lkml/20181221133909.18794-1-christian@brauner.io/ [2]; https://lore.kernel.org/lkml/20181221163316.GA8517@kroah.com/ [3]: https://lore.kernel.org/lkml/CAHRSSEx+gDVW4fKKK8oZNAir9G5icJLyodO8hykv3O0O1jt2FQ@mail.gmail.com/ [4]: https://lore.kernel.org/lkml/20181221192044.5yvfnuri7gdop4rs@brauner.io/ Cc: Todd Kjos <tkjos@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-08binderfs: make each binderfs mount a new instanceChristian Brauner
When currently mounting binderfs in the same ipc namespace twice: mount -t binder binder /A mount -t binder binder /B then the binderfs instances mounted on /A and /B will be the same, i.e. they will have the same superblock. This was the first approach that seemed reasonable. However, this leads to some problems and inconsistencies: /* private binderfs instance in same ipc namespace */ There is no way for a user to request a private binderfs instance in the same ipc namespace. This request has been made in a private mail to me by two independent people. /* bind-mounts */ If users want the same binderfs instance to appear in multiple places they can use bind mounts. So there is no value in having a request for a new binderfs mount giving them the same instance. /* unexpected behavior */ It's surprising that request to mount binderfs is not giving the user a new instance like tmpfs, devpts, ramfs, and others do. /* past mistakes */ Other pseudo-filesystems once made the same mistakes of giving back the same superblock when actually requesting a new mount (cf. devpts's deprecated "newinstance" option). We should not make the same mistake. Once we've committed to always giving back the same superblock in the same IPC namespace with the next kernel release we will not be able to make that change so better to do it now. /* kdbusfs */ It was pointed out to me that kdbusfs - which is conceptually closely related to binderfs - also allowed users to get a private kdbusfs instance in the same IPC namespace by making each mount of kdbusfs a separate instance. I think that makes a lot of sense. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-08binderfs: remove wrong kern_mount() callChristian Brauner
The binderfs filesystem never needs to be mounted by the kernel itself. This is conceptually wrong and should never have been done in the first place. Fixes: 3ad20fe393b ("binder: implement binderfs") Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-19binder: implement binderfsChristian Brauner
As discussed at Linux Plumbers Conference 2018 in Vancouver [1] this is the implementation of binderfs. /* Abstract */ binderfs is a backwards-compatible filesystem for Android's binder ipc mechanism. Each ipc namespace will mount a new binderfs instance. Mounting binderfs multiple times at different locations in the same ipc namespace will not cause a new super block to be allocated and hence it will be the same filesystem instance. Each new binderfs mount will have its own set of binder devices only visible in the ipc namespace it has been mounted in. All devices in a new binderfs mount will follow the scheme binder%d and numbering will always start at 0. /* Backwards compatibility */ Devices requested in the Kconfig via CONFIG_ANDROID_BINDER_DEVICES for the initial ipc namespace will work as before. They will be registered via misc_register() and appear in the devtmpfs mount. Specifically, the standard devices binder, hwbinder, and vndbinder will all appear in their standard locations in /dev. Mounting or unmounting the binderfs mount in the initial ipc namespace will have no effect on these devices, i.e. they will neither show up in the binderfs mount nor will they disappear when the binderfs mount is gone. /* binder-control */ Each new binderfs instance comes with a binder-control device. No other devices will be present at first. The binder-control device can be used to dynamically allocate binder devices. All requests operate on the binderfs mount the binder-control device resides in. Assuming a new instance of binderfs has been mounted at /dev/binderfs via mount -t binderfs binderfs /dev/binderfs. Then a request to create a new binder device can be made as illustrated in [2]. Binderfs devices can simply be removed via unlink(). /* Implementation details */ - dynamic major number allocation: When binderfs is registered as a new filesystem it will dynamically allocate a new major number. The allocated major number will be returned in struct binderfs_device when a new binder device is allocated. - global minor number tracking: Minor are tracked in a global idr struct that is capped at BINDERFS_MAX_MINOR. The minor number tracker is protected by a global mutex. This is the only point of contention between binderfs mounts. - struct binderfs_info: Each binderfs super block has its own struct binderfs_info that tracks specific details about a binderfs instance: - ipc namespace - dentry of the binder-control device - root uid and root gid of the user namespace the binderfs instance was mounted in - mountable by user namespace root: binderfs can be mounted by user namespace root in a non-initial user namespace. The devices will be owned by user namespace root. - binderfs binder devices without misc infrastructure: New binder devices associated with a binderfs mount do not use the full misc_register() infrastructure. The misc_register() infrastructure can only create new devices in the host's devtmpfs mount. binderfs does however only make devices appear under its own mountpoint and thus allocates new character device nodes from the inode of the root dentry of the super block. This will have the side-effect that binderfs specific device nodes do not appear in sysfs. This behavior is similar to devpts allocated pts devices and has no effect on the functionality of the ipc mechanism itself. [1]: https://goo.gl/JL2tfX [2]: program to allocate a new binderfs binder device: #define _GNU_SOURCE #include <errno.h> #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <sys/types.h> #include <unistd.h> #include <linux/android/binder_ctl.h> int main(int argc, char *argv[]) { int fd, ret, saved_errno; size_t len; struct binderfs_device device = { 0 }; if (argc < 2) exit(EXIT_FAILURE); len = strlen(argv[1]); if (len > BINDERFS_MAX_NAME) exit(EXIT_FAILURE); memcpy(device.name, argv[1], len); fd = open("/dev/binderfs/binder-control", O_RDONLY | O_CLOEXEC); if (fd < 0) { printf("%s - Failed to open binder-control device\n", strerror(errno)); exit(EXIT_FAILURE); } ret = ioctl(fd, BINDER_CTL_ADD, &device); saved_errno = errno; close(fd); errno = saved_errno; if (ret < 0) { printf("%s - Failed to allocate new binder device\n", strerror(errno)); exit(EXIT_FAILURE); } printf("Allocated new binder device with major %d, minor %d, and " "name %s\n", device.major, device.minor, device.name); exit(EXIT_SUCCESS); } Cc: Martijn Coenen <maco@android.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Todd Kjos <tkjos@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>