From ef9fe980c6fcc1821ab955b74b242d2d6585fa75 Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Fri, 9 Nov 2012 09:12:30 -0800
Subject: cgroup_freezer: implement proper hierarchy support

Up until now, cgroup_freezer didn't implement hierarchy properly.
cgroups could be arranged in hierarchy but it didn't make any
difference in how each cgroup_freezer behaved. They all operated
separately.

This patch implements proper hierarchy support. If a cgroup is
frozen, all its descendants are frozen. A cgroup is thawed iff it and
all its ancestors are THAWED. freezer.self_freezing shows the current
freezing state for the cgroup itself. freezer.parent_freezing shows
whether the cgroup is freezing because any of its ancestors is
freezing.

freezer_post_create() locks the parent and new cgroup and inherits the
parent's state and freezer_change_state() applies new state top-down
using cgroup_for_each_descendant_pre() which guarantees that no child
can escape its parent's state. update_if_frozen() uses
cgroup_for_each_descendant_post() to propagate frozen states
bottom-up.

Synchronization could be coarser and easier by using a single mutex to
protect all hierarchy operations. Finer grained approach was used
because it wasn't too difficult for cgroup_freezer and I think it's
beneficial to have an example implementation and cgroup_freezer is
rather simple and can serve a good one.

As this makes cgroup_freezer properly hierarchical,
freezer_subsys.broken_hierarchy marking is removed.

Note that this patch changes userland visible behavior - freezing a
cgroup now freezes all its descendants too. This behavior change is
intended and has been warned via .broken_hierarchy.

v2: Michal spotted a bug in freezer_change_state() - descendants were
    inheriting from the wrong ancestor. Fixed.

v3: Documentation/cgroups/freezer-subsystem.txt updated.

Signed-off-by: Tejun Heo
Reviewed-by: Michal Hocko
---
 Documentation/cgroups/freezer-subsystem.txt | 63 +++++++++++++++++++----------
 1 file changed, 42 insertions(+), 21 deletions(-)

(limited to 'Documentation/cgroups')

diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
index 7e62de1e59ff..c96a72cbb30a 100644
--- a/Documentation/cgroups/freezer-subsystem.txt
+++ b/Documentation/cgroups/freezer-subsystem.txt
@@ -49,13 +49,49 @@ prevent the freeze/unfreeze cycle from becoming visible to the tasks
 being frozen. This allows the bash example above and gdb to run as
 expected.
 
-The freezer subsystem in the container filesystem defines a file named
-freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
-cgroup. Subsequently writing "THAWED" will unfreeze the tasks in the cgroup.
-Reading will return the current state.
+The cgroup freezer is hierarchical. Freezing a cgroup freezes all
+tasks belonging to the cgroup and all its descendant cgroups. Each
+cgroup has its own state (self-state) and the state inherited from the
+parent (parent-state). Iff both states are THAWED, the cgroup is
+THAWED.
 
-Note freezer.state doesn't exist in root cgroup, which means root cgroup
-is non-freezable.
+The following cgroupfs files are created by cgroup freezer.
+
+* freezer.state: Read-write.
+
+  When read, returns the effective state of the cgroup - "THAWED",
+  "FREEZING" or "FROZEN". This is the combined self and parent-states.
+  If any is freezing, the cgroup is freezing (FREEZING or FROZEN).
+
+  FREEZING cgroup transitions into FROZEN state when all tasks
+  belonging to the cgroup and its descendants become frozen. Note that
+  a cgroup reverts to FREEZING from FROZEN after a new task is added
+  to the cgroup or one of its descendant cgroups until the new task is
+  frozen.
+
+  When written, sets the self-state of the cgroup. Two values are
+  allowed - "FROZEN" and "THAWED". If FROZEN is written, the cgroup,
+  if not already freezing, enters FREEZING state along with all its
+  descendant cgroups.
+
+  If THAWED is written, the self-state of the cgroup is changed to
+  THAWED. Note that the effective state may not change to THAWED if
+  the parent-state is still freezing. If a cgroup's effective state
+  becomes THAWED, all its descendants which are freezing because of
+  the cgroup also leave the freezing state.
+
+* freezer.self_freezing: Read only.
+
+  Shows the self-state. 0 if the self-state is THAWED; otherwise, 1.
+  This value is 1 iff the last write to freezer.state was "FROZEN".
+
+* freezer.parent_freezing: Read only.
+
+  Shows the parent-state. 0 if none of the cgroup's ancestors is
+  frozen; otherwise, 1.
+
+The root cgroup is non-freezable and the above interface files don't
+exist.
 
 * Examples of usage :
 
@@ -85,18 +121,3 @@ to unfreeze all tasks in the container :
 
 This is the basic mechanism which should do the right thing for user space task
 in a simple scenario.
-
-It's important to note that freezing can be incomplete. In that case we return
-EBUSY. This means that some tasks in the cgroup are busy doing something that
-prevents us from completely freezing the cgroup at this time. After EBUSY,
-the cgroup will remain partially frozen -- reflected by freezer.state reporting
-"FREEZING" when read. The state will remain "FREEZING" until one of these
-things happens:
-
-	1) Userspace cancels the freezing operation by writing "THAWED" to
-		the freezer.state file
-	2) Userspace retries the freezing operation by writing "FROZEN" to
-		the freezer.state file (writing "FREEZING" is not legal
-		and returns EINVAL)
-	3) The tasks that blocked the cgroup from entering the "FROZEN"
-		state disappear from the cgroup's set of tasks.
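
The self-state / parent-state rule documented in the patch above can be
modelled in a few lines of userspace C. This is only a sketch under
stated assumptions: struct model_cgroup and both helper functions are
hypothetical names invented for illustration, not the kernel's struct
freezer or any cgroup API, and the model ignores locking as well as
the FREEZING-to-FROZEN transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of one freezer cgroup (assumption: this
 * is NOT the kernel's struct freezer).  The root has parent == NULL and
 * its self_freezing flag is never set, since the root is non-freezable.
 */
struct model_cgroup {
	struct model_cgroup *parent;	/* NULL for the root cgroup */
	bool self_freezing;		/* last write to freezer.state was "FROZEN" */
};

/* Models freezer.parent_freezing: true if any ancestor is freezing. */
static bool model_parent_freezing(const struct model_cgroup *cg)
{
	const struct model_cgroup *pos;

	assert(cg != NULL);
	for (pos = cg->parent; pos != NULL; pos = pos->parent)
		if (pos->self_freezing)
			return true;
	return false;
}

/* Effective state is THAWED iff both self-state and parent-state are THAWED. */
static bool model_effectively_thawed(const struct model_cgroup *cg)
{
	assert(cg != NULL);
	return !cg->self_freezing && !model_parent_freezing(cg);
}
```

In this model, setting self_freezing on a cgroup makes
model_parent_freezing() true for every descendant, which mirrors what
freezer.parent_freezing reports. Clearing it again thaws only those
descendants whose own self-state is THAWED and which have no other
freezing ancestor.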
--
cgit v1.2.3


From 92fb97487a7e41b222c1417cabd1d1ab7cc3a48c Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Mon, 19 Nov 2012 08:13:38 -0800
Subject: cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()

Rename cgroup_subsys css lifetime related callbacks to better describe
what their roles are. Also, update documentation.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
---
 Documentation/cgroups/cgroups.txt | 49 ++++++++++++++++++++++---------------
 block/blk-cgroup.c                | 14 +++++------
 include/linux/cgroup.h            | 35 ++++++++++++++-------------
 kernel/cgroup.c                   | 51 ++++++++++++++++++++-------------------
 kernel/cgroup_freezer.c           | 20 +++++++--------
 kernel/cpuset.c                   | 10 ++++----
 kernel/events/core.c              |  8 +++---
 kernel/sched/core.c               | 16 ++++++------
 mm/hugetlb_cgroup.c               | 14 +++++------
 mm/memcontrol.c                   | 12 ++++-----
 net/core/netprio_cgroup.c         |  8 +++---
 net/sched/cls_cgroup.c            |  8 +++---
 security/device_cgroup.c          |  8 +++---
 13 files changed, 132 insertions(+), 121 deletions(-)

(limited to 'Documentation/cgroups')

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 9e04196c4d78..b06eea217403 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -553,16 +553,16 @@ call to cgroup_unload_subsys(). It should also set its_subsys.module =
 THIS_MODULE in its .c file.
 
 Each subsystem may export the following methods. The only mandatory
-methods are create/destroy. Any others that are null are presumed to
+methods are css_alloc/free. Any others that are null are presumed to
 be successful no-ops.
 
-struct cgroup_subsys_state *create(struct cgroup *cgrp)
+struct cgroup_subsys_state *css_alloc(struct cgroup *cgrp)
 (cgroup_mutex held by caller)
 
-Called to create a subsystem state object for a cgroup. The
+Called to allocate a subsystem state object for a cgroup. The
 subsystem should allocate its subsystem state object for the passed
 cgroup, returning a pointer to the new object on success or a
-negative error code. On success, the subsystem pointer should point to
+ERR_PTR() value. On success, the subsystem pointer should point to
 a structure of type cgroup_subsys_state (typically embedded in a
 larger subsystem-specific object), which will be initialized by the
 cgroup system. Note that this will be called at initialization to
@@ -571,24 +571,33 @@ identified by the passed cgroup object having a NULL parent (since it's
 the root of the hierarchy) and may be an appropriate place for
 initialization code.
 
-void destroy(struct cgroup *cgrp)
+int css_online(struct cgroup *cgrp)
 (cgroup_mutex held by caller)
 
-The cgroup system is about to destroy the passed cgroup; the subsystem
-should do any necessary cleanup and free its subsystem state
-object. By the time this method is called, the cgroup has already been
-unlinked from the file system and from the child list of its parent;
-cgroup->parent is still valid. (Note - can also be called for a
-newly-created cgroup if an error occurs after this subsystem's
-create() method has been called for the new cgroup).
-
-int pre_destroy(struct cgroup *cgrp);
-
-Called before checking the reference count on each subsystem. This may
-be useful for subsystems which have some extra references even if
-there are not tasks in the cgroup. If pre_destroy() returns error code,
-rmdir() will fail with it. From this behavior, pre_destroy() can be
-called multiple times against a cgroup.
+Called after @cgrp successfully completed all allocations and made
+visible to cgroup_for_each_child/descendant_*() iterators. The
+subsystem may choose to fail creation by returning -errno. This
+callback can be used to implement reliable state sharing and
+propagation along the hierarchy. See the comment on
+cgroup_for_each_descendant_pre() for details.
+
+void css_offline(struct cgroup *cgrp);
+
+This is the counterpart of css_online() and called iff css_online()
+has succeeded on @cgrp. This signifies the beginning of the end of
+@cgrp. @cgrp is being removed and the subsystem should start dropping
+all references it's holding on @cgrp. When all references are dropped,
+cgroup removal will proceed to the next step - css_free(). After this
+callback, @cgrp should be considered dead to the subsystem.
+
+void css_free(struct cgroup *cgrp)
+(cgroup_mutex held by caller)
+
+The cgroup system is about to free @cgrp; the subsystem should free
+its subsystem state object. By the time this method is called, @cgrp
+is completely unused; @cgrp->parent is still valid. (Note - can also
+be called for a newly-created cgroup if an error occurs after this
+subsystem's css_alloc() method has been called for the new cgroup).
 
 int can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
 (cgroup_mutex held by caller)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 3dc60fc441cb..3f6d39d23bb6 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -600,7 +600,7 @@ struct cftype blkcg_files[] = {
 };
 
 /**
- * blkcg_pre_destroy - cgroup pre_destroy callback
+ * blkcg_css_offline - cgroup css_offline callback
  * @cgroup: cgroup of interest
  *
  * This function is called when @cgroup is about to go away and responsible
@@ -610,7 +610,7 @@ struct cftype blkcg_files[] = {
  *
  * This is the blkcg counterpart of ioc_release_fn().
  */
-static void blkcg_pre_destroy(struct cgroup *cgroup)
+static void blkcg_css_offline(struct cgroup *cgroup)
 {
 	struct blkcg *blkcg = cgroup_to_blkcg(cgroup);
 
@@ -634,7 +634,7 @@ static void blkcg_pre_destroy(struct cgroup *cgroup)
 	spin_unlock_irq(&blkcg->lock);
 }
 
-static void blkcg_destroy(struct cgroup *cgroup)
+static void blkcg_css_free(struct cgroup *cgroup)
 {
 	struct blkcg *blkcg = cgroup_to_blkcg(cgroup);
 
@@ -642,7 +642,7 @@ static void blkcg_destroy(struct cgroup *cgroup)
 		kfree(blkcg);
 }
 
-static struct cgroup_subsys_state *blkcg_create(struct cgroup *cgroup)
+static struct cgroup_subsys_state *blkcg_css_alloc(struct cgroup *cgroup)
 {
 	static atomic64_t id_seq = ATOMIC64_INIT(0);
 	struct blkcg *blkcg;
@@ -739,10 +739,10 @@ static int blkcg_can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
 
 struct cgroup_subsys blkio_subsys = {
 	.name = "blkio",
-	.create = blkcg_create,
+	.css_alloc = blkcg_css_alloc,
+	.css_offline = blkcg_css_offline,
+	.css_free = blkcg_css_free,
 	.can_attach = blkcg_can_attach,
-	.pre_destroy = blkcg_pre_destroy,
-	.destroy = blkcg_destroy,
 	.subsys_id = blkio_subsys_id,
 	.base_cftypes = blkcg_files,
 	.module = THIS_MODULE,
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 03d8a92786da..7a2189ca8327 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -82,7 +82,7 @@ struct cgroup_subsys_state {
 /* bits in struct cgroup_subsys_state flags field */
 enum {
 	CSS_ROOT = (1 << 0), /* this CSS is the root of the subsystem */
-	CSS_ONLINE = (1 << 1), /* between ->post_create() and ->pre_destroy() */
+	CSS_ONLINE = (1 << 1), /* between ->css_online() and ->css_offline() */
 };
 
 /* Caller must verify that the css is not for root cgroup */
@@ -439,10 +439,11 @@ int cgroup_taskset_size(struct cgroup_taskset *tset);
  */
 
 struct cgroup_subsys {
-	struct cgroup_subsys_state *(*create)(struct cgroup *cgrp);
-	int (*post_create)(struct cgroup *cgrp);
-	void (*pre_destroy)(struct cgroup *cgrp);
-	void (*destroy)(struct cgroup *cgrp);
+	struct cgroup_subsys_state *(*css_alloc)(struct cgroup *cgrp);
+	int (*css_online)(struct cgroup *cgrp);
+	void (*css_offline)(struct cgroup *cgrp);
+	void (*css_free)(struct cgroup *cgrp);
+
 	int (*can_attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
 	void (*cancel_attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
 	void (*attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
@@ -541,13 +542,13 @@ static inline struct cgroup* task_cgroup(struct task_struct *task,
  * @cgroup: cgroup whose children to walk
  *
  * Walk @cgroup's children. Must be called under rcu_read_lock(). A child
- * cgroup which hasn't finished ->post_create() or already has finished
- * ->pre_destroy() may show up during traversal and it's each subsystem's
+ * cgroup which hasn't finished ->css_online() or already has finished
+ * ->css_offline() may show up during traversal and it's each subsystem's
  * responsibility to verify that each @pos is alive.
  *
- * If a subsystem synchronizes against the parent in its ->post_create()
- * and before starting iterating, a cgroup which finished ->post_create()
- * is guaranteed to be visible in the future iterations.
+ * If a subsystem synchronizes against the parent in its ->css_online() and
+ * before starting iterating, a cgroup which finished ->css_online() is
+ * guaranteed to be visible in the future iterations.
  */
 #define cgroup_for_each_child(pos, cgroup) \
 	list_for_each_entry_rcu(pos, &(cgroup)->children, sibling)
@@ -561,19 +562,19 @@ struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
  * @cgroup: cgroup whose descendants to walk
  *
  * Walk @cgroup's descendants. Must be called under rcu_read_lock(). A
- * descendant cgroup which hasn't finished ->post_create() or already has
- * finished ->pre_destroy() may show up during traversal and it's each
+ * descendant cgroup which hasn't finished ->css_online() or already has
+ * finished ->css_offline() may show up during traversal and it's each
  * subsystem's responsibility to verify that each @pos is alive.
  *
- * If a subsystem synchronizes against the parent in its ->post_create()
- * and before starting iterating, and synchronizes against @pos on each
- * iteration, any descendant cgroup which finished ->post_create() is
+ * If a subsystem synchronizes against the parent in its ->css_online() and
+ * before starting iterating, and synchronizes against @pos on each
+ * iteration, any descendant cgroup which finished ->css_online() is
  * guaranteed to be visible in the future iterations.
  *
  * In other words, the following guarantees that a descendant can't escape
  * state updates of its ancestors.
  *
- * my_post_create(@cgrp)
+ * my_online(@cgrp)
  * {
  *	Lock @cgrp->parent and @cgrp;
  *	Inherit state from @cgrp->parent;
@@ -606,7 +607,7 @@ struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
  * iteration should lock and unlock both @pos->parent and @pos.
  *
  * Alternatively, a subsystem may choose to use a single global lock to
- * synchronize ->post_create() and ->pre_destroy() against tree-walking
+ * synchronize ->css_online() and ->css_offline() against tree-walking
  * operations.
  */
 #define cgroup_for_each_descendant_pre(pos, cgroup) \
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c389f4258681..d35463bab487 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -876,7 +876,7 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 		 * Release the subsystem state objects.
 		 */
 		for_each_subsys(cgrp->root, ss)
-			ss->destroy(cgrp);
+			ss->css_free(cgrp);
 
 		cgrp->root->number_of_cgroups--;
 		mutex_unlock(&cgroup_mutex);
@@ -4048,8 +4048,8 @@ static int online_css(struct cgroup_subsys *ss, struct cgroup *cgrp)
 
 	lockdep_assert_held(&cgroup_mutex);
 
-	if (ss->post_create)
-		ret = ss->post_create(cgrp);
+	if (ss->css_online)
+		ret = ss->css_online(cgrp);
 	if (!ret)
 		cgrp->subsys[ss->subsys_id]->flags |= CSS_ONLINE;
 	return ret;
@@ -4067,14 +4067,14 @@ static void offline_css(struct cgroup_subsys *ss, struct cgroup *cgrp)
 		return;
 
 	/*
-	 * pre_destroy() should be called with cgroup_mutex unlocked. See
+	 * css_offline() should be called with cgroup_mutex unlocked. See
 	 * 3fa59dfbc3 ("cgroup: fix potential deadlock in pre_destroy") for
 	 * details. This temporary unlocking should go away once
 	 * cgroup_mutex is unexported from controllers.
 	 */
-	if (ss->pre_destroy) {
+	if (ss->css_offline) {
 		mutex_unlock(&cgroup_mutex);
-		ss->pre_destroy(cgrp);
+		ss->css_offline(cgrp);
 		mutex_lock(&cgroup_mutex);
 	}
 
@@ -4136,7 +4136,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 	for_each_subsys(root, ss) {
 		struct cgroup_subsys_state *css;
 
-		css = ss->create(cgrp);
+		css = ss->css_alloc(cgrp);
 		if (IS_ERR(css)) {
 			err = PTR_ERR(css);
 			goto err_free_all;
@@ -4147,7 +4147,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 			if (err)
 				goto err_free_all;
 		}
-		/* At error, ->destroy() callback has to free assigned ID. */
+		/* At error, ->css_free() callback has to free assigned ID. */
 		if (clone_children(parent) && ss->post_clone)
 			ss->post_clone(cgrp);
 
@@ -4201,7 +4201,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 err_free_all:
 	for_each_subsys(root, ss) {
 		if (cgrp->subsys[ss->subsys_id])
-			ss->destroy(cgrp);
+			ss->css_free(cgrp);
 	}
 	mutex_unlock(&cgroup_mutex);
 	/* Release the reference count that we took on the superblock */
@@ -4381,7 +4381,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss)
 	/* Create the top cgroup state for this subsystem */
 	list_add(&ss->sibling, &rootnode.subsys_list);
 	ss->root = &rootnode;
-	css = ss->create(dummytop);
+	css = ss->css_alloc(dummytop);
 	/* We don't handle early failures gracefully */
 	BUG_ON(IS_ERR(css));
 	init_cgroup_css(css, ss, dummytop);
@@ -4425,7 +4425,7 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 
 	/* check name and function validity */
 	if (ss->name == NULL || strlen(ss->name) > MAX_CGROUP_TYPE_NAMELEN ||
-	    ss->create == NULL || ss->destroy == NULL)
+	    ss->css_alloc == NULL || ss->css_free == NULL)
 		return -EINVAL;
 
 	/*
@@ -4454,10 +4454,11 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 	subsys[ss->subsys_id] = ss;
 
 	/*
-	 * no ss->create seems to need anything important in the ss struct, so
-	 * this can happen first (i.e. before the rootnode attachment).
+	 * no ss->css_alloc seems to need anything important in the ss
+	 * struct, so this can happen first (i.e. before the rootnode
+	 * attachment).
 	 */
-	css = ss->create(dummytop);
+	css = ss->css_alloc(dummytop);
 	if (IS_ERR(css)) {
 		/* failure case - need to deassign the subsys[] slot. */
 		subsys[ss->subsys_id] = NULL;
@@ -4577,12 +4578,12 @@ void cgroup_unload_subsys(struct cgroup_subsys *ss)
 	write_unlock(&css_set_lock);
 
 	/*
-	 * remove subsystem's css from the dummytop and free it - need to free
-	 * before marking as null because ss->destroy needs the cgrp->subsys
-	 * pointer to find their state. note that this also takes care of
-	 * freeing the css_id.
+	 * remove subsystem's css from the dummytop and free it - need to
+	 * free before marking as null because ss->css_free needs the
+	 * cgrp->subsys pointer to find their state. note that this also
+	 * takes care of freeing the css_id.
 	 */
-	ss->destroy(dummytop);
+	ss->css_free(dummytop);
 	dummytop->subsys[ss->subsys_id] = NULL;
 
 	mutex_unlock(&cgroup_mutex);
@@ -4626,8 +4627,8 @@ int __init cgroup_init_early(void)
 
 		BUG_ON(!ss->name);
 		BUG_ON(strlen(ss->name) > MAX_CGROUP_TYPE_NAMELEN);
-		BUG_ON(!ss->create);
-		BUG_ON(!ss->destroy);
+		BUG_ON(!ss->css_alloc);
+		BUG_ON(!ss->css_free);
 		if (ss->subsys_id != i) {
 			printk(KERN_ERR "cgroup: Subsys %s id == %d\n",
 			       ss->name, ss->subsys_id);
@@ -5439,7 +5440,7 @@ struct cgroup_subsys_state *cgroup_css_from_dir(struct file *f, int id)
 }
 
 #ifdef CONFIG_CGROUP_DEBUG
-static struct cgroup_subsys_state *debug_create(struct cgroup *cont)
+static struct cgroup_subsys_state *debug_css_alloc(struct cgroup *cont)
 {
 	struct cgroup_subsys_state *css = kzalloc(sizeof(*css), GFP_KERNEL);
 
@@ -5449,7 +5450,7 @@ static struct cgroup_subsys_state *debug_create(struct cgroup *cont)
 	return css;
 }
 
-static void debug_destroy(struct cgroup *cont)
+static void debug_css_free(struct cgroup *cont)
 {
 	kfree(cont->subsys[debug_subsys_id]);
 }
@@ -5578,8 +5579,8 @@ static struct cftype debug_files[] = {
 
 struct cgroup_subsys debug_subsys = {
 	.name = "debug",
-	.create = debug_create,
-	.destroy = debug_destroy,
+	.css_alloc = debug_css_alloc,
+	.css_free = debug_css_free,
 	.subsys_id = debug_subsys_id,
 	.base_cftypes = debug_files,
 };
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index ee8bb671688c..75dda1ea5026 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -92,7 +92,7 @@ static const char *freezer_state_strs(unsigned int state)
 
 struct cgroup_subsys freezer_subsys;
 
-static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
+static struct cgroup_subsys_state *freezer_css_alloc(struct cgroup *cgroup)
 {
 	struct freezer *freezer;
 
@@ -105,14 +105,14 @@ static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
 }
 
 /**
- * freezer_post_create - commit creation of a freezer cgroup
+ * freezer_css_online - commit creation of a freezer cgroup
  * @cgroup: cgroup being created
  *
  * We're committing to creation of @cgroup. Mark it online and inherit
  * parent's freezing state while holding both parent's and our
  * freezer->lock.
  */
-static int freezer_post_create(struct cgroup *cgroup)
+static int freezer_css_online(struct cgroup *cgroup)
 {
 	struct freezer *freezer = cgroup_freezer(cgroup);
 	struct freezer *parent = parent_freezer(freezer);
@@ -141,13 +141,13 @@ static int freezer_post_create(struct cgroup *cgroup)
 }
 
 /**
- * freezer_pre_destroy - initiate destruction of @cgroup
+ * freezer_css_offline - initiate destruction of @cgroup
  * @cgroup: cgroup being destroyed
  *
  * @cgroup is going away. Mark it dead and decrement system_freezing_count
  * if it was holding one.
  */
-static void freezer_pre_destroy(struct cgroup *cgroup)
+static void freezer_css_offline(struct cgroup *cgroup)
 {
 	struct freezer *freezer = cgroup_freezer(cgroup);
 
@@ -161,7 +161,7 @@ static void freezer_pre_destroy(struct cgroup *cgroup)
 	spin_unlock_irq(&freezer->lock);
 }
 
-static void freezer_destroy(struct cgroup *cgroup)
+static void freezer_css_free(struct cgroup *cgroup)
 {
 	kfree(cgroup_freezer(cgroup));
 }
@@ -477,10 +477,10 @@ static struct cftype files[] = {
 
 struct cgroup_subsys freezer_subsys = {
 	.name = "freezer",
-	.create = freezer_create,
-	.post_create = freezer_post_create,
-	.pre_destroy = freezer_pre_destroy,
-	.destroy = freezer_destroy,
+	.css_alloc = freezer_css_alloc,
+	.css_online = freezer_css_online,
+	.css_offline = freezer_css_offline,
+	.css_free = freezer_css_free,
 	.subsys_id = freezer_subsys_id,
 	.attach = freezer_attach,
 	.fork = freezer_fork,
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index f33c7153b6d7..06931337c4e5 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1821,11 +1821,11 @@ static void cpuset_post_clone(struct cgroup *cgroup)
 }
 
 /*
- * cpuset_create - create a cpuset
+ * cpuset_css_alloc - allocate a cpuset css
  * cont: control group that the new cpuset will be part of
  */
 
-static struct cgroup_subsys_state *cpuset_create(struct cgroup *cont)
+static struct cgroup_subsys_state *cpuset_css_alloc(struct cgroup *cont)
 {
 	struct cpuset *cs;
 	struct cpuset *parent;
@@ -1864,7 +1864,7 @@ static struct cgroup_subsys_state *cpuset_create(struct cgroup *cont)
  * will call async_rebuild_sched_domains().
  */
 
-static void cpuset_destroy(struct cgroup *cont)
+static void cpuset_css_free(struct cgroup *cont)
 {
 	struct cpuset *cs = cgroup_cs(cont);
 
@@ -1878,8 +1878,8 @@ static void cpuset_destroy(struct cgroup *cont)
 
 struct cgroup_subsys cpuset_subsys = {
 	.name = "cpuset",
-	.create = cpuset_create,
-	.destroy = cpuset_destroy,
+	.css_alloc = cpuset_css_alloc,
+	.css_free = cpuset_css_free,
 	.can_attach = cpuset_can_attach,
 	.attach = cpuset_attach,
 	.post_clone = cpuset_post_clone,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dbccf83c134d..f9ff5493171d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7434,7 +7434,7 @@ unlock:
 device_initcall(perf_event_sysfs_init);
 
 #ifdef CONFIG_CGROUP_PERF
-static struct cgroup_subsys_state *perf_cgroup_create(struct cgroup *cont)
+static struct cgroup_subsys_state *perf_cgroup_css_alloc(struct cgroup *cont)
 {
 	struct perf_cgroup *jc;
 
@@ -7451,7 +7451,7 @@ static struct cgroup_subsys_state *perf_cgroup_create(struct cgroup *cont)
 	return &jc->css;
 }
 
-static void perf_cgroup_destroy(struct cgroup *cont)
+static void perf_cgroup_css_free(struct cgroup *cont)
 {
 	struct perf_cgroup *jc;
 	jc = container_of(cgroup_subsys_state(cont, perf_subsys_id),
@@ -7492,8 +7492,8 @@ static void perf_cgroup_exit(struct cgroup *cgrp, struct cgroup *old_cgrp,
 struct cgroup_subsys perf_subsys = {
 	.name = "perf_event",
 	.subsys_id = perf_subsys_id,
-	.create = perf_cgroup_create,
-	.destroy = perf_cgroup_destroy,
+	.css_alloc = perf_cgroup_css_alloc,
+	.css_free = perf_cgroup_css_free,
 	.exit = perf_cgroup_exit,
 	.attach = perf_cgroup_attach,
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927fda712..6f20c8fb2326 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7468,7 +7468,7 @@ static inline struct task_group *cgroup_tg(struct cgroup *cgrp)
 			    struct task_group, css);
 }
 
-static struct cgroup_subsys_state *cpu_cgroup_create(struct cgroup *cgrp)
+static struct cgroup_subsys_state *cpu_cgroup_css_alloc(struct cgroup *cgrp)
 {
 	struct task_group *tg, *parent;
 
@@ -7485,7 +7485,7 @@ static struct cgroup_subsys_state *cpu_cgroup_create(struct cgroup *cgrp)
 	return &tg->css;
 }
 
-static void cpu_cgroup_destroy(struct cgroup *cgrp)
+static void cpu_cgroup_css_free(struct cgroup *cgrp)
 {
 	struct task_group *tg = cgroup_tg(cgrp);
 
@@ -7845,8 +7845,8 @@ static struct cftype cpu_files[] = {
 
 struct cgroup_subsys cpu_cgroup_subsys = {
 	.name = "cpu",
-	.create = cpu_cgroup_create,
-	.destroy = cpu_cgroup_destroy,
+	.css_alloc = cpu_cgroup_css_alloc,
+	.css_free = cpu_cgroup_css_free,
 	.can_attach = cpu_cgroup_can_attach,
 	.attach = cpu_cgroup_attach,
 	.exit = cpu_cgroup_exit,
@@ -7869,7 +7869,7 @@ struct cgroup_subsys cpu_cgroup_subsys = {
 struct cpuacct root_cpuacct;
 
 /* create a new cpu accounting group */
-static struct cgroup_subsys_state *cpuacct_create(struct cgroup *cgrp)
+static struct cgroup_subsys_state *cpuacct_css_alloc(struct cgroup *cgrp)
 {
 	struct cpuacct *ca;
 
@@ -7899,7 +7899,7 @@ out:
 }
 
 /* destroy an existing cpu accounting group */
-static void cpuacct_destroy(struct cgroup *cgrp)
+static void cpuacct_css_free(struct cgroup *cgrp)
 {
 	struct cpuacct *ca = cgroup_ca(cgrp);
 
@@ -8070,8 +8070,8 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 
 struct cgroup_subsys cpuacct_subsys = {
 	.name = "cpuacct",
-	.create = cpuacct_create,
-	.destroy = cpuacct_destroy,
+	.css_alloc = cpuacct_css_alloc,
+	.css_free = cpuacct_css_free,
 	.subsys_id = cpuacct_subsys_id,
 	.base_cftypes = files,
 };
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 0d3a1a317731..b5bde7a5c017 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -77,7 +77,7 @@ static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg)
 	return false;
 }
 
-static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
+static struct cgroup_subsys_state *hugetlb_cgroup_css_alloc(struct cgroup *cgroup)
 {
 	int idx;
 	struct cgroup *parent_cgroup;
@@ -101,7 +101,7 @@ static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
 	return &h_cgroup->css;
 }
 
-static void hugetlb_cgroup_destroy(struct cgroup *cgroup)
+static void hugetlb_cgroup_css_free(struct cgroup *cgroup)
 {
 	struct hugetlb_cgroup *h_cgroup;
 
@@ -155,7 +155,7 @@ out:
  * Force the hugetlb cgroup to empty the hugetlb resources by moving them to
  * the parent cgroup.
  */
-static void hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
+static void hugetlb_cgroup_css_offline(struct cgroup *cgroup)
 {
 	struct hstate *h;
 	struct page *page;
@@ -404,8 +404,8 @@ void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
 
 struct cgroup_subsys hugetlb_subsys = {
 	.name = "hugetlb",
-	.create = hugetlb_cgroup_create,
-	.pre_destroy = hugetlb_cgroup_pre_destroy,
-	.destroy = hugetlb_cgroup_destroy,
-	.subsys_id = hugetlb_subsys_id,
+	.css_alloc = hugetlb_cgroup_css_alloc,
+	.css_offline = hugetlb_cgroup_css_offline,
+	.css_free = hugetlb_cgroup_css_free,
+	.subsys_id = hugetlb_subsys_id,
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 08adaaae6fcc..8b0b2b028e23 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4922,7 +4922,7 @@ err_cleanup:
 }
 
 static struct cgroup_subsys_state * __ref
-mem_cgroup_create(struct cgroup *cont)
+mem_cgroup_css_alloc(struct cgroup *cont)
 {
 	struct mem_cgroup *memcg, *parent;
 	long error = -ENOMEM;
@@ -5003,14 +5003,14 @@ free_out:
 	return ERR_PTR(error);
 }
 
-static void mem_cgroup_pre_destroy(struct cgroup *cont)
+static void mem_cgroup_css_offline(struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 
 	mem_cgroup_reparent_charges(memcg);
 }
 
-static void mem_cgroup_destroy(struct cgroup *cont)
+static void mem_cgroup_css_free(struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 
@@ -5600,9 +5600,9 @@ static void mem_cgroup_move_task(struct cgroup *cont,
 struct cgroup_subsys mem_cgroup_subsys = {
 	.name = "memory",
 	.subsys_id = mem_cgroup_subsys_id,
-	.create = mem_cgroup_create,
-	.pre_destroy = mem_cgroup_pre_destroy,
-	.destroy = mem_cgroup_destroy,
+	.css_alloc = mem_cgroup_css_alloc,
+	.css_offline = mem_cgroup_css_offline,
+	.css_free = mem_cgroup_css_free,
 	.can_attach = mem_cgroup_can_attach,
 	.cancel_attach = mem_cgroup_cancel_attach,
 	.attach = mem_cgroup_move_task,
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 79285a36035f..f0b6b0d572c1 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -108,7 +108,7 @@ static int write_update_netdev_table(struct net_device *dev)
 	return ret;
 }
 
-static struct cgroup_subsys_state *cgrp_create(struct cgroup *cgrp)
+static struct cgroup_subsys_state *cgrp_css_alloc(struct cgroup *cgrp)
 {
 	struct cgroup_netprio_state *cs;
 	int ret = -EINVAL;
@@ -132,7 +132,7 @@ out:
 	return ERR_PTR(ret);
 }
 
-static void cgrp_destroy(struct cgroup *cgrp)
+static void cgrp_css_free(struct cgroup *cgrp)
 {
 	struct cgroup_netprio_state *cs;
 	struct net_device *dev;
@@ -276,8 +276,8 @@ static struct cftype ss_files[] = {
 
 struct cgroup_subsys net_prio_subsys = {
 	.name = "net_prio",
-	.create = cgrp_create,
-	.destroy = cgrp_destroy,
+	.css_alloc = cgrp_css_alloc,
+	.css_free = cgrp_css_free,
 	.attach = net_prio_attach,
 	.subsys_id = net_prio_subsys_id,
 	.base_cftypes = ss_files,
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 2ecde225ae60..8cdc18e075fb 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -34,7 +34,7 @@ static inline struct cgroup_cls_state *task_cls_state(struct task_struct *p)
 			    struct cgroup_cls_state, css);
 }
 
-static struct cgroup_subsys_state *cgrp_create(struct cgroup *cgrp)
+static struct cgroup_subsys_state *cgrp_css_alloc(struct cgroup *cgrp)
 {
 	struct cgroup_cls_state *cs;
 
@@ -48,7 +48,7 @@ static struct cgroup_subsys_state *cgrp_create(struct cgroup *cgrp)
 	return &cs->css;
 }
 
-static void cgrp_destroy(struct cgroup *cgrp)
+static void cgrp_css_free(struct cgroup *cgrp)
 {
 	kfree(cgrp_cls_state(cgrp));
 }
@@ -75,8 +75,8 @@ static struct cftype ss_files[] = {
 
 struct cgroup_subsys net_cls_subsys = {
 	.name = "net_cls",
-	.create = cgrp_create,
-	.destroy = cgrp_destroy,
+	.css_alloc = cgrp_css_alloc,
+	.css_free = cgrp_css_free,
 	.subsys_id = net_cls_subsys_id,
 	.base_cftypes = ss_files,
 	.module = THIS_MODULE,
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 78a16f5b7275..19ecc8de9e6b 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -180,7 +180,7 @@ static void dev_exception_clean(struct dev_cgroup *dev_cgroup)
 /*
  * called from kernel/cgroup.c with cgroup_lock() held.
*/ -static struct cgroup_subsys_state *devcgroup_create(struct cgroup *cgroup) +static struct cgroup_subsys_state *devcgroup_css_alloc(struct cgroup *cgroup) { struct dev_cgroup *dev_cgroup, *parent_dev_cgroup; struct cgroup *parent_cgroup; @@ -210,7 +210,7 @@ static struct cgroup_subsys_state *devcgroup_create(struct cgroup *cgroup) return &dev_cgroup->css; } -static void devcgroup_destroy(struct cgroup *cgroup) +static void devcgroup_css_free(struct cgroup *cgroup) { struct dev_cgroup *dev_cgroup; @@ -564,8 +564,8 @@ static struct cftype dev_cgroup_files[] = { struct cgroup_subsys devices_subsys = { .name = "devices", .can_attach = devcgroup_can_attach, - .create = devcgroup_create, - .destroy = devcgroup_destroy, + .css_alloc = devcgroup_css_alloc, + .css_free = devcgroup_css_free, .subsys_id = devices_subsys_id, .base_cftypes = dev_cgroup_files, -- cgit v1.2.3 From 2260e7fc1f18ad815324605c1ce7d5c6fd9b19a2 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 19 Nov 2012 08:13:38 -0800 Subject: cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/ clone_children is only meaningful for cpuset and will stay that way. Rename the flag to reflect that and update documentation. Also, drop clone_children() wrapper in cgroup.c. The thin wrapper is used only a few times and one of them will go away soon. Signed-off-by: Tejun Heo Acked-by: Serge E. Hallyn Acked-by: Li Zefan Cc: Glauber Costa --- Documentation/cgroups/cgroups.txt | 8 +++----- include/linux/cgroup.h | 6 ++++-- kernel/cgroup.c | 28 ++++++++++++---------------- 3 files changed, 19 insertions(+), 23 deletions(-) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index b06eea217403..24cdf76bd20c 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -299,11 +299,9 @@ a cgroup hierarchy's release_agent path is empty. 1.5 What does clone_children do ? 
--------------------------------- -If the clone_children flag is enabled (1) in a cgroup, then all -cgroups created beneath will call the post_clone callbacks for each -subsystem of the newly created cgroup. Usually when this callback is -implemented for a subsystem, it copies the values of the parent -subsystem, this is the case for the cpuset. +This flag only affects the cpuset controller. If the clone_children +flag is enabled (1) in a cgroup, a new cpuset cgroup will copy its +configuration from the parent during initialization. 1.6 How do I use cgroups ? -------------------------- diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 7a2189ca8327..d2f82979f6c1 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -143,9 +143,11 @@ enum { /* Control Group requires release notifications to userspace */ CGRP_NOTIFY_ON_RELEASE, /* - * Clone cgroup values when creating a new child cgroup + * Clone the parent's configuration when creating a new child + * cpuset cgroup. For historical reasons, this option can be + * specified at mount time and thus is implemented here. 
*/ - CGRP_CLONE_CHILDREN, + CGRP_CPUSET_CLONE_CHILDREN, }; struct cgroup { diff --git a/kernel/cgroup.c b/kernel/cgroup.c index d35463bab487..2895880e6800 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -296,11 +296,6 @@ static int notify_on_release(const struct cgroup *cgrp) return test_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags); } -static int clone_children(const struct cgroup *cgrp) -{ - return test_bit(CGRP_CLONE_CHILDREN, &cgrp->flags); -} - /* * for_each_subsys() allows you to iterate on each subsystem attached to * an active hierarchy @@ -1101,7 +1096,7 @@ static int cgroup_show_options(struct seq_file *seq, struct dentry *dentry) seq_puts(seq, ",xattr"); if (strlen(root->release_agent_path)) seq_printf(seq, ",release_agent=%s", root->release_agent_path); - if (clone_children(&root->top_cgroup)) + if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->top_cgroup.flags)) seq_puts(seq, ",clone_children"); if (strlen(root->name)) seq_printf(seq, ",name=%s", root->name); @@ -1113,7 +1108,7 @@ struct cgroup_sb_opts { unsigned long subsys_mask; unsigned long flags; char *release_agent; - bool clone_children; + bool cpuset_clone_children; char *name; /* User explicitly requested empty subsystem */ bool none; @@ -1164,7 +1159,7 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts) continue; } if (!strcmp(token, "clone_children")) { - opts->clone_children = true; + opts->cpuset_clone_children = true; continue; } if (!strcmp(token, "xattr")) { @@ -1474,8 +1469,8 @@ static struct cgroupfs_root *cgroup_root_from_opts(struct cgroup_sb_opts *opts) strcpy(root->release_agent_path, opts->release_agent); if (opts->name) strcpy(root->name, opts->name); - if (opts->clone_children) - set_bit(CGRP_CLONE_CHILDREN, &root->top_cgroup.flags); + if (opts->cpuset_clone_children) + set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->top_cgroup.flags); return root; } @@ -3905,7 +3900,7 @@ fail: static u64 cgroup_clone_children_read(struct cgroup *cgrp, struct cftype *cft) { 
- return clone_children(cgrp); + return test_bit(CGRP_CPUSET_CLONE_CHILDREN, &cgrp->flags); } static int cgroup_clone_children_write(struct cgroup *cgrp, @@ -3913,9 +3908,9 @@ static int cgroup_clone_children_write(struct cgroup *cgrp, u64 val) { if (val) - set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags); + set_bit(CGRP_CPUSET_CLONE_CHILDREN, &cgrp->flags); else - clear_bit(CGRP_CLONE_CHILDREN, &cgrp->flags); + clear_bit(CGRP_CPUSET_CLONE_CHILDREN, &cgrp->flags); return 0; } @@ -4130,8 +4125,8 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry, if (notify_on_release(parent)) set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags); - if (clone_children(parent)) - set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags); + if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &parent->flags)) + set_bit(CGRP_CPUSET_CLONE_CHILDREN, &cgrp->flags); for_each_subsys(root, ss) { struct cgroup_subsys_state *css; @@ -4148,7 +4143,8 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry, goto err_free_all; } /* At error, ->css_free() callback has to free assigned ID. */ - if (clone_children(parent) && ss->post_clone) + if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &parent->flags) && + ss->post_clone) ss->post_clone(cgrp); if (ss->broken_hierarchy && !ss->warned_broken_hierarchy && -- cgit v1.2.3 From 033fa1c5f5f73833598a0beb022c0048fb769dad Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 19 Nov 2012 08:13:39 -0800 Subject: cgroup, cpuset: remove cgroup_subsys->post_clone() Currently CGRP_CPUSET_CLONE_CHILDREN triggers ->post_clone(). Now that clone_children is cpuset specific, there's no reason to have this rather odd option activation mechanism in cgroup core. cpuset can check the flag from its ->css_alloc() and take the necessary action. Move cpuset_post_clone() logic to the end of cpuset_css_alloc() and remove cgroup_subsys->post_clone(). Loosely based on Glauber's "generalize post_clone into post_create" patch.
Signed-off-by: Tejun Heo Original-patch-by: Glauber Costa Original-patch: <1351686554-22592-2-git-send-email-glommer@parallels.com> Acked-by: Serge E. Hallyn Acked-by: Li Zefan Cc: Glauber Costa --- Documentation/cgroups/cgroups.txt | 8 ---- include/linux/cgroup.h | 1 - kernel/cgroup.c | 4 -- kernel/cpuset.c | 80 ++++++++++++++++++--------------------- 4 files changed, 36 insertions(+), 57 deletions(-) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 24cdf76bd20c..bcf1a00b06a1 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -642,14 +642,6 @@ void exit(struct task_struct *task) Called during task exit. -void post_clone(struct cgroup *cgrp) -(cgroup_mutex held by caller) - -Called during cgroup_create() to do any parameter -initialization which might be required before a task could attach. For -example, in cpusets, no task may attach before 'cpus' and 'mems' are set -up. - void bind(struct cgroup *root) (cgroup_mutex held by caller) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index d2f82979f6c1..c798997e5011 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -452,7 +452,6 @@ struct cgroup_subsys { void (*fork)(struct task_struct *task); void (*exit)(struct cgroup *cgrp, struct cgroup *old_cgrp, struct task_struct *task); - void (*post_clone)(struct cgroup *cgrp); void (*bind)(struct cgroup *root); int subsys_id; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 2895880e6800..96719f73e95d 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -4142,10 +4142,6 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry, if (err) goto err_free_all; } - /* At error, ->css_free() callback has to free assigned ID. 
*/ - if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &parent->flags) && - ss->post_clone) - ss->post_clone(cgrp); if (ss->broken_hierarchy && !ss->warned_broken_hierarchy && parent->parent) { diff --git a/kernel/cpuset.c b/kernel/cpuset.c index 06931337c4e5..b017887d632f 100644 --- a/kernel/cpuset.c +++ b/kernel/cpuset.c @@ -1783,43 +1783,6 @@ static struct cftype files[] = { { } /* terminate */ }; -/* - * post_clone() is called during cgroup_create() when the - * clone_children mount argument was specified. The cgroup - * can not yet have any tasks. - * - * Currently we refuse to set up the cgroup - thereby - * refusing the task to be entered, and as a result refusing - * the sys_unshare() or clone() which initiated it - if any - * sibling cpusets have exclusive cpus or mem. - * - * If this becomes a problem for some users who wish to - * allow that scenario, then cpuset_post_clone() could be - * changed to grant parent->cpus_allowed-sibling_cpus_exclusive - * (and likewise for mems) to the new cgroup. Called with cgroup_mutex - * held. 
- */ -static void cpuset_post_clone(struct cgroup *cgroup) -{ - struct cgroup *parent, *child; - struct cpuset *cs, *parent_cs; - - parent = cgroup->parent; - list_for_each_entry(child, &parent->children, sibling) { - cs = cgroup_cs(child); - if (is_mem_exclusive(cs) || is_cpu_exclusive(cs)) - return; - } - cs = cgroup_cs(cgroup); - parent_cs = cgroup_cs(parent); - - mutex_lock(&callback_mutex); - cs->mems_allowed = parent_cs->mems_allowed; - cpumask_copy(cs->cpus_allowed, parent_cs->cpus_allowed); - mutex_unlock(&callback_mutex); - return; -} - /* * cpuset_css_alloc - allocate a cpuset css * cont: control group that the new cpuset will be part of @@ -1827,13 +1790,14 @@ static void cpuset_post_clone(struct cgroup *cgroup) static struct cgroup_subsys_state *cpuset_css_alloc(struct cgroup *cont) { - struct cpuset *cs; - struct cpuset *parent; + struct cgroup *parent_cg = cont->parent; + struct cgroup *tmp_cg; + struct cpuset *parent, *cs; - if (!cont->parent) { + if (!parent_cg) return &top_cpuset.css; - } - parent = cgroup_cs(cont->parent); + parent = cgroup_cs(parent_cg); + cs = kmalloc(sizeof(*cs), GFP_KERNEL); if (!cs) return ERR_PTR(-ENOMEM); @@ -1855,7 +1819,36 @@ static struct cgroup_subsys_state *cpuset_css_alloc(struct cgroup *cont) cs->parent = parent; number_of_cpusets++; - return &cs->css ; + + if (!test_bit(CGRP_CPUSET_CLONE_CHILDREN, &cont->flags)) + goto skip_clone; + + /* + * Clone @parent's configuration if CGRP_CPUSET_CLONE_CHILDREN is + * set. This flag handling is implemented in cgroup core for + * historical reasons - the flag may be specified during mount. + * + * Currently, if any sibling cpusets have exclusive cpus or mem, we + * refuse to clone the configuration - thereby refusing the task to + * be entered, and as a result refusing the sys_unshare() or + * clone() which initiated it.
If this becomes a problem for some + * users who wish to allow that scenario, then this could be + * changed to grant parent->cpus_allowed-sibling_cpus_exclusive + * (and likewise for mems) to the new cgroup. + */ + list_for_each_entry(tmp_cg, &parent_cg->children, sibling) { + struct cpuset *tmp_cs = cgroup_cs(tmp_cg); + + if (is_mem_exclusive(tmp_cs) || is_cpu_exclusive(tmp_cs)) + goto skip_clone; + } + + mutex_lock(&callback_mutex); + cs->mems_allowed = parent->mems_allowed; + cpumask_copy(cs->cpus_allowed, parent->cpus_allowed); + mutex_unlock(&callback_mutex); +skip_clone: + return &cs->css; } /* @@ -1882,7 +1875,6 @@ struct cgroup_subsys cpuset_subsys = { .css_free = cpuset_css_free, .can_attach = cpuset_can_attach, .attach = cpuset_attach, - .post_clone = cpuset_post_clone, .subsys_id = cpuset_subsys_id, .base_cftypes = files, .early_init = 1, -- cgit v1.2.3 From 811d8d6ff59cbc7d618dfa2cd339ba6c3691a7eb Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 22 Nov 2012 07:32:47 -0800 Subject: netprio_cgroup: allow nesting and inherit config on cgroup creation Inherit netprio configuration from ->css_online(), allow nesting and remove .broken_hierarchy marking. This makes netprio_cgroup's behavior match netcls_cgroup's. Note that this patch changes userland-visible behavior. Nesting is allowed and the first level cgroups below the root cgroup behave differently - they inherit priorities from the root cgroup on creation instead of starting with 0. This is unfortunate but not doing so is much crazier. Signed-off-by: Tejun Heo Tested-and-Acked-by: Daniel Wagner Acked-by: David S. 
Miller --- Documentation/cgroups/net_prio.txt | 2 ++ net/core/netprio_cgroup.c | 42 ++++++++++++++++++++++---------------- 2 files changed, 26 insertions(+), 18 deletions(-) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/net_prio.txt b/Documentation/cgroups/net_prio.txt index 01b322635591..a82cbd28ea8a 100644 --- a/Documentation/cgroups/net_prio.txt +++ b/Documentation/cgroups/net_prio.txt @@ -51,3 +51,5 @@ One usage for the net_prio cgroup is with mqprio qdisc allowing application traffic to be steered to hardware/driver based traffic classes. These mappings can then be managed by administrators or other networking protocols such as DCBX. + +A new net_prio cgroup inherits the parent's configuration. diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c index b2af0d099663..bde53da9cd86 100644 --- a/net/core/netprio_cgroup.c +++ b/net/core/netprio_cgroup.c @@ -136,9 +136,6 @@ static struct cgroup_subsys_state *cgrp_css_alloc(struct cgroup *cgrp) { struct cgroup_netprio_state *cs; - if (cgrp->parent && cgrp->parent->id) - return ERR_PTR(-EINVAL); - cs = kzalloc(sizeof(*cs), GFP_KERNEL); if (!cs) return ERR_PTR(-ENOMEM); @@ -146,16 +143,34 @@ static struct cgroup_subsys_state *cgrp_css_alloc(struct cgroup *cgrp) return &cs->css; } -static void cgrp_css_free(struct cgroup *cgrp) +static int cgrp_css_online(struct cgroup *cgrp) { - struct cgroup_netprio_state *cs = cgrp_netprio_state(cgrp); + struct cgroup *parent = cgrp->parent; struct net_device *dev; + int ret = 0; + + if (!parent) + return 0; rtnl_lock(); - for_each_netdev(&init_net, dev) - WARN_ON_ONCE(netprio_set_prio(cgrp, dev, 0)); + /* + * Inherit prios from the parent. As all prios are set during + * onlining, there is no need to clear them on offline. 
+ */ + for_each_netdev(&init_net, dev) { + u32 prio = netprio_prio(parent, dev); + + ret = netprio_set_prio(cgrp, dev, prio); + if (ret) + break; + } rtnl_unlock(); - kfree(cs); + return ret; +} + +static void cgrp_css_free(struct cgroup *cgrp) +{ + kfree(cgrp_netprio_state(cgrp)); } static u64 read_prioidx(struct cgroup *cgrp, struct cftype *cft) @@ -237,21 +252,12 @@ static struct cftype ss_files[] = { struct cgroup_subsys net_prio_subsys = { .name = "net_prio", .css_alloc = cgrp_css_alloc, + .css_online = cgrp_css_online, .css_free = cgrp_css_free, .attach = net_prio_attach, .subsys_id = net_prio_subsys_id, .base_cftypes = ss_files, .module = THIS_MODULE, - - /* - * net_prio has artificial limit on the number of cgroups and - * disallows nesting making it impossible to co-mount it with other - * hierarchical subsystems. Remove the artificially low PRIOIDX_SZ - * limit and properly nest configuration such that children follow - * their parents' configurations by default and are allowed to - * override and remove the following. - */ - .broken_hierarchy = true, }; static int netprio_device_event(struct notifier_block *unused, -- cgit v1.2.3 From 15ef4ffaa797034d5ff82844daf8f595d7c6d53c Mon Sep 17 00:00:00 2001 From: Namjae Jeon Date: Sat, 8 Dec 2012 15:02:36 +0900 Subject: cgroup: update Documentation/cgroups/00-INDEX There are new files added to cgroup documentation. Let's update the index file to include the new files. Signed-off-by: Namjae Jeon Signed-off-by: Amit Sahrawat Signed-off-by: Tejun Heo --- Documentation/cgroups/00-INDEX | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroups/00-INDEX index 3f58fa3d6d00..f78b90a35ad0 100644 --- a/Documentation/cgroups/00-INDEX +++ b/Documentation/cgroups/00-INDEX @@ -1,7 +1,11 @@ 00-INDEX - this file +blkio-controller.txt + - Description for Block IO Controller, implementation and usage details. 
cgroups.txt - Control Groups definition, implementation details, examples and API. +cgroup_event_listener.c + - A user program for cgroup listener. cpuacct.txt - CPU Accounting Controller; account CPU usage for groups of tasks. cpusets.txt @@ -10,9 +14,13 @@ devices.txt - Device Whitelist Controller; description, interface and security. freezer-subsystem.txt - checkpointing; rationale to not use signals, interface. +hugetlb.txt + - HugeTLB Controller implementation and usage details. memcg_test.txt - Memory Resource Controller; implementation details. memory.txt - Memory Resource Controller; design, accounting, interface, testing. +net_prio.txt + - Network priority cgroups details and usages. resource_counter.txt - Resource Counter API. -- cgit v1.2.3
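Illustration only, not part of the series: the clone decision that the cpuset patch above folds into the end of cpuset_css_alloc() can be modeled in user space roughly as below. This is a minimal sketch under stated assumptions. struct model_cpuset and model_clone_from_parent() are hypothetical names, plain bitmasks stand in for the kernel's cpumask and nodemask types, the sibling list is passed as an array instead of being walked with list_for_each_entry(), and callback_mutex locking is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct cpuset. */
struct model_cpuset {
	unsigned long cpus_allowed;	/* bitmask of allowed CPUs */
	unsigned long mems_allowed;	/* bitmask of allowed memory nodes */
	bool cpu_exclusive;
	bool mem_exclusive;
};

/*
 * Model of the clone step: copy the parent's masks into the child only
 * when the clone_children flag is set and none of the parent's existing
 * children is cpu- or mem-exclusive.  Returns true if the configuration
 * was copied, false if the child was left untouched.
 */
static bool model_clone_from_parent(struct model_cpuset *child,
				    const struct model_cpuset *parent,
				    const struct model_cpuset *siblings,
				    size_t nr_siblings,
				    bool clone_children)
{
	size_t i;

	/* Flag not set: nothing to inherit. */
	if (!clone_children)
		return false;

	/* Any exclusive sibling vetoes the clone. */
	for (i = 0; i < nr_siblings; i++)
		if (siblings[i].cpu_exclusive || siblings[i].mem_exclusive)
			return false;

	child->cpus_allowed = parent->cpus_allowed;
	child->mems_allowed = parent->mems_allowed;
	return true;
}
```

In this model, as in the patch, a skipped clone is not an error: the new cpuset simply keeps its initial (empty) masks, which is what the skip_clone label does in the diff.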