From 144439376bfcd1d178e37bc08e27a58f82719bdb Mon Sep 17 00:00:00 2001 From: Jeff Mahoney Date: Wed, 19 Jul 2017 23:25:51 -0400 Subject: btrfs: fix lockup in find_free_extent with read-only block groups If we have a block group that is all of the following: 1) uncached in memory 2) is read-only 3) has a disk cache state that indicates we need to recreate the cache AND the file system has enough free space fragmentation such that the request for an extent of a given size can't be honored; AND have a single CPU core; AND it's the block group with the highest starting offset such that there are no opportunities (like reading from disk) for the loop to yield the CPU; We can end up with a lockup. The root cause is simple. Once we're in the position that we've read in all of the other block groups directly and none of those block groups can honor the request, there are no more opportunities to sleep. We end up trying to start a caching thread which never gets run if we only have one core. This *should* present as a hung task waiting on the caching thread to make some progress, but it doesn't. Instead, it degrades into a busy loop because of the placement of the read-only check. During the first pass through the loop, block_group->cached will be set to BTRFS_CACHE_STARTED and have_caching_bg will be set. Then we hit the read-only check and short circuit the loop. We're not yet in LOOP_CACHING_WAIT, so we skip that loop back before going through the loop again for other raid groups. Then we move to LOOP_CACHING_WAIT state. During the this pass through the loop, ->cached will still be BTRFS_CACHE_STARTED, which means it's not cached, so we'll enter cache_block_group, do a lot of nothing, and return, and also set have_caching_bg again. Then we hit the read-only check and short circuit the loop. The same thing happens as before except now we DO trigger the LOOP_CACHING_WAIT && have_caching_bg check and loop back up to the top. We do this forever. There are two fixes in this patch since they address the same underlying bug. The first is to add a cond_resched to the end of the loop to ensure that the caching thread always has an opportunity to run. This will fix the soft lockup issue, but find_free_extent will still loop doing nothing until the thread has completed. The second is to move the read-only check to the top of the loop. We're never going to return an allocation within a read-only block group so we may as well skip it early. The check for ->cached == BTRFS_CACHE_ERROR would cause the same problem except that BTRFS_CACHE_ERROR is considered a "done" state and we won't re-set have_caching_bg again. Many thanks to Stephan Kulow for his excellent help in the testing process. Signed-off-by: Jeff Mahoney Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 375f8c728d91..a6635f07b8f1 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -7589,6 +7589,10 @@ search: u64 offset; int cached; + /* If the block group is read-only, we can skip it entirely. */ + if (unlikely(block_group->ro)) + continue; + btrfs_grab_block_group(block_group, delalloc); search_start = block_group->key.objectid; @@ -7624,8 +7628,6 @@ have_block_group: if (unlikely(block_group->cached == BTRFS_CACHE_ERROR)) goto loop; - if (unlikely(block_group->ro)) - goto loop; /* * Ok we want to try and use the cluster allocator, so @@ -7839,6 +7841,7 @@ loop: failed_alloc = false; BUG_ON(index != get_block_group_index(block_group)); btrfs_release_block_group(block_group, delalloc); + cond_resched(); } up_read(&space_info->groups_sem); -- cgit v1.2.3 From 17024ad0a0fdfcfe53043afb969b813d3e020c21 Mon Sep 17 00:00:00 2001 From: Omar Sandoval Date: Thu, 20 Jul 2017 15:10:35 -0700 Subject: Btrfs: fix early ENOSPC due to delalloc If a lot of metadata is reserved for outstanding delayed allocations, we rely on shrink_delalloc() to reclaim metadata space in order to fulfill reservation tickets. However, shrink_delalloc() has a shortcut where if it determines that space can be overcommitted, it will stop early. This made sense before the ticketed enospc system, but now it means that shrink_delalloc() will often not reclaim enough space to fulfill any tickets, leading to an early ENOSPC. (Reservation tickets don't care about being able to overcommit, they need every byte accounted for.) Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims all of the metadata it is supposed to. This fixes early ENOSPCs we were seeing when doing a btrfs receive to populate a new filesystem, as well as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs. Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure") Tested-by: Christoph Anton Mitterer Cc: stable@vger.kernel.org Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 4 ---- 1 file changed, 4 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index a6635f07b8f1..e3b0b4196d3d 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4825,10 +4825,6 @@ skip_async: else flush = BTRFS_RESERVE_NO_FLUSH; spin_lock(&space_info->lock); - if (can_overcommit(fs_info, space_info, orig, flush, false)) { - spin_unlock(&space_info->lock); - break; - } if (list_empty(&space_info->tickets) && list_empty(&space_info->priority_tickets)) { spin_unlock(&space_info->lock); -- cgit v1.2.3 From 23d1f73788785a770fe6eb348fee4b26281d2064 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 28 Jun 2017 11:05:22 +0300 Subject: btrfs: remove unused sectorsize member The sectorsize member of btrfs_block_group_cache is unused. So remove it, this reduces the number of holes in the struct. With patch: /* size: 856, cachelines: 14, members: 40 */ /* sum members: 837, holes: 4, sum holes: 19 */ /* bit holes: 1, sum bit holes: 29 bits */ /* last cacheline: 24 bytes */ Without patch: /* size: 864, cachelines: 14, members: 41 */ /* sum members: 841, holes: 5, sum holes: 23 */ /* bit holes: 1, sum bit holes: 29 bits */ /* last cacheline: 32 bytes */ Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/ctree.h | 1 - fs/btrfs/extent-tree.c | 1 - fs/btrfs/tests/btrfs-tests.c | 1 - fs/btrfs/tests/free-space-tree-tests.c | 2 +- 4 files changed, 1 insertion(+), 4 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 3f3eb7b17cac..589491040950 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -558,7 +558,6 @@ struct btrfs_block_group_cache { u64 bytes_super; u64 flags; u64 cache_generation; - u32 sectorsize; /* * If the free space extent count exceeds this number, convert the block diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index e3b0b4196d3d..6d04563585e6 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -9952,7 +9952,6 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info, cache->key.offset = size; cache->key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; - cache->sectorsize = fs_info->sectorsize; cache->fs_info = fs_info; cache->full_stripe_len = btrfs_full_stripe_len(fs_info, &fs_info->mapping_tree, diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c index b18ab8f327a5..d3f25376a0f8 100644 --- a/fs/btrfs/tests/btrfs-tests.c +++ b/fs/btrfs/tests/btrfs-tests.c @@ -211,7 +211,6 @@ btrfs_alloc_dummy_block_group(struct btrfs_fs_info *fs_info, cache->key.objectid = 0; cache->key.offset = length; cache->key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; - cache->sectorsize = fs_info->sectorsize; cache->full_stripe_len = fs_info->sectorsize; cache->fs_info = fs_info; diff --git a/fs/btrfs/tests/free-space-tree-tests.c b/fs/btrfs/tests/free-space-tree-tests.c index b29954c01673..1458bb0ea124 100644 --- a/fs/btrfs/tests/free-space-tree-tests.c +++ b/fs/btrfs/tests/free-space-tree-tests.c @@ -81,7 +81,7 @@ static int __check_free_space_extents(struct btrfs_trans_handle *trans, i++; } prev_bit = bit; - offset += cache->sectorsize; + offset += fs_info->sectorsize; } } if (prev_bit == 1) { -- cgit v1.2.3 From 7bdd6277e0dc2beb4f5db5ea4ff7670ecf0b5879 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Tue, 11 Jul 2017 13:25:13 +0300 Subject: btrfs: Remove redundant argument of flush_space All callers of flush_space pass the same number for orig/num_bytes arguments. Let's remove one of the numbers and also modify the trace point to show only a single number - bytes requested. Seems that last point where the two parameters were treated differently is before the ticketed enospc rework. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 16 +++++++--------- include/trace/events/btrfs.h | 13 +++++-------- 2 files changed, 12 insertions(+), 17 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 6d04563585e6..288b38ae8791 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4906,7 +4906,7 @@ struct reserve_ticket { static int flush_space(struct btrfs_fs_info *fs_info, struct btrfs_space_info *space_info, u64 num_bytes, - u64 orig_bytes, int state) + int state) { struct btrfs_root *root = fs_info->extent_root; struct btrfs_trans_handle *trans; @@ -4931,7 +4931,7 @@ static int flush_space(struct btrfs_fs_info *fs_info, break; case FLUSH_DELALLOC: case FLUSH_DELALLOC_WAIT: - shrink_delalloc(fs_info, num_bytes * 2, orig_bytes, + shrink_delalloc(fs_info, num_bytes * 2, num_bytes, state == FLUSH_DELALLOC_WAIT); break; case ALLOC_CHUNK: @@ -4949,15 +4949,15 @@ static int flush_space(struct btrfs_fs_info *fs_info, break; case COMMIT_TRANS: ret = may_commit_transaction(fs_info, space_info, - orig_bytes, 0); + num_bytes, 0); break; default: ret = -ENOSPC; break; } - trace_btrfs_flush_space(fs_info, space_info->flags, num_bytes, - orig_bytes, state, ret); + trace_btrfs_flush_space(fs_info, space_info->flags, num_bytes, state, + ret); return ret; } @@ -5063,8 +5063,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work) struct reserve_ticket *ticket; int ret; - ret = flush_space(fs_info, space_info, to_reclaim, to_reclaim, - flush_state); + ret = flush_space(fs_info, space_info, to_reclaim, flush_state); spin_lock(&space_info->lock); if (list_empty(&space_info->tickets)) { space_info->flush = 0; @@ -5120,8 +5119,7 @@ static void priority_reclaim_metadata_space(struct btrfs_fs_info *fs_info, spin_unlock(&space_info->lock); do { - flush_space(fs_info, space_info, to_reclaim, to_reclaim, - flush_state); + flush_space(fs_info, space_info, to_reclaim, flush_state); flush_state++; spin_lock(&space_info->lock); if (ticket->bytes == 0) { diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index 90d25085762f..1e4908dcd065 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -1011,15 +1011,14 @@ TRACE_EVENT(btrfs_trigger_flush, TRACE_EVENT(btrfs_flush_space, TP_PROTO(const struct btrfs_fs_info *fs_info, u64 flags, u64 num_bytes, - u64 orig_bytes, int state, int ret), + int state, int ret), - TP_ARGS(fs_info, flags, num_bytes, orig_bytes, state, ret), + TP_ARGS(fs_info, flags, num_bytes, state, ret), TP_STRUCT__entry( __array( u8, fsid, BTRFS_UUID_SIZE ) __field( u64, flags ) __field( u64, num_bytes ) - __field( u64, orig_bytes ) __field( int, state ) __field( int, ret ) ), @@ -1028,19 +1027,17 @@ TRACE_EVENT(btrfs_flush_space, memcpy(__entry->fsid, fs_info->fsid, BTRFS_UUID_SIZE); __entry->flags = flags; __entry->num_bytes = num_bytes; - __entry->orig_bytes = orig_bytes; __entry->state = state; __entry->ret = ret; ), - TP_printk("%pU: state=%d(%s) flags=%llu(%s) num_bytes=%llu " - "orig_bytes=%llu ret=%d", __entry->fsid, __entry->state, + TP_printk("%pU: state=%d(%s) flags=%llu(%s) num_bytes=%llu ret=%d", + __entry->fsid, __entry->state, show_flush_state(__entry->state), (unsigned long long)__entry->flags, __print_flags((unsigned long)__entry->flags, "|", BTRFS_GROUP_FLAGS), - (unsigned long long)__entry->num_bytes, - (unsigned long long)__entry->orig_bytes, __entry->ret) + (unsigned long long)__entry->num_bytes, __entry->ret) ); DECLARE_EVENT_CLASS(btrfs__reserved_extent, -- cgit v1.2.3 From 1174cade8182b4136c8a162342bf7e8eba7200de Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Tue, 11 Jul 2017 13:47:50 +0300 Subject: btrfs: Remove redundant checks from btrfs_alloc_data_chunk_ondemand Many commits ago the data space_info in alloc_data_chunk_ondemand used to be acquired from the inode. At that point commit 33b4d47f5e24 ("Btrfs: deal with NULL space info") got introduced to deal with spurios cases where the space info could be null, following a rebalance. Nowadays, however, the space info is referenced directly from the btrfs_fs_info struct which is initialised at filesystem mount time. This makes the null checks redundant, so remove them. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 288b38ae8791..4fe93c436302 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4199,9 +4199,9 @@ static u64 btrfs_space_info_used(struct btrfs_space_info *s_info, int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes) { - struct btrfs_space_info *data_sinfo; struct btrfs_root *root = inode->root; struct btrfs_fs_info *fs_info = root->fs_info; + struct btrfs_space_info *data_sinfo = fs_info->data_sinfo; u64 used; int ret = 0; int need_commit = 2; @@ -4215,10 +4215,6 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes) ASSERT(current->journal_info); } - data_sinfo = fs_info->data_sinfo; - if (!data_sinfo) - goto alloc; - again: /* make sure we have enough space to handle the data first */ spin_lock(&data_sinfo->lock); @@ -4236,7 +4232,7 @@ again: data_sinfo->force_alloc = CHUNK_ALLOC_FORCE; spin_unlock(&data_sinfo->lock); -alloc: + alloc_target = btrfs_data_alloc_profile(fs_info); /* * It is ugly that we don't call nolock join @@ -4264,9 +4260,6 @@ alloc: } } - if (!data_sinfo) - data_sinfo = fs_info->data_sinfo; - goto again; } -- cgit v1.2.3 From 913e153572218c911125414d4ca1f8531f20c120 Mon Sep 17 00:00:00 2001 From: David Sterba Date: Thu, 13 Jul 2017 15:32:18 +0200 Subject: btrfs: drop newlines from strings when using btrfs_* helpers The helpers append "\n" so we can keep the actual strings shorter. The extra newline will print an empty line. Some messages have been slightly modified to be more consistent with the rest (lowercase first letter). Reviewed-by: Nikolay Borisov Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 2 +- fs/btrfs/free-space-cache.c | 2 +- fs/btrfs/inode.c | 2 +- fs/btrfs/qgroup.c | 2 +- fs/btrfs/scrub.c | 3 +-- 5 files changed, 5 insertions(+), 6 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 4fe93c436302..fd2fee398c83 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6832,7 +6832,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans, if (ret) { const char *errstr = btrfs_decode_error(ret); btrfs_warn(fs_info, - "Discard failed while removing blockgroup: errno=%d %s\n", + "discard failed while removing blockgroup: errno=%d %s", ret, errstr); } } diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index c5e6180cdb8c..cdc9f4015ec3 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -709,7 +709,7 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode, if (!BTRFS_I(inode)->generation) { btrfs_info(fs_info, - "The free space cache file (%llu) is invalid. skip it\n", + "the free space cache file (%llu) is invalid, skip it", offset); return 0; } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 97970602c3d5..3bf7bae36e56 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8017,7 +8017,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio, bio_set_op_attrs(bio, REQ_OP_READ, read_mode); btrfs_debug(BTRFS_I(inode)->root->fs_info, - "Repair DIO Read Error: submitting new dio read[%#x] to this_mirror=%d, in_validation=%d\n", + "repair DIO read error: submitting new dio read[%#x] to this_mirror=%d, in_validation=%d", read_mode, failrec->this_mirror, failrec->in_validation); ret = submit_dio_repair_bio(inode, bio, failrec->this_mirror); diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index acb48983be26..ddc37c537058 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2646,7 +2646,7 @@ out: if (IS_ERR(trans)) { err = PTR_ERR(trans); btrfs_err(fs_info, - "fail to start transaction for status update: %d\n", + "fail to start transaction for status update: %d", err); goto done; } diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 6f1e4c984b94..de53c521a50f 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -3869,8 +3869,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, ro_set = 0; } else { btrfs_warn(fs_info, - "failed setting block group ro, ret=%d\n", - ret); + "failed setting block group ro: %d", ret); btrfs_put_block_group(cache); break; } -- cgit v1.2.3 From e4ff5fb5dc3742b126a2f4d1f18706509812d084 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 19 Jul 2017 10:48:42 +0300 Subject: btrfs: Remove unused parameters from volume.c functions This also adjusts the respective callers in other files. Those were found with -Wunused-parameter. btrfs_full_stripe_len's mapping_tree - introduced by 53b381b3abeb ("Btrfs: RAID5 and RAID6") but it was never really used even in that commit btrfs_is_parity_mirror's mirror_num - same as above chunk_drange_filter's chunk_offset - introduced by 94e60d5a5c4b ("Btrfs: devid subset filter") and never used. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 4 +--- fs/btrfs/extent_io.c | 2 +- fs/btrfs/volumes.c | 7 ++----- fs/btrfs/volumes.h | 3 +-- 4 files changed, 5 insertions(+), 11 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index fd2fee398c83..08e620c43842 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -9944,9 +9944,7 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info, cache->key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; cache->fs_info = fs_info; - cache->full_stripe_len = btrfs_full_stripe_len(fs_info, - &fs_info->mapping_tree, - start); + cache->full_stripe_len = btrfs_full_stripe_len(fs_info, start); set_free_space_tree_thresholds(cache); atomic_set(&cache->count, 1); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 7dd1b2dc7c68..a7bebba4f9fc 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1997,7 +1997,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start, * read repair operation. */ btrfs_bio_counter_inc_blocked(fs_info); - if (btrfs_is_parity_mirror(fs_info, logical, length, mirror_num)) { + if (btrfs_is_parity_mirror(fs_info, logical, length)) { /* * Note that we don't use BTRFS_MAP_WRITE because it's supposed * to update all raid stripes, but here we just want to correct diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 325ea062dc6b..877224b66a12 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3305,7 +3305,6 @@ static int chunk_devid_filter(struct extent_buffer *leaf, /* [pstart, pend) */ static int chunk_drange_filter(struct extent_buffer *leaf, struct btrfs_chunk *chunk, - u64 chunk_offset, struct btrfs_balance_args *bargs) { struct btrfs_stripe *stripe; @@ -3432,7 +3431,7 @@ static int should_balance_chunk(struct btrfs_fs_info *fs_info, /* drange filter, makes sense only with devid filter */ if ((bargs->flags & BTRFS_BALANCE_ARGS_DRANGE) && - chunk_drange_filter(leaf, chunk, chunk_offset, bargs)) { + chunk_drange_filter(leaf, chunk, bargs)) { return 0; } @@ -5128,7 +5127,6 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len) } unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info, - struct btrfs_mapping_tree *map_tree, u64 logical) { struct extent_map *em; @@ -5146,8 +5144,7 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info, return len; } -int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, - u64 logical, u64 len, int mirror_num) +int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len) { struct extent_map *em; struct map_lookup *map; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index e906377ed329..181b365cab0c 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -481,9 +481,8 @@ void btrfs_init_dev_replace_tgtdev_for_resume(struct btrfs_fs_info *fs_info, struct btrfs_device *tgtdev); void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_path); int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, - u64 logical, u64 len, int mirror_num); + u64 logical, u64 len); unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info, - struct btrfs_mapping_tree *map_tree, u64 logical); int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, -- cgit v1.2.3 From a4f78750ef1882e59bb4f947e216cf61ef2d67d2 Mon Sep 17 00:00:00 2001 From: David Sterba Date: Thu, 29 Jun 2017 18:37:49 +0200 Subject: btrfs: get fs_info from eb in btrfs_print_leaf, remove argument Signed-off-by: David Sterba --- fs/btrfs/ctree.c | 14 +++++++------- fs/btrfs/disk-io.c | 2 +- fs/btrfs/extent-tree.c | 6 +++--- fs/btrfs/print-tree.c | 6 ++++-- fs/btrfs/print-tree.h | 2 +- fs/btrfs/root-tree.c | 2 +- 6 files changed, 17 insertions(+), 15 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 3f4daa9d6e2c..6d49db7d86be 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -4650,7 +4650,7 @@ void btrfs_truncate_item(struct btrfs_fs_info *fs_info, btrfs_mark_buffer_dirty(leaf); if (btrfs_leaf_free_space(fs_info, leaf) < 0) { - btrfs_print_leaf(fs_info, leaf); + btrfs_print_leaf(leaf); BUG(); } } @@ -4679,7 +4679,7 @@ void btrfs_extend_item(struct btrfs_fs_info *fs_info, struct btrfs_path *path, data_end = leaf_data_end(fs_info, leaf); if (btrfs_leaf_free_space(fs_info, leaf) < data_size) { - btrfs_print_leaf(fs_info, leaf); + btrfs_print_leaf(leaf); BUG(); } slot = path->slots[0]; @@ -4687,7 +4687,7 @@ void btrfs_extend_item(struct btrfs_fs_info *fs_info, struct btrfs_path *path, BUG_ON(slot < 0); if (slot >= nritems) { - btrfs_print_leaf(fs_info, leaf); + btrfs_print_leaf(leaf); btrfs_crit(fs_info, "slot %d too large, nritems %d", slot, nritems); BUG_ON(1); @@ -4718,7 +4718,7 @@ void btrfs_extend_item(struct btrfs_fs_info *fs_info, struct btrfs_path *path, btrfs_mark_buffer_dirty(leaf); if (btrfs_leaf_free_space(fs_info, leaf) < 0) { - btrfs_print_leaf(fs_info, leaf); + btrfs_print_leaf(leaf); BUG(); } } @@ -4757,7 +4757,7 @@ void setup_items_for_insert(struct btrfs_root *root, struct btrfs_path *path, data_end = leaf_data_end(fs_info, leaf); if (btrfs_leaf_free_space(fs_info, leaf) < total_size) { - btrfs_print_leaf(fs_info, leaf); + btrfs_print_leaf(leaf); btrfs_crit(fs_info, "not enough freespace need %u have %d", total_size, btrfs_leaf_free_space(fs_info, leaf)); BUG(); @@ -4767,7 +4767,7 @@ void setup_items_for_insert(struct btrfs_root *root, struct btrfs_path *path, unsigned int old_data = btrfs_item_end_nr(leaf, slot); if (old_data < data_end) { - btrfs_print_leaf(fs_info, leaf); + btrfs_print_leaf(leaf); btrfs_crit(fs_info, "slot %d old_data %d data_end %d", slot, old_data, data_end); BUG_ON(1); @@ -4811,7 +4811,7 @@ void setup_items_for_insert(struct btrfs_root *root, struct btrfs_path *path, btrfs_mark_buffer_dirty(leaf); if (btrfs_leaf_free_space(fs_info, leaf) < 0) { - btrfs_print_leaf(fs_info, leaf); + btrfs_print_leaf(leaf); BUG(); } } diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 2f366044d891..a42ae7676759 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3984,7 +3984,7 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf) fs_info->dirty_metadata_batch); #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY if (btrfs_header_level(buf) == 0 && check_leaf(root, buf)) { - btrfs_print_leaf(fs_info, buf); + btrfs_print_leaf(buf); ASSERT(0); } #endif diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 08e620c43842..9f0563dbbd5f 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6960,7 +6960,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, "umm, got %d back from search, was looking for %llu", ret, bytenr); if (ret > 0) - btrfs_print_leaf(info, path->nodes[0]); + btrfs_print_leaf(path->nodes[0]); } if (ret < 0) { btrfs_abort_transaction(trans, ret); @@ -6969,7 +6969,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, extent_slot = path->slots[0]; } } else if (WARN_ON(ret == -ENOENT)) { - btrfs_print_leaf(info, path->nodes[0]); + btrfs_print_leaf(path->nodes[0]); btrfs_err(info, "unable to find ref byte nr %llu parent %llu root %llu owner %llu offset %llu", bytenr, parent, root_objectid, owner_objectid, @@ -7006,7 +7006,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, btrfs_err(info, "umm, got %d back from search, was looking for %llu", ret, bytenr); - btrfs_print_leaf(info, path->nodes[0]); + btrfs_print_leaf(path->nodes[0]); } if (ret < 0) { btrfs_abort_transaction(trans, ret); diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c index fcae61e175f3..9c3911f4a100 100644 --- a/fs/btrfs/print-tree.c +++ b/fs/btrfs/print-tree.c @@ -161,8 +161,9 @@ static void print_uuid_item(struct extent_buffer *l, unsigned long offset, } } -void btrfs_print_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *l) +void btrfs_print_leaf(struct extent_buffer *l) { + struct btrfs_fs_info *fs_info; int i; u32 type, nr; struct btrfs_item *item; @@ -180,6 +181,7 @@ void btrfs_print_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *l) if (!l) return; + fs_info = l->fs_info; nr = btrfs_header_nritems(l); btrfs_info(fs_info, "leaf %llu total ptrs %d free space %d", @@ -329,7 +331,7 @@ void btrfs_print_tree(struct btrfs_fs_info *fs_info, struct extent_buffer *c) nr = btrfs_header_nritems(c); level = btrfs_header_level(c); if (level == 0) { - btrfs_print_leaf(fs_info, c); + btrfs_print_leaf(c); return; } btrfs_info(fs_info, diff --git a/fs/btrfs/print-tree.h b/fs/btrfs/print-tree.h index 4f2e0ea0e95a..689a52ee0cd1 100644 --- a/fs/btrfs/print-tree.h +++ b/fs/btrfs/print-tree.h @@ -18,6 +18,6 @@ #ifndef __PRINT_TREE_ #define __PRINT_TREE_ -void btrfs_print_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *l); +void btrfs_print_leaf(struct extent_buffer *l); void btrfs_print_tree(struct btrfs_fs_info *fs_info, struct extent_buffer *c); #endif diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c index 460db0cb2d07..5b488af6f25e 100644 --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -151,7 +151,7 @@ int btrfs_update_root(struct btrfs_trans_handle *trans, struct btrfs_root } if (ret != 0) { - btrfs_print_leaf(fs_info, path->nodes[0]); + btrfs_print_leaf(path->nodes[0]); btrfs_crit(fs_info, "unable to update root key %llu %u %llu", key->objectid, key->type, key->offset); BUG_ON(1); -- cgit v1.2.3 From e38ae7a0868b02e7b183f18a4b75c8b2b68ce258 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Tue, 25 Jul 2017 17:48:28 +0300 Subject: btrfs: Make flush_space return void The return value of flush_space was used to have significance in the early days when the code was first introduced and before the ticketed enospc rework. Since the latter got introduced the return value lost any significance whatsoever to its callers. So let's remove it. While at it also remove the unused ticket variable in btrfs_async_reclaim_metadata_space. It was used in the initial version of the ticketed ENOSPC work, however Wang Xiaoguang detected a problem with this and fixed it in ce129655c9d9 ("btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress"). Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba [ add comment ] Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 9f0563dbbd5f..42251c2eb7d2 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4897,7 +4897,12 @@ struct reserve_ticket { wait_queue_head_t wait; }; -static int flush_space(struct btrfs_fs_info *fs_info, +/* + * Try to flush some data based on policy set by @state. This is only advisory + * and may fail for various reasons. The caller is supposed to examine the + * state of @space_info to detect the outcome. + */ +static void flush_space(struct btrfs_fs_info *fs_info, struct btrfs_space_info *space_info, u64 num_bytes, int state) { @@ -4951,7 +4956,7 @@ static int flush_space(struct btrfs_fs_info *fs_info, trace_btrfs_flush_space(fs_info, space_info->flags, num_bytes, state, ret); - return ret; + return; } static inline u64 @@ -5053,10 +5058,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work) flush_state = FLUSH_DELAYED_ITEMS_NR; do { - struct reserve_ticket *ticket; - int ret; - - ret = flush_space(fs_info, space_info, to_reclaim, flush_state); + flush_space(fs_info, space_info, to_reclaim, flush_state); spin_lock(&space_info->lock); if (list_empty(&space_info->tickets)) { space_info->flush = 0; @@ -5066,8 +5068,6 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work) to_reclaim = btrfs_calc_reclaim_metadata_size(fs_info, space_info, false); - ticket = list_first_entry(&space_info->tickets, - struct reserve_ticket, list); if (last_tickets_id == space_info->tickets_id) { flush_state++; } else { -- cgit v1.2.3 From ea14b57fd1954fa3193e025224bbbeab7415c490 Mon Sep 17 00:00:00 2001 From: David Sterba Date: Thu, 22 Jun 2017 02:19:11 +0200 Subject: btrfs: fix spelling of snapshotting Signed-off-by: David Sterba --- fs/btrfs/ctree.h | 6 +++--- fs/btrfs/disk-io.c | 2 +- fs/btrfs/extent-tree.c | 22 +++++++++++----------- fs/btrfs/file.c | 10 +++++----- fs/btrfs/inode.c | 22 +++++++++++----------- fs/btrfs/ioctl.c | 10 +++++----- 6 files changed, 36 insertions(+), 36 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 6510f246f71e..874d814da371 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1258,7 +1258,7 @@ struct btrfs_root { */ int send_in_progress; struct btrfs_subvolume_writers *subv_writers; - atomic_t will_be_snapshoted; + atomic_t will_be_snapshotted; /* For qgroup metadata space reserve */ atomic64_t qgroup_meta_rsv; @@ -2773,8 +2773,8 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info); int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); int __get_raid_index(u64 flags); -int btrfs_start_write_no_snapshoting(struct btrfs_root *root); -void btrfs_end_write_no_snapshoting(struct btrfs_root *root); +int btrfs_start_write_no_snapshotting(struct btrfs_root *root); +void btrfs_end_write_no_snapshotting(struct btrfs_root *root); void btrfs_wait_for_snapshot_creation(struct btrfs_root *root); void check_system_chunk(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, const u64 type); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9b1f4ef54438..57a857142cda 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1343,7 +1343,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info, atomic_set(&root->log_batch, 0); atomic_set(&root->orphan_inodes, 0); refcount_set(&root->refs, 1); - atomic_set(&root->will_be_snapshoted, 0); + atomic_set(&root->will_be_snapshotted, 0); atomic64_set(&root->qgroup_meta_rsv, 0); root->log_transid = 0; root->log_transid_committed = -1; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 42251c2eb7d2..7d7abc0d47b9 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -10989,14 +10989,14 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range) } /* - * btrfs_{start,end}_write_no_snapshoting() are similar to + * btrfs_{start,end}_write_no_snapshotting() are similar to * mnt_{want,drop}_write(), they are used to prevent some tasks from writing * data into the page cache through nocow before the subvolume is snapshoted, * but flush the data into disk after the snapshot creation, or to prevent - * operations while snapshoting is ongoing and that cause the snapshot to be + * operations while snapshotting is ongoing and that cause the snapshot to be * inconsistent (writes followed by expanding truncates for example). */ -void btrfs_end_write_no_snapshoting(struct btrfs_root *root) +void btrfs_end_write_no_snapshotting(struct btrfs_root *root) { percpu_counter_dec(&root->subv_writers->counter); /* @@ -11007,9 +11007,9 @@ void btrfs_end_write_no_snapshoting(struct btrfs_root *root) wake_up(&root->subv_writers->wait); } -int btrfs_start_write_no_snapshoting(struct btrfs_root *root) +int btrfs_start_write_no_snapshotting(struct btrfs_root *root) { - if (atomic_read(&root->will_be_snapshoted)) + if (atomic_read(&root->will_be_snapshotted)) return 0; percpu_counter_inc(&root->subv_writers->counter); @@ -11017,14 +11017,14 @@ int btrfs_start_write_no_snapshoting(struct btrfs_root *root) * Make sure counter is updated before we check for snapshot creation. */ smp_mb(); - if (atomic_read(&root->will_be_snapshoted)) { - btrfs_end_write_no_snapshoting(root); + if (atomic_read(&root->will_be_snapshotted)) { + btrfs_end_write_no_snapshotting(root); return 0; } return 1; } -static int wait_snapshoting_atomic_t(atomic_t *a) +static int wait_snapshotting_atomic_t(atomic_t *a) { schedule(); return 0; @@ -11035,11 +11035,11 @@ void btrfs_wait_for_snapshot_creation(struct btrfs_root *root) while (true) { int ret; - ret = btrfs_start_write_no_snapshoting(root); + ret = btrfs_start_write_no_snapshotting(root); if (ret) break; - wait_on_atomic_t(&root->will_be_snapshoted, - wait_snapshoting_atomic_t, + wait_on_atomic_t(&root->will_be_snapshotted, + wait_snapshotting_atomic_t, TASK_UNINTERRUPTIBLE); } } diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 9e75d8a39aac..58818cf7f82d 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1536,7 +1536,7 @@ static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos, u64 num_bytes; int ret; - ret = btrfs_start_write_no_snapshoting(root); + ret = btrfs_start_write_no_snapshotting(root); if (!ret) return -ENOSPC; @@ -1561,7 +1561,7 @@ static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos, NULL, NULL, NULL); if (ret <= 0) { ret = 0; - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); } else { *write_bytes = min_t(size_t, *write_bytes , num_bytes - pos + lockstart); @@ -1664,7 +1664,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file, data_reserved, pos, write_bytes); else - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); break; } @@ -1767,7 +1767,7 @@ again: release_bytes = 0; if (only_release_metadata) - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); if (only_release_metadata && copied > 0) { lockstart = round_down(pos, @@ -1797,7 +1797,7 @@ again: if (release_bytes) { if (only_release_metadata) { - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); btrfs_delalloc_release_metadata(BTRFS_I(inode), release_bytes); } else { diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 9349a13b3d72..9ad9dda871ca 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1381,7 +1381,7 @@ next_slot: * we fall into common COW way. */ if (!nolock) { - err = btrfs_start_write_no_snapshoting(root); + err = btrfs_start_write_no_snapshotting(root); if (!err) goto out_check; } @@ -1393,12 +1393,12 @@ next_slot: if (csum_exist_in_range(fs_info, disk_bytenr, num_bytes)) { if (!nolock) - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); goto out_check; } if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr)) { if (!nolock) - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); goto out_check; } nocow = 1; @@ -1415,7 +1415,7 @@ out_check: if (extent_end <= start) { path->slots[0]++; if (!nolock && nocow) - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); if (nocow) btrfs_dec_nocow_writers(fs_info, disk_bytenr); goto next_slot; @@ -1438,7 +1438,7 @@ out_check: NULL); if (ret) { if (!nolock && nocow) - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); if (nocow) btrfs_dec_nocow_writers(fs_info, disk_bytenr); @@ -1459,7 +1459,7 @@ out_check: BTRFS_ORDERED_PREALLOC); if (IS_ERR(em)) { if (!nolock && nocow) - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); if (nocow) btrfs_dec_nocow_writers(fs_info, disk_bytenr); @@ -1499,7 +1499,7 @@ out_check: PAGE_UNLOCK | PAGE_SET_PRIVATE2); if (!nolock && nocow) - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); cur_offset = extent_end; /* @@ -5053,7 +5053,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr) if (newsize > oldsize) { /* - * Don't do an expanding truncate while snapshoting is ongoing. + * Don't do an expanding truncate while snapshotting is ongoing. * This is to ensure the snapshot captures a fully consistent * state of this file - if the snapshot captures this expanding * truncation, it must capture all writes that happened before @@ -5062,13 +5062,13 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr) btrfs_wait_for_snapshot_creation(root); ret = btrfs_cont_expand(inode, oldsize, newsize); if (ret) { - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); return ret; } trans = btrfs_start_transaction(root, 1); if (IS_ERR(trans)) { - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); return PTR_ERR(trans); } @@ -5076,7 +5076,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr) btrfs_ordered_update_i_size(inode, i_size_read(inode), NULL); pagecache_isize_extended(inode, oldsize, newsize); ret = btrfs_update_inode(trans, root, inode); - btrfs_end_write_no_snapshoting(root); + btrfs_end_write_no_snapshotting(root); btrfs_end_transaction(trans); } else { diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 4cfc3d4c0a37..7d144a676d95 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -607,7 +607,7 @@ fail_free: return ret; } -static void btrfs_wait_for_no_snapshoting_writes(struct btrfs_root *root) +static void btrfs_wait_for_no_snapshotting_writes(struct btrfs_root *root) { s64 writers; DEFINE_WAIT(wait); @@ -650,9 +650,9 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir, goto free_pending; } - atomic_inc(&root->will_be_snapshoted); + atomic_inc(&root->will_be_snapshotted); smp_mb__after_atomic(); - btrfs_wait_for_no_snapshoting_writes(root); + btrfs_wait_for_no_snapshotting_writes(root); ret = btrfs_start_delalloc_inodes(root, 0); if (ret) @@ -723,8 +723,8 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir, fail: btrfs_subvolume_release_metadata(fs_info, &pending_snapshot->block_rsv); dec_and_free: - if (atomic_dec_and_test(&root->will_be_snapshoted)) - wake_up_atomic_t(&root->will_be_snapshoted); + if (atomic_dec_and_test(&root->will_be_snapshotted)) + wake_up_atomic_t(&root->will_be_snapshotted); free_pending: kfree(pending_snapshot->root_item); btrfs_free_path(pending_snapshot->path); -- cgit v1.2.3 From f44d2287d2879f50b921e909b2377d6dcba3e251 Mon Sep 17 00:00:00 2001 From: Jeff Mahoney Date: Thu, 22 Jun 2017 09:51:47 -0400 Subject: btrfs: account for pinned bytes in should_alloc_chunk In a heavy write scenario, we can end up with a large number of pinned bytes. This can translate into (very) premature ENOSPC because pinned bytes must be accounted for when allowing a reservation but aren't accounted for when deciding whether to create a new chunk. This patch adds the accounting to should_alloc_chunk so that we can create the chunk. Signed-off-by: Jeff Mahoney Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 7d7abc0d47b9..b929fe201981 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4419,7 +4419,7 @@ static int should_alloc_chunk(struct btrfs_fs_info *fs_info, { struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv; u64 num_bytes = sinfo->total_bytes - sinfo->bytes_readonly; - u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved; + u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved + sinfo->bytes_pinned; u64 thresh; if (force == CHUNK_ALLOC_FORCE) -- cgit v1.2.3 From 8d8aafeea23e2d641460d7e6231361f0322ac058 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Thu, 22 Jun 2017 09:51:48 -0400 Subject: btrfs: Simplify math in should_alloc chunk Currently should_alloc_chunk uses ->total_bytes - ->bytes_readonly to signify the total amount of bytes in this space info. However, given Jeff's patch which adds bytes_pinned and bytes_may_use to the calculation of num_allocated it becomes a lot more clear to just eliminate num_bytes altogether and add the bytes_readonly to the amount of used space. That way we don't change the results of the following statements. In the process also start using btrfs_space_info_used. Signed-off-by: Nikolay Borisov Signed-off-by: Jeff Mahoney Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index b929fe201981..c74d24c1bbc9 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4418,8 +4418,7 @@ static int should_alloc_chunk(struct btrfs_fs_info *fs_info, struct btrfs_space_info *sinfo, int force) { struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv; - u64 num_bytes = sinfo->total_bytes - sinfo->bytes_readonly; - u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved + sinfo->bytes_pinned; + u64 bytes_used = btrfs_space_info_used(sinfo, false); u64 thresh; if (force == CHUNK_ALLOC_FORCE) @@ -4431,7 +4430,7 @@ static int should_alloc_chunk(struct btrfs_fs_info *fs_info, * global_rsv, it doesn't change except when the transaction commits. */ if (sinfo->flags & BTRFS_BLOCK_GROUP_METADATA) - num_allocated += calc_global_rsv_need_space(global_rsv); + bytes_used += calc_global_rsv_need_space(global_rsv); /* * in limited mode, we want to have some free space up to @@ -4441,11 +4440,11 @@ static int should_alloc_chunk(struct btrfs_fs_info *fs_info, thresh = btrfs_super_total_bytes(fs_info->super_copy); thresh = max_t(u64, SZ_64M, div_factor_fine(thresh, 1)); - if (num_bytes - num_allocated < thresh) + if (sinfo->total_bytes - bytes_used < thresh) return 1; } - if (num_allocated + SZ_2M < div_factor(num_bytes, 8)) + if (bytes_used + SZ_2M < div_factor(sinfo->total_bytes, 8)) return 0; return 1; } -- cgit v1.2.3 From 92ac58ec99db0a9ad7337ce85f0ad98a90b88805 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Thu, 17 Aug 2017 10:52:28 +0300 Subject: btrfs: Remove never-reached WARN_ON We have a WARN_ON(!var) inside an if branch which is executed (among others) only when var is true. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index c74d24c1bbc9..eff674bfd162 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6745,7 +6745,7 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info, if (!readonly && return_free_space && global_rsv->space_info == space_info) { u64 to_add = len; - WARN_ON(!return_free_space); + spin_lock(&global_rsv->lock); if (!global_rsv->full) { to_add = min(len, global_rsv->size - -- cgit v1.2.3 From 583b723151794e2ff1691f1510b4e43710293875 Mon Sep 17 00:00:00 2001 From: Hans van Kranenburg Date: Fri, 28 Jul 2017 08:31:28 +0200 Subject: btrfs: Do not use data_alloc_cluster in ssd mode This patch provides a band aid to improve the 'out of the box' behaviour of btrfs for disks that are detected as being an ssd. In a general purpose mixed workload scenario, the current ssd mode causes overallocation of available raw disk space for data, while leaving behind increasing amounts of unused fragmented free space. This situation leads to early ENOSPC problems which are harming user experience and adoption of btrfs as a general purpose filesystem. This patch modifies the data extent allocation behaviour of the ssd mode to make it behave identical to nossd mode. The metadata behaviour and additional ssd_spread option stay untouched so far. Recommendations for future development are to reconsider the current oversimplified nossd / ssd distinction and the broken detection mechanism based on the rotational attribute in sysfs and provide experienced users with a more flexible way to choose allocator behaviour for data and metadata, optimized for certain use cases, while keeping sane 'out of the box' default settings. The internals of the current btrfs code have more potential than what currently gets exposed to the user to choose from. The SSD story... In the first year of btrfs development, around early 2008, btrfs gained a mount option which enables specific functionality for filesystems on solid state devices. The first occurance of this functionality is in commit e18e4809, labeled "Add mount -o ssd, which includes optimizations for seek free storage". The effect on allocating free space for doing (data) writes is to 'cluster' writes together, writing them out in contiguous space, as opposed to a 'tetris' way of putting all separate writes into any free space fragment that fits (which is what the -o nossd behaviour does). A somewhat simplified explanation of what happens is that, when for example, the 'cluster' size is set to 2MiB, when we do some writes, the data allocator will search for a free space block that is 2MiB big, and put the writes in there. The ssd mode itself might allow a 2MiB cluster to be composed of multiple free space extents with some existing data in between, while the additional ssd_spread mount option kills off this option and requires fully free space. The idea behind this is (commit 536ac8ae): "The [...] clusters make it more likely a given IO will completely overwrite the ssd block, so it doesn't have to do an internal rwm cycle."; ssd block meaning nand erase block. So, effectively this means applying a "locality based algorithm" and trying to outsmart the actual ssd. Since then, various changes have been made to the involved code, but the basic idea is still present, and gets activated whenever the ssd mount option is active. This also happens by default, when the rotational flag as seen at /sys/block//queue/rotational is set to 0. However, there's a number of problems with this approach. First, what the optimization is trying to do is outsmart the ssd by assuming there is a relation between the physical address space of the block device as seen by btrfs and the actual physical storage of the ssd, and then adjusting data placement. However, since the introduction of the Flash Translation Layer (FTL) which is a part of the internal controller of an ssd, these attempts are futile. The use of good quality FTL in consumer ssd products might have been limited in 2008, but this situation has changed drastically soon after that time. Today, even the flash memory in your automatic cat feeding machine or your grandma's wheelchair has a full featured one. Second, the behaviour as described above results in the filesystem being filled up with badly fragmented free space extents because of relatively small pieces of space that are freed up by deletes, but not selected again as part of a 'cluster'. Since the algorithm prefers allocating a new chunk over going back to tetris mode, the end result is a filesystem in which all raw space is allocated, but which is composed of underutilized chunks with a 'shotgun blast' pattern of fragmented free space. Usually, the next problematic thing that happens is the filesystem wanting to allocate new space for metadata, which causes the filesystem to fail in spectacular ways. Third, the default mount options you get for an ssd ('ssd' mode enabled, 'discard' not enabled), in combination with spreading out writes over the full address space and ignoring freed up space leads to worst case behaviour in providing information to the ssd itself, since it will never learn that all the free space left behind is actually free. There are two ways to let an ssd know previously written data does not have to be preserved, which are sending explicit signals using discard or fstrim, or by simply overwriting the space with new data. The worst case behaviour is the btrfs ssd_spread mount option in combination with not having discard enabled. It has a side effect of minimizing the reuse of free space previously written in. Fourth, the rotational flag in /sys/ does not reliably indicate if the device is a locally attached ssd. For example, iSCSI or NBD displays as non-rotational, while a loop device on an ssd shows up as rotational. The combination of the second and third problem effectively means that despite all the good intentions, the btrfs ssd mode reliably causes the ssd hardware and the filesystem structures and performance to be choked to death. The clickbait version of the title of this story would have been "Btrfs ssd optimizations considered harmful for ssds". The current nossd 'tetris' mode (even still without discard) allows a pattern of overwriting much more previously used space, causing many more implicit discards to happen because of the overwrite information the ssd gets. The actual location in the physical address space, as seen from the point of view of btrfs is irrelevant, because the actual writes to the low level flash are reordered anyway thanks to the FTL. Changes made in the code 1. Make ssd mode data allocation identical to tetris mode, like nossd. 2. Adjust and clean up filesystem mount messages so that we can easily identify if a kernel has this patch applied or not, when providing support to end users. Also, make better use of the *_and_info helpers to only trigger messages on actual state changes. Backporting notes Notes for whoever wants to backport this patch to their 4.9 LTS kernel: * First apply commit 951e7966 "btrfs: drop the nossd flag when remounting with -o ssd", or fixup the differences manually. * The rest of the conflicts are because of the fs_info refactoring. So, for example, instead of using fs_info, it's root->fs_info in extent-tree.c Signed-off-by: Hans van Kranenburg Signed-off-by: David Sterba --- fs/btrfs/ctree.h | 6 +++--- fs/btrfs/disk-io.c | 6 ++---- fs/btrfs/extent-tree.c | 11 ++++++----- fs/btrfs/super.c | 16 +++++++++------- 4 files changed, 20 insertions(+), 19 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index a3db2b1738aa..02edcddbcc9c 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -470,8 +470,8 @@ struct btrfs_block_rsv { /* * free clusters are used to claim free space in relatively large chunks, - * allowing us to do less seeky writes. They are used for all metadata - * allocations and data allocations in ssd mode. + * allowing us to do less seeky writes. They are used for all metadata + * allocations. In ssd_spread mode they are also used for data allocations. */ struct btrfs_free_cluster { spinlock_t lock; @@ -967,7 +967,7 @@ struct btrfs_fs_info { struct reloc_control *reloc_ctl; - /* data_alloc_cluster is only used in ssd mode */ + /* data_alloc_cluster is only used in ssd_spread mode */ struct btrfs_free_cluster data_alloc_cluster; /* all metadata allocations go through this cluster */ diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 195634098f21..90b967ae46d0 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3053,11 +3053,9 @@ retry_root_backup: if (IS_ERR(fs_info->transaction_kthread)) goto fail_cleaner; - if (!btrfs_test_opt(fs_info, SSD) && - !btrfs_test_opt(fs_info, NOSSD) && + if (!btrfs_test_opt(fs_info, NOSSD) && !fs_info->fs_devices->rotating) { - btrfs_info(fs_info, "detected SSD devices, enabling SSD mode"); - btrfs_set_opt(fs_info->mount_opt, SSD); + btrfs_set_and_info(fs_info, SSD, "enabling ssd optimizations"); } /* diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index eff674bfd162..6a8da7c19182 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6654,19 +6654,20 @@ fetch_cluster_info(struct btrfs_fs_info *fs_info, struct btrfs_space_info *space_info, u64 *empty_cluster) { struct btrfs_free_cluster *ret = NULL; - bool ssd = btrfs_test_opt(fs_info, SSD); *empty_cluster = 0; if (btrfs_mixed_space_info(space_info)) return ret; - if (ssd) - *empty_cluster = SZ_2M; if (space_info->flags & BTRFS_BLOCK_GROUP_METADATA) { ret = &fs_info->meta_alloc_cluster; - if (!ssd) + if (btrfs_test_opt(fs_info, SSD)) + *empty_cluster = SZ_2M; + else *empty_cluster = SZ_64K; - } else if ((space_info->flags & BTRFS_BLOCK_GROUP_DATA) && ssd) { + } else if ((space_info->flags & BTRFS_BLOCK_GROUP_DATA) && + btrfs_test_opt(fs_info, SSD_SPREAD)) { + *empty_cluster = SZ_2M; ret = &fs_info->data_alloc_cluster; } diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 8a9bcad3b06a..0b7a1d8cd08b 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -549,20 +549,22 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options, break; case Opt_ssd: btrfs_set_and_info(info, SSD, - "use ssd allocation scheme"); + "enabling ssd optimizations"); btrfs_clear_opt(info->mount_opt, NOSSD); break; case Opt_ssd_spread: + btrfs_set_and_info(info, SSD, + "enabling ssd optimizations"); btrfs_set_and_info(info, SSD_SPREAD, - "use spread ssd allocation scheme"); - btrfs_set_opt(info->mount_opt, SSD); + "using spread ssd allocation scheme"); btrfs_clear_opt(info->mount_opt, NOSSD); break; case Opt_nossd: - btrfs_set_and_info(info, NOSSD, - "not using ssd allocation scheme"); - btrfs_clear_opt(info->mount_opt, SSD); - btrfs_clear_opt(info->mount_opt, SSD_SPREAD); + btrfs_set_opt(info->mount_opt, NOSSD); + btrfs_clear_and_info(info, SSD, + "not using ssd optimizations"); + btrfs_clear_and_info(info, SSD_SPREAD, + "not using spread ssd allocation scheme"); break; case Opt_barrier: btrfs_clear_and_info(info, NOBARRIER, -- cgit v1.2.3 From 0174484d619460a65e88f594c36983cd2b7f4128 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Thu, 27 Jul 2017 14:22:11 +0300 Subject: btrfs: Remove chunk_objectid argument from btrfs_make_block_group btrfs_make_block_group is always called with chunk_objectid set to BTRFS_FIRST_CHUNK_TREE_OBJECTID. There's no reason why this behavior will change anytime soon, so let's remove the argument and decrease the cognitive load when reading the code path. No functional change Signed-off-by: Nikolay Borisov Signed-off-by: David Sterba --- fs/btrfs/ctree.h | 3 +-- fs/btrfs/extent-tree.c | 6 +++--- fs/btrfs/volumes.c | 4 +--- 3 files changed, 5 insertions(+), 8 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 02edcddbcc9c..ca087ad5ac48 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2676,8 +2676,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info); int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr); int btrfs_make_block_group(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytes_used, - u64 type, u64 chunk_objectid, u64 chunk_offset, - u64 size); + u64 type, u64 chunk_offset, u64 size); struct btrfs_trans_handle *btrfs_start_trans_remove_block_group( struct btrfs_fs_info *fs_info, const u64 chunk_offset); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 6a8da7c19182..1a80f6e58296 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -10180,8 +10180,7 @@ next: int btrfs_make_block_group(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytes_used, - u64 type, u64 chunk_objectid, u64 chunk_offset, - u64 size) + u64 type, u64 chunk_offset, u64 size) { struct btrfs_block_group_cache *cache; int ret; @@ -10193,7 +10192,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, return -ENOMEM; btrfs_set_block_group_used(&cache->item, bytes_used); - btrfs_set_block_group_chunk_objectid(&cache->item, chunk_objectid); + btrfs_set_block_group_chunk_objectid(&cache->item, + BTRFS_FIRST_CHUNK_TREE_OBJECTID); btrfs_set_block_group_flags(&cache->item, type); cache->flags = type; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 3561397a0c29..1df1044e9008 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4826,9 +4826,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, goto error; } - ret = btrfs_make_block_group(trans, info, 0, type, - BTRFS_FIRST_CHUNK_TREE_OBJECTID, - start, num_bytes); + ret = btrfs_make_block_group(trans, info, 0, type, start, num_bytes); if (ret) goto error_del_extent; -- cgit v1.2.3 From 167ce953ca55bdee20fe56c3c0fa51002435f745 Mon Sep 17 00:00:00 2001 From: Liu Bo Date: Fri, 18 Aug 2017 15:15:18 -0600 Subject: Btrfs: add a helper to retrive extent inline ref type An invalid value of extent inline ref type may be read from a malicious image which may force btrfs to crash. This adds a helper which does sanity check for the ref type, so we can know if it's sane, return he type, otherwise return an error. Signed-off-by: Liu Bo Reviewed-by: David Sterba [ minimal tweak const types, causing warnings due to other cleanup patches ] Signed-off-by: David Sterba --- fs/btrfs/ctree.h | 11 +++++++++++ fs/btrfs/extent-tree.c | 37 +++++++++++++++++++++++++++++++++++++ 2 files changed, 48 insertions(+) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ca087ad5ac48..542db9d0dbcd 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2567,6 +2567,17 @@ static inline gfp_t btrfs_alloc_write_mask(struct address_space *mapping) /* extent-tree.c */ +enum btrfs_inline_ref_type { + BTRFS_REF_TYPE_INVALID = 0, + BTRFS_REF_TYPE_BLOCK = 1, + BTRFS_REF_TYPE_DATA = 2, + BTRFS_REF_TYPE_ANY = 3, +}; + +int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, + struct btrfs_extent_inline_ref *iref, + enum btrfs_inline_ref_type is_data); + u64 btrfs_csum_bytes_to_leaves(struct btrfs_fs_info *fs_info, u64 csum_bytes); static inline u64 btrfs_calc_trans_metadata_size(struct btrfs_fs_info *fs_info, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 1a80f6e58296..794b06dd824a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1148,6 +1148,43 @@ static int convert_extent_item_v0(struct btrfs_trans_handle *trans, } #endif +/* + * is_data == BTRFS_REF_TYPE_BLOCK, tree block type is required, + * is_data == BTRFS_REF_TYPE_DATA, data type is requried, + * is_data == BTRFS_REF_TYPE_ANY, either type is OK. + */ +int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, + struct btrfs_extent_inline_ref *iref, + enum btrfs_inline_ref_type is_data) +{ + int type = btrfs_extent_inline_ref_type(eb, iref); + + if (type == BTRFS_TREE_BLOCK_REF_KEY || + type == BTRFS_SHARED_BLOCK_REF_KEY || + type == BTRFS_SHARED_DATA_REF_KEY || + type == BTRFS_EXTENT_DATA_REF_KEY) { + if (is_data == BTRFS_REF_TYPE_BLOCK) { + if (type == BTRFS_TREE_BLOCK_REF_KEY || + type == BTRFS_SHARED_BLOCK_REF_KEY) + return type; + } else if (is_data == BTRFS_REF_TYPE_DATA) { + if (type == BTRFS_EXTENT_DATA_REF_KEY || + type == BTRFS_SHARED_DATA_REF_KEY) + return type; + } else { + ASSERT(is_data == BTRFS_REF_TYPE_ANY); + return type; + } + } + + btrfs_print_leaf((struct extent_buffer *)eb); + btrfs_err(eb->fs_info, "eb %llu invalid extent inline ref type %d", + eb->start, type); + WARN_ON(1); + + return BTRFS_REF_TYPE_INVALID; +} + static u64 hash_extent_data_ref(u64 root_objectid, u64 owner, u64 offset) { u32 high_crc = ~(u32)0; -- cgit v1.2.3 From 3de28d579edbd35294bf44aee8402c804331bc37 Mon Sep 17 00:00:00 2001 From: Liu Bo Date: Fri, 18 Aug 2017 15:15:19 -0600 Subject: Btrfs: convert to use btrfs_get_extent_inline_ref_type Since we have a helper which can do sanity check, this converts all btrfs_extent_inline_ref_type to it. Signed-off-by: Liu Bo Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/backref.c | 11 +++++++++-- fs/btrfs/extent-tree.c | 36 ++++++++++++++++++++++++++++++------ fs/btrfs/relocation.c | 13 +++++++++++-- 3 files changed, 50 insertions(+), 10 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 6bae986bfcfb..b517ef1477ea 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -929,7 +929,11 @@ static int add_inline_refs(const struct btrfs_fs_info *fs_info, int type; iref = (struct btrfs_extent_inline_ref *)ptr; - type = btrfs_extent_inline_ref_type(leaf, iref); + type = btrfs_get_extent_inline_ref_type(leaf, iref, + BTRFS_REF_TYPE_ANY); + if (type == BTRFS_REF_TYPE_INVALID) + return -EINVAL; + offset = btrfs_extent_inline_ref_offset(leaf, iref); switch (type) { @@ -1776,7 +1780,10 @@ static int get_extent_inline_ref(unsigned long *ptr, end = (unsigned long)ei + item_size; *out_eiref = (struct btrfs_extent_inline_ref *)(*ptr); - *out_type = btrfs_extent_inline_ref_type(eb, *out_eiref); + *out_type = btrfs_get_extent_inline_ref_type(eb, *out_eiref, + BTRFS_REF_TYPE_ANY); + if (*out_type == BTRFS_REF_TYPE_INVALID) + return -EINVAL; *ptr += btrfs_extent_inline_ref_size(*out_type); WARN_ON(*ptr > end); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 794b06dd824a..51a691532fd8 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1454,12 +1454,18 @@ static noinline u32 extent_data_ref_count(struct btrfs_path *path, struct btrfs_extent_data_ref *ref1; struct btrfs_shared_data_ref *ref2; u32 num_refs = 0; + int type; leaf = path->nodes[0]; btrfs_item_key_to_cpu(leaf, &key, path->slots[0]); if (iref) { - if (btrfs_extent_inline_ref_type(leaf, iref) == - BTRFS_EXTENT_DATA_REF_KEY) { + /* + * If type is invalid, we should have bailed out earlier than + * this call. + */ + type = btrfs_get_extent_inline_ref_type(leaf, iref, BTRFS_REF_TYPE_DATA); + ASSERT(type != BTRFS_REF_TYPE_INVALID); + if (type == BTRFS_EXTENT_DATA_REF_KEY) { ref1 = (struct btrfs_extent_data_ref *)(&iref->offset); num_refs = btrfs_extent_data_ref_count(leaf, ref1); } else { @@ -1620,6 +1626,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, int ret; int err = 0; bool skinny_metadata = btrfs_fs_incompat(fs_info, SKINNY_METADATA); + int needed; key.objectid = bytenr; key.type = BTRFS_EXTENT_ITEM_KEY; @@ -1711,6 +1718,11 @@ again: BUG_ON(ptr > end); } + if (owner >= BTRFS_FIRST_FREE_OBJECTID) + needed = BTRFS_REF_TYPE_DATA; + else + needed = BTRFS_REF_TYPE_BLOCK; + err = -ENOENT; while (1) { if (ptr >= end) { @@ -1718,7 +1730,12 @@ again: break; } iref = (struct btrfs_extent_inline_ref *)ptr; - type = btrfs_extent_inline_ref_type(leaf, iref); + type = btrfs_get_extent_inline_ref_type(leaf, iref, needed); + if (type == BTRFS_REF_TYPE_INVALID) { + err = -EINVAL; + goto out; + } + if (want < type) break; if (want > type) { @@ -1910,7 +1927,12 @@ void update_inline_extent_backref(struct btrfs_fs_info *fs_info, if (extent_op) __run_delayed_extent_op(extent_op, leaf, ei); - type = btrfs_extent_inline_ref_type(leaf, iref); + /* + * If type is invalid, we should have bailed out after + * lookup_inline_extent_backref(). + */ + type = btrfs_get_extent_inline_ref_type(leaf, iref, BTRFS_REF_TYPE_ANY); + ASSERT(type != BTRFS_REF_TYPE_INVALID); if (type == BTRFS_EXTENT_DATA_REF_KEY) { dref = (struct btrfs_extent_data_ref *)(&iref->offset); @@ -3195,6 +3217,7 @@ static noinline int check_committed_ref(struct btrfs_root *root, struct btrfs_extent_item *ei; struct btrfs_key key; u32 item_size; + int type; int ret; key.objectid = bytenr; @@ -3236,8 +3259,9 @@ static noinline int check_committed_ref(struct btrfs_root *root, goto out; iref = (struct btrfs_extent_inline_ref *)(ei + 1); - if (btrfs_extent_inline_ref_type(leaf, iref) != - BTRFS_EXTENT_DATA_REF_KEY) + + type = btrfs_get_extent_inline_ref_type(leaf, iref, BTRFS_REF_TYPE_DATA); + if (type != BTRFS_EXTENT_DATA_REF_KEY) goto out; ref = (struct btrfs_extent_data_ref *)(&iref->offset); diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 1a532bb72eab..96f816aa9ed3 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -799,9 +799,17 @@ again: if (ptr < end) { /* update key for inline back ref */ struct btrfs_extent_inline_ref *iref; + int type; iref = (struct btrfs_extent_inline_ref *)ptr; - key.type = btrfs_extent_inline_ref_type(eb, iref); + type = btrfs_get_extent_inline_ref_type(eb, iref, + BTRFS_REF_TYPE_BLOCK); + if (type == BTRFS_REF_TYPE_INVALID) { + err = -EINVAL; + goto out; + } + key.type = type; key.offset = btrfs_extent_inline_ref_offset(eb, iref); + WARN_ON(key.type != BTRFS_TREE_BLOCK_REF_KEY && key.type != BTRFS_SHARED_BLOCK_REF_KEY); } @@ -3753,7 +3761,8 @@ int add_data_references(struct reloc_control *rc, while (ptr < end) { iref = (struct btrfs_extent_inline_ref *)ptr; - key.type = btrfs_extent_inline_ref_type(eb, iref); + key.type = btrfs_get_extent_inline_ref_type(eb, iref, + BTRFS_REF_TYPE_DATA); if (key.type == BTRFS_SHARED_DATA_REF_KEY) { key.offset = btrfs_extent_inline_ref_offset(eb, iref); ret = __add_tree_block(rc, key.offset, blocksize, -- cgit v1.2.3 From 64ecdb647ddb83dcff9c8e2a5c40119f171ea004 Mon Sep 17 00:00:00 2001 From: Liu Bo Date: Fri, 18 Aug 2017 15:15:24 -0600 Subject: Btrfs: add one more sanity check for shared ref type Every shared ref has a parent tree block, which can be get from btrfs_extent_inline_ref_offset(). And the tree block must be aligned to the nodesize, so we'd know this inline ref is not valid if this block's bytenr is not aligned to the nodesize, in which case, most likely the ref type has been misused. This adds the above mentioned check and also updates print_extent_item() called by btrfs_print_leaf() to point out the invalid ref while printing the tree structure. Signed-off-by: Liu Bo Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 29 +++++++++++++++++++++++++---- fs/btrfs/print-tree.c | 27 +++++++++++++++++++++------ 2 files changed, 46 insertions(+), 10 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 51a691532fd8..96e49fd5b888 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1158,19 +1158,40 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, enum btrfs_inline_ref_type is_data) { int type = btrfs_extent_inline_ref_type(eb, iref); + u64 offset = btrfs_extent_inline_ref_offset(eb, iref); if (type == BTRFS_TREE_BLOCK_REF_KEY || type == BTRFS_SHARED_BLOCK_REF_KEY || type == BTRFS_SHARED_DATA_REF_KEY || type == BTRFS_EXTENT_DATA_REF_KEY) { if (is_data == BTRFS_REF_TYPE_BLOCK) { - if (type == BTRFS_TREE_BLOCK_REF_KEY || - type == BTRFS_SHARED_BLOCK_REF_KEY) + if (type == BTRFS_TREE_BLOCK_REF_KEY) return type; + if (type == BTRFS_SHARED_BLOCK_REF_KEY) { + ASSERT(eb->fs_info); + /* + * Every shared one has parent tree + * block, which must be aligned to + * nodesize. + */ + if (offset && + IS_ALIGNED(offset, eb->fs_info->nodesize)) + return type; + } } else if (is_data == BTRFS_REF_TYPE_DATA) { - if (type == BTRFS_EXTENT_DATA_REF_KEY || - type == BTRFS_SHARED_DATA_REF_KEY) + if (type == BTRFS_EXTENT_DATA_REF_KEY) return type; + if (type == BTRFS_SHARED_DATA_REF_KEY) { + ASSERT(eb->fs_info); + /* + * Every shared one has parent tree + * block, which must be aligned to + * nodesize. + */ + if (offset && + IS_ALIGNED(offset, eb->fs_info->nodesize)) + return type; + } } else { ASSERT(is_data == BTRFS_REF_TYPE_ANY); return type; diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c index c1acbdcb476c..569205e651c7 100644 --- a/fs/btrfs/print-tree.c +++ b/fs/btrfs/print-tree.c @@ -44,7 +44,7 @@ static void print_dev_item(struct extent_buffer *eb, static void print_extent_data_ref(struct extent_buffer *eb, struct btrfs_extent_data_ref *ref) { - pr_info("\t\textent data backref root %llu objectid %llu offset %llu count %u\n", + pr_cont("extent data backref root %llu objectid %llu offset %llu count %u\n", btrfs_extent_data_ref_root(eb, ref), btrfs_extent_data_ref_objectid(eb, ref), btrfs_extent_data_ref_offset(eb, ref), @@ -63,6 +63,7 @@ static void print_extent_item(struct extent_buffer *eb, int slot, int type) u32 item_size = btrfs_item_size_nr(eb, slot); u64 flags; u64 offset; + int ref_index = 0; if (item_size < sizeof(*ei)) { #ifdef BTRFS_COMPAT_EXTENT_TREE_V0 @@ -104,12 +105,20 @@ static void print_extent_item(struct extent_buffer *eb, int slot, int type) iref = (struct btrfs_extent_inline_ref *)ptr; type = btrfs_extent_inline_ref_type(eb, iref); offset = btrfs_extent_inline_ref_offset(eb, iref); + pr_info("\t\tref#%d: ", ref_index++); switch (type) { case BTRFS_TREE_BLOCK_REF_KEY: - pr_info("\t\ttree block backref root %llu\n", offset); + pr_cont("tree block backref root %llu\n", offset); break; case BTRFS_SHARED_BLOCK_REF_KEY: - pr_info("\t\tshared block backref parent %llu\n", offset); + pr_cont("shared block backref parent %llu\n", offset); + /* + * offset is supposed to be a tree block which + * must be aligned to nodesize. + */ + if (!IS_ALIGNED(offset, eb->fs_info->nodesize)) + pr_info("\t\t\t(parent %llu is NOT ALIGNED to nodesize %llu)\n", + offset, (unsigned long long)eb->fs_info->nodesize); break; case BTRFS_EXTENT_DATA_REF_KEY: dref = (struct btrfs_extent_data_ref *)(&iref->offset); @@ -117,12 +126,18 @@ static void print_extent_item(struct extent_buffer *eb, int slot, int type) break; case BTRFS_SHARED_DATA_REF_KEY: sref = (struct btrfs_shared_data_ref *)(iref + 1); - pr_info("\t\tshared data backref parent %llu count %u\n", + pr_cont("shared data backref parent %llu count %u\n", offset, btrfs_shared_data_ref_count(eb, sref)); + /* + * offset is supposed to be a tree block which + * must be aligned to nodesize. + */ + if (!IS_ALIGNED(offset, eb->fs_info->nodesize)) + pr_info("\t\t\t(parent %llu is NOT ALIGNED to nodesize %llu)\n", + offset, (unsigned long long)eb->fs_info->nodesize); break; default: - btrfs_err(eb->fs_info, - "extent %llu has invalid ref type %d", + pr_cont("(extent %llu has INVALID ref type %d)\n", eb->start, type); return; } -- cgit v1.2.3 From 1cd5447eb677822c5c22bb52161c2105507dcce0 Mon Sep 17 00:00:00 2001 From: Jeff Mahoney Date: Thu, 17 Aug 2017 10:25:11 -0400 Subject: btrfs: pass fs_info to btrfs_del_root instead of tree_root btrfs_del_roots always uses the tree_root. Let's pass fs_info instead. Signed-off-by: Jeff Mahoney Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/ctree.h | 4 ++-- fs/btrfs/extent-tree.c | 2 +- fs/btrfs/free-space-tree.c | 2 +- fs/btrfs/qgroup.c | 3 +-- fs/btrfs/root-tree.c | 7 ++++--- 5 files changed, 9 insertions(+), 9 deletions(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index b7cfc74c1757..2add002662f4 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2988,8 +2988,8 @@ int btrfs_del_root_ref(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 root_id, u64 ref_id, u64 dirid, u64 *sequence, const char *name, int name_len); -int btrfs_del_root(struct btrfs_trans_handle *trans, struct btrfs_root *root, - const struct btrfs_key *key); +int btrfs_del_root(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, const struct btrfs_key *key); int btrfs_insert_root(struct btrfs_trans_handle *trans, struct btrfs_root *root, const struct btrfs_key *key, struct btrfs_root_item *item); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 96e49fd5b888..e2d7e86b51d1 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -9266,7 +9266,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, if (err) goto out_end_trans; - ret = btrfs_del_root(trans, tree_root, &root->root_key); + ret = btrfs_del_root(trans, fs_info, &root->root_key); if (ret) { btrfs_abort_transaction(trans, ret); goto out_end_trans; diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c index a5e34de06c2f..684f12247db7 100644 --- a/fs/btrfs/free-space-tree.c +++ b/fs/btrfs/free-space-tree.c @@ -1257,7 +1257,7 @@ int btrfs_clear_free_space_tree(struct btrfs_fs_info *fs_info) if (ret) goto abort; - ret = btrfs_del_root(trans, tree_root, &free_space_root->root_key); + ret = btrfs_del_root(trans, fs_info, &free_space_root->root_key); if (ret) goto abort; diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index ddc37c537058..5c8b61c86e61 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -946,7 +946,6 @@ out: int btrfs_quota_disable(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { - struct btrfs_root *tree_root = fs_info->tree_root; struct btrfs_root *quota_root; int ret = 0; @@ -968,7 +967,7 @@ int btrfs_quota_disable(struct btrfs_trans_handle *trans, if (ret) goto out; - ret = btrfs_del_root(trans, tree_root, "a_root->root_key); + ret = btrfs_del_root(trans, fs_info, "a_root->root_key); if (ret) goto out; diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c index 5b488af6f25e..9fb9896610e0 100644 --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -335,10 +335,11 @@ int btrfs_find_orphan_roots(struct btrfs_fs_info *fs_info) return err; } -/* drop the root item for 'key' from 'root' */ -int btrfs_del_root(struct btrfs_trans_handle *trans, struct btrfs_root *root, - const struct btrfs_key *key) +/* drop the root item for 'key' from the tree root */ +int btrfs_del_root(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, const struct btrfs_key *key) { + struct btrfs_root *root = fs_info->tree_root; struct btrfs_path *path; int ret; -- cgit v1.2.3