summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-01-28xfs: Fix xfs_swap_extents() after removal of xfs_flushinval_pages()for-linus-v3.8-rc6Torsten Kaiser
Commit fb59581404ab7ec5075299065c22cb211a9262a9 removed xfs_flushinval_pages() and changed its callers to use filemap_write_and_wait() and truncate_pagecache_range() directly. But in xfs_swap_extents() this change accidental switched the argument for 'tip' to 'ip'. This patch switches it back to 'tip' Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28xfs: Fix possible use-after-free with AIOJan Kara
Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. CC: xfs@oss.sgi.com CC: Ben Myers <bpm@sgi.com> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28xfs: fix shutdown hang on invalid inode during createDave Chinner
When the new inode verify in xfs_iread() fails, the create transaction is aborted and a shutdown occurs. The subsequent unmount then hangs in xfs_wait_buftarg() on a buffer that has an elevated hold count. Debug showed that it was an AGI buffer getting stuck: [ 22.576147] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck [ 22.976213] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck [ 23.376206] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck [ 23.776325] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck The trace of this buffer leading up to the shutdown (trimmed for brevity) looks like: xfs_buf_init: bno 0x2 nblks 0x1 hold 1 caller xfs_buf_get_map xfs_buf_get: bno 0x2 len 0x200 hold 1 caller xfs_buf_read_map xfs_buf_read: bno 0x2 len 0x200 hold 1 caller xfs_trans_read_buf_map xfs_buf_iorequest: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read xfs_buf_hold: bno 0x2 nblks 0x1 hold 1 caller xfs_buf_iorequest xfs_buf_rele: bno 0x2 nblks 0x1 hold 2 caller xfs_buf_iorequest xfs_buf_iowait: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read xfs_buf_ioerror: bno 0x2 len 0x200 hold 1 caller xfs_buf_bio_end_io xfs_buf_iodone: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_ioend xfs_buf_iowait_done: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read xfs_buf_hold: bno 0x2 nblks 0x1 hold 1 caller xfs_buf_item_init xfs_trans_read_buf: bno 0x2 len 0x200 hold 2 recur 0 refcount 1 xfs_trans_brelse: bno 0x2 len 0x200 hold 2 recur 0 refcount 1 xfs_buf_item_relse: bno 0x2 nblks 0x1 hold 2 caller xfs_trans_brelse xfs_buf_rele: bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_relse xfs_buf_unlock: bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse xfs_buf_rele: bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse xfs_buf_trylock: bno 0x2 nblks 0x1 hold 2 caller _xfs_buf_find xfs_buf_find: bno 0x2 len 0x200 hold 2 caller xfs_buf_get_map xfs_buf_get: bno 0x2 len 0x200 hold 2 caller xfs_buf_read_map xfs_buf_read: bno 0x2 len 0x200 hold 2 caller xfs_trans_read_buf_map xfs_buf_hold: bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_init xfs_trans_read_buf: bno 0x2 len 0x200 hold 3 recur 0 refcount 1 xfs_trans_log_buf: bno 0x2 len 0x200 hold 3 recur 0 refcount 1 xfs_buf_item_unlock: bno 0x2 len 0x200 hold 3 flags DIRTY liflags ABORTED xfs_buf_unlock: bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock xfs_buf_rele: bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock And that is the AGI buffer from cold cache read into memory to transaction abort. You can see at transaction abort the bli is dirty and only has a single reference. The item is not pinned, and it's not in the AIL. Hence the only reference to it is this transaction. The problem is that the xfs_buf_item_unlock() call is dropping the last reference to the xfs_buf_log_item attached to the buffer (which holds a reference to the buffer), but it is not freeing the xfs_buf_log_item. Hence nothing will ever release the buffer, and the unmount hangs waiting for this reference to go away. The fix is simple - xfs_buf_item_unlock needs to detect the last reference going away in this case and free the xfs_buf_log_item to release the reference it holds on the buffer. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28xfs: limit speculative prealloc near ENOSPC thresholdsDave Chinner
There is a window on small filesytsems where specualtive preallocation can be larger than that ENOSPC throttling thresholds, resulting in specualtive preallocation trying to reserve more space than there is space available. This causes immediate ENOSPC to be triggered, prealloc to be turned off and flushing to occur. One the next write (i.e. next 4k page), we do exactly the same thing, and so effective drive into synchronous 4k writes by triggering ENOSPC flushing on every page while in the window between the prealloc size and the ENOSPC prealloc throttle threshold. Fix this by checking to see if the prealloc size would consume all free space, and throttle it appropriately to avoid premature ENOSPC... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28xfs: fix _xfs_buf_find oops on blocks beyond the filesystem endDave Chinner
When _xfs_buf_find is passed an out of range address, it will fail to find a relevant struct xfs_perag and oops with a null dereference. This can happen when trying to walk a filesystem with a metadata inode that has a partially corrupted extent map (i.e. the block number returned is corrupt, but is otherwise intact) and we try to read from the corrupted block address. In this case, just fail the lookup. If it is readahead being issued, it will simply not be done, but if it is real read that fails we will get an error being reported. Ideally this case should result in an EFSCORRUPTED error being reported, but we cannot return an error through xfs_buf_read() or xfs_buf_get() so this lookup failure may result in ENOMEM or EIO errors being reported instead. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28xfs: pull up stack_switch check into xfs_bmapi_writeBrian Foster
The stack_switch check currently occurs in __xfs_bmapi_allocate, which means the stack switch only occurs when xfs_bmapi_allocate() is called in a loop. Pull the check up before the loop in xfs_bmapi_write() such that the first iteration of the loop has consistent behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28xfs: Do not return EFSCORRUPTED when filesystem probe finds no XFS magicEric Sandeen
9802182 changed the return value from EWRONGFS (aka EINVAL) to EFSCORRUPTED which doesn't seem to be handled properly by the root filesystem probe. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Tested-by: Sergei Trofimovich <slyfox@gentoo.org> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16xfs: recalculate leaf entry pointer after compacting a dir2 blockfor-linus-v3.8-rc4Eric Sandeen
Dave Jones hit this assert when doing a compile on recent git, with CONFIG_XFS_DEBUG enabled: XFS: Assertion failed: (char *)dup - (char *)hdr == be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)), file: fs/xfs/xfs_dir2_data.c, line: 828 Upon further digging, the tag found by xfs_dir2_data_unused_tag_p(dup) contained "2" and not the proper offset, and I found that this value was changed after the memmoves under "Use a stale leaf for our new entry." in xfs_dir2_block_addname(), i.e. memmove(&blp[mid + 1], &blp[mid], (highstale - mid) * sizeof(*blp)); overwrote it. What has happened is that the previous call to xfs_dir2_block_compact() has rearranged things; it changes btp->count as well as the blp array. So after we make that call, we must recalculate the proper pointer to the leaf entries by making another call to xfs_dir2_block_leaf_p(). Dave provided a metadump image which led to a simple reproducer (create a particular filename in the affected directory) and this resolves the testcase as well as the bug on his live system. Thanks also to dchinner for looking at this one with me. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Tested-by: Dave Jones <davej@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16xfs: remove int casts from debug dquot soft limit timer assertsBrian Foster
The int casts here make it easy to trigger an assert with a large soft limit. For example, set a >4TB soft limit on an empty volume to reproduce a (0 > -x) comparison due to an overflow of d_blk_softlimit. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16xfs: fix the multi-segment log buffer formatMark Tinguely
Per Dave Chinner suggestion, this patch: 1) Corrects the detection of whether a multi-segment buffer is still tracking data. 2) Clears all the buffer log formats for a multi-segment buffer. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16xfs: fix segment in xfs_buf_item_format_segmentMark Tinguely
Not every segment in a multi-segment buffer is dirty in a transaction and they will not be outputted. The assert in xfs_buf_item_format_segment() that checks for the at least one chunk of data in the segment to be used is not necessary true for multi-segmented buffers. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16xfs: rename bli_format to avoid confusion with bli_formatsMark Tinguely
Rename the bli_format structure to __bli_format to avoid accidently confusing them with the bli_formats pointer. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16xfs: use b_maps[] for discontiguous buffersMark Tinguely
Commits starting at 77c1a08 introduced a multiple segment support to xfs_buf. xfs_trans_buf_item_match() could not find a multi-segment buffer in the transaction because it was looking at the single segment block number rather than the multi-segment b_maps[0].bm.bn. This results on a recursive buffer lock that can never be satisfied. This patch: 1) Changed the remaining b_map accesses to be b_maps[0] accesses. 2) Renames the single segment b_map structure to __b_map to avoid future confusion. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-12-21Linux 3.8-rc1v3.8-rc1Linus Torvalds
2012-12-21Merge git://www.linux-watchdog.org/linux-watchdogLinus Torvalds
Pull watchdog updates from Wim Van Sebroeck: "This includes some fixes and code improvements (like clk_prepare_enable and clk_disable_unprepare), conversion from the omap_wdt and twl4030_wdt drivers to the watchdog framework, addition of the SB8x0 chipset support and the DA9055 Watchdog driver and some OF support for the davinci_wdt driver." * git://www.linux-watchdog.org/linux-watchdog: (22 commits) watchdog: mei: avoid oops in watchdog unregister code path watchdog: Orion: Fix possible null-deference in orion_wdt_probe watchdog: sp5100_tco: Add SB8x0 chipset support watchdog: davinci_wdt: add OF support watchdog: da9052: Fix invalid free of devm_ allocated data watchdog: twl4030_wdt: Change TWL4030_MODULE_PM_RECEIVER to TWL_MODULE_PM_RECEIVER watchdog: remove depends on CONFIG_EXPERIMENTAL watchdog: Convert dev_printk(KERN_<LEVEL> to dev_<level>( watchdog: DA9055 Watchdog driver watchdog: omap_wdt: eliminate goto watchdog: omap_wdt: delete redundant platform_set_drvdata() calls watchdog: omap_wdt: convert to devm_ functions watchdog: omap_wdt: convert to new watchdog core watchdog: WatchDog Timer Driver Core: fix comment watchdog: s3c2410_wdt: use clk_prepare_enable and clk_disable_unprepare watchdog: imx2_wdt: Select the driver via ARCH_MXC watchdog: cpu5wdt.c: add missing del_timer call watchdog: hpwdt.c: Increase version string watchdog: Convert twl4030_wdt to watchdog core davinci_wdt: preparation for switch to common clock framework ...
2012-12-21Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "Misc small cifs fixes" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs: eliminate cifsERROR variable cifs: don't compare uniqueids in cifs_prime_dcache unless server inode numbers are in use cifs: fix double-free of "string" in cifs_parse_mount_options
2012-12-21Merge tag 'dm-3.8-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull dm update from Alasdair G Kergon: "Miscellaneous device-mapper fixes, cleanups and performance improvements. Of particular note: - Disable broken WRITE SAME support in all targets except linear and striped. Use it when kcopyd is zeroing blocks. - Remove several mempools from targets by moving the data into the bio's new front_pad area(which dm calls 'per_bio_data'). - Fix a race in thin provisioning if discards are misused. - Prevent userspace from interfering with the ioctl parameters and use kmalloc for the data buffer if it's small instead of vmalloc. - Throttle some annoying error messages when I/O fails." * tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (36 commits) dm stripe: add WRITE SAME support dm: remove map_info dm snapshot: do not use map_context dm thin: dont use map_context dm raid1: dont use map_context dm flakey: dont use map_context dm raid1: rename read_record to bio_record dm: move target request nr to dm_target_io dm snapshot: use per_bio_data dm verity: use per_bio_data dm raid1: use per_bio_data dm: introduce per_bio_data dm kcopyd: add WRITE SAME support to dm_kcopyd_zero dm linear: add WRITE SAME support dm: add WRITE SAME support dm: prepare to support WRITE SAME dm ioctl: use kmalloc if possible dm ioctl: remove PF_MEMALLOC dm persistent data: improve improve space map block alloc failure message dm thin: use DMERR_LIMIT for errors ...
2012-12-21Revert "nfsd: warn on odd reply state in nfsd_vfs_read"J. Bruce Fields
This reverts commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e. This is obviously wrong, and I have no idea how I missed seeing the warning in testing: I must just not have looked at the right logs. The caller bumps rq_resused/rq_next_page, so it will always be hit on a large enough read. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-21Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull more infiniband changes from Roland Dreier: "Second batch of InfiniBand/RDMA changes for 3.8: - cxgb4 changes to fix lookup engine hash collisions - mlx4 changes to make flow steering usable - fix to IPoIB to avoid pinning dst reference for too long" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/cxgb4: Fix bug for active and passive LE hash collision path RDMA/cxgb4: Fix LE hash collision bug for passive open connection RDMA/cxgb4: Fix LE hash collision bug for active open connection mlx4_core: Allow choosing flow steering mode mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV mlx4_core: Fix error flow in the flow steering wrapper mlx4_core: Add QPN enforcement for flow steering rules set by VFs cxgb4: Add LE hash collision bug fix path in LLD driver cxgb4: Add T4 filter support IPoIB: Call skb_dst_drop() once skb is enqueued for sending
2012-12-21Merge tag 'asm-generic' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic cleanup from Arnd Bergmann: "These are a few cleanups for asm-generic: - a set of patches from Lars-Peter Clausen to generalize asm/mmu.h and use it in the architectures that don't need any special handling. - A patch from Will Deacon to remove the {read,write}s{b,w,l} as discussed during the arm64 review - A patch from James Hogan that helps with the meta architecture series." * tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: xtensa: Use generic asm/mmu.h for nommu h8300: Use generic asm/mmu.h c6x: Use generic asm/mmu.h asm-generic/mmu.h: Add support for FDPIC asm-generic/mmu.h: Remove unused vmlist field from mm_context_t asm-generic: io: remove {read,write} string functions asm-generic/io.h: remove asm/cacheflush.h include
2012-12-21ARM: dts: fix duplicated build target and alphabetical sort out for exynosKukjin Kim
Commit db5b0ae00712 ("Merge tag 'dt' of git://git.kernel.org/.../arm-soc") causes a duplicated build target. This patch fixes it and sorts out the build target alphabetically so that we can recognize something wrong easily. Cc: Olof Johansson <olof@lixom.net> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-21dm stripe: add WRITE SAME supportMike Snitzer
Rename stripe_map_discard to stripe_map_range and reuse it for WRITE SAME bio processing. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: remove map_infoMikulas Patocka
This patch removes map_info from bio-based device mapper targets. map_info is still used for request-based targets. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm snapshot: do not use map_contextMikulas Patocka
Eliminate struct map_info from dm-snap. map_info->ptr was used in dm-snap to indicate if the bio was tracked. If map_info->ptr was non-NULL, the bio was linked in tracked_chunk_hash. This patch removes the use of map_info->ptr. We determine if the bio was tracked based on hlist_unhashed(&c->node). If hlist_unhashed is true, the bio is not tracked, if it is false, the bio is tracked. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm thin: dont use map_contextMikulas Patocka
This patch removes endio_hook_pool from dm-thin and uses per-bio data instead. This patch removes any use of map_info in preparation for the next patch that removes map_info from bio-based device mapper. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm raid1: dont use map_contextMikulas Patocka
Don't use map_info any more in dm-raid1. map_info was used for writes to hold the region number. For this purpose we add a new field dm_bio_details to dm_raid1_bio_record. map_info was used for reads to hold a pointer to dm_raid1_bio_record (if the pointer was non-NULL, bio details were saved; if the pointer was NULL, bio details were not saved). We use dm_raid1_bio_record.details->bi_bdev for this purpose. If bi_bdev is NULL, details were not saved, if bi_bdev is non-NULL, details were saved. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm flakey: dont use map_contextMikulas Patocka
Replace map_info with a per-bio structure "struct per_bio_data" in dm-flakey. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm raid1: rename read_record to bio_recordMikulas Patocka
Rename struct read_record to bio_record in dm-raid1. In the following patch, the structure will be used for both read and write bios, so rename it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: move target request nr to dm_target_ioMikulas Patocka
This patch moves target_request_nr from map_info to dm_target_io and makes it accessible with dm_bio_get_target_request_nr. This patch is a preparation for the next patch that removes map_info. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm snapshot: use per_bio_dataMikulas Patocka
Replace tracked_chunk_pool with per_bio_data in dm-snap. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm verity: use per_bio_dataMikulas Patocka
Replace io_mempool with per_bio_data in dm-verity. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm raid1: use per_bio_dataMikulas Patocka
Replace read_record_pool with per_bio_data in dm-raid1. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: introduce per_bio_dataMikulas Patocka
Introduce a field per_bio_data_size in struct dm_target. Targets can set this field in the constructor. If a target sets this field to a non-zero value, "per_bio_data_size" bytes of auxiliary data are allocated for each bio submitted to the target. These data can be used for any purpose by the target and help us improve performance by removing some per-target mempools. Per-bio data is accessed with dm_per_bio_data. The argument data_size must be the same as the value per_bio_data_size in dm_target. If the target has a pointer to per_bio_data, it can get a pointer to the bio with dm_bio_from_per_bio_data() function (data_size must be the same as the value passed to dm_per_bio_data). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm kcopyd: add WRITE SAME support to dm_kcopyd_zeroMike Snitzer
Add WRITE SAME support to dm-io and make it accessible to dm_kcopyd_zero(). dm_kcopyd_zero() provides an asynchronous interface whereas the blkdev_issue_write_same() interface is synchronous. WRITE SAME is a SCSI command that can be leveraged for more efficient zeroing of a specified logical extent of a device which supports it. Only a single zeroed logical block is transfered to the target for each WRITE SAME and the target then writes that same block across the specified extent. The dm thin target uses this. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm linear: add WRITE SAME supportMike Snitzer
The linear target can already support WRITE SAME requests so signal this by setting num_write_same_requests to 1. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: add WRITE SAME supportMike Snitzer
WRITE SAME bios have a payload that contain a single page. When cloning WRITE SAME bios DM has no need to modify the bi_io_vec attributes (and doing so would be detrimental). DM need only alter the start and end of the WRITE SAME bio accordingly. Rather than duplicate __clone_and_map_discard, factor out a common function that is also used by __clone_and_map_write_same. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: prepare to support WRITE SAMEMike Snitzer
Allow targets to opt in to WRITE SAME support by setting 'num_write_same_requests' in the dm_target structure. A dm device will only advertise WRITE SAME support if all its targets and all its underlying devices support it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm ioctl: use kmalloc if possibleMikulas Patocka
If the parameter buffer is small enough, try to allocate it with kmalloc() rather than vmalloc(). vmalloc is noticeably slower than kmalloc because it has to manipulate page tables. In my tests, on PA-RISC this patch speeds up activation 13 times. On Opteron this patch speeds up activation by 5%. This patch introduces a new function free_params() to free the parameters and this uses new flags that record whether or not vmalloc() was used and whether or not the input buffer must be wiped after use. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm ioctl: remove PF_MEMALLOCMikulas Patocka
When allocating memory for the userspace ioctl data, set some appropriate GPF flags directly instead of using PF_MEMALLOC. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm persistent data: improve improve space map block alloc failure messageJoe Thornber
Improve space map error message when unable to allocate a new metadata block. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm thin: use DMERR_LIMIT for errorsMike Snitzer
Throttle all errors logged from the IO path by dm thin. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm persistent data: use DMERR_LIMIT for errorsMike Snitzer
Nearly all of persistent-data is in the IO path so throttle error messages with DMERR_LIMIT to limit the amount logged when something has gone wrong. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm block manager: reinstate message when validator failsMike Snitzer
Reinstate a useful error message when the block manager buffer validator fails. This was mistakenly eliminated when the block manager was converted to use dm-bufio. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm raid: round region_size to power of twoJonathan Brassow
If the user does not supply a bitmap region_size to the dm raid target, a reasonable size is computed automatically. If this is not a power of 2, the md code will report an error later. This patch catches the problem early and rounds the region_size to the next power of two. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm thin: cleanup dead codeJoe Thornber
Remove unused @data_block parameter from cell_defer. Change thin_bio_map to use many returns rather than setting a variable. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm thin: rename cell_defer_except to cell_defer_no_holderJoe Thornber
Rename cell_defer_except() to cell_defer_no_holder() which describes its function more clearly. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm snapshot: optimize track_chunkMikulas Patocka
track_chunk is always called with interrupts enabled. Consequently, we do not need to save and restore interrupt state in "flags" variable. This patch changes spin_lock_irqsave to spin_lock_irq and spin_unlock_irqrestore to spin_unlock_irq. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm raid: use DM_ENDIO_INCOMPLETEMikulas Patocka
Use a defined macro DM_ENDIO_INCOMPLETE instead of a numeric constant. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm raid1: remove impossible mempool_alloc error testMikulas Patocka
mempool_alloc can't fail if __GFP_WAIT is specified, so the condition that tests if read_record is non-NULL is always true. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm thin: emit ignore_discard in status when discards disabledMike Snitzer
If "ignore_discard" is specified when creating the thin pool device then discard support is disabled for that device. The pool device's status should reflect this fact rather than stating "no_discard_passdown" (which implies discards are enabled but passdown is disabled). Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>