summaryrefslogtreecommitdiff
path: root/drivers/scsi
AgeCommit message (Collapse)Author
2011-03-02scsi_dh_alua: Add IBM Power Virtual SCSI ALUA device to dev listBrian King
commit 22963a37b3437a25812cc856afa5a84ad4a3f541 upstream. Adds IBM Power Virtual SCSI ALUA devices to the ALUA device handler. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02scsi_dh_alua: add netapp to dev listMike Christie
commit cd4a8814d44672bd2c8f04a472121bfbe193809c upstream. Newer Netapp target software supports ALUA, so this patch adds them to the scsi_dev_alua dev list. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17mpt2sas: Kernel Panic during Large Topology discoveryKashyap, Desai
commit 4224489f45b503f0a1f1cf310f76dc108f45689a upstream. There was a configuration page timing out during the initial port enable at driver load time. The port enable would fail, and this would result in the driver unloading itself, meanwhile the driver was accessing freed memory in another context resulting in the panic. The fix is to prevent access to freed memory once the driver had issued the diag reset which woke up the sleeping port enable process. The routine _base_reset_handler was reorganized so the last sleeping process woken up was the port_enable. Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17mpt2sas: Correct resizing calculation for max_queue_depthKashyap, Desai
commit 11e1b961ab067ee3acaf723531da4d3f23e1d6f7 upstream. The ioc->hba_queue_depth is not properly resized when the controller firmware reports that it supports more outstanding IO than what can be fit inside the reply descriptor pool depth. This is reproduced by setting the controller global credits larger than 30,000. The bug results in an incorrect sizing of the queues. The fix is to resize the queue_size by dividing queue_diff by two. Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17mpt2sas: Fix device removal handshake for zoned devicesKashyap, Desai
commit 4dc2757a2e9a9d1f2faee4fc6119276fc0061c16 upstream. When zoning end devices, the driver is not sending device removal handshake alogrithm to firmware. This results in controller firmware not sending sas topology add events the next time the device is added. The fix is the driver should be doing the device removal handshake even though the PHYSTATUS_VACANT bit is set in the PhyStatus of the event data. The current design is avoiding the handshake when the VACANT bit is set in the phy status. Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17libsas: fix runaway error handler problemJames Bottomley
commit 9ee91f7fb550a4c82f82d9818e42493484c754af upstream. libsas makes use of scsi_schedule_eh() but forgets to clear the host_eh_scheduled flag in its error handling routine. Because of this, the error handler thread never gets to sleep; it's constantly awake and trying to run the error routine leading to console spew and inability to run anything else (at least on a UP system). The fix is to clear the flag as we splice the work queue. Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17fix medium error problems with some arrays which can cause data corruptionJames Bottomley
commit a8733c7baf457b071528e385a0b7d4aaec79287c upstream. Our current handling of medium error assumes that data is returned up to the bad sector. This assumption holds good for all disk devices, all DIF arrays and most ordinary arrays. However, an LSI array engine was recently discovered which reports a medium error without returning any data. This means that when we report good data up to the medium error, we've reported junk originally in the buffer as good. Worse, if the read consists of requested data plus a readahead, and the error occurs in readahead, we'll just strip off the readahead and report junk up to userspace as good data with no error. The fix for this is to have the error position computation take into account the amount of data returned by the driver using the scsi residual data. Unfortunately, not every driver fills in this data, but for those who don't, it's set to zero, which means we'll think a full set of data was transferred and the behaviour will be identical to the prior behaviour of the code (believe the buffer up to the error sector). All modern drivers seem to set the residual, so that should fix up the LSI failure/corruption case. Reported-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits insteadMartin K. Petersen
commit e692cb668fdd5a712c6ed2a2d6f2a36ee83997b4 upstream. When stacking devices, a request_queue is not always available. This forced us to have a no_cluster flag in the queue_limits that could be used as a carrier until the request_queue had been set up for a metadevice. There were several problems with that approach. First of all it was up to the stacking device to remember to set queue flag after stacking had completed. Also, the queue flag and the queue limits had to be kept in sync at all times. We got that wrong, which could lead to us issuing commands that went beyond the max scatterlist limit set by the driver. The proper fix is to avoid having two flags for tracking the same thing. We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the block layer merging functions. The queue_limit 'no_cluster' is turned into 'cluster' to avoid double negatives and to ease stacking. Clustering defaults to being enabled as before. The queue flag logic is removed from the stacking function, and explicitly setting the cluster flag is no longer necessary in DM and MD. Reported-by: Ed Lin <ed.lin@promise.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07bfa: fix system crash when reading sysfs fc_host statisticsKrishna Gudipati
commit 7873ca4e4401f0ecd8868bf1543113467e6bae61 upstream. The port data structure related to fc_host statistics collection is not initialized. This causes system crash when reading the fc_host statistics. The fix is to initialize port structure during driver attach. Signed-off-by: Krishna Gudipati <kgudipat@brocade.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22Fix regressions in scsi_internal_device_blockMike Christie
commit 986fe6c7f50974e871b8ab5a800f5310ea25b361 upstream. Deleting a SCSI device on a blocked fc_remote_port (before fast_io_fail_tmo fires) results in a hanging thread: STACK: 0 schedule+1108 [0x5cac48] 1 schedule_timeout+528 [0x5cb7fc] 2 wait_for_common+266 [0x5ca6be] 3 blk_execute_rq+160 [0x354054] 4 scsi_execute+324 [0x3b7ef4] 5 scsi_execute_req+162 [0x3b80ca] 6 sd_sync_cache+138 [0x3cf662] 7 sd_shutdown+138 [0x3cf91a] 8 sd_remove+112 [0x3cfe4c] 9 __device_release_driver+124 [0x3a08b8] 10 device_release_driver+60 [0x3a0a5c] 11 bus_remove_device+266 [0x39fa76] 12 device_del+340 [0x39d818] 13 __scsi_remove_device+204 [0x3bcc48] 14 scsi_remove_device+66 [0x3bcc8e] 15 sysfs_schedule_callback_work+50 [0x260d66] 16 worker_thread+622 [0x162326] 17 kthread+160 [0x1680b0] 18 kernel_thread_starter+6 [0x10aaea] During the delete, the SCSI device is in moved to SDEV_CANCEL. When the FC transport class later calls scsi_target_unblock, this has no effect, since scsi_internal_device_unblock ignores SCSI devics in this state. It looks like all these are regressions caused by: 5c10e63c943b4c67561ddc6bf61e01d4141f881f [SCSI] limit state transitions in scsi_internal_device_unblock Fix by rejecting offline and cancel in the state transition. Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> [jejb: Original patch by Christof Schmitt, modified by Mike Christie] Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22Fix race when removing SCSI devicesChristof Schmitt
commit 546ae796bfac6399e30da4b5af2cf7a6d0f8a4ec upstream. Removing SCSI devices through echo 1 > /sys/bus/scsi/devices/ ... /delete while the FC transport class removes the SCSI target can lead to an oops: Unable to handle kernel pointer dereference at virtual kernel address 00000000b6815000 Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: sunrpc qeth_l3 binfmt_misc dm_multipath scsi_dh dm_mod ipv6 qeth ccwgroup [last unloaded: scsi_wait_scan] CPU: 1 Not tainted 2.6.35.5-45.x.20100924-s390xdefault #1 Process fc_wq_0 (pid: 861, task: 00000000b7331240, ksp: 00000000b735bac0) Krnl PSW : 0704200180000000 00000000003ff6e4 (__scsi_remove_device+0x24/0xd0) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 Krnl GPRS: 0000000000000001 0000000000000000 00000000b6815000 00000000bc24a8c0 00000000003ff7c8 000000000056dbb8 0000000000000002 0000000000835d80 ffffffff00000000 0000000000001000 00000000b6815000 00000000bc24a7f0 00000000b68151a0 00000000b6815000 00000000b735bc20 00000000b735bbf8 Krnl Code: 00000000003ff6d6: a7840001 brc 8,3ff6d8 00000000003ff6da: a7fbffd8 aghi %r15,-40 00000000003ff6de: e3e0f0980024 stg %r14,152(%r15) >00000000003ff6e4: e31021200004 lg %r1,288(%r2) 00000000003ff6ea: a71f0000 cghi %r1,0 00000000003ff6ee: a7a40011 brc 10,3ff710 00000000003ff6f2: a7390003 lghi %r3,3 00000000003ff6f6: c0e5ffffc8b1 brasl %r14,3f8858 Call Trace: ([<0000000000001000>] 0x1000) [<00000000003ff7d2>] scsi_remove_device+0x42/0x54 [<00000000003ff8ba>] __scsi_remove_target+0xca/0xfc [<00000000003ff99a>] __remove_child+0x3a/0x48 [<00000000003e3246>] device_for_each_child+0x72/0xbc [<00000000003ff93a>] scsi_remove_target+0x4e/0x74 [<0000000000406586>] fc_rport_final_delete+0xb2/0x23c [<000000000015d080>] worker_thread+0x200/0x344 [<000000000016330c>] kthread+0xa0/0xa8 [<0000000000106c1a>] kernel_thread_starter+0x6/0xc [<0000000000106c14>] kernel_thread_starter+0x0/0xc INFO: lockdep is turned off. Last Breaking-Event-Address: [<00000000003ff7cc>] scsi_remove_device+0x3c/0x54 The function __scsi_remove_target iterates through the SCSI devices on the host, but it drops the host_lock before calling scsi_remove_device. When the SCSI device is deleted from another thread, the pointer to the SCSI device in scsi_remove_device can become invalid. Fix this by getting a reference to the SCSI device before dropping the host_lock to keep the SCSI device alive for the call to scsi_remove_device. Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22gdth: integer overflow in ioctlDan Carpenter
commit f63ae56e4e97fb12053590e41a4fa59e7daa74a4 upstream. gdth_ioctl_alloc() takes the size variable as an int. copy_from_user() takes the size variable as an unsigned long. gen.data_len and gen.sense_len are unsigned longs. On x86_64 longs are 64 bit and ints are 32 bit. We could pass in a very large number and the allocation would truncate the size to 32 bits and allocate a small buffer. Then when we do the copy_from_user(), it would result in a memory corruption. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22libsas: fix NCQ mixing with non-NCQDavid Milburn
commit f0ad30d3d2dc924decc0e10b1ff6dc32525a5d99 upstream. Some cards (like mvsas) have issue troubles if non-NCQ commands are mixed with NCQ ones. Fix this by using the libata default NCQ check routine which waits until all NCQ commands are complete before issuing a non-NCQ one. The impact to cards (like aic94xx) which don't need this logic should be minimal Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22sd name space exhaustion causes system hangMichael Reed
commit 1a03ae0f556a931aa3747b70e44b78308f5b0590 upstream. Following a site power outage which re-enabled all the ports on my FC switches, my system subsequently booted with far too many luns! I had let it run hoping it would make multi-user. It didn't. :( It hung solid after exhausting the last sd device, sdzzz, and attempting to create sdaaaa and beyond. I was unable to get a dump. Discovered using a 2.6.32.13 based system. correct this by detecting when the last index is utilized and failing the sd probe of the device. Patch applies to scsi-misc-2.6. Signed-off-by: Michael Reed <mdr@sgi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13ibmvfc: Reduce error recovery timeoutBrian King
commit daa142d1773dd3a986f02a8a4da929608d24daaa upstream. If a command times out resulting in EH getting invoked, we wait for the aborted commands to come back after sending the abort. Shorten the amount of time we wait for these responses, to ensure we don't get stuck in EH for several minutes. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13ibmvfc: Fix command completion handlingBrian King
commit f5832fa2f8dc39adcf3ae348d2d6383163235e79 upstream. Commands which are completed by the VIOS are placed on a CRQ in kernel memory for the ibmvfc driver to process. Each CRQ entry is 16 bytes. The ibmvfc driver reads the first 8 bytes to check if the entry is valid, then reads the next 8 bytes to get the handle, which is a pointer the completed command. This fixes an issue seen on Power 7 where the processor reordered the loads from memory, resulting in processing command completion with a stale handle. This could result in command timeouts, and also early completion of commands. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13aic79xx: check for non-NULL scb in ahd_handle_nonpkt_busfreeHannes Reinecke
commit 534ef056db8a8fb6b9d50188d88ed5d1fbc66673 upstream. When removing several devices aic79xx will occasionally Oops in ahd_handle_nonpkt_busfree during rescan. Looking at the code I found that we're indeed not checking if the scb in question is NULL. So check for it before accessing it. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02SCSI: aacraid: Eliminate use after freeJulia Lawall
commit 8a52da632ceb9d8b776494563df579e87b7b586b upstream. The debugging code using the freed structure is moved before the kfree. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @free@ expression E; position p; @@ kfree@p(E) @@ expression free.E, subE<=free.E, E1; position free.p; @@ kfree@p(E) ... ( subE = E1 | * E ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-07-05qla2xxx: Disable MSI on qla24xx chips other than QLA2432.Ben Hutchings
commit 6377a7ae1ab82859edccdbc8eaea63782efb134d upstream. On specific platforms, MSI is unreliable on some of the QLA24xx chips, resulting in fatal I/O errors under load, as reported in <http://bugs.debian.org/572322> and by some RHEL customers. Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26megaraid_sas: fix for 32bit appsTomas Henzl
commit b3dc1a212e5167984616445990c76056034f8eeb upstream. It looks like this patch - commit 7b2519afa1abd1b9f63aa1e90879307842422dae Author: Yang, Bo <Bo.Yang@lsi.com> Date: Tue Oct 6 14:52:20 2009 -0600 [SCSI] megaraid_sas: fix 64 bit sense pointer truncation has caused a problem for 32bit programs with 64bit os - http://bugzilla.kernel.org/show_bug.cgi?id=15001 fix by converting the user space 32bit pointer to a 64 bit one when needed. [jejb: fix up some 64 bit warnings] Signed-off-by: Tomas Henzl <thenzl@redhat.com> Cc: Bo Yang <Bo.Yang@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12SCSI: Retry commands with UNIT_ATTENTION sense codes to fix ext3/ext4 I/O errorJames Bottomley
commit 77a4229719e511a0d38d9c355317ae1469adeb54 upstream. There's nastyness in the way we currently handle barriers (and discards): They're effectively filesystem commands, but they get processed as BLOCK_PC commands. Unfortunately BLOCK_PC commands are taken by SCSI to be SG_IO commands and the issuer expects to see and handle any returned errors, however trivial. This leads to a huge problem, because the block layer doesn't expect this to happen and any trivially retryable error on a barrier causes an immediate I/O error to the filesystem. The only real way to hack around this is to take the usual class of offending errors (unit attentions) and make them all retryable in the case of a REQ_HARDBARRIER. A correct fix would involve a rework of the entire block and SCSI submit system, and so is out of scope for a quick fix. Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12Enable retries for SYNCRONIZE_CACHE commands to fix I/O errorHannes Reinecke
commit c213e1407be6b04b144794399a91472e0ef92aec upstream. Some arrays are giving I/O errors with ext3 filesystems when SYNCHRONIZE_CACHE gets a UNIT_ATTENTION. What is happening is that these commands have no retries, so the UNIT_ATTENTION causes the barrier to fail. We should be enable retries here to clear any transient error and allow the barrier to succeed. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12scsi_debug: virtual_gb ignores sector_sizeDouglas Gilbert
commit 5447ed6c968e7270b656afa273c2b79d15d82edd upstream. In the scsi_debug driver, the virtual_gb option ignores the sector_size, implicitly assuming that is 512 bytes. So if 'virtual_gb=1 sector_size=4096' the result is an 8 GB (virtual) disk. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12SCSI: libiscsi: regression: fix header digest errorsMike Christie
commit 96b1f96dcab87756c0a1e7ba76bc5dc2add82b88 upstream. This fixes a regression introduced with this commit: commit d3305f3407fa3e9452079ec6cc8379067456e4aa Author: Mike Christie <michaelc@cs.wisc.edu> Date: Thu Aug 20 15:10:58 2009 -0500 [SCSI] libiscsi: don't increment cmdsn if cmd is not sent in 2.6.32. When I moved the hdr->cmdsn after init_task, I added a bug when header digests are used. The problem is that the LLD may calculate the header digest in init_task, so if we then set the cmdsn after the init_task call we change what the digest will be calculated by the target. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12SCSI: fix locking around blk_abort_request()Tejun Heo
commit 70b25f890ce9f0520c64075ce9225a5b020a513e upstream. blk_abort_request() expects queue lock to be held by the caller. Grab it before calling the function. Lack of this synchronization led to infinite loop on corrupt q->timeout_list. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12qla2xxx: Properly handle UNDERRUN completion statuses.Lalit Chandivade
commit 0f00a206ccb1dc644b6770ef25f185610fee6962 upstream. Correct issues where the lower scsi-status would be improperly cleared, instead, allow the midlayer to process the status after the proper residual-count checks are performed. Finally, validate firmware status flags prior to assigning values from the FCP_RSP frame. Signed-off-by: Lalit Chandivade <lalit.chandivade@qlogic.com> Signed-off-by: Michael Hernandez <michael.hernandez@qlogic.com> Signed-off-by: Ravi Anand <ravi.anand@qlogic.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12skip sense logging for some ATA PASS-THROUGH cdbsDouglas Gilbert
commit e7efe5932b1d3916c79326a4221693ea90a900e2 upstream. Further to the lsml thread titled: "does scsi_io_completion need to dump sense data for ata pass through (ck_cond = 1) ?" This is a patch to skip logging when the sense data is associated with a SENSE_KEY of "RECOVERED_ERROR" and the additional sense code is "ATA PASS-THROUGH INFORMATION AVAILABLE". This only occurs with the SAT ATA PASS-THROUGH commands when CK_COND=1 (in the cdb). It indicates that the sense data contains ATA registers. Smartmontools uses such commands on ATA disks connected via SAT. Periodic checks such as those done by smartd cause nuisance entries into logs that are: - neither errors nor warnings - pointless unless the cdb that caused them are also logged Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26SCSI: add scsi target reset support to scsi ioctlMike Christie
commit 3f9daedfcb197d784c6e7ecd731e3aa9859bc951 upstream. The scsi ioctl code path was missing scsi target reset support. This patch just adds it. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26fc class: fail fast bsg requestsMike Christie
commit 2bc1c59dbdefdb6f9767e06efb86bbdb2923a8be upstream. If the port state is blocked and the fast io fail tmo has fired then this patch will fail bsg requests immediately. This is needed if userspace is sending IOs to test the transport like with fcping, so it will not have to wait for the dev loss tmo. With this patch he bsg req fast io fail code behaves like the normal and sg io/passthrough fast io fail. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-By: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26libiscsi: Fix recovery slowdown regressionMike Christie
commit 4ae0a6c15efcc37e94e3f30e3533bdec03c53126 upstream. We could be failing/stopping a connection due to libiscsi starting recovery/cleanup, but the xmit path or scsi eh thread path could be dropping the connection at the same time. As a result the session->state gets set to failed instead of in recovery. We end up not blocking the session and so the replacement timeout never gets started and we only end up failing the IO when scsi_softirq_done sees that the cmd has been running for (cmd->allowed + 1) * rq->timeout secs. We used to fail the IO right away so users are seeing a long delay when using dm-multipath. This problem was added in 2.6.28. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01SCSI: scsi_transport_fc: Fix synchronization issue while deleting vportGal Rosen
commit 0d9dc7c8b9b7fa0f53647423b41056ee1beed735 upstream. The issue occur while deleting 60 virtual ports through the sys interface /sys/class/fc_vports/vport-X/vport_delete. It happen while in a mistake each request sent twice for the same vport. This interface is asynchronous, entering the delete request into a work queue, allowing more than one request to enter to the delete work queue. The result is a NULL pointer. The first request already delete the vport, while the second request got a pointer to the vport before the device destroyed. Re-create vport later cause system freeze. Solution: Check vport flags before entering the request to the work queue. [jejb: fixed int<->long problem on spinlock flags variable] Signed-off-by: Gal Rosen <galr@storwize.com> Acked-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01mvsas: add support for Adaptec ASC-1045/1405 SAS/SATA HBASrinivas
commit 7ec4ad0125db0222e397508c190b01c8f2b5f7cd upstream. This is support for Adaptec ASC-1045/1405 SAS/SATA HBA on mvsas, which is based on Marvell 88SE6440 chipset. Signed-off-by: Srinivas <satyasrinivasp@hcl.in> Cc: Andy Yan <ayan@marvell.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: Thomas Voegtle <tv@lio96.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01drivers/scsi/ses.c: eliminate double freeJulia Lawall
commit 9b3a6549b2602ca30f58715a0071e29f9898cae9 upstream. The few lines below the kfree of hdr_buf may go to the label err_free which will also free hdr_buf. The most straightforward solution seems to be to just move the kfree of hdr_buf after these gotos. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ identifier E; expression E1; iterator I; statement S; @@ *kfree(E); ... when != E = E1 when != I(E,...) S when != &E *kfree(E); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15SCSI: qla1280: Drop host_lock while requesting firmwareBen Hutchings
commit 2cec802980727f1daa46d8c31b411e083d49d7a2 upstream. request_firmware() may sleep and it appears to be safe to release the spinlock here. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15SCSI: qla2xxx: Obtain proper host structure during response-queue processing.Anirban Chakraborty
commit a67093d46e3caed1a42d694a7de452b61db30562 upstream. Original code incorrectly assumed only status-type-0 IOCBs would be queued to the response-queue, and thus all entries would safely reference a VHA from the IOCB 'handle.' Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15mpt2sas: Delete volume before HBA detach.Kashyap, Desai
commit d7384b28afb2bf2b7be835ddc8c852bdc5e0ce1c upstream. The driver hangs when doing `rmmod mpt2sas` if there are any IR volumes present.The hang is due the scsi midlayer trying to access the IR volumes after the driver releases controller resources. Perhaps when scsi_remove_host is called,the scsi mid layer is sending some request. This doesn't occur for bare drives becuase the driver is already reporting those drives deleted prior to calling mpt2sas_base_detach. To solve this issue, we need to delete the volumes as well. Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Reviewed-by: Eric Moore <eric.moore@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15ARM: 5944/1: scsi: fix timer setup in fas216.cGuennadi Liakhovetski
commit b857df1acc634b18db1db2a40864af985100266e upstream. mod_timer() takes an absolute time and not a delay as its argument. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09scsi_lib: Fix bug in completion of bidi commandsBoaz Harrosh
commit 63c43b0ec1765b74c734d465ba6345ef4f434df8 upstream. Because of the terrible structuring of scsi-bidi-commands it breaks some of the life time rules of a scsi-command. It is now not allowed to free up the block-request before cleanup and partial deallocation of the scsi-command. (Which is not so for none bidi commands) The right fix to this problem would be to make bidi command a first citizen by allocating a scsi_sdb pointer at scsi command just like cmd->prot_sdb. The bidi sdb should be allocated/deallocated as part of the get/put_command (Again like the prot_sdb) and the current decoupling of scsi_cmnd and blk-request should be kept. For now make sure scsi_release_buffers() is called before the call to blk_end_request_all() which might cause the suicide of the block requests. At best the leak of bidi buffers, at worse a crash, as there is a race between the existence of the bidi_request and the free of the associated bidi_sdb. The reason this was never hit before is because only OSD has the potential of doing asynchronous bidi commands. (So does bsg but it is never used) And OSD clients just happen to do all their bidi commands synchronously, up until recently. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28fcoe: Fix getting san mac for VLAN interfaceYi Zou
commit 5bab87e6d465d54a2b5899e0f583d42f00dbee2e upstream. Make sure we are get the SAN MAC address from the real netdev if the input netdev is a VLAN device. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28fcoe: Fix checking san mac addressYi Zou
commit bf361707c81f8e8e43e332bfc8838bae76ae021a upstream. This was fixed before in 7a7f0c7 but it's introduced again recently. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28fcoe, libfc: fix an libfc issue with queue ramp down in libfcVasu Dev
commit 14caf44c69184ed72d46a2f883311daf27a4192f upstream. The cmd_per_lun value is used by scsi-ml as fall back lowest queue_depth value but in case of libfc cmd_per_lun is set to same value as max queue_depth = 32. So this patch reduces cmd_per_lun value to 3 and configures each lun with default max queue_depth 32 in fc_slave_alloc. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Acked-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28libfc: remote port gets stuck in restart state without really restartingAbhijeet Joglekar
commit 5543c72e2bbb30e5ba5938b18ec26617b8b3fb04 upstream. We ran into a scenario where a remote port goes into RESTART state, but never gets added to scsi transport. The running vmcore showed the following: a) Port was in RESTART state b) rdata->event was STOP c) no work gets scheduled for the remote work to fc_rport_work After this point, shut/no-shut of the remote port did not cause the port to get re-discovered. The port would move betwen DELETE and RESTART states, but the event would always be STOP, no work would get scheduled to fc_rport_work and the port would not get added to scsi_transport. The problem is that rdata->event is not set to NONE after a port is restarted. After this point, no more work gets scheduled for the remote port since new work is scheduled only if rdata->event is non-NONE. So, the event and state keep changing, but fc_rport_work does not get scheduled to actually handle the event. Here's a transition of states that explains the above observation: ) Port is first in READY State, event is NONE 2) RSCN on shut, port goes to DELETED, event is stop 3) Before fc_rport_work runs, RSCN on no-shut, port goes to RESTART, event is still STOP 4) fc_rport_work gets scheduled, removes the port from transport, sees state as RESTART, begins the PLOGI state machine, event remains as STOP (event NOT changed to NONE, this is the bug) 5) Plogi state machine completes, port state goes to READY, event goes to READY, but no work is scheduled since event was STOP (non-NONE) before. Fc_rport_work is not scheduled, port remains in READY state, but is not added to transport. Things are broken at this point. Libfc rport is ready, but no transport rport created. 6) now a shut causes port state to change to DELETE, event to change to STOP, no work gets scheduled 7) no-shut causes port state to change to RESTART, event remains at STOP, no work gets scheduled (6) and (7) now get repeated everytime we do shut/no-shut. No way to get out of this state. Fcc reset does not help too. Only way to get out is to load/unload module. Fix is to set rdata->event to NONE while processing the STOP/LOGO/FAILED events, inside the discovery and rport locks. Signed-off-by: Abhijeet Joglekar <abjoglek@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28libfc: fix free of fc_rport_priv with timer pendingJoe Eykholt
commit b4a9c7ede96e90f7b1ec009ce7256059295e76df upstream. Timer crashes were caused by freeing a struct fc_rport_priv with a timer pending, causing the timer facility list to be corrupted. This was during FC uplink flap tests with a lot of targets. After discovery, we were doing an PLOGI on an rdata that was in DELETE state but not yet removed from the lookup list. This moved the rdata from DELETE state to PLOGI state. If the PLOGI exchange allocation failed and needed to be retried, the timer scheduling could race with the free being done by fc_rport_work(). When fc_rport_login() is called on a rport in DELETE state, move it to a new state RESTART. In fc_rport_work, when handling a LOGO, STOPPED or FAILED event, look for restart state. In the RESTART case, don't take the rdata off the list and after the transport remote port is deleted and exchanges are reset, re-login to the remote port. Note that the new RESTART state also corrects a problem we had when re-discovering a port that had moved to DELETE state. In that case, a new rdata was created, but the old rdata would do an exchange manager reset affecting the FC_ID for both the new rdata and old rdata. With the new state, the new port isn't logged into until after any old exchanges are reset. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28libfc: fix memory corruption caused by double frees and bad error handlingChris Leech
commit 8f550f937e9fdafa5c37e348e214aecec851ef3f upstream. I was running into several different panics under stress, which I traced down to a few different possible slab corruption issues in error handling paths. I have not yet looked into why these exchange sends fail, but with these fixes my test system is much more stable under stress than before. fc_elsct_send() could fail and either leave the passed in frame intact (failure in fc_ct/els_fill) or the frame could have been freed if the failure was is fc_exch_seq_send(). The caller had no way of knowing, and there was a potential double free in the error handling in fc_fcp_rec(). Make fc_elsct_send() always free the frame before returning, and remove the fc_frame_free() call in fc_fcp_rec(). While fc_exch_seq_send() did always consume the frame, there were double free bugs in the error handling of fc_fcp_cmd_send() and fc_fcp_srr() as well. Numerous calls to error handling routines (fc_disc_error(), fc_lport_error(), fc_rport_error_retry() ) were passing in a frame pointer that had already been freed in the case of an error. I have changed the call sites to pass in a NULL pointer, but there may be more appropriate error codes to use. Question: Why do these error routines take a frame pointer anyway? I understand passing in a pointer encoded error to the response handlers, but the error routines take no action on a valid pointer and should never be called that way. Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28libfc: Fix frags in frame exceeding SKB_MAX_FRAGS in fc_fcp_send_dataYi Zou
commit d37322a43ebac79eef417149f5696390cf8872db upstream. In case of sequence offload, in fc_fcp_send_data(), the skb_fill_page_info() called may end up adding more frags to the skb_shinfo(fp_skb(fp))->frags[], exceeding SKB_MAX_FRAGS, this eventually corrupts the memory. I am adding the FR_FRAME_SG_LEN back, but as SKB_MAX_FRAGS -1, leaving 1 for our fcoe_eof_crc page. And send will be broken into multiple large sends if the frame already contains more frags than skb handle. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28fcoe: initialize return value in fcoe_destroyMike Christie
commit 8eca355fa8af660557fbdd5506bde1392eee9bfe upstream. When doing echo ethX > /sys..../destroy I am getting errors when the tear down succeeds. It looks like the reason for this is because the rc var is not getting set when the destruction works. This just sets it to zero. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28libfc: don't WARN_ON in lport_timeout for RESET stateJoe Eykholt
commit 22655ac22289d7b7def8ef2d72eafe5024bd57fe upstream. It's possible and harmless to get FLOGI timeouts while in RESET state. Don't do a WARN_ON in that case. Also, split out the other WARN_ONs in fc_lport_timeout, so we can tell which one is hit by its line number. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28libfc: lport: fix minor documentation errorsJoe Eykholt
commit 1b69bc062c2a4c8f3e15ac69f487afec3aa8d774 upstream. Fix minor errors. A debug message said an RLIR was received instead of ECHO. "Expected" was misspelled in several places. Fix a type cast from u32 to __be32. Rob, Some of these may have been also taken care of in your other doc cleanup patch. Feel free to fold them in. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28libfc: Fix wrong scsi return status under FC_DATA_UNDRUNYi Zou
commit 4347fa66878e079766258bc0d077c350cb31a799 upstream. This bug is exposed when there is a link flap in LLD. Particularly, when it happens right after a SCSI write command is sent out, no FCP_DATA is sent, causing fsp->status_code to be set as FC_DATA_UNDRUN in fc_fcp_complete_locked even no SCSI status is received. Consequently, fc_io_compl treats this as DID_OK. This results in SCSI returning successful to the initial I/O request even there is no DATA actually sent. Particularly, if you run an I/O tool w/ data verification on, the read back for verification is gonna fail. This is fixed here by checking when FC_DATA_UNDRUN happens, SCSI status is received w/ FC_SRB_RCV_STATUS set in fsp->state. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28fcoe: remove redundant checking of netdev->netdev_opsYi Zou
commit b04d023cf5b7f4113cc4a09405c2fe8003bfe37d upstream. Remove the redundant checking of netdev->netdev_ops as it will never be NULL. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>