From e3c1b0712fdb039d9139ac0910be9a658c9cb65e Mon Sep 17 00:00:00 2001 From: shaoyunl Date: Tue, 16 Feb 2021 12:50:42 -0500 Subject: drm/amdgpu: Reset the devices in the XGMI hive duirng probe In passthrough configuration, hypervisior will trigger the SBR(Secondary bus reset) to the devices without sync to each other. This could cause device hang since for XGMI configuration, all the devices within the hive need to be reset at a limit time slot. This serial of patches try to solve this issue by co-operate with new SMU which will only do minimum house keeping to response the SBR request but don't do the real reset job and leave it to driver. Driver need to do the whole sw init and minimum HW init to bring up the SMU and trigger the reset(possibly BACO) on all the ASICs at the same time Signed-off-by: shaoyunl Acked-by: Andrey Grodzovsky andrey.grodzovsky@amd.com Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c') diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c index 7f527f8bbdb1..e2ec71321c7a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c @@ -441,7 +441,7 @@ bool amdgpu_ras_eeprom_check_err_threshold(struct amdgpu_device *adev) if (!__is_ras_eeprom_supported(adev)) return false; - if (con->eeprom_control.tbl_hdr.header == EEPROM_TABLE_HDR_BAD) { + if (con && (con->eeprom_control.tbl_hdr.header == EEPROM_TABLE_HDR_BAD)) { dev_warn(adev->dev, "This GPU is in BAD status."); dev_warn(adev->dev, "Please retire it or setting one bigger " "threshold value when reloading driver.\n"); -- cgit v1.2.3