[ofw] Opensm & WinMad: a race, cauing BSOD722

Discussion:

Hefty, Sean

2012-01-25 20:09:05 UTC

We got a BSOD in Opensm - 10D, {b, 76157f00, 0, 8811d008}.
Could you take a look ?

I looked into this, and I can't say that I see anything wrong in the code. :(

Seems like BSOD has been caused by a race between the main and MAD reading
threads of Opensm.
The main thread has already closed the port and is now found in
opensm_main
...
osm_mad_pool_destroy(&p_osm->mad_pool);
osm_vendor_delete(&p_osm->p_vendor); //
port release
osm_subn_destroy(&p_osm->subn);
// the thread is found here now
opensm!umad_receiver
libibumad!umad_recv
...
winmad!WmIoRead
winmad!WmProviderRead
WdfObjectAcquireLock(pProvider->ReadQueue); // BSOD
A try to ReadQueue with !wdfqueue fails.
Seems like pProvider is already released. But there is no any checks of its
validity in WmProviderRead().

The pProvider->Ref is set to 0, which strongly suggests that the provider has been released.

Maybe WmIoRead() should check, that the Provider is not being released and
take some reference, while WmProviderRemoveHandler() should wait to this
reference to be removed ?

The provider object is (supposed to be) bound to the lifetime of the open ControlDevice file. It is initialized in the EvtFileCreate callback and released in the EvtFileCleanup callback. According to the MS documentation, the EvtFileCleanup is called after the last handle to the file has been closed. My assumption was that this meant that the file is no longer accessible for any other access (ioctls, reads, or writes).

There is a vague note in the documentation that states: "(Because of outstanding I/O requests, this handle might not have been released.)" I have no idea what exactly this means. If it means that Windows may invoke calls on a file during or after calling the EvtFileCleanup, then Windows is seriously stupid.

As a simple test, we can *try* adding checks in wm_driver.c in WmIoDeviceControl(), WmIoRead(), and WmIoWrite() that do something like:

if (prov->Ref == 0) {
WdfRequestComplete(Request, STATUS_WINDOWS_IS_STUPID);
return;
}

(A better solution may be to call WmProviderGet() / WmProviderPut(), with WmProviderGet() returning whether or not we actually obtained the provider.) What we really need to determine is whether Windows will invoke calls on a file during or after calling the cleanup event callback, but I have no idea how we can know that. And if it does, is it a 'feature' or a bug. If windows does not do this, then the check above isn't a safe fix, since it depends on the prov memory being accessible.

- Sean

Hefty, Sean

2012-01-26 16:35:55 UTC

Permalink

Post by Hefty, Sean

Post by Hefty, Sean
If it means that Windows may invoke calls on a file during or after calling

the EvtFileCleanup, then Windows is seriously stupid.
A quote from ms-
help://MS.WDK.v10.7600.091201/Kernel_d/hh/Kernel_d/DrvComps_72d3e802-d17e-
DispatchCleanup Routines
A driver's DispatchCleanup routine handles IRPs for the IRP_MJ_CLEANUP I/O
function code.
...
While processing an IRP_MJ_CLEANUP request, a driver can receive additional
requests, such as IRP_MJ_READ or IRP_MJ_WRITE. Therefore, a driver that must
deallocate resources must also synchronize execution of its DispatchCleanup
routine with other dispatch routines, such as DispatchRead and DispatchWrite.

Bah - then that's just a stupid design. It means that the cleanup request can't actually do anything useful, like, say clean up resources, since there may be other operations in progress at the same time. Whatever ends up being used to synchronize between the cleanup processing and read/write requests can't actually be cleaned up until some future request. So, why even have the dumb IRP in the first place...

Hefty, Sean

2012-01-26 17:04:08 UTC

Permalink

1) Don't use IRP_MJ_READ and IRP_MJ_WRITE. In other words, replace them with
device io control which as far as I understand will not be called after the
call to IRP_MJ_CLEANUP.

I used read/write because that's what was used by umad, and I was trying to match those semantics. I'd prefer to keep this usage model.

Hopefully the device IO control does not have this issue, or there would likely be a problem in this same area with winverbs.

(http://msdn.microsoft.com/en-
us/library/windows/hardware/ff550720(v=vs.85).aspx)
"When Sent
Receipt of this request indicates that the last handle of the file object that
is associated with the target device object has been closed and released. All
outstanding I/O requests have been completed or canceled."

Yes - it looks like cleanup needs to be split between the cleanup and close IRPs, with extra synchronization added to handle things correctly. The cleanup of the synchronization object must be delayed until close.

- Sean

Hefty, Sean

2012-01-27 01:24:08 UTC

Permalink

The attached patch will hopefully fix the problem. I have not been able to test this. I see errors trying to load the mlx4_bus at the head of the svn tree.

MLX4_SF_8_0_0: QUERY_FW command failed with error -19.
The adapter card is non-functional.
Most likely a FW problem.

I swapped the HCA with one from another system, but still see the error. The OFED 3.0 release works fine. I will try backing out my changes and testing with the head of the svn tree tomorrow or early next week, though I don't see why the changes would affect loading the driver.

- Sean

Hefty, Sean

2012-01-27 02:05:49 UTC

Permalink

Post by Hefty, Sean
I swapped the HCA with one from another system, but still see the error. The
OFED 3.0 release works fine. I will try backing out my changes and testing
with the head of the svn tree tomorrow or early next week, though I don't see
why the changes would affect loading the driver.

Ok, my build went faster than I expected. I was able to apply this patch to the 3.0 code release and verify that it at least doesn't crash the system.

- Sean

Hefty, Sean

2012-01-30 22:07:23 UTC

Permalink

WmProviderInit() is called without checking the return status. Is there a
reason ?
Seems like the similar patch is needed for WvIoDeviceControl().

I can't tell whether IOCTLs suffer from the same problem or not. But since Windows is stupid, I went ahead and added the same protection to winverbs, plus some additional validation in case we get a cleanup event for a file for which we failed to create.

- Sean

Hefty, Sean

2012-01-31 15:50:16 UTC

Permalink

I've replaced mutex by spinlock - see below.
I did it also for WinMad, albeit it has no asynchronous callbacks like
WinVerbs.
The main reason is to keep it similar to WinVerbs as it is today.
A minor, mostly theoretical one: there are other functions, which are using
today the provider mutex. It seems for me worthful to keep for them
possibility to call a low-level WvProviderGet function.
What's your opinion ?

As a theoretical preference, I prefer to use a guarded mutex wherever possible, specifically so we know that the objects being protected may not be accessible from DPC, which gives us a convenient stack dump. My guess is that it's faster than a spinlock as well. The other issue is that if we're trying to synchronize between a passive level thread and one at DPC, then the passive level thread can't acquire, then release a spinlock. That puts the thread back at passive level, which the DPC thread could then starve. Does any of this apply to this patch? Probably not...

I think an easier fix for the crash is to simply call InterlockedIncrement() directly in the offending callback, rather than WvProviderGet(). This would clarify in the code that we must already be holding a reference on the provider and that destruction during that time is not something which needs to be handled. I.e. a call to WvProviderGet() would always succeed. But if you've already committed your change, I'm fine leaving it.

Hefty, Sean

2012-02-02 16:28:03 UTC

Permalink

WmProviderDeregister(pRegistration->pProvider, pRegistration);
pRegistration->pDevice->IbInterface.destroy_qp(pRegistration->hQp,
NULL);
pRegistration->pDevice->IbInterface.dealloc_pd(pRegistration->hPd,
NULL);

pRegistration->pDevice->IbInterface.close_ca(pRegistration->hCa, NULL);

Could you suggest some idea ?

winmad does not explicitly allocate any address handles. Can you tell if there are any mads which were not returned to the free pool? You could try replacing the NULLs in the above code with ib_sync_destroy (unsure of exact name).

Hefty, Sean

2012-02-02 18:04:20 UTC

Permalink

I don't see anything that stands out as a bug in the cleanup code. But with a ref_cnt that high, it seems unlikely that a small window in the cleanup code would result with that many mads being missed. I need to spend more time reviewing the code.

Have you seen this as a consistent issue, or is this the first time that it's happened?

-----Original Message-----
Sent: Thursday, February 02, 2012 8:42 AM
To: Hefty, Sean; Tzachi Dar; Smith, Stan
Cc: Uri Habusha; ofw_list; Irena Gannon
Subject: RE: opensm stuck upon kill
I do not have the crashed machine more.
It was rebooted and the full dump creation failed.
I can't say about MADs, but I found only one place where an AV is created and
attached to PD - in the send_mad call.
And I saw that PD has ref_cnt = 227.
I think these are references of not released AVs i.e. MADs.
Could you tell me where I can see not released MADs ?
The stuck happened after WmProviderDeregister() and destroy_qp.
WmProviderDeregister is to release all the queued MADs.
Could there be some MADs that are already or yet not in the queue ?
-----Original Message-----
Sent: Thursday, February 02, 2012 6:28 PM
To: Leonid Keller; Tzachi Dar; Smith, Stan
Cc: Uri Habusha; ofw_list; Irena Gannon
Subject: RE: opensm stuck upon kill

pRegistration->pDevice->IbInterface.close_ca(pRegistration->hCa, NULL);

Could you suggest some idea ?

winmad does not explicitly allocate any address handles. Can you tell if
there are any mads which were not returned to the free pool? You could try
replacing the NULLs in the above code with ib_sync_destroy (unsure of exact
name).

Smith, Stan

2012-02-02 18:50:33 UTC

Permalink

-----Original Message-----
Sent: Thursday, February 02, 2012 8:42 AM
To: Hefty, Sean; Tzachi Dar; Smith, Stan
Cc: Uri Habusha; ofw_list; Irena Gannon
Subject: RE: opensm stuck upon kill
I do not have the crashed machine more.
It was rebooted and the full dump creation failed.
I can't say about MADs, but I found only one place where an AV is created and attached to PD - in the send_mad call.
And I saw that PD has ref_cnt = 227.
I think these are references of not released AVs i.e. MADs.
Could you tell me where I can see not released MADs ?
The stuck happened after WmProviderDeregister() and destroy_qp.
WmProviderDeregister is to release all the queued MADs.
Could there be some MADs that are already or yet not in the queue ?

Check opensm\user\libvendor\osm_vendor_ibumad.c

-----Original Message-----
Sent: Thursday, February 02, 2012 6:28 PM
To: Leonid Keller; Tzachi Dar; Smith, Stan
Cc: Uri Habusha; ofw_list; Irena Gannon
Subject: RE: opensm stuck upon kill

pRegistration->pDevice->IbInterface.close_ca(pRegistration->hCa, NULL);

Could you suggest some idea ?

winmad does not explicitly allocate any address handles. Can you tell if there are any mads which were not returned to the free pool? You
could try replacing the NULLs in the above code with ib_sync_destroy (unsure of exact name).

Hefty, Sean

2012-02-02 21:17:14 UTC

Permalink

Where in the send completion path does ibal free AVs that it allocates in the send path? I've looked through the code, but haven't found that yet.

pRegistration->pDevice->IbInterface.close_ca(pRegistration->hCa, NULL);

Could you suggest some idea ?

winmad does not explicitly allocate any address handles. Can you tell if
there are any mads which were not returned to the free pool? You could try
replacing the NULLs in the above code with ib_sync_destroy (unsure of exact
name).

Hefty, Sean

2012-02-03 23:01:08 UTC

Permalink

Post by Hefty, Sean
Where in the send completion path does ibal free AVs that it allocates in the
send path? I've looked through the code, but haven't found that yet.

Btw - I found where this is in the code path.

Smith, Stan

2012-02-02 18:55:11 UTC

Permalink

Leo,
What are you saying exactly by 'opensm stuck on kill'? More kill info please.

Was OpenSM running as a service and via service control you said stop?
OpenSM running as a console application '--console local' and you typed the 'exit' command?
OpenSM running and you just killed the process?

Killed how?

Thanks,

Stan.

-----Original Message-----
Sent: Thursday, February 02, 2012 6:42 AM
To: Leonid Keller; Hefty, Sean; Tzachi Dar; Smith, Stan
Cc: Uri Habusha; ofw_list; Irena Gannon
Subject: opensm stuck upon kill
Hi guys,
opensm got stuck upon kill
I'll try to keep the full dump and will send you if you are interested.
The stuck happens in IBAL upon releasing PD.
nt!DbgBreakPoint
ibbus!sync_destroy_obj+0xa61
ibbus!destroy_obj+0x8ad
ibbus!async_destroy_obj+0xa4
ibbus!ib_dealloc_pd+0x2b6
winmad!WmRegRemoveHandler+0xae
...
// from ibbus!sync_destroy_obj
1: kd> ?? p_obj
struct _al_obj * 0xa970fbbc
...
+0x080 ref_cnt : 1
...
+0x0a4 type : 3 //it's AV
+0x0a8 state : 3 ( CL_DESTROYING )
...
There are 227 children (AVs), which - as far as I understand, are created and attached to PD upon send_mad.
There were several applications, that were running at the time of stuck, opensm was one of them.
[cda39020 opensm.exe]
83c.0003a8 9af686f0 0000002 RUNNING nt!DbgBreakPoint
ibbus!sync_destroy_obj+0xa61
ibbus!destroy_obj+0x8ad
ibbus!async_destroy_obj+0xa4
ibbus!ib_dealloc_pd+0x2b6
winmad!WmRegRemoveHandler+0xae
winmad!WmRegFree+0xe
winmad!WmProviderCleanup+0x24
winmad!WmFileCleanup+0x3a
Wdf01000!FxFileObjectFileCleanup::Invoke+0x24
Wdf01000!FxPkgGeneral::OnCleanup+0x57
Wdf01000!FxPkgGeneral::Dispatch+0xcb
Wdf01000!FxDevice::Dispatch+0x7f
nt!IovCallDriver+0x23f
nt!IofCallDriver+0x1b
nt!IopCloseFile+0x387
nt!ObpDecrementHandleCount+0x146
nt!ObpCloseHandleTableEntry+0x234
nt!ExSweepHandleTable+0x5f
nt!ObKillProcess+0x54
nt!PspExitThread+0x5b6
nt!PsExitSpecialApc+0x22
nt!KiDeliverApc+0x1dc
nt!KiServiceExit+0x56
ntdll!KiFastSystemCallRet
ntdll!ZwWaitForWorkViaWorkerFactory+0xc
ntdll!TppWorkerThread+0x1f6
kernel32!BaseThreadInitThunk+0xe
ntdll!__RtlUserThreadStart+0x23
ntdll!_RtlUserThreadStart+
WmProviderDeregister(pRegistration->pProvider, pRegistration);
pRegistration->pDevice->IbInterface.destroy_qp(pRegistration->hQp, NULL);
pRegistration->pDevice->IbInterface.dealloc_pd(pRegistration->hPd, NULL);

pRegistration->pDevice->IbInterface.close_ca(pRegistration->hCa, NULL);

Could you suggest some idea ?
Thank you.
-----Original Message-----
From: Leonid Keller
Sent: Tuesday, January 31, 2012 1:15 PM
To: 'Hefty, Sean'; Tzachi Dar; Smith, Stan
Cc: Uri Habusha; ofw_list; Irena Gannon
Subject: RE: Opensm & WinMad: a race, cauing BSOD722
Thank you, Sean.
Some comments.
We do not think that this additional validation is necessary.
It's hard to believe - unless you saw that - that Windows can call close(handle) after open(&handle) has failed.
As to the patch to winverbs - it causes a crash, because WvProviderGet is called at DISPATCH level.
ATTEMPTED_SWITCH_FROM_DPC (b8)
A wait operation, attach process, or yield was attempted from a DPC routine.
This is an illegal operation and the stack track will lead to the offending
code and original DPC routine.
nt!KiSwapContext+0x7f
nt!KiSwapThread+0x2fa
nt!KeWaitForGate+0x22a
nt!KiAcquireGuardedMutex+0x35
nt!KeAcquireGuardedMutex+0x39
winverbs!WvProviderGet+0x1d
winverbs!WvEpCompleteDisconnect+0x113
winverbs!WvEpIbCmHandler+0x26a
ibbus!cm_cep_handler+0x99
ibbus!__process_cep+0x10f
ibbus!__drep_handler+0x6ea
ibbus!__cep_mad_recv_cb+0x246
ibbus!__mad_svc_recv_done+0xb58
ibbus!mad_disp_recv_done+0x1650
ibbus!process_mad_recv+0x3bf
ibbus!spl_qp_comp+0x3d2
ibbus!spl_qp_recv_dpc_cb+0x112
nt!KiRetireDpcList+0x117
nt!KyRetireDpcList+0x5
nt!KiDispatchInterruptContinue
I've replaced mutex by spinlock - see below.
I did it also for WinMad, albeit it has no asynchronous callbacks like WinVerbs.
The main reason is to keep it similar to WinVerbs as it is today.
A minor, mostly theoretical one: there are other functions, which are using today the provider mutex. It seems for me worthful to keep for
them possibility to call a low-level WvProviderGet function.
What's your opinion ?
Index: B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.c
===================================================================
--- B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.c (revision 9686)
+++ B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.c (revision 9687)
@@ -44,14 +44,15 @@
LONG WvProviderGet(WV_PROVIDER *pProvider)
{
LONG val;
+ KIRQL irql;
- KeAcquireGuardedMutex(&pProvider->Lock);
+ KeAcquireSpinLock(&pProvider->SpinLock, &irql);
val = InterlockedIncrement(&pProvider->Ref);
if (val == 1) {
pProvider->Ref = 0;
val = 0;
}
- KeReleaseGuardedMutex(&pProvider->Lock);
+ KeReleaseSpinLock(&pProvider->SpinLock, irql);
return val;
}
@@ -119,6 +120,7 @@
KeInitializeEvent(&pProvider->SharedEvent, NotificationEvent, FALSE);
pProvider->Exclusive = 0;
KeInitializeEvent(&pProvider->ExclusiveEvent, SynchronizationEvent, FALSE);
+ KeInitializeSpinLock(&pProvider->SpinLock);
return STATUS_SUCCESS;
}
Index: B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.h
===================================================================
--- B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.h (revision 9686)
+++ B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.h (revision 9687)
@@ -80,6 +80,7 @@
KEVENT ExclusiveEvent;
WORK_QUEUE WorkQueue;
+ KSPIN_LOCK SpinLock;
} WV_PROVIDER;
Index: B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.h
===================================================================
--- B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.h (revision 9687)
+++ B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.h (revision 9688)
@@ -57,6 +57,7 @@
KEVENT SharedEvent;
LONG Exclusive;
KEVENT ExclusiveEvent;
+ KSPIN_LOCK SpinLock;
} WM_PROVIDER;
Index: B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.c
===================================================================
--- B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.c (revision 9687)
+++ B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.c (revision 9688)
@@ -36,14 +36,15 @@
LONG WmProviderGet(WM_PROVIDER *pProvider)
{
LONG val;
+ KIRQL irql;
- KeAcquireGuardedMutex(&pProvider->Lock);
+ KeAcquireSpinLock(&pProvider->SpinLock, &irql);
val = InterlockedIncrement(&pProvider->Ref);
if (val == 1) {
pProvider->Ref = 0;
val = 0;
}
- KeReleaseGuardedMutex(&pProvider->Lock);
+ KeReleaseSpinLock(&pProvider->SpinLock, irql);
return val;
}
@@ -72,6 +73,7 @@
KeInitializeEvent(&pProvider->SharedEvent, NotificationEvent, FALSE);
pProvider->Exclusive = 0;
KeInitializeEvent(&pProvider->ExclusiveEvent, SynchronizationEvent, FALSE);
+ KeInitializeSpinLock(&pProvider->SpinLock);
ASSERT(ControlDevice != NULL);
-----Original Message-----
Sent: Tuesday, January 31, 2012 12:08 AM
To: Leonid Keller; Tzachi Dar; Smith, Stan
Cc: Uri Habusha; ofw_list; Irena Gannon
Subject: RE: Opensm & WinMad: a race, cauing BSOD722

WmProviderInit() is called without checking the return status. Is there a
reason ?
Seems like the similar patch is needed for WvIoDeviceControl().

I can't tell whether IOCTLs suffer from the same problem or not. But since Windows is stupid, I went ahead and added the same protection
to winverbs, plus some additional validation in case we get a cleanup event for a file for which we failed to create.
- Sean

Hefty, Sean

2012-02-02 19:19:08 UTC

Permalink