From: Leonid Keller [mailto:***@mellanox.com]
Sent: Wednesday, January 18, 2012 12:57 AM
To: Smith, Stan; Uri Habusha; Fab Tillier; Hefty, Sean
Cc: ofw_list; Irena Gannon
Subject: RE: Opensm & WinMad: a race, causing BSOD722
Thank you, Stan.
A quick question after a quick look on the patch:
I understand that with this patch the MAD receiver thread will exit in some time.
The original problem is umad_read()s were repeatedly issued during and after the HCA/port close; infinite for() loop, error handler just continued the loop.
Eventually opensm exit() clobbered the umad_receiver() thread during process tear-down. Your testing exposed a race condition.
The fix allows 0..1 umad_read()'s and then exits the for() loop which then exits the umad_receiver() function which terminates the CL thread.
I do not understand why the crash can't happen during this time.
umad_receiver_stop() is called which clears the umad_receiver() for() loop execution condition.
Possible umad_receiver() outcomes:
umad_receiver () is not in umad_read(), hence the for() loop check will exit.
umad_receiver() is blocked in umad_read() call; when umad_read() returns there will either be a valid MAD or an error (device closed) - either way the for() loop check will exit.
Point is umad_read() will not be called after the HCA/port is closed.
cl_thread_destroy() was unable to interrupt the umad_read() call or if it did another umad_read() was issued before the thread terminated.
I'd expect umad_receiver_stop() to wait till umad_receiver_run variable gets a value "OK, I'm done, dude". :)
Potential deadlock situation as the caller of umad_receiver_stop() eventually closes the HCA/port. In order for umad_receiver_stop() to receive confirmation that umad_receiver() had stopped without first closing the HCA, one MAD would have to be received in order to exit the umad_receiver() for() loop and notify/release umad_receiver_stop().
I'd vote for keep-it-simple (KIS).
BTW, is this problem only for reading MADs ?
I don't know, yet to be discovered.
Is it possible that one thread is sending a MAD while the main one is closing the port ?
One would think the MAD driver (winmad) would disallow closing while a Transmit is in progress.
Stan.
From: Smith, Stan [mailto:***@intel.com]<mailto:[mailto:***@intel.com]>
Sent: Tuesday, January 17, 2012 10:29 PM
To: Leonid Keller; Uri Habusha; Fab Tillier; Hefty, Sean
Cc: ofw_list; Irena Gannon
Subject: RE: Opensm & WinMad: a race, causing BSOD722
Leo,
Please try the enclosed patch which allows the umad_receiver() thread to exit once umad_receiver_stop() has been called [overall the same effect as cl_thread_destroy()].
This is not a fix for winmad() but a way of 'not' calling umad_read() after umad_receiver_stop() has been called.
Stan.
--- ulp/opensm/user/libvendor/osm_vendor_ibumad.c Thu Jan 12 15:27:32 2012
+++ ulp/opensm/user/libvendor/osm_vendor_ibumad.c Tue Jan 17 12:15:11 2012
@@ -67,6 +67,10 @@
#include <opensm/osm_helper.h>
#include <vendor/osm_vendor_api.h>
+#ifdef __WIN__
+static int umad_receiver_run;
+#endif
+
/****s* OpenSM: Vendor UMAD/osm_umad_bind_info_t
* NAME
* osm_umad_bind_info_t
@@ -253,7 +257,11 @@
OSM_LOG_ENTER(p_ur->p_log);
+#ifdef __WIN__
+ for (umad_receiver_run=1; umad_receiver_run;) {
+#else
for (;;) {
+#endif
if (!umad &&
!(umad = umad_alloc(1, umad_size() + MAD_BLOCK_SIZE))) {
OSM_LOG(p_ur->p_log, OSM_LOG_ERROR, "ERR 5403: "
@@ -469,9 +477,7 @@
pthread_join(p_ur->tid, NULL);
p_ur->tid = 0;
#else
- /* XXX hangs current thread - suspect umad_recv() ignoring wakeup.
- * cl_thread_destroy(&p_ur->tid);
- */
+ umad_receiver_run = 0;
#endif
p_ur->p_vend = NULL;
p_ur->p_log = NULL;
From: Leonid Keller [mailto:***@mellanox.com]<mailto:[mailto:***@mellanox.com]>
Sent: Tuesday, January 17, 2012 12:57 AM
To: Uri Habusha; Fab Tillier; Smith, Stan; Hefty, Sean
Cc: ofw_list; Irena Gannon
Subject: RE: Opensm & WinMad: a race, causing BSOD722
I believe, there are two questions here.
Why Opensm closes the port while the MAD receiving thread is still in work ?
Hello Leo. I suspect the above is the root of the problem. A long time ago, opensm 3.3.6, opensm would hang in shutdown due to the MAD receiving thread ignoring the destroy thread call if reader thread was in winmad read().
Consequently, the MAD receiver thread destroy was commented out and opensm shutdown cleanly.
Perhaps changes in IBAL have altered timing such that shutdown is delayed enough such that the MAD reader thread runs again/further?
A SWAG at this point - I'll experiment today and let you know.
Stan.
[LK] Hello Stan. It looks like a Smart WAG. :)
But I believe, we all agree with Fab, that the problem should be fixed at WinMad level. We have other WinMad applications except opensm.
I hope, Sean will have a chance to look at it.
Why Winmad doesn't expect that WmIoRead and WmProviderCleanup can be called simultaneously from different threads ?
Please, look at the code below.
WmIoRead gets provider without any lock and without checking whether it is valid.
It then calls WmProviderRead, which doesn't increment any reference, but handles MAD under WdfObjectAcquireLock(pProvider->ReadQueue), which can be already destroyed (like it was in our case).
WmFileCleanup removes provider from the list under a mutex, unknown to WmIoRead, and then calls WmProviderCleanup.
The latter waits for some reference, unknown for WmIoRead, and then calls WdfIoQueuePurgeSynchronously(pProvider->ReadQueue) without WdfObjectAcquireLock(pProvider->ReadQueue).
No doubt, I'm missing here something, I just did a quick review of the code.
Maybe WinMad presumes that the application first stops I/O, then sends WM_IOCTL_DEREGISTER and only then closes the file handle and the right way for us is just to fix a bad boy opensm ? :)
As Uri stated, we do not know how to reproduce it: opensm gets started and stopped 1000 times during a night regression run, so it's not 1 to 100, it's 1 to 100000. :(
We'd very appreciate if you could find the time to review the code.
(Of course, the same problem may exist for WmIoWrite)
TIA
static VOID WmIoRead(WDFQUEUE Queue, WDFREQUEST Request, size_t Length)
{
WDFFILEOBJECT file;
WM_PROVIDER *prov;
UNREFERENCED_PARAMETER(Queue);
file = WdfRequestGetFileObject(Request);
prov = WmProviderGetContext(file);
WmProviderRead(prov, Request);
}
void WmProviderRead(WM_PROVIDER *pProvider, WDFREQUEST Request)
{
WM_REGISTRATION *reg;
NTSTATUS status;
WM_IO_MAD *wmad;
size_t outlen, len = 0;
status = WdfRequestRetrieveOutputBuffer(Request, sizeof(WM_IO_MAD), &wmad, &outlen);
if (!NT_SUCCESS(status)) {
goto out;
}
WdfObjectAcquireLock(pProvider->ReadQueue);
if (pProvider->MadHead == NULL) {
status = WdfRequestForwardToIoQueue(Request, pProvider->ReadQueue);
WdfObjectReleaseLock(pProvider->ReadQueue);
if (NT_SUCCESS(status)) {
return;
}
goto out;
}
len = outlen;
status = WmCopyMad(wmad, pProvider->MadHead, &len);
if (status == STATUS_SUCCESS) {
reg = (WM_REGISTRATION *) pProvider->MadHead->context1;
reg->pDevice->IbInterface.put_mad(__FILE__, __LINE__, WmRemoveMad(pProvider));
}
WdfObjectReleaseLock(pProvider->ReadQueue);
out:
WdfRequestCompleteWithInformation(Request, status, len);
}
static VOID WmFileCleanup(WDFFILEOBJECT FileObject)
{
WM_PROVIDER *prov = WmProviderGetContext(FileObject);
KeAcquireGuardedMutex(&Lock);
RemoveEntryList(&prov->Entry);
KeReleaseGuardedMutex(&Lock);
WmProviderCleanup(prov);
}
void WmProviderCleanup(WM_PROVIDER *pProvider)
{
WM_REGISTRATION *reg;
while ((reg = IndexListRemoveHead(&pProvider->RegIndex)) != NULL) {
WmRegFree(reg);
}
if (InterlockedDecrement(&pProvider->Ref) > 0) {
KeWaitForSingleObject(&pProvider->Event, Executive, KernelMode, FALSE, NULL);
}
WdfIoQueuePurgeSynchronously(pProvider->ReadQueue);
WdfObjectDelete(pProvider->ReadQueue);
IndexListDestroy(&pProvider->RegIndex);
}
From: Uri Habusha
Sent: Tuesday, January 17, 2012 7:16 AM
To: Fab Tillier; Smith, Stan; Leonid Keller; Hefty, Sean
Cc: ofw_list; Irena Gannon
Subject: RE: Opensm & WinMad: a race, causing BSOD722
Again you miss the point.
Not always we have a clear repro steps. If we have it the life was simple.
This issue happened during night run of our regression. The same test runs every night and this is the first time it happens, what means it is race/timing issue.
In such a case we need to review the code start thinking what can cause the issue and add instrumental code so next time it happens we understand better the scenario.
Uri
From: Fab Tillier [mailto:***@microsoft.com]<mailto:[mailto:***@microsoft.com]>
Sent: Tuesday, January 17, 2012 2:47 AM
To: Smith, Stan; Leonid Keller; Hefty, Sean
Cc: Uri Habusha; ofw_list; Irena Gannon
Subject: RE: Opensm & WinMad: a race, causing BSOD722
10D is the bugcheck code, WDF_VIOLATION. The next 4 parameters are the bugcheck arguments.
Leonid, there's good detail in this report, but you're missing a critical 'repro steps' section that would allow this to be reproduced and debugged by the bug owner. Just the crashdump information is only usually enough for simple bugs, and your analysis is probably on the right track. Still, if you want someone else to look into it, it's best to tell them how they too can see it - debugging things live is more productive than debugging by code inspection. How are Sean or Stan supposed to test that they have indeed fixed a bug, assuming they find something suspect and make code changes, when they have no idea how you triggered it? Is it a consistent repro? 1 in a 100?
Thanks,
-Fab
From: ofw-***@lists.openfabrics.org<mailto:ofw-***@lists.openfabrics.org> [mailto:ofw-***@lists.openfabrics.org]<mailto:[mailto:ofw-***@lists.openfabrics.org]> On Behalf Of Smith, Stan
Sent: Monday, January 16, 2012 4:37 PM
To: Leonid Keller; Hefty, Sean
Cc: Uri Habusha; ofw_list; Irena Gannon
Subject: Re: [ofw] Opensm & WinMad: a race, causing BSOD722
What are the conditions required to generate the BSOD?