What exactly do you mean by 'opensm stuck on kill'? Please give more detail on how it was killed.
Was OpenSM running as a console application ('--console local'), and did you type the 'exit' command?
Stan.
-----Original Message-----
Sent: Thursday, February 02, 2012 6:42 AM
To: Leonid Keller; Hefty, Sean; Tzachi Dar; Smith, Stan
Cc: Uri Habusha; ofw_list; Irena Gannon
Subject: opensm stuck upon kill
Hi guys,
opensm hung when it was killed.
I'll try to keep the full dump and will send it to you if you are interested.
The hang happens in IBAL while releasing the PD.
nt!DbgBreakPoint
ibbus!sync_destroy_obj+0xa61
ibbus!destroy_obj+0x8ad
ibbus!async_destroy_obj+0xa4
ibbus!ib_dealloc_pd+0x2b6
winmad!WmRegRemoveHandler+0xae
...
// from ibbus!sync_destroy_obj
1: kd> ?? p_obj
struct _al_obj * 0xa970fbbc
...
+0x080 ref_cnt : 1
...
+0x0a4 type : 3 //it's AV
+0x0a8 state : 3 ( CL_DESTROYING )
...
There are 227 children (AVs) which, as far as I understand, are created and attached to the PD on send_mad.
Several applications were running at the time of the hang; opensm was one of them.
[cda39020 opensm.exe]
83c.0003a8 9af686f0 0000002 RUNNING nt!DbgBreakPoint
ibbus!sync_destroy_obj+0xa61
ibbus!destroy_obj+0x8ad
ibbus!async_destroy_obj+0xa4
ibbus!ib_dealloc_pd+0x2b6
winmad!WmRegRemoveHandler+0xae
winmad!WmRegFree+0xe
winmad!WmProviderCleanup+0x24
winmad!WmFileCleanup+0x3a
Wdf01000!FxFileObjectFileCleanup::Invoke+0x24
Wdf01000!FxPkgGeneral::OnCleanup+0x57
Wdf01000!FxPkgGeneral::Dispatch+0xcb
Wdf01000!FxDevice::Dispatch+0x7f
nt!IovCallDriver+0x23f
nt!IofCallDriver+0x1b
nt!IopCloseFile+0x387
nt!ObpDecrementHandleCount+0x146
nt!ObpCloseHandleTableEntry+0x234
nt!ExSweepHandleTable+0x5f
nt!ObKillProcess+0x54
nt!PspExitThread+0x5b6
nt!PsExitSpecialApc+0x22
nt!KiDeliverApc+0x1dc
nt!KiServiceExit+0x56
ntdll!KiFastSystemCallRet
ntdll!ZwWaitForWorkViaWorkerFactory+0xc
ntdll!TppWorkerThread+0x1f6
kernel32!BaseThreadInitThunk+0xe
ntdll!__RtlUserThreadStart+0x23
ntdll!_RtlUserThreadStart+
WmProviderDeregister(pRegistration->pProvider, pRegistration);
pRegistration->pDevice->IbInterface.destroy_qp(pRegistration->hQp, NULL);
pRegistration->pDevice->IbInterface.dealloc_pd(pRegistration->hPd, NULL);
pRegistration->pDevice->IbInterface.close_ca(pRegistration->hCa, NULL);
Do you have any ideas?
Thank you.
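
For readers following the dump above, here is a small user-space model of why the dealloc_pd path can wait forever. It is only a sketch, not IBAL code: the names child_t, parent_t, child_deref and parent_destroy are hypothetical, and it assumes a child is released only once its reference count reaches zero after it has entered the destroying state, which is what the ref_cnt : 1 and CL_DESTROYING fields in the dump suggest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states, loosely mirroring the CL_DESTROYING state in the dump. */
typedef enum { OBJ_READY, OBJ_DESTROYING, OBJ_FREED } obj_state_t;

/* Hypothetical stand-in for a child object such as an AV. */
typedef struct {
    int ref_cnt;        /* creation reference plus any extra holders */
    obj_state_t state;
} child_t;

/* Hypothetical stand-in for the parent object such as a PD. */
typedef struct {
    child_t *children;
    size_t n_children;
} parent_t;

/* Drop one reference; a destroying child is freed when the last one goes. */
void child_deref(child_t *c)
{
    assert(c->ref_cnt > 0);
    if (--c->ref_cnt == 0 && c->state == OBJ_DESTROYING)
        c->state = OBJ_FREED;
}

/*
 * Start (or re-check) destruction of the parent. Each child that is still
 * READY is marked DESTROYING and loses its creation reference. Returns the
 * number of children that are not yet freed; the parent cannot be released
 * while this is non-zero.
 */
size_t parent_destroy(parent_t *p)
{
    size_t pending = 0;
    for (size_t i = 0; i < p->n_children; i++) {
        child_t *c = &p->children[i];
        if (c->state == OBJ_READY) {
            c->state = OBJ_DESTROYING;
            child_deref(c);
        }
        if (c->state != OBJ_FREED)
            pending++;
    }
    return pending;
}
```

With three children of which one carries an extra reference, parent_destroy keeps reporting one pending child until somebody calls child_deref on it. In the model the caller can simply re-check; the kernel stack above presumably corresponds to the point where that wait gives up and breaks into the debugger.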
-----Original Message-----
From: Leonid Keller
Sent: Tuesday, January 31, 2012 1:15 PM
To: 'Hefty, Sean'; Tzachi Dar; Smith, Stan
Cc: Uri Habusha; ofw_list; Irena Gannon
Subject: RE: Opensm & WinMad: a race, cauing BSOD722
Thank you, Sean.
Some comments.
We do not think that this additional validation is necessary.
It's hard to believe - unless you have actually seen it - that Windows can call close(handle) after open(&handle) has failed.
As for the winverbs patch: it causes a crash, because WvProviderGet is called at DISPATCH_LEVEL.
ATTEMPTED_SWITCH_FROM_DPC (b8)
A wait operation, attach process, or yield was attempted from a DPC routine.
This is an illegal operation and the stack track will lead to the offending
code and original DPC routine.
nt!KiSwapContext+0x7f
nt!KiSwapThread+0x2fa
nt!KeWaitForGate+0x22a
nt!KiAcquireGuardedMutex+0x35
nt!KeAcquireGuardedMutex+0x39
winverbs!WvProviderGet+0x1d
winverbs!WvEpCompleteDisconnect+0x113
winverbs!WvEpIbCmHandler+0x26a
ibbus!cm_cep_handler+0x99
ibbus!__process_cep+0x10f
ibbus!__drep_handler+0x6ea
ibbus!__cep_mad_recv_cb+0x246
ibbus!__mad_svc_recv_done+0xb58
ibbus!mad_disp_recv_done+0x1650
ibbus!process_mad_recv+0x3bf
ibbus!spl_qp_comp+0x3d2
ibbus!spl_qp_recv_dpc_cb+0x112
nt!KiRetireDpcList+0x117
nt!KyRetireDpcList+0x5
nt!KiDispatchInterruptContinue
I've replaced the mutex with a spinlock - see below.
I did the same for WinMad, although it has no asynchronous callbacks like WinVerbs does.
The main reason is to keep it similar to WinVerbs as it is today.
A minor, mostly theoretical reason: other functions already use the provider mutex today, and it seems worthwhile
to keep it possible for them to call the low-level WvProviderGet function.
What's your opinion?
Index: B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.c
===================================================================
--- B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.c (revision 9686)
+++ B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.c (revision 9687)
@@ -44,14 +44,15 @@
LONG WvProviderGet(WV_PROVIDER *pProvider)
{
LONG val;
+ KIRQL irql;
- KeAcquireGuardedMutex(&pProvider->Lock);
+ KeAcquireSpinLock(&pProvider->SpinLock, &irql);
val = InterlockedIncrement(&pProvider->Ref);
if (val == 1) {
pProvider->Ref = 0;
val = 0;
}
- KeReleaseGuardedMutex(&pProvider->Lock);
+ KeReleaseSpinLock(&pProvider->SpinLock, irql);
return val;
}
@@ -119,6 +120,7 @@
KeInitializeEvent(&pProvider->SharedEvent, NotificationEvent, FALSE);
pProvider->Exclusive = 0;
KeInitializeEvent(&pProvider->ExclusiveEvent, SynchronizationEvent, FALSE);
+ KeInitializeSpinLock(&pProvider->SpinLock);
return STATUS_SUCCESS;
}
Index: B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.h
===================================================================
--- B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.h (revision 9686)
+++ B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.h (revision 9687)
@@ -80,6 +80,7 @@
KEVENT ExclusiveEvent;
WORK_QUEUE WorkQueue;
+ KSPIN_LOCK SpinLock;
} WV_PROVIDER;
Index: B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.h
===================================================================
--- B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.h (revision 9687)
+++ B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.h (revision 9688)
@@ -57,6 +57,7 @@
KEVENT SharedEvent;
LONG Exclusive;
KEVENT ExclusiveEvent;
+ KSPIN_LOCK SpinLock;
} WM_PROVIDER;
Index: B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.c
===================================================================
--- B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.c (revision 9687)
+++ B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.c (revision 9688)
@@ -36,14 +36,15 @@
LONG WmProviderGet(WM_PROVIDER *pProvider)
{
LONG val;
+ KIRQL irql;
- KeAcquireGuardedMutex(&pProvider->Lock);
+ KeAcquireSpinLock(&pProvider->SpinLock, &irql);
val = InterlockedIncrement(&pProvider->Ref);
if (val == 1) {
pProvider->Ref = 0;
val = 0;
}
- KeReleaseGuardedMutex(&pProvider->Lock);
+ KeReleaseSpinLock(&pProvider->SpinLock, irql);
return val;
}
@@ -72,6 +73,7 @@
KeInitializeEvent(&pProvider->SharedEvent, NotificationEvent, FALSE);
pProvider->Exclusive = 0;
KeInitializeEvent(&pProvider->ExclusiveEvent, SynchronizationEvent, FALSE);
+ KeInitializeSpinLock(&pProvider->SpinLock);
ASSERT(ControlDevice != NULL);
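
The reference-taking logic in the patch above can be tried out in user space. The following is a sketch only: provider_model_t and its functions are hypothetical names, a C11 atomic_flag stands in for KSPIN_LOCK, and the initial reference count of 1 is an assumption. It shows the intended behaviour of the Get routine: once the count has dropped to zero, a later Get must fail and return 0 rather than bring the object back to life.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for WV_PROVIDER / WM_PROVIDER. */
typedef struct {
    atomic_flag lock;   /* plays the role of the KSPIN_LOCK */
    long ref;           /* plays the role of pProvider->Ref */
} provider_model_t;

/* Busy-wait until the flag is ours; never sleeps, like a spinlock. */
static void model_lock(provider_model_t *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ; /* spin */
}

static void model_unlock(provider_model_t *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Assumption: a freshly initialised provider holds one reference. */
void provider_model_init(provider_model_t *p)
{
    atomic_flag_clear(&p->lock);
    p->ref = 1;
}

/* Same shape as the patched Get: returns 0 if the provider is already dead. */
long provider_model_get(provider_model_t *p)
{
    long val;

    model_lock(p);
    val = ++p->ref;
    if (val == 1) {
        /* The count was 0 before the increment: undo and report failure. */
        p->ref = 0;
        val = 0;
    }
    model_unlock(p);
    return val;
}

/* Drop a reference and return the remaining count. */
long provider_model_put(provider_model_t *p)
{
    long val;

    model_lock(p);
    assert(p->ref > 0);
    val = --p->ref;
    model_unlock(p);
    return val;
}
```

The point of the spinlock in the kernel patch is that acquiring it never blocks the caller, so the Get routine stays legal in the DPC path shown in the bugcheck stack. The model only demonstrates the counting rule, not IRQL behaviour.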
-----Original Message-----
Sent: Tuesday, January 31, 2012 12:08 AM
To: Leonid Keller; Tzachi Dar; Smith, Stan
Cc: Uri Habusha; ofw_list; Irena Gannon
Subject: RE: Opensm & WinMad: a race, cauing BSOD722
WmProviderInit() is called without checking the return status. Is there a
reason?
It seems a similar patch is needed for WvIoDeviceControl().
I can't tell whether IOCTLs suffer from the same problem or not. But since Windows is stupid, I went ahead and added the same protection
to winverbs, plus some additional validation in case we get a cleanup event for a file that we failed to create.
- Sean
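
The two points raised in this message (checking the init status, and tolerating a cleanup for a file whose create failed) can be illustrated with a small user-space sketch. None of this is winverbs or winmad code: file_model_t, file_model_create and file_model_cleanup are hypothetical names, and the init_ok parameter merely simulates the init routine succeeding or failing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-file context. */
typedef struct {
    bool created;   /* set only when create fully succeeded */
    int refs;       /* resources taken during create */
} file_model_t;

/*
 * Create path: the (simulated) init status is checked, and the context is
 * marked as created only on success. Returns 0 on success, -1 on failure.
 */
int file_model_create(file_model_t *f, bool init_ok)
{
    f->created = false;
    f->refs = 0;
    if (!init_ok)
        return -1;      /* propagate the failure instead of ignoring it */
    f->refs = 1;
    f->created = true;
    return 0;
}

/*
 * Cleanup path: validates that create succeeded before releasing anything.
 * Returns true if resources were released, false if there was nothing to do.
 */
bool file_model_cleanup(file_model_t *f)
{
    if (!f->created)
        return false;   /* cleanup for a file we never created: ignore */
    assert(f->refs > 0);
    f->refs--;
    f->created = false;
    return true;
}
```

Calling file_model_cleanup after a failed file_model_create is a harmless no-op, and a second cleanup after a successful one is ignored as well.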