Discussion:
[ofw] opensm 3.3.11 crash
Leonid Keller
2011-10-17 15:25:45 UTC
Permalink
Hi Sean,

We experienced a crash of opensm in win8.
The core reason for the crash was that WinMad driver has not started for some reason.
So libibumad couldn't work.
But it had a bug in error flow which caused an access violation.

I bring here two quotes of umad_open_port() function with my comments:

// umad_open_port

...
hr = WmGetObject(IID_IWMProvider, (LPVOID*) &ports[portid].prov);
if (FAILED(hr)) {
CloseHandle(ports[portid].overlap.hEvent);
portid = GetLastError() & 0x80000000;
goto out;
}
// if there is no providers WmGetObject will return error, but portid wil be set to 0
// We saw it when GetLastError() returned 6. Maybe you mean it to be HRESULT_FROM_WIN32(GetLastError()) ?

// As a result umad_open_port() returns 0, i.e. SUCCESS which causes later an access violation.
Smith, Stan
2011-10-17 16:16:39 UTC
Permalink
Hi Leo,
Just so everyone understands, neither Sean nor myself have done any testing on Win8; we are still trying to get winOFED 3.0 out the door... :)
Please continue the win8 reports as it only helps, just understand no one here has walked the win8 path.

Stan.

From: ofw-***@lists.openfabrics.org [mailto:ofw-***@lists.openfabrics.org] On Behalf Of Leonid Keller
Sent: Monday, October 17, 2011 8:26 AM
To: Hefty, Sean
Cc: ofw_list
Subject: [ofw] opensm 3.3.11 crash

Hi Sean,

We experienced a crash of opensm in win8.
The core reason for the crash was that WinMad driver has not started for some reason.
So libibumad couldn't work.
But it had a bug in error flow which caused an access violation.

I bring here two quotes of umad_open_port() function with my comments:

// umad_open_port

...
hr = WmGetObject(IID_IWMProvider, (LPVOID*) &ports[portid].prov);
if (FAILED(hr)) {
CloseHandle(ports[portid].overlap.hEvent);
portid = GetLastError() & 0x80000000;
goto out;
}
// if there is no providers WmGetObject will return error, but portid wil be set to 0
// We saw it when GetLastError() returned 6. Maybe you mean it to be HRESULT_FROM_WIN32(GetLastError()) ?

// As a result umad_open_port() returns 0, i.e. SUCCESS which causes later an access violation.
Leonid Keller
2011-10-17 16:56:14 UTC
Permalink
Stan, I absolutely didn't mean to criticize or so. :(
I know you didn't work with win8, but even if you did, so what ?
It was just a bug report with mentioning where it happened.
Because I'm not sure you can reproduce it in win7, for example.

From: Smith, Stan [mailto:***@intel.com]
Sent: Monday, October 17, 2011 6:17 PM
To: Leonid Keller; Hefty, Sean
Cc: ofw_list
Subject: RE: opensm 3.3.11 crash

Hi Leo,
Just so everyone understands, neither Sean nor myself have done any testing on Win8; we are still trying to get winOFED 3.0 out the door... :)
Please continue the win8 reports as it only helps, just understand no one here has walked the win8 path.

Stan.

From: ofw-***@lists.openfabrics.org [mailto:ofw-***@lists.openfabrics.org] On Behalf Of Leonid Keller
Sent: Monday, October 17, 2011 8:26 AM
To: Hefty, Sean
Cc: ofw_list
Subject: [ofw] opensm 3.3.11 crash

Hi Sean,

We experienced a crash of opensm in win8.
The core reason for the crash was that WinMad driver has not started for some reason.
So libibumad couldn't work.
But it had a bug in error flow which caused an access violation.

I bring here two quotes of umad_open_port() function with my comments:

// umad_open_port

...
hr = WmGetObject(IID_IWMProvider, (LPVOID*) &ports[portid].prov);
if (FAILED(hr)) {
CloseHandle(ports[portid].overlap.hEvent);
portid = GetLastError() & 0x80000000;
goto out;
}
// if there is no providers WmGetObject will return error, but portid wil be set to 0
// We saw it when GetLastError() returned 6. Maybe you mean it to be HRESULT_FROM_WIN32(GetLastError()) ?

// As a result umad_open_port() returns 0, i.e. SUCCESS which causes later an access violation.
Smith, Stan
2011-10-17 18:19:54 UTC
Permalink
No problems Leo; please feel to criticize, I value your insights and experience - all comments are welcome; that's the way the code gets better!
I just wanted you to understand you are way ahead of us in win8 testing...

Stan.

From: Leonid Keller [mailto:***@mellanox.co.il]
Sent: Monday, October 17, 2011 9:56 AM
To: Smith, Stan; Hefty, Sean
Cc: ofw_list
Subject: RE: opensm 3.3.11 crash

Stan, I absolutely didn't mean to criticize or so. :(
I know you didn't work with win8, but even if you did, so what ?
It was just a bug report with mentioning where it happened.
Because I'm not sure you can reproduce it in win7, for example.

From: Smith, Stan [mailto:***@intel.com]
Sent: Monday, October 17, 2011 6:17 PM
To: Leonid Keller; Hefty, Sean
Cc: ofw_list
Subject: RE: opensm 3.3.11 crash

Hi Leo,
Just so everyone understands, neither Sean nor myself have done any testing on Win8; we are still trying to get winOFED 3.0 out the door... :)
Please continue the win8 reports as it only helps, just understand no one here has walked the win8 path.

Stan.

From: ofw-***@lists.openfabrics.org [mailto:ofw-***@lists.openfabrics.org] On Behalf Of Leonid Keller
Sent: Monday, October 17, 2011 8:26 AM
To: Hefty, Sean
Cc: ofw_list
Subject: [ofw] opensm 3.3.11 crash

Hi Sean,

We experienced a crash of opensm in win8.
The core reason for the crash was that WinMad driver has not started for some reason.
So libibumad couldn't work.
But it had a bug in error flow which caused an access violation.

I bring here two quotes of umad_open_port() function with my comments:

// umad_open_port

...
hr = WmGetObject(IID_IWMProvider, (LPVOID*) &ports[portid].prov);
if (FAILED(hr)) {
CloseHandle(ports[portid].overlap.hEvent);
portid = GetLastError() & 0x80000000;
goto out;
}
// if there is no providers WmGetObject will return error, but portid wil be set to 0
// We saw it when GetLastError() returned 6. Maybe you mean it to be HRESULT_FROM_WIN32(GetLastError()) ?

// As a result umad_open_port() returns 0, i.e. SUCCESS which causes later an access violation.
Smith, Stan
2011-10-27 15:49:16 UTC
Permalink
Hello,
PSB.

From: Leonid Keller [mailto:***@mellanox.com]
Sent: Thursday, October 27, 2011 5:52 AM
To: Smith, Stan; Hefty, Sean
Cc: ofw_list
Subject: RE: opensm 3.3.11 crash

What about Vista ?
(I'm not sure whether it's relevant for you)
We are compiling code for Server 2008 with 6001.18001 compiler.
We got two warnings.

[stan]
I use the compiler from the Windows Driver Kit version 7.1.0 release 'C:\WinDDK\7600.16385.1' and compile for win7, Svr2008-R2, Svr2008(wlh) and Vista on x86, x64 & ia64.
None of the compiles of osm_link_mgr.c show any compilation errors/warnings?

Googling around I see the 6001.1801 compiler is from a much older DDK, around 2009.
You might want to consider updating your compilation environment (WDK).

Stan.


In link_mgr_set_physp_pi() on zero-size memset:

memset(payload + sizeof(ib_port_info_t), 0,
IB_SMP_DATA_SIZE - sizeof(ib_port_info_t));

( Because after the last changes IB_SMP_DATA_SIZE - sizeof(ib_port_info_t) = 0; )

I've just removed the memset.

Index: B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_link_mgr.c
===================================================================
--- B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_link_mgr.c (revision 9050)
+++ B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_link_mgr.c (revision 9051)
@@ -157,8 +157,6 @@
}

memcpy(payload, p_old_pi, sizeof(ib_port_info_t));
- memset(payload + sizeof(ib_port_info_t), 0,
- IB_SMP_DATA_SIZE - sizeof(ib_port_info_t));

/*
Should never write back a value that is bigger then 3 in



Another warnings are in osm_torus.c:

The warnings I witnessed in compilation osm_torus.c required the cast (bool) in order to get a clean (warnings free compile).
The (bool) cast is not in the OFED for Linux code base.

13>b:\users\leonid\svn\winib\trunk\ulp\opensm\user\opensm\osm_torus.c(1191) : warning C4311: 'type cast' : pointer truncation from 'f_switch *' to 'unsigned int'
And some more like that.
I excluded them with #pragma - didn't want to change Linux code.

Index: B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_torus.c
===================================================================
--- B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_torus.c (revision 9078)
+++ B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_torus.c (revision 9079)
@@ -1188,7 +1188,9 @@
port_cnt = osm_node_get_num_physp(osm_node);
sw_guid = osm_node_get_node_guid(osm_node);

+#pragma warning( disable:4311 )
success = (bool)alloc_fswitch(fabric, sw_guid, port_cnt);
+#pragma warning( default:4311 )
if (!success)
goto out;
}
@@ -1457,6 +1459,7 @@
jp1 = canonicalize(j + 1, t->y_sz);
kp1 = canonicalize(k + 1, t->z_sz);

+#pragma warning( disable:4311 )
fp = set_fp_bit((bool)t->sw[i][j][k], 0, 0, 0);
fp |= set_fp_bit((bool)t->sw[ip1][j][k], x_sz_gt1, 0, 0);
fp |= set_fp_bit((bool)t->sw[i][jp1][k], 0, y_sz_gt1, 0);
@@ -1465,6 +1468,7 @@
fp |= set_fp_bit((bool)t->sw[ip1][j][kp1], x_sz_gt1, 0, z_sz_gt1);
fp |= set_fp_bit((bool)t->sw[i][jp1][kp1], 0, y_sz_gt1, z_sz_gt1);
fp |= set_fp_bit((bool)t->sw[ip1][jp1][kp1], x_sz_gt1, y_sz_gt1, z_sz_gt1);
+#pragma warning( default:4311 )

fp |= x_sz_gt1 << 8;
fp |= y_sz_gt1 << 9;
@@ -8789,6 +8793,7 @@
return IB_SUCCESS;
}

+#pragma warning( disable:4311 )
static
bool good_xy_ring(struct torus *t, const int x, const int y, const int z)
{
@@ -8804,6 +8809,7 @@

return good_ring;
}
+#pragma warning( default:4311 )

static
struct t_switch *find_plane_mid(struct torus *t, const int z)
@@ -8830,6 +8836,7 @@
return NULL;
}

+#pragma warning( disable:4311 )
static
struct t_switch *find_stree_root(struct torus *t)
{
@@ -8892,6 +8899,7 @@
out:
return root;
}
+#pragma warning( default:4311 )

static
bool sw_in_master_stree(struct t_switch *sw)




From: Smith, Stan [mailto:***@intel.com]
Sent: Monday, October 17, 2011 8:20 PM
To: Leonid Keller; Hefty, Sean
Cc: ofw_list
Subject: RE: opensm 3.3.11 crash

No problems Leo; please feel to criticize, I value your insights and experience - all comments are welcome; that's the way the code gets better!
I just wanted you to understand you are way ahead of us in win8 testing...

Stan.

From: Leonid Keller [mailto:***@mellanox.co.il]
Sent: Monday, October 17, 2011 9:56 AM
To: Smith, Stan; Hefty, Sean
Cc: ofw_list
Subject: RE: opensm 3.3.11 crash

Stan, I absolutely didn't mean to criticize or so. :(
I know you didn't work with win8, but even if you did, so what ?
It was just a bug report with mentioning where it happened.
Because I'm not sure you can reproduce it in win7, for example.

From: Smith, Stan [mailto:***@intel.com]
Sent: Monday, October 17, 2011 6:17 PM
To: Leonid Keller; Hefty, Sean
Cc: ofw_list
Subject: RE: opensm 3.3.11 crash

Hi Leo,
Just so everyone understands, neither Sean nor myself have done any testing on Win8; we are still trying to get winOFED 3.0 out the door... :)
Please continue the win8 reports as it only helps, just understand no one here has walked the win8 path.

Stan.

From: ofw-***@lists.openfabrics.org [mailto:ofw-***@lists.openfabrics.org] On Behalf Of Leonid Keller
Sent: Monday, October 17, 2011 8:26 AM
To: Hefty, Sean
Cc: ofw_list
Subject: [ofw] opensm 3.3.11 crash

Hi Sean,

We experienced a crash of opensm in win8.
The core reason for the crash was that WinMad driver has not started for some reason.
So libibumad couldn't work.
But it had a bug in error flow which caused an access violation.

I bring here two quotes of umad_open_port() function with my comments:

// umad_open_port

...
hr = WmGetObject(IID_IWMProvider, (LPVOID*) &ports[portid].prov);
if (FAILED(hr)) {
CloseHandle(ports[portid].overlap.hEvent);
portid = GetLastError() & 0x80000000;
goto out;
}
// if there is no providers WmGetObject will return error, but portid wil be set to 0
// We saw it when GetLastError() returned 6. Maybe you mean it to be HRESULT_FROM_WIN32(GetLastError()) ?

// As a result umad_open_port() returns 0, i.e. SUCCESS which causes later an access violation.
Hal Rosenstock
2011-10-27 16:01:04 UTC
Permalink
Hello,****
PSB.****
** **
*Sent:* Thursday, October 27, 2011 5:52 AM
*To:* Smith, Stan; Hefty, Sean
*Cc:* ofw_list
*Subject:* RE: opensm 3.3.11 crash****
** **
What about Vista ?****
(I’m not sure whether it’s relevant for you)****
We are compiling code for Server 2008 with 6001.18001 compiler.****
We got two warnings.****
** **
[stan]****
I use the compiler from the Windows Driver Kit version 7.1.0 release
‘C:\WinDDK\7600.16385.1’ and compile for win7, Svr2008-R2, Svr2008(wlh) and
Vista on x86, x64 & ia64.****
None of the compiles of osm_link_mgr.c show any compilation
errors/warnings?****
** **
Googling around I see the 6001.1801 compiler is from a much older DDK,
around 2009.****
You might want to consider updating your compilation environment (WDK).***
*
** **
Stan.****
** **
** **
In link_mgr_set_physp_pi() on zero-size memset:****
** **
memset(payload + sizeof(ib_port_info_t), 0,****
IB_SMP_DATA_SIZE - sizeof(ib_port_info_t));****
** **
( Because after the last changes IB_SMP_DATA_SIZE - sizeof(ib_port_info_t)
= 0; )****
** **
I’ve just removed the memset.
I know I removed other similar (size 0) memset but somehow missed this one
:-( I'll send an official patch for this shortly so it'll be fixed in the
next version.

-- Hal
** **
B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_link_mgr.c****
===================================================================****
---
B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_link_mgr.c
(revision 9050)****
+++
B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_link_mgr.c
(revision 9051)****
@@ -157,8 +157,6 @@****
}****
****
memcpy(payload, p_old_pi, sizeof(ib_port_info_t));****
- memset(payload + sizeof(ib_port_info_t), 0,****
- IB_SMP_DATA_SIZE - sizeof(ib_port_info_t));****
****
/*****
Should never write back a value that is bigger then 3 in
****
** **
** **
** **
Another warnings are in osm_torus.c:****
** **
The warnings I witnessed in compilation osm_torus.c required the cast
(bool) in order to get a clean (warnings free compile).****
The (bool) cast is not in the OFED for Linux code base.****
** **
13>b:\users\leonid\svn\winib\trunk\ulp\opensm\user\opensm\osm_torus.c(1191)
: warning C4311: 'type cast' : pointer truncation from 'f_switch *' to
'unsigned int'****
And some more like that.****
I excluded them with #pragma – didn’t want to change Linux code.****
** **
Index: B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_torus.c*
***
===================================================================****
--- B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_torus.c
(revision 9078)****
+++
B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_torus.c
(revision 9079)****
@@ -1188,7 +1188,9 @@****
port_cnt =
osm_node_get_num_physp(osm_node);****
sw_guid = osm_node_get_node_guid(osm_node);
****
****
+#pragma warning( disable:4311 )****
success = (bool)alloc_fswitch(fabric,
sw_guid, port_cnt);****
+#pragma warning( default:4311 )****
if (!success)****
goto out;****
}****
@@ -1457,6 +1459,7 @@****
jp1 = canonicalize(j + 1, t->y_sz);****
kp1 = canonicalize(k + 1, t->z_sz);****
****
+#pragma warning( disable:4311 )****
fp = set_fp_bit((bool)t->sw[i][j][k], 0, 0, 0);****
fp |= set_fp_bit((bool)t->sw[ip1][j][k], x_sz_gt1, 0, 0);*
***
fp |= set_fp_bit((bool)t->sw[i][jp1][k], 0, y_sz_gt1, 0);*
***
@@ -1465,6 +1468,7 @@****
fp |= set_fp_bit((bool)t->sw[ip1][j][kp1], x_sz_gt1, 0,
z_sz_gt1);****
fp |= set_fp_bit((bool)t->sw[i][jp1][kp1], 0, y_sz_gt1,
z_sz_gt1);****
fp |= set_fp_bit((bool)t->sw[ip1][jp1][kp1], x_sz_gt1,
y_sz_gt1, z_sz_gt1);****
+#pragma warning( default:4311 )****
****
fp |= x_sz_gt1 << 8;****
fp |= y_sz_gt1 << 9;****
@@ -8789,6 +8793,7 @@****
return IB_SUCCESS;****
}****
****
+#pragma warning( disable:4311 )****
static****
bool good_xy_ring(struct torus *t, const int x, const int y, const int z)
****
{****
@@ -8804,6 +8809,7 @@****
****
return good_ring;****
}****
+#pragma warning( default:4311 )****
****
static****
struct t_switch *find_plane_mid(struct torus *t, const int z)****
@@ -8830,6 +8836,7 @@****
return NULL;****
}****
****
+#pragma warning( disable:4311 )****
static****
struct t_switch *find_stree_root(struct torus *t)****
{****
@@ -8892,6 +8899,7 @@****
out:****
return root;****
}****
+#pragma warning( default:4311 )****
****
static****
bool sw_in_master_stree(struct t_switch *sw)****
** **
** **
** **
** **
*Sent:* Monday, October 17, 2011 8:20 PM
*To:* Leonid Keller; Hefty, Sean
*Cc:* ofw_list
*Subject:* RE: opensm 3.3.11 crash****
** **
No problems Leo; please feel to criticize, I value your insights and
experience – all comments are welcome; that’s the way the code gets better!
****
I just wanted you to understand you are way ahead of us in win8 testing
**
**
** **
Stan.****
** **
*Sent:* Monday, October 17, 2011 9:56 AM
*To:* Smith, Stan; Hefty, Sean
*Cc:* ofw_list
*Subject:* RE: opensm 3.3.11 crash****
** **
Stan, I absolutely didn’t mean to criticize or so. L****
I know you didn’t work with win8, but even if you did, so what ?****
It was just a bug report with mentioning where it happened.****
Because I’m not sure you can reproduce it in win7, for example.****
** **
*Sent:* Monday, October 17, 2011 6:17 PM
*To:* Leonid Keller; Hefty, Sean
*Cc:* ofw_list
*Subject:* RE: opensm 3.3.11 crash****
** **
Hi Leo,****
Just so everyone understands, neither Sean nor myself have done any
testing on Win8; we are still trying to get winOFED 3.0 out the door
 J**
**
Please continue the win8 reports as it only helps, just understand no one
here has walked the win8 path.****
** **
Stan.****
** **
*Sent:* Monday, October 17, 2011 8:26 AM
*To:* Hefty, Sean
*Cc:* ofw_list
*Subject:* [ofw] opensm 3.3.11 crash****
** **
Hi Sean,****
** **
We experienced a crash of opensm in win8.****
The core reason for the crash was that WinMad driver has not started for
some reason.****
So libibumad couldn’t work.****
But it had a bug in error flow which caused an access violation.****
** **
I bring here two quotes of umad_open_port() function with my comments:****
** **
// umad_open_port****
** **

****
hr = WmGetObject(IID_IWMProvider, (LPVOID*)
&ports[portid].prov);****
if (FAILED(hr)) {****
CloseHandle(ports[portid].overlap.hEvent);
****
portid = GetLastError() & 0x80000000;****
goto out;****
}****
// if there is no providers WmGetObject will return error,
but portid wil be set to 0****
// We saw it when GetLastError() returned 6. Maybe you mean
it to be HRESULT_FROM_WIN32(GetLastError()) ?****
** **
// As a result umad_open_port() returns 0, i.e. SUCCESS
which causes later an access violation.****
** **
** **
_______________________________________________
ofw mailing list
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
Hefty, Sean
2011-10-17 16:51:33 UTC
Permalink
Post by Leonid Keller
hr = WmGetObject(IID_IWMProvider, (LPVOID*)
&ports[portid].prov);
if (FAILED(hr)) {
CloseHandle(ports[portid].overlap.hEvent);
portid = GetLastError() & 0x80000000;
Uhm... this should be a bitwise or '|', not an and '&'. I checked in a fix. Thanks.
Loading...