Discussion:
[Openib-windows] Win IBhost stop receive broadcast packets
Anatoly Lisenko
2007-01-02 11:56:50 UTC
Hi,

I have seen a problem with the Windows IB host stack: rebooting the InfiniBand
switch can cause ping loss that persists even after the switch comes back up.

I started to investigate this anomaly and found:

1. The IB stack doesn't receive broadcast ARP packets.

2. All other packets (unicast and multicast) are received.

3. The HCA port's rx packet counter increases each time a broadcast packet
arrives.

4. It seems that the firmware drops these packets (I don't see any
completions).

I examined the logs and saw that we somehow end up in a state where:

1. The HCA's port is joined to the broadcast group.

2. The IPoIB QP is detached from the broadcast group.

This is the stack backtrace of the mlnx_detach_mcast function:

f7125d10 f68ae6ab mthca!mlnx_detach_mcast+0x13 [n:\win-ibhost\trunk\hw\mthca\kernel\hca_mcast.c @ 142]
f7125d38 f68c9c30 ibbus!__cleanup_mcast+0x24b [n:\win-ibhost\trunk\core\al\al_mcast.c @ 304]
f7125d70 f6820212 ibbus!async_destroy_cb+0x420 [n:\win-ibhost\trunk\core\al\al_common.c @ 665]
f7125d8c f6825dc2 ibbus!__cl_async_proc_worker+0x92 [n:\win-ibhost\trunk\core\complib\cl_async_proc.c @ 153]
f7125da0 f6827c3a ibbus!__cl_thread_pool_routine+0x52 [n:\win-ibhost\trunk\core\complib\cl_threadpool.c @ 67]
f7125dac 80948bb2 ibbus!__thread_callback+0x2a [n:\win-ibhost\trunk\core\complib\kernel\cl_thread.c @ 49]
f7125ddc 8088d4d2 nt!PspSystemThreadStartup+0x2e
00000000 00000000 nt!KiThreadStartup+0x16





Mthca wpp log:

00000662 kernel 1236 600 2 312
01\02\2007-13:28:02:781 mlnx_query_ca()===>

00000663 kernel 1236 600 2 321
01\02\2007-13:28:02:781 mlnx_query_ca() :port 0 gid0:

00000664 kernel 1236 600 2 322
01\02\2007-13:28:02:781 mlnx_query_ca() :
0xfe80000000-0x08f14398095

00000665 kernel 1236 600 2 323
01\02\2007-13:28:02:781 mlnx_query_ca() :port 1 gid0:

00000666 kernel 1236 600 2 324
01\02\2007-13:28:02:781 mlnx_query_ca() :
0xfe80000000-0x08f14398096

00000667 kernel 1236 600 2 325
01\02\2007-13:28:02:781 mlnx_query_ca() :Space required 1898
used 1898

00000668 kernel 1236 600 2 326
01\02\2007-13:28:02:781 mlnx_conv_hca_cap() :Port 1 port_guid
0x8f10403980095

00000669 kernel 1236 600 2 327
01\02\2007-13:28:02:781 mlnx_conv_hca_cap() :Port 2 port_guid
0x8f10403980096

00000670 kernel 1236 600 2 328
01\02\2007-13:28:02:781 mlnx_query_ca()<===

00000671 kernel 4 276 2 339
01\02\2007-13:28:02:859 mlnx_attach_mcast()===>

00000672 kernel 4 276 2 340
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 89930EA8,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000

00000678 kernel 4 276 2 346
01\02\2007-13:28:02:859 completes with ERROR status
IB_SUCCESS

00000681 kernel 4 276 2 349
01\02\2007-13:28:02:859 mlnx_enable_cq_notify()===>

00000682 kernel 4 276 2 350
01\02\2007-13:28:02:859 completes with ERROR status
IB_SUCCESS

00000683 kernel 4 276 2 357
01\02\2007-13:28:02:859 mlnx_attach_mcast()===>

00000684 kernel 4 276 2 358
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 898F1D68,
qp_p 88A56E78, mlid 1c0, mgid ffff1b4012ff`100000000000000

00000685 kernel 4 276 2 359
01\02\2007-13:28:02:859 completes with ERROR status
IB_SUCCESS

00000686 kernel 4 276 2 362
01\02\2007-13:28:02:859 mlnx_attach_mcast()===>

00000687 kernel 4 276 2 363
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 884352D8,
qp_p 88A56E78, mlid 2c0, mgid ffff051412ff`30000c280010000

00000688 kernel 4 276 2 364
01\02\2007-13:28:02:859 completes with ERROR status
IB_SUCCESS

00000689 kernel 0 0 3 129
01\02\2007-13:28:02:750 mlnx_enable_cq_notify()===>

00000690 kernel 0 0 3 130
01\02\2007-13:28:02:750 completes with ERROR status
IB_SUCCESS

...

00000776 kernel 0 0 3 373
01\02\2007-13:28:03:109 mlnx_enable_cq_notify()===>

00000777 kernel 0 0 3 374
01\02\2007-13:28:03:109 completes with ERROR status
IB_SUCCESS

00000778 kernel 4 272 3 375
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 89918F40,
qp_p 88A56E78, mlid 2c0, mgid ffff051412ff`30000c280010000

00000779 kernel 4 272 3 376
01\02\2007-13:28:03:296 completes with ERROR status
IB_SUCCESS

00000780 kernel 4 272 3 377
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88DB3F00,
qp_p 88A56E78, mlid 1c0, mgid ffff1b4012ff`100000000000000

00000781 kernel 4 272 3 378
01\02\2007-13:28:03:296 completes with ERROR status
IB_SUCCESS

00000782 kernel 4 272 3 379
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A48DA0,
qp_p 88A56E78, mlid 6c0, mgid ffff051412ff`ffffa8ff00ff0000

00000783 kernel 4 272 3 380
01\02\2007-13:28:03:296 completes with ERROR status
IB_SUCCESS

00000784 kernel 4 272 3 381
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A94DD0,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000

00000785 kernel 4 272 3 382
01\02\2007-13:28:03:296 completes with ERROR status
IB_SUCCESS

00000786 kernel 4 280 1 383
01\02\2007-13:28:22:781 mlnx_query_ca()===>

00000787 kernel 4 280 1 384
01\02\2007-13:28:22:781 mlnx_query_ca() :port 0 gid0:

00000788 kernel 4 280 1 385
01\02\2007-13:28:22:781 mlnx_query_ca() :
0xfe80000000-0x08f14398095

00000789 kernel 4 280 1 386
01\02\2007-13:28:22:781 mlnx_query_ca() :port 1 gid0:

00000790 kernel 4 280 1 387
01\02\2007-13:28:22:781 mlnx_query_ca() :
0xfe80000000-0x08f14398096

00000791 kernel 4 280 1 388
01\02\2007-13:28:22:781 mlnx_query_ca() :Space required 1898
used 1898

00000792 kernel 4 280 1 389
01\02\2007-13:28:22:781 mlnx_conv_hca_cap() :Port 1 port_guid
0x8f10403980095

00000793 kernel 4 280 1 390
01\02\2007-13:28:22:781 mlnx_conv_hca_cap() :Port 2 port_guid
0x8f10403980096

00000794 kernel 4 280 1 391
01\02\2007-13:28:22:781 mlnx_query_ca()<===
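
The attach and detach lines in the mthca log can be replayed to see what ends up attached to the QP. The following is a minimal sketch, not part of the driver. It assumes that the sixth column of each trace line is a global sequence number, and that the HCA tracks attachments per (QP, MGID) pair rather than per mcasth handle. The sample lines are copied from the log above with the wrapped lines rejoined; the names LOG, EVENT and replay are made up for this example.

```python
import re

# Attach/detach lines copied from the mthca log, wrapped lines rejoined.
LOG = r"""
00000672 kernel 4 276 2 340 01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 89930EA8, qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000
00000684 kernel 4 276 2 358 01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 898F1D68, qp_p 88A56E78, mlid 1c0, mgid ffff1b4012ff`100000000000000
00000687 kernel 4 276 2 363 01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 884352D8, qp_p 88A56E78, mlid 2c0, mgid ffff051412ff`30000c280010000
00000778 kernel 4 272 3 375 01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 89918F40, qp_p 88A56E78, mlid 2c0, mgid ffff051412ff`30000c280010000
00000780 kernel 4 272 3 377 01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88DB3F00, qp_p 88A56E78, mlid 1c0, mgid ffff1b4012ff`100000000000000
00000782 kernel 4 272 3 379 01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A48DA0, qp_p 88A56E78, mlid 6c0, mgid ffff051412ff`ffffa8ff00ff0000
00000784 kernel 4 272 3 381 01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A94DD0, qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000
"""

# Assumed column layout: msg#, "kernel", pid, tid, cpu, sequence#, timestamp, text.
EVENT = re.compile(
    r"^\d+ kernel \d+ \d+ \d+ (?P<seq>\d+) \S+ "
    r"mlnx_(?P<op>attach|detach)_mcast\(\) :mcasth (?P<handle>\w+), "
    r"qp_p (?P<qp>\w+), mlid (?P<mlid>\w+), mgid (?P<mgid>\S+)$"
)


def replay(log_text):
    """Replay attach/detach events in sequence order.

    Returns the (qp, mgid) pairs still attached at the end, and the detaches
    whose mcasth differs from the mcasth of the attach they undo.
    """
    events = [m.groupdict() for m in map(EVENT.match, map(str.strip, log_text.splitlines())) if m]
    events.sort(key=lambda e: int(e["seq"]))

    attached = {}    # (qp, mgid) -> mcasth of the attach
    mismatched = []  # (mlid, attach mcasth, detach mcasth)
    for e in events:
        key = (e["qp"], e["mgid"])
        if e["op"] == "attach":
            attached[key] = e["handle"]
        else:
            owner = attached.pop(key, None)
            # owner is None when the attach predates the excerpt (mlid 6c0 here).
            if owner is not None and owner != e["handle"]:
                mismatched.append((e["mlid"], owner, e["handle"]))
    return attached, mismatched


attached, mismatched = replay(LOG)
print("still attached:", attached)
for mlid, owner, handle in mismatched:
    print(f"mlid {mlid}: attached via {owner}, detached via {handle}")
```

On this excerpt the replay ends with nothing attached, and every detach that undoes a logged attach carries a different mcasth. That would be consistent with a late cleanup of objects created before the switch reset, although the excerpt alone cannot prove it.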





Ipoib wpp log:



00000130 kernel 0 0 0 130
01\02\2007-13:28:01:468 [IPoIB] :ipoib_check_for_hang():]

00000131 kernel 4 280 0 133
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_reset_all():[

00000132 kernel 4 280 0 134
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():[

00000133 kernel 4 280 0 135
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():]

...

00000140 kernel 4 280 0 150
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():[

00000141 kernel 4 280 0 151
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():]

00000142 kernel 4 280 0 152
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_reset_all():]

00000143 kernel 4 280 0 160
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_down():]

00000144 kernel 4 280 1 131
01\02\2007-13:28:02:781 [IPoIB] :__ipoib_pnp_cb() :Link DOWN!

00000145 kernel 4 280 1 132
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_down():[

00000146 kernel 4 312 1 153
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000147 kernel 4 312 1 154
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving
MCast group

00000148 kernel 4 312 1 164
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000149 kernel 4 312 1 165
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000150 kernel 4 312 1 166
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000151 kernel 4 280 1 170
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_up():[

00000152 kernel 4 280 1 171
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_up():]

00000153 kernel 4 308 2 145
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000154 kernel 4 308 2 147
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving
MCast group

00000155 kernel 4 308 2 161
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000156 kernel 4 308 2 162
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000157 kernel 4 308 2 163
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000158 kernel 4 276 2 191
01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():[

00000159 kernel 4 276 2 192
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_add_bcast():[

00000160 kernel 4 276 2 193
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():[

00000161 kernel 4 276 2 194
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():]

00000162 kernel 4 276 2 195
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():[

00000163 kernel 4 276 2 196
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast()
:Create av for MAC: 00-00-00-00-00-00

00000164 kernel 4 276 2 197
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():[

00000165 kernel 4 276 2 198
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():]

00000166 kernel 4 276 2 199
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():]

00000167 kernel 4 276 2 200
01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked():[

00000168 kernel 4 276 2 201
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked()
:insert : MAC: FF-FF-FF-FF-FF-FF

00000169 kernel 4 276 2 202
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():[

00000170 kernel 4 276 2 203
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():]

00000171 kernel 4 276 2 204
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_add_bcast():]

00000172 kernel 4 276 2 205
01\02\2007-13:28:02:859 [IPoIB] :__ib_mgr_activate():[

00000173 kernel 4 276 2 206
01\02\2007-13:28:02:859 [IPoIB] :__ib_mgr_activate():]

00000174 kernel 4 276 2 207
01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active():[

00000175 kernel 4 276 2 208
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():[

00000176 kernel 4 276 2 209
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref():[

00000177 kernel 4 276 2 210
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Look for
: MAC: 01-00-5E-00-00-01

00000178 kernel 4 276 2 211
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Failed
endpoint lookup.[IpoIB] :__endpt_mgr_ref():]

00000179 kernel 4 276 2 212
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():[

00000180 kernel 4 276 2 213
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():]

00000181 kernel 4 276 2 214
01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked():[

00000182 kernel 4 276 2 215
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked()
:insert : MAC: 01-00-5E-00-00-01

00000183 kernel 4 276 2 216
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():[

00000184 kernel 4 276 2 217
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():]

00000185 kernel 4 276 2 218
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():]

00000186 kernel 4 276 2 219
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():[

00000187 kernel 4 276 2 220
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref():[

00000188 kernel 4 276 2 221
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Look for
: MAC: 01-80-C2-00-00-03

00000189 kernel 4 276 2 222
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Failed
endpoint lookup.[IpoIB] :__endpt_mgr_ref():]

00000190 kernel 4 276 2 223
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():[

00000191 kernel 4 276 2 224
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():]

00000192 kernel 4 276 2 225
01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked():[

00000193 kernel 4 276 2 226
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked()
:insert : MAC: 01-80-C2-00-00-03

00000194 kernel 4 276 2 227
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():[

00000195 kernel 4 276 2 228
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():]

00000196 kernel 4 276 2 229
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():]

00000197 kernel 4 276 2 230
01\02\2007-13:28:02:859 [IPoIB] :ipoib_resume_oids():[

00000198 kernel 4 276 2 231
01\02\2007-13:28:02:859 [IPoIB] :ipoib_resume_oids():]

00000199 kernel 4 276 2 232
01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active() :Link UP!

00000200 kernel 4 276 2 233
01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active():]

00000201 kernel 4 276 2 234
01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():]

00000202 kernel 4 276 2 235
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():[

00000203 kernel 4 276 2 236
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():[

00000204 kernel 4 276 2 237
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast()
:Create av for MAC: 01-00-5E-00-00-01

00000205 kernel 4 276 2 238
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():[

00000206 kernel 4 276 2 239
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():]

00000207 kernel 4 276 2 240
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():]

00000208 kernel 4 276 2 241
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():]

00000209 kernel 4 276 2 242
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():[

00000210 kernel 4 276 2 243
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():[

00000211 kernel 4 276 2 244
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast()
:Create av for MAC: 01-80-C2-00-00-03

00000212 kernel 4 276 2 245
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():[

00000213 kernel 4 276 2 246
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():]

00000214 kernel 4 276 2 247
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():]

00000215 kernel 4 276 2 248
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():]

00000216 kernel 4 320 3 136
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000217 kernel 4 320 3 139
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000218 kernel 4 320 3 141
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000219 kernel 4 320 3 143
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000220 kernel 4 320 3 144
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000221 kernel 4 320 3 146
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving
MCast group

00000222 kernel 4 320 3 155
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000223 kernel 4 320 3 156
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000224 kernel 4 320 3 157
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000225 kernel 4 320 3 158
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000226 kernel 4 320 3 159
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving
MCast group

00000227 kernel 4 320 3 167
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000228 kernel 4 320 3 168
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000229 kernel 4 320 3 169
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000230 kernel 0 0 3 172
01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb():[

00000231 kernel 0 0 3 173
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_add_local():[

00000232 kernel 0 0 3 174
01\02\2007-13:28:02:781 [IPoIB] :ipoib_endpt_create():[

00000233 kernel 0 0 3 175
01\02\2007-13:28:02:781 [IPoIB] :ipoib_endpt_create():]

00000234 kernel 0 0 3 176
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_insert():[

00000235 kernel 0 0 3 177
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_insert():]

00000236 kernel 0 0 3 178
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_add_local():]

00000237 kernel 0 0 3 179
01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb() :Received
port info: link width = 2.

00000238 kernel 0 0 3 180
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate():[

00000239 kernel 0 0 3 181
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate() :Link speed
is 2.5Gs

00000240 kernel 0 0 3 182
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate() :Link width
is 4X

00000241 kernel 0 0 3 183
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate():]

00000242 kernel 0 0 3 184
01\02\2007-13:28:02:781 [IPoIB] :__port_get_bcast():[

00000243 kernel 0 0 3 185
01\02\2007-13:28:02:781 [IPoIB] :__port_get_bcast():]

00000244 kernel 0 0 3 186
01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb():]

00000245 kernel 2624 2732 3 187
01\02\2007-13:28:02:781 [IPoIB] :__bcast_get_cb():[

00000246 kernel 2624 2732 3 188
01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():[

00000247 kernel 2624 2732 3 189
01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():]

00000248 kernel 2624 2732 3 190
01\02\2007-13:28:02:781 [IPoIB] :__bcast_get_cb():]

00000249 kernel 0 0 3 249
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():[

00000250 kernel 0 0 3 250
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():]

...

00000339 kernel 0 0 3 339
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():[

00000340 kernel 0 0 3 340
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():]

00000341 kernel 0 0 0 341
01\02\2007-13:28:03:468 [IPoIB] :ipoib_check_for_hang():[





Thanks,

Anatoly
Yossi Leybovich
2007-01-02 13:11:06 UTC
Is it reproducible? How?
Does the SM reside on the switch?


I think there was an error in the join process against the SM.
In normal IPoIB behavior, __bcast_cb() should be called after IPoIB issues
the query in __port_join_bcast().

In your log there is no call to the __bcast_cb() callback, which means that
IBAL did not get the answer and failed to generate a timeout for the query.

Can you collect IBAL traces so we can be sure that the query returned?
Can you also get IB traces between the SM and the IPoIB host?
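
One way to check the expected ordering against a trace is sketched below. The ipoib log in the original post is grouped by CPU buffer, so it is not in time order as printed. The sketch assumes that the sixth column is a global sequence number and sorts on it before looking for a __bcast_cb() entry after the exit from __port_join_bcast(). The four sample lines are copied from that log; SAMPLE, TRACE and first_callback_after_join are names made up for this example.

```python
import re

# Entry ("[") and exit ("]") trace lines copied from the ipoib log.
SAMPLE = r"""
00000158 kernel 4 276 2 191 01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():[
00000201 kernel 4 276 2 234 01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():]
00000246 kernel 2624 2732 3 188 01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():[
00000247 kernel 2624 2732 3 189 01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():]
"""

# Assumed column layout: msg#, "kernel", pid, tid, cpu, sequence#, timestamp, text.
TRACE = re.compile(
    r"^\d+ kernel \d+ \d+ \d+ (?P<seq>\d+) \S+ \[IPoIB\] "
    r":(?P<func>\w+)\(\):(?P<mark>[\[\]])$"
)


def first_callback_after_join(log_text, join="__port_join_bcast", callback="__bcast_cb"):
    """Return (seq of first join exit, seq of first callback entry after it).

    Either element is None if the corresponding trace line is not found.
    """
    entries = []
    for line in log_text.splitlines():
        m = TRACE.match(line.strip())
        if m:
            entries.append((int(m["seq"]), m["func"], m["mark"]))
    entries.sort()  # order by the assumed global sequence number

    join_exit = None
    for seq, func, mark in entries:
        if func == join and mark == "]" and join_exit is None:
            join_exit = seq
        elif func == callback and mark == "[" and join_exit is not None:
            return join_exit, seq
    return join_exit, None


print(first_callback_after_join(SAMPLE))
```

A result whose second element is None would match the failure described above. A pair of numbers means the callback was traced after the join request, and the problem would then have to be looked for later in the sequence.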


Anatoly Lisenko
2007-01-02 13:50:52 UTC
Repro scenario:

1. Reset the IB switch.
2. Wait for the SM to start reconfiguring the fabric.
3. You should then see:

in the ipoib log: "link down" -> ... -> "link up"

in the mthca log: attach -> ... -> detach
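
One way to check the attach -> ... -> detach sequence in the mthca log is to look at the last event recorded for each (QP, MGID) pair. What follows is only a rough Python sketch, not part of the stack. It assumes the records look like the mlnx_attach_mcast()/mlnx_detach_mcast() lines in the excerpt and that all records come from the same day, so the time strings sort correctly. The names RECORD, SAMPLE and final_mcast_state are made up for illustration.

```python
import re

# Assumed record layout, copied from the mthca WPP excerpt in this thread:
#   <date>-HH:MM:SS:mmm mlnx_attach_mcast() :mcasth X, qp_p Y, mlid Z, mgid G
# A record may be wrapped over two lines, so \s is allowed to span newlines.
RECORD = re.compile(
    r"(\d{2}:\d{2}:\d{2}:\d{3})\s+"
    r"mlnx_(attach|detach)_mcast\(\)\s*:mcasth\s+\w+,\s*"
    r"qp_p\s+(\w+),\s*mlid\s+\w+,\s*mgid\s+([0-9a-f`]+)"
)

SAMPLE = r"""
00000672 kernel 4 276 2 340
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 89930EA8,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000

00000784 kernel 4 272 3 381
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A94DD0,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000
"""


def final_mcast_state(log_text):
    """Return {(qp, mgid): 'attach' or 'detach'} using the latest event seen."""
    events = []
    for position, match in enumerate(RECORD.finditer(log_text)):
        timestamp, action, qp, mgid = match.groups()
        # Sort by time first; file position only breaks ties, because the
        # per-CPU buffers are not written to the file in time order.
        events.append((timestamp, position, action, qp, mgid))
    events.sort()

    state = {}
    for _, _, action, qp, mgid in events:
        state[(qp, mgid)] = action
    return state


if __name__ == "__main__":
    for (qp, mgid), action in final_mcast_state(SAMPLE).items():
        print(f"qp {qp} mgid {mgid}: last event = {action}")
```

Any group whose last event is "detach" after the link came back up is a candidate for the problem described above.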



I have attached fresh log files: mthca (flags = 0x400), ipoib (flags = 0x122),
and ibal (full log prints).



I will send you the IB traces captured with CATC later.



Thanks,

Anatoly



________________________________

From: Yossi Leybovich [mailto:***@dev.mellanox.co.il]
Sent: Tuesday, January 02, 2007 3:11 PM
To: Anatoly Lisenko; Leonid Keller; openib-***@openib.org
Cc: Tzahi Oved
Subject: RE: [Openib-windows] Win IBhost stop receive broadcast packets



Is it reproducible? How?

Does the SM reside on the switch?





I think there was an error in the join process against the SM.

In normal IPoIB behavior, __bcast_cb() should be called after IPoIB
issues the __port_join_bcast() query.



In your log there is no call to the __bcast_cb() callback, which means
that IBAL did not get the answer and failed to generate a timeout for the
query.
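
The check described above (every __port_join_bcast() should eventually be followed by a __bcast_cb()) can be done mechanically on an IPoIB WPP log. This is only a hedged Python sketch. It assumes function-entry records look like "<time> [IPoIB] :name():[" as in the excerpts in this thread, that all records are from the same day, and that a callback logged in the same millisecond as the join counts as an answer. ENTRY, SAMPLE and unanswered_joins are illustrative names, not part of IBAL or IPoIB.

```python
import re

# Assumed function-entry record: "<date>-HH:MM:SS:mmm [IPoIB] :name():["
# \s* after the tag allows for records wrapped onto the next line.
ENTRY = re.compile(r"(\d{2}:\d{2}:\d{2}:\d{3})\s+\[IPoIB\]\s*:(\w+)\(\):\[")

SAMPLE = r"""
00000246 kernel 2624 2732 3 188
01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():[

00000247 kernel 2624 2732 3 189
01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():]

00000158 kernel 4 276 2 191
01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():[
"""


def unanswered_joins(log_text, join="__port_join_bcast", callback="__bcast_cb"):
    """Return timestamps of join entries with no callback entry at or after them."""
    entries = [(m.group(1), m.group(2)) for m in ENTRY.finditer(log_text)]
    callback_times = [ts for ts, name in entries if name == callback]
    return [
        ts
        for ts, name in entries
        if name == join and not any(cb >= ts for cb in callback_times)
    ]


if __name__ == "__main__":
    missing = unanswered_joins(SAMPLE)
    if missing:
        print("join(s) with no callback:", missing)
    else:
        print("every join has a later callback entry")
```

Comparing timestamps rather than file position matters here, because records from different CPUs are not written to the log in time order.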



Can you collect IBAL traces so we can be sure that the query returned?

Can you also get IB traces between the SM and IPoIB?




Anatoly Lisenko
2007-01-02 14:09:44 UTC
Attached are the IB traces between the SM and IPoIB.

I forgot to mention that the SM resides on the switch.



Thanks,

Anatoly

________________________________


From: openib-windows-***@openib.org
[mailto:openib-windows-***@openib.org] On Behalf Of Anatoly Lisenko
Sent: Tuesday, January 02, 2007 1:57 PM
To: ***@mellanox.co.il; openib-***@openib.org
Cc: Tzahi Oved
Subject: [Openib-windows] Win IBhost stop receive broadcast
packets

Hi ,



I saw some problem with windows ibhost stack: reboot of
infiniband switch can cause ping loss ( even after ibsw get up ).

I start to research this anomaly and I saw:

1. ib stack doesn't receive broadcast arp packets.

2. All other packets unicast + multicast are received.

3. rx packets hca port counter increased each time broadcast
packet arrived

4. It seems that firmware drop this packet. ( I don't see any
completions )



I examined the logs and saw that somehow we fall into state when
:

1. hca's port joined to bcast group

2. ipoib qp detached from bcast group



This is stack backtrace of mlnx_detach_mcast func. :

f7125d10 f68ae6ab mthca!mlnx_detach_mcast+0x13
[n:\win-ibhost\trunk\hw\mthca\kernel\hca_mcast.c @ 142]

f7125d38 f68c9c30 ibbus!__cleanup_mcast+0x24b
[n:\win-ibhost\trunk\core\al\al_mcast.c @ 304]

f7125d70 f6820212 ibbus!async_destroy_cb+0x420
[n:\win-ibhost\trunk\core\al\al_common.c @ 665]

f7125d8c f6825dc2 ibbus!__cl_async_proc_worker+0x92
[n:\win-ibhost\trunk\core\complib\cl_async_proc.c @ 153]

f7125da0 f6827c3a ibbus!__cl_thread_pool_routine+0x52
[n:\win-ibhost\trunk\core\complib\cl_threadpool.c @ 67]

f7125dac 80948bb2 ibbus!__thread_callback+0x2a
[n:\win-ibhost\trunk\core\complib\kernel\cl_thread.c @ 49]

f7125ddc 8088d4d2 nt!PspSystemThreadStartup+0x2e

00000000 00000000 nt!KiThreadStartup+0x16





Mthca wpp log:

00000662 kernel 1236 600 2 312
01\02\2007-13:28:02:781 mlnx_query_ca()===>

00000663 kernel 1236 600 2 321
01\02\2007-13:28:02:781 mlnx_query_ca() :port 0 gid0:

00000664 kernel 1236 600 2 322
01\02\2007-13:28:02:781 mlnx_query_ca() :
0xfe80000000-0x08f14398095

00000665 kernel 1236 600 2 323
01\02\2007-13:28:02:781 mlnx_query_ca() :port 1 gid0:

00000666 kernel 1236 600 2 324
01\02\2007-13:28:02:781 mlnx_query_ca() :
0xfe80000000-0x08f14398096

00000667 kernel 1236 600 2 325
01\02\2007-13:28:02:781 mlnx_query_ca() :Space required 1898
used 1898

00000668 kernel 1236 600 2 326
01\02\2007-13:28:02:781 mlnx_conv_hca_cap() :Port 1 port_guid
0x8f10403980095

00000669 kernel 1236 600 2 327
01\02\2007-13:28:02:781 mlnx_conv_hca_cap() :Port 2 port_guid
0x8f10403980096

00000670 kernel 1236 600 2 328
01\02\2007-13:28:02:781 mlnx_query_ca()<===

00000671 kernel 4 276 2 339
01\02\2007-13:28:02:859 mlnx_attach_mcast()===>

00000672 kernel 4 276 2 340
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 89930EA8,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000

00000678 kernel 4 276 2 346
01\02\2007-13:28:02:859 completes with ERROR status
IB_SUCCESS

00000681 kernel 4 276 2 349
01\02\2007-13:28:02:859 mlnx_enable_cq_notify()===>

00000682 kernel 4 276 2 350
01\02\2007-13:28:02:859 completes with ERROR status
IB_SUCCESS

00000683 kernel 4 276 2 357
01\02\2007-13:28:02:859 mlnx_attach_mcast()===>

00000684 kernel 4 276 2 358
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 898F1D68,
qp_p 88A56E78, mlid 1c0, mgid ffff1b4012ff`100000000000000

00000685 kernel 4 276 2 359
01\02\2007-13:28:02:859 completes with ERROR status
IB_SUCCESS

00000686 kernel 4 276 2 362
01\02\2007-13:28:02:859 mlnx_attach_mcast()===>

00000687 kernel 4 276 2 363
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 884352D8,
qp_p 88A56E78, mlid 2c0, mgid ffff051412ff`30000c280010000

00000688 kernel 4 276 2 364
01\02\2007-13:28:02:859 completes with ERROR status
IB_SUCCESS

00000689 kernel 0 0 3 129
01\02\2007-13:28:02:750 mlnx_enable_cq_notify()===>

00000690 kernel 0 0 3 130
01\02\2007-13:28:02:750 completes with ERROR status
IB_SUCCESS

...

00000776 kernel 0 0 3 373
01\02\2007-13:28:03:109 mlnx_enable_cq_notify()===>

00000777 kernel 0 0 3 374
01\02\2007-13:28:03:109 completes with ERROR status
IB_SUCCESS

00000778 kernel 4 272 3 375
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 89918F40,
qp_p 88A56E78, mlid 2c0, mgid ffff051412ff`30000c280010000

00000779 kernel 4 272 3 376
01\02\2007-13:28:03:296 completes with ERROR status
IB_SUCCESS

00000780 kernel 4 272 3 377
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88DB3F00,
qp_p 88A56E78, mlid 1c0, mgid ffff1b4012ff`100000000000000

00000781 kernel 4 272 3 378
01\02\2007-13:28:03:296 completes with ERROR status
IB_SUCCESS

00000782 kernel 4 272 3 379
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A48DA0,
qp_p 88A56E78, mlid 6c0, mgid ffff051412ff`ffffa8ff00ff0000

00000783 kernel 4 272 3 380
01\02\2007-13:28:03:296 completes with ERROR status
IB_SUCCESS

00000784 kernel 4 272 3 381
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A94DD0,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000

00000785 kernel 4 272 3 382
01\02\2007-13:28:03:296 completes with ERROR status
IB_SUCCESS

00000786 kernel 4 280 1 383
01\02\2007-13:28:22:781 mlnx_query_ca()===>

00000787 kernel 4 280 1 384
01\02\2007-13:28:22:781 mlnx_query_ca() :port 0 gid0:

00000788 kernel 4 280 1 385
01\02\2007-13:28:22:781 mlnx_query_ca() :
0xfe80000000-0x08f14398095

00000789 kernel 4 280 1 386
01\02\2007-13:28:22:781 mlnx_query_ca() :port 1 gid0:

00000790 kernel 4 280 1 387
01\02\2007-13:28:22:781 mlnx_query_ca() :
0xfe80000000-0x08f14398096

00000791 kernel 4 280 1 388
01\02\2007-13:28:22:781 mlnx_query_ca() :Space required 1898
used 1898

00000792 kernel 4 280 1 389
01\02\2007-13:28:22:781 mlnx_conv_hca_cap() :Port 1 port_guid
0x8f10403980095

00000793 kernel 4 280 1 390
01\02\2007-13:28:22:781 mlnx_conv_hca_cap() :Port 2 port_guid
0x8f10403980096

00000794 kernel 4 280 1 391
01\02\2007-13:28:22:781 mlnx_query_ca()<===





Ipoib wpp log:



00000130 kernel 0 0 0 130
01\02\2007-13:28:01:468 [IPoIB] :ipoib_check_for_hang():]

00000131 kernel 4 280 0 133
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_reset_all():[

00000132 kernel 4 280 0 134
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():[

00000133 kernel 4 280 0 135
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():]

...

00000140 kernel 4 280 0 150
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():[

00000141 kernel 4 280 0 151
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():]

00000142 kernel 4 280 0 152
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_reset_all():]

00000143 kernel 4 280 0 160
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_down():]

00000144 kernel 4 280 1 131
01\02\2007-13:28:02:781 [IPoIB] :__ipoib_pnp_cb() :Link DOWN!

00000145 kernel 4 280 1 132
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_down():[

00000146 kernel 4 312 1 153
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000147 kernel 4 312 1 154
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving
MCast group

00000148 kernel 4 312 1 164
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000149 kernel 4 312 1 165
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000150 kernel 4 312 1 166
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000151 kernel 4 280 1 170
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_up():[

00000152 kernel 4 280 1 171
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_up():]

00000153 kernel 4 308 2 145
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000154 kernel 4 308 2 147
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving
MCast group

00000155 kernel 4 308 2 161
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000156 kernel 4 308 2 162
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000157 kernel 4 308 2 163
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000158 kernel 4 276 2 191
01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():[

00000159 kernel 4 276 2 192
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_add_bcast():[

00000160 kernel 4 276 2 193
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():[

00000161 kernel 4 276 2 194
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():]

00000162 kernel 4 276 2 195
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():[

00000163 kernel 4 276 2 196
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast()
:Create av for MAC: 00-00-00-00-00-00

00000164 kernel 4 276 2 197
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():[

00000165 kernel 4 276 2 198
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():]

00000166 kernel 4 276 2 199
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():]

00000167 kernel 4 276 2 200
01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked():[

00000168 kernel 4 276 2 201
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked()
:insert : MAC: FF-FF-FF-FF-FF-FF

00000169 kernel 4 276 2 202
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():[

00000170 kernel 4 276 2 203
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():]

00000171 kernel 4 276 2 204
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_add_bcast():]

00000172 kernel 4 276 2 205
01\02\2007-13:28:02:859 [IPoIB] :__ib_mgr_activate():[

00000173 kernel 4 276 2 206
01\02\2007-13:28:02:859 [IPoIB] :__ib_mgr_activate():]

00000174 kernel 4 276 2 207
01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active():[

00000175 kernel 4 276 2 208
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():[

00000176 kernel 4 276 2 209
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref():[

00000177 kernel 4 276 2 210
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Look for
: MAC: 01-00-5E-00-00-01

00000178 kernel 4 276 2 211
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Failed
endpoint lookup.[IpoIB] :__endpt_mgr_ref():]

00000179 kernel 4 276 2 212
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():[

00000180 kernel 4 276 2 213
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():]

00000181 kernel 4 276 2 214
01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked():[

00000182 kernel 4 276 2 215
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked()
:insert : MAC: 01-00-5E-00-00-01

00000183 kernel 4 276 2 216
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():[

00000184 kernel 4 276 2 217
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():]

00000185 kernel 4 276 2 218
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():]

00000186 kernel 4 276 2 219
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():[

00000187 kernel 4 276 2 220
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref():[

00000188 kernel 4 276 2 221
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Look for
: MAC: 01-80-C2-00-00-03

00000189 kernel 4 276 2 222
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Failed
endpoint lookup.[IpoIB] :__endpt_mgr_ref():]

00000190 kernel 4 276 2 223
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():[

00000191 kernel 4 276 2 224
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():]

00000192 kernel 4 276 2 225
01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked():[

00000193 kernel 4 276 2 226
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked()
:insert : MAC: 01-80-C2-00-00-03

00000194 kernel 4 276 2 227
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():[

00000195 kernel 4 276 2 228
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():]

00000196 kernel 4 276 2 229
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():]

00000197 kernel 4 276 2 230
01\02\2007-13:28:02:859 [IPoIB] :ipoib_resume_oids():[

00000198 kernel 4 276 2 231
01\02\2007-13:28:02:859 [IPoIB] :ipoib_resume_oids():]

00000199 kernel 4 276 2 232
01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active() :Link UP!

00000200 kernel 4 276 2 233
01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active():]

00000201 kernel 4 276 2 234
01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():]

00000202 kernel 4 276 2 235
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():[

00000203 kernel 4 276 2 236
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():[

00000204 kernel 4 276 2 237
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast()
:Create av for MAC: 01-00-5E-00-00-01

00000205 kernel 4 276 2 238
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():[

00000206 kernel 4 276 2 239
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():]

00000207 kernel 4 276 2 240
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():]

00000208 kernel 4 276 2 241
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():]

00000209 kernel 4 276 2 242
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():[

00000210 kernel 4 276 2 243
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():[

00000211 kernel 4 276 2 244
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast()
:Create av for MAC: 01-80-C2-00-00-03

00000212 kernel 4 276 2 245
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():[

00000213 kernel 4 276 2 246
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():]

00000214 kernel 4 276 2 247
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():]

00000215 kernel 4 276 2 248
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():]

00000216 kernel 4 320 3 136
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000217 kernel 4 320 3 139
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000218 kernel 4 320 3 141
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000219 kernel 4 320 3 143
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000220 kernel 4 320 3 144
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000221 kernel 4 320 3 146
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving
MCast group

00000222 kernel 4 320 3 155
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000223 kernel 4 320 3 156
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000224 kernel 4 320 3 157
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000225 kernel 4 320 3 158
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000226 kernel 4 320 3 159
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving
MCast group

00000227 kernel 4 320 3 167
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000228 kernel 4 320 3 168
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000229 kernel 4 320 3 169
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000230 kernel 0 0 3 172
01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb():[

00000231 kernel 0 0 3 173
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_add_local():[

00000232 kernel 0 0 3 174
01\02\2007-13:28:02:781 [IPoIB] :ipoib_endpt_create():[

00000233 kernel 0 0 3 175
01\02\2007-13:28:02:781 [IPoIB] :ipoib_endpt_create():]

00000234 kernel 0 0 3 176
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_insert():[

00000235 kernel 0 0 3 177
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_insert():]

00000236 kernel 0 0 3 178
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_add_local():]

00000237 kernel 0 0 3 179
01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb() :Received
port info: link width = 2.

00000238 kernel 0 0 3 180
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate():[

00000239 kernel 0 0 3 181
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate() :Link speed
is 2.5Gs

00000240 kernel 0 0 3 182
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate() :Link width
is 4X

00000241 kernel 0 0 3 183
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate():]

00000242 kernel 0 0 3 184
01\02\2007-13:28:02:781 [IPoIB] :__port_get_bcast():[

00000243 kernel 0 0 3 185
01\02\2007-13:28:02:781 [IPoIB] :__port_get_bcast():]

00000244 kernel 0 0 3 186
01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb():]

00000245 kernel 2624 2732 3 187
01\02\2007-13:28:02:781 [IPoIB] :__bcast_get_cb():[

00000246 kernel 2624 2732 3 188
01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():[

00000247 kernel 2624 2732 3 189
01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():]

00000248 kernel 2624 2732 3 190
01\02\2007-13:28:02:781 [IPoIB] :__bcast_get_cb():]

00000249 kernel 0 0 3 249
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():[

00000250 kernel 0 0 3 250
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():]

...

00000339 kernel 0 0 3 339
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():[

00000340 kernel 0 0 3 340
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():]

00000341 kernel 0 0 0 341
01\02\2007-13:28:03:468 [IPoIB] :ipoib_check_for_hang():[





Thanks,

Anatoly
Yossi Leybovich
2007-01-02 16:06:18 UTC
Can you put a breakpoint on __destroying_mcast and send the backtrace?

We also don't see the join requests in the IB trace. Can you re-collect the
trace?



_____

From: Anatoly Lisenko [mailto:***@voltaire.com]
Sent: Tuesday, January 02, 2007 3:51 PM
To: Yossi Leybovich; Leonid Keller; openib-***@openib.org
Cc: Tzahi Oved
Subject: RE: [Openib-windows] Win IBhost stop receive broadcast packets



Repro scenario:

1. Reset the IB switch.

2. Wait for the SM to start reconfiguring the fabric.

3. You should see in the ipoib log: "link down" -> .. -> "link up",

and in the mthca log: attach -> ... -> detach.



I attach fresh log files: mthca (flags = 0x400), ipoib (flags = 0x122), ibal
(full log prints).



I will send you the IB traces captured with CATC later.



Thanks,

Anatoly
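
The attach -> ... -> detach pattern in the mthca log can be checked mechanically. The following is only a minimal sketch, assuming the trace text has the shape quoted in this thread; the names MCAST_RE, last_mcast_op and SAMPLE are hypothetical helpers and not part of the driver or of any WPP tool. It reports the last attach/detach operation seen for each (QP, MGID) pair.

```python
import re

# Pattern for the mlnx_attach_mcast()/mlnx_detach_mcast() trace lines as they
# appear in this thread; \s* and \s+ absorb the archive's line wrapping.
MCAST_RE = re.compile(
    r"(?P<ts>\d{2}:\d{2}:\d{2}:\d{3})\s+"
    r"mlnx_(?P<op>attach|detach)_mcast\(\)\s*:mcasth\s+(?P<mcasth>\w+),\s*"
    r"qp_p\s+(?P<qp>\w+),\s*mlid\s+(?P<mlid>\w+),\s*mgid\s+(?P<mgid>\S+)"
)


def last_mcast_op(log_text):
    """Return {(qp, mgid): (op, timestamp)} for the latest attach/detach seen."""
    events = [m.groupdict() for m in MCAST_RE.finditer(log_text)]
    # Timestamps are fixed-width HH:MM:SS:mmm, so a string sort is
    # chronological within one day; the sort is stable, so file order
    # breaks ties between equal timestamps.
    events.sort(key=lambda e: e["ts"])
    state = {}
    for e in events:
        state[(e["qp"], e["mgid"])] = (e["op"], e["ts"])
    return state


# Two entries copied from the mthca log in this thread (broadcast group).
SAMPLE = r"""
00000672 kernel 4 276 2 340
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 89930EA8,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000

00000784 kernel 4 272 3 381
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A94DD0,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000
"""

if __name__ == "__main__":
    for (qp, mgid), (op, ts) in last_mcast_op(SAMPLE).items():
        print(f"qp {qp} mgid {mgid}: last op {op} at {ts}")
```

A pair whose last operation is "detach" while the port is still joined to the group would match the state described in the original report.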




_____


From: Yossi Leybovich [mailto:***@dev.mellanox.co.il]
Sent: Tuesday, January 02, 2007 3:11 PM
To: Anatoly Lisenko; Leonid Keller; openib-***@openib.org
Cc: Tzahi Oved
Subject: RE: [Openib-windows] Win IBhost stop receive broadcast packets



Is it reproducible? How?

Does the SM reside on the switch?





I think that there was an error in the join process against the SM.

In normal operation, IPoIB's __bcast_cb() should be called after IPoIB issues
the query in __port_join_bcast().



In your log there is no call to the __bcast_cb() callback. This means that
IBAL did not get the answer and also failed to generate a timeout for the query.



Can you collect IBAL traces so we can be sure that the query returned?

Can you also get IB traces between the SM and IPoIB?
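
The expected ordering (__port_join_bcast() first, __bcast_cb() afterwards) can be checked against an IPoIB log with a short script. This is only a sketch under stated assumptions: the last numeric column of each entry header is treated as a global sequence number, which is a guess based on the logs quoted in this thread, and ENTRY_RE, unanswered_joins and SAMPLE are hypothetical names, not part of IPoIB or IBAL.

```python
import re

# One WPP entry as quoted in this thread: index, "kernel", three numeric
# columns (not interpreted here), a sequence-like column, the timestamp,
# then "[IPoIB] :func() :text" or "[IPoIB] :func():[" / "():]".
ENTRY_RE = re.compile(
    r"\d{8}\s+kernel\s+\d+\s+\d+\s+\d+\s+(?P<seq>\d+)\s+\S+\s+"
    r"\[IPoIB\]\s+:(?P<func>\w+)\(\)\s*:(?P<rest>[^\n]*)"
)


def unanswered_joins(log_text):
    """Return sequence numbers of __port_join_bcast entries with no later __bcast_cb."""
    entries = sorted(
        (int(m["seq"]), m["func"], m["rest"].strip())
        for m in ENTRY_RE.finditer(log_text)
    )
    pending = []
    for seq, func, rest in entries:
        if rest != "[":  # only function-entry markers matter here
            continue
        if func == "__port_join_bcast":
            pending.append(seq)
        elif func == "__bcast_cb" and pending:
            pending.pop(0)  # the oldest outstanding join got its callback
    return pending


# Entries copied from the IPoIB log in this thread, in file order.
SAMPLE = r"""
00000158 kernel 4 276 2 191
01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():[

00000246 kernel 2624 2732 3 188
01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():[

00000247 kernel 2624 2732 3 189
01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():]
"""

if __name__ == "__main__":
    print("joins without a callback:", unanswered_joins(SAMPLE))
```

Sorting by the sequence column rather than by file position matters because the quoted logs are not in chronological order in the file.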




Anatoly Lisenko
2007-01-02 17:03:17 UTC
I have two backtraces:



1: kd> k

ChildEBP RetAddr

f792acbc f68c8cb7 ibbus!__destroying_mcast+0x6
[n:\win-ibhost\trunk\core\al\al_mcast.c @ 234]

f792acf4 f68c7b31 ibbus!destroy_obj+0x347
[n:\win-ibhost\trunk\core\al\al_common.c @ 592]

f792ad08 f68aff50 ibbus!async_destroy_obj+0xc1
[n:\win-ibhost\trunk\core\al\al_common.c @ 471]

f792ad20 f6640e2a ibbus!ib_leave_mcast+0x2c0
[n:\win-ibhost\trunk\core\al\al_mcast.c @ 561]

f792ad4c f6677882 ipoib!__endpt_cleanup+0x23a
[n:\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_endpoint.c @ 236]

f792ad70 f667dc92 ipoib!__destroy_cb+0xb2
[n:\win-ibhost\trunk\core\complib\cl_obj.c @ 773]

f792ad8c f667e142 ipoib!__cl_async_proc_worker+0x92
[n:\win-ibhost\trunk\core\complib\cl_async_proc.c @ 153]

f792ada0 f667eafa ipoib!__cl_thread_pool_routine+0x52
[n:\win-ibhost\trunk\core\complib\cl_threadpool.c @ 67]

f792adac 80948bb2 ipoib!__thread_callback+0x2a
[n:\win-ibhost\trunk\core\complib\kernel\cl_thread.c @ 49]

f792addc 8088d4d2 nt!PspSystemThreadStartup+0x2e

00000000 00000000 nt!KiThreadStartup+0x16





1: kd> k

ChildEBP RetAddr

f7139ccc f68c8cb7 ibbus!__destroying_mcast+0x6
[n:\win-ibhost\trunk\core\al\al_mcast.c @ 234]

f7139d04 f68c7b31 ibbus!destroy_obj+0x347
[n:\win-ibhost\trunk\core\al\al_common.c @ 592]

f7139d18 f68afa1e ibbus!async_destroy_obj+0xc1
[n:\win-ibhost\trunk\core\al\al_common.c @ 471]

f7139d70 f6820212 ibbus!join_async_cb+0x31e
[n:\win-ibhost\trunk\core\al\al_mcast.c @ 535]

f7139d8c f6825dc2 ibbus!__cl_async_proc_worker+0x92
[n:\win-ibhost\trunk\core\complib\cl_async_proc.c @ 153]

f7139da0 f6827c3a ibbus!__cl_thread_pool_routine+0x52
[n:\win-ibhost\trunk\core\complib\cl_threadpool.c @ 67]

f7139dac 80948bb2 ibbus!__thread_callback+0x2a
[n:\win-ibhost\trunk\core\complib\kernel\cl_thread.c @ 49]

f7139ddc 8088d4d2 nt!PspSystemThreadStartup+0x2e

00000000 00000000 nt!KiThreadStartup+0x16
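
To see where the two backtraces differ, the frames can be compared from the innermost one outwards. This is a rough sketch only; FRAME_RE, frames and first_divergence are hypothetical helper names, and the stacks are shortened copies of the kd output above.

```python
import re

# A kd "k" frame in this thread looks like "ChildEBP RetAddr module!symbol+0xNN".
FRAME_RE = re.compile(r"(\w+)!(\w+)\+0x[0-9a-fA-F]+")


def frames(stack_text):
    """Return the frames as 'module!symbol', innermost first."""
    return [f"{mod}!{sym}" for mod, sym in FRAME_RE.findall(stack_text)]


def first_divergence(stack_a, stack_b):
    """Return the first pair of frames that differ, or None if none do."""
    for fa, fb in zip(frames(stack_a), frames(stack_b)):
        if fa != fb:
            return fa, fb
    return None


# Innermost frames of the two backtraces above (source paths omitted).
STACK_1 = r"""
f792acbc f68c8cb7 ibbus!__destroying_mcast+0x6
f792acf4 f68c7b31 ibbus!destroy_obj+0x347
f792ad08 f68aff50 ibbus!async_destroy_obj+0xc1
f792ad20 f6640e2a ibbus!ib_leave_mcast+0x2c0
f792ad4c f6677882 ipoib!__endpt_cleanup+0x23a
"""

STACK_2 = r"""
f7139ccc f68c8cb7 ibbus!__destroying_mcast+0x6
f7139d04 f68c7b31 ibbus!destroy_obj+0x347
f7139d18 f68afa1e ibbus!async_destroy_obj+0xc1
f7139d70 f6820212 ibbus!join_async_cb+0x31e
f7139d8c f6825dc2 ibbus!__cl_async_proc_worker+0x92
"""

if __name__ == "__main__":
    print(first_divergence(STACK_1, STACK_2))
```

The first path reaches __destroying_mcast through ib_leave_mcast called from ipoib!__endpt_cleanup, the second through join_async_cb inside ibbus.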











:ipoib_endpt_set_mcast() :Create av for MAC: 00-00-00-00-00-00

00000164 kernel 4 276 2
197 01\02\2007-13:28:02:859 [IPoIB]
:__create_mcast_av():[

00000165 kernel 4 276 2
198 01\02\2007-13:28:02:859 [IPoIB]
:__create_mcast_av():]

00000166 kernel 4 276 2
199 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_endpt_set_mcast():]

00000167 kernel 4 276 2
200 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked():[

00000168 kernel 4 276 2
201 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked() :insert : MAC: FF-FF-FF-FF-FF-FF

00000169 kernel 4 276 2
202 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert():[

00000170 kernel 4 276 2
203 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert():]

00000171 kernel 4 276 2
204 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_add_bcast():]

00000172 kernel 4 276 2
205 01\02\2007-13:28:02:859 [IPoIB]
:__ib_mgr_activate():[

00000173 kernel 4 276 2
206 01\02\2007-13:28:02:859 [IPoIB]
:__ib_mgr_activate():]

00000174 kernel 4 276 2
207 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_set_active():[

00000175 kernel 4 276 2
208 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_port_join_mcast():[

00000176 kernel 4 276 2
209 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_ref():[

00000177 kernel 4 276 2
210 01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref()
:Look for : MAC: 01-00-5E-00-00-01

00000178 kernel 4 276 2
211 01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref()
:Failed endpoint lookup.[IpoIB] :__endpt_mgr_ref():]

00000179 kernel 4 276 2
212 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_endpt_create():[

00000180 kernel 4 276 2
213 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_endpt_create():]

00000181 kernel 4 276 2
214 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked():[

00000182 kernel 4 276 2
215 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked() :insert : MAC: 01-00-5E-00-00-01

00000183 kernel 4 276 2
216 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert():[

00000184 kernel 4 276 2
217 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert():]

00000185 kernel 4 276 2
218 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_port_join_mcast():]

00000186 kernel 4 276 2
219 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_port_join_mcast():[

00000187 kernel 4 276 2
220 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_ref():[

00000188 kernel 4 276 2
221 01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref()
:Look for : MAC: 01-80-C2-00-00-03

00000189 kernel 4 276 2
222 01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref()
:Failed endpoint lookup.[IpoIB] :__endpt_mgr_ref():]

00000190 kernel 4 276 2
223 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_endpt_create():[

00000191 kernel 4 276 2
224 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_endpt_create():]

00000192 kernel 4 276 2
225 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked():[

00000193 kernel 4 276 2
226 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert_locked() :insert : MAC: 01-80-C2-00-00-03

00000194 kernel 4 276 2
227 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert():[

00000195 kernel 4 276 2
228 01\02\2007-13:28:02:859 [IPoIB]
:__endpt_mgr_insert():]

00000196 kernel 4 276 2
229 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_port_join_mcast():]

00000197 kernel 4 276 2
230 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_resume_oids():[

00000198 kernel 4 276 2
231 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_resume_oids():]

00000199 kernel 4 276 2
232 01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active()
:Link UP!

00000200 kernel 4 276 2
233 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_set_active():]

00000201 kernel 4 276 2
234 01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():]

00000202 kernel 4 276 2
235 01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():[

00000203 kernel 4 276 2
236 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_endpt_set_mcast():[

00000204 kernel 4 276 2
237 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_endpt_set_mcast() :Create av for MAC: 01-00-5E-00-00-01

00000205 kernel 4 276 2
238 01\02\2007-13:28:02:859 [IPoIB]
:__create_mcast_av():[

00000206 kernel 4 276 2
239 01\02\2007-13:28:02:859 [IPoIB]
:__create_mcast_av():]

00000207 kernel 4 276 2
240 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_endpt_set_mcast():]

00000208 kernel 4 276 2
241 01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():]

00000209 kernel 4 276 2
242 01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():[

00000210 kernel 4 276 2
243 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_endpt_set_mcast():[

00000211 kernel 4 276 2
244 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_endpt_set_mcast() :Create av for MAC: 01-80-C2-00-00-03

00000212 kernel 4 276 2
245 01\02\2007-13:28:02:859 [IPoIB]
:__create_mcast_av():[

00000213 kernel 4 276 2
246 01\02\2007-13:28:02:859 [IPoIB]
:__create_mcast_av():]

00000214 kernel 4 276 2
247 01\02\2007-13:28:02:859 [IPoIB]
:ipoib_endpt_set_mcast():]

00000215 kernel 4 276 2
248 01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():]

00000216 kernel 4 320 3
136 01\02\2007-13:28:02:781 [IPoIB]
:__endpt_cleanup():[

00000217 kernel 4 320 3
139 01\02\2007-13:28:02:781 [IPoIB]
:__endpt_cleanup():]

00000218 kernel 4 320 3
141 01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000219 kernel 4 320 3
143 01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000220 kernel 4 320 3
144 01\02\2007-13:28:02:781 [IPoIB]
:__endpt_cleanup():[

00000221 kernel 4 320 3
146 01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup()
:Leaving MCast group

00000222 kernel 4 320 3
155 01\02\2007-13:28:02:781 [IPoIB]
:__endpt_cleanup():]

00000223 kernel 4 320 3
156 01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000224 kernel 4 320 3
157 01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000225 kernel 4 320 3
158 01\02\2007-13:28:02:781 [IPoIB]
:__endpt_cleanup():[

00000226 kernel 4 320 3
159 01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup()
:Leaving MCast group

00000227 kernel 4 320 3
167 01\02\2007-13:28:02:781 [IPoIB]
:__endpt_cleanup():]

00000228 kernel 4 320 3
168 01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000229 kernel 4 320 3
169 01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000230 kernel 0 0 3
172 01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb():[

00000231 kernel 0 0 3
173 01\02\2007-13:28:02:781 [IPoIB]
:__endpt_mgr_add_local():[

00000232 kernel 0 0 3
174 01\02\2007-13:28:02:781 [IPoIB]
:ipoib_endpt_create():[

00000233 kernel 0 0 3
175 01\02\2007-13:28:02:781 [IPoIB]
:ipoib_endpt_create():]

00000234 kernel 0 0 3
176 01\02\2007-13:28:02:781 [IPoIB]
:__endpt_mgr_insert():[

00000235 kernel 0 0 3
177 01\02\2007-13:28:02:781 [IPoIB]
:__endpt_mgr_insert():]

00000236 kernel 0 0 3
178 01\02\2007-13:28:02:781 [IPoIB]
:__endpt_mgr_add_local():]

00000237 kernel 0 0 3
179 01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb()
:Received port info: link width = 2.

00000238 kernel 0 0 3
180 01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate():[

00000239 kernel 0 0 3
181 01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate()
:Link speed is 2.5Gs

00000240 kernel 0 0 3
182 01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate()
:Link width is 4X

00000241 kernel 0 0 3
183 01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate():]

00000242 kernel 0 0 3
184 01\02\2007-13:28:02:781 [IPoIB]
:__port_get_bcast():[

00000243 kernel 0 0 3
185 01\02\2007-13:28:02:781 [IPoIB]
:__port_get_bcast():]

00000244 kernel 0 0 3
186 01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb():]

00000245 kernel 2624 2732 3
187 01\02\2007-13:28:02:781 [IPoIB] :__bcast_get_cb():[

00000246 kernel 2624 2732 3
188 01\02\2007-13:28:02:781 [IPoIB]
:__port_join_bcast():[

00000247 kernel 2624 2732 3
189 01\02\2007-13:28:02:781 [IPoIB]
:__port_join_bcast():]

00000248 kernel 2624 2732 3
190 01\02\2007-13:28:02:781 [IPoIB] :__bcast_get_cb():]

00000249 kernel 0 0 3
249 01\02\2007-13:28:03:109 [IPoIB]
:__endpt_mgr_get_by_gid():[

00000250 kernel 0 0 3
250 01\02\2007-13:28:03:109 [IPoIB]
:__endpt_mgr_get_by_gid():]

...

00000339 kernel 0 0 3
339 01\02\2007-13:28:03:109 [IPoIB]
:__endpt_mgr_get_by_gid():[

00000340 kernel 0 0 3
340 01\02\2007-13:28:03:109 [IPoIB]
:__endpt_mgr_get_by_gid():]

00000341 kernel 0 0 0
341 01\02\2007-13:28:03:468 [IPoIB]
:ipoib_check_for_hang():[





Thanks,

Anatoly
Yossi Leybovich
2007-01-02 17:16:47 UTC
_____

From: Anatoly Lisenko [mailto:***@voltaire.com]
Sent: Tuesday, January 02, 2007 7:03 PM
To: Yossi Leybovich; Leonid Keller; openib-***@openib.org
Cc: Tzahi Oved
Subject: RE: [Openib-windows] Win IBhost stop receive broadcast packets



I have two backtraces:



1: kd> k

ChildEBP RetAddr

f792acbc f68c8cb7 ibbus!__destroying_mcast+0x6
[n:\win-ibhost\trunk\core\al\al_mcast.c @ 234]

f792acf4 f68c7b31 ibbus!destroy_obj+0x347
[n:\win-ibhost\trunk\core\al\al_common.c @ 592]

f792ad08 f68aff50 ibbus!async_destroy_obj+0xc1
[n:\win-ibhost\trunk\core\al\al_common.c @ 471]

f792ad20 f6640e2a ibbus!ib_leave_mcast+0x2c0
[n:\win-ibhost\trunk\core\al\al_mcast.c @ 561]

f792ad4c f6677882 ipoib!__endpt_cleanup+0x23a
[n:\win-ibhost\trunk\ulp\ipoib\kernel\ipoib_endpoint.c @ 236]

f792ad70 f667dc92 ipoib!__destroy_cb+0xb2
[n:\win-ibhost\trunk\core\complib\cl_obj.c @ 773]

f792ad8c f667e142 ipoib!__cl_async_proc_worker+0x92
[n:\win-ibhost\trunk\core\complib\cl_async_proc.c @ 153]

f792ada0 f667eafa ipoib!__cl_thread_pool_routine+0x52
[n:\win-ibhost\trunk\core\complib\cl_threadpool.c @ 67]

f792adac 80948bb2 ipoib!__thread_callback+0x2a
[n:\win-ibhost\trunk\core\complib\kernel\cl_thread.c @ 49]

f792addc 8088d4d2 nt!PspSystemThreadStartup+0x2e

00000000 00000000 nt!KiThreadStartup+0x16





1: kd> k

ChildEBP RetAddr

f7139ccc f68c8cb7 ibbus!__destroying_mcast+0x6
[n:\win-ibhost\trunk\core\al\al_mcast.c @ 234]

f7139d04 f68c7b31 ibbus!destroy_obj+0x347
[n:\win-ibhost\trunk\core\al\al_common.c @ 592]

f7139d18 f68afa1e ibbus!async_destroy_obj+0xc1
[n:\win-ibhost\trunk\core\al\al_common.c @ 471]

f7139d70 f6820212 ibbus!join_async_cb+0x31e
[n:\win-ibhost\trunk\core\al\al_mcast.c @ 535]
[Yossi Leybovich] You can see that this flow is called when the join request
failed.

Can you check with an IB trace what the failure reason is? You can also check it
with the kernel debugger.




/* Dereference the mcast object now that the SA operation is complete. */
if( status != IB_SUCCESS )
    h_mcast->obj.pfn_destroy( &h_mcast->obj, NULL );
else
    deref_al_obj( &h_mcast->obj );
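
The branch quoted above can be tried outside the kernel. What follows is a minimal user-mode sketch, not IBAL code: mock_obj_t, mock_status_t, mock_destroy, mock_deref, mock_obj_init and on_join_complete are all hypothetical names invented for the example. It only shows the shape of the decision: a failed join takes the destroy path, while a successful join just drops the reference that was held for the SA operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the IBAL types; not the real definitions. */
typedef enum { MOCK_SUCCESS = 0, MOCK_ERROR = 1 } mock_status_t;

typedef struct mock_obj {
    int ref_cnt;      /* references currently held on the object */
    int destroyed;    /* set to 1 once the destroy path has run */
    void (*pfn_destroy)(struct mock_obj *obj, void *context);
} mock_obj_t;

/* Destroy path: in this mock it only records that it was taken. */
void mock_destroy(mock_obj_t *obj, void *context)
{
    (void)context;
    obj->destroyed = 1;
}

/* Success path: drop the reference taken for the pending operation. */
void mock_deref(mock_obj_t *obj)
{
    assert(obj->ref_cnt > 0);
    obj->ref_cnt--;
}

/* Start with one reference, as if a join request were outstanding. */
void mock_obj_init(mock_obj_t *obj)
{
    obj->ref_cnt = 1;
    obj->destroyed = 0;
    obj->pfn_destroy = mock_destroy;
}

/* Same shape as the quoted branch: destroy on failure, deref on success. */
void on_join_complete(mock_obj_t *obj, mock_status_t status)
{
    if (status != MOCK_SUCCESS)
        obj->pfn_destroy(obj, NULL);
    else
        mock_deref(obj);
}
```

Calling on_join_complete() with MOCK_ERROR on a freshly initialized object sets its destroyed flag, which corresponds to the path through join_async_cb in the second backtrace.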





f7139d8c f6825dc2 ibbus!__cl_async_proc_worker+0x92
[n:\win-ibhost\trunk\core\complib\cl_async_proc.c @ 153]

f7139da0 f6827c3a ibbus!__cl_thread_pool_routine+0x52
[n:\win-ibhost\trunk\core\complib\cl_threadpool.c @ 67]

f7139dac 80948bb2 ibbus!__thread_callback+0x2a
[n:\win-ibhost\trunk\core\complib\kernel\cl_thread.c @ 49]

f7139ddc 8088d4d2 nt!PspSystemThreadStartup+0x2e

00000000 00000000 nt!KiThreadStartup+0x16












_____


From: Yossi Leybovich [mailto:***@dev.mellanox.co.il]
Sent: Tuesday, January 02, 2007 6:06 PM
To: Anatoly Lisenko; Leonid Keller; openib-***@openib.org
Cc: Tzahi Oved
Subject: RE: [Openib-windows] Win IBhost stop receive broadcast packets





Can you put a breakpoint on __destroying_mcast

and send the backtrace?



We also don't see the join requests in the IB trace. Can you collect the
trace again?






_____


From: Anatoly Lisenko [mailto:***@voltaire.com]
Sent: Tuesday, January 02, 2007 3:51 PM
To: Yossi Leybovich; Leonid Keller; openib-***@openib.org
Cc: Tzahi Oved
Subject: RE: [Openib-windows] Win IBhost stop receive broadcast packets

Repro scenario:

1. Reset the IB switch.

2. Wait for the SM to start reconfiguring the fabric.

3. You should see in the ipoib log: "link down" -> .. -> "link up"

and in the mthca log: attach -> . -> detach
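
The attach -> detach ordering from step 3 can be checked mechanically. The sketch below is only an illustration: mcast_event_t, thread_events and final_attach_state are hypothetical names, and the event table is copied by hand from the mthca log in this thread (attaches at 13:28:02:859, detaches at 13:28:03:296, all on qp_p 88A56E78). It replays the events in timestamp order and reports whether the last event for a given MGID was an attach or a detach.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { EV_ATTACH, EV_DETACH } ev_kind_t;

typedef struct {
    ev_kind_t   kind;
    const char *mgid;   /* MGID text exactly as printed in the mthca log */
} mcast_event_t;

/* Events in timestamp order, copied from the mthca log in this thread. */
const mcast_event_t thread_events[] = {
    { EV_ATTACH, "ffff1b4012ff`ffffffff00000000" },
    { EV_ATTACH, "ffff1b4012ff`100000000000000" },
    { EV_ATTACH, "ffff051412ff`30000c280010000" },
    { EV_DETACH, "ffff051412ff`30000c280010000" },
    { EV_DETACH, "ffff1b4012ff`100000000000000" },
    { EV_DETACH, "ffff051412ff`ffffa8ff00ff0000" },
    { EV_DETACH, "ffff1b4012ff`ffffffff00000000" },
};
const size_t thread_event_count =
    sizeof(thread_events) / sizeof(thread_events[0]);

/* Returns 1 if the last event seen for mgid is an attach, 0 if it is a
 * detach, and -1 if the MGID never appears in the event list. */
int final_attach_state(const mcast_event_t *events, size_t count,
                       const char *mgid)
{
    int state = -1;
    size_t i;

    assert(events != NULL && mgid != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(events[i].mgid, mgid) == 0)
            state = (events[i].kind == EV_ATTACH) ? 1 : 0;
    }
    return state;
}
```

For the MGID ffff1b4012ff`ffffffff00000000 (mlid c0) the function returns 0 on the full table, i.e. the QP ends up detached even though the attach was logged first. Note that the attach and detach lines in the log carry different mcasth values for the same MGID and QP.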



I attach fresh log files: mthca (flags = 0x400), ipoib (flags = 0x122), ibal
(full log prints).



I will send you the IB traces captured with the CATC later.



Thanks,

Anatoly




_____


From: Yossi Leybovich [mailto:***@dev.mellanox.co.il]
Sent: Tuesday, January 02, 2007 3:11 PM
To: Anatoly Lisenko; Leonid Keller; openib-***@openib.org
Cc: Tzahi Oved
Subject: RE: [Openib-windows] Win IBhost stop receive broadcast packets



Is it reproducible? How?

Does the SM reside on the switch?





I think that there was an error in the join process against the SM.

In normal IPoIB behavior, __bcast_cb() should be called after IPoIB issues the
query from __port_join_bcast().



In your log there is no call to the __bcast_cb() callback. This means that
IBAL did not get the answer and failed to generate a timeout for the query.
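
The ordering rule described above (the callback must show up after the request) is easy to check on a list of traced function names. This is a rough sketch under stated assumptions: callback_follows_request is a hypothetical helper, and it expects the caller to have already extracted the function names from the WPP log in the order they were traced. It says nothing about what a particular log contains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `callback` appears somewhere after the first occurrence of
 * `request` in the ordered list of traced function names, otherwise 0. */
int callback_follows_request(const char *const *names, size_t count,
                             const char *request, const char *callback)
{
    size_t i;
    int seen_request = 0;

    assert(names != NULL && request != NULL && callback != NULL);
    for (i = 0; i < count; i++) {
        if (!seen_request) {
            /* Still looking for the request entry. */
            if (strcmp(names[i], request) == 0)
                seen_request = 1;
        } else if (strcmp(names[i], callback) == 0) {
            /* Callback found after the request was seen. */
            return 1;
        }
    }
    return 0;
}
```

With the names from this thread, the call would be callback_follows_request(names, count, "__port_join_bcast", "__bcast_cb"). A return value of 0 means either the request was never traced or no callback was traced after it.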



Can you collect IBAL traces so we can be sure that the query returned?

Can you also get IB traces between the SM and IPoIB?




_____


From: openib-windows-***@openib.org
[mailto:openib-windows-***@openib.org] On Behalf Of Anatoly Lisenko
Sent: Tuesday, January 02, 2007 1:57 PM
To: ***@mellanox.co.il; openib-***@openib.org
Cc: Tzahi Oved
Subject: [Openib-windows] Win IBhost stop receive broadcast packets

Hi ,



I saw a problem with the Windows IBhost stack: rebooting the InfiniBand switch
can cause ping loss (even after the switch comes back up).

I started to investigate this anomaly and saw:

1. The IB stack doesn't receive broadcast ARP packets.

2. All other packets (unicast and multicast) are received.

3. The HCA port's rx packet counter increases each time a broadcast packet arrives.

4. It seems that the firmware drops these packets. (I don't see any completions.)



I examined the logs and saw that we somehow end up in a state where:

1. The HCA's port is joined to the bcast group.

2. The ipoib QP is detached from the bcast group.



This is the stack backtrace of the mlnx_detach_mcast function:

f7125d10 f68ae6ab mthca!mlnx_detach_mcast+0x13
[n:\win-ibhost\trunk\hw\mthca\kernel\hca_mcast.c @ 142]

f7125d38 f68c9c30 ibbus!__cleanup_mcast+0x24b
[n:\win-ibhost\trunk\core\al\al_mcast.c @ 304]

f7125d70 f6820212 ibbus!async_destroy_cb+0x420
[n:\win-ibhost\trunk\core\al\al_common.c @ 665]

f7125d8c f6825dc2 ibbus!__cl_async_proc_worker+0x92
[n:\win-ibhost\trunk\core\complib\cl_async_proc.c @ 153]

f7125da0 f6827c3a ibbus!__cl_thread_pool_routine+0x52
[n:\win-ibhost\trunk\core\complib\cl_threadpool.c @ 67]

f7125dac 80948bb2 ibbus!__thread_callback+0x2a
[n:\win-ibhost\trunk\core\complib\kernel\cl_thread.c @ 49]

f7125ddc 8088d4d2 nt!PspSystemThreadStartup+0x2e

00000000 00000000 nt!KiThreadStartup+0x16





Mthca wpp log:

00000662 kernel 1236 600 2 312
01\02\2007-13:28:02:781 mlnx_query_ca()===>

00000663 kernel 1236 600 2 321
01\02\2007-13:28:02:781 mlnx_query_ca() :port 0 gid0:

00000664 kernel 1236 600 2 322
01\02\2007-13:28:02:781 mlnx_query_ca() :
0xfe80000000-0x08f14398095

00000665 kernel 1236 600 2 323
01\02\2007-13:28:02:781 mlnx_query_ca() :port 1 gid0:

00000666 kernel 1236 600 2 324
01\02\2007-13:28:02:781 mlnx_query_ca() :
0xfe80000000-0x08f14398096

00000667 kernel 1236 600 2 325
01\02\2007-13:28:02:781 mlnx_query_ca() :Space required 1898 used
1898

00000668 kernel 1236 600 2 326
01\02\2007-13:28:02:781 mlnx_conv_hca_cap() :Port 1 port_guid
0x8f10403980095

00000669 kernel 1236 600 2 327
01\02\2007-13:28:02:781 mlnx_conv_hca_cap() :Port 2 port_guid
0x8f10403980096

00000670 kernel 1236 600 2 328
01\02\2007-13:28:02:781 mlnx_query_ca()<===

00000671 kernel 4 276 2 339
01\02\2007-13:28:02:859 mlnx_attach_mcast()===>

00000672 kernel 4 276 2 340
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 89930EA8,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000

00000678 kernel 4 276 2 346
01\02\2007-13:28:02:859 completes with ERROR status IB_SUCCESS

00000681 kernel 4 276 2 349
01\02\2007-13:28:02:859 mlnx_enable_cq_notify()===>

00000682 kernel 4 276 2 350
01\02\2007-13:28:02:859 completes with ERROR status IB_SUCCESS

00000683 kernel 4 276 2 357
01\02\2007-13:28:02:859 mlnx_attach_mcast()===>

00000684 kernel 4 276 2 358
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 898F1D68,
qp_p 88A56E78, mlid 1c0, mgid ffff1b4012ff`100000000000000

00000685 kernel 4 276 2 359
01\02\2007-13:28:02:859 completes with ERROR status IB_SUCCESS

00000686 kernel 4 276 2 362
01\02\2007-13:28:02:859 mlnx_attach_mcast()===>

00000687 kernel 4 276 2 363
01\02\2007-13:28:02:859 mlnx_attach_mcast() :mcasth 884352D8,
qp_p 88A56E78, mlid 2c0, mgid ffff051412ff`30000c280010000

00000688 kernel 4 276 2 364
01\02\2007-13:28:02:859 completes with ERROR status IB_SUCCESS

00000689 kernel 0 0 3 129
01\02\2007-13:28:02:750 mlnx_enable_cq_notify()===>

00000690 kernel 0 0 3 130
01\02\2007-13:28:02:750 completes with ERROR status IB_SUCCESS

...

00000776 kernel 0 0 3 373
01\02\2007-13:28:03:109 mlnx_enable_cq_notify()===>

00000777 kernel 0 0 3 374
01\02\2007-13:28:03:109 completes with ERROR status IB_SUCCESS

00000778 kernel 4 272 3 375
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 89918F40,
qp_p 88A56E78, mlid 2c0, mgid ffff051412ff`30000c280010000

00000779 kernel 4 272 3 376
01\02\2007-13:28:03:296 completes with ERROR status IB_SUCCESS

00000780 kernel 4 272 3 377
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88DB3F00,
qp_p 88A56E78, mlid 1c0, mgid ffff1b4012ff`100000000000000

00000781 kernel 4 272 3 378
01\02\2007-13:28:03:296 completes with ERROR status IB_SUCCESS

00000782 kernel 4 272 3 379
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A48DA0,
qp_p 88A56E78, mlid 6c0, mgid ffff051412ff`ffffa8ff00ff0000

00000783 kernel 4 272 3 380
01\02\2007-13:28:03:296 completes with ERROR status IB_SUCCESS

00000784 kernel 4 272 3 381
01\02\2007-13:28:03:296 mlnx_detach_mcast() :mcasth 88A94DD0,
qp_p 88A56E78, mlid c0, mgid ffff1b4012ff`ffffffff00000000

00000785 kernel 4 272 3 382
01\02\2007-13:28:03:296 completes with ERROR status IB_SUCCESS

00000786 kernel 4 280 1 383
01\02\2007-13:28:22:781 mlnx_query_ca()===>

00000787 kernel 4 280 1 384
01\02\2007-13:28:22:781 mlnx_query_ca() :port 0 gid0:

00000788 kernel 4 280 1 385
01\02\2007-13:28:22:781 mlnx_query_ca() :
0xfe80000000-0x08f14398095

00000789 kernel 4 280 1 386
01\02\2007-13:28:22:781 mlnx_query_ca() :port 1 gid0:

00000790 kernel 4 280 1 387
01\02\2007-13:28:22:781 mlnx_query_ca() :
0xfe80000000-0x08f14398096

00000791 kernel 4 280 1 388
01\02\2007-13:28:22:781 mlnx_query_ca() :Space required 1898 used
1898

00000792 kernel 4 280 1 389
01\02\2007-13:28:22:781 mlnx_conv_hca_cap() :Port 1 port_guid
0x8f10403980095

00000793 kernel 4 280 1 390
01\02\2007-13:28:22:781 mlnx_conv_hca_cap() :Port 2 port_guid
0x8f10403980096

00000794 kernel 4 280 1 391
01\02\2007-13:28:22:781 mlnx_query_ca()<===





Ipoib wpp log:



00000130 kernel 0 0 0 130
01\02\2007-13:28:01:468 [IPoIB] :ipoib_check_for_hang():]

00000131 kernel 4 280 0 133
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_reset_all():[

00000132 kernel 4 280 0 134
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():[

00000133 kernel 4 280 0 135
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():]

...

00000140 kernel 4 280 0 150
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():[

00000141 kernel 4 280 0 151
01\02\2007-13:28:02:781 [IPoIB] :__endpt_destroying():]

00000142 kernel 4 280 0 152
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_reset_all():]

00000143 kernel 4 280 0 160
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_down():]

00000144 kernel 4 280 1 131
01\02\2007-13:28:02:781 [IPoIB] :__ipoib_pnp_cb() :Link DOWN!

00000145 kernel 4 280 1 132
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_down():[

00000146 kernel 4 312 1 153
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000147 kernel 4 312 1 154
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving MCast
group

00000148 kernel 4 312 1 164
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000149 kernel 4 312 1 165
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000150 kernel 4 312 1 166
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000151 kernel 4 280 1 170
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_up():[

00000152 kernel 4 280 1 171
01\02\2007-13:28:02:781 [IPoIB] :ipoib_port_up():]

00000153 kernel 4 308 2 145
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000154 kernel 4 308 2 147
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving MCast
group

00000155 kernel 4 308 2 161
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000156 kernel 4 308 2 162
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000157 kernel 4 308 2 163
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000158 kernel 4 276 2 191
01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():[

00000159 kernel 4 276 2 192
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_add_bcast():[

00000160 kernel 4 276 2 193
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():[

00000161 kernel 4 276 2 194
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():]

00000162 kernel 4 276 2 195
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():[

00000163 kernel 4 276 2 196
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast() :Create
av for MAC: 00-00-00-00-00-00

00000164 kernel 4 276 2 197
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():[

00000165 kernel 4 276 2 198
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():]

00000166 kernel 4 276 2 199
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():]

00000167 kernel 4 276 2 200
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked():[

00000168 kernel 4 276 2 201
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked()
:insert : MAC: FF-FF-FF-FF-FF-FF

00000169 kernel 4 276 2 202
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():[

00000170 kernel 4 276 2 203
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():]

00000171 kernel 4 276 2 204
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_add_bcast():]

00000172 kernel 4 276 2 205
01\02\2007-13:28:02:859 [IPoIB] :__ib_mgr_activate():[

00000173 kernel 4 276 2 206
01\02\2007-13:28:02:859 [IPoIB] :__ib_mgr_activate():]

00000174 kernel 4 276 2 207
01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active():[

00000175 kernel 4 276 2 208
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():[

00000176 kernel 4 276 2 209
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref():[

00000177 kernel 4 276 2 210
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Look for :
MAC: 01-00-5E-00-00-01

00000178 kernel 4 276 2 211
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Failed
endpoint lookup.[IpoIB] :__endpt_mgr_ref():]

00000179 kernel 4 276 2 212
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():[

00000180 kernel 4 276 2 213
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():]

00000181 kernel 4 276 2 214
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked():[

00000182 kernel 4 276 2 215
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked()
:insert : MAC: 01-00-5E-00-00-01

00000183 kernel 4 276 2 216
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():[

00000184 kernel 4 276 2 217
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():]

00000185 kernel 4 276 2 218
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():]

00000186 kernel 4 276 2 219
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():[

00000187 kernel 4 276 2 220
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref():[

00000188 kernel 4 276 2 221
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Look for :
MAC: 01-80-C2-00-00-03

00000189 kernel 4 276 2 222
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_ref() :Failed
endpoint lookup.[IpoIB] :__endpt_mgr_ref():]

00000190 kernel 4 276 2 223
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():[

00000191 kernel 4 276 2 224
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_create():]

00000192 kernel 4 276 2 225
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked():[

00000193 kernel 4 276 2 226
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert_locked()
:insert : MAC: 01-80-C2-00-00-03

00000194 kernel 4 276 2 227
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():[

00000195 kernel 4 276 2 228
01\02\2007-13:28:02:859 [IPoIB] :__endpt_mgr_insert():]

00000196 kernel 4 276 2 229
01\02\2007-13:28:02:859 [IPoIB] :ipoib_port_join_mcast():]

00000197 kernel 4 276 2 230
01\02\2007-13:28:02:859 [IPoIB] :ipoib_resume_oids():[

00000198 kernel 4 276 2 231
01\02\2007-13:28:02:859 [IPoIB] :ipoib_resume_oids():]

00000199 kernel 4 276 2 232
01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active() :Link UP!

00000200 kernel 4 276 2 233
01\02\2007-13:28:02:859 [IPoIB] :ipoib_set_active():]

00000201 kernel 4 276 2 234
01\02\2007-13:28:02:859 [IPoIB] :__bcast_cb():]

00000202 kernel 4 276 2 235
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():[

00000203 kernel 4 276 2 236
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():[

00000204 kernel 4 276 2 237
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast() :Create
av for MAC: 01-00-5E-00-00-01

00000205 kernel 4 276 2 238
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():[

00000206 kernel 4 276 2 239
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():]

00000207 kernel 4 276 2 240
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():]

00000208 kernel 4 276 2 241
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():]

00000209 kernel 4 276 2 242
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():[

00000210 kernel 4 276 2 243
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():[

00000211 kernel 4 276 2 244
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast() :Create
av for MAC: 01-80-C2-00-00-03

00000212 kernel 4 276 2 245
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():[

00000213 kernel 4 276 2 246
01\02\2007-13:28:02:859 [IPoIB] :__create_mcast_av():]

00000214 kernel 4 276 2 247
01\02\2007-13:28:02:859 [IPoIB] :ipoib_endpt_set_mcast():]

00000215 kernel 4 276 2 248
01\02\2007-13:28:02:859 [IPoIB] :__mcast_cb():]

00000216 kernel 4 320 3 136
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000217 kernel 4 320 3 139
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000218 kernel 4 320 3 141
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000219 kernel 4 320 3 143
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000220 kernel 4 320 3 144
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000221 kernel 4 320 3 146
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving MCast
group

00000222 kernel 4 320 3 155
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000223 kernel 4 320 3 156
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000224 kernel 4 320 3 157
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000225 kernel 4 320 3 158
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():[

00000226 kernel 4 320 3 159
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup() :Leaving MCast
group

00000227 kernel 4 320 3 167
01\02\2007-13:28:02:781 [IPoIB] :__endpt_cleanup():]

00000228 kernel 4 320 3 168
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():[

00000229 kernel 4 320 3 169
01\02\2007-13:28:02:781 [IPoIB] :__endpt_free():]

00000230 kernel 0 0 3 172
01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb():[

00000231 kernel 0 0 3 173
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_add_local():[

00000232 kernel 0 0 3 174
01\02\2007-13:28:02:781 [IPoIB] :ipoib_endpt_create():[

00000233 kernel 0 0 3 175
01\02\2007-13:28:02:781 [IPoIB] :ipoib_endpt_create():]

00000234 kernel 0 0 3 176
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_insert():[

00000235 kernel 0 0 3 177
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_insert():]

00000236 kernel 0 0 3 178
01\02\2007-13:28:02:781 [IPoIB] :__endpt_mgr_add_local():]

00000237 kernel 0 0 3 179
01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb() :Received port
info: link width = 2.

00000238 kernel 0 0 3 180
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate():[

00000239 kernel 0 0 3 181
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate() :Link speed is
2.5Gs

00000240 kernel 0 0 3 182
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate() :Link width is
4X

00000241 kernel 0 0 3 183
01\02\2007-13:28:02:781 [IPoIB] :ipoib_set_rate():]

00000242 kernel 0 0 3 184
01\02\2007-13:28:02:781 [IPoIB] :__port_get_bcast():[

00000243 kernel 0 0 3 185
01\02\2007-13:28:02:781 [IPoIB] :__port_get_bcast():]

00000244 kernel 0 0 3 186
01\02\2007-13:28:02:781 [IPoIB] :__port_info_cb():]

00000245 kernel 2624 2732 3 187
01\02\2007-13:28:02:781 [IPoIB] :__bcast_get_cb():[

00000246 kernel 2624 2732 3 188
01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():[

00000247 kernel 2624 2732 3 189
01\02\2007-13:28:02:781 [IPoIB] :__port_join_bcast():]

00000248 kernel 2624 2732 3 190
01\02\2007-13:28:02:781 [IPoIB] :__bcast_get_cb():]

00000249 kernel 0 0 3 249
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():[

00000250 kernel 0 0 3 250
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():]

...

00000339 kernel 0 0 3 339
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():[

00000340 kernel 0 0 3 340
01\02\2007-13:28:03:109 [IPoIB] :__endpt_mgr_get_by_gid():]

00000341 kernel 0 0 0 341
01\02\2007-13:28:03:468 [IPoIB] :ipoib_check_for_hang():[





Thanks,

Anatoly
