Discussion:
[ofw] Unstable operation of WinOFED 4.40.0 with HCA fw 2.30.3200 on Win 2012 Server
Alexey Novozhilov
2013-12-30 05:28:34 UTC
Hi,

I've run into unstable operation of WinOFED 4.40.0 with HCA firmware
2.30.3200 on Windows Server 2012.
I have spent the last week going mad trying to get it working well.

I have a system of two Mellanox InfiniScale IV IS5023 switches, three hosts
running Windows Server 2012 / Windows Server 2012 R2, and two hosts running
Ubuntu Linux 12.04 LTS. All hosts are equipped with Mellanox ConnectX-3 VPI
network cards (MCX354A-QCBT). Each host is plugged into both switches; the
switches are connected to each other and form a shared fabric. All hosts are
based on SuperMicro X9DRW-7TPF mainboards with Intel Xeon E5-2667 v2 CPUs
and DDR3-1866 memory.
WinOFED 4.40.0 is installed on Windows Server 2012, WinOFED 4.55 on Windows
Server 2012 R2. Both Linux hosts are routers with the
MLNX_OFED_LINUX-2.0-3.0.0-ubuntu12.04-x86_64 package installed; their IB
interfaces are joined into an active-backup bond using ifenslave. The
following modules are loaded on the Linux routers: mlx4_core, mlx4_ib,
ib_umad, ib_mad, ib_ipoib, ib_uverbs. All HCAs are flashed with firmware
2.30.3200.
As a test I use L3 ICMP ping. Linux-to-Linux communication is fine: a flood
ping across the InfiniBand network gives excellent results:
rtt min/avg/max/mdev = 0.011/0.013/1.682/0.002 ms. Things look different
once Windows is involved.
A flood ping from a Linux host to the Windows Server 2012 host gives almost
equally good latency (rtt min/avg/max/mdev = 0.022/0.024/2.492/0.021 ms),
but the packet loss rate is always about 1-2%. At the same time, ibping
shows no packet loss at all and ibdiagnet on Linux shows no warnings or
errors, so I conclude that IB itself works well and the issue lies above L2.
So I decided to try Windows Server 2012 R2 with WinOFED 4.55. That resolved
the packet loss, but latency grew: rtt min/avg/max/mdev =
0.093/0.102/17.550/0.101 ms. While digging into this I found that another
system of mine has no such issues, and the difference is the HCA firmware
version: there, Win2012 with OFED 4.40.0 works fine with firmware 2.11.500.
But downgrading to fw 2.11.500 on my servers didn't help; with exactly the
same firmware and software versions I still see packet loss.
I still want it all together: low latency, no packet loss, and the latest
software and firmware versions.

I'm running out of ideas, so any comments and advice are appreciated.

--
Regards,
Alexey
nicholas ferguso
2013-12-30 15:06:17 UTC
Have you researched ICMP policies?



Tziporet Koren
2013-12-30 16:04:53 UTC
I understand you are mainly using IPoIB.
We had a latency issue with IPoIB in WinOF 4.55.
We plan a 4.60 GA release soon in which this issue is resolved.

So I suggest you wait for that release.

Tziporet

Alexey Novozhilov
2014-01-09 11:00:54 UTC
Hello, Tziporet.

Thank you for the reply. You're right, I use IPoIB.
I recently installed the new WinOF and firmware. Things are better now, but
I still have some packet loss. As before, Linux-to-Linux communication is
good and Linux-to-Windows is not. I ran a few flood ping tests to understand
how latency and loss rate depend on the WinOF and firmware versions. Here is
the test summary:


Win 2012, WinOF 4.40, fw 2.30.3200
2156359 packets transmitted, 2127781 received, 1% packet loss, time 471502ms
rtt min/avg/max/mdev = 0.032/0.043/3.439/0.019 ms, ipg/ewma 0.218/0.043 ms

Win 2012, WinOF 4.40, fw 2.30.8000
197278 packets transmitted, 193214 received, 2% packet loss, time 57419ms
rtt min/avg/max/mdev = 0.020/0.022/3.742/0.026 ms, ipg/ewma 0.291/0.022 ms

Win 2012, WinOF 4.60, fw 2.30.8000
45086 packets transmitted, 44350 received, 1% packet loss, time 10878ms
rtt min/avg/max/mdev = 0.021/0.025/2.431/0.024 ms, ipg/ewma 0.241/0.023 ms

In this case I had a low packet loss rate and average latency:
Win 2012 R2, WinOF 4.60, fw 2.30.8000
699589 packets transmitted, 698741 received, 0% packet loss, time 57022ms
rtt min/avg/max/mdev = 0.041/0.051/12.418/0.073 ms, ipg/ewma 0.081/0.048 ms

The only case where I had no packet loss, but high latency:
Win 2012 R2, WinOF 4.55, fw 2.30.8000
347256 packets transmitted, 347256 received, 0% packet loss, time 39444ms
rtt min/avg/max/mdev = 0.081/0.098/13.205/0.087 ms, pipe 2, ipg/ewma 0.113/0.097 ms


In short: WinOF 4.55 provides the best reliability, while WinOF 4.40
provides the best latency.
I'm still looking for a way to get both at once.
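
Ping reports the loss as a whole-number percentage, which makes runs of
different length hard to compare. A minimal Python sketch that recomputes the
loss from the transmitted/received counters is shown here; the function names
are made up for illustration, and it assumes the Linux ping summary format
quoted in the results above.

```python
import re

# Counter part of the Linux ping summary line, as quoted in the results above.
COUNTS_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def exact_loss_percent(summary_line: str) -> float:
    """Return the unrounded packet loss percentage from a ping summary line."""
    match = COUNTS_RE.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % summary_line)
    sent, received = int(match.group(1)), int(match.group(2))
    return 100.0 * (sent - received) / sent


# Labels and counter lines copied from the five test runs in this message.
RUNS = [
    ("Win 2012, WinOF 4.40, fw 2.30.3200",
     "2156359 packets transmitted, 2127781 received, 1% packet loss, time 471502ms"),
    ("Win 2012, WinOF 4.40, fw 2.30.8000",
     "197278 packets transmitted, 193214 received, 2% packet loss, time 57419ms"),
    ("Win 2012, WinOF 4.60, fw 2.30.8000",
     "45086 packets transmitted, 44350 received, 1% packet loss, time 10878ms"),
    ("Win 2012 R2, WinOF 4.60, fw 2.30.8000",
     "699589 packets transmitted, 698741 received, 0% packet loss, time 57022ms"),
    ("Win 2012 R2, WinOF 4.55, fw 2.30.8000",
     "347256 packets transmitted, 347256 received, 0% packet loss, time 39444ms"),
]


def loss_table(runs):
    """Pair each run label with its recomputed loss percentage."""
    return [(label, exact_loss_percent(line)) for label, line in runs]


if __name__ == "__main__":
    for label, loss in loss_table(RUNS):
        print(f"{label}: {loss:.3f}% loss")
```

By this calculation the Win 2012 R2 / WinOF 4.60 run, reported as 0%, still
dropped 848 of 699589 packets (about 0.12%), while the WinOF 4.55 run dropped
none.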

--
Regards,
Alexey
_______________________________________________
ofw mailing list
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
Alexey Novozhilov
2014-01-09 10:58:31 UTC
Hello, Nicholas!

Thank you for the comment.
Firewalls are disabled on the Windows hosts, and an allow-all policy is set
for the IB network on the Linux hosts. I see nothing that could prevent
packets from reaching their destination.