This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Analysing Performance Issues with Storage (SMB2)

Hello,

I'm a Windows server admin, not a network technician, and not even a regular Wireshark user. I'm analysing a performance problem with an application.

The application reads and writes a small config file (ini) on a network share. If the config file is placed on our 3rd-party storage, application performance is bad. If it is placed on a normal Microsoft Windows share, application performance is good.

I am trying to analyse and compare the two scenarios with Wireshark, and I see big differences. Please find below a graphical comparison:

Link to full comparison picture: http://www.fotos-hochladen.net/uploads/wiresharkcqgx07prlf.png

The comparison shows the same process (application start) in both cases.

When the config file is on the 3rd-party storage, the traffic looks like this the whole time:

Link to full picture: http://www.fotos-hochladen.net/uploads/packetsyq1d5j0s96.png

When I use "Follow TCP Stream" in Wireshark, I can see the content of the file xxxx times.

Does anybody have an idea about this? I can see that something is wrong, but I don't know how to analyse it further ...

Thank you!

asked 23 Jan '17, 03:18

Panteraa

edited 23 Jan '17, 03:22


One Answer:

Hi Panteraa

I assume that your screenshot shows the connection of a single workstation to the server.

A good starting point for the analysis is the function Statistics -> Service Response Times -> SMB2 (or SMB, if your filer does not support SMB2).

The screenshot shows that the client is running in a tight loop and constantly repeats three operations:

  • Open file
  • Read file
  • Close file

The time stamps from the second picture document delta times of a few milliseconds between the loop iterations. Although your question mentions write activities, they are not visible in the trace.
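To put a number on such a loop, the command sequence can be exported from the trace and scanned for repeated Create/Read/Close triples. The following is only a minimal sketch: the `(timestamp, command)` tuples and the function name `count_loop_iterations` are hypothetical and assume you have already extracted the SMB2 request commands from the capture in some way.

```python
from typing import List, Tuple

# The three operations that make up one iteration of the loop.
LOOP = ("Create", "Read", "Close")


def count_loop_iterations(events: List[Tuple[float, str]]) -> Tuple[int, List[float]]:
    """Count Create/Read/Close triples and return the time gaps between their starts.

    `events` is a list of (timestamp_in_seconds, smb2_command_name) tuples,
    sorted by time. This input format is an assumption made for the sketch.
    """
    starts = []
    i = 0
    while i + len(LOOP) <= len(events):
        window = tuple(cmd for _, cmd in events[i:i + len(LOOP)])
        if window == LOOP:
            starts.append(events[i][0])  # remember when this iteration began
            i += len(LOOP)               # skip past the matched triple
        else:
            i += 1                       # no match, slide forward by one event
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return len(starts), gaps
```

A large iteration count combined with gaps of only a few milliseconds would confirm the tight-loop pattern described above.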

Tuning is not too hard:

  1. Make sure that your client caches the contents. This is the most important advice here.
  2. Really, stop running these tight loops. It causes a lot of overhead on your workstation, the server and the network.
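If the application code can be changed, client-side caching could look roughly like the sketch below. It is an illustration only, not the behaviour of the application in question: the class name `CachedIni`, the 5-second maximum age and the injectable `clock` parameter are all assumptions. The file is re-read from the share only when the cached copy is older than `max_age_s`.

```python
import configparser
import time


class CachedIni:
    """Keep a parsed ini file in memory and re-read it at most every max_age_s seconds."""

    def __init__(self, path, max_age_s=5.0, clock=time.monotonic):
        self._path = path
        self._max_age_s = max_age_s
        self._clock = clock        # injectable so the cache can be tested without sleeping
        self._loaded_at = None
        self._parser = None
        self.loads = 0             # number of times the file was actually opened

    def get(self, section, option):
        now = self._clock()
        expired = self._parser is None or now - self._loaded_at > self._max_age_s
        if expired:
            # Only this branch touches the network share (open, read, close).
            parser = configparser.ConfigParser()
            with open(self._path, encoding="utf-8") as handle:
                parser.read_file(handle)
            self._parser = parser
            self._loaded_at = now
            self.loads += 1
        return self._parser.get(section, option)
```

With such a wrapper, thousands of lookups per second turn into one open/read/close sequence per cache lifetime, at the cost of seeing changes up to `max_age_s` seconds late.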

OK, you can't cache. It must be just another stupid application. Here are a few more options:

  1. Use file notifications (SMB2 Change Notify), if the file changes and the client needs to process an update right away.
  2. Try to use a file or directory lease, if your application, your developer, your client (Win 8.1 or higher), your server (Win 2012 or newer) and, most importantly, you as an admin understand it.
  3. Make sure that the client uses proper locking mechanisms (since the file is only read, the Create request should specify "Read access").

There are a number of reasons why the 3rd-party storage server could be slower than the Windows server. Among the possible reasons are:

  • The 3rd-party server uses SMB, and not SMB2
  • If SMB2 is used: The server does not grant sufficient credits (unlikely, given the small I/O sizes)
  • The 3rd-party server uses some virus scanner, which is overloaded
  • Half Duplex / Full Duplex mismatch on the switch port (only for 10/100 MBit)
  • The 3rd-party server is overloaded (Windows servers dedicate "all" (cough) memory to caching. With this I/O-frequency the file will never be squeezed out of the cache).
  • The 3rd-party server uses a poor TCP or Samba implementation (unlikely for EMC, NetApp etc.)
  • Disk architecture (RAID-5 with a lot of writes from another session, critical disk connected to USB 2 ...)
  • Excessive hardware errors (run diagnostics to be sure)

More details would require a closer look at trace files from both systems, the network architecture and the configuration of the two servers.

Good Hunting

answered 23 Jan '17, 12:40

packethunter

Hi packethunter,

Thank you for the effort and the extensive information.

A good starting point for the analysis is the function Statistics -> Service Response Times -> SMB2

The filer supports SMB2. The stats look like this:

http://www.fotos-hochladen.net/uploads/stats7eckw51vqr.png

Really, stop running these tight loops. It causes a lot of overhead on your workstation, the server and the network.

That's my target, but I don't think the application is the cause, because when the configuration-file is placed on a Windows-Share it doesn't show the open-read-close loop behaviour.

I will have a closer look at your other suggestions, but I think I will need to get some paid consulting on this.

Thank you

(24 Jan '17, 01:32) Panteraa
1

One last thing to try before you fill out that purchase request:

The most likely candidate for what you are seeing is BranchCache. This feature is described in multiple articles on MSDN. Here is a good introduction: https://technet.microsoft.com/en-us/library/dd637832(v=ws.10).aspx

Please compare the TreeConnect Responses from the storage server and the Windows system. There might be a few tiny differences, including individual bits which Wireshark may not (yet) interpret correctly.

The cache has to be enabled on a per-share basis.

Good luck

(24 Jan '17, 13:16) packethunter

Hi packethunter,

I know BranchCache. I came into contact with it in a course for a Microsoft exam. We don't use it in our company at all. The service is not running and the role is not activated on the client. Do you think there is a setting on the storage that we could try in order to control this behaviour?

I have prepared a comparison between the Tree Connect exchanges of the two scenarios ...

Tree Connect Request: http://www.fotos-hochladen.net/uploads/01treeconnecq7wamg48io.png

Tree Connect Response: http://www.fotos-hochladen.net/uploads/02treeconnecth839ialsf1.png

Are you able to see anything suspicious?

I noticed that the client requests 71 credits from the 3rd-party storage. The 3rd-party storage grants 1 credit.

In the other scenario the client requests 95 credits from the Windows server. The Windows server grants 33 credits. Could this have anything to do with the issue?

Thank you!

(25 Jan '17, 03:51) Panteraa
1

There are only two extra bits set by the 3rd-party storage. I'm not aware that either of the two bits would cause any client-side caching.

It is noteworthy that the Windows server is a lot faster (680 microseconds vs. 1400). The delay might be caused, at least partially, by the two extra hops between client and 3rd-party storage (IP TTL 127 vs. 125 indicates 1 vs. 3 hops, or a rather strange IP stack).
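The hop estimate from the TTL can be reproduced with a few lines. This is a rough sketch under the assumption that the sender started with one of the common initial TTL values (64, 128 or 255); the function name `estimate_hops` is hypothetical, and an unusual IP stack would make the result meaningless.

```python
# Initial TTL values commonly used by IP stacks. That the sender used one of
# these is an assumption, not something the trace can prove.
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> int:
    """Estimate the number of router hops from the TTL seen at the receiver."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    # Pick the smallest common initial TTL that is not below the observed value.
    initial = min(ttl for ttl in COMMON_INITIAL_TTLS if ttl >= observed_ttl)
    return initial - observed_ttl


print(estimate_hops(127))  # Windows server in this thread
print(estimate_hops(125))  # 3rd-party storage in this thread
```

For the two values quoted above, this yields 1 and 3 hops respectively, assuming an initial TTL of 128 on both servers.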


The Credits would be a limiting factor if the client issues either multiple requests or requests dealing with more than 64 kByte.
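The 64 kByte boundary comes from the SMB2 credit charge: one credit covers up to 65536 bytes of request or response payload. A small sketch of that arithmetic follows; the function name `credit_charge` is hypothetical, and the formula is the one given in the MS-SMB2 specification for multi-credit requests.

```python
def credit_charge(send_payload: int, expected_response_payload: int) -> int:
    """Return the number of credits one SMB2 request consumes.

    Follows the MS-SMB2 rule: (max(send, expected response) - 1) / 65536 + 1,
    using integer division. Payload sizes are in bytes.
    """
    # Clamp to 1 so an empty payload still costs one credit; Python's floor
    # division would otherwise produce 0 for a size of 0.
    size = max(send_payload, expected_response_payload, 1)
    return (size - 1) // 65536 + 1


# A read of a small ini file fits into a single credit.
print(credit_charge(0, 4096))
# A 1 MiB read would need 16 credits and could not be sent with only 1 granted.
print(credit_charge(0, 1024 * 1024))
```

With the small I/O sizes seen in this trace, each request costs a single credit, so a grant of 1 credit only limits how many requests can be outstanding at once, not whether an individual request can be sent.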

If you are interested in a quick way to visualize credits you might want to check my blog posting on packet-foo.com: https://blog.packet-foo.com/2016/10/trace-file-case-files-smb2-performance/

Please note, that the blog post discusses a different problem.


I am afraid that I cannot help you further without a full trace starting with the SMB protocol negotiation.

Good luck

(25 Jan '17, 13:57) packethunter
1

As @packethunter says, it's difficult to analyse this type of problem without visibility of the trace. I did have one thought regarding the difference between the two devices, in particular regarding the amount of data moved across the network.

SMB2 has a mechanism called Leasing (in SMB1 it was called OpLock). A workstation can request to lease the data it gets from a file server. Leasing means that a workstation can hold file data in a memory buffer where an application can perform repeated operations on it before eventually flushing it to the file server. This improves application performance and reduces network load.

The Lease is requested when the file is opened (SMB2 Create Request) and granted in the Create Response. Wireshark still labels the fields involved as Oplock.

Try comparing the Oplock values in the Create Req and Rsp on the Windows share with that on the NAS.
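When comparing the raw values, it helps to translate the one-byte oplock level into its name. The sketch below uses the level constants from the MS-SMB2 specification; the helper name `describe_oplock` and the example values are hypothetical and are not taken from the traces in this thread.

```python
# RequestedOplockLevel / OplockLevel values as defined in MS-SMB2.
OPLOCK_LEVELS = {
    0x00: "NONE",
    0x01: "LEVEL_II",
    0x08: "EXCLUSIVE",
    0x09: "BATCH",
    0xFF: "LEASE",  # a lease was requested or granted via a create context
}


def describe_oplock(requested: int, granted: int) -> str:
    """Render the oplock level of a Create request and its response as text."""
    def name(level: int) -> str:
        return OPLOCK_LEVELS.get(level, f"unknown (0x{level:02x})")

    return f"requested {name(requested)}, granted {name(granted)}"


# Hypothetical example: the client asks for a batch oplock, the server grants none.
print(describe_oplock(0x09, 0x00))
```

If one server answers with NONE where the other grants a lease or an oplock, the client has no permission to cache on the first one, which would match a much higher number of round trips.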

If you want to analyze the SMB2 performance in detail, check out the TRANSUM plugin for Wireshark at https://community.tribelab.com/mod/page/view.php?id=492

TRANSUM gives you a breakdown of the response time of each SMB2 command, splitting it into server time and network time.

Best regards...Paul

(26 Jan '17, 00:11) PaulOfford

Hi,

I think I would be able to sanitize the trace files and upload them, but I don't want you to do a time-consuming analysis of my data for free.

If you are interested in a quick way to visualize credits you might want to check my blog posting ...

Nice. I have visualized the credits in the trace:

https://picload.org/image/roawpgaa/iograph.png (https://img3.picload.org/image/roawcowp/iograph.png)

Leasing means that a workstation can hold file data in a memory buffer where ...

I also thought about a caching issue. I find it strange, because I also tried disabling OpLocks on the Windows server and ran the test again, but it was fast too ...!?

Try comparing the Oplock values in the Create Req and Rsp on the Windows share with that on the NAS.

I have compared the values. I noticed that the client requests a "SMB2_CREATE_DURABLE_HANDLE_REQUEST" (DHnQ) in both scenarios. The Windows server includes the "DHnQ" in its response, but the 3rd-party storage does not:

https://picload.org/image/roawclgg/requestresponse.png

All in all, your answers were very helpful. I will mark them so you get reputation. I think I have collected enough material to hand the case over to the storage team/manufacturer.

@packethunter:

May I contact you at the e-mail address given on your blog regarding a quote for an analysis of a different matter? Do you do that kind of work (with an invoice, etc.)?

Thank you

(27 Jan '17, 01:33) Panteraa
1

Sounds like a good time to engage with the storage manufacturer.

If you want a quick start to SMB analysis, you might want to take a look at the SMB2 Overview page on TribeLab: https://community.tribelab.com/mod/page/view.php?id=608

(30 Jan '17, 06:57) PaulOfford