
Exchange 2010 and SMTP errors


Hello all. I have Exchange 2010 running on a Win 2012 R2 VM. We use ProofPoint Essentials as our mail filtering service. I am having problems receiving emails: in the receive log I see 354 Start Mail Input, then nothing comes through; it just dies and ends the connection. When I look at the ProofPoint Admin Console, the errors relate to timeouts.

I did a capture, but I am not sure what I should look for or what is causing the problem. I would appreciate any input... I have the capture file, but I'm not sure where I can upload it.

asked 23 Oct '17, 15:30

sethdunn96

I also did a capture of a successful send/receive on my server...

(23 Oct '17, 15:59) sethdunn96

This was a capture of a successful send from one of our AV Gateway boxes to our Exchange server: https://drive.google.com/open?id=0B4PA4PyuOxmWdmNNTlRhdEw4aTg

This is the packet capture of the failed receive from the ProofPoint server: https://drive.google.com/open?id=0B4PA4PyuOxmWd1ZNZkRIVUk4WDQ

(23 Oct '17, 16:09) sethdunn96

In the failed capture there seems to be packet loss from 67.231.154.164 (frames 175-176) to 10.77.50.25.

10.77.50.25 reports this packet loss by sending DupACKs (frame 178 ff.) ACKing frame 175, and signalling the additionally received data with "Selective ACKs" (the SACK option). Furthermore, it seems that 67.231.154.164 did receive these DupACKs (the number of RST packets roughly matches the number of DupACKs).
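
As an illustration of the check described above, here is a minimal sketch (assuming Python with scapy; the capture file name is hypothetical) that counts the DupACKs sent by 10.77.50.25 and notes which of them carry the SACK option:

    # Minimal sketch, not from the original discussion: count duplicate ACKs sent
    # by the Exchange side and note which of them carry the SACK option.
    # Assumes scapy is installed; the file name is hypothetical.
    from collections import Counter
    from scapy.all import rdpcap, IP, TCP

    EXCHANGE = "10.77.50.25"        # receiving server (from the thread)
    CLIENT   = "67.231.154.164"     # ProofPoint sender (from the thread)

    packets = rdpcap("failed_receive.pcapng")

    ack_counts = Counter()
    sack_dupacks = 0

    for pkt in packets:
        if IP in pkt and TCP in pkt and pkt[IP].src == EXCHANGE and pkt[IP].dst == CLIENT:
            tcp = pkt[TCP]
            # TCP payload length computed from the IP/TCP headers (ignores Ethernet padding)
            payload_len = pkt[IP].len - pkt[IP].ihl * 4 - tcp.dataofs * 4
            if payload_len == 0 and tcp.flags.A:
                ack_counts[tcp.ack] += 1
                if ack_counts[tcp.ack] > 1 and any(name == "SAck" for name, _ in tcp.options):
                    sack_dupacks += 1

    dupacks = sum(count - 1 for count in ack_counts.values() if count > 1)
    print(f"DupACKs from {EXCHANGE}: {dupacks}, of which {sack_dupacks} carry SACK blocks")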

Two possible reasons come to mind:

  • There is a middlebox (e.g. a firewall) between client and server altering packets that carry SACK.
  • The client can't handle SACK.
(24 Oct '17, 12:40) Uli

So how would I rectify this? When we had Exchange 2003 running on Windows 2003, the setup was the same as it is now, but there wasn't this problem.

This seems to be Windows 2012 R2 and Exchange 2010 SP3 thing.... (This is a VM running on a Windows 2012 R2 Host)

(24 Oct '17, 14:03) sethdunn96

To debug this further, my next steps would be:

  • Take more sample captures to find/verify a pattern.
  • If the pattern is "issue with SACK packets", go on and disable SACK, take more captures, verify...
  • Check the middleboxes on the path client <-> server.

Side note: SACK was enabled by default on Windows 2003 too.
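
For reference, a minimal sketch (assuming Python with winreg, run as Administrator) of setting the legacy SackOpts registry value that is discussed later in this thread; whether Server 2012 R2 still honours this value is exactly what the tests below try to establish, and a reboot is needed either way:

    # Minimal sketch, not from the original discussion: set the legacy SackOpts
    # value to 0 (disable SACK). Whether Server 2012 R2 still honours this value
    # is an open question in the thread; a reboot is required either way.
    # Run from an elevated (Administrator) prompt.
    import winreg

    TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS, 0,
                            winreg.KEY_SET_VALUE) as key:
        # 0 = SACK disabled, 1 = SACK enabled
        winreg.SetValueEx(key, "SackOpts", 0, winreg.REG_DWORD, 0)

    print("SackOpts set to 0; reboot the server for the change to apply.")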

(24 Oct '17, 22:40) Uli

What does SACK do (or not do) with regard to the communication?

Because one thing I notice when I watch the real-time log on my firewall: the ProofPoint server will build an inbound connection to my Exchange server... then it will rattle off a bunch of "Deny TCP (no connection)..." entries with a few different flags, i.e. RST, RST/ACK, FIN/ACK, but I think it is mostly just the ACK flag.

(25 Oct '17, 04:29) sethdunn96

So it seems that SACK is enabled by default, whether it is in the registry or not. What would the benefit of disabling it be?

(25 Oct '17, 07:28) sethdunn96

What would the benefit of disabling SACK be?

If the middlebox handles SACK incorrectly, it would have nothing to spoil if SACK were not there at all. Better to have a slower but reliable connection than one that keeps breaking down.

(25 Oct '17, 11:09) sindy

Ok, I will try disabling it tonight then and see what happens.

(25 Oct '17, 11:28) sethdunn96

So I enabled PMTU and had SackOpts disabled. The MTU on the interface was 1500, according to the netsh command. The problem still persisted; I did a Wireshark capture and it looked the same as the capture I posted above. The server was rebooted before this test.

Then I disabled PMTU, enabled SackOpts, changed the interface MTU to 1470 and rebooted the server. Ran Wireshark... the packets coming in were definitely smaller, but the results were the same. Here is the capture: https://drive.google.com/open?id=0B4PA4PyuOxmWQXhGMVd3d3RWMGc
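
A minimal sketch (again Python with scapy, hypothetical file name) of how one could verify from such a capture that the MTU change actually took effect, by reading the MSS advertised in the handshake and the largest TCP payload actually observed:

    # Minimal sketch, not from the original discussion: report the MSS advertised
    # in the SYN/SYN-ACK and the largest TCP payload seen, to verify an MTU change
    # took effect. The file name is hypothetical.
    from scapy.all import rdpcap, IP, TCP

    packets = rdpcap("after_mtu_change.pcapng")

    advertised_mss = {}
    largest_segment = 0

    for pkt in packets:
        if IP in pkt and TCP in pkt:
            tcp = pkt[TCP]
            if tcp.flags.S:  # SYN or SYN/ACK carries the MSS option
                for name, value in tcp.options:
                    if name == "MSS":
                        advertised_mss[pkt[IP].src] = value
            payload_len = pkt[IP].len - pkt[IP].ihl * 4 - tcp.dataofs * 4
            largest_segment = max(largest_segment, payload_len)

    print("Advertised MSS per host:", advertised_mss)
    print("Largest TCP payload seen:", largest_segment, "bytes")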

(25 Oct '17, 15:56) sethdunn96

Tried disabling Chunking and BinaryMime, no dice. https://drive.google.com/open?id=0B4PA4PyuOxmWOV9SSXNrUWEwaVk

(25 Oct '17, 16:18) sethdunn96

When I watch the log on my firewall (PIX 515E), this is the communication: the connection is built, then immediately torn down from the outside (ProofPoint server).

6|Oct 25 2017|19:44:59|106015|67.231.154.164|216.54.104.225|Deny TCP (no connection) from 67.231.154.164/50138 to 216.54.104.225/25 flags RST on interface outside
6|Oct 25 2017|19:44:59|106015|67.231.154.164|216.54.104.225|Deny TCP (no connection) from 67.231.154.164/50138 to 216.54.104.225/25 flags RST on interface outside
6|Oct 25 2017|19:44:59|106015|67.231.154.164|216.54.104.225|Deny TCP (no connection) from 67.231.154.164/50138 to 216.54.104.225/25 flags RST on interface outside
6|Oct 25 2017|19:44:59|106015|67.231.154.164|216.54.104.225|Deny TCP (no connection) from 67.231.154.164/50138 to 216.54.104.225/25 flags RST on interface outside
6|Oct 25 2017|19:44:59|106015|67.231.154.164|216.54.104.225|Deny TCP (no connection) from 67.231.154.164/50138 to 216.54.104.225/25 flags RST on interface outside
6|Oct 25 2017|19:44:59|106015|67.231.154.164|216.54.104.225|Deny TCP (no connection) from 67.231.154.164/50138 to 216.54.104.225/25 flags RST on interface outside
6|Oct 25 2017|19:44:59|106015|67.231.154.164|216.54.104.225|Deny TCP (no connection) from 67.231.154.164/50138 to 216.54.104.225/25 flags RST on interface outside
6|Oct 25 2017|19:44:59|106015|67.231.154.164|216.54.104.225|Deny TCP (no connection) from 67.231.154.164/50138 to 216.54.104.225/25 flags RST on interface outside
6|Oct 25 2017|19:44:59|106015|Exchange|67.231.154.164|Deny TCP (no connection) from Exchange/25 to 67.231.154.164/50138 flags ACK on interface DMZ4
6|Oct 25 2017|19:44:59|302014|67.231.154.164|Exchange|Teardown TCP connection 3913035 for outside:67.231.154.164/50138 to DMZ4:Exchange/25 duration 0:00:00 bytes 189250 TCP Reset-O
6|Oct 25 2017|19:44:58|302013|67.231.154.164|Exchange|Built inbound TCP connection 3913035 for outside:67.231.154.164/50138 (67.231.154.164/50138) to DMZ4:Exchange/25 (216.54.104.225/25)

(25 Oct '17, 16:59) sethdunn96


The capture above shows that the connection runs for a while until some packets stop getting through. To verify whether it is a network issue or an issue of the sending client, you would have to capture simultaneously at the client and at the server and then compare the captures.

However, security devices sometimes behave funny when they think something is wrong. Could it be that the issue is related to the contents of one of the messages, so the security device uses rough means to stop it from being sent because it has classified the contents as malicious?

To avoid the influence of any security applications or hardware acceleration on the client and the server, you should not capture on those machines directly but on other machines, using a network tap or a monitoring port.

The final result should be the identification of the element in the chain which causes the trouble.
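
A minimal sketch (Python with scapy; file names are hypothetical) of the comparison described above: collect the sequence numbers of the data segments sent by the client as seen at each capture point and report which ones are missing at the other:

    # Minimal sketch, not from the original discussion: compare two simultaneous
    # captures and list client->server data segments present in one but missing
    # in the other. File names are hypothetical; assumes the port-25 stream of
    # interest is the only one in each file.
    from scapy.all import rdpcap, IP, TCP

    CLIENT = "67.231.154.164"   # sending side (ProofPoint), per the thread
    SERVER_PORT = 25

    def client_seqs(path):
        """Return the set of TCP sequence numbers of data segments sent by the client."""
        seqs = set()
        for pkt in rdpcap(path):
            if IP in pkt and TCP in pkt and pkt[IP].src == CLIENT and pkt[TCP].dport == SERVER_PORT:
                payload_len = pkt[IP].len - pkt[IP].ihl * 4 - pkt[TCP].dataofs * 4
                if payload_len > 0:
                    seqs.add(pkt[TCP].seq)
        return seqs

    near_uplink = client_seqs("capture_near_uplink.pcapng")
    near_server = client_seqs("capture_near_exchange.pcapng")

    print("Seen at uplink but missing at Exchange:", sorted(near_uplink - near_server))
    print("Seen at Exchange but missing at uplink:", sorted(near_server - near_uplink))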

(26 Oct '17, 01:22) sindy

I know this one particular email that is trying to be sent has an attachment... a .wav file, I believe. But I don't have anything enabled on the firewall to stop email from being sent. It wasn't configured to do so when we were on Exchange 2003, and I did not alter anything when I migrated to Exchange 2010.

I will have to see if I can set up a capture from another machine. On the Cisco switch the machines go into, I have a port set up for promiscuous mode.

(26 Oct '17, 04:13) sethdunn96

Here is the capture done from a different server: https://drive.google.com/open?id=0B4PA4PyuOxmWUy12aFJuOFdIaHc The traffic looks similar to when the capture was done on the Exchange server.

(26 Oct '17, 05:07) sethdunn96

As said, you need to capture as close as possible to the client and as close as possible to the server, where "client" and "server" are the roles of the machines in the TCP session used as transport for the SMTP session. Since you say you have trouble receiving emails, I guess your MS Exchange is the server and some machine on the internet is the client. As the server has a private address while the client has a public one, I assume that there is port forwarding set up somewhere.

If my guess is correct, capture simultaneously next to the uplink connection to the internet (so that you can see whether the packets are already missing at the edge of your company network) and next to the Exchange server. If the packets missing at the Exchange side can be found at the uplink side, something in your network is dropping them; if they are missing already at the uplink, something outside your network is dropping them.

(26 Oct '17, 05:27) sindy

I think at this point it has to do with the network adapters and Exchange being a VM. I did some file transfer testing. My PC at home has an IPsec tunnel built to the servers. So I transferred a folder (110 MB, using my machine at home) from one physical server on the LAN I am tunneled into to another physical server (Win 2012 R2) on the LAN. The transfer speed shows >500 KB/sec. If I do the same folder transfer to my Exchange (VM), the speed is only ~150 KB/sec... and at times it looks like things time out... then start over again.

I have the latest drivers installed on the host machine (I will check HP's site again to be sure). But are there any advanced adapter settings I should look into? I have disabled VMQ on the adapters as well as in Hyper-V.

(26 Oct '17, 05:54) sethdunn96

A blind shot then... look at the answer to this question. Although the symptoms are only loosely relevant, maybe the sending client isn't patient enough and that causes the session to fail?

(26 Oct '17, 06:09) sindy

I have been running Wireshark on the VM, but the problem persists.

I went in and made a few changes to the adapter (both on host and VM). File transfer speeds to the VM are now on par with transfer speeds to physical machines... but the email issue still persists. :(

I disabled VMQ in the registry of the host machine; I stumbled across this thread (the OP enabled it, I disabled it): https://www.reddit.com/r/sysadmin/comments/2k7jn5/after_2_years_i_have_finally_solved_my_slow/

Then I made sure that VMQ is disabled in Hyper-V as well as on the physical adapter.

(26 Oct '17, 07:04) sethdunn96

A simultaneous capture at the Exchange VM itself and at a mirror of the switch port to which the physical server running the VM is connected should tell you whether Hyper-V and the network card drivers are actually the root cause or not. Same approach - if you can see the packet on the wire but not on the VM, something in between has dropped it. Otherwise, dig in another direction.

(26 Oct '17, 07:13) sindy

I will see how I can do that... The server has 4 Ethernet ports, so I have a dedicated port for the VM and the host machine has its own port. On the host machine, the Ethernet port I use for the VM does not have any IP information assigned to it.

(26 Oct '17, 07:21) sethdunn96

You don't need IP settings to be attached to a network card in order to capture on it. If you know to which port of a switch the dedicated Ethernet port of the host machine is connected, mirror that switch port to another one and capture on the mirroring port using a physical machine. You'll see the packets to/from the IP address of the VM's virtual port in the capture. Even though the whole path from the Eth port of the host to the virtual Eth port of the VM is expected to act as a switch, it may drop packets (which is what you need to confirm or prove false).

(26 Oct '17, 07:26) sindy

So I did a packet capture: one from the VM, filtered on the VM and the source IP; the other from another server, filtered on the IP addresses of the VM and the source IP. For this particular conversation, both instances captured the same number of packets: 232. At the end of the capture, both instances show the same entry. So it looks like the data is the same, correct?

Capture from External Server: https://drive.google.com/open?id=0B4PA4PyuOxmWbEtmYjlOR3BUckE

Capture from the VM itself. https://drive.google.com/open?id=0B4PA4PyuOxmWVFo4TWpRbkRZODg

(26 Oct '17, 07:33) sethdunn96

Well, I wouldn't look so much at the number of packets and the last one to be captured as I don't know how precisely you can synchronise start and stop of the capture on the two machines :) What is relevant is that, in both capture files,

  • the last packet from client to server before the trouble starts has tcp.seq 145080, and
  • the first "black" one ("previous segment not captured") comes right after it and has tcp.seq 150552.

This means that the network path between these two capture points is not the source of the trouble, as the trouble is present at both capture points. So now reuse whichever of these two capture points is more convenient for you, find another one closer to the source (the SMTP client), and try again.
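
A minimal sketch (assuming Python with pyshark, which drives tshark; the file name is hypothetical) of how to list the packets Wireshark flags as "previous segment not captured", together with the relative sequence numbers quoted above:

    # Minimal sketch, not from the original discussion: list the packets Wireshark
    # flags as "previous segment not captured" with their relative sequence numbers.
    # The file name is hypothetical.
    import pyshark

    cap = pyshark.FileCapture(
        "capture_near_exchange.pcapng",
        display_filter="tcp.analysis.lost_segment",
    )

    for pkt in cap:
        # tcp.seq is the relative sequence number (Wireshark default), matching
        # the 145080 / 150552 values discussed above
        print(f"frame {pkt.number}: {pkt.ip.src} -> {pkt.ip.dst}, tcp.seq={pkt.tcp.seq}")

    cap.close()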

(26 Oct '17, 07:46) sindy

For the synchronization aspect: I start the captures and filter on the IP addresses relevant to the capture. Then, from the ProofPoint admin console, I do a resend of an email that is hung in the queue there. This way the traffic is captured at the same time.

All the traffic, once inside the firewall, is on the same Cisco switch (3 VLANs on the switch). Exchange is on VLAN3; the external capture I did from the other server was on VLAN2. So I am not sure how I would set up another capture point.

Would you suspect that my firewall for whatever reason is dropping a packet?

(26 Oct '17, 07:52) sethdunn96

Would you suspect that my firewall for whatever reason is dropping a packet?

Yes, it is one of the possible explanations.

(26 Oct '17, 07:58) sindy

It's the only thing I can come up with now, but I am not seeing where it would be dropping it. As I said, the config on the firewall is the same as when Exchange 2003 was in the picture. And I am not seeing interface errors on the interfaces associated with this flow of traffic.

This looks to me like a problem with Win 2012 R2/Exchange 2010 being a VM, but I am completely stumped as to why it is happening.

(26 Oct '17, 08:02) sethdunn96

So the sending client is ProofPoint itself, both ProofPoint and the receiving server are under your management access, and something routes between the client's subnet in VLAN 2 and the server's subnet in VLAN 3?

(26 Oct '17, 08:02) sindy

ProofPoint is an external service provider; we just pay for their filtering services. I have an admin console where I can log in, check the queues, and see what is hung up there for whatever reason.

We have a Cisco PIX 515E in place. Due to PCI compliance we have had to segment the network, so there are 3 LANs. Two LANs are secured for PCI compliance; the capture I ran from the other server was done from one of these segments. The other LAN is not monitored for PCI compliance; this is where my Exchange server sits.

The PIX has 6 Ethernet ports, so that is how traffic is sent between the LANs as well as from outside (public) to inside.

(26 Oct '17, 08:09) sethdunn96

So I went into the registry on both host and VM and set jumbo packets to 9014 for all available adapters. On the host machine, I set all entries for VMQ to disabled. All done in the registry.

Powered down the VM, rebooted the host machine, brought the VM back up. The problem still persists.

(26 Oct '17, 10:03) sethdunn96

I think you are too stuck on the idea that the change of Exchange version is to blame. But the missing packet is one sent towards the Exchange, so even if the change of version is to blame, it affects something on the sending side, not the VM settings. As reasoned above, the host and VM do not seem to be stealing the missing packets.

So where did you capture: between ProofPoint and your PIX, or between the PIX and the Exchange? The capture suggests between the PIX and the Exchange, because the source and destination MAC addresses are the same in both captures and I assume the PIX acts at L3. So if you can capture on both sides of the PIX (the side facing the internet/ProofPoint and the side facing the Exchange VM), you should see whether it is your PIX that is stealing the packets or something even closer to ProofPoint.
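
A minimal sketch (Python with scapy, hypothetical file name) of the MAC-address check referred to above: listing the distinct source/destination MAC pairs per IP conversation shows on which L2 segment, and therefore on which side of the routing PIX, a capture was taken:

    # Minimal sketch, not from the original discussion: list the distinct src/dst
    # MAC pairs per IP conversation in a capture. Because the PIX routes (acts at
    # L3), the MAC pairs differ depending on which side of it the capture was taken.
    # The file name is hypothetical.
    from scapy.all import rdpcap, Ether, IP

    pairs = set()
    for pkt in rdpcap("capture_near_exchange.pcapng"):
        if Ether in pkt and IP in pkt:
            pairs.add((pkt[Ether].src, pkt[Ether].dst, pkt[IP].src, pkt[IP].dst))

    for eth_src, eth_dst, ip_src, ip_dst in sorted(pairs):
        print(f"{ip_src} -> {ip_dst}: {eth_src} -> {eth_dst}")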

(26 Oct '17, 12:54) sindy

The reason I think it is Exchange, or a VM "feature", is that the traffic does seem to get to Exchange, but then something goes sideways the longer it takes to deliver a message (i.e. when an attachment is present).

Yes, the packet captures were done on the inside of the PIX. I don't have a way to do a capture on the outside of the PIX. I can watch the log of the PIX during a transaction with the ProofPoint server, and the initial inbound connection is built and then instantly torn down:

6|Oct 27 2017|07:31:14|106015|67.231.154.164|216.54.104.225|Deny TCP (no connection) from 67.231.154.164/58726 to 216.54.104.225/25 flags FIN ACK on interface outside

........

6|Oct 27 2017|07:31:11|106015|148.163.129.52|216.54.104.225|Deny TCP (no connection) from 148.163.129.52/47238 to 216.54.104.225/25 flags RST on interface outside

6|Oct 27 2017|07:31:11|106015|148.163.129.52|216.54.104.225|Deny TCP (no connection) from 148.163.129.52/47238 to 216.54.104.225/25 flags RST on interface outside

6|Oct 27 2017|07:31:11|106015|148.163.129.52|216.54.104.225|Deny TCP (no connection) from 148.163.129.52/47238 to 216.54.104.225/25 flags RST on interface outside

6|Oct 27 2017|07:31:11|106015|148.163.129.52|216.54.104.225|Deny TCP (no connection) from 148.163.129.52/47238 to 216.54.104.225/25 flags RST on interface outside

6|Oct 27 2017|07:31:11|106015|148.163.129.52|216.54.104.225|Deny TCP (no connection) from 148.163.129.52/47238 to 216.54.104.225/25 flags RST on interface outside

6|Oct 27 2017|07:31:11|106015|148.163.129.52|216.54.104.225|Deny TCP (no connection) from 148.163.129.52/47238 to 216.54.104.225/25 flags RST on interface outside

6|Oct 27 2017|07:31:11|106015|148.163.129.52|216.54.104.225|Deny TCP (no connection) from 148.163.129.52/47238 to 216.54.104.225/25 flags RST on interface outside

6|Oct 27 2017|07:31:11|106015|148.163.129.52|216.54.104.225|Deny TCP (no connection) from 148.163.129.52/47238 to 216.54.104.225/25 flags RST on interface outside

6|Oct 27 2017|07:31:11|106015|148.163.129.52|216.54.104.225|Deny TCP (no connection) from 148.163.129.52/47238 to 216.54.104.225/25 flags RST on interface outside

6|Oct 27 2017|07:31:10|302013|148.163.129.52|Exchange|Built inbound TCP connection 4512210 for outside:148.163.129.52/47238 (148.163.129.52/47238) to DMZ4:Exchange/25 (216.54.104.225/25)

(27 Oct '17, 04:34) sethdunn96

I had to remove a bunch of the "Deny TCP (no connection)" entries due to character limit.

(27 Oct '17, 04:34) sethdunn96

The one email that was hung up went through this morning, after 3 days of trying.

10.77.50.25:25,67.231.154.164:49860,+,,

10.77.50.25:25,67.231.154.164:49860,*,SMTPSubmit SMTPAcceptAnySender SMTPAcceptAuthoritativeDomainSender AcceptRoutingHeaders,Set Session Permissions

10.77.50.25:25,67.231.154.164:49860,*,SMTPSubmit SMTPAcceptAnyRecipient SMTPAcceptAuthenticationFlag SMTPAcceptAnySender SMTPAcceptAuthoritativeDomainSender BypassAntiSpam BypassMessageSizeLimit SMTPAcceptEXCH50 AcceptRoutingHeaders,Set Session Permissions

10.77.50.25:25,67.231.154.164:49860,>,"220 exch.d2ms.com Microsoft ESMTP MAIL Service ready at Fri, 27 Oct 2017 06:57:03 -0400",

10.77.50.25:25,67.231.154.164:49842,-,,Remote

10.77.50.25:25,67.231.154.164:49850,-,,Remote

10.77.50.25:25,67.231.154.164:49852,-,,Remote

10.77.50.25:25,67.231.154.164:49860,<,EHLO dispatch1-us1.ppe-hosted.com,

10.77.50.25:25,67.231.154.164:49860,>,250-exch.d2ms.com Hello [67.231.154.164],

10.77.50.25:25,67.231.154.164:49860,>,250-SIZE,

10.77.50.25:25,67.231.154.164:49860,>,250-PIPELINING,

10.77.50.25:25,67.231.154.164:49860,>,250-DSN,

10.77.50.25:25,67.231.154.164:49860,>,250-ENHANCEDSTATUSCODES,

10.77.50.25:25,67.231.154.164:49860,>,250-AUTH,

10.77.50.25:25,67.231.154.164:49860,>,250-8BITMIME,

10.77.50.25:25,67.231.154.164:49860,>,250-BINARYMIME,

10.77.50.25:25,67.231.154.164:49860,>,250-CHUNKING,

10.77.50.25:25,67.231.154.164:49860,>,250-XEXCH50,

10.77.50.25:25,67.231.154.164:49860,>,250 XSHADOW,

10.77.50.25:25,67.231.154.164:49860,<,MAIL FROM:[email protected] SIZE=248166 BODY=8BITMIME,

10.77.50.25:25,67.231.154.164:49860,*,08D51C91FA7F1C33;2017-10-27T10:57:04.662Z;1,receiving message

10.77.50.25:25,67.231.154.164:49860,<,RCPT TO:[email protected] ORCPT=rfc822;[email protected],

10.77.50.25:25,67.231.154.164:49860,<,DATA,

10.77.50.25:25,67.231.154.164:49860,>,250 2.1.0 Sender OK,

10.77.50.25:25,67.231.154.164:49860,>,250 2.1.5 Recipient OK,

10.77.50.25:25,67.231.154.164:49860,>,354 Start mail input; end with <crlf>.<crlf>,

10.77.50.25:25,67.231.154.164:49860,>,250 2.6.0 [email protected] [InternalId=293419] Queued mail for delivery,

10.77.50.25:25,67.231.154.164:49860,<,QUIT,

10.77.50.25:25,67.231.154.164:49860,>,221 2.0.0 Service closing transmission channel,

10.77.50.25:25,67.231.154.164:49860,-,,Local

(27 Oct '17, 04:43) sethdunn96

I checked the syslog of my PIX for when that message finally did go through this morning. This is what it saw.

Oct 27 06:57:06 10.76.0.1 Oct 27 2017 06:56:13: %PIX-6-302013: Built inbound TCP connection 4495227 for outside:67.231.154.164/49860 (67.231.154.164/49860) to DMZ4:Exchange/25 (216.54.104.225/25)

Oct 27 06:57:07 10.76.0.1 Oct 27 2017 06:56:14: %PIX-6-302014: Teardown TCP connection 4495227 for outside:67.231.154.164/49860 to DMZ4:Exchange/25 duration 0:00:00 bytes 248873 TCP Reset-O

Oct 27 06:57:08 10.76.0.1 Oct 27 2017 06:56:15: %PIX-6-106015: Deny TCP (no connection) from Exchange/25 to 67.231.154.164/49860 flags ACK on interface DMZ4

Oct 27 06:57:08 10.76.0.1 Oct 27 2017 06:56:15: %PIX-6-106015: Deny TCP (no connection) from Exchange/25 to 67.231.154.164/49860 flags FIN PSH ACK on interface DMZ4

(27 Oct '17, 05:23) sethdunn96

I could even imagine that some element on one path between ProofPoint and your network had a problem handling specific packet contents while another path didn't, and that the only change was that dynamic routing chose the problem-free path this time. If the issue was with a single e-mail message, I'm afraid it is going to remain an unsolved mystery.

(27 Oct '17, 05:58) sindy

It is beginning to look that way. But what worries me is that I am getting ready to migrate us again, to Exchange 2016, and my plan was to put that on a VM as well rather than a physical machine. But if this is a problem with Hyper-V and Exchange being on a VM, then my problem will follow, and of course my boss and some others in the company will not be happy with this. So that is why I have been wanting to narrow this down and know what the problem is... then I can better decide which route to take.

(27 Oct '17, 06:01) sethdunn96

I still don't get what makes you think it is a VM/Hyper-V/Exchange problem. Once again - you have seen from the synchronous captures on the switch and on the Exchange VM that the packet sent by ProofPoint towards Exchange got lost before it could even arrive at the Hyper-V host.

A coincidence is not necessarily a correlation - all the more so since the issue turned out not to be a permanent one but was related to a single message (and, most likely, to a single packet in that message). So it may have nothing to do with Hyper-V or Exchange at all.

I've read about network cards that are sensitive to certain several-byte patterns in packets, causing them to drop such packets (or handle them in a specific way which effectively amounts to a drop).

(27 Oct '17, 06:40) sindy

Because when I look at the PIX logs, as well as the captures, there are resets happening... but the traffic still appears to have gone to Exchange. And I see resets on the firewall from outside. You can see in the post I did above that there was a TCP Reset-O.

And we never had this issue when Exchange 2003 was in place... the firewall and internal network are all the same. Once I decommissioned and removed Exchange 2003, I assigned the Exchange 2003 IP address to Exchange 2010, so all would be the same... just as I will do when I migrate to Exchange 2016.

I agree that the network card could be an issue. This ProLiant uses Broadcom NICs, which seem to be cards that have issues with Hyper-V/VMs, but the workarounds are to disable VMQ and some other things, which I have done... so while internal traffic transfers to and from Exchange are on par with physical boxes, the translation happening at the firewall for incoming traffic seems to be having problems. (So yes, I would concur it looks to be at the firewall or on the outside.) I just assume it isn't, since it was working well before the migration.

(27 Oct '17, 07:18) sethdunn96

In my understanding the TCP reset is just a consequence. And I had in mind a network card somewhere out there, not the one in your ProLiant, as the packet was already missing before the ProLiant's network card could ever see it.

(27 Oct '17, 07:30) sindy

OK, I see what you're saying. But one other thing I will point out is that when I remove the ProofPoint server MX records from DNS and put our AV Gateway servers in DNS instead, the email appears to get to them fine (clients are not telling us about NDRs being received) and is then relayed on to Exchange fine. So that is another reason why I point at the Exchange/VM setup.

(27 Oct '17, 07:33) sethdunn96

Funny that we come to completely contradictory conclusions from the same input data. For me, the fact that things are better when you let the clients' mail servers talk SMTP directly with your Exchange VM rather than via ProofPoint's filtering gateway means, first of all, that the network path from TCP client to TCP server is very likely to differ (as the TCP clients are the clients' mail servers at different places across the 'net if you set MX to point to your Exchange), while the last section of that path, the one from your PIX through the VM host to the Exchange VM, is the same in both cases (MX pointing to ProofPoint or MX pointing to the public address of the Exchange directly).

So for me, the fact that redirecting the SMTP communication to bypass ProofPoint makes things better is an indication that at least one of the possible network paths between ProofPoint and your PIX is ill, while for you it is proof that the part of the path which is common to both scenarios is ill.

(27 Oct '17, 07:49) sindy

Yeah, I see your point and reasoning... I guess I am just coming to the conclusion I am because when we were on Exchange 2k3, it worked. No issues. No backups/NDRs at ProofPoint. Then when I made the move to Exchange 2010 and a VM... that is when things started acting up.

And as I said, my big concern is moving forward to Exchange 2016 and the problem still being there... whereas it may not be if I put it on a physical server.

So yes, logic would dictate that I go with how you see it. But my gut feeling is going in the opposite direction...

(27 Oct '17, 09:14) sethdunn96

One other thing I will note: when I watch the PIX log for incoming email transactions, traffic coming from a ProofPoint server to our Exchange server has a very odd readout (like what I posted above) compared to email servers contacting our AV Gateway boxes and sending to them. When our AV Gateway boxes are contacted, you can see in the PIX log the inbound connection being built, then the connection being torn down (after the email has been delivered). I don't see the multiple Deny TCP (no connection) entries for them, like I do for the ProofPoint-to-Exchange entries.

(27 Oct '17, 09:59) sethdunn96