Can't log in?

Hi,

I have had the same problem since the last update (a week and a half already), and it became so annoying that I bothered to troubleshoot it. My explanation may sound like nonsense to you, but I strongly suggest you show it to your network engineers: the problem is that the software/hardware that builds your TCP connections does not react to certain conditions the way the standard requires. And there is no need to tell me to reset my network card settings; I have already solved it for myself with workarounds here and there.
It all comes down to this packet trace of the communication between my network provider and your login and realm server:

20:41:18.292602 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 64)
    10.0.0.226.53177 > 37.244.54.40.1119: Flags [S], cksum 0x3718 (correct), seq 4246353231, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 785408915 ecr 0,sackOK,eol], length 0
20:41:18.336546 IP (tos 0x0, ttl 121, id 0, offset 0, flags [DF], proto TCP (6), length 48)
    37.244.54.40.1119 > 10.0.0.226.53177: Flags [S.], cksum 0xa94b (correct), seq 374560496, ack 4246353232, win 65535, options [mss 1460,sackOK,eol], length 0
20:41:18.391706 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 40)
...
20:41:18.444969 IP (tos 0x0, ttl 55, id 1028, offset 0, flags [DF], proto TCP (6), length 3737)
    37.244.54.40.1119 > 10.0.0.226.53177: Flags [P.], cksum 0x7589 (incorrect -> 0xd951), seq 1:3698, ack 152, win 30016, length 3697
20:41:18.444996 IP (tos 0xc0, ttl 64, id 16769, offset 0, flags [none], proto ICMP (1), length 576)
    10.0.0.226 > 37.244.54.40: ICMP 10.0.0.226 unreachable - need to frag (mtu 1476), length 556
	IP (tos 0x0, ttl 55, id 1028, offset 0, flags [DF], proto TCP (6), length 3737)
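
The last two lines of the trace are an ICMP “Destination Unreachable, fragmentation needed” message (type 3, code 4). Per RFC 1191, the next-hop MTU sits in the last two bytes of the 8-byte ICMP header, followed by the start of the dropped packet, so the server receives everything it needs to shrink its segments. Below is a minimal Python sketch that builds and parses such a message; the helper names are hypothetical, and the 28 zero bytes are only a placeholder for the real dropped IP header plus 8 bytes of TCP.

```python
import struct

ICMP_DEST_UNREACH = 3  # ICMP type "Destination Unreachable"
ICMP_FRAG_NEEDED = 4   # code "Fragmentation needed and DF set"


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(next_hop_mtu: int, original: bytes) -> bytes:
    """Hypothetical helper: ICMP type 3 / code 4 with the next-hop MTU (RFC 1191)."""
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                         0, 0, next_hop_mtu)  # checksum zeroed for now
    checksum = internet_checksum(header + original)
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                         checksum, 0, next_hop_mtu)
    return header + original


def parse_frag_needed(icmp: bytes):
    """Return the next-hop MTU if this is a valid 'need to frag' message, else None."""
    if len(icmp) < 8 or internet_checksum(icmp) != 0:
        return None  # too short or corrupted
    icmp_type, code, _csum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type != ICMP_DEST_UNREACH or code != ICMP_FRAG_NEEDED:
        return None
    return mtu


msg = build_frag_needed(1476, bytes(28))  # MTU reported in the trace
print(parse_frag_needed(msg))  # 1476
```

A server that honours this message lowers its segment size for the connection; one that ignores it keeps retransmitting the oversized packet.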

I had to emulate my network provider by faking it as a VPN tunnel to some other place, so that I could reproduce the same circumstances (a shortened MTU) and catch what happens on the provider side by sniffing the traffic leaving the remote VPN end for the internet.
What you can see from it, and what is crucial for understanding the problem, is that the start of the session requests an MSS of 1460 bytes. That is already slightly wrong on my side, but not fatally: my computer is connected by Ethernet cable and has the default MTU of 1500, while my router is connected to my provider over PPP with an MTU of 1492, 8 bytes less. So in reality my provider cannot send me packets carrying more than 1452 bytes of data (the MSS should be 1452). By itself this is not a problem, as my provider could fragment packets sent to me into smaller ones, and my router could reassemble them back into normal, big 1460-byte segments that Ethernet can carry.
But then the second issue kicks in, which is not really an issue by itself either. As you can see, the data packet from your host to me (20:41:18.444969) has the “[DF]” flag, which means “don’t fragment”. This is a perfectly normal flag; most TCP stacks set it on all packets (my own SYN in the trace carries it too), and the encrypted connection to Blizzard is no exception. And the packet carries more than 1460 bytes of data (the size in the dump is even bigger, but that is due to TSO and not really important). This is where the problem starts: my provider sees a packet bigger than it can send down the line, since the PPP link between it and my router can carry no more than 1452 bytes of data, and the packet explicitly says not to fragment it.
The internet has a solution for that as well: the provider sends the originator an ICMP “need to frag” control packet asking the server to send smaller packets, since the server wants them unfragmented. And that never happens. The server keeps resending the packet until it gives up, because it never receives an acknowledgement. This is exactly what broke recently: your edge router software, or whatever you use to terminate TLS, does not react to the ICMP control packets at all. Either they don’t reach it, or it doesn’t bother.
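
The MTU/MSS arithmetic above is simply the MTU minus 40 bytes of headers: 20 for IPv4 and 20 for TCP, assuming neither carries options (the server’s SYN-ACK in the trace negotiated no timestamps). A small Python sketch of it, with hypothetical helper names:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits into one IPv4 packet of `mtu` bytes."""
    return mtu - IPV4_HEADER - TCP_HEADER


def fits(payload: int, mtu: int) -> bool:
    """True if a segment carrying `payload` bytes crosses a link with this MTU unfragmented."""
    return payload + IPV4_HEADER + TCP_HEADER <= mtu


ETHERNET_MTU = 1500  # default on the local Ethernet
PPP_MTU = 1492       # PPP link to the provider, 8 bytes less

print(mss_for_mtu(ETHERNET_MTU))  # 1460: what the client advertised in its SYN
print(mss_for_mtu(PPP_MTU))       # 1452: what the PPP link can actually carry
print(fits(1460, PPP_MTU))        # False: a full-size segment is 8 bytes too big
print(fits(1452, PPP_MTU))        # True
```

In the emulated tunnel the ICMP message reports an MTU of 1476, which by the same formula gives an MSS of 1436.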
There are a few things that back up my theory:

  • Why is this not a catastrophe, and why are only some people affected? On the modern internet not many people use this kind of network layout, but it is still fairly common. All you need is an Ethernet network inside your house, a provider connection that requires an additional header (PPP or some other tunnelling), and a router that doesn’t bother to lower the MTU on the local network.
  • Why does it sometimes work and sometimes not? Either there is one server in your network that honours those control packets, and if you are lucky enough to hit it, it works; or sometimes the TLS session is built in a way that fits into smaller packets, since only 8 excess bytes matter.
  • Why do people write that IT mumbo-jumbo like resetting their network card helped? Either the reset dropped a custom MTU setting they had set for no reason, and they picked up the proper settings from their router, or, again, they hit the host that honours ICMP “need to frag”.
  • Why did it work flawlessly before and break only recently? I don’t know, you tell me. I can offer a couple of theories off the cuff: either the edge router software was upgraded or changed last Wednesday and now it sets DF or ignores “need to frag”, or the way TLS is set up changed and the packets are now a little fatter than before, while the edge router was always like this.
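
The bullets above boil down to a simple model: with DF set, an oversized segment only gets through if the sender shrinks it after the ICMP message, or if the data happened to fit in the first place. Here is a toy Python simulation of that, not a real TCP stack; the retry budget and the helper names are hypothetical, and the 3697 bytes from the trace are reused as the total payload.

```python
from typing import Optional

LINK_MTU = 1492  # PPP link MTU from the post
HEADERS = 40     # IPv4 + TCP headers without options
MAX_RETRIES = 5  # hypothetical; real stacks use retransmission timers


def send_with_df(packet_size: int) -> Optional[int]:
    """Hypothetical provider router: returns the MTU it would report in an ICMP
    "need to frag" if the DF packet is too big, or None if it is forwarded."""
    return LINK_MTU if packet_size > LINK_MTU else None


def deliver(payload: int, mss: int, honours_icmp: bool) -> bool:
    """Toy sender pushing `payload` bytes with DF set; True if everything arrives."""
    failures = 0
    while payload > 0:
        segment = min(payload, mss)
        reported_mtu = send_with_df(segment + HEADERS)
        if reported_mtu is None:
            payload -= segment  # forwarded and (we assume) acknowledged
            failures = 0
            continue
        failures += 1
        if failures > MAX_RETRIES:
            return False  # sender gives up, as in the trace
        if honours_icmp:
            mss = reported_mtu - HEADERS  # path MTU discovery: shrink segments
        # otherwise the ICMP is ignored and the same oversized segment is resent
    return True


print(deliver(3697, 1460, honours_icmp=True))   # True: server honours ICMP
print(deliver(3697, 1460, honours_icmp=False))  # False: the failure in the trace
print(deliver(1000, 1460, honours_icmp=False))  # True: small enough to fit anyway
```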

Anyway, I don’t think anyone is going to react to this. I solved the problem for myself by (A) setting the proper MTU on my network and (B) having a remote tunnel at hand where I can control the MSS of outgoing packets. But I honestly advise you blue guys to show this to your network engineers.
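
Workaround B is what is usually called MSS clamping: the box forwarding the SYN rewrites the advertised MSS so that the server never sends segments bigger than the narrow link can carry. On a Linux router this is typically done with the iptables TCPMSS target (`--clamp-mss-to-pmtu` or `--set-mss`). The Python function below only sketches the option rewrite itself, using the option bytes of the SYN in the trace; its name is hypothetical, and a real implementation must also recompute the TCP checksum.

```python
import struct


def clamp_mss(tcp_options: bytes, max_mss: int) -> bytes:
    """Lower the MSS option (kind 2, length 4) in a SYN's TCP options to max_mss.

    Sketch only: the TCP checksum is NOT updated here.
    """
    out = bytearray(tcp_options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == 0:  # End of option list
            break
        if kind == 1:  # NOP is a single byte
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option
        length = out[i + 1]
        if length < 2:
            break  # malformed option, stop parsing
        if kind == 2 and length == 4 and i + 4 <= len(out):
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > max_mss:
                struct.pack_into("!H", out, i + 2, max_mss)
        i += length
    return bytes(out)


# Options from the client SYN in the trace:
# mss 1460, nop, wscale 6, nop, nop, TS val 785408915 ecr 0, sackOK, eol
syn_options = (struct.pack("!BBH", 2, 4, 1460) + b"\x01" + bytes([3, 3, 6])
               + b"\x01\x01" + struct.pack("!BBII", 8, 10, 785408915, 0)
               + bytes([4, 2]) + b"\x00")
clamped = clamp_mss(syn_options, 1452)
print(struct.unpack_from("!H", clamped, 2)[0])  # 1452
```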
