I’m very disappointed with the WoW servers and really surprised that Blizzard, in 2020, still struggles with poor performance. Leaving this major problem unaddressed for a decade should raise questions about the technical team that manages the servers.
This server issue has been going on for a long time: the servers perform poorly and management just watches.
The game is often unplayable: massive lag, disconnections, downtime, multiple restarts, maintenance, etc. Do you realize this hurts the business and makes customers lose trust in Blizzard?
What surprises me the most is the horrific SLA and the lack of redundancy in your servers. I have been working in website hosting for major companies around the globe (Sony, Microsoft, Nintendo, DHL, Deloitte, Phillips, Dell… thousands of companies in total), and when a server goes down, we bring it back up immediately. Most of the time customers never notice the downtime or a restart, since there are always at least 2 servers in each region backing each other up. When high traffic is expected due to a season or a launch, the servers are upsized and ready in advance to handle the load.
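To show what I mean by two servers backing each other up, here is a minimal sketch in Python. The server names and the `healthy` flag are made up for illustration; a real setup would rely on a load balancer with health checks rather than a hand-written loop.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str             # hypothetical server name
    healthy: bool = True  # would come from a real health check


def pick_server(pool):
    """Return the first healthy server in a regional pool, or None if all are down."""
    for server in pool:
        if server.healthy:
            return server
    return None


# Two servers in one region backing each other up.
region = [Server("eu-a"), Server("eu-b")]
region[0].healthy = False  # simulate the first server crashing or restarting
active = pick_server(region)
print(active.name)  # traffic moves to the second server
```

As long as one server in the pool is healthy, players keep getting routed somewhere, which is why customers rarely notice a single restart.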
But with Blizzard's servers none of this is in place. You launch knowing that every expansion brings high traffic, and what has been done about it? Nothing. Hasn’t your tech team heard of AWS or Azure? You can upsize your instances on demand or create new servers on demand for a specific period of time to handle the load, use dedicated S3 buckets or Blob Storage to serve assets, and pair each server with Apache dispatchers that keep their caches in sync, on top of CDNs. All of this can be automated, and it takes a few minutes to scale an instance vertically or horizontally.
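The sizing decision behind scaling in advance is simple arithmetic. Here is a rough sketch in Python; the function name, the 20% headroom, the minimum of 2 instances and the player numbers are all my own assumptions for illustration, not Blizzard's real figures, and the actual scaling would be done through the cloud provider's tooling.

```python
def desired_instances(expected_players, players_per_instance,
                      headroom_pct=20, min_instances=2, max_instances=100):
    """Estimate how many instances to run before a launch (hypothetical sizing rule)."""
    # Add headroom on top of the forecast, then round up to whole instances.
    load = expected_players * (100 + headroom_pct)
    capacity = players_per_instance * 100
    needed = (load + capacity - 1) // capacity  # integer ceiling division
    # Never go below the redundancy minimum or above the budget cap.
    return max(min_instances, min(max_instances, needed))


# Launch day forecast: 10,000 concurrent players, 1,000 players per instance.
print(desired_instances(10_000, 1_000))
```

Run this ahead of an expansion launch with the forecast traffic, scale out to that number, and scale back in once the rush is over.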
I’m really surprised that Blizzard is 20 years behind with its IT infrastructure. Every time I want to have a good time playing the game after a long day at work, I am either unable to log in, keep getting disconnected, or my character is unable to perform any action because the server’s CPU and memory are spiking. This is caused by coding issues, memory leaks (from a lack of testing on staging before going to prod), and poor infrastructure.
If you are using a 3rd party to manage your servers, then it is time to change that: invest in a 3rd party that has a better SLA and a solid, experienced team, or build and automate your own servers.
Review your infrastructure and minimize the requests that reach the servers themselves. For example, a healthy web server publisher handles only about 10% of the requests itself, while the other 90% are served from cache by the dispatcher attached to it, and CDNs then support the dispatchers.
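To put numbers on that 10% / 90% split, here is a small Python sketch. The function name and the request count are hypothetical; only the 90% dispatcher share comes from the example above.

```python
def split_load(total_requests, dispatcher_share_pct=90):
    """Split requests between the dispatcher cache and the publisher behind it."""
    from_cache = total_requests * dispatcher_share_pct // 100
    to_publisher = total_requests - from_cache
    return from_cache, to_publisher


# Out of 1,000 requests, only 100 should reach the publisher.
from_cache, to_publisher = split_load(1_000)
print(from_cache, to_publisher)
```

If the publisher ends up answering far more than its share, the cache layer is not doing its job and the server will spike under load.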