Backstory

We recently updated Jetstream Riders, our multiplayer HTML5 game, with two new features: bots and sounds. The game is written using the ImpactJS game engine and used SmartFox2 for the multiplayer server, or at least it did. While we were adding support for bots, there was discussion of replacing SmartFox with a NodeJS equivalent to improve the replication and stability of the multiplayer server as we moved our deployment infrastructure into Docker containers.

The actual multiplayer logic was relatively simple to reimplement in Node. When a socket connects, we place the user in a lobby, where they are told the current level every 30 seconds. When a user hits the Play button, they leave the lobby and join a game room. Once the room has 10 players or has been idle for 8 seconds (whichever comes first), the race begins and standard multiplayer IO takes over: we receive client data at arbitrary rates and broadcast all player data at fixed 100ms intervals.
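
A minimal sketch of the room flow in plain Node follows. The thresholds (10 players, 8 seconds, 100ms) come from the description above, but the `GameRoom` class, its method names and the `broadcast` callback (standing in for a Socket.IO room emit) are hypothetical. It also assumes the 8-second countdown starts when the first player joins, which the description doesn't pin down.

```javascript
'use strict';

// Numbers from the article; everything else here is a hypothetical sketch.
const MAX_PLAYERS = 10;
const IDLE_START_MS = 8000;
const TICK_MS = 100;

class GameRoom {
  // `broadcast` is a stand-in for emitting to a Socket.IO room.
  constructor(broadcast, opts = {}) {
    this.broadcast = broadcast;
    this.maxPlayers = opts.maxPlayers || MAX_PLAYERS;
    this.idleStartMs = opts.idleStartMs || IDLE_START_MS;
    this.tickMs = opts.tickMs || TICK_MS;
    this.players = new Map(); // playerId -> latest state sent by that client
    this.started = false;
    this.idleTimer = null;
    this.tickTimer = null;
  }

  join(playerId) {
    if (this.started) return false; // no joining mid-race
    this.players.set(playerId, {});
    // Assumption: the idle countdown starts when the first player joins.
    if (this.players.size === 1) {
      this.idleTimer = setTimeout(() => this.start(), this.idleStartMs);
    }
    // A full room starts immediately, whichever condition comes first.
    if (this.players.size >= this.maxPlayers) this.start();
    return true;
  }

  // Clients report at whatever rate they like; only the latest state is kept.
  update(playerId, state) {
    if (this.players.has(playerId)) this.players.set(playerId, state);
  }

  start() {
    if (this.started) return;
    this.started = true;
    clearTimeout(this.idleTimer);
    // Broadcast every player's latest state on a fixed tick.
    this.tickTimer = setInterval(() => {
      this.broadcast(Object.fromEntries(this.players));
    }, this.tickMs);
  }

  stop() {
    clearTimeout(this.idleTimer);
    clearInterval(this.tickTimer);
  }
}
```

Keeping only the latest state per player decouples the arbitrary client send rate from the fixed server tick: however often a client reports, everyone receives at most one snapshot per tick.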

Before release, we estimated that this new single-threaded multiplayer server would need two to three instances to cope with peak loads, so three VMs were brought online to run the new app. To get users from the same location playing together, load balancing was done by hashing the client IP to target a particular VM. This worked well under high load, since people could see their friends racing alongside them, but during quiet times the load balancing worked against the “fun” element: with the few players online split across three VMs, rooms seemed like ghost towns with few or no other humans around. This was a damp towel on what had been an exciting game update, and something had to be done.
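
The IP-hash routing can be sketched as below. The backend names and the choice of SHA-1 are assumptions for illustration; the load balancer's actual hashing scheme isn't described here.

```javascript
'use strict';
const crypto = require('crypto');

// Hypothetical backend names; the article only says there were three VMs.
const BACKENDS = ['vm-1', 'vm-2', 'vm-3'];

// Hash the client IP and map it onto one backend. The same IP always lands
// on the same VM, which keeps players on one network together but also means
// each VM only ever sees its own slice of the player base.
function pickBackend(clientIp, backends = BACKENDS) {
  const digest = crypto.createHash('sha1').update(clientIp).digest();
  return backends[digest.readUInt32BE(0) % backends.length];
}

console.log(pickBackend('203.0.113.7'));
```

Because the mapping depends only on the IP, it can't react to how many players are online: at quiet times, each of the three VMs still receives roughly a third of an already small player base.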

NodeJS clustering

The obvious solution was to swap in a different adaptor for Socket.IO, such as a Redis-backed one, so that broadcasts reach clients connected to any instance. However, we had no working knowledge of Redis, nor the infrastructure in place to create Docker images for it. A more complex but time-critical solution was needed. We did have Zookeeper available to us, but after a couple of hours trying to plug in the Socket.IO Kafka library we had to abandon it.

Enter Node’s cluster module. While it wouldn’t solve load balancing between multiple machines, it would allow us to split work across multiple worker processes on a single machine, and it was only ever intended as a short-term solution until a centralised data store could be implemented. Here’s what we ended up creating:

So, what’s occurring? When the process first starts, the master spawns a single child worker; it’s the workers that create the Socket.IO instances and listen for WebSocket connections. While the worker is under capacity, everything works as in the single-threaded app. Things only change when the worker reaches its concurrent-user cap. First, the worker tells the master process that it has reached its cap. The master then spawns a new child worker and, once that worker emits its “online” event, disconnects the previous worker from the cluster. The old worker carries on serving the players it already has, but its listening server is closed, so the cluster no longer hands it any new connections. This ensures that only one worker is accepting connections on the WebSocket port during the handshake process.

We initially ran into issues when Socket.IO was using XHR polling as well as WebSockets. The worker would receive Socket.IO’s “connection” event on an XHR GET request, and a second or two later the client would attempt to upgrade to a WebSocket. If a new worker had been spawned in the meantime, the upgrade request reached a worker that knew nothing about the session, so the connection would sporadically fail. We had to turn off the polling transport:

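// Client: skip XHR long-polling and connect over WebSocket only
// (protocol, host and port are defined elsewhere in the client)
var socket = io(protocol + '://' + host + ':' + port, {
  transports: ['websocket']
});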
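
To see why polling and worker rotation clash, here is a toy model (not Engine.IO's real code; the `ToyWorker` class is hypothetical). With polling enabled, a Socket.IO connection is several HTTP requests tied together by a session id held in the memory of the worker that opened it, so every later request, including the WebSocket upgrade, has to reach that same worker. With `transports: ['websocket']`, the connection is a single upgrade request on one TCP connection, so it stays with whichever worker accepted it.

```javascript
'use strict';
const crypto = require('crypto');

// Toy model, not Engine.IO's real code: the session id (sid) lives only in
// the memory of the worker that handled the opening request.
class ToyWorker {
  constructor() {
    this.sessions = new Set();
  }

  // Opening request of a connection (the first XHR poll when polling is on).
  open() {
    const sid = crypto.randomBytes(8).toString('hex');
    this.sessions.add(sid);
    return sid;
  }

  // Later requests for the connection (more polls, or the WebSocket
  // upgrade) carry the sid and only succeed on a worker that knows it.
  followUp(sid) {
    return this.sessions.has(sid) ? 'ok' : 'unknown session';
  }
}

const oldWorker = new ToyWorker();
const newWorker = new ToyWorker();

// The connection opens on the worker that is currently accepting.
const sid = oldWorker.open();
const beforeRotation = oldWorker.followUp(sid); // 'ok'

// The old worker hits its cap and is retired; the upgrade request
// now lands on its replacement.
const pollingResult = newWorker.followUp(sid); // 'unknown session'

console.log(beforeRotation, pollingResult);
```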

Testing

We ran this through some load tests, pushing several hundred concurrent users through the server with only a couple of failed requests. There is still a small window between a new worker being spawned and the existing worker being disconnected, during which a WebSocket connection could be initialised and fail in the handshake scenario described above, but the Socket.IO client automatically reattempts failed connections.
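
The automatic retry that covers that window is exponential backoff with jitter. The sketch below shows the idea; its defaults mirror the Socket.IO client's `reconnectionDelay` (1000ms), `reconnectionDelayMax` (5000ms) and `randomizationFactor` (0.5) options, but the `reconnectDelay` function, the doubling factor and the exact jitter formula are assumptions rather than the library's code.

```javascript
'use strict';

// Exponential backoff with jitter, in the spirit of the Socket.IO client's
// reconnection. Defaults mirror its documented options; the doubling factor
// and the jitter formula are assumptions.
function reconnectDelay(attempt, {
  baseMs = 1000,             // cf. reconnectionDelay
  maxMs = 5000,              // cf. reconnectionDelayMax
  randomizationFactor = 0.5, // cf. randomizationFactor
  random = Math.random,      // injectable for deterministic testing
} = {}) {
  // Double the delay for each failed attempt: 1s, 2s, 4s, ...
  const raw = baseMs * Math.pow(2, attempt);
  // Scale by a random factor in [1 - f, 1 + f) so clients that failed
  // together don't all retry at the same instant.
  const jitter = 1 + randomizationFactor * (2 * random() - 1);
  // Never wait longer than the cap.
  return Math.min(Math.round(raw * jitter), maxMs);
}

// Delays for the first five attempts of one client (randomized each run).
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(`attempt ${attempt}: ${reconnectDelay(attempt)} ms`);
}
```

Because each client's delays are randomized, a batch of connections that failed during a worker rotation is spread out rather than retried all at once.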

All told, this was around two days’ work and a good stand-in for short- to medium-term usage. Our plan going forward is to provision a Redis container and then decide between running multiple single-threaded multiplayer servers or continuing to use the multi-process app.

Words by Daniel Jackson (previous developer at Mangahigh)