The dawn of Tracker v2

Why: Gotta go fast! And besides, reinventing the wheel is the best way to build an understanding of it.

What: C# RPC / PubSub server. Clients in C# and JavaScript. About 400k requests per second per core, with just a single client connection.

How: .NET Core. FlatBuffers. Persistent connections. Code generation.

Background

So… For the past month or so I’ve been working on an improved server for Tracker v1. Tracker v1 is a project I’ve worked on for almost two years now, though there have only been a few bigger sprints; the rest of the time it’s just been running nicely. Unfortunately I’ve been unable to find the time to properly write about it, so maybe this’ll have to do.

As the name might suggest, Tracker is a system for realtime tracking of connected devices. The server is contained in a single Python process (plus a MySQL database). There are some additional tools, and also a tracking client for modern Android phones. The trackable devices send their position data to the server in short, encrypted UDP datagrams. This position data is then relayed to connected clients via WebSockets and overlaid on a map.

The data is also saved to the database, so that it can later be analyzed into trips (which are just position updates clustered using time gaps) and viewed using the web interface. The web interface also includes user management. The Android client is configured by reading a QR code generated in the web interface; the code contains the server address, the device identity and the encryption key.
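The clustering itself is nothing fancy; roughly the following, sketched here in C# for consistency with the rest of this post (the type names and the 30-minute threshold are made up for illustration, the real implementation lives in the v1 Python code):

using System;
using System.Collections.Generic;
using System.Linq;

// A sketch of the time-gap clustering that splits stored position updates into
// trips. Type names and the threshold are illustrative, not the actual v1 values.
class PositionUpdate
{
    public DateTime Timestamp;
    public double Latitude;
    public double Longitude;
}

static class TripClustering
{
    public static List<List<PositionUpdate>> ClusterIntoTrips(
        IEnumerable<PositionUpdate> updates, TimeSpan maxGap)
    {
        var trips = new List<List<PositionUpdate>>();
        List<PositionUpdate> current = null;
        var previous = DateTime.MinValue;

        foreach (var update in updates.OrderBy(u => u.Timestamp))
        {
            // A long enough silence between two updates starts a new trip.
            if (current == null || update.Timestamp - previous > maxGap)
            {
                current = new List<PositionUpdate>();
                trips.Add(current);
            }
            current.Add(update);
            previous = update.Timestamp;
        }
        return trips;
    }
}

// Usage: TripClustering.ClusterIntoTrips(allUpdates, TimeSpan.FromMinutes(30));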

The system is designed so that the devices do not require any return channel to the server, so that they could, in theory, also be used over one-way radio links. The latest addition to the UDP protocol is an optional reply message, though; without it, verifying connectivity and the use of a recent enough encryption key and packet id would take extra effort. I’d like to avoid using TCP in the client, but there are some supporting functions for authenticating with the web interface and updating trip descriptions.

Pre-post-mortem

V1 works just fine, is stable, and is somewhat feature-complete. So why the need for a new version? Because I have BIG PLANS. While I haven’t bothered to benchmark the current version (though I really should), I’m quite certain that it will not work with 100k+ or 1M+ users. V1 just doesn’t scale. The server is a single process running on a single core. While there are some components that could be split out, the fact still remains: it just doesn’t scale.

So, learning from the past we can see that there are a lot of things we can do better. Also, this is a great opportunity to do some really exciting high-scalability stuff!

A new approach

Aaand I already forgot everything, and initially planned and prototyped a new monolithic architecture. Unfortunately the monolith would have required all the business logic to be written in C/C++ (and preferably run on a bare-metal unikernel) to reach adequate performance, and even then there would have been no guarantees about the level of performance. It would also be a single point of failure: when it failed, it would fail messily.

So, let’s go the other direction this time, for real. In v1 there was already some momentum in the direction of separate services: the geospatial queries were offloaded to a stand-alone microservice, and there were plans for moving the UDP handling and decryption to a separate process. So I’m now proposing an all-new microservice-inspired architecture, where many of the tasks run with only minimal inter-service dependencies. This way the load can be spread across multiple machines, and maybe, just maybe, there’ll be a way to make the system more resilient to outages in individual services.

But what about the clients? They could, in theory, communicate directly with individual services, but user authentication, service discovery and security go so much more smoothly if there are only one or two endpoints the clients connect to. These connection points would then pass the messages on to the relevant internal services.

And this, dear reader, is what this post is all about.

One proxy to rule them all

This client communication endpoint should therefore be able to transmit - and possibly translate - messages from the clients to the internal services, and vice versa. And because it would be extremely wasteful to open a new internal connection for each client, the communication should be handled over a message bus of some sort.

Most of the messages the endpoint can just proxy directly, but for others it needs some intelligence of its own. It should be able to enrich the requests with the user identifier, and thereby also handle user authentication, at least on some level.
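In code, the core of that forwarding path doesn’t need to be much more than the following sketch (IMessageBus, the envelope layout and the "auth.login" method name are made up for illustration; the real messages are FlatBuffers, more on that below):

using System;
using System.Text;
using System.Threading.Tasks;

// A rough sketch of the proxy's forwarding path. The bus interface and the
// envelope layout are illustrative only.
interface IMessageBus
{
    Task<byte[]> Request(string topic, byte[] payload);
}

class ClientSession
{
    private readonly IMessageBus bus;
    public string UserId { get; set; }   // set once authentication has succeeded

    public ClientSession(IMessageBus bus) { this.bus = bus; }

    public async Task<byte[]> HandleRequest(string method, byte[] payload)
    {
        // Unauthenticated clients may only talk to the authentication service.
        if (UserId == null && method != "auth.login")
            throw new UnauthorizedAccessException();

        // Enrich the request with the authenticated user id before passing it
        // on to whichever internal service owns this method.
        var user = Encoding.UTF8.GetBytes(UserId ?? "");
        var envelope = new byte[1 + user.Length + payload.Length];
        envelope[0] = (byte)user.Length;
        Buffer.BlockCopy(user, 0, envelope, 1, user.Length);
        Buffer.BlockCopy(payload, 0, envelope, 1 + user.Length, payload.Length);

        return await bus.Request(method, envelope);
    }
}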

To keep the proxy simple, user authentication should be the only state it contains (and maybe some subscription state, so that the proxy subscribes only once to each topic). This allows multiple proxies to run at the same time, sharing the client load evenly, so a separate load balancer is not required.
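The subscription bookkeeping is basically just per-topic reference counting; something along these lines (again only a sketch, generic over whatever the client session type ends up being):

using System.Collections.Generic;

// A sketch of the per-proxy subscription state: subscribe upstream only when
// the first local client asks for a topic, unsubscribe when the last one leaves.
class SubscriptionTable<TClient>
{
    private readonly Dictionary<string, HashSet<TClient>> subscribers =
        new Dictionary<string, HashSet<TClient>>();

    // Returns true when the proxy should subscribe to the topic upstream.
    public bool Add(string topic, TClient client)
    {
        HashSet<TClient> set;
        if (!subscribers.TryGetValue(topic, out set))
        {
            set = new HashSet<TClient>();
            subscribers[topic] = set;
        }
        set.Add(client);
        return set.Count == 1;
    }

    // Returns true when the upstream subscription is no longer needed.
    public bool Remove(string topic, TClient client)
    {
        HashSet<TClient> set;
        if (!subscribers.TryGetValue(topic, out set)) return false;
        set.Remove(client);
        if (set.Count > 0) return false;
        subscribers.Remove(topic);
        return true;
    }

    // Local fan-out targets for an incoming publish.
    public IEnumerable<TClient> SubscribersOf(string topic)
    {
        HashSet<TClient> set;
        return subscribers.TryGetValue(topic, out set)
            ? (IEnumerable<TClient>)set
            : new TClient[0];
    }
}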

(In the ideal case the proxy would also handle service discovery, failure detection, and automatic failover. As part of this mechanism, the proxy could also - if it doesn’t make things too complicated - be the primary place to set feature flags. Feature flags are toggles that the system administrator can set to disable parts of the system even when no faults are present. The flags could, for example, make some internal service read-only, or disable it altogether. The rest of the features would then keep working, as long as they don’t require access to that service. For example, the user service could be disabled for maintenance, while all existing authenticated connections would keep working. This, though, gets more complicated if there are dependencies between the services.)
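A feature-flag check in the proxy could be as simple as this sketch (nothing like this exists yet; the names and the mode split are made up):

using System;
using System.Collections.Concurrent;

// A sketch of the feature-flag idea: the administrator marks an internal
// service disabled or read-only, and the proxy rejects matching requests
// up front. Purely illustrative.
enum ServiceMode { Normal, ReadOnly, Disabled }

class FeatureFlags
{
    private readonly ConcurrentDictionary<string, ServiceMode> modes =
        new ConcurrentDictionary<string, ServiceMode>();

    public void Set(string service, ServiceMode mode)
    {
        modes[service] = mode;
    }

    public void CheckAllowed(string service, bool isWrite)
    {
        ServiceMode mode;
        if (!modes.TryGetValue(service, out mode)) return;   // default: Normal
        if (mode == ServiceMode.Disabled)
            throw new InvalidOperationException(service + " is disabled for maintenance");
        if (mode == ServiceMode.ReadOnly && isWrite)
            throw new InvalidOperationException(service + " is currently read-only");
    }
}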

And this proxy is what I’ve been working on now.

One serialization format to bind them all

I place my bets on a strongly structured binary protocol that can be read without additional copy operations. (Just like I’m now switching from dynamically typed Python to statically typed C#. Static typing is very useful for eliminating many accidental mistakes when typing identifier names etc.) One such strongly typed serialization format is FlatBuffers (made by Google). It is much like Protocol Buffers (also made by Google): the message format is defined in a schema file, and a code generator is then run to produce strongly typed bindings for manipulating the messages. The format supports protocol evolution, meaning that new fields can be added while messages stay readable for both older and newer clients. Not a very important aspect in a system of this size, especially when all the parts are controlled by a single party, but it’s kinda nice to have.
table RequestSum { a:int; b:int; }
table ReplySum { sum:int; }
Listing 1. Example message definitions.

As mentioned, the cool thing with FlatBuffers is that they are extremely fast to read, only a few times slower than accessing raw structs (due to the vtable-based offset system required to support the compatibility). No additional processing is required to access the data, and building messages is equally straightforward, requiring no allocations beyond the single builder allocation, which can be pooled.
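To give an idea of what this looks like from C#, building and reading the RequestSum message from Listing 1 goes roughly like this, using the bindings flatc generates from the schema:

using FlatBuffers;   // the C# runtime library (Google.FlatBuffers in newer versions)

// Build a RequestSum and read it back. There is no parsing step: the accessors
// read straight out of the buffer through the vtable offsets.
static class FlatBuffersExample
{
    public static byte[] Build(int a, int b)
    {
        var builder = new FlatBufferBuilder(64);
        var request = RequestSum.CreateRequestSum(builder, a, b);
        builder.Finish(request.Value);
        return builder.SizedByteArray();
    }

    public static int ReadA(byte[] data)
    {
        var request = RequestSum.GetRootAsRequestSum(new ByteBuffer(data));
        return request.A;
    }
}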

And one extra complexity layer to unite them all

Like many other serialization formats, FlatBuffers doesn’t know anything about the concept of RPC. Because of this, I made my own layer on top. A service file defines named remote methods that take a specific message type as an argument and return another type. The message types themselves are defined in the FlatBuffers schema file.

service SomeApi {
  sum: RequestSum -> ReplySum;
  sub: RequestSub -> void;
  pub: RequestPub -> void;
}
Listing 2. Example service.

After a service has been defined, a few code-generation tools are run to generate the template code for the server and to define an interface the clients can invoke to call the service. This code generation step also creates the identifiers that tie the message names to actual protocol-level identifiers. These identifiers are generated for events and errors too, not just for the requests and responses shown in the example; events and errors don’t require extra definitions in the service file (at least not for the moment, though it might be nice to define them explicitly, too).
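To give an idea of the shape of the output - this is not the actual generated code, just an illustration of the idea, with arbitrary id values - the service in Listing 2 results in roughly the following, where RequestSum, ReplySum, RequestSub and RequestPub are the FlatBuffers-generated classes:

using System.Threading.Tasks;
using FlatBuffers;

// Roughly the shape of what the code generation produces for Listing 2.
public interface ISomeApi
{
    Task<ReplySum> Sum(RequestSum request);
    Task Sub(RequestSub request);
    Task Pub(RequestPub request);
}

// Protocol-level identifiers tying the method names to wire-level ids.
public static class SomeApiIds
{
    public const ushort Sum = 1;
    public const ushort Sub = 2;
    public const ushort Pub = 3;
}

// A hand-written implementation that the generated server template dispatches into.
public class SomeApiImpl : ISomeApi
{
    public Task<ReplySum> Sum(RequestSum request)
    {
        var builder = new FlatBufferBuilder(32);
        builder.Finish(ReplySum.CreateReplySum(builder, request.A + request.B).Value);
        return Task.FromResult(ReplySum.GetRootAsReplySum(builder.DataBuffer));
    }

    public Task Sub(RequestSub request) { return Task.CompletedTask; }
    public Task Pub(RequestPub request) { return Task.CompletedTask; }
}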

And it works

It was quite an effort, but preliminary testing gives very nice performance numbers. C# server, running on .NET Core RC2. TCP, with a minimal framing protocol for the messages. Pipelining. A single Intel Core i7-4790K server core can handle about 400 000 request/response pairs per second. I’ve yet to test this using multiple clients, but I have high hopes. Those hopes might get shattered, though, as only one request is executed at a time (per connection). This is of course not a problem if all the operations happen in memory, but throw in even 1 ms of IO latency, and the per-connection request rate drops to 1000/s…
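The framing really can be minimal; one possible variant (a sketch, not necessarily the exact format used here) is a plain 32-bit length prefix per message, read and written roughly like this:

using System;
using System.IO;
using System.Threading.Tasks;

// A sketch of one possible minimal framing: every message gets a 32-bit
// length prefix (host byte order via BitConverter).
static class Framing
{
    public static async Task<byte[]> ReadMessage(Stream stream)
    {
        var header = await ReadExactly(stream, 4);
        var length = BitConverter.ToInt32(header, 0);
        return await ReadExactly(stream, length);
    }

    public static Task WriteMessage(Stream stream, byte[] payload)
    {
        var frame = new byte[4 + payload.Length];
        Buffer.BlockCopy(BitConverter.GetBytes(payload.Length), 0, frame, 0, 4);
        Buffer.BlockCopy(payload, 0, frame, 4, payload.Length);
        return stream.WriteAsync(frame, 0, frame.Length);
    }

    private static async Task<byte[]> ReadExactly(Stream stream, int count)
    {
        var buffer = new byte[count];
        var offset = 0;
        while (offset < count)
        {
            var read = await stream.ReadAsync(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }
}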

The plan for the future is to - obviously - solve that little one-request-at-a-time problem, then clean up the code generation, tidy up the rest of the code, improve the tester, and then maybe finally get to writing some business logic.