Following
the theme set by the previous post, I have continued my pursuit of improved
infrastructure. Not that it is really anything special yet. Just more services
with almost default config. But the idea here is that these services will form
some kind of stable core for many other services to follow, and hopefully
evolve over time to become even more dependable. At this point they just need to
“be out there” and be usable, so that I can test new ideas.
One of
these ideas is about finally upgrading how I collect timeseries data.
Over the years I’ve had several tiny data collection projects, each
implementing the storage of the data in a different way. I’ve already reinvented
the wheel so many times, and it is about time to stop it. Or at least try to
do it a bit less :p Also, in the previous stage I installed MongoDB, so for
this stage I thought it was about time to also install a relational
database, and PostgreSQL has been my absolute favourite on this front for a
while now.
Meanwhile,
after doing a tiny bit of research on storing timeseries data, I found
TimescaleDB. And what a coincidence, it is a PostgreSQL extension! I think that
we’ll be BFFs! That is, once it supports PostgreSQL 12... I wanted to install
the latest version of Postgres so that I get to enjoy whatever new features it
has. But mostly so that I’d avoid a version upgrade from 11 to 12, which I’d be
facing had I chosen to install the older version. Not that it would probably have
been a big problem. Anyway, the data can easily be stored in a format TimescaleDB
would expect, and it shouldn’t balloon to sizes that absolutely require the
acceleration structures provided by TimescaleDB before the extension is updated for 12.
Rather, this smaller dataset should be perfectly usable with just a
plain PostgreSQL server. Upgrading should then be just a matter of installing the
extension and running a few commands. Avoiding one upgrade to perform another,
oh well…
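If I’ve understood the documentation right, that future upgrade would look roughly like this. Just a sketch: the table and column names are placeholders for whatever my metrics schema ends up being, and migrate_data is there because the table would already contain rows.

-- Once TimescaleDB supports PostgreSQL 12, roughly:
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- Turn the existing plain table into a hypertable without losing its rows
SELECT create_hypertable('metric_values', 'time', migrate_data => true);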
In the far past
I’ve used both custom binary files and text files containing lines of JSON to
store timeseries data like hardware or room temperatures. More recently I’ve
used SQLite databases to keep track of stored energy and items on a modded
Minecraft server (Draconic Evolution Energy Core + AE2, an OpenComputers lua
script, and a Dockerized TCP host, since there was not enough RAM in the OC computer
to serialize a full JSON in memory). I should try to add some pictures if I happen to find them…
In the past
I’ve visualized the data with either generated Excel sheets or generated
JavaScript files using whatever visualization library I happened to find. Not very nice.
* * *
But let’s
get to the point, as there was a reason I wanted to improve data collection this
time. I finally got around to checking out the Destiny 2 API in more depth,
and built a proof-of-concept account tracker.
For those
poor souls who don’t know what Destiny 2 is, it is a relatively multifaceted MMOFPS,
and I’ve been playing it since I got it from a Humble Monthly (quite a lot :3).
As with any MMO, there is a lot to do, with an almost endless amount of both short-
and long-term goals. It made sense to build a tracker of some sort so that I
could feel even more pride and accomplishment for completing them. And for some
specific goals, to maybe even see which strategy works best, and which doesn’t.
The API provides near-realtime statistics on a great many things, and it would be
nice to be able to visualize everything in real time, too.
To
accomplish this I needed to do multiple things: authenticate, get the data, store it,
and lastly visualize it.
Authentication in the API is via OAuth, so I
needed to register my application on Bungie’s API console and set up a
redirection URL for my app. After this I could generate a login link pointing
to the authorize endpoint of Bungie’s API. That endpoint redirects back to my
application with a code in the query string. This code can then be posted,
form-url-encoded, to Bungie’s token endpoint, which requires basic
authentication with the app’s client id and secret. After all this, the
reply contains an access token (valid for one hour) and a URL to call in order
to refresh the token (the refresh token is valid for a few months, but reset on
larger patches). The access token can then be used to call the API for that
specific account. This would probably be a great opportunity to opensource some of the code…
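In C# the token exchange boils down to something like the following. This is just a rough sketch from memory: the endpoint URL and form field names should be double-checked against Bungie’s documentation, and error handling beyond the status check is left out.

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

// Exchange the authorization code from the redirect for an access token.
// Endpoint URL and form fields are my best recollection of Bungie's docs.
async Task<string> ExchangeCodeAsync(HttpClient http, string code, string clientId, string clientSecret)
{
    var basic = Convert.ToBase64String(Encoding.UTF8.GetBytes($"{clientId}:{clientSecret}"));
    using var request = new HttpRequestMessage(HttpMethod.Post,
        "https://www.bungie.net/platform/app/oauth/token/");
    request.Headers.Authorization = new AuthenticationHeaderValue("Basic", basic);
    request.Content = new FormUrlEncodedContent(new Dictionary<string, string>
    {
        ["grant_type"] = "authorization_code",
        ["code"] = code,
    });
    var response = await http.SendAsync(request);
    response.EnsureSuccessStatusCode();
    // The JSON reply contains access_token, expires_in, refresh_token and so on.
    return await response.Content.ReadAsStringAsync();
}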
Speaking
of which, there already exist some open-source libraries for using the API! I
haven’t looked into them yet, as I was most unsure about how the authentication
would work. I guess I should take a look now.
The process
of figuring out how the authentication works involved quite a bit of stumbling
in the dark. The documentation wasn’t that clear at every step, although at least
it did exist. On the other hand, I’d never really used OAuth before, so there
was quite a bit of learning to do.
This also presented
one nice opportunity to put all this infrastructure I’m building to good use! As
part of the OAuth flow there is the concept of the application’s redirection URL,
but in the case of a script there really isn’t any kind of permanent address for it.
So what do? I haven’t implemented it yet, but I think that a nice solution for
this would be to create a single serverless endpoint for passing the code
forward. While I haven’t talked about it yet, I’m planning on using NATS (a
pub-sub broker with optional durability) for routing and balancing many kinds of
internal traffic. In this case the app could listen to a topic like /reply/well-known/oauth-randomstatehere.
When the remote OAuth implementation redirects back to the serverless endpoint,
the endpoint publishes the code to that topic, and the app receives it. All this without
the app needing to have a dedicated endpoint! It seems that someone really
thought things through when designing OAuth. And as a bonus the code is
short-lived and must only be used once, so it can be safely logged as part of
traffic analysis.
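A rough sketch of what I have in mind, using the classic NATS C# client. None of this exists yet, so the connection URL, subject layout and timeout are made up; and since NATS subjects are dot-separated, the topic name looks slightly different here.

using System;
using System.Text;
using NATS.Client;

// App side: pick a random state, subscribe before opening the login link,
// and use the state as part of the subject so replies can't get mixed up.
var state = Guid.NewGuid().ToString("N");
using var conn = new ConnectionFactory().CreateConnection("nats://nats.internal:4222");
using var sub = conn.SubscribeSync($"reply.well-known.oauth.{state}");
// ... open the login link (which carries the same state value) here ...
var msg = sub.NextMessage(120_000); // wait up to two minutes for the redirect to land
var code = Encoding.UTF8.GetString(msg.Data);

// Serverless endpoint side: when the OAuth redirect arrives with ?code=...&state=...,
// it just publishes the code to the matching subject and tells the browser to close the tab.
// conn.Publish($"reply.well-known.oauth.{state}", Encoding.UTF8.GetBytes(code));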
Reading
game data is just a
matter of sending some API requests with the access token from earlier and
then parsing the results. At the moment I am only utilizing a fraction of what
the API has to offer, so I can’t really tell much. Right now this means
the profile components API with components 104, 202 and 900. This returns the status
of account-wide quests and “combat record” counters, which can be used to track
weapon catalyst progression. I’m reducing this data to key-value pairs: each
objective has an int64 key called “objectiveHash” and another int64 as the
value, and the same goes for the combat record data. At the moment I’m using a LinqPad script that I start when I start playing, but in the future I’d like to turn this into a microservice. That service could ideally poll some API endpoint to see whether I’m online in the game, and only then call the more expensive API methods. Not that it would probably be a problem, but I’d like to be nice.
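The fetching and reducing part of that script is roughly this shape. Just a sketch: the endpoint path and header names are how I remember them from the API docs, membershipType and membershipId are placeholders, and instead of showing the exact response structure I simply walk the whole JSON reply and pick up anything that looks like an objective (the “progress” field name for the value is an assumption on my part).

using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;

// Fetch profile components 104, 202 and 900 for one account.
// The apiKey, membershipType and membershipId values come from elsewhere.
async Task<JsonDocument> GetProfileAsync(HttpClient http, string apiKey, string accessToken,
    int membershipType, long membershipId)
{
    var url = $"https://www.bungie.net/Platform/Destiny2/{membershipType}/Profile/{membershipId}/?components=104,202,900";
    using var request = new HttpRequestMessage(HttpMethod.Get, url);
    request.Headers.Add("X-API-Key", apiKey);
    request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
    var response = await http.SendAsync(request);
    response.EnsureSuccessStatusCode();
    return JsonDocument.Parse(await response.Content.ReadAsStringAsync());
}

// Reduce the reply to (hash, value) pairs by walking the whole JSON tree and
// collecting every object that has both an "objectiveHash" and a "progress" field.
IEnumerable<(long Hash, long Value)> CollectObjectives(JsonElement element)
{
    if (element.ValueKind == JsonValueKind.Object)
    {
        if (element.TryGetProperty("objectiveHash", out var hash) &&
            element.TryGetProperty("progress", out var progress))
            yield return (hash.GetInt64(), progress.GetInt64());

        foreach (var property in element.EnumerateObject())
            foreach (var pair in CollectObjectives(property.Value))
                yield return pair;
    }
    else if (element.ValueKind == JsonValueKind.Array)
    {
        foreach (var item in element.EnumerateArray())
            foreach (var pair in CollectObjectives(item))
                yield return pair;
    }
}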
Data is
saved to the
PostgreSQL database. I wrote a small shared library abstracting the metrics
database queries (and another for general database stuff), so now writing the
values is very simple. This shared library could be used for writing other
data, too, like the temperatures and energy amounts I mentioned above. I should
probably add better error handling, so that a lost connection could be
retried automatically without any interaction from the code using the library. But
anyway, here is how it is used:
var worker = new PsqlWorker(dbConfig); // Lib1: the general database helpers
var client = new MetricsGenericClient(worker); // Lib2: the metrics abstraction
var last_progress = /* client.Get... the last stored value */;
// ...
var id = await client.GetOrCreateMetricCachedAsync("destiny2.test." + objectiveHash); // Result is cached in-memory after the first call
if (progress != last_progress) // Compress data by dropping not-changed values
    await client.SaveMetricAsync(id, progress);
last_progress = progress;
Visualizing
the data was next. I have been jealously eyeing Grafana dashboards for a long time,
but never had the time to set something up. There was one instance a few years
ago with Tracker3 where I stumbled around a bit with Netdata and Prometheus, but
that didn’t really stick. Now I did some quick research on Grafana, and
everything became clear.
Grafana is
just a tool for visualizing data stored elsewhere. It supports multiple data sources
for that, and they each have slightly different use cases. I’m still not
exactly sure what kind of aggregation optimizations are possible when viewing
larger datasets at once, but I kinda just accepted that it doesn’t matter,
especially since most of the time I’d be viewing the most recent data. What I
also had to accept was that Grafana doesn’t automagically create the pretty
dashboards for me and that I’d have to put in some effort there. But not too much.
Adding a graph is just a matter of writing a relatively simple SQL query and
slapping the time macro into the SELECT clause. And then the graph just
appears. For visualizing the number of total kills with a weapon, this was as
complicated as it would get. For counters displaying the current value it likewise
was just a matter of writing the SQL query with ORDER BY time DESC LIMIT 1.
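For reference, the queries are about this simple. The table and column names are placeholders here, since the shared library owns the actual schema, but $__time and $__timeFilter are the time macros Grafana provides for its PostgreSQL data source.

-- Graph panel: total kills for one metric over the selected time range
SELECT $__time("time"), value AS "Total kills"
FROM metric_values
WHERE metric_id = 42 AND $__timeFilter("time")
ORDER BY "time";

-- Counter panel: just the most recent value
SELECT value FROM metric_values WHERE metric_id = 42 ORDER BY "time" DESC LIMIT 1;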
And while I
was at it, I added a metric for the duration of the API calls as well. I also remembered that Grafana supports annotations, which can likewise be saved to Postgres. And the
dashboard started to really look like something! Here there’s one graph for “favourite” things, and then one which just visualizes everything that is changing.
And why
stop there? I also installed Telegraf for collecting system metrics such as CPU
and RAM utilization and ping times. For this data I went with the simplest approach
of installing InfluxDB, as there were some ready-made dashboards for
that combination. More services, more numbers, more believable stack :S
* * *
That’s it.
No fancy conclusions. See you next time. I’ve been using this system for only a
week or two now; maybe in the future I’ll have some kind of deeper analysis to
give. Maybe. And maybe I’ll get to refine the account tracker a bit more, so that I
could consider maybe (again, maybe) opensourcing it.
PS. These posts are probably not very helpful if you are trying to set up something like this yourself. Well, there’s a reason: these are blog posts, not tutorials. I don’t want to claim to know so much that I’d dare to write a tutorial. Although... some tutorials out there are
very bad; I’m sure I could do better than those.