At least alerting works...

I've also done some work on setting up infrastructure monitoring. There's still work to do, and things to do better, but at least I have one alert. And it works.

I was probably just planning something more leisurely when Grafana sent me a Telegram message saying something was wrong with NATS response times. I open the link and see that there is no data, meaning the instance is probably down. But there's no other data either. Fuck. Is everything going up in flames this soon?

But wait a sec. There is data, momentarily. Then it disappears again. I bring up the InfluxDB logs and see an error: "panic: keys must be added in sorted order". I spend quite a while trying to figure out what exactly is wrong and how to proceed, and almost give up. It seems that a lot of the tooling for fixing and managing the files has been removed or made internal-only. But then I find an up-to-date guide for rebuilding the index and decide to try it.

Because my installation is dockerized, and there seem to be some issues with the rebuild command, I had to chown the data directory to "some user", run the repair command, and then chown the files back. And yay! It works again. For reference, this is the docker command I used:

docker run --rm --user 1000 -v /path/to/influxdb-data/:/data influxdb:1.7.9 influx_inspect buildtsi -v -datadir /data/data -waldir /data/wal
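If I ever have to do this again, the chown, rebuild, chown-back dance could be wrapped in a small script. This is only a sketch under a few assumptions that aren't from what I actually ran: the InfluxDB container is already stopped, the host is Linux with GNU stat, the data path is absolute (docker needs that for bind mounts), and "some user" is UID 1000, matching the --user 1000 in the command above. The function names rebuild_tsi and run are made up for illustration, and DRY_RUN=1 only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the InfluxDB 1.x TSI index for a dockerized install.
# Assumptions: influxd is stopped, the directory holds data/ and wal/,
# and the rebuild runs as UID 1000 (hypothetical default).
set -euo pipefail

# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

rebuild_tsi() {
  local dir="$1"          # absolute path to the InfluxDB data directory
  local uid="${2:-1000}"  # UID the container runs as
  local owner status=0

  # Remember the current owner (GNU stat) so it can be restored afterwards.
  owner="$(stat -c '%u:%g' "$dir")"

  # Hand the files to the UID the container will run as.
  run sudo chown -R "$uid" "$dir"

  # Rebuild the index with the same image and flags as the command above.
  # A failure is recorded instead of aborting, so ownership is always restored.
  run docker run --rm --user "$uid" -v "$dir":/data influxdb:1.7.9 \
    influx_inspect buildtsi -v -datadir /data/data -waldir /data/wal || status=$?

  # Give the files back to their original owner.
  run sudo chown -R "$owner" "$dir"
  return "$status"
}
```

Running DRY_RUN=1 rebuild_tsi /path/to/influxdb-data prints the three commands so you can eyeball them first; dropping DRY_RUN actually runs them. Restoring the owner even when the rebuild fails is deliberate, so a failed attempt doesn't leave InfluxDB unable to read its own files.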

At least it works again. But just as I thought everything was going nicely... Maybe the problem is the server itself? It used to be my desktop, but I moved away from it because of constant crashes in GTA V, and much rarer crashes the rest of the time. Maybe I have to invest in some proper hardware :o We'll see. Maybe it'll work again without issue for a long time. Pls :s
