Network Reliability Engineering Community

MP1.3 Antidote-stats

@mierdin, I noticed that you’ve started working on the MP1.1 db package, so I am going to start with MP1.3: antidote-stats. Let me know if you have any thoughts or comments.

That sounds good to me. I probably need to elaborate on what the MP1 “definition of done” means, and I’ll add that to the doc so it’s not just in my head. In the meantime, the earlier you open a draft PR, even if it doesn’t do anything yet, the better.

Also, note that antidote-stats has a pretty hard dependency on MP1.1. That’s actually true for a lot of MP1, which is why I put it first and got started on it so quickly, so that it wouldn’t hold everything up for too long.

Feel free to continue, since there are a few other things to figure out that don’t require the db work to be in place yet, like the overall layout of the service and, especially, rethinking the metrics we’re exporting. When it comes time to retrieve data from the database, though, you’ll have to build some mock functions for the time being.

Yeah, it has a dependency on MP1.1, but I’ll mock the data for now.

@mierdin, I’m thinking that in TSDBExport() we could periodically store lessonId, lessonName, syringeTier, error, healthyTests, totalTests, lessonStage, and createdTime for each liveLesson in the LiveLessonState. I think that would give us more query options, and we could also compute average duration and active users from it. By the way, what shortcomings did you run into?

You’re very much on the right track here. When that was first written, it was built as a quick PoC to export what I knew I wanted to look for, meaning I did a lot of the consolidation server-side. I think if we export much more raw data and let the InfluxDB querier (i.e. Grafana) do the heavy lifting of aggregating it, we’ll be in a good place.

In general, we want to know which lessons are in use at what times of the day, and to be able to easily produce aggregate reports at the end of each week/month/year.

I would also be curious whether you can think of ways to extend this to other parts of Antidote. Maybe not so much observability/tracing instrumentation, since we already have a plan for that, but I’m sure there are other interesting things we could export that I didn’t consider initially. Like I said, when I originally built the TSDB export I was laser-focused on one thing, so the whole idea could use some TLC.

Thanks for thinking about this stuff!

I see. OK, sounds good. Creating aggregate weekly/monthly/yearly reports can be handled in a separate PR. This PR takes care of changing the data format in TSDBExport and refactoring the InfluxDB functions into their own program. Would you be able to review it, please?

Sorry, I wasn’t clear. We shouldn’t do aggregation in Antidote at all, IMO. Let’s just get the data into InfluxDB in as raw a state as makes sense, and let the other systems that query InfluxDB decide how they want to represent it.