Network Reliability Engineering Community

Cleanup Tasks Post-v0.4.0

I wanted to give a quick update about the state of things now that the Antidote v0.4.0 release and the NRE Labs v1.0.0 release are out the door.

Document New Release Processes and Curriculum Versioning

The release process for both the curriculum and the platform has changed a bit. If you’ve paid attention to the code and issues in antidote-ops, you probably saw a good amount of what we did, but I know this needs to be transitioned to formal documentation. So that’s a priority for me.

PTR Doesn’t Currently Exist

PTR (our test site) doesn’t currently exist. The main reason for this is that the old PTR, located within GCP, was separated from prod only by Kubernetes namespace - it ran on the same cluster. So, now that we’re off GCP for the main cluster, we need to build a new PTR.

Running on the same cluster had some big drawbacks - namely that when we wanted to make changes to the nginx ingress controller, we couldn’t do this without affecting both prod and PTR. So instead, we want to create totally separate infrastructure for PTR. This will take some additional time to do, but it’s a priority.

I also want to do better at documenting exactly what PTR is, what it’s used for, and how things get there. In particular I want to correct some bad behavior on my part, which was to use PTR as a way of testing changes to the platform, or to review someone’s contribution. That’s not why it was created, and by doing that I made it an unreliable way of understanding the “current state” of things, because who knows what I had PTR pointing to.

I’m going to put together a plan for implementing a new PTR on our new BMaaS host, and once I have my ducks in a row, I’ll likely do an ad-hoc livestream so you can get more visibility into how we’re running things.

Nightly Builds for EVERYTHING

Now that production is protected from image changes, because all images for both the platform and the curriculum are using static versioned tags, we can actually do nightly builds (GASP). Once PTR is up, I’m hoping to augment our StackStorm rules to not only automatically build the platform and the curriculum nightly, but auto-deploy them to PTR. We almost got there in GCP but not quite.
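To make the "static versioned tags" point a bit more concrete, here is a minimal sketch of a check that flags image references which are not pinned. It is not taken from antidote-ops; the image names and the set of "floating" tags are assumptions for illustration only.

```python
from typing import List, Optional

# Assumption: tags treated as "floating", i.e. they can point at a
# different image tomorrow than they do today.
FLOATING_TAGS = {"latest", "nightly", "master"}


def image_tag(ref: str) -> Optional[str]:
    """Return the tag portion of an image reference, or None if it has none."""
    name = ref.split("@", 1)[0]        # drop any @sha256:... digest
    last = name.rsplit("/", 1)[-1]     # ignore a registry host:port prefix
    if ":" not in last:
        return None
    return last.rsplit(":", 1)[1]


def is_pinned(ref: str) -> bool:
    """True if the reference uses a digest or an explicit, non-floating tag."""
    if "@" in ref:
        return True
    tag = image_tag(ref)
    return tag is not None and tag not in FLOATING_TAGS


def unpinned(refs: List[str]) -> List[str]:
    """Return the references that a nightly rebuild could silently change."""
    return [ref for ref in refs if not is_pinned(ref)]


if __name__ == "__main__":
    # Hypothetical image names, not the real curriculum images.
    print(unpinned(["example/utility:v1.0.0", "example/web:latest", "example/db"]))
```

If production only ever references images that pass a check like this, a nightly rebuild of the floating tags can't change what production runs.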

Coupled with the changes and clarifications I mentioned above re: PTR, this should make PTR a much better representation of the current state of the latest version of everything.

Just like the last section, I’m going to do an ad-hoc livestream for this too.

Better “Next Release” Planning and Communication

Once the above technical issues are finished, there’s a very important task that needs to be done, and that is much better communication about future work. My goal is that anyone, at a glance, can see, and provide feedback on, the short-term and long-term roadmaps for any part of this project.

Before the next release, I’ll be putting in a lot more effort towards this goal.

Alright - I’ve managed to spend some time on the items I listed above, and I’d like to share updates on each, as I feel like each of them is in a much better place now.

Document New Release Processes and Curriculum Versioning

The full process for releasing both the curriculum and the Antidote platform can be found here: https://antidoteproject.readthedocs.io/en/latest/releases/index.html

PTR Doesn’t Currently Exist

We were dealing with a bit of analysis paralysis, trying to figure out if it was worth spinning up a separate cluster for PTR. While we may still do this in the future for various reasons, for now it just makes sense to spin it up the way it was before in GCP, which is in its own namespace in the same cluster.

Please see the curriculum release doc section on the PTR for more information on how the PTR will be run going forward. It will no longer be used for anything except testing the latest curriculum changes. Platform testing will be done elsewhere.

Nightly Builds for EVERYTHING

As part of the initiative to stand PTR back up, I also created a workflow for deploying the latest curriculum to PTR every night at midnight (Pacific). This not only redeploys the lesson and collection resources in master, but also builds, from scratch, each image in images, and pushes them to Docker Hub automatically. This was the last step to verify that everything is truly represented properly in PTR.
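For anyone curious what the image half of that nightly job boils down to, here is a rough sketch that only generates the build and push commands for each image, without running them. This is not the actual workflow; the organization name, tag, and image names are hypothetical, and the only things assumed about Docker are the standard `docker build --no-cache -t` and `docker push` commands.

```python
from typing import List


def nightly_image_commands(image_names: List[str], org: str, tag: str) -> List[List[str]]:
    """Build the command list for a from-scratch rebuild and push of each image.

    image_names: directory names under images/ (hypothetical values here).
    org, tag:    Docker Hub organization and tag to publish under (assumptions).
    """
    commands: List[List[str]] = []
    for name in sorted(image_names):
        ref = f"{org}/{name}:{tag}"
        # --no-cache forces every layer to be rebuilt, i.e. "from scratch".
        commands.append(["docker", "build", "--no-cache", "-t", ref, f"images/{name}"])
        commands.append(["docker", "push", ref])
    return commands


if __name__ == "__main__":
    for cmd in nightly_image_commands(["utility", "web"], org="example", tag="nightly"):
        print(" ".join(cmd))
```

Keeping command generation separate from execution makes it easy to dry-run the list before handing it to whatever actually schedules the job.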

Better “Next Release” Planning and Communication

As part of the release process, you can expect to see community forum posts for every stage of the release process, as well as time allocated in the weekly standup to keep things moving forward. If you want to know “the latest” at any point, please see this page.
