Network Reliability Engineering Community

Highlighting Some Opportunities for Improvement

As technical lead on the project I feel it’s my responsibility to call out where I think we have gaps. NRE Labs is a reasonably large and diverse project, and we’re all busy, so don’t take this post as assignment of blame, or an indication that we have huge systemic problems, etc. It’s probably my fault for not having proactively called attention to these things earlier. On the bright side, I actually think what’s needed isn’t a huge amount of work, but rather focused attention on a few key priorities. My goal here is simply to highlight things that could use some work, and in doing so, provide opportunities for others to step up to leadership roles and make the project better for it.

My concerns boil down to three main things:

  • Selfmedicate is a bit of a mess right now
  • NRE Labs Curriculum Development has some unintended barriers
  • GitHub Issues and Pull Requests are bottlenecked

Selfmedicate

I have been traveling for a few weeks, so my ability to chime into discussions has been limited. However, I have been reading the myriad of issues, pull requests, and forum posts on the subject of selfmedicate. To be clear, I am encouraged by the amount of activity, and what I’m about to say is in no way meant to discourage that.

However, what’s become clear to me is that selfmedicate is in rough shape right now. In its current state, it may work for a few of us, but I keep hearing stories about users who just had way too many problems getting started. I’m not going to drill into details here, but I’m sure it’s a combination of OS support, hypervisor incompatibilities, the OS used within the selfmedicate VM, etc. Bottom line is that the one tool that we point everyone to (selfmedicate) isn’t doing its job. If selfmedicate can’t be viewed as a stable, reliable way to get Antidote/NRE Labs started on a laptop, no one will build lessons, plain and simple.

My strong suggestion is that someone (not me) needs to step up and own the entire experience. They need to treat stability as the number one consideration. They need to take the myriad of issues and PRs open in that repository, and coordinate them into a common plan so that we don’t just have a bunch of loose ideas, but rather a cohesive plan for how selfmedicate will work in the near-term. Our recently ratified governance model provides positions for “committers”, and selfmedicate is its own project that requires a leadership position like this to constantly ensure a positive experience for as many users as possible.

In my opinion, this person would use the following as guidelines:

  • Stability is number one priority, speed is number two - Again, we have to ensure that folks can at the very least work on content. That’s gotta be rule number one. Second to that rule is probably making sure the experience is “good”, which probably roughly translates to a few things, one of which is speed. I love Brian’s post here on using KVM - if that works for a large portion of our user base and the advantages are clear, I’m all for it. The key here is that someone needs to lead this effort - this can’t be a bunch of loose ideas, it has to be a cohesive plan that addresses all the angles for selfmedicate.

  • Being opinionated enough about tools, but no more - this post is a good example of a discussion where using a certain tool might have certain advantages over another. Again, this is totally fine, and it’s important to stay on top of the latest developments for all of the tools we use, and re-evaluate new ones. But if we choose to switch, it should be because we took the time to evaluate things and decided that it’s the best choice, and then we need to stick with that. We can’t keep ripping the carpet out every month and expect our overarching stability goals to be reached.

  • Hold regular meetings with users to ensure their experience is solid - The selfmedicate maintainer should, at least monthly, check whether users are having problems with selfmedicate. If there are issues, they’re on the hook for troubleshooting and opening PRs and issues in other Antidote repositories. This can be accomplished through a combination of asking key users how their experience has been lately, as well as working with selfmedicate directly to ensure things are working as expected.

In short, we need to think about what promotes the best experience for the widest audience, and be opinionated enough about what tools will get us there, and stick to it.

@cloudtoad and I will be combing through the existing PRs and Issues in selfmedicate and providing input where we can in order to get back to normal, but with the Antidote v1.0 rebuild coming up soon, I will not be able to pay much attention to this long-term, so there’s an opportunity for someone who can take this on to step up here.

NRE Labs Curriculum

I realize not everyone in the project is interested in the NRE Labs curriculum - if this describes you, feel free to skip this section.

The pace of curriculum contributions has slowed to a near-halt. It doesn’t seem to me that there’s a lack of interest, as there are a myriad of issues in the curriculum repo asking for content, or even volunteering to create it. Rather, I think there are barriers, and we need to address them. I’ll speculate on a few here:

  • No one has stepped up to take charge of the curriculum release process documented here. I wrote that document, not for me to re-read to myself, but for others to take the helm and lead the curriculum through a release process. Now - I’ll take the blame for the actual stall here, as I assigned myself as release manager for v1.0.1, but the fact that no one said anything to call me out on this after a few months tells me we haven’t yet rallied around this process. Even minor releases here, if done with regularity, are very powerful for building momentum and getting others on board.

  • Selfmedicate instability is likely to be a contributing factor as well. I covered this in the previous section so I won’t belabor the point, but I have no doubt that there have been many folks who eagerly started building lesson content, only to have their experience sour once they tried to get into selfmedicate.

  • The NRE Labs docs are exhaustive, but undoubtedly confusing, as they’re intermingled with Antidote docs, and not structured in a way that can be easily followed. I suggest that we split off the NRE Labs docs into their own repo, with their own site (maybe even using GitBook instead of Read the Docs), and structure them so that they can be read top-to-bottom to get a sense of how curriculum content can be built. This is based on feedback we’ve received recently from someone we reached out to about building content.

  • We have a few lessons and images that are effectively half-finished (JET and OpenConfig lessons as examples). We need someone to step up to the role of NRE Labs curriculum committer/maintainer, and they should be vigilant about content that is contributed to the curriculum but isn’t yet ready for primetime - it should be deprecated, or preferably improved to the point where it can be pushed to prod. Sitting in purgatory is a discouraging sight to everyone.

In the short term, I will take on the task of cleaning up existing issues and pull requests in GitHub. Part of this will be closing old or no-longer-relevant issues. If there are issues that have been open for a long time but have no activity, we’ll likely close them but move their summary to an external location for archival purposes - we don’t want to lose track of lesson ideas but we also don’t want to use GitHub issues for storing every idea we’ve ever had - they should be used to coordinate short-to-medium term work on the curriculum, or bug reports. After the clean-up, I’ll post the archived summaries somewhere and let everyone know where they can find them. I’ll also add a section to the docs that specifies the new intention for GH issues in the curriculum, and where to find the archive of old ideas.
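To make the clean-up criteria a little more concrete, here is a rough sketch of how the "open for a long time but no activity" split could be expressed. Everything in it is an assumption made for illustration: the issue records are plain dictionaries with hypothetical `number`, `title`, and `updated_at` fields, the 180-day threshold is arbitrary, and the helper names `split_stale_issues` and `archive_line` are made up. It doesn’t talk to GitHub at all, and it isn’t a description of any tooling the project actually uses.

```python
from datetime import datetime, timedelta, timezone


def split_stale_issues(issues, now, max_idle_days=180):
    """Partition issues into (stale, active) by time since last activity.

    `issues` is a list of dicts with a timezone-aware `updated_at` datetime.
    The 180-day default is an arbitrary, illustrative threshold.
    """
    cutoff = now - timedelta(days=max_idle_days)
    stale = [issue for issue in issues if issue["updated_at"] < cutoff]
    active = [issue for issue in issues if issue["updated_at"] >= cutoff]
    return stale, active


def archive_line(issue):
    """Render a one-line summary suitable for an external archive of ideas."""
    return f"#{issue['number']}: {issue['title']}"


if __name__ == "__main__":
    # Hypothetical sample data, not real issues from the curriculum repo.
    now = datetime(2019, 6, 1, tzinfo=timezone.utc)
    sample = [
        {"number": 1, "title": "Lesson idea with no recent activity",
         "updated_at": datetime(2018, 6, 1, tzinfo=timezone.utc)},
        {"number": 2, "title": "Recently discussed bug report",
         "updated_at": datetime(2019, 5, 20, tzinfo=timezone.utc)},
    ]
    stale, active = split_stale_issues(sample, now)
    for issue in stale:
        print(archive_line(issue))
```

The stale list is what would be closed and summarized in the archive; the active list stays open for short-to-medium term coordination.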

Long-term however, NRE Labs needs its own project lead (ideally not me) that is willing to stay on top of contributions, and also willing to take the lead on the curriculum release process. This is a role that was established in the governance doc, so we just need someone to step up for this.

GitHub Issues and Pull Requests are bottlenecked

There are a lot of GitHub issues and pull requests that are open or unmerged. Undoubtedly, a big reason for this is that Derick and I have both had busy schedules lately, but there’s a bigger reason for this that I want to bring up.

We want to provide space for someone to step up to the plate and participate in the project in a leadership capacity. In almost every open source project, you don’t get to be a committer by some back-channel process, or by being part of a big company that participates. You get there by building confidence with the rest of the community that you understand and respect the values and goals of the project.

To that end, I want to be clear about something. You need no permission from Derick or me to perform pull request reviews or respond to GitHub issues. If you have the answer, or if you have opinions to share, or if you want to ask questions about a particular implementation, you are hereby empowered to do so. The only difference between a contributor and a committer is the ability to merge code into master - that’s it. If you demonstrate that you know what you’re doing and care about the values and goals of the project by participating in this way, Derick and I will be more than happy to share that responsibility with you.

Thank You!

Everything I’ve said above is written with the intention of continuing this project’s mission of being the community resource for skill development, and my hope is that these guidelines make the project more welcoming and easier to get involved with.

I want to express my heartfelt gratitude to everyone that’s been involved with the project thus far. No matter your contribution, it means a lot to me. The fact that what started as a side project is now holding weekly standups, has a ratified governance model, and has booths at conferences, is still a bit surreal, and I’m incredibly grateful to all that have decided to invest their time in this project.

– M

@mierdin Thank you for taking the time to put this together.

+1.

Thanks a lot, too.

I have been giving the Selfmedicate issue some thought. Given the issues that @obergix pointed out with version support on Kubernetes, we could provide a box image with a supported version preinstalled instead of installing the latest build. This would ensure that the box is fully tested before release. This does lengthen the release process though, and would require someone to maintain the box.

I would like to commit to supporting Selfmedicate. I am going to start work immediately on the problems that @obergix brought up in the issues. I am hoping to be able to run a copy of libvirt and VirtualBox.

Thanks @smk4664, this is excellent to hear. I will be adding you as a reviewer for my PRs there, and I’m encouraging others to do the same.