Show HN: Hands-On Cloud Troubleshooting Labs – AWS, GCP, Azure, and Kubernetes

9 points by lumaks 1 year ago | 7 comments
Hello HN,

I’m excited to share something we’ve been working on: Cloud Troubleshooting Labs. This tool provides on-demand environments for engineers to practice hands-on troubleshooting with AWS, GCP, Azure, and Kubernetes.

What it does: We use Pulumi to spin up these environments. Each scenario includes a diagram and an explanation of the available IT infrastructure. Engineers are tasked with fixing configuration issues to make the application work. Depending on the test type, the environment can range from a simple terminal to a full cloud environment or a web-based VS Code. Many of our scenarios generate a random set of problems each time the test is started. We originally designed this for companies, to prevent cheating. Now that the labs are open to individual engineers, you can redo the same test and face different problems each time.
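
A rough sketch of the "different problems each time" idea, not the actual implementation: it assumes each scenario has a pool of named faults and draws a random subset per run. The pool contents, the fault names, and the `pick_problems` helper are all hypothetical.

```python
import random

# Hypothetical fault pool for one scenario; the names are illustrative only.
PROBLEM_POOL = [
    "wrong_security_group_port",
    "missing_iam_permission",
    "bad_dns_record",
    "misconfigured_health_check",
    "full_disk",
    "stopped_service",
]


def pick_problems(pool, count, seed=None):
    """Return `count` distinct problems chosen at random from `pool`."""
    if count > len(pool):
        raise ValueError("count exceeds pool size")
    # A fixed seed reproduces a run; seed=None gives a fresh draw each time.
    rng = random.Random(seed)
    return rng.sample(pool, count)


if __name__ == "__main__":
    print(pick_problems(PROBLEM_POOL, 3))
```

Passing a seed makes a given run reproducible (useful when reviewing a candidate's attempt), while leaving it unset gives each attempt a new combination.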

Why we built this: Initially, I created these labs to assess the hands-on skills of engineers we were hiring for senior cloud engineer/SRE roles. Selling to companies was tough; the best conversations we had were with engineers. So, we decided to release personal plans where engineers can practice troubleshooting in realistic environments.

Technical details: Environments are provisioned using Pulumi. We use a microservice architecture, which came about historically because the first skills assessment was designed for Kubernetes. The user experience is inspired by Katacoda. Our last load test spun up 100 environments in parallel, which took around 10 minutes. For this launch, we’ve increased the number of replicas for core services to handle potential spikes in users. We expect some things to break if there is enough interest, but that will be a good sign for us. If you find any security bugs, please report them to info@brokee.io.
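
For illustration only, a load test that provisions many environments in parallel could be shaped roughly like this. The `provision_environment` function is a hypothetical stand-in that just sleeps; a real version would call the provisioning tool (for example, a Pulumi stack update), and the worker count is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def provision_environment(env_id):
    """Hypothetical stand-in for a real provisioning call."""
    time.sleep(0.01)  # simulate provisioning latency
    return f"env-{env_id}"


def provision_all(count, max_workers=20):
    """Provision `count` environments concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(provision_environment, range(count)))


if __name__ == "__main__":
    start = time.perf_counter()
    envs = provision_all(100)
    elapsed = time.perf_counter() - start
    print(f"{len(envs)} environments in {elapsed:.2f}s")
```

Threads fit here because provisioning is mostly waiting on remote APIs rather than using local CPU.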

Relatable experience: The main pain point I wanted to solve was working with engineers who, even after months of training, struggled with new problems. I wanted to see if people could jump into a broken environment and figure out what’s wrong. A fun story: one student couldn’t sign up as he didn’t get an invite to an early version of the website, so he hacked us a bit. I’ll leave the details out for now, but I’d want to hire him for his persistence!

Future plans: In the near future, we plan to add an option to select problem areas you’d like to practice, such as Linux networking or working with the filesystem. This will generate problems related to those areas, providing a more guided experience. However, we don’t want to be like other platforms where you are told what to do and just repeat the steps.

Engineers can sign up here and try 5 tests for free (except full cloud environments, which aren't available on the free tier): https://brokee.io/devopsskillstests. There is a similar offer for companies on the main page if someone is interested in trying the product with a team.

We’re eager for feedback, so please give it a try and share your thoughts!

  • diamonce 1 year ago
    I love the interview gamification aspect of this. So much fun actually. Great product!
    • matelang 1 year ago
      By far the most enjoyable interview experience I had. Hands-on and efficient. Kudos to Maksym and the brokee team!
      • lumaks 1 year ago
        Thank you, Mate! I had to look up my Slack history with one of our clients to find your test. I think you showed the best result we've seen on our AWS test :)
      • borgzl 1 year ago
        Cool. Do you allow some kind of certification too?
        • lumaks 1 year ago
          We are thinking of adding shareable badges, so you can share your progress on social platforms like LinkedIn. Is that what you had in mind?

          If you mean certifications like CKA or cloud certifications, we didn't want to tie our tests to those, since learning a technology and managing it in production are two rather different things. But if the main demand from the engineering community turns out to be for certifications rather than for getting jobs or being ready to deal with production incidents, we may reconsider.

          • egraes 1 year ago
            Yeah, badges would be cool.
            • lumaks 1 year ago
              Glad to hear you like the idea. It shouldn't be too much work to implement.