A curated list of Site Reliability and Production Engineering resources.
Updated Jan 10, 2023
Hands on labs and code to help you learn, measure, and build using architectural best practices.
Chaos Engineering Toolkit & Orchestration for Developers
A checklist for anyone practicing Site Reliability Engineering
A curated list of Site Reliability and Production Engineering Tools
This repository provides a design methodology and approach to building highly reliable applications on Microsoft Azure for mission-critical workloads.
Serverless chaos monkey for AWS (runs on AWS Lambda)
Reliability engineering toolkit for Python - https://reliability.readthedocs.io/en/latest/
Probabilistic Risk Analysis Tool (fault tree analysis, event tree analysis, etc.)
OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, Code Ready Containers, Podman, Buildah, and Kubernetes.
A curated list of awesome Site Reliability and Production Engineering resources.
GOV.UK PaaS - Cloud Foundry
The Chaos Toolkit core library
The k6 documentation website.
A collection of SRE tools
A Terraform provider for Concourse
An opinionated list of attributes and policies that need to be met in order to establish a stable software system.
GSP is a container platform and curated suite of components helping government deploy, run, observe and secure their services
A collection of templates ported from the SRE Workbook