-
Updated
Jun 26, 2021
#
site-reliability-engineering
Here are 57 public repositories matching this topic...
A curated list of Site Reliability and Production Engineering resources.
devops
availability
list
awesome
monitoring
reliability-engineering
incident-response
site-reliability-engineering
production
post-mortem
capacity-planning
service-level-agreement
scalability
reliability
alerting
on-call
awesome-list
sre
postmortem
site-reliability
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
devops
monitoring
best-practices
incident-response
site-reliability-engineering
post-mortem
reliability
alerting
on-call
dev-ops
sre
observability
incident-management
chaos-engineering
sre-team
sre-teams
sre-culture
sre-classroom
-
Updated
Jun 29, 2021 - JavaScript
A curated list of Chaos Engineering resources.
awesome
site-reliability-engineering
chaos
netflix-chaos-monkey
chaos-monkey
awesome-list
resilience
chaos-testing
chaos-engineering
simian-army
chaos-community
-
Updated
Jun 22, 2021
STRRL
commented
Jun 30, 2021
// Node defines a single step of a workflow.
type Node struct {
    Name     string        `json:"name"`
    Type     NodeType      `json:"type"`
    State    NodeState     `json:"state"`
    Serial   *NodeSerial   `json:"serial,omitempty"`
    Parallel *NodeParallel `json:"parallel,omitempty"`
    Template string        `json:"template"`
UID
rajdas98
commented
Apr 7, 2021
The litmusportal currently relies on a large number of environment variables, so we should use the envconfig package to validate and process them.
Package: https://github.com/kelseyhightower/envconfig
It needs to be added to:
- graphql-server
- event-tracker
- subscriber
- authentication-server
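The validation step requested above can be sketched as follows. This is a minimal illustration using only the Go standard library rather than envconfig itself, which expresses the same idea through struct tags. The `Config` struct, the `LoadConfig` helper, and the variable names `DB_SERVER` and `PORT` are hypothetical and not taken from the litmusportal code.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Config holds the settings a service reads at startup.
// The fields and the variable names behind them are hypothetical.
type Config struct {
	DBServer string // read from DB_SERVER, required
	Port     int    // read from PORT, optional, defaults to 8080
}

// LoadConfig reads every variable through lookup, validates each one,
// and reports all problems in a single error instead of failing on the first.
// Passing the lookup function in keeps the loader testable without
// touching the real process environment.
func LoadConfig(lookup func(string) (string, bool)) (Config, error) {
	cfg := Config{Port: 8080}
	var problems []string

	if v, ok := lookup("DB_SERVER"); ok && v != "" {
		cfg.DBServer = v
	} else {
		problems = append(problems, "DB_SERVER is required")
	}

	if v, ok := lookup("PORT"); ok {
		p, err := strconv.Atoi(v)
		if err != nil || p < 1 || p > 65535 {
			problems = append(problems, fmt.Sprintf("PORT must be an integer in 1..65535, got %q", v))
		} else {
			cfg.Port = p
		}
	}

	if len(problems) > 0 {
		return Config{}, errors.New(strings.Join(problems, "; "))
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid configuration:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Loading and validating everything once at startup means each of the listed services would fail fast with a complete list of misconfigured variables, rather than discovering them one at a time at runtime.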
dastergon
opened
Jun 11, 2018
meenal06
commented
Apr 8, 2021
What to Read to Learn More About DevOps
devops
cloud
monitoring
continuous-integration
continuous-delivery
stress
site-reliability-engineering
continuous-deployment
culture
systems
release
leader
lean
cloud-native
sre
failure
blame
systems-engineering
systems-administration
devops-journey
-
Updated
Mar 26, 2021
Knowledge seeks no man
linux
docker
kubernetes
aws
devops
cloud
azure
containers
site-reliability-engineering
gcp
gke
infrastructure-as-code
sre
amazon-web-services
python-tutorial
information-security
devsecops
-
Updated
Jun 2, 2021
A curated list of Site Reliability and Production Engineering Tools
devops
availability
list
awesome
monitoring
reliability-engineering
site-reliability-engineering
production
post-mortem
service-level-agreement
reliability
awesome-list
sre
devops-tools
service-level-objective
incident-management
postmortem
monitoring-tools
service-level-monitoring
incident-responce
-
Updated
May 31, 2021
This repository collects resources for preparing for a Google interview, whether you are applying for a software engineer position or a site reliability engineer position.
google
database
algorithms
site-reliability-engineering
interview
competitive-programming
operating-system
software-engineering
interview-questions
software-architecture
interview-preparation
system-design
google-interview
software-design
algorithms-and-data-structures
sre-interview
site-reliability-engineer
-
Updated
Jun 16, 2020
Curated list of good SRE interview questions.
-
Updated
May 6, 2020
The Google Site Reliability Engineering book converted to audio
-
Updated
Mar 22, 2017
A party card game for engineers caring about reliability. Based on Cards Against Humanity.
devops
distributed-systems
incident-response
site-reliability-engineering
cards-against-humanity
incident-management
chaos-engineering
site-reliability
-
Updated
Dec 10, 2018 - TeX
A curated list of awesome Site Reliability and Production Engineering resources.
devops
availability
awesome
monitoring
reliability-engineering
incident-response
site-reliability-engineering
production
post-mortem
capacity-planning
service-level-agreement
scalability
reliability
alerting
on-call
awesome-list
sre
observability
postmortem
site-reliability
-
Updated
May 6, 2018
The Skinny Distributed Lock Service
-
Updated
May 23, 2020 - Go
dastergon
commented
Jun 2, 2018
Although it's not a high priority, we could get a fancier, more modern wheel.
Calculate how much downtime should be permitted in your Service Level Agreement or Objective
calculator
devops
availability
site-reliability-engineering
service-level-agreement
slo
service-level-objective
service-level-indicator
sla
chaos-engineering
postmortem
site-reliability
service-level
-
Updated
Feb 14, 2021 - HTML
A collection of SRE tools
-
Updated
Nov 27, 2019
My opinionated list of products and tools used for high-scalability projects
distributed-systems
cloud
encryption
networking
storage
containers
high-performance
architecture
site-reliability-engineering
scalability
databases
tracing
service-mesh
-
Updated
Jun 19, 2021
A collection of templates ported from the SRE Workbook
devops
reliability-engineering
templates
site-reliability-engineering
slo
sli
sla
site-reliability
error-budget
-
Updated
Aug 24, 2018
A list of common Disaster Recovery (DR) scenarios for software companies
security
devops
site-reliability-engineering
disaster-recovery
disaster-management
chaos-engineering
site-reliability
-
Updated
Dec 23, 2018
mysql
kubernetes
rust
redis
golang
haskell
functional-programming
nosql
book
terraform
site-reliability-engineering
postgresql
coursera
courses
software-engineering
infrastructure-as-code
articles
system-programming
operating-systems
-
Updated
Jun 22, 2021 - Makefile
This repository helps performance testers and engineers who want to dive into the DevOps and SRE world.
microsoft
testing
linux
docker
kubernetes
engineering
devops
roadmap
performance
site-reliability-engineering
chaos
rancher
sre
engineers
chaos-engineering
aws-devops
performance-engineers-devops
-
Updated
Jun 28, 2021
A combined introduction to operating systems and computer networks
-
Updated
Feb 2, 2017
kubernetes
microservice
openshift
site-reliability-engineering
cncf
rancher
operator
gremlin
k8s
litmus
kind
chaos-engineering
openshift-cluster
kubespray
crd
chaos-experiments
chaos-mesh
litmus-chaos
-
Updated
Nov 25, 2020 - Ruby
devops
availability
backend
reliability-engineering
architecture
site-reliability-engineering
scalability
back-end
design-patterns
interview
preparation
awesome-list
dev-ops
sre
interview-questions
job-interviews
system-design
site-reliability
scale-systems
back-end-development
-
Updated
May 17, 2018
Endpoint monitoring and DNS failover agent written in Go
-
Updated
Dec 8, 2017 - Go
The agent for Komlog, a PaaS that helps observability teams better understand their systems.
python
devops
monitoring
analytics
site-reliability-engineering
data-visualization
observability
o11y
-
Updated
Nov 14, 2017 - Python
Issue Description
Question
Describe what happened (or what feature you want)
Trying to evaluate ChaosBlade as an option for resiliency testing. But I'm not sure if this is a feature request or a question. Actually, two questions: