AI-powered SRE platform for automated incident investigation
-
Updated
Feb 28, 2026 - Python
AI-powered SRE platform for automated incident investigation
HoloInsight is a cloud-native observability platform with a special focus on real-time log analysis and AI integration.
GMSSH: Desktop-Grade AI-Driven Operations Terminal High Performance · Non-Intrusive · AI-Powered;GMSSH 桌面级 AI 运维终端.高性能·AI 智驱
autonomous systems engineering cli agent
🏰 Real-time dashboard for monitoring and managing Clawdbot AI agents
InsightSolver: Colab notebooks for exploring and solving operational issues using deep learning, machine learning, and related models.
An autonomous SRE agent that monitors cloud logs across multiple platforms, leveraging AI models from various providers to detect anomalies, perform root cause analysis, and automate remediation by creating GitHub Pull Requests.
AI that ships your code. Deploy to any cloud with plain English.
ARF is an agentic reliability intelligence platform that separates decision intelligence (OSS) from governed execution (Enterprise), enabling autonomous operations with deterministic safety guarantees.
SDK to track cost-per-outcome for AI workflows
Lightweight server management & intelligent ops platform with Docker one-click deployment, batch script execution, web terminal, and AI-powered operations.
Open source code for AIOpsServing
Kill-switch for AI agents. Validates infrastructure operations before execution. Fail-closed. Evidence-backed.
🐝 引巢 · ZyHive | AI 团队操作系统 — 为 AI 成员注入灵魂,可视化管理多智能体团队。Self-hosted AI Team OS · Go + Vue 3 · One-click deploy
Hypothesis-driven AI agent for incident investigation. AWS, K8s, PagerDuty.
Real-world patterns for shipping AI agents to production. Learn versioning, cost optimization, multi-tenancy, guardrails, and observability through runnable TypeScript examples.
Advanced, end-to-end, enterprise-grade agentic AI pipeline that automates competitor ad intelligence, performs multimodal creative strategy extraction, enables brand-safe adaptation, and generates AI video ads using LLM reasoning, multimodal analysis, and deterministic workflow orchestration with full auditability.
tpu-doc is a zero-dependency diagnostic binary for Google Cloud TPU environments that instantly validates hardware health, discovers software stack configurations, and provides AI-powered log analysis to eliminate expensive debugging downtime.
🤖 Build and deploy scalable Multi-AI Agent systems with LangGraph and Groq LLMs to enhance intelligence across enterprise applications.
Add a description, image, and links to the ai-ops topic page so that developers can more easily learn about it.
To associate your repository with the ai-ops topic, visit your repo's landing page and select "manage topics."