Amazon OpenSearch Gets Agentic AI: Plan-Execute-Reflect Investigation Agent
April 4, 2026 · 7 min read · By Connie
TL;DR
Amazon added agentic AI to OpenSearch Service on April 2, 2026. The Investigation Agent uses a plan-execute-reflect loop to autonomously diagnose incidents — translating natural language to OpenSearch DSL, correlating signals across indices, and producing root cause analysis without manual query writing. Backed by Amazon Bedrock, it's model-agnostic and natively integrated with CloudWatch and X-Ray. It competes directly with Elastic AI and Splunk AIOps, but offers an open-source data tier and no AI vendor lock-in.
Observability has always been a data problem with a human bottleneck. The data is there — logs, metrics, traces, events — but turning it into actionable root cause analysis requires an engineer who knows how to write complex queries, knows which indices to look in, and can correlate signals across distributed systems under the pressure of an active incident. AI agents remove that bottleneck.
Amazon's April 2, 2026 update to OpenSearch Service introduces two agentic capabilities that make autonomous incident investigation a production reality for AWS-native teams: an Agentic Chatbot for natural language querying, and an Investigation Agent that runs a full plan-execute-reflect diagnostic loop without human intervention.
What the Plan-Execute-Reflect Loop Actually Does
The investigation loop is the core innovation here, and it is worth understanding precisely how it works. Traditional AI-assisted observability tools answer questions — you ask, they respond. The Investigation Agent operates differently: it receives an incident signal and operates autonomously until it reaches a conclusion.
1. Plan: the agent receives the incident signal (alert, error rate spike, latency anomaly) and generates an investigation plan — which indices to query, which time windows to examine, which correlated metrics to pull.
2. Execute: the agent translates the plan into OpenSearch DSL queries and executes them against the relevant indices — logs, metrics, traces, or custom application events.
3. Reflect: the agent reviews the query results against the original hypothesis. If the evidence supports the hypothesis, it proceeds to root cause; if not, it updates the plan and loops back to execute with refined queries.
4. Report: once the loop converges, the agent synthesizes a structured report — root cause identified, supporting evidence, affected components, and recommended remediation steps.
The loop runs until it either converges on a root cause or exhausts its investigation budget. The result: a junior on-call engineer can present senior-SRE-quality incident analysis in minutes rather than hours.
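The control flow above can be sketched in a few lines. This is an illustrative toy, not the actual Investigation Agent internals — every function and field name here is hypothetical, and the query execution is stubbed with canned results.

```python
# Minimal plan-execute-reflect sketch. All names are hypothetical; the real
# agent plans from live index metadata and executes real OpenSearch queries.
from dataclasses import dataclass, field


@dataclass
class Investigation:
    signal: str                      # the triggering alert
    budget: int = 5                  # max reflect iterations before giving up
    evidence: list = field(default_factory=list)


def plan(inv):
    """Decide which indices, time windows, and queries to try next."""
    return [{"index": "app-logs-*", "query": {"match": {"level": "error"}}}]


def execute(steps):
    """Run the planned queries. Stubbed here with a canned result."""
    return [{"hits": 42, "top_error": "db connection timeout"} for _ in steps]


def reflect(inv, results):
    """Record evidence and decide whether the hypothesis is supported."""
    inv.evidence.extend(results)
    return any(r["hits"] > 0 for r in results)   # converged?


def investigate(signal):
    inv = Investigation(signal)
    for _ in range(inv.budget):                  # loop until convergence/budget
        results = execute(plan(inv))
        if reflect(inv, results):
            return {"root_cause": inv.evidence[-1]["top_error"],
                    "evidence": inv.evidence}
    return {"root_cause": None, "evidence": inv.evidence}  # budget exhausted


report = investigate("p99 latency spike on checkout-service")
```

The key design point is that reflection gates the exit: the agent cannot report a root cause without evidence that survived at least one review pass.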
Full Capability Breakdown
| Feature | What It Does | Operational Impact |
|---|---|---|
| Agentic Chatbot | Natural language interface to OpenSearch data — ask questions about logs, metrics, and traces in plain English | Eliminates need to write OpenSearch DSL for routine queries; accessible to non-engineers |
| Investigation Agent | Plan-execute-reflect loop: autonomously plans investigation steps, executes OpenSearch queries, reflects on results, and iterates toward root cause | Reduces MTTD for complex incidents from 30–60 min to under 10 min; handles multi-signal correlation automatically |
| Natural Language → DSL translation | Translates plain English questions into valid OpenSearch Domain Specific Language queries | SREs and platform engineers can query without DSL knowledge; reduces query-authoring bottleneck |
| Amazon Bedrock backend | Foundation model layer is Bedrock — teams choose their model (Claude, Titan, etc.) based on cost and compliance | Model-agnostic architecture; no vendor lock-in on AI provider; enterprise compliance controls inherited from Bedrock |
| Root cause analysis generation | Agent synthesizes multi-index evidence into a structured root cause report with recommended remediation | Junior on-call engineers can handle incident triage at the level previously requiring senior SREs |
| AWS native integration | Deep integration with CloudWatch metrics, X-Ray traces, and S3 log archives in the same AWS account | Unified observability across the full AWS stack without manual data pipeline configuration |
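To make the natural-language-to-DSL row concrete, here is what a question like "show error logs for checkout-service in the last 15 minutes" might compile to. The DSL shape (bool query with filters) is standard OpenSearch; the field names `service`, `level`, and `@timestamp` are assumptions that depend on your index mapping.

```python
# Example OpenSearch DSL body a natural-language query might translate into.
# Field names are illustrative — they depend on your index mapping.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"service": "checkout-service"}},      # which service
                {"term": {"level": "error"}},                   # only errors
                {"range": {"@timestamp": {"gte": "now-15m",     # last 15 min
                                          "lte": "now"}}},
            ]
        }
    },
    "sort": [{"@timestamp": {"order": "desc"}}],  # newest first
    "size": 100,                                  # cap the result set
}
```

Filters (rather than scored `must` clauses) are the idiomatic choice here since log triage needs exact matching, not relevance ranking.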
Why the Bedrock Backend Architecture Matters
Most AI-assisted observability tools bake in a specific model — Elastic defaults to Azure OpenAI, Datadog uses its own internal LLM. OpenSearch's decision to route through Amazon Bedrock has three significant enterprise implications.
- Model selection flexibility: Teams can use Anthropic Claude for reasoning-heavy incident analysis, Amazon Titan for cost-sensitive workloads, or other Bedrock models as they are added. The investigation capability is not tied to a single model's limitations.
- Compliance inheritance: Bedrock's enterprise controls — VPC endpoints, PrivateLink, AWS IAM, CloudTrail audit logs, SOC 2 / HIPAA / FedRAMP certifications — all apply to the AI backend automatically. This matters enormously for regulated industries that need to prove their AI queries are not leaving compliant environments.
- Cost transparency: OpenSearch does not add an AI surcharge. Bedrock usage is billed at standard token rates, visible in the same AWS Cost Explorer view as all other services. No AI black-box pricing.
For enterprises already running workloads on AWS, this is the path of least resistance to agentic observability — no new vendors, no new compliance reviews, no new data egress architecture decisions.
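Because the backend is plain Bedrock, teams can also see exactly what an invocation looks like. A minimal sketch, assuming the Anthropic-on-Bedrock message format; the model ID and prompt wording are examples, not what the Investigation Agent itself sends.

```python
import json


def build_request(incident_summary,
                  model_id="anthropic.claude-3-5-sonnet-20240620-v1:0"):
    """Build an InvokeModel request body in the Anthropic-on-Bedrock format.

    The model ID and prompt are illustrative; swap in whichever Bedrock
    model your cost and compliance requirements dictate.
    """
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user",
                      "content": f"Analyze this incident: {incident_summary}"}],
    }
    return {"modelId": model_id, "body": json.dumps(body)}


# With AWS credentials configured, this would be sent via boto3:
#   import boto3
#   rt = boto3.client("bedrock-runtime")
#   resp = rt.invoke_model(**build_request("p99 latency spike"))

req = build_request("error rate spike on checkout-service")
payload = json.loads(req["body"])
```

Because the call is ordinary Bedrock usage, it appears in CloudTrail and Cost Explorer like any other invocation — which is precisely the cost-transparency point above.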
Platform Comparison: AI-Assisted Observability 2026
| Platform | AI Feature | AI Backend | Investigation Loop | Pricing |
|---|---|---|---|---|
| Amazon OpenSearch | Investigation Agent + Agentic Chatbot | Amazon Bedrock (model-agnostic) | Plan-Execute-Reflect (autonomous) | No AI surcharge; Bedrock usage billed separately |
| Elastic (ELK/Elastic Cloud) | Elastic AI Assistant | OpenAI / Azure OpenAI | Semi-automated (human-in-loop) | AI Assistant included in Platinum+ |
| Splunk AIOps | Einstein AIOps, AI-assisted triage | Splunk AI / OpenAI | Event correlation + playbooks | Premium feature; usage-based |
| Datadog AI | Watchdog AI + Bits AI | Internal LLM | Watchdog autonomous anomaly detection | Bits AI included in some plans |
Who Should Enable This Now
Platform engineering teams on AWS
If your observability stack runs on OpenSearch Service, this requires no migration — enable Bedrock integration, enable the Investigation Agent in the console, and your existing data is immediately queryable with natural language.
SRE teams with junior-to-mid ratio skewed junior
The plan-execute-reflect loop is essentially the investigation procedure a senior SRE would follow, automated. Teams where senior SREs are the bottleneck for incident escalations see the most immediate impact.
Compliance-sensitive organizations
Healthcare, financial services, and government teams that need AI tools to stay inside existing compliance boundaries get that automatically via the Bedrock backend — no new BAA, no new vendor DPA, no new infrastructure.
AWS-native organizations evaluating Elastic or Datadog migrations
If you were considering a migration primarily for AI-assisted investigation capabilities, OpenSearch now has a comparable answer in the same AWS console where you already manage your data.
Research Enterprise AI with HappyCapy
Use HappyCapy to compare observability platforms, draft RFPs, and evaluate AI-assisted DevOps tooling for your team.
Try HappyCapy Free

Frequently Asked Questions
What is the Amazon OpenSearch Investigation Agent?
An agentic AI capability, announced April 2, 2026, that autonomously diagnoses incidents in OpenSearch Service. It runs a plan-execute-reflect loop: planning investigation steps, executing OpenSearch DSL queries, reflecting on the results, and iterating until it produces a structured root cause report.

What AI model powers Amazon OpenSearch Agentic AI?
Amazon Bedrock provides the foundation model layer, so the architecture is model-agnostic — teams choose Anthropic Claude, Amazon Titan, or other Bedrock models based on cost and compliance requirements.

How does OpenSearch Agentic AI compare to Elastic AI and Splunk AIOps?
OpenSearch runs a fully autonomous investigation loop on a model-agnostic Bedrock backend with no AI surcharge. Elastic's AI Assistant is semi-automated with a human in the loop, and Splunk's AI-assisted triage is a premium, usage-based feature.

Is Amazon OpenSearch Agentic AI available now?
Yes. It is available in the OpenSearch Service console; teams on the managed service can enable it with an active Bedrock configuration.

What is mean time to detection (MTTD) and how does AI improve it?
MTTD measures how long it takes to identify what is going wrong after an incident begins. By automating multi-signal correlation, the Investigation Agent reduces MTTD for complex incidents from 30–60 minutes to under 10.
The Observability Stack Is Becoming Autonomous
OpenSearch's Investigation Agent is part of a broader shift across the observability market. The era of AI-assisted querying — where AI helps engineers write better queries — is being superseded by AI-autonomous investigation, where agents handle the entire diagnostic workflow from alert to resolution.
Amazon's execution here is notable because it takes the lowest-friction path for AWS-native teams: no migration, no new vendors, no additional compliance review. If your data is already in OpenSearch Service, you enable two settings and your on-call rotation has a digital first responder that can do the first 20 minutes of incident investigation automatically.
The Investigation Agent is available now in the OpenSearch Service console. Teams using the managed service can enable it with an active Bedrock configuration.