Description
Is this related to an existing feature request or issue?
No response
Summary
We propose contributing the CloudWatch Synthetics MCP Server to AWS Labs. The server lets customers interact with AWS CloudWatch Synthetics in natural language for canary failure analysis and guided canary creation, reducing time-to-resolution for canary failures and the complexity of creating canaries.
As the developers of this MCP server on the AWS CloudWatch Synthetics team (@rgupta2205, @sharmaakshay23, @maxismailov), we also volunteer to be active maintainers of the project within AWS Labs.
Use case
Synthetics Operators can:
- Diagnose canary failures with comprehensive artifact analysis (HAR files, screenshots, CloudWatch logs)
- Create canaries effortlessly through an interactive guided wizard that handles infrastructure provisioning
- Analyze performance trends using CloudWatch metrics and success rate patterns
With the Synthetics MCP Server, engineers can have conversations like:
Engineer: "My checkout-flow canary is failing. What's wrong?"
AI Assistant: "Analysis shows Protocol errors due to browser crashes from insufficient ephemeral storage. The canary is using runtime syn-nodejs-puppeteer-6.2. Recommendation: Upgrade to syn-nodejs-puppeteer-10.2 with increased storage allocation."
Engineer: "Help me create a canary to monitor my API endpoint"
AI Assistant: "I'll guide you through canary creation. What's your API endpoint URL? [Collects parameters interactively and provisions artifacts location automatically]"
Engineer: "Show me performance trends for user-login canary"
AI Assistant: "Success rate dropped from 98% to 73% over last 24 hours. HAR analysis shows database timeout patterns in 67% of failures."
This turns complex synthetic monitoring operations into simple, AI-guided workflows, which should reduce customer support tickets and cut incident investigation time from hours to minutes.
Proposal
What We're Building
A Python-based MCP server that exposes 8 tools to AI assistants, including:
- analyze_canary_failures - Comprehensive multi-service failure analysis combining Synthetics runs, CloudWatch logs, S3 artifacts, IAM configuration, and Lambda function validation
- create_guided_canary - Interactive workflow that orchestrates canary creation with automatic S3 bucket provisioning, IAM role creation, and infrastructure setup
- get_canary_metrics - Intelligent performance analysis with trend detection, success rate patterns, and actionable insights
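As a rough illustration of the kind of trend detection described for get_canary_metrics, the sketch below compares the success rate of two windows of canary runs. The function names, the 10-point drop threshold, and the choice to ignore unfinished runs are assumptions made for this example, not the server's actual implementation. The run dictionaries are assumed to carry the `Status.State` field returned by the Synthetics GetCanaryRuns API.

```python
from typing import Iterable, Optional


def success_rate(runs: Iterable[dict]) -> Optional[float]:
    """Return the percentage of PASSED runs, or None if no run has finished."""
    states = [r.get("Status", {}).get("State") for r in runs]
    # Runs still in progress (for example RUNNING) are ignored.
    finished = [s for s in states if s in ("PASSED", "FAILED")]
    if not finished:
        return None
    return 100.0 * finished.count("PASSED") / len(finished)


def detect_drop(previous: list, current: list, threshold: float = 10.0) -> dict:
    """Compare two windows of runs and flag a drop larger than `threshold` points."""
    before, after = success_rate(previous), success_rate(current)
    if before is None or after is None:
        # Not enough data in one of the windows to say anything about a trend.
        return {"before": before, "after": after, "dropped": False}
    return {"before": before, "after": after, "dropped": (before - after) > threshold}
```

Returning the raw rates alongside the flag leaves the interpretation (and any wording of the insight) to the AI assistant.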
Technical Approach
- Uses boto3 for AWS API interactions
- Implements MCP protocol via FastMCP framework
- Supports standard AWS authentication (IAM roles, credentials)
- Automated infrastructure provisioning (S3 buckets, IAM roles)
- Configurable via environment variables (AWS_REGION, log levels, endpoints)
- Comprehensive error handling and logging
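Environment-based configuration of this kind could look roughly like the following sketch. Only AWS_REGION is named in the proposal; the variable names SYNTHETICS_MCP_LOG_LEVEL and SYNTHETICS_ENDPOINT_URL, the ServerConfig class, and the default values are hypothetical and chosen purely for illustration.

```python
import logging
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Accepted log level names mapped to the standard logging constants.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


@dataclass(frozen=True)
class ServerConfig:
    region: str
    log_level: int
    endpoint_url: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Build the server configuration from environment variables."""
    # AWS_REGION comes from the proposal; the default region is an assumption.
    region = env.get("AWS_REGION", "us-east-1")
    # Hypothetical variable name for the log level.
    level_name = env.get("SYNTHETICS_MCP_LOG_LEVEL", "INFO").upper()
    if level_name not in _LEVELS:
        raise ValueError(f"Unknown log level: {level_name}")
    # Hypothetical variable name; an empty value is treated as "not set".
    endpoint = env.get("SYNTHETICS_ENDPOINT_URL") or None
    return ServerConfig(region=region, log_level=_LEVELS[level_name], endpoint_url=endpoint)
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.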
Integration
Works with any MCP-compatible AI assistant:
- Claude Desktop
- GitHub Copilot
- Amazon Q
- Custom AI applications
Out of scope
- Multi-account operations - Single account/region per instance
- Real-time streaming - Uses polling-based approach for reliability
- Custom canary script development - Uses AWS blueprints only and prompts for the required parameters, as a safety measure
- Advanced VPC configuration - Complex VPC setups are not currently managed, for security and operational simplicity
- Data aggregation - AI assistant handles correlation and analysis
Potential challenges
- API Rate Limits
Challenge: Heavy usage of AWS APIs could hit throttling limits
Mitigation: Implemented pagination, query limits, and timeout controls
- Large Artifact Analysis
Challenge: Logs, HAR files and screenshots can return massive datasets
Mitigation: Streaming download, intelligent filtering, size limits, and progressive analysis with early termination for healthy runs
- Security and Data Exposure
Challenge: Exposing potentially sensitive canary data and logs through AI assistants
Mitigation: Read-only operations for the analysis tools, standard AWS IAM controls, no credential storage
- Complex Failure Patterns
Challenge: Canary failures can have multiple root causes requiring expert knowledge
Mitigation: Comprehensive pattern database based on real AWS support tickets, structured analysis workflow, and clear escalation paths
- AI Hallucination
Challenge: AI assistants might misunderstand complex debugging data
Mitigation: Detailed tool descriptions and examples guide appropriate usage
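To make the pagination and query-limit mitigation for API rate limits more concrete, here is a minimal sketch that pages through canary runs while capping the total number of runs examined. The function name collect_failed_runs and the default cap of 50 are assumptions for this example. The client is expected to behave like a boto3 Synthetics client, whose get_canary_runs call accepts Name, MaxResults (up to 100), and NextToken.

```python
from typing import Any, Dict, List


def collect_failed_runs(client: Any, canary_name: str, max_runs: int = 50) -> List[Dict]:
    """Page through canary runs, stopping once `max_runs` runs have been examined."""
    failed: List[Dict] = []
    examined = 0
    next_token = None
    while examined < max_runs:
        # Never ask for more than the remaining budget or the API page limit.
        kwargs = {"Name": canary_name, "MaxResults": min(100, max_runs - examined)}
        if next_token:
            kwargs["NextToken"] = next_token
        page = client.get_canary_runs(**kwargs)
        runs = page.get("CanaryRuns", [])
        examined += len(runs)
        failed.extend(r for r in runs if r.get("Status", {}).get("State") == "FAILED")
        next_token = page.get("NextToken")
        if not next_token or not runs:
            break  # No further pages to fetch.
    return failed
```

In real use the client would be created with `boto3.client("synthetics")`; taking it as a parameter keeps the paging logic testable with a stub and leaves credential handling to standard AWS mechanisms.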
Dependencies and Integrations
No response
Alternative solutions