____. ________ ________
| |____ ___.__.\_____ \ \_____ \ ____ ____
| \__ \< | | / / \ \ / | \ / \_/ __ \
/\__| |/ __ \\___ |/ \_/. \ / | \ | \ ___/
\________(____ / ____|\_____\ \_/_____\_______ /___| /\___ >
\/\/ \__>_____/ \/ \/ \/
--=[ PrEsENtZ ]=--
--=[ 🚀 Cloud Infra Lab: Scalable ALB + ASG + NGINX + RDS Setup ]=--
--=[ Provision a complete AWS stack using Terraform ]=--
--=[ #StayUp | #End2EndBurner ]=--
My first time using ChatGPT to supplement my AWS and Terraform knowledge while building and troubleshooting a small, scalable yet extendable cloud project end-to-end for learning purposes. Beginner to intermediate level. Enjoy!
AWS:
- `aws` CLI installed and configured with an AWS account.
Zone and Domain:
- AWS Route53 zone resource should already exist (either manually or in Terraform).
- Must own the DNS zone via some domain registrar with the DNS servers pointed to the Route53 zone name servers.
- Demo will look up the zone resource by name (see the sketch after this list).
- Change the `zone_name` variable in variables.tf to your own zone.
- The `cloud.some.domain` DNS record will be created from `var.zone_name` (i.e. `var.zone_name = "jq1.io"` -> `output.url = "https://cloud.jq1.io"`).
- Demo is not configured for an apex domain at this time.
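A minimal Terraform sketch of the lookup-by-name pattern described above; the `aws_lb.this` reference and resource names are illustrative, not the demo's actual identifiers:

```hcl
variable "zone_name" {
  type    = string
  default = "jq1.io" # change to your own zone
}

# Look up the existing Route53 zone by name.
data "aws_route53_zone" "this" {
  name = var.zone_name
}

# Create the cloud.<zone> record as an alias to the ALB.
resource "aws_route53_record" "cloud" {
  zone_id = data.aws_route53_zone.this.zone_id
  name    = "cloud.${var.zone_name}"
  type    = "A"

  alias {
    name                   = aws_lb.this.dns_name # illustrative ALB resource
    zone_id                = aws_lb.this.zone_id
    evaluate_target_health = true
  }
}

output "url" {
  value = "https://${aws_route53_record.cloud.fqdn}"
}
```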
IPAM Configuration:
- There are many ways to configure IPAM so I manually created IPAM pools (advanced tier) in the AWS UI.
- You'll need to configure your own IPv4 pools/subpools in IPAM.
- Demo will look up the IPAM pools via a filter on description and IPv4 type (see the sketch after this list).
- Advanced Tier IPAM with `us-west-2` in the operating regions.
- No IPv4 regional pools at the moment.
- `us-west-2` (IPAM locale)
  - IPv4 Pool (private scope)
    - Description: `ipv4-test-usw2`
    - Provisioned CIDRs: `10.0.0.0/18`
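A sketch of what that filter-based lookup can look like, assuming the pool description above (the demo builds its VPCs through the Tiered VPC-NG module, so the plain `aws_vpc` below is only to show the pool hand-off):

```hcl
# Find the IPv4 pool by description and address family.
data "aws_vpc_ipam_pool" "ipv4_usw2" {
  filter {
    name   = "description"
    values = ["ipv4-test-usw2"]
  }
  filter {
    name   = "address-family"
    values = ["ipv4"]
  }
}

# A VPC can then be carved out of the pool instead of hardcoding a CIDR.
resource "aws_vpc" "example" {
  ipv4_ipam_pool_id   = data.aws_vpc_ipam_pool.ipv4_usw2.id
  ipv4_netmask_length = 20
}
```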
Build:
- `terraform init`
- `terraform apply` (takes a few minutes for the ASG instances to finish spinning up once the apply is complete)
- Profit!
Tear Down:
- Remove RDS deletion protection:

  ```
  aws rds modify-db-instance --db-instance-identifier test-app-mysql --no-deletion-protection --apply-immediately --region us-west-2
  ```
- Destroy resources:

  ```
  terraform destroy
  ```

  - Note: VPCs take 10-15 minutes to destroy because IPAM takes a long time to release the CIDRs.
- Force delete the Secrets Manager path instead of waiting for the scheduled deletion:

  ```
  aws secretsmanager delete-secret --secret-id rds/test/mysql/app --force-delete-without-recovery --region us-west-2
  ```
- Delete the snapshot that was created when destroying the DB:

  ```
  aws rds delete-db-snapshot --db-snapshot-identifier test-app-mysql-final-snapshot --region us-west-2
  ```
Health Check:
- `https://cloud.some.domain/` -> `Health: OK: MaD GrEEtz! #End2EndBurner`
RDS Connectivity Checks:
- `https://cloud.some.domain/app1` -> `App1: MySQL Primary OK` (or `App1: MySQL Primary Error`)
- `https://cloud.some.domain/app2` -> `App2: MySQL Read Replica OK` (or `App2: MySQL Read Replica Error`)
TODO:
- Add RDS Proxy for the primary and read replica DBs.
  - Will require NATGWs.
- Modularize (OO style):
  - alb.tf
  - asg.tf
  - rds.tf
Application Load Balancer (ALB):
- HTTPS (TLS 1.2 & 1.3) with ACM + `ELBSecurityPolicy-TLS13-1-2-2021-06`.
- HTTP-to-HTTPS redirects.
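A minimal sketch of the listener pair this implies (resource names are illustrative):

```hcl
# HTTPS listener: ACM cert + TLS 1.2/1.3 security policy.
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.this.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate.this.arn # illustrative

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}

# Port 80 does nothing but redirect to HTTPS.
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.this.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
```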
Auto Scaling Group (ASG):
- EC2 instances with cloud-init & socat health endpoints.
- Uses MariaDB as the MySQL client.
- Scales based on CPU utilization.
- Deployed across multiple AZs.
- Instances can spin up without a NATGW because there's an S3 gateway.
- This is because the Amazon Linux 2023 AMI uses S3 for its yum repos.
- If you plan on using NATGWs for the ASG instances when modifying the cloud-init script, set `natgw = true` (on a public subnet per AZ) and add an egress rule to the instances' security group.
- Scale-out is difficult to test with no load-testing scripts (at the moment), but you can test scale-in by setting the desired capacity to 6 and watching the ASG terminate the unneeded capacity back down to 2.
- A boolean enables automatic instance refresh using the latest launch template version after the launch template's user_data or image_id is modified.
- The config prioritizes availability (launch before terminate) over cost control (terminate before launch).
- Only one instance refresh can run at a time; starting another will error.
- View in-progress instance refreshes with:

  ```
  aws autoscaling describe-instance-refreshes --auto-scaling-group-name test-web-asg --region us-west-2
  ```

- The current demo configuration takes up to 10 minutes for a refresh to finish, unless you cancel it manually.
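A sketch of what the launch-before-terminate refresh behavior looks like on the ASG resource (sizes match the scale-in test above; other values are assumptions, not the demo's exact numbers):

```hcl
resource "aws_autoscaling_group" "web" {
  name                = "test-web-asg"
  min_size            = 2
  max_size            = 6
  vpc_zone_identifier = local.private_subnet_ids # illustrative

  launch_template {
    id      = aws_launch_template.web.id
    version = aws_launch_template.web.latest_version # roll on new versions
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      # Availability over cost: keep 100% healthy and launch replacements
      # before terminating old instances.
      min_healthy_percentage = 100
      max_healthy_percentage = 200
    }
    # Changes to the launch template (e.g. user_data or image_id) produce a
    # new version, which triggers a rolling refresh automatically.
  }
}
```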
NGINX reverse proxy + Socat Health Checks:
- Path-based routing: /app1, /app2.
- /app1 returns primary DB health.
- /app2 returns read replica DB health.
- Uses socat for reliable TCP responses.
- Lightweight bash scripts to simulate apps.
- `mysql -e "SELECT 1"` is run with credentials pulled from Secrets Manager.
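A sketch of how one of these app scripts can be wired together, assuming the secret is a JSON blob with `host`/`username`/`password` keys, that `jq` is installed by cloud-init, and an illustrative port of 8081 (the demo's actual script may differ):

```hcl
locals {
  app1_script = <<-EOT
    #!/usr/bin/env bash
    # Pull MySQL creds from Secrets Manager (secret path from the tear-down
    # section); the key names below are assumptions about the secret's shape.
    secret=$(aws secretsmanager get-secret-value \
      --secret-id rds/test/mysql/app \
      --query SecretString --output text --region us-west-2)
    host=$(echo "$secret" | jq -r .host)
    user=$(echo "$secret" | jq -r .username)
    pass=$(echo "$secret" | jq -r .password)

    if mysql -h "$host" -u "$user" -p"$pass" -e "SELECT 1" >/dev/null 2>&1; then
      body="App1: MySQL Primary OK"
    else
      body="App1: MySQL Primary Error"
    fi
    # Emit a minimal HTTP response for socat to hand back to NGINX.
    printf 'HTTP/1.1 200 OK\r\nContent-Length: %s\r\n\r\n%s' "$${#body}" "$body"
  EOT
}

# socat forks the script per connection; NGINX proxies /app1 to this port:
#   socat TCP-LISTEN:8081,reuseaddr,fork EXEC:/usr/local/bin/app1.sh
```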
Amazon RDS (MySQL):
- Multi-AZ with encryption via custom KMS key.
- Access controlled by SGs (only from ASG instances).
- Secrets (MySQL creds) stored in AWS Secrets Manager.
- Intra-region encrypted Multi-AZ Read Replica.
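A sketch of the primary/replica pair described above; instance class, storage size, and the credential locals are placeholders (the identifiers match the tear-down commands):

```hcl
resource "aws_db_instance" "primary" {
  identifier                = "test-app-mysql"
  engine                    = "mysql"
  instance_class            = "db.t3.micro" # placeholder
  allocated_storage         = 20
  multi_az                  = true
  storage_encrypted         = true
  kms_key_id                = aws_kms_key.rds.arn # custom KMS key
  deletion_protection       = true
  username                  = local.db_username # sourced from Secrets Manager
  password                  = local.db_password
  db_subnet_group_name      = aws_db_subnet_group.isolated.name
  vpc_security_group_ids    = [aws_security_group.rds.id]
  final_snapshot_identifier = "test-app-mysql-final-snapshot"
}

# Intra-region read replica; encryption is inherited from the source.
resource "aws_db_instance" "replica" {
  identifier          = "test-app-mysql-replica"
  replicate_source_db = aws_db_instance.primary.identifier
  instance_class      = "db.t3.micro"
  multi_az            = true
  skip_final_snapshot = true
}
```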
Security Groups:
- Fine-grained rules for ALB ↔ EC2 ↔ RDS.
- Outbound rules configured for necessary security groups.
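For example, the RDS-side ingress rule can be scoped to the ASG's security group rather than a CIDR (security group names are illustrative):

```hcl
# Only ASG instances (members of aws_security_group.web) may reach MySQL.
resource "aws_security_group_rule" "rds_from_asg" {
  type                     = "ingress"
  from_port                = 3306
  to_port                  = 3306
  protocol                 = "tcp"
  security_group_id        = aws_security_group.rds.id
  source_security_group_id = aws_security_group.web.id
}
```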
Scaling Behavior:
- Scale Out: if average CPU > 70% for 2 minutes.
- Scale In: if average CPU < 30% for 2 minutes.
- Policies managed via CloudWatch alarms + ASG.
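A sketch of one side of that alarm/policy pairing, assuming simple-scaling policies (the thresholds and period match the bullets above; names are illustrative):

```hcl
resource "aws_autoscaling_policy" "scale_out" {
  name                   = "cpu-scale-out"
  autoscaling_group_name = aws_autoscaling_group.web.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1
  cooldown               = 120
}

resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "test-web-asg-cpu-high"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 70
  period              = 120 # 2 minutes
  evaluation_periods  = 1
  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.web.name
  }
  alarm_actions = [aws_autoscaling_policy.scale_out.arn]
}

# The scale-in pair mirrors this with LessThanThreshold and threshold = 30.
```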
VPC:
- Uses Tiered VPC-NG module.
- Requires IPAM.
- VPC Endpoint for sending S3 traffic directly to S3 instead of traversing the IGW or NATGW (see the sketch after this list).
- Using isolated subnets for the DB subnets, for future use when scaling VPCs with a Centralized Router (TGW hub and spoke).
  - This makes it easier to keep DB connections same-VPC only, so other intra-region VPCs can't connect when full-mesh TGW routes exist.
- example: Centralized Egress Demo
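The S3 gateway endpoint mentioned above is a one-resource addition; a sketch follows, with the VPC and route table references being illustrative since the demo wires this through the Tiered VPC-NG module:

```hcl
# Gateway endpoint: S3 traffic stays on the AWS network, no NATGW/IGW needed.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.this.id
  service_name      = "com.amazonaws.us-west-2.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}
```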
Advantages:
- Horizontal scalability.
- ASG lets you scale NGINX nodes based on CPU, connections, etc.
- Managed ingress.
- ALB handles TLS termination, health checks, and routing to NGINX instances cleanly.
- Separation of concerns.
- NGINX handles HTTP logic (e.g., authentication, load balancing), MySQL stays private.
- Custom routing logic.
- You can implement advanced logic like conditional proxying, auth, throttling, etc.
- Can front many apps.
- One NGINX can proxy to multiple backends, including MySQL-checking microservices.
Limitations:
- NGINX is not a MySQL proxy.
- NGINX is built for HTTP, not stateful MySQL TCP connections.
- You cannot proxy raw MySQL traffic through NGINX.
- Unnecessary complexity.
- If just connecting to MySQL from backend apps, NGINX is likely overkill.
- Extra latency.
- Adds a hop: ALB → NGINX → app → MySQL.
- This could slightly slow down reads/writes if not designed carefully.
- Scaling not tied to DB load.
- Scaling NGINX does not help with MySQL bottlenecks unless your NGINX is doing significant compute (auth, caching, etc.).
- Maintains state poorly.
- MySQL connections are long-lived and stateful, not ideal for stateless NGINX workers.
- Not resilient to MySQL issues.
- If MySQL becomes slow/unavailable, NGINX becomes a bottleneck or fails with 5xx unless you explicitly handle those errors.