The cloud bill keeps climbing month over month, and nobody can explain exactly why. This is not a rare scenario. In nearly every cloud project we support at EverBright IT, the same question surfaces after about six months: where is the money going, and what can we do about it?
Why Cloud Costs Spiral Out of Control
The pattern repeats itself across organizations of all sizes. A team provisions resources for a new project, picks generous instance sizes (“better too much than too little”), and forgets to adjust the configuration after launch. Test environments pile up, snapshots accumulate, and orphaned resources quietly burn through the budget.
The problem is not the cloud itself. What is missing is a systematic approach to cost management as an ongoing engineering concern. In the on-premises era, the budget was fixed once the hardware was purchased. In the cloud, it is variable, and that makes cost control a continuous responsibility. If you are still early in your cloud journey, our cloud migration strategy guide provides the framework, and it is worth thinking about how your architecture choice (monolith vs. microservices) shapes your cost trajectory from day one.
Three cost drivers we encounter in almost every engagement: oversized instances running at 15-30% average CPU utilization; forgotten resources like unused Elastic IPs, empty load balancers, or orphaned EBS volumes; and a lack of automation for shutting down non-production environments outside business hours.
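As a minimal sketch of how the "forgotten resources" check can be automated: the function below works on data in the shape that the EC2 `describe_volumes` and `describe_addresses` calls return. The function name and the sample records are our own illustration, not part of any AWS API, and the actual API calls are left out so the example runs without credentials.

```python
from typing import Any


def find_orphans(volumes: list[dict[str, Any]],
                 addresses: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Return IDs of unattached EBS volumes and unassociated Elastic IPs."""
    # A volume in state "available" is not attached to any instance.
    unattached = [v["VolumeId"] for v in volumes if v.get("State") == "available"]
    # An address without an AssociationId is not bound to an instance or interface.
    idle_ips = [a["AllocationId"] for a in addresses if "AssociationId" not in a]
    return {"volumes": unattached, "elastic_ips": idle_ips}


# Hypothetical sample data, shaped like the EC2 API responses
sample_volumes = [
    {"VolumeId": "vol-aaa", "State": "in-use"},
    {"VolumeId": "vol-bbb", "State": "available"},
]
sample_addresses = [
    {"AllocationId": "eipalloc-111", "AssociationId": "eipassoc-999"},
    {"AllocationId": "eipalloc-222"},
]

orphans = find_orphans(sample_volumes, sample_addresses)
print(orphans)
```

In a real account, the two lists would come from `boto3.client("ec2")`. Treat the result as a review list rather than a delete list, since an unattached volume may still hold data someone needs.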
Right-Sizing as the First Lever
The fastest path to lower costs runs through right-sizing. The idea is straightforward: match instances and services to actual demand instead of maintaining reserves that never get used.
AWS, Azure, and GCP all offer native tools for this. On AWS, the Cost Explorer combined with Compute Optimizer provides actionable recommendations. A typical example from a client project:
# AWS CLI: fetch Compute Optimizer recommendations for over-provisioned instances
aws compute-optimizer get-ec2-instance-recommendations \
  --filters "name=Finding,values=Overprovisioned" \
  --output table
In one mid-sized company project, consistent right-sizing reduced the monthly EC2 bill by 35%. The effort required was two days of analysis and half a day of implementation. The ratio of effort to savings with right-sizing is almost always excellent.
One important caveat: right-sizing is not a one-time event. Usage patterns shift, new services get added, and configurations drift. A monthly review cycle keeps things current.
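A monthly review can start from something as simple as the sketch below, which flags instances whose average CPU utilization stays under a threshold. The 30% default mirrors the utilization range mentioned above. The function name and sample figures are hypothetical, and CPU alone is a rough signal: memory, network, and peak load deserve a look before anything is downsized.

```python
from statistics import mean


def flag_oversized(cpu_by_instance: dict[str, list[float]],
                   threshold_pct: float = 30.0) -> list[str]:
    """Return instance IDs whose average CPU utilization is below the threshold."""
    return sorted(
        instance_id
        for instance_id, samples in cpu_by_instance.items()
        # Instances without any samples are skipped rather than flagged.
        if samples and mean(samples) < threshold_pct
    )


# Hypothetical daily average CPU utilization (%) per instance
usage = {
    "i-web01": [12.0, 18.5, 15.2],
    "i-db01": [61.0, 58.3, 70.1],
    "i-batch01": [],
}

candidates = flag_oversized(usage)
print(candidates)
```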
Reserved Instances and Savings Plans
For stable workloads running around the clock, Reserved Instances (RIs) and Savings Plans offer significant discounts. On AWS, savings range from 30% to 72% compared to on-demand pricing, depending on term length and payment options.
The mistake many companies make is purchasing RIs too early, before they have a clear picture of actual usage. The recommendation is to wait three to six months of stable operations before committing. Without that data baseline, reservations become guesswork.
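The reason the usage baseline matters can be shown with a simplified break-even calculation. It assumes the commitment is billed for every hour of the term whether or not it is used, and it ignores upfront-payment effects. The function name is our own.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of hours a workload must run for a commitment to beat on-demand.

    Committed cost = (1 - discount) * on_demand_rate * all_hours
    On-demand cost = utilization    * on_demand_rate * all_hours
    Both are equal when utilization = 1 - discount.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


for d in (0.30, 0.50, 0.72):
    print(f"{d:.0%} discount -> break-even at {break_even_utilization(d):.0%} utilization")
```

Under these assumptions, a 30% discount only pays off if the workload actually runs more than 70% of the time. Without several months of data, it is hard to know which side of that line a workload falls on.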
# Example: Terraform config for Savings Plan coverage monitoring
resource "aws_budgets_budget" "savings_plan_coverage" {
  name         = "savings-plan-coverage"
  budget_type  = "SAVINGS_PLANS_COVERAGE"
  limit_amount = "80"
  limit_unit   = "PERCENTAGE"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "LESS_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["cloud-ops@example.com"]
  }
}
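Savings Plans are more flexible than classic RIs. Compute Savings Plans in particular are not tied to a specific instance family or region, whereas EC2 Instance Savings Plans still commit to one instance family in one region. For small and mid-sized businesses (SMBs, or “Mittelstand” in the DACH region) whose workloads are still evolving, the Compute variant is often the smarter choice.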
Automation: Schedule Non-Production Environments
Non-production environments run 24/7 in many organizations, even though they are only used during business hours. That is roughly 65% wasted runtime. With three to five environments, this quickly adds up to a four-figure monthly cost.
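The idle share depends on what counts as business hours. The short calculation below makes the arithmetic explicit. The working patterns of ten and twelve hours per weekday are assumptions chosen for illustration.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def idle_share(hours_per_day: float, days_per_week: int = 5) -> float:
    """Share of the week an always-on environment runs outside working hours."""
    used = hours_per_day * days_per_week
    return 1 - used / HOURS_PER_WEEK


for hours in (10, 12):
    print(f"{hours} h/day, Mon-Fri: {idle_share(hours):.0%} idle")
```

Twelve hours of use on weekdays leaves about 64% of the week idle, and ten hours leaves about 70%, so the estimate above sits at the conservative end of that range.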
The fix is not complicated: scheduled start/stop scripts that shut down development and staging environments in the evening and bring them back in the morning.
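import boto3

# Create the client once so warm Lambda invocations reuse it
ec2 = boto3.client('ec2')


def lambda_handler(event, context):
    # The trigger passes {"action": "stop"} or {"action": "start"}
    action = event.get('action', 'stop')
    if action not in ('stop', 'start'):
        return {'statusCode': 400, 'body': f'Unknown action: {action}'}

    # Select dev/staging instances that are in the state we want to change
    filters = [
        {'Name': 'tag:Environment', 'Values': ['dev', 'staging']},
        {'Name': 'instance-state-name',
         'Values': ['running'] if action == 'stop' else ['stopped']}
    ]

    # Paginate so that accounts with many instances are covered completely
    paginator = ec2.get_paginator('describe_instances')
    instance_ids = [
        i['InstanceId']
        for page in paginator.paginate(Filters=filters)
        for r in page['Reservations']
        for i in r['Instances']
    ]

    if not instance_ids:
        return {'statusCode': 200, 'body': 'No instances to process'}

    if action == 'stop':
        ec2.stop_instances(InstanceIds=instance_ids)
    else:
        ec2.start_instances(InstanceIds=instance_ids)

    verb = 'Stopped' if action == 'stop' else 'Started'
    return {'statusCode': 200, 'body': f'{verb} {len(instance_ids)} instances'}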
This Lambda function can be triggered via EventBridge (formerly CloudWatch Events) as a cron job. Setup takes about half a day, and it pays for itself in the first month.
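One way to wire up the schedule is sketched below. The rule names, the times (stop at 18:00, start at 08:00, Monday to Friday), and the placeholder ARN are assumptions. `put_rule` and `put_targets` are the EventBridge client calls in boto3. The client is passed in as a parameter so the function can be exercised without AWS access.

```python
import json

# Assumed schedule. EventBridge rule cron expressions are evaluated in UTC,
# so shift the hours to match your local business day.
SCHEDULES = {
    "nonprod-stop": ("cron(0 18 ? * MON-FRI *)", {"action": "stop"}),
    "nonprod-start": ("cron(0 8 ? * MON-FRI *)", {"action": "start"}),
}


def apply_schedules(events_client, lambda_arn: str) -> None:
    """Create one EventBridge rule per schedule and point it at the Lambda."""
    for name, (expression, payload) in SCHEDULES.items():
        events_client.put_rule(
            Name=name, ScheduleExpression=expression, State="ENABLED"
        )
        events_client.put_targets(
            Rule=name,
            Targets=[{
                "Id": f"{name}-target",
                "Arn": lambda_arn,
                # The JSON below becomes the `event` argument of the handler.
                "Input": json.dumps(payload),
            }],
        )


# Usage (needs AWS credentials; the ARN is a placeholder):
# import boto3
# apply_schedules(boto3.client("events"),
#                 "arn:aws:lambda:eu-central-1:123456789012:function:nonprod-scheduler")
```

The Lambda function also needs a resource-based permission that allows `events.amazonaws.com` to invoke it, which is not shown here. In practice, the same setup is usually expressed in Terraform or CloudFormation rather than in a script.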
Establishing FinOps as a Practice
Individual measures deliver short-term results, but sustainable cost optimization requires a framework. FinOps, the intersection of Finance and DevOps, provides exactly that: an operating model where engineering teams take ownership of their cloud spend.
In practice, this means every team sees its own costs, understands the drivers, and can implement optimizations independently. The foundation for this is a clean tagging strategy. Without tags, there is no way to attribute which team, project, or environment is generating which costs.
A minimal tagging schema to start with covers three tags: Team (who pays), Environment (dev, staging, prod), and Project (what for). Sounds simple, but in reality 40-60% of resources are missing at least one of these tags. Tools like AWS Tag Policies or Azure Policy enforce compliance automatically.
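To see how far an account is from that schema, a small audit like the following sketch can help. It expects tags in the `[{"Key": ..., "Value": ...}]` shape that EC2 returns. The inventory data and helper name are hypothetical.

```python
REQUIRED_TAGS = {"Team", "Environment", "Project"}


def missing_tags(tags: list[dict[str, str]]) -> set[str]:
    """Return required tag keys that are absent or empty on a resource."""
    present = {t["Key"] for t in tags if t.get("Value", "").strip()}
    return REQUIRED_TAGS - present


# Hypothetical inventory: resource ID -> tag list
inventory = {
    "i-web01": [{"Key": "Team", "Value": "shop"},
                {"Key": "Environment", "Value": "prod"},
                {"Key": "Project", "Value": "checkout"}],
    "i-tmp42": [{"Key": "Team", "Value": "data"}],
    "vol-old7": [],
}

# Keep only resources that are missing at least one required tag
gaps = {
    rid: sorted(missing_tags(tags))
    for rid, tags in inventory.items()
    if missing_tags(tags)
}
compliance = 1 - len(gaps) / len(inventory)

print(gaps)
print(f"{compliance:.0%} of resources fully tagged")
```

A report like this is a useful baseline before policies are switched on, because it shows which teams need to retrofit tags first.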
The cultural shift matters as much as the technical implementation. When cloud costs only surface at the quarterly controlling meeting, the feedback loop is too slow. Weekly cost dashboards per team create transparency and a sense of ownership.
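The data behind such a dashboard can be very simple. The sketch below rolls up daily cost records per ISO week and team, with untagged spend in its own bucket so it stays visible. The record layout and all figures are invented for illustration. In practice, the input would come from a billing export grouped by the Team tag.

```python
from collections import defaultdict
from datetime import date


def weekly_cost_by_team(records: list[dict]) -> dict[tuple[int, int, str], float]:
    """Sum cost per (ISO year, ISO week, team)."""
    totals: dict[tuple[int, int, str], float] = defaultdict(float)
    for r in records:
        iso = r["day"].isocalendar()  # (ISO year, ISO week, weekday)
        team = r.get("team") or "untagged"
        totals[(iso[0], iso[1], team)] += r["cost"]
    return dict(totals)


# Hypothetical daily cost records
records = [
    {"day": date(2024, 3, 4), "team": "shop", "cost": 120.0},
    {"day": date(2024, 3, 6), "team": "shop", "cost": 80.0},
    {"day": date(2024, 3, 6), "team": None, "cost": 45.5},
    {"day": date(2024, 3, 12), "team": "shop", "cost": 95.0},
]

report = weekly_cost_by_team(records)
for (year, week, team), cost in sorted(report.items()):
    print(f"{year}-W{week:02d}  {team:<10} {cost:>8.2f}")
```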
Spot Instances for the Right Workloads
For fault-tolerant workloads like batch processing, CI/CD pipelines, or data analysis, Spot Instances (AWS) and Spot VMs (GCP, the successor to Preemptible VMs) offer savings of up to 90%. The trade-off: these instances can be reclaimed at short notice. AWS, for example, sends an interruption notice only two minutes in advance.
Not every workload qualifies. Databases and stateful services do not belong on Spot Instances. But CI/CD runners, test environments, and data processing jobs benefit enormously. In one project, switching to Spot-based runners cut CI/CD costs by 70% with no impact on build speed.
The key to success with Spot Instances lies in architecture. Applications designed from the start with cloud-native patterns, using container orchestration and automatic failover, can use Spot capacity without issues. For companies planning a cloud migration, it pays to factor this in from day one.
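Handling interruptions gracefully starts with noticing them. On AWS, a Spot instance can poll its instance metadata at `spot/instance-action`: the endpoint answers 404 until an interruption is scheduled, then returns a small JSON document with the action and time. The sketch below only parses that document. The HTTP call is omitted (IMDSv2 requires a session token), and the function name and sample notice are our own.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def seconds_until_interruption(instance_action: Optional[str],
                               now: datetime) -> Optional[float]:
    """Return seconds left before a Spot interruption, or None if none is scheduled.

    `instance_action` is the body of the spot/instance-action metadata
    document; pass None when the endpoint answers 404.
    """
    if not instance_action:
        return None
    doc = json.loads(instance_action)
    deadline = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Hypothetical notice and clock reading
notice = '{"action": "terminate", "time": "2024-03-06T08:22:00Z"}'
now = datetime(2024, 3, 6, 8, 20, 30, tzinfo=timezone.utc)

remaining = seconds_until_interruption(notice, now)
print(remaining)
```

A worker that sees a non-`None` result would stop accepting new jobs, checkpoint or hand back its current one, and exit. Container orchestrators typically provide this draining behavior through their own node termination handlers.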
Conclusion
Optimizing cloud costs is not a one-time project but an ongoing engineering discipline. The five levers discussed here (right-sizing, reservations, automation, FinOps, and Spot usage) complement each other and deliver the greatest impact when combined. Based on our project experience, the first optimization cycle typically yields 20-40% savings without any functional trade-offs.
The first step does not need to be dramatic: an inventory of current resource utilization reveals the biggest savings opportunities within a few hours. If you are looking for support with systematic cloud cost optimization, EverBright IT brings hands-on experience from numerous projects.
Frequently Asked Questions
How much can right-sizing reduce cloud costs?
In one mid-sized client project, consistent right-sizing cut the monthly EC2 bill by 35 percent. Savings of this order are plausible because many instances run at only 15 to 30 percent average CPU utilization: engineers provision generously “just in case.” AWS Compute Optimizer identifies over-provisioned instances automatically. This is a low-effort, high-return optimization; in that project, it took two days of analysis and half a day of implementation.
What is the difference between reserved instances and savings plans?
Reserved Instances commit to a specific instance family and region for a one- or three-year term, offering discounts of 30 to 72 percent. Savings Plans run for the same terms with similar discounts but are more flexible: Compute Savings Plans apply across instance families and regions. For evolving workloads, Savings Plans are preferable because they adapt as your infrastructure changes without wasting purchased commitment.
How do spot instances reduce costs?
Spot Instances cost up to 90 percent less than on-demand pricing but can be reclaimed at short notice. They work well for fault-tolerant workloads like CI/CD runners, batch processing, and test environments. In one project, switching to Spot-based CI runners cut CI/CD costs by 70 percent with no impact on build speed. The precondition is an architecture that handles interruptions gracefully.
What is FinOps and why does it matter?
FinOps is the practice of giving engineering teams ownership over their cloud spend. Every team sees its own costs, understands what drives them, and optimizes independently. Without this visibility, costs only surface at the quarterly review, after the damage is done. Weekly cost dashboards per team shorten the feedback loop and create accountability for budget management.
How much can automation save on non-production environments?
Development and staging environments run 24/7 in many companies but are only used during business hours, which wastes roughly 65 percent of their runtime. Scheduled start/stop scripts that shut them down at 6 PM and restart them at 8 AM reduce compute costs by roughly two-thirds. Attached storage such as EBS volumes is still billed while instances are stopped. Setup takes half a day and typically pays for itself in the first month.