
How One Team Cut Cloud Costs by $120,000 Without Breaking Anything

Cloud cost savings rarely come from one dramatic fix. For one engineering team running a production platform on AWS, the $120,000 they trimmed from their annual bill arrived in small pieces: a leaner codebase here, a resized database cluster there, a staging environment that had quietly ballooned to near-production scale. The story of how they got there is less about any single decision and more about revisiting dozens of decisions that nobody had questioned since they were first made.


  • Cloud cost savings of $120,000 annually came from dozens of small fixes across code, architecture, and infrastructure.
  • Application-level optimizations cut CPU and memory usage by roughly 30%, letting the same workloads run on fewer nodes.
  • Merging tightly coupled microservices reduced running containers, simplified deployments, and cut Kubernetes resource requests.
  • Non-production environments and forgotten snapshots quietly added thousands to the bill until a systematic cleanup began.

Why Cloud Bills Grow the Way They Do

Ask any infrastructure team where their cloud spend went and you’ll rarely get a clean answer. That’s because cloud bills don’t usually spike — they creep. A new microservice here, an extra backup retention policy there, a staging database that gradually accumulated two years of production-like data. None of these are bad decisions in isolation. Combined and left unreviewed, they compound into a problem that’s hard to trace back to any single cause.

This team’s setup was a fairly typical mid-scale architecture: AWS EKS with self-managed EC2 worker nodes, MongoDB Atlas handling NoSQL workloads, AWS RDS for relational data, and Amazon ElastiCache for Redis-based caching. Not exotic. Not obviously wasteful. But the bill had drifted well above what the actual workload justified, and the team set out to find out why, treating the problem with the same rigor they would apply to any other engineering challenge.

What they found wasn’t one bloated service or a single misconfigured instance. It was the cumulative weight of infrastructure that had grown to support a system that no longer looked the way it did when those resources were provisioned. Genuine cloud cost savings, they concluded, would require working across every layer at once: application code, architecture, databases, Kubernetes configuration, storage, backups, and non-production environments.

Start With the Code, Not the Infrastructure

The instinct when facing a high cloud bill is to reach for infrastructure controls — resize instances, adjust autoscaling policies, prune unused resources. That’s valid, but it misses something important: inefficient application code is itself an infrastructure cost driver. If your services are burning twice the CPU they need to, you’re paying for twice the compute capacity to run them. Treating code quality as a cloud cost savings lever is one of the most underappreciated strategies available to engineering teams.

The team audited critical parts of their codebase with this lens. They reduced unnecessary in-memory object creation, replaced inefficient algorithms in hot loops, shifted selected workflows to asynchronous and event-driven processing, and cut down on heavy image-processing operations that were running synchronously where they didn’t need to. The aggregate result was roughly a 30% reduction in CPU and memory consumption across the platform.
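The source doesn’t show the team’s actual code, but a classic example of an inefficient algorithm in a hot loop is a membership test against a list. The sketch below is hypothetical (the function names and order-ID data are illustrative only): the slow version rescans the list on every iteration, while the fast version builds a set once, outside the loop, and reuses it.

```python
from typing import Iterable


def find_flagged_slow(order_ids: list[str], flagged_ids: list[str]) -> list[str]:
    # Each `in` check scans the whole list: O(len(order_ids) * len(flagged_ids)).
    return [oid for oid in order_ids if oid in flagged_ids]


def find_flagged_fast(order_ids: Iterable[str], flagged_ids: Iterable[str]) -> list[str]:
    # Build the lookup set once; each membership check is then O(1) on average.
    flagged = set(flagged_ids)
    return [oid for oid in order_ids if oid in flagged]


# Hypothetical data: 10,000 orders, every 1,000th one flagged.
orders = [f"order-{i}" for i in range(10_000)]
flagged = [f"order-{i}" for i in range(0, 10_000, 1_000)]
assert find_flagged_slow(orders, flagged) == find_flagged_fast(orders, flagged)
print(len(find_flagged_fast(orders, flagged)))  # prints 10
```

Both functions return the same result; the difference is purely in how much CPU each call burns, which is exactly the kind of saving that shows up later as fewer nodes.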

On AWS EKS, that translated directly into better pod density — more workloads fitting onto the same EC2 nodes — which meant fewer nodes were needed overall. This is the kind of cloud cost savings that compounds: every percentage point you shave off resource usage at the application layer shrinks the infrastructure footprint required to host it. Better code is cheaper infrastructure. It’s that direct.
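To make the pod-density argument concrete, here is a back-of-the-envelope sketch. Every number in it is an assumption chosen for illustration, not the team’s figures: 120 identical pods, nodes with 3800m CPU and 14000Mi memory allocatable, and per-pod requests of 500m CPU and 1024Mi memory before a 30% reduction. It ignores DaemonSet overhead, mixed pod shapes, and scheduling headroom, so treat the result as a lower bound rather than a capacity plan.

```python
import math


def reduce_by_30_percent(value: int) -> int:
    """Scale a resource request down by 30%, rounding up to a whole unit."""
    return math.ceil(value * 70 / 100)


def nodes_needed(pods: int, cpu_per_pod_m: int, mem_per_pod_mi: int,
                 node_cpu_m: int, node_mem_mi: int) -> int:
    """Minimum node count if every pod has identical requests.

    Pods per node is limited by whichever resource runs out first (CPU or memory).
    """
    pods_per_node = min(node_cpu_m // cpu_per_pod_m, node_mem_mi // mem_per_pod_mi)
    return math.ceil(pods / pods_per_node)


# Assumed figures, for illustration only.
PODS, NODE_CPU_M, NODE_MEM_MI = 120, 3800, 14000
before = nodes_needed(PODS, 500, 1024, NODE_CPU_M, NODE_MEM_MI)
after = nodes_needed(PODS, reduce_by_30_percent(500), reduce_by_30_percent(1024),
                     NODE_CPU_M, NODE_MEM_MI)
print(before, after)  # prints 18 12
```

With these inputs, CPU is the binding constraint in both cases (7 vs. 13 pods per node before, 10 vs. 19 after), so in this particular scenario the memory savings alone would not have freed a single node. That is why it pays to check which resource actually limits packing before optimizing.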

The Hidden Cost of Too Many Microservices

Microservices architecture has genuine advantages. Independent scaling, team autonomy, fault isolation, flexible deployment cadences — there are real reasons the industry moved in this direction. But microservices also carry real costs, and those costs multiply when services are too granular, too tightly coupled, or no longer independently useful.

Each running service needs CPU and memory allocations, log streams, monitoring agents, network routing, Kubernetes resource requests, and operational attention. When you have services that always deploy together, share the same traffic patterns, and are so tightly coupled that changing one requires changing the other, you’re paying the overhead of distribution without getting the benefits.

The team identified a set of services in exactly this situation — low individual traffic, high coupling, always released as a unit. They merged them. The result was fewer running containers, less inter-service network chatter, simpler deployments, and reduced Kubernetes resource overhead. It also made the system easier to reason about and operate, which is a benefit that doesn’t show up directly on a cloud bill but absolutely affects engineering time and reliability over the long run. The cloud cost savings from this consolidation were immediate and measurable.
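Why does merging help even when the total work stays the same? Because every pod pays a fixed overhead (runtime baseline, sidecars or agents) regardless of how much traffic it serves. The sketch below models that with hypothetical numbers: three tightly coupled services, two replicas each, and an assumed per-pod baseline of 100m CPU / 256Mi plus a 50m / 64Mi sidecar. The merge rule (sum each service’s own requests, keep the largest replica count) is a deliberately conservative assumption, not the team’s method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    replicas: int
    app_cpu_m: int   # CPU request for the service's own work, in millicores
    app_mem_mi: int  # memory request for the service's own work, in MiB


# Hypothetical fixed costs paid by every pod: runtime baseline plus one sidecar agent.
BASELINE_CPU_M, BASELINE_MEM_MI = 100, 256
SIDECAR_CPU_M, SIDECAR_MEM_MI = 50, 64


def total_requests(services: list[Service]) -> tuple[int, int]:
    """Sum CPU (m) and memory (Mi) requests across all pods of all services."""
    cpu = mem = 0
    for s in services:
        cpu += s.replicas * (s.app_cpu_m + BASELINE_CPU_M + SIDECAR_CPU_M)
        mem += s.replicas * (s.app_mem_mi + BASELINE_MEM_MI + SIDECAR_MEM_MI)
    return cpu, mem


def merge(name: str, services: list[Service]) -> Service:
    """Fold coupled services into one deployment: their own work adds up,
    but the per-pod overhead is paid only once per merged pod."""
    return Service(
        name=name,
        replicas=max(s.replicas for s in services),  # conservative: busiest service wins
        app_cpu_m=sum(s.app_cpu_m for s in services),
        app_mem_mi=sum(s.app_mem_mi for s in services),
    )


team = [Service("a", 2, 150, 256), Service("b", 2, 150, 256), Service("c", 2, 150, 256)]
print(total_requests(team))                  # prints (1800, 3456)
print(total_requests([merge("abc", team)]))  # prints (1200, 2176)
```

In this toy model, six pods become two and total requests drop by a third, even though the services’ own work is unchanged; the entire saving comes from overhead that was being paid once per pod.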

This is a tension the broader industry is actively grappling with. The pendulum has been swinging back toward what some engineers are calling “modular monoliths” — architectures that preserve code modularity without forcing every boundary into a network call. For teams running on Kubernetes, the cost argument for consolidation is increasingly hard to ignore.

Cloud Cost Savings Inside MongoDB Atlas

Database costs are easy to overlook in a cloud bill breakdown because they tend to grow gradually and feel like fixed infrastructure. But MongoDB Atlas clusters sized for peak traffic and left alone can quietly consume a disproportionate share of monthly spend — especially when write patterns are inefficient or environments carry more data than they actually use.

The team made two meaningful changes here. First, they disabled multi-write behavior on clusters where the business logic didn’t actually require it. Multi-region, multi-write MongoDB configurations are powerful and genuinely necessary for certain workloads — but they’re expensive, and running them where they’re not needed is pure waste. Importantly, the team validated the impact carefully before making the change, confirming that reliability and data consistency were unaffected.

Second, they implemented autoscaling on their Atlas clusters. Rather than provisioning permanently for peak demand, autoscaling let the database layer flex with actual usage patterns. This is one of the more underused features in Atlas — many teams provision statically out of caution and never revisit it. The cloud cost savings from letting capacity track real load, rather than theoretical maximums, can be substantial.
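The gap between sizing for peak and letting capacity follow load is easy to underestimate. Below is an idealized model with entirely hypothetical tiers (abstract capacity and cost units, not real Atlas tier sizes or prices) and an assumed 25% headroom. Real Atlas autoscaling reacts to sustained utilization rather than switching tiers hour by hour, so actual savings will be smaller than this model suggests.

```python
# Hypothetical tiers: (name, capacity in abstract load units, cost per hour in abstract units).
# These are NOT real Atlas tier sizes or prices.
TIERS = [("small", 100, 1.0), ("medium", 200, 2.0), ("large", 400, 4.0)]
HEADROOM = 0.25  # keep 25% spare capacity above observed load


def smallest_tier_for(load: float) -> tuple[str, int, float]:
    """Cheapest tier whose capacity covers the load plus headroom (largest tier if none does)."""
    needed = load * (1 + HEADROOM)
    for tier in TIERS:
        if tier[1] >= needed:
            return tier
    return TIERS[-1]


def daily_cost(hourly_load: list[float], autoscale: bool) -> float:
    """Static sizing pays for the peak tier every hour;
    idealized autoscaling pays only for the tier each hour needs."""
    if not autoscale:
        peak_tier = smallest_tier_for(max(hourly_load))
        return peak_tier[2] * len(hourly_load)
    return sum(smallest_tier_for(load)[2] for load in hourly_load)


# Assumed profile: 16 quiet hours and an 8-hour busy window.
load = [60.0] * 16 + [280.0] * 8
print(daily_cost(load, autoscale=False))  # prints 96.0
print(daily_cost(load, autoscale=True))   # prints 48.0
```

The point of the model is the shape, not the numbers: the more time a workload spends well below its peak, the more static peak provisioning costs relative to capacity that tracks real load.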

Staging Environments: The Bill You Forgot About

Non-production environments are a reliable source of silent overspend. Staging gets set up to mirror production closely — which makes sense for accurate testing — and then it never gets downsized as the production system grows. Over time, staging inherits production’s scale without production’s justification for it.

This team’s staging environment had accumulated far more data than testing actually required. The infrastructure around it had crept toward production sizing. Cleaning it up meant trimming the dataset to what was genuinely representative for test purposes, and rightsizing the compute, database, and backup infrastructure accordingly.

The goal wasn’t to make staging useless — you still need enough realistic data to catch real bugs. But you don’t need full production-scale capacity running 24/7 to achieve that. The resulting cloud cost savings touched database storage, compute, backup retention, and supporting services all at once. For many teams, non-production environments represent some of the fastest available cloud cost savings precisely because they’ve gone unexamined the longest.
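The article doesn’t say how the team decided which staging data to keep. One common approach, offered here as an assumption rather than their method, is stratified sampling: cap each category of record at a fixed count, so large categories shrink while rare edge cases survive intact. A uniform 1% sample of the hypothetical accounts below would most likely drop every suspended account, which is exactly the kind of case that tends to expose bugs. The `plan` field and the record shapes are illustrative.

```python
import random
from collections import defaultdict
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def stratified_sample(records: Iterable[T], key: Callable[[T], str],
                      per_stratum: int, seed: int = 42) -> list[T]:
    """Keep at most `per_stratum` records from each group defined by `key`.

    Small groups are kept whole; large groups are randomly downsampled.
    """
    groups: dict[str, list[T]] = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)

    rng = random.Random(seed)  # fixed seed so the staging dataset is reproducible
    sample: list[T] = []
    for name in sorted(groups):
        items = groups[name]
        if len(items) <= per_stratum:
            sample.extend(items)
        else:
            sample.extend(rng.sample(items, per_stratum))
    return sample


# Hypothetical production-like data: many standard accounts, few edge cases.
accounts = (
    [{"id": i, "plan": "standard"} for i in range(10_000)]
    + [{"id": 10_000 + i, "plan": "enterprise"} for i in range(50)]
    + [{"id": 20_000 + i, "plan": "suspended"} for i in range(5)]
)
staging = stratified_sample(accounts, key=lambda a: a["plan"], per_stratum=100)
print(len(staging))  # prints 155
```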

AWS RDS, Storage Cleanup, and Everything Else That Adds Up

The team also reviewed their AWS RDS relational databases, upgrading engines to current supported versions. That’s primarily a security and maintainability move, but newer database engine versions often bring meaningful performance improvements — better query planning, more efficient memory management, indexing improvements — that translate into reduced resource consumption at the same workload level.

Beyond that, backup retention policies were tightened, database sizing was validated against actual usage rather than original estimates, and a systematic cleanup of accumulated storage waste was carried out. Old container images in the registry, forgotten snapshots, “temporary” resources that had long since stopped being temporary — none of these individually make a dent in a six-figure bill, but together they absolutely do. This is where real cloud cost savings often hide: not in dramatic architecture changes, but in the unglamorous work of deleting things that should have been deleted months ago.
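A snapshot cleanup usually starts as a report, not a deletion. The sketch below flags snapshots older than a retention window unless they carry an opt-out tag. It assumes records shaped like the entries EC2’s DescribeSnapshots returns (`SnapshotId`, a timezone-aware `StartTime`, and a `Tags` list of `Key`/`Value` pairs), which in practice you might fetch with boto3’s `describe_snapshots(OwnerIds=["self"])`. The `retain=true` tag convention and the 90-day window are hypothetical choices, not the team’s policy.

```python
from datetime import datetime, timedelta, timezone


def stale_snapshots(snapshots: list[dict], max_age_days: int, now: datetime,
                    keep_tag: str = "retain") -> list[str]:
    """Return IDs of snapshots older than `max_age_days` that are not tagged for retention."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for snap in snapshots:
        tags = {t["Key"]: t["Value"] for t in snap.get("Tags", [])}
        # Hypothetical convention: a "retain=true" tag opts a snapshot out of cleanup.
        if snap["StartTime"] < cutoff and tags.get(keep_tag) != "true":
            stale.append(snap["SnapshotId"])
    return stale


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
example = [
    {"SnapshotId": "snap-a", "StartTime": now - timedelta(days=200)},
    {"SnapshotId": "snap-b", "StartTime": now - timedelta(days=200),
     "Tags": [{"Key": "retain", "Value": "true"}]},
    {"SnapshotId": "snap-c", "StartTime": now - timedelta(days=10)},
]
print(stale_snapshots(example, max_age_days=90, now=now))  # prints ['snap-a']
```

Keeping the filter a pure function makes it easy to run as a dry-run report, review the list, and only then wire it up to an actual delete call.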

What This Actually Tells Us About Cloud Spending

The $120,000 figure is striking, but the more useful takeaway is the pattern. Cloud bills grow through accumulation, not through single catastrophic decisions. The remedies are almost always distributed in the same way — spread across application code, architecture choices, database configuration, storage hygiene, and environment management.

Teams that treat cloud cost as purely an infrastructure problem will keep missing the application-layer contributions. Teams that focus only on compute will keep paying for oversized databases. And teams that review production carefully while ignoring staging will keep throwing money at environments that exist primarily to test a handful of scenarios a day.

As cloud-native architectures mature and infrastructure costs become a more visible line item on engineering budgets, the expectation that developers understand and own the cost implications of their code is only going to grow. FinOps as a discipline has been formalizing this for several years: the idea that cloud spend is an engineering responsibility, not just a finance one. Stories like this one show why.

Source: https://dev.to/aws-builders/we-cut-120000-from-our-cloud-bill-without-sacrificing-reliability-1p61

Yasir Khursheed
https://www.squaredtech.co/
Meet Yasir Khursheed, a VP Solutions expert in Digital Transformation, boosting revenue with tech innovations. A tech enthusiast driving digital success globally.