Hi @drsparrow,
That’s a great point. Redshift isn’t always the cheapest option, and it’s good to question that assumption. In this case, though, Redshift is being used in a targeted and cost-efficient way, not as a complete data warehouse for everything, but as a fast analytics layer for just the “hot” data.
Here’s the reasoning behind why this setup is still considered cost-effective:
Hot vs. Warm vs. Cold Data Strategy:
The company’s sales team runs frequent queries on data from the past three months. Keeping this small, active dataset in Amazon Redshift allows them to get sub-second query performance and handle complex analytical queries efficiently.
Meanwhile, data from months 3–6 remains in Amazon S3, where Redshift Spectrum can query it directly, for example during the semi-annual reporting. This hybrid model minimizes Redshift storage costs while giving users a seamless query experience.
Optimized Use of Redshift Resources:
Only the most recent three months of data are physically stored in Redshift tables. Spectrum queries the external S3 data in place, without requiring a full data load, which means:
- No unnecessary duplication of data.
- You pay only for the Redshift compute you actually use.
- Spectrum’s pay-per-scan model keeps costs low for occasional queries.
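As a rough sketch of what this looks like in practice (all names here — the schema, database, IAM role ARN, and table names — are hypothetical placeholders, not values from the scenario), the hot data lives in a local Redshift table while the warm S3 data is exposed through a Spectrum external schema, and one query can span both:

```python
# Hypothetical sketch: exposing warm S3 data to Redshift via Spectrum.
# The schema, database, role ARN, and table names are illustrative only.

# One-time setup: an external schema backed by the AWS Glue Data Catalog.
CREATE_EXTERNAL_SCHEMA = """
CREATE EXTERNAL SCHEMA spectrum_sales
FROM DATA CATALOG
DATABASE 'salesdb'
IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole';
"""

# A single query can then combine hot (local) and warm (external) rows.
# Spectrum scans only the S3 data this query touches, which is what the
# pay-per-scan pricing applies to.
SEMI_ANNUAL_REPORT = """
SELECT order_month, SUM(amount) AS total_sales
FROM (
    SELECT order_month, amount FROM public.sales_hot      -- last 3 months, in Redshift
    UNION ALL
    SELECT order_month, amount FROM spectrum_sales.sales  -- months 3-6, in S3
) AS all_sales
GROUP BY order_month
ORDER BY order_month;
"""
```

The sales team keeps querying one endpoint; whether a given row comes from Redshift-managed storage or S3 is invisible to them.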
Automated Cost Control via Lifecycle Policies:
Data older than six months is automatically moved to S3 Glacier Deep Archive using an S3 Lifecycle policy. This satisfies retention requirements while keeping storage costs at their absolute minimum. Glacier Deep Archive is ideal here since there’s no plan to access that data again.
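For illustration, the Lifecycle rule described above could look roughly like this (the bucket name and prefix are hypothetical, and 180 days stands in for “six months”):

```python
# Hypothetical sketch of the S3 Lifecycle rule: after ~6 months (180 days),
# objects under the prefix transition to Glacier Deep Archive automatically.
lifecycle_rule = {
    "ID": "archive-after-6-months",
    "Status": "Enabled",
    "Filter": {"Prefix": "sales/"},   # placeholder prefix
    "Transitions": [
        {"Days": 180, "StorageClass": "DEEP_ARCHIVE"}
    ],
}

# Applying it with boto3 would look like this (needs AWS credentials,
# so it is left commented out here):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="example-data-lake",   # placeholder bucket
#     LifecycleConfiguration={"Rules": [lifecycle_rule]},
# )
```

Once the rule is in place, no manual archiving step is needed; S3 enforces the tiering on its own.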
Why not other approaches?
Keeping all data in Redshift would be overkill and costly. Using S3 Glacier Instant Retrieval or Flexible Retrieval for the 3–6 month data would make the semi-annual report harder and more expensive: Flexible Retrieval objects must be restored before they can be queried, and Instant Retrieval adds per-GB retrieval charges for data that is still queried. And while tools like QuickSight can query S3 directly, they don’t replace the performance and flexibility of Redshift + Spectrum for complex analytical workloads.
In short, this architecture uses Redshift only where it adds real value (fast analysis of recent data) while leveraging S3 and Glacier for cheap, long-term storage. That combination makes it both high-performance and cost-effective.
I hope this provides a clearer picture of why Redshift is still the most cost-effective choice in this scenario. If you have further questions, please don’t hesitate to contact us.
Regards,
Nikee @ Tutorials Dojo