Amazon Simple Storage Service (Amazon S3) is a storage service designed for the internet. In this article, we'll walk through the strategies you can use to reduce Amazon S3 costs.
First, let’s review the factors that affect your Amazon S3 monthly costs: the amount of data stored (priced per GB, and varying by storage class), the number of requests and retrievals, and the data transferred out of S3.
One of the most important cost factors is the storage class. Make sure you understand the different classes available and their use cases. Let’s quickly review and compare them.
Amazon S3 offers 6 different storage classes.
Keep in mind that every S3 object can be assigned a specific storage
class. Thus a bucket might have objects with different classes simultaneously.
S3 Standard class is typically used for frequently accessed data. Although the cost per GB stored is high, request costs are low and there are no retrieval fees. Therefore this storage class is best suited for objects read or written several times each month.
The next class is S3 Standard-IA. It has a lower storage price, but a higher access cost. According to AWS, it should be used for long-lived but infrequently accessed data that still needs instant access.
As a rule of thumb, S3 Standard-IA should be used if the object is accessed on average less than once a month. Why one month? Because that’s the access frequency at which S3 Standard and S3 Standard-IA have roughly the same overall cost. And it’s also the minimum recommended amount of time to keep objects in the S3 Standard-IA class: objects deleted or transitioned before 30 days are still charged for the remaining days of that minimum period.
Usually, it’s difficult to know how often an object will be accessed. AWS created S3 Intelligent-Tiering to address this issue. This class automatically moves data between frequent-access and infrequent-access tiers, which minimizes the storage cost for the object. If you keep an object for more than 30 days and its access pattern is unknown, S3 Intelligent-Tiering will usually be cheaper than S3 Standard or S3 Standard-IA. This should be your first option in those cases.
S3 One Zone-IA class is similar to S3 Standard-IA. But, instead of storing data in 3 (or more) AZs, data is stored in only one AZ. For this reason, data could be unavailable if the AZ fails. You should use S3 One Zone-IA only if you can tolerate this risk.
The last 2 classes are S3 Glacier and S3 Glacier Deep Archive. They have the lowest cost per GB. But the access cost is high. Therefore they are used for archiving purposes. They replace the tape libraries used on-premises.
You should keep in mind that Glacier objects aren’t immediately available. If you want to access the contents of an object in any Glacier class, you will have to wait until retrieval ends. For Bulk retrieval mode, this time is between 5 and 12 hours. And other retrieval modes are faster but more expensive. For this reason, Glacier should only be used for objects that are rarely accessed. For example, Glacier is ideal for backups, archiving, and any long-term infrequently accessed data.
The difference between S3 Glacier and S3 Glacier Deep Archive is that the latter is for even less frequently accessed objects. For example, it’s recommended for objects accessed every 6 (or more) months. S3 Glacier Deep Archive storage costs are lower, but the object needs to be stored for at least 180 days in that class. Otherwise, that minimum period will be charged anyway.
We have just described the main characteristics of each S3 class, and
the suggested use cases. So now we can start optimizing them.
Below we talk through the main strategies to reduce AWS S3 costs.
Your first step is to analyze the access patterns for your data.
Start thinking about the intended usage for each new object to be
created in S3. Each object in S3 should have a specific access pattern.
And therefore there is an S3 class that works best for it.
The right class should be applied to all new objects in Amazon S3. It’s not possible to define the default class per bucket in S3. But you can assign it per object.
Start defining the best class for each new object in S3. And set this class in the operation that uploads the object to Amazon S3. This can be done using the AWS CLI, AWS Console, or an AWS SDK. As a consequence, each new object will have the right class from the start. This could be the best money-saving strategy in the long term, and probably the most time-efficient one.
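For example, here is a minimal sketch using the AWS SDK for Python (boto3) that uploads a new object directly into the S3 Standard-IA class. The bucket name, key, and file name are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Upload the object and assign its storage class in the same request,
# so it never spends time in the (more expensive) default S3 Standard class.
with open("summary.csv", "rb") as data:
    s3.put_object(
        Bucket="my-example-bucket",      # hypothetical bucket name
        Key="reports/2021/summary.csv",  # hypothetical object key
        Body=data,
        StorageClass="STANDARD_IA",
    )
```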
Now that you have already set the right class for new (to be created)
objects, you can focus on the already-created objects. The process is similar to the one described in the previous point. Start analyzing data
access patterns for every existing object in your AWS account. Then decide the best class for each one. And finally, assign that class in the object configuration. This will allow you to optimize every S3 bucket, and thus reduce your S3 costs.
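As a sketch (bucket and key are hypothetical), one way to change an existing object's class with boto3 is to copy the object onto itself with a new StorageClass:

```python
import boto3

s3 = boto3.client("s3")

bucket = "my-example-bucket"    # hypothetical bucket name
key = "archive/old-report.csv"  # hypothetical object key

# Copy the object onto itself, changing only the storage class.
s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={"Bucket": bucket, "Key": key},
    StorageClass="STANDARD_IA",
    MetadataDirective="COPY",  # keep the existing metadata
)
```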
How can you check if this worked? You can use AWS Cost Explorer to review your daily S3 cost. You will also notice the cost reduction in next month’s bill, since AWS bills show the consumption for each service, including Amazon S3.
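Cost Explorer also has an API. A rough boto3 sketch of the same daily check could look like this (the date range is just an example):

```python
import boto3

ce = boto3.client("ce")

# Daily unblended cost for Amazon S3 over an example date range.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2021-06-01", "End": "2021-07-01"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE",
                           "Values": ["Amazon Simple Storage Service"]}},
)

for day in response["ResultsByTime"]:
    print(day["TimePeriod"]["Start"], day["Total"]["UnblendedCost"]["Amount"])
```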
Consider that it can be time-consuming to update every object’s class after it’s created. That’s why it’s very important to set classes before objects are created (as previously described).
Note also that this process consists of an object-by-object (or bucket-by-bucket) revision. And, depending on the number of objects that you have, it could take a considerable amount of time. It’s probably better to focus on big (or very frequently accessed) objects first, and update their storage classes before the rest.
You might also use S3 Storage Class Analysis. This is a tool that analyzes the access patterns of S3 objects. It monitors the objects within a bucket, and it shows the amount of data stored in the bucket, the amount of data retrieved, and how frequently data is accessed (based on object age). Note that there is a small charge for using this tool. But it allows you to understand whether the objects are accessed often. After you understand the access pattern, you can update the S3 storage class accordingly.
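Storage Class Analysis can be enabled from the console, or with a call like this boto3 sketch (the bucket names and configuration ID are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Enable Storage Class Analysis for a bucket and export the daily report
# to another bucket as CSV.
s3.put_bucket_analytics_configuration(
    Bucket="my-example-bucket",  # bucket to analyze (hypothetical)
    Id="whole-bucket-analysis",
    AnalyticsConfiguration={
        "Id": "whole-bucket-analysis",
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::my-reports-bucket",  # hypothetical
                        "Prefix": "analysis/",
                    }
                },
            }
        },
    },
)
```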
For example, if you find out that most objects in a bucket are accessed only once per year (and you don’t need immediate access), then you should adjust their storage class to S3 Glacier Deep Archive.
You have probably noticed that you pay for the amount of data stored on S3. So if you remove unused objects, you will also reduce S3 costs.
How to check the contents of your S3 buckets? There are several ways.
For example, you can list the objects in each bucket. This will show object names (keys) and sizes without downloading the objects’ contents. This can be done using the AWS Console, AWS CLI, or an SDK.
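For example, with boto3 you could list every object together with its size and storage class (the bucket name is hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Paginate through all objects in the bucket without downloading their contents.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-example-bucket"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"], obj["StorageClass"])
```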
Another option to check S3 buckets’ contents is using CloudWatch Metrics. Use the BucketSizeBytes metric to get the total size of a bucket, or the NumberOfObjects metric to get the number of objects stored in it. These are per-bucket metrics, and they will show you how big the buckets are. Then you can start removing any unused objects in the biggest buckets.
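As a sketch, the BucketSizeBytes metric can be read with boto3 like this (the bucket name is hypothetical; note that S3 publishes these storage metrics once per day):

```python
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

# Average size (in bytes) of the S3 Standard storage in a bucket.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": "my-example-bucket"},  # hypothetical
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    StartTime=datetime.utcnow() - timedelta(days=2),
    EndTime=datetime.utcnow(),
    Period=86400,          # one data point per day
    Statistics=["Average"],
)

for point in response["Datapoints"]:
    print(point["Timestamp"], point["Average"])
```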
You can also activate S3 Inventory on a bucket. This tool prepares a CSV (or Apache ORC) file that lists all objects in the bucket, and it’s delivered to another S3 bucket on a daily or weekly basis. This is a good approach when you have thousands of objects in a bucket and you want to quickly find some of their properties (like size, encryption status, or last modified time). Note that S3 Inventory has a small cost while it is active.
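S3 Inventory can also be enabled programmatically. A boto3 sketch (all names are hypothetical) could look like this:

```python
import boto3

s3 = boto3.client("s3")

# Deliver a weekly CSV inventory of the bucket to a separate reports bucket.
s3.put_bucket_inventory_configuration(
    Bucket="my-example-bucket",  # hypothetical
    Id="weekly-inventory",
    InventoryConfiguration={
        "Id": "weekly-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Weekly"},
        "OptionalFields": ["Size", "LastModifiedDate",
                           "StorageClass", "EncryptionStatus"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": "arn:aws:s3:::my-reports-bucket",  # hypothetical
                "Format": "CSV",
                "Prefix": "inventory/",
            }
        },
    },
)
```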
Amazon S3 offers a tool to automatically change the storage class of any object. For example, you can transition objects from the S3 Standard class to S3 Glacier a number of days after creation. Therefore you can transition each object to the most suitable storage class over time, and this will translate into a cost reduction.
How does S3 Lifecycle Management work? You set rules for each bucket. Each rule has a transition period, counted in days since the object was created (or, for versioned objects, since it became a previous version). And the rule also sets the storage class to transition into after this period. Note that you can always transition objects to a longer-term storage class, but you can't transition back to a shorter-term one.
You can also set a lifecycle rule for a whole bucket, or based on a
prefix. So you don’t need to transition your objects one-by-one. S3
Lifecycle Management is one of the most useful tools to save costs on
S3. And you should always consider using it.
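As a minimal sketch (the bucket name and prefix are hypothetical), a lifecycle rule that transitions objects to cheaper classes can be created with boto3 like this. Keep in mind that this call replaces the bucket's whole lifecycle configuration:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # hypothetical
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-reports",
                "Status": "Enabled",
                "Filter": {"Prefix": "reports/"},  # hypothetical prefix
                "Transitions": [
                    # 30 days after creation, move to S3 Standard-IA...
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # ...and after a year, move to S3 Glacier Deep Archive.
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```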
This is another strategy to remove unused objects. Amazon S3 Lifecycle Management can also set an expiration policy. This allows you to expire an object some days after its creation. Every expired object will be automatically removed by AWS.
If you keep log files (or any other temporary data) as S3 objects,
you should set an expiration for them. For example, you can set log objects to expire 30 days after creation. And they will be removed automatically.
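Following the same pattern as the lifecycle sketch above, an expiration rule for log objects could look like this (the prefix is hypothetical); it would be added to the Rules list of the same configuration:

```python
# Expire (automatically delete) log objects 30 days after creation.
expire_logs_rule = {
    "ID": "expire-old-logs",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},  # hypothetical prefix
    "Expiration": {"Days": 30},
}
```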
Amazon S3 uploads big objects using multipart upload. AWS divides a big file into smaller fragments, and each one is uploaded independently to S3. Then AWS joins the uploaded parts into the final object. AWS recommends using multipart upload for objects larger than 100 MB, and it's required for objects over 5 GB (the maximum object size in S3 is 5 TB).
It can take some time to upload big objects, and this upload process might be interrupted. As a consequence, the destination bucket will keep some unused fragments. To remove them, you can set a new lifecycle policy. Policies have a "clean up incomplete multipart uploads" setting to expire these partial uploads. Removing these fragments will free space on S3, and thus reduce costs.
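This cleanup can also be expressed as a lifecycle rule. A sketch of such a rule (to be added to the same Rules list as before) could be:

```python
# Remove the parts of multipart uploads that were started
# but never completed within 7 days.
abort_incomplete_uploads_rule = {
    "ID": "clean-up-incomplete-multipart-uploads",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},  # apply to the whole bucket
    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
}
```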
You can also compress files before uploading them to S3. You just create a compressed file (e.g. ZIP, GZIP, or equivalent), which will be smaller than the original, and then you upload the compressed file to S3. The amount of data stored in S3 will be lower, and so Amazon S3 costs will be reduced.
Note that, to get the original file, you will have to download it and
also decompress it. But you could save a lot of space in S3 (for example when using text files).
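A small sketch of this idea in Python (the file and bucket names are hypothetical):

```python
import gzip
import shutil
import boto3

# Compress the file locally (GZIP is just one option) before uploading it.
with open("app.log", "rb") as source, gzip.open("app.log.gz", "wb") as target:
    shutil.copyfileobj(source, target)

s3 = boto3.client("s3")
s3.upload_file(
    "app.log.gz",
    "my-example-bucket",   # hypothetical bucket
    "logs/app.log.gz",
    ExtraArgs={"ContentEncoding": "gzip"},  # so clients know how to decode it
)
```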
Remember that you also pay for the number of operations done in
Amazon S3. If you have to download many S3 objects simultaneously, it might be a good idea to pack them into one big object (e.g. TAR, ZIP, or equivalent).
Some storage classes have minimum capacity charges per object. For example, the minimum capacity charge per object for S3 Standard-IA and S3 One Zone-IA is 128 KB. And the minimum capacity charge per object for the S3 Glacier and S3 Glacier Deep Archive classes is 40 KB.
For this reason, a small 1 KB object (in the S3 Standard-IA class) will be charged as 128 KB. Packing multiple small files together helps you avoid these minimum capacity charges. So if you pack small objects together, you will also reduce your S3 costs.
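As a sketch (the file and bucket names are hypothetical), many small files can be packed into a single archive and uploaded as one object:

```python
import tarfile
import boto3

# Pack several small files into one compressed archive...
with tarfile.open("events-batch-0001.tar.gz", "w:gz") as archive:
    for name in ["event-001.json", "event-002.json", "event-003.json"]:
        archive.add(name)

# ...and upload a single (larger) object instead of many tiny ones.
boto3.client("s3").upload_file(
    "events-batch-0001.tar.gz",
    "my-example-bucket",  # hypothetical
    "batches/events-batch-0001.tar.gz",
)
```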
S3 Object versioning is a very useful tool. Every time you change the
contents of an object, AWS will keep the previous version of it. But if
you have a 1 MB object with 100 versions, then you will be paying for
100 MB of storage.
But you can use lifecycle policies to automatically delete previous versions after some time. For example, you can set a policy to delete previous versions 30 days after they become noncurrent. This will limit the number of versions stored and lower the storage used. This is another approach to increase your savings.
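A sketch of such a rule (added to the same lifecycle Rules list as before):

```python
# Delete previous object versions 30 days after they become noncurrent.
expire_old_versions_rule = {
    "ID": "expire-noncurrent-versions",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},  # whole bucket
    "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
}
```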
If you can wait some hours to retrieve objects from Glacier, you can save money. So try to use Bulk retrieval mode if possible. You can choose the retrieval mode when you request the retrieval.
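As a sketch, a Glacier restore with the cheapest (Bulk) retrieval tier looks like this in boto3 (the bucket and key are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to restore a Glacier object using the Bulk tier
# and keep the restored copy available for 7 days.
s3.restore_object(
    Bucket="my-example-bucket",        # hypothetical
    Key="archive/backup-2019.tar.gz",  # hypothetical
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Bulk"},
    },
)
```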
Some applications store tables as Amazon S3 objects. These tables have a specific format (like JSON, CSV, or Apache Parquet). To query the data, you have to download the whole file, and then scan the whole table to find the desired data inside it.
But there are more efficient ways to get the contents. AWS offers tools like Amazon Athena, S3 Select, or Amazon Redshift Spectrum.
These tools allow you to perform queries directly on the cloud. They process the data using SQL commands. And then they send you the data you need.
These tools offer many advantages. As the queries are completed on the cloud, you will need less processing power locally. Another benefit is that you download less data from Amazon S3. This makes the process faster and cheaper. Remember that you are charged by the amount of data downloaded from S3. If less data is requested, then you will save money on bandwidth.
Note that S3 queries have a small additional cost. You should evaluate whether you will get a cost optimization.
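For example, here is a minimal sketch of S3 Select with boto3 that downloads only the matching rows from a CSV object (the names and the SQL expression are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Run the query on the S3 side and download only the matching rows.
response = s3.select_object_content(
    Bucket="my-example-bucket",  # hypothetical
    Key="tables/users.csv",      # hypothetical
    ExpressionType="SQL",
    Expression="SELECT s.name, s.country FROM s3object s WHERE s.country = 'AR'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The result arrives as an event stream; print the record payloads.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```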
Some regions are much more expensive than others, and this applies to Amazon S3 prices as well. So it’s worth considering moving your S3 bucket to a region with lower prices.
Another factor to consider is data transfer cost between AWS regions. Data sent from a bucket to a VPC in the same region is free, but sending data to a VPC in another region has a cost per GB. So it’s a good idea to keep the bucket in the region where the data is consumed.
In this article, you learned the common strategies to reduce
Amazon S3 costs. Now it’s time to take action. Pick the strategies that work best for your workload in AWS and let me know how it goes!