Skeddly Blog

Skeddly news and announcements...

Looking at Amazon Athena Pricing

Amazon Athena is an interactive query service that lets you query your data in Amazon S3 using standard SQL statements. Amazon Athena only reads your data; it will not add to or modify it. So you can think of it as only being able to execute SELECT statements against your data.

Today, we’re going to take a closer look at Amazon Athena pricing and at how you can reduce your Athena costs.

General Pricing Structure

According to the Amazon Athena Pricing page, Athena is priced at $5 per TB (terabyte) scanned per query execution. There is a 10 MB data scanning minimum per execution. You are not charged for failed queries. If you cancel a query, you are charged for the data scanned up to the point of cancelling the query.

Breaking that down to a per-megabyte price for smaller queries:

$5 / 1024 / 1024 ≈ $4.768e-6 per MB

So you will be charged $0.000004768 per MB scanned, with a minimum charge of $0.00004768 per query (for the 10 MB scanning minimum). Be careful with small queries: a query that scans only 200 KB is still charged for a full 10 MB.
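To make the arithmetic concrete, here is a minimal sketch of a per-query cost estimator using the figures quoted above ($5 per TB, 10 MB minimum, binary units as in the math above). The function name is hypothetical, and rounding scanned bytes up to a whole MB is an assumption on our part; always check the current Amazon Athena Pricing page before relying on these numbers.

```python
# Rough per-query cost estimator using the figures quoted in this article:
# $5 per TB scanned and a 10 MB minimum per query execution.
BYTES_PER_MB = 1024 * 1024
MB_PER_TB = 1024 * 1024          # binary units, matching the math above
PRICE_PER_TB_USD = 5.0
MIN_BILLABLE_MB = 10


def athena_query_cost(bytes_scanned: int) -> float:
    """Estimate the charge in USD for a single query execution.

    Assumption: scanned bytes are rounded up to a whole MB before the
    10 MB minimum is applied.
    """
    mb_scanned = (bytes_scanned + BYTES_PER_MB - 1) // BYTES_PER_MB  # ceiling division
    billable_mb = max(mb_scanned, MIN_BILLABLE_MB)
    return billable_mb * PRICE_PER_TB_USD / MB_PER_TB


if __name__ == "__main__":
    for label, size in [
        ("200 KB", 200 * 1024),
        ("10 MB", 10 * BYTES_PER_MB),
        ("1 GB", 1024 * BYTES_PER_MB),
        ("1 TB", MB_PER_TB * BYTES_PER_MB),
    ]:
        print(f"{label:>6}: ${athena_query_cost(size):.8f}")
```

The 200 KB and 10 MB queries come out at the same price, which is exactly the minimum-charge effect described above.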

Things That Are Free

Database, table, schema, and DDL-related executions are all free. For example, there is no charge for any of the following statements:

  • CREATE EXTERNAL TABLE
  • ALTER TABLE
  • MSCK REPAIR TABLE

Additional Costs

Amazon Athena reads your data stored in Amazon S3, so you will still pay the normal S3 charges for storing that data, which depend on how it’s stored (for example, its storage class).

Amazon Athena stores query history and results in a secondary S3 bucket, so there will also be normal S3 storage charges for the new data stored in that bucket.

Cost Reduction Techniques

Technique 1: Use S3 Lifecycle Rules to Remove Historical Results

Since all query results are stored back into S3, you are going to pay for that storage. To reduce the cost of keeping historical results, you can use S3 lifecycle rules to delete old results automatically.
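As a minimal sketch, the lifecycle rule can be built as the dictionary that boto3's `put_bucket_lifecycle_configuration` expects and then applied to your results bucket. The bucket name, the `athena-results/` prefix, and the 30-day window are hypothetical; substitute your own query result location and retention period.

```python
def results_expiration_rule(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under `prefix`."""
    if days < 1:
        raise ValueError("Expiration days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-athena-results-after-{days}-days",
                "Filter": {"Prefix": prefix},   # only touch the query results prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},   # delete objects this many days after creation
            }
        ]
    }


def apply_rule(bucket: str, config: dict) -> None:
    """Apply the configuration with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    # Hypothetical prefix and bucket; replace with your own results location.
    config = results_expiration_rule("athena-results/", days=30)
    print(config)
    # apply_rule("my-athena-results-bucket", config)
```

Note that `put_bucket_lifecycle_configuration` replaces any lifecycle configuration already on the bucket, so merge in existing rules first if you have any.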

Technique 2: Compress Your Input Data in S3

Amazon Athena pricing is based on the bytes read from S3, not on the size of the record data after Athena decompresses it. So if your data is compressed in S3, Athena reads fewer bytes and your Athena costs go down.

By simply using GZIP on your input files before they are placed into S3, you can reduce your costs.

For example, suppose your log file is 20 MB uncompressed, and you use GZIP to compress it down to 10 MB before placing it in S3. When Athena scans that log file, you will pay for only 10 MB.
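Here is a small sketch of the compression step using Python's standard `gzip` module: one helper writes a compressed copy of a file (for example, a `.json.gz` file ready to upload), and another reports how much smaller the data becomes. The helper names and the sample log lines are hypothetical, and real compression ratios depend entirely on your data.

```python
import gzip
import shutil


def gzip_file(src_path: str, dst_path: str) -> None:
    """Write a GZIP-compressed copy of src_path to dst_path (e.g. a .json.gz file)."""
    with open(src_path, "rb") as f_in, gzip.open(dst_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)


def compressed_fraction(data: bytes) -> float:
    """Return the GZIP-compressed size as a fraction of the original size."""
    return len(gzip.compress(data)) / len(data)


if __name__ == "__main__":
    # Hypothetical, highly repetitive JSON log lines; real ratios vary.
    sample = b'{"CustomerId": "cus_1", "action": "login", "status": "ok"}\n' * 10_000
    print(f"Gzipped size is {compressed_fraction(sample):.1%} of the original")
```

Since Athena is billed on the bytes read from S3, that fraction is roughly the fraction of the uncompressed scanning cost you would pay (subject to the 10 MB minimum).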

Technique 3: Use Partitions Effectively

Without partitions, Athena must scan all of your data, even the rows that your WHERE clause will simply discard from the results.

By structuring your data in S3 using prefixes, you can use partitions to eliminate large amounts of data from being read from S3.
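To see why this matters, here is a toy model (not Athena's actual planner) of the bytes read for a query filtering on a single customer. The object keys, sizes, and the `bytes_scanned` helper are hypothetical; the keys use the Hive-style `CustomerId=<value>/` layout.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024


@dataclass
class S3Object:
    key: str
    size_bytes: int


def bytes_scanned(objects: List[S3Object], customer_id: str, partitioned: bool) -> int:
    """Toy estimate of bytes read for: SELECT * FROM table1 WHERE CustomerId = '<customer_id>'."""
    if not partitioned:
        # Every object is read; the WHERE clause only discards rows afterwards.
        return sum(obj.size_bytes for obj in objects)
    # With partitions, only objects under the matching prefix are read.
    prefix = f"CustomerId={customer_id}/"
    return sum(obj.size_bytes for obj in objects if obj.key.startswith(prefix))


if __name__ == "__main__":
    objects = [
        S3Object("CustomerId=cus_1/DataFile1.json.gz", 5 * MB),
        S3Object("CustomerId=cus_2/DataFile1.json.gz", 20 * MB),
        S3Object("CustomerId=cus_3/DataFile1.json.gz", 75 * MB),
    ]
    for partitioned in (False, True):
        scanned = bytes_scanned(objects, "cus_1", partitioned)
        print(f"partitioned={partitioned}: {scanned // MB} MB scanned")
```

In this model the same query reads 100 MB without partitions and 5 MB with them; the trailing slash in the prefix keeps `cus_1` from also matching `cus_10`.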

For example, if your data had a column such as CustomerId and your queries resembled the following:

SELECT * FROM table1
WHERE CustomerId = 'cus_1'

Then you can structure your data in S3 using prefix folders, such as:

cus_1/DataFile1.json.gz

if you manually add your partitions using ALTER TABLE, or

CustomerId=cus_1/DataFile1.json.gz

if you want to use MSCK REPAIR TABLE to automatically load your partitions. And yes, that is a key=value pair in the S3 object’s key name. That is how Athena knows the partition information.
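The two layouts can be sketched as small helpers: one builds the S3 object key for either layout, and the other builds an `ALTER TABLE ... ADD PARTITION` statement for the plain-prefix layout. The helper names and the bucket name are hypothetical, and this assumes `table1` was created with a `CustomerId` partition column. Values are interpolated directly into the SQL, so only use trusted identifiers.

```python
def partition_key(customer_id: str, filename: str, hive_style: bool) -> str:
    """Return an S3 object key laid out by customer.

    hive_style=True  -> "CustomerId=cus_1/DataFile1.json.gz" (for MSCK REPAIR TABLE)
    hive_style=False -> "cus_1/DataFile1.json.gz" (partitions added with ALTER TABLE)
    """
    folder = f"CustomerId={customer_id}" if hive_style else customer_id
    return f"{folder}/{filename}"


def add_partition_sql(table: str, bucket: str, customer_id: str) -> str:
    """Build an ALTER TABLE statement registering one plain-prefix partition.

    Assumes the table has a CustomerId partition column and that the
    arguments are trusted (no SQL escaping is performed).
    """
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (CustomerId = '{customer_id}') "
        f"LOCATION 's3://{bucket}/{customer_id}/'"
    )


if __name__ == "__main__":
    print(partition_key("cus_1", "DataFile1.json.gz", hive_style=True))
    print(partition_key("cus_1", "DataFile1.json.gz", hive_style=False))
    print(add_partition_sql("table1", "my-data-bucket", "cus_1"))
```

With the Hive-style layout you would skip the generated ALTER TABLE statements and run MSCK REPAIR TABLE instead.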

Technique 4: Store Your Data in a Columnar Format

By storing your data in S3 in a columnar format, you can reduce the amount of data read from S3, because Athena only needs to read the columns your query uses. Athena supports the Apache Parquet and ORC columnar formats.

Note that if your queries use SELECT * FROM ..., you are reading all columns and won’t benefit from columnar storage. To take advantage of columnar storage, explicitly specify the columns you want:

SELECT Col1, Col2 FROM tableName
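The benefit comes from how the bytes are laid out. Here is a toy model of a columnar layout, where each column's values are stored together as one JSON blob; it is only an illustration of the idea, not how Parquet or ORC actually encode data (real formats add compression, encodings, and metadata). The helper names and sample rows are hypothetical.

```python
import json
from typing import Dict, List, Optional


def to_columns(rows: List[dict]) -> Dict[str, bytes]:
    """Toy columnar layout: store each column's values together as one JSON blob."""
    names = list(rows[0].keys())
    return {name: json.dumps([row[name] for row in rows]).encode("utf-8") for name in names}


def bytes_read(columns: Dict[str, bytes], selected: Optional[List[str]] = None) -> int:
    """Bytes read from the toy layout; selected=None models SELECT *."""
    wanted = list(columns) if selected is None else selected
    return sum(len(columns[name]) for name in wanted)


if __name__ == "__main__":
    # Hypothetical rows with two small columns and two wide ones.
    rows = [
        {"Col1": i, "Col2": f"cus_{i % 3}", "Col3": "x" * 200, "Col4": "y" * 200}
        for i in range(1_000)
    ]
    columns = to_columns(rows)
    print("SELECT *          :", bytes_read(columns), "bytes")
    print("SELECT Col1, Col2 :", bytes_read(columns, ["Col1", "Col2"]), "bytes")
```

Selecting only the two narrow columns skips the wide ones entirely, which is the same reason an explicit column list pays off with Parquet or ORC.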

Final Thoughts

Amazon Athena is a very exciting new service, but be aware of its pricing structure. Organize your data and queries to reduce your costs as much as possible, and you’ll have a fantastic and powerful new tool to add to your serverless computing arsenal.

Articles In This Series

  1. Getting Started with Amazon Athena, JSON Edition
  2. Using Compressed JSON Data With Amazon Athena
  3. Partitioning Your Data With Amazon Athena
  4. Automatic Partitioning With Amazon Athena
  5. Looking at Amazon Athena Pricing

About Skeddly

Skeddly is the leading scheduling service for your AWS account. Using Skeddly, you can:

  • Reduce your AWS costs,
  • Schedule snapshots and images, and
  • Automate many DevOps and IT tasks.

Sign up for our 30-day free trial or sign in to your Skeddly account to get started.
