Redshift sortkey

3/22/2023

One key step towards tuning your Amazon Redshift database is carefully selecting sort keys to optimize your queries. This tutorial will explain how to select appropriate sort keys. You can read about more techniques for optimizing Amazon Redshift performance.

What are Sort Keys?

A sort key is a field in your table that determines the order in which the data is physically stored in the database. If you have a table of sales and you select the purchase time as the sort key, the data will be ordered from oldest to newest purchase.

Amazon Redshift stores your data in 1 MB blocks and for each block it keeps metadata about the minimum and maximum values of your data. So if your data is sorted by purchase time and your query asks for the revenue made in the last week, the query planner can quickly review the metadata of each block and see that all the records it needs to process are together in the most recent block (or blocks), saving it from having to process all the blocks to find the sales of interest. If you have five years of sales data and only need one week, Amazon Redshift will only have to scan less than half a percent of your table - a huge savings!

A sort key is analogous to an index in a traditional database. While the mechanics are different, conceptually they serve the same purpose - to let the database know where certain data exists so it doesn’t have to scan the entire table for it.

How to Select a Sort Key

A common case is a table that you typically query for specific date ranges. Say a large number of queries against it are interested in just this week’s (or this month’s / this year’s) revenue numbers. In those cases, the field with the date is a great candidate for a sort key. With this sort key in place, the database will know which blocks of data to process to get your result and not have to scan the entire table for the entries.

Another good candidate for a sort key is a field that is frequently joined on. If you make this field your sort key and distribution key, you will be greatly optimizing queries that have that join.

If you have a table where there is a pattern of more than one field that is typically filtered on, you should select a compound sort key with those fields. Returning to our revenue table example - say your organization is divided into geographical regions and most of your queries filter on the date and the region. You would then select date and region as your compound sort keys.

Note that the order of your sort keys matters. If you select date then region, a query that only filters on region won’t see any benefit to the compound sort key because it isn’t first. If, however, there is no clear pattern for the fields that are filtered on, but there are several that are used in different cases, consider an interleaved sortkey. These keys give an equal weight to all the fields in the sort key. So while a query against just the date won’t be as fast as the compound sort key with date and region, a query filtering only on the region will perform much better. One thing to note when using interleaved sort keys, however, is that they are costlier to maintain - loading and vacuuming these tables will be much slower. You can read a more thorough comparison of sort styles in this AWS article.

As you can see, sort keys only help the queries that use those fields in filters or joins. There isn’t a way to optimize a table for all possible queries. So you want to be sure that when selecting the sort keys you are considering all the users of the table and which types of queries should take priority when planning your optimization.

In some extreme cases, if the query use cases are very different and performance is critical, you may consider duplicating the table and having different sort keys for each application.
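To make the block metadata idea concrete, here is a toy Python model of per-block minimum and maximum values. It is only a sketch and not how Amazon Redshift is implemented: the 100-row block size, the region names, the evenly spread sales, and every function name are assumptions made up for illustration. It counts how many blocks a filter would have to read for a date-first ordering, a region-first ordering, and an unordered table.

```python
import random
from datetime import date, timedelta

ROWS_PER_BLOCK = 100  # hypothetical stand-in for the rows that fit in one 1 MB block
REGIONS = ["APAC", "EMEA", "LATAM", "NA"]  # made-up region names


def make_sales(start=date(2018, 1, 1), days=5 * 365, per_day=100):
    """Generate (sale_date, region) rows: per_day sales each day, cycled over regions."""
    return [(start + timedelta(days=d), REGIONS[i % len(REGIONS)])
            for d in range(days) for i in range(per_day)]


def zone_map(rows, column):
    """Record (min, max) of one column for each block, taking rows in stored order."""
    blocks = (rows[i:i + ROWS_PER_BLOCK] for i in range(0, len(rows), ROWS_PER_BLOCK))
    return [(min(r[column] for r in b), max(r[column] for r in b)) for b in blocks]


def blocks_to_scan(zmap, low, high):
    """Count blocks whose [min, max] range overlaps the filter range [low, high]."""
    return sum(1 for lo, hi in zmap if hi >= low and lo <= high)


sales = make_sales()
last_day = max(row[0] for row in sales)
week_start = last_day - timedelta(days=6)

# Three physical orderings of the same rows.
by_date_region = sorted(sales)                               # date first, then region
by_region_date = sorted(sales, key=lambda r: (r[1], r[0]))   # region first, then date
unsorted = random.Random(0).sample(sales, len(sales))        # no useful order

total_blocks = len(sales) // ROWS_PER_BLOCK
week_sorted = blocks_to_scan(zone_map(by_date_region, 0), week_start, last_day)
week_unsorted = blocks_to_scan(zone_map(unsorted, 0), week_start, last_day)
region_second = blocks_to_scan(zone_map(by_date_region, 1), "EMEA", "EMEA")
region_first = blocks_to_scan(zone_map(by_region_date, 1), "EMEA", "EMEA")

print("blocks in table:", total_blocks)
print("last week, sorted by date:", week_sorted)
print("last week, unsorted:", week_unsorted)
print("one region, region second in the key:", region_second)
print("one region, region first in the key:", region_first)
```

In this toy setup the date-sorted copy reads 7 of 1,825 blocks for the last week, which is about 0.4% of the table. A filter on region alone has to read every block when region is the second field of the ordering, and roughly a quarter of them when region comes first. Interleaved sorting is not modeled here.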
