June 2, 2023 · 30 mins
It’s been a busy time for Azure announcements following the recent MS Build conference! Almost hidden away behind the big-ticket OpenAI announcements were a few new features and improvements for Cosmos DB, including Hierarchical Partition Keys.
To get a better understanding of Cosmos partitioning in general, I decided to create some code samples so that I could see partitioning in action.
You can find the full code for these samples on GitHub here: https://github.com/brendan-nobadthing/cosmos-partitions
When you create a Cosmos container, the data is written to one or more physical partitions. As a consumer of Cosmos services, you do not get direct control over how many partitions will be created; Cosmos creates partitions as needed as your data set grows. The partition key path is specified when a container is created and tells Cosmos which field to use when deciding which partition to route records to. The set of records for a single partition key value is also referred to as a logical partition, and each physical partition will usually contain multiple logical partitions. The only guarantee is that every record with the same partition key value will be routed to the same physical partition. If your dataset is small, it might not get physically partitioned at all, regardless of how the partition key is configured.
One of the biggest impacts of partitioning is on queries. On a small dataset, all your data sits in one physical partition with one index, so the performance and cost of every query is straightforward. Once you have many partitions, query cost and performance vary greatly depending on whether the query filters on the partition key value - and can therefore be routed to a single partition to execute.
Cross-partition queries occur when you have many physical partitions and a query that is not filtered by the partition key. Cosmos must run the query on every partition, and each per-partition execution incurs its own cost. If not managed, this can seriously impact operational costs - and it will keep getting worse as the data set grows.
So let’s see the above in action by uploading large datasets into Cosmos with different partition settings.
To get started I needed large amounts of test data, and for that I used the Bogus NuGet package. Using Bogus you can procedurally generate fake data - and if you set the appropriate locale, fake details such as names and addresses “look right“ for that locale.
Here's an example record:
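Something along these lines shows the idea (this snippet and its output are purely illustrative, not taken from the repo):

```csharp
using Bogus;

// Illustrative only: with an Australian locale, the generated names, addresses
// and phone numbers all "look right" for Australia.
var faker = new Faker("en_AU");
Console.WriteLine(faker.Name.FullName());
Console.WriteLine(faker.Address.FullAddress());
Console.WriteLine(faker.Phone.PhoneNumber());
```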
To create a somewhat realistic use case for my fake data, I came up with the following data schema to represent Users from Organisations interacting with a Product website.
UserProductEvent is the event record to be written to CosmosDB and captures an instance of a user doing something with a product on our non-existent website, such as viewing a product or adding a product to a shopping cart.
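The full class definitions are in the repo; roughly, the shapes involved look like this (the property names here are my approximation of what’s there):

```csharp
using System;
using Newtonsoft.Json;

// Approximate shapes only - see the GitHub repo for the real definitions.
public class Organisation
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

public class User
{
    public Guid Id { get; set; }
    public Guid OrgId { get; set; }   // each user belongs to exactly one organisation
    public string Name { get; set; }
}

public class Product
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

// The event record that actually gets written to Cosmos DB.
public class UserProductEvent
{
    [JsonProperty("id")]              // Cosmos documents need a lowercase "id" property
    public Guid Id { get; set; }
    public Guid OrgId { get; set; }   // candidate partition key
    public Guid UserId { get; set; }  // candidate partition key
    public Guid ProductId { get; set; }
    public string EventType { get; set; }  // e.g. "ViewProduct", "AddToCart"
    public string Filler { get; set; }     // lorem-ipsum padding (see below)
}
```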
Here’s a snippet of Bogus code used to generate the data.
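A cut-down sketch of the approach looks roughly like this (the exact rules, record counts and event types are in the full class linked below; the ones here are my approximation):

```csharp
using Bogus;

// Cut-down sketch - see Data.cs in the repo for the real rules and record counts.
var orgs = new Faker<Organisation>("en_AU")
    .RuleFor(o => o.Id, f => Guid.NewGuid())
    .RuleFor(o => o.Name, f => f.Company.CompanyName())
    .Generate(20);

var users = new Faker<User>("en_AU")
    .RuleFor(u => u.Id, f => Guid.NewGuid())
    .RuleFor(u => u.OrgId, f => f.PickRandom(orgs).Id)   // every user belongs to one org
    .RuleFor(u => u.Name, f => f.Name.FullName())
    .Generate(1_000);

var products = new Faker<Product>("en_AU")
    .RuleFor(p => p.Id, f => Guid.NewGuid())
    .RuleFor(p => p.Name, f => f.Commerce.ProductName())
    .Generate(100);

var events = new Faker<UserProductEvent>("en_AU")
    .Rules((f, e) =>
    {
        var user = f.PickRandom(users);
        e.Id = Guid.NewGuid();
        e.UserId = user.Id;
        e.OrgId = user.OrgId;                               // OrgId always travels with the user
        e.ProductId = f.PickRandom(products).Id;
        e.EventType = f.PickRandom("ViewProduct", "AddToCart", "Purchase");
        e.Filler = f.Lorem.Paragraphs(10);                  // padding to make each record bigger
    })
    .GenerateLazy(250_000);
```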
View the full data generation class here: https://github.com/brendan-nobadthing/cosmos-partitions/blob/main/CosmosPartitions.Console/Data.cs
To make the main cosmos records bigger, and hopefully trigger partitioning sooner, I used Bogus' Lorem Ipsum generator to stuff a ‘filler’ field with lots of generated text for each record.
With the ability to generate test data complete, I could then look into creating cosmos containers with different partitioning settings.
As per this tutorial: Bulk import data to Azure Cosmos DB for NoSQL account by using the .NET SDK, the latest Cosmos package for .NET makes bulk operations easy to implement - set AllowBulkExecution=true on the singleton CosmosClient object.
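A minimal sketch of that pattern, assuming the connection string from app config and placeholder database/container names (the real importer is in the repo):

```csharp
using System.Linq;
using Microsoft.Azure.Cosmos;

// connectionString comes from app config ("CosmosConnectionString" - see below).
var client = new CosmosClient(connectionString,
    new CosmosClientOptions { AllowBulkExecution = true });

var container = client.GetContainer("cosmos-partitions", "events-by-id"); // placeholder names

// With AllowBulkExecution enabled, the SDK transparently batches these concurrent
// CreateItemAsync calls into bulk requests.
foreach (var batch in events.Chunk(1_000))   // 'events' is the Bogus output from earlier
{
    await Task.WhenAll(batch.Select(evt =>
        container.CreateItemAsync(evt, new PartitionKey(evt.Id.ToString()))));
}
```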
Quick word of warning - running these scripts will incur Azure costs! I have an MSDN subscription with $230AUD per month of 'free' Azure credits, and creating this blog post used up over $100AUD of that. Most importantly, if you want to run this for yourself, make sure you delete the test data when finished.
Before running, get a Cosmos DB account set up (I used Terraform for this) and add the connection string to your app config as CosmosConnectionString.
dotnet run -- CreateIdKeyedColletion
This creates a container with 250K UserProductEvent records with the item id as the partition key: /id
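Under the hood that command does something along these lines (the database/container names and throughput here are placeholders - the real code is in the repo):

```csharp
// Create the database (if needed) and a container partitioned on the item id.
Database database = await client.CreateDatabaseIfNotExistsAsync("cosmos-partitions"); // placeholder name
Container container = await database.CreateContainerIfNotExistsAsync(
    new ContainerProperties(id: "events-by-id", partitionKeyPath: "/id"),
    throughput: 10000); // a guess at a reasonable RU/s for the bulk load
```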
Using the item id can often be a good choice as explained here: Partitioning and horizontal scaling - Azure Cosmos DB
With this partitioning strategy, there is no risk of hitting the 20GB per-logical-partition limit (the maximum amount of data that can share a single partition key value).
The downside of this approach is that once you have multiple physical partitions, all queries are cross-partition queries except for retrieving records directly by id. One possible workaround could be to use another service such as Azure Cognitive Search to index and query your dataset.
dotnet run -- CreateUserIdKeyedColletion
Create a container with 250K UserProductEvent records with /UserId as the partition key
dotnet run -- CreateOrgIdKeyedColletion
Create a container with 250K UserProductEvent records with /OrgId as the partition key
dotnet run -- CreateHiearchicalKeyedColletion
Create a container with 250K UserProductEvent records and a hierarchical partition key: OrgId then UserId.
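The hierarchical version just supplies a list of partition key paths instead of a single path - a sketch, with a placeholder container name:

```csharp
// Hierarchical (multi-level) partition key: OrgId first, then UserId.
Container container = await database.CreateContainerIfNotExistsAsync(new ContainerProperties
{
    Id = "events-hierarchical", // placeholder name
    PartitionKeyPaths = new List<string> { "/OrgId", "/UserId" }
});

// Writes supply the full key, built level by level:
var key = new PartitionKeyBuilder()
    .Add(evt.OrgId.ToString())
    .Add(evt.UserId.ToString())
    .Build();
await container.CreateItemAsync(evt, key);
```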
To speed up the inserts, containers were created with indexing disabled. Before running any queries, indexes were created with the following config:
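Roughly, each container starts with indexing switched off and the policy is then replaced before querying - a sketch of that config (the included/excluded paths are my guess at something sensible, not necessarily exactly what the repo uses):

```csharp
// While bulk loading, containers are created with indexing switched off:
var props = new ContainerProperties("events-by-userid", "/UserId") // placeholder names
{
    IndexingPolicy = new IndexingPolicy
    {
        Automatic = false,
        IndexingMode = IndexingMode.None
    }
};
Container container = await database.CreateContainerIfNotExistsAsync(props);

// ... bulk import runs here ...

// Before querying, swap the policy back to consistent indexing,
// keeping the big Filler field out of the index:
props.IndexingPolicy = new IndexingPolicy
{
    Automatic = true,
    IndexingMode = IndexingMode.Consistent
};
props.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/*" });
props.IndexingPolicy.ExcludedPaths.Add(new ExcludedPath { Path = "/Filler/?" });
await container.ReplaceContainerAsync(props);
```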
The easiest method I found for viewing physical partition information was via the Metrics (classic) page in the Azure portal.
This shows that the collection with partitionKeyPath=/id is split between 2 physical partitions of just over 8GB each. When I finished up yesterday, everything was in one big 16GB partition - so this shows me that partitions can be split while you write, or re-organised by a background process at any time.
Other collections were partitioned differently:
From this we can conclude that the process of assigning data to physical partitions is a black box within Cosmos that we get very little control over! For the hierarchical and /OrgId collections I had to run the importer twice before I saw any partition splitting.
Now that we’ve got some partitioned collections, let’s take a look at the impact on queries.
The Data Explorer in the Azure portal gives a good UI for running queries and inspecting query stats. As you’d expect, querying a single record by id is fast and cheap.
The value I’m going to focus on most is the Request Charge, which gives a direct indication of the cost incurred to run the query. At the bottom there is a link to download a CSV containing per-partition query metrics, and from there I could confirm that only one partition was involved.
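The same number is available from the SDK if you want to capture it in code - a rough sketch, using a placeholder id against the id-keyed container:

```csharp
// Equivalent of the Data Explorer query, run through the SDK so the RU charge can be captured.
var query = new QueryDefinition("SELECT * FROM c WHERE c.id = @id")
    .WithParameter("@id", "11111111-1111-1111-1111-111111111111"); // placeholder id

double totalRu = 0;
using FeedIterator<UserProductEvent> iterator = container.GetItemQueryIterator<UserProductEvent>(query);
while (iterator.HasMoreResults)
{
    FeedResponse<UserProductEvent> page = await iterator.ReadNextAsync();
    totalRu += page.RequestCharge; // the same figure Data Explorer reports as Request Charge
}
Console.WriteLine($"Total request charge: {totalRu} RU");
```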
Now let’s query the same container by UserId:
We can see that this query was more expensive and the per-partition metrics show that both partitions were involved.
Next, let’s see the impact of a similar query on the container with /UserId as the partition key.
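The query itself is the same shape as before; the only difference is that the filter now matches this container's partition key (the container variable and UserId value below are placeholders):

```csharp
// The filter is on /UserId, this container's partition key, so Cosmos can route
// the query to the single physical partition that owns that value.
var query = new QueryDefinition("SELECT * FROM c WHERE c.UserId = @userId")
    .WithParameter("@userId", "a-user-id-from-the-dataset"); // placeholder

using var iterator = userIdContainer.GetItemQueryIterator<UserProductEvent>(query);
```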
So we can see that this query is slightly cheaper and the per-partition query metrics confirm that only 1 partition was queried. This is particularly important for this container as it is split into 9 partitions.
To see the impact of those 9 partitions we'll cast the net a little wider and query for a whole organisation.
This is a significant cost jump. Note that the RU figure includes all costs required to service the request, and that there are many more documents matched by this query. I introduced the “Select top 25“ to mitigate this a little, but we’re still not exactly comparing apples to apples. The per-partition query metrics confirm that all nine partitions were involved in running this query.
Let’s compare Query 04 with an almost identical query - but this time against the container partitioned by OrgId.
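In other words, the same “top 25 for an organisation” query pointed at the two containers (container variable names and the OrgId value are placeholders):

```csharp
// Same query shape against two containers: on the /UserId container it fans out
// across all 9 physical partitions; on the /OrgId container it targets just one.
var query = new QueryDefinition("SELECT TOP 25 * FROM c WHERE c.OrgId = @orgId")
    .WithParameter("@orgId", "an-org-id-from-the-dataset"); // placeholder

using var crossPartition = userIdContainer.GetItemQueryIterator<UserProductEvent>(query);
using var singlePartition = orgIdContainer.GetItemQueryIterator<UserProductEvent>(query);
```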
This shows the clear difference between running a similar query cross-partition vs against a single partition: running the query across 9 partitions cost almost exactly 9x more.
Remember from our schema that each User is associated with one Organisation, so if we know the User, we probably already know the OrgId - so just partitioning by OrgId lets us write efficient single-partition queries for both OrgId and UserId scopes. Partitioning via this high-level object has a serious downside, though: if the total data for a single OrgId reaches 20GB, we hit the logical partition limit and can’t write any more records.
This is where hierarchical partition keys come to the rescue! We can partition by OrgId first, and then further subdivide by UserId if required. Hierarchical partition keys can support up to three levels of depth. With hierarchical keys, we can query by OrgId alone or by OrgId and UserId together, and the correct partition or subset of partitions will be selected. We also no longer have the risk of maxing out a logical partition when 20GB gets stored against a single OrgId.
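Against the hierarchical container, the two queries were along these lines (the container variable and parameter values are placeholders):

```csharp
// Filter on the first level of the hierarchical key only...
var byOrg = new QueryDefinition("SELECT TOP 25 * FROM c WHERE c.OrgId = @orgId")
    .WithParameter("@orgId", "an-org-id"); // placeholder

// ...and on both levels.
var byOrgAndUser = new QueryDefinition(
        "SELECT TOP 25 * FROM c WHERE c.OrgId = @orgId AND c.UserId = @userId")
    .WithParameter("@orgId", "an-org-id")
    .WithParameter("@userId", "a-user-id"); // placeholders

using var iterator1 = hierarchicalContainer.GetItemQueryIterator<UserProductEvent>(byOrg);
using var iterator2 = hierarchicalContainer.GetItemQueryIterator<UserProductEvent>(byOrgAndUser);
```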
Both of the above queries against a container with a hierarchical partition key are executed against a single partition.
To even see the effects of partitioning strategies, you need to be working with non-trivial quantities of data. But if your application does start running cross-partition queries, there is a significant cost impact. Also, because you have no direct control over when a container gets partitioned, a production application that has not considered the impact of partitioning is at risk of a sudden jump in query execution costs after partitioning occurs. In my limited experience I saw one of my containers get repartitioned overnight - that’s potentially something that could give you a nasty shock on a Monday morning!
Hierarchical Partition Keys provide a useful mechanism for mitigating these concerns for some workloads - especially workloads where data can be divided into large, independent silos, such as multi-tenant applications.
I took a look into monitoring and alerting here:
Create alerts for Azure Cosmos DB using Azure Monitor
Create alerts to monitor if storage for a logical partition key is approaching 20 GB
And while I found metrics and alerts for throughput and logical partition utilisation, I have not yet found a useful metric to alert on if your physical partition count goes up. Probably the best approach would be to monitor and alert on increases in cost.
In all the above queries we’ve been looking at fetching data via the Organisation->User relationship.
What if we wanted to aggregate details from the opposite end of our schema, such as by ProductId? As it stands, that could be an expensive report to execute!
Well, that’s where Materialised Views might be able to help us - but I’ll leave that for the next blog! Maybe next month when I get some more Azure credits to burn!
This exercise certainly helped my understanding of partitioning. Any comments or suggestions, please contribute via the GitHub project:
GitHub - brendan-nobadthing/cosmos-partitions: sample code to test cosmosDB partitioning behaviour