Compress stored data
Please compress data in place for more efficient use of storage.
For example, I recently exported the entire dataset from a collection and gathered the following stats:
DocumentDB Data Storage: 627 MiB (including hidden fields and other overhead I assume, but not Indexes)
Exported JSON: 461.72 MiB
Compressed JSON: 47 MiB
So potentially a 10x saving in data storage size.
We’re evaluating more compressed storage formats. In the meantime, please consider using a client-side compression library to encode fields/JSON fragments that are not used for querying.
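As a concrete sketch of the client-side approach: the example below compresses a field that is never queried before the document is written, using Python's standard zlib and base64 modules. The document shape and the field names (`payload`, `deviceId`) are hypothetical, and any real schema would differ; the point is only that queryable fields stay plain while the bulky fragment is stored compressed.

```python
import base64
import json
import zlib

def compress_field(value):
    """Serialize a JSON-compatible value and compress it to a base64 string."""
    raw = json.dumps(value).encode("utf-8")
    return base64.b64encode(zlib.compress(raw, level=9)).decode("ascii")

def decompress_field(encoded):
    """Reverse compress_field: base64-decode, decompress, and parse JSON."""
    raw = zlib.decompress(base64.b64decode(encoded))
    return json.loads(raw)

# Hypothetical document: "id" and "deviceId" remain plain so they can be
# queried; only the bulky, never-queried "payload" is compressed.
readings = {"readings": [1.2] * 500}
doc = {
    "id": "item-001",
    "deviceId": "sensor-42",
    "payload": compress_field(readings),
}

# Round-trip check: the compressed field decodes back to the original value.
assert decompress_field(doc["payload"]) == readings
```

Note that base64 inflates the compressed bytes by about a third, so this only pays off on repetitive or sizeable fragments, and the compressed field cannot be filtered or indexed server-side.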
Hi CosmosDB team - we're currently looking at the same issue. Any updates on compression? This is quite a deal breaker in a plain-text JSON world full of repetitive text :-)
Hello, we also have a similar requirement. Our data in Azure Cosmos DB is currently taking up a huge amount of space, and we are looking for server-side compression techniques. We are also checking whether Redis provides good compression.
Let us know if there is any update.
I ported 40 GB of data from MongoDB (WiredTiger) into Cosmos DB only to find that it takes up 170 GB in Cosmos DB! That spreads the data across 21 partitions, which forces a huge total RU/s allocation for performance, since the RU/s are distributed across the partitions. Can you give an example of using a client-side compression library?
Ian Bennett commented
Thanks for taking a look at this.
In my case I am using Cosmos DB to ingest near-real-time data. Since the best compression would likely be achieved across similar documents from the same feed accumulated over time, compressing before load would not be suitable. In any case, I need query access to everything.
I could imagine that something index-driven (akin to column-based storage in an RDBMS) would be effective.