Compress stored data
Please compress data in place for more efficient use of storage.
For example, I recently exported the entire dataset from a collection and gathered the following stats:
DocumentDB data storage: 627 MiB (including hidden fields and other overhead, I assume, but not indexes)
Exported JSON: 461.72 MiB
Compressed JSON: 47 MiB
So potentially a 10x saving in data storage size.
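For anyone who wants to reproduce a comparison like this, here is a minimal sketch (not part of the original post) that gzips an exported JSON file and reports the sizes; the file path is made up.

```python
# Minimal sketch for reproducing the size comparison above: measure an
# exported JSON file before and after gzip. The path is hypothetical.
import gzip
import os

EXPORT_PATH = "collection_export.json"  # hypothetical export file

raw_size = os.path.getsize(EXPORT_PATH)
with open(EXPORT_PATH, "rb") as f:
    compressed_size = len(gzip.compress(f.read(), compresslevel=6))

print(f"Exported JSON:   {raw_size / 2**20:.2f} MiB")
print(f"Compressed JSON: {compressed_size / 2**20:.2f} MiB")
print(f"Ratio:           {raw_size / compressed_size:.1f}x")
```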
We’re evaluating more compressed storage formats. Meanwhile, please consider using a client-side compression library for encoding fields/JSON fragments that are not used for querying.
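For readers asking what that workaround looks like in practice, here is a minimal sketch using Python's standard gzip, base64, and json modules; the document shape and field names (payload, deviceId, etc.) are made up for illustration, and the fields you still need to query are left uncompressed.

```python
# Sketch of the client-side approach suggested above: gzip a JSON fragment
# that is never queried, base64-encode it so it fits in a string field, and
# keep the queryable fields as plain JSON. Field/document names are made up.
import base64
import gzip
import json

def pack_blob(fragment: dict) -> str:
    """Compress a JSON fragment into a base64 string for storage."""
    raw = json.dumps(fragment, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")

def unpack_blob(blob: str) -> dict:
    """Reverse of pack_blob, used when reading the document back."""
    return json.loads(gzip.decompress(base64.b64decode(blob)))

# Queryable fields stay as-is; the bulky, never-queried payload is compressed.
doc = {
    "id": "device-42-2019-06-01",
    "deviceId": "device-42",      # still usable in WHERE clauses
    "day": "2019-06-01",
    "payload": pack_blob({"readings": [{"t": i, "v": 20.0 + i} for i in range(1000)]}),
}

print(len(json.dumps(doc)), "bytes to store")
print(unpack_blob(doc["payload"])["readings"][:2])
```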
Hi Cosmos Support,
Does Cosmos DB provide time bucketing for IoT device telemetry data? I am porting TimescaleDB compressed data from PostgreSQL to Cosmos DB and am also looking for time compression in Cosmos DB, e.g. aggregating timestamp data into time buckets such as 15 sec, 30 sec, 1 min, 10 min, or 30 min. Is there any solution for this currently?
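For what it's worth, Cosmos DB did not offer a TimescaleDB-style time_bucket out of the box at the time of this thread; a common workaround is to compute fixed-width bucket keys client-side before writing, so queries can group on them. A minimal sketch, with made-up field names:

```python
# Sketch (not an official Cosmos DB feature): compute bucket keys client-side
# and store them on each telemetry document so queries can group by them.
from datetime import datetime, timezone

BUCKETS = {"15s": 15, "30s": 30, "1m": 60, "10m": 600, "30m": 1800}

def bucket_start(ts: datetime, bucket_seconds: int) -> datetime:
    """Floor a timestamp to the start of its fixed-width bucket."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % bucket_seconds, tz=timezone.utc)

reading = {"deviceId": "sensor-7", "value": 21.4,
           "ts": datetime(2019, 6, 1, 12, 0, 47, tzinfo=timezone.utc)}

# Add one key per bucket size; queries can then group on e.g. c.bucket_1m.
for name, secs in BUCKETS.items():
    reading[f"bucket_{name}"] = bucket_start(reading["ts"], secs).isoformat()

print(reading["bucket_15s"], reading["bucket_30m"])
```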
In addition, we need tiered pricing on storage. Right now, regardless of how much you store, Azure charges 25 cents per GB. Steep pricing on RU/s is infuriating as it is, and we're getting doubly screwed by Microsoft because of bad pricing on storage.
Anargyros Tomaras commented
We need this also. The 50GB-per-shard limitation is severely impacting our ability to scale. To give you an example, we could have easily supported a tenant with 160K users in a single 1M RU collection, but because they would require 50TB of space we can only fit 17K of that tenant's users. We need either 10x storage savings (using compression, for example) or shards with 10x more storage, i.e. 500GB Cosmos shards.
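For context, those numbers roughly work out if you assume approximately 10,000 RU/s and 50 GB per physical partition (my assumptions, not figures from the comment):

```python
# Back-of-the-envelope check of the numbers above, assuming roughly
# 10,000 RU/s and 50 GB per physical partition (approximate limits).
RU_PROVISIONED = 1_000_000
RU_PER_PARTITION = 10_000
GB_PER_PARTITION = 50

partitions = RU_PROVISIONED // RU_PER_PARTITION   # ~100 physical partitions
capacity_gb = partitions * GB_PER_PARTITION       # ~5,000 GB = 5 TB total

gb_per_user = 50_000 / 160_000                    # 50 TB for 160K users ~= 0.31 GB/user
users_that_fit = capacity_gb / gb_per_user        # ~= 16,000 users, close to the 17K quoted

print(partitions, capacity_gb, round(users_that_fit))
```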
Anderson, Colin D commented
TimescaleDB supports transparent compression, and it achieves fantastic savings, even on JSON data. Would love to see this added to Cosmos DB too.
Hi Cosmos DB team - we're currently looking at the same issue. Any updates on compression? This is quite a deal breaker in a plain-text JSON world with so much repetitive text :-)
Hello, we also have a similar requirement... currently our data in Azure Cosmos DB is taking up a huge amount of space, and we are looking for server-side compression techniques. We are also checking whether Redis provides good compression.
Let us know if there is any update.
I ported 40GB of data from MongoDB (WiredTiger) into Cosmos DB only to find that the data uses 170GB in Cosmos DB!!! Wow. This spreads across 21 partitions, which forces a huge RU/s provision for performance, since the provisioned RU/s are distributed across the partitions. Can you give an example of using a client-side compression library?
Ian Bennett commented
Thanks for taking a look at this.
In my case I am using Cosmos DB to ingest near-real-time data. As the best compression will likely be achieved over similar data from the same feed accumulated over time, compressing before load would not be suitable. I need query access to everything, anyway.
I could imagine that something index-driven (akin to column-based storage in an RDBMS) would be effective.
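A quick way to see why a column-oriented layout helps, using synthetic telemetry and plain gzip (this is only an illustration of the idea, not a Cosmos DB feature):

```python
# Rough illustration of the point above (synthetic data): the same telemetry
# compresses far better column-wise than as repeated row-oriented JSON.
import gzip
import json

rows = [{"deviceId": "sensor-7", "ts": 1_560_000_000 + i, "temp": 20 + (i % 10) * 0.1}
        for i in range(10_000)]

row_json = json.dumps(rows).encode()
col_json = json.dumps({
    "deviceId": "sensor-7",
    "ts": [r["ts"] for r in rows],
    "temp": [r["temp"] for r in rows],
}).encode()

print("row-oriented   :", len(gzip.compress(row_json)), "bytes compressed")
print("column-oriented:", len(gzip.compress(col_json)), "bytes compressed")
```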