Support secondary indexes
Need to be able to sort on something other than the row key
Restricting us to only the row key is very limiting. We currently have to maintain our own secondary indexes for other columns, which is very cumbersome; this should be a feature of the platform.
Thank you for the feedback on this feature, and we apologize for the lack of updates. We have not gotten to this feature due to the focus on delivering Disaster Recovery features such as Read-Access Geo-Redundant Storage, IaaS Disks (Page Blobs), along with other features. Our intent is to provide this feature at some point, but we do not have a timeline for when it will be delivered.
Hans Olav Stjernholm commented
Aaron, DocumentDB is not a replacement, it is a totally different beast and not a key-value store such as Table Storage. Also DocumentDB is not portable (yet at least) and gives you a lock-in to the public Azure cloud. Table Storage is portable through Azure Stack.
I dream and hope MS is picking up Table Storage again and gives us Premium (SSD-backed) and with secondary indexes. There's really no reason on this good earth they shouldn't do this!
Aaron Lawrence commented
I would agree with Mike Olson that it is a BAD IDEA to build a new application on Azure Table Storage, as Microsoft appear to be deprecating it (although it's not completely clear what they regard as a replacement, DocumentDB seems most likely).
Hans Olav Stjernholm commented
Regarding table storage, I've also noticed that in the new portal when you look at diagnostics config in WebApps, there's only File Storage and Blob Storage as options, while in the classic portal you also have Table Storage as an option.
Even more, I've noticed that the internal logging in WebJobs via WebJobs Dashboard seems to have moved from using Table Storage to using Blob Storage - but that may be for any number of reasons, of course.
Anyways, I wouldn't recommend table storage as a long-term option for anything new for these reasons. And of course it's hard to use without any secondary indexing.
Mike Olson commented
The writing's on the wall folks. It's been what, 3 years since this was supposedly added to their roadmap? When was the last time Microsoft made *any* real improvements to Table Storage, besides putting a ton of breaking changes in the SDK that required hundreds of hours of dev to support?
Azure Table Storage is old news and is being ushered unceremoniously out the door. It's time for us to give in and move to DocumentDB, Azure SQL, or one of the dozen new storage services that Microsoft is actually putting effort into. Or just do what I've been doing and just start storing your data in flat files on Blob Storage, since for a lot of cases that's easier to code, more performant and more flexible.
At least Microsoft isn't slowly moving us away from Cloud Services... oh wait.
Any comments from Microsoft admins?
Aaron Lawrence commented
It's difficult to use ATS as it stands for anything real. We built our own indices in SQL, which of course introduces exciting new consistency issues. One index is just too little - basically that gives you the ability to move data in and out by an identifier, but not do anything else with it.
We absolutely need this! Due to the high cost and throughput limitations of DocumentDB it is not a viable alternative to having secondary indexes in table storage.
Let me give you an example of why this is critical. I have about 30 GB of event logs. I want to store these in ATS. The problem is that I need to be able to query them along multiple different axes - the company they belong to, the individual user they belong to, the date/time they came in, the event name, and so forth. And eventually, probably lots more. My current solution is to store the data "n" times, each one with a different partition-key/row-key schema, to enable querying along that particular dimension. So far so good - it's not a problem to write the data "n" times, given how well ATS performs, and how cheap it is.
But the problem is maintenance. Right now, with about 30 GB of data, if I come up with a new dimension that I need to support, it takes me at least a day to write the scripts to export and then re-import the data to the new format (because it all has to be parallelized, and needs to track state for each portion I'm running in parallel, or it would take weeks); and even if I don't ***** up on the (very complex) import/export scripts somehow, it then takes at least 1-2 days to actually get all the data over to the new table.
And that's simply not a scalable model. What happens when I don't have 30 GB of data I need to pivot, but 300 GB? Or 30 TB? The pivot scripts will take weeks to run, and maintaining any shred of consistency through the process gets very, very complicated.
I get that this is a complex problem to solve. But telling every ATS user to come up with their own (almost certainly suboptimal and buggy) solution doesn't make it any less complicated.
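To make the "store the data n times" approach from the comment above concrete, here is a minimal sketch. It does not use the Azure SDK: the `tables` dictionary is a hypothetical in-memory stand-in for table storage, and the table names, key layouts and field names (`company`, `user`, `name`, `timestamp`, `id`) are assumptions chosen for illustration.

```python
# Hypothetical in-memory stand-in for table storage:
# {table_name: {(PartitionKey, RowKey): entity}}
tables = {}

# One key schema per query dimension. Names and layouts are illustrative assumptions.
KEY_SCHEMAS = {
    "EventsByCompany": lambda e: (e["company"], f'{e["timestamp"]}_{e["id"]}'),
    "EventsByUser": lambda e: (e["user"], f'{e["timestamp"]}_{e["id"]}'),
    "EventsByName": lambda e: (e["name"], f'{e["timestamp"]}_{e["id"]}'),
}

def write_event(event):
    """Write one copy of the event per key schema (n writes for n dimensions)."""
    for table, make_key in KEY_SCHEMAS.items():
        pk, rk = make_key(event)
        tables.setdefault(table, {})[(pk, rk)] = dict(event, PartitionKey=pk, RowKey=rk)

def query_partition(table, pk):
    """Return all entities in one partition, ordered by RowKey."""
    rows = tables.get(table, {})
    return [rows[key] for key in sorted(rows) if key[0] == pk]

def backfill(source_table, new_table, make_key):
    """Re-key every entity of an existing table into a new table (the 'pivot')."""
    for entity in list(tables.get(source_table, {}).values()):
        pk, rk = make_key(entity)
        tables.setdefault(new_table, {})[(pk, rk)] = dict(entity, PartitionKey=pk, RowKey=rk)
```

Adding a dimension later means calling something like `backfill("EventsByCompany", "EventsByDay", lambda e: (e["timestamp"][:10], e["id"]))` over every existing entity. In this sketch that is a single loop; against a real store it is the parallelized, state-tracking export/import job the comment describes.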
With the DocumentDB and Search services now in play, does Azure still plan to add secondary index support to Table Storage?
Yet another 6 months have passed since we last heard from you, MS.
You can manage this by hand; I do it today and have built some reusable code to maintain this. However, changing or adding indexes in the future is a pain, and with the lack of cross-table transactions, I have had instances where an index did not get created. That means if you want/need true integrity, you need to have other processes/patterns in place.
So while, yeah, you can do this today, other systems, even Amazon's, now do this for you. I would love to not have to deal with this.
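The integrity problem described above (an index entry that never gets written because the two writes are not transactional) can be sketched as follows. This is an illustration only: `primary` and `email_index` are hypothetical in-memory stand-ins for a data table and a hand-maintained index table, and the `fail_index_write` flag exists purely to simulate a failure between the two writes.

```python
# Hypothetical in-memory stand-ins for a primary table and one hand-maintained index table.
primary = {}      # {entity_id: entity}
email_index = {}  # {email: entity_id}

def save(entity, fail_index_write=False):
    """Write the entity, then its index entry. The two writes are not atomic."""
    primary[entity["id"]] = entity
    if fail_index_write:
        return  # simulates a crash or failed request between the two writes
    email_index[entity["email"]] = entity["id"]

def repair_index():
    """Scan the primary table and recreate any missing or wrong index entries."""
    repaired = []
    for entity_id, entity in primary.items():
        if email_index.get(entity["email"]) != entity_id:
            email_index[entity["email"]] = entity_id
            repaired.append(entity_id)
    return repaired
```

A periodic `repair_index()` pass is one example of the "other processes/patterns" the comment mentions. Note that this sketch only restores missing entries; it does not remove index entries left behind by deleted or changed entities, which a real reconciliation job would also have to handle.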
I wonder the same thing...at first glance this feature seems like an essential key ingredient, but on further examination the 'feature' can be implemented as part of the table design.
Am I right in saying that the only real benefit this feature would bring would be automatic guarantees of integrity across multiple entries? So the programmer would not need to rely on azure queues for example?
I would agree that this should be a high priority feature...but largely for non-technical reasons. Am I correct in understanding that the real problem here is a development community entrenched in the relational mindset, and who need features like this to drive adoption and improve the reputation of the platform?
John Leidegren commented
Does this even make sense? I thought the point was that you'd build out any secondary indexes yourself, in the sense that you have to think of it as an asynchronous process that eventually indexes all your data. What else could you possibly be doing at this scale?
What Anders Madsen wrote in his comment makes a lot of sense to me.
Anders Madsen commented
Keeping price in mind, I think that ATS functions and performs very well, and there's no reason why you can't index the data yourself.
I've used a radical approach, where I create B-tree indexes, saved in blobs. I then use cheap small worker roles that execute SQL-like queries, and it performs very well.
The only thing to keep in mind is that this approach creates stale indexes, so it primarily targets applications that access the table storage in an asynchronous way, e.g. reading and writing at different times of the day.
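A rough sketch of the idea of keeping an index as a separate serialized snapshot, and why it goes stale, might look like the following. It is not the commenter's implementation: a sorted list searched with `bisect` is swapped in for a real B-tree, a `bytes` value stands in for a blob, and the function names and row layout are assumptions made for illustration.

```python
import bisect
import json

def build_index_blob(rows, column):
    """Build a sorted (value, row_key) index for one column and serialise it to bytes."""
    pairs = sorted((row[column], row_key) for row_key, row in rows.items())
    return json.dumps(pairs).encode("utf-8")

def lookup(index_blob, value):
    """Return row keys whose indexed column equals `value`, using binary search."""
    pairs = [tuple(p) for p in json.loads(index_blob.decode("utf-8"))]
    # (value,) sorts before every (value, row_key), so this finds the first match.
    start = bisect.bisect_left(pairs, (value,))
    keys = []
    for indexed_value, row_key in pairs[start:]:
        if indexed_value != value:
            break
        keys.append(row_key)
    return keys
```

Because the index is a snapshot, rows written after `build_index_blob` ran are invisible to `lookup` until the index is rebuilt, which is the staleness the comment warns about.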
Shaun Tonstad commented
@Jeff Windows Azure Storage / Tables is something of a joke among savvy cloud developers. Four years have passed since this original request was made. There is no sense of urgency from MS to make this service competitive with other solutions (e.g. AWS SimpleDB). It's clear that MS lacks the expertise to improve it, or the political desire given the dominance of SQL Azure.
Ben Adams commented
Index Table Pattern http://msdn.microsoft.com/en-us/library/dn589791.aspx
At this point, ATS is so far behind all the other NoSQL alternatives out there that it almost seems like a waste to continue with it. Wouldn't it make better sense just to make something like RavenDB or MongoDB a first-class citizen and move forward with that?
@John Wyler, I've used Azure Tables for years now and it performs great if what you need is a key-value or key-attribute store. Yes, it would be great to have secondary indexes, but Azure Tables is significantly cheaper than DynamoDB, even more so now that they just announced that you get 1,000,000 transactions for $0.05, and that is for any size of transaction (up to a 1 MB entity). I can afford Azure Tables, but I can't afford DynamoDB for my business.
John Wyler commented
There is no way I'm using ATS without its support for global and local secondary indices. It appears neither SQL Azure nor Table Storage is suitable for big data applications. DynamoDB it is!
Aaron Bird commented
"Announced at PDC that it’s coming." - 4 years ago!