Enable MapReduce over BOTH Table Storage and Blob Storage with no heavy transaction cost.
Unify the MapReduce efforts (LINQ to HPC (Dryad), Project Daytona, Excel DataScope, Hortonworks Hadoop Partnership & PowerView, etc.) and expose them through a single async API that works against data in Table Storage and Blob Storage (and Trinity graph storage). The API should be affordable and scalable enough to deliver near real-time results over data that is constantly being appended to.
Let's get DryadLINQ running on top of Table Storage and leapfrog other cloud storage systems.
Please see the post here for an initial take on this:
There is a version of this available here:
Simon Elliston Ball commented
You can certainly connect Hive to Table Storage on HDInsight at the moment with a custom JAR. I wrote a Hive InputFormat which lets you query Azure Tables from Hive.
(see http://www.simonellistonball.com/technology/hadoop-hive-inputformat-azure-tables/ for details and https://github.com/simonellistonball/hive-azuretables for code and example)
Ideally, offering both low-level access (MapReduce) and higher-level abstractions would give the best of both worlds: custom development at the lower level (integration into applications) and ad-hoc analysis at the higher level.
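To make the two levels concrete, here is a minimal in-memory sketch in Python. It is only an illustration under stated assumptions: `map_reduce`, `count_by`, `blob_to_row`, and the sample records are hypothetical names invented for this example, and nothing here calls a real Azure, Hadoop, or Dryad API. The low-level function takes a custom mapper and reducer; the higher-level helper builds an ad-hoc "group and count" query on top of it.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")  # input record type
V = TypeVar("V")  # intermediate / output value type


def map_reduce(
    records: Iterable[R],
    mapper: Callable[[R], Iterable[Tuple[Hashable, V]]],
    reducer: Callable[[Hashable, List[V]], V],
) -> Dict[Hashable, V]:
    """Low-level layer: map every record, group values by key, reduce each group."""
    groups: Dict[Hashable, List[V]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def count_by(records: Iterable[R], key_fn: Callable[[R], Hashable]) -> Dict[Hashable, int]:
    """Higher-level layer: an ad-hoc 'group by key, count' built on map_reduce."""
    return map_reduce(
        records,
        lambda record: [(key_fn(record), 1)],
        lambda _key, values: sum(values),
    )


def blob_to_row(line: str) -> dict:
    """Parse one 'region,status' line (hypothetical blob format) into a row."""
    region, status = line.split(",")
    return {"PartitionKey": region, "status": status}


# Hypothetical in-memory stand-ins for table entities and lines read from a blob.
table_rows = [
    {"PartitionKey": "eu", "RowKey": "1", "status": "ok"},
    {"PartitionKey": "eu", "RowKey": "2", "status": "error"},
    {"PartitionKey": "us", "RowKey": "3", "status": "ok"},
]
blob_lines = ["us,ok", "eu,ok"]

# One query surface over both sources, as the request above asks for.
all_rows = table_rows + [blob_to_row(line) for line in blob_lines]
per_region = count_by(all_rows, lambda row: row["PartitionKey"])
per_status = count_by(all_rows, lambda row: row["status"])
```

An application would call `map_reduce` directly with its own mapper and reducer, while an analyst would only need helpers like `count_by`. A real service would also have to partition the work and stream appended data, which this sketch does not attempt.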