How can we improve HDInsight?

Data Integration in HDInsight

Please bundle a data integration tool, from either Microsoft or Hortonworks, as part of HDInsight.

36 votes
    Ravi Bandaru shared this idea
    Anonymous shared a merged idea: SSIS-like UI to author MapReduce jobs

    4 comments

      • Jamie Thomson commented

        Hi Matt,
        That (i.e. "a set of SSIS tasks for orchestrating HDINSIGHT jobs") would be interesting to me. 7 months later, has there been any movement on this?

      • Jamie Thomson commented

        SSIS in its current form cannot (as far as I know) access files in Azure Blob storage, and that would seem, to me, to be quite a barrier to using it with HDInsight.
        Also, SSIS's architecture is all about running ETL on a single box and scaling it up where necessary. That paradigm works well, but I don't think it's ideally suited to the cloud (and particularly HDInsight), where the mantra is scale-out rather than scale-up.
        Feels to me like you need a brand-new, cloud-based, scale-out, ETL tool (I've got a few ideas about such a thing here: http://sqlblog.com/blogs/jamie_thomson/archive/2013/02/12/what-would-a-cloud-based-etl-tool-look-like.aspx ). I hope there are some people secreted away in Redmond somewhere working on such a thing!

        Bottom line: yes, there needs to be a data integration (aka ETL) tool available with HDInsight. 3 votes from me.

      • Anonymous commented

        Sure, any ETL tool that can handle the volume, velocity and variability of a Big Data workload will do.
