How can we improve Microsoft Azure Data Lake?

Support Avro in Azure Data Lake Analytics

87 votes

    Wesley Backelant shared this idea

    15 comments

      • Anonymous commented

        Is there an update on when support for Avro files will be available as a native part of U-SQL?

      • Pete commented

        Not at all sure why there is no built-in solution for analysing Avro files in a data lake. Avro files are the only file type you can use when capturing events directly to a data lake, so it seems crazy that you can't then query them without using third-party solutions.
        Please work on a solution for this.

      • Tony Thul commented

        Same as Mayo's comment below:

        We almost had a wow moment with event hub capture --> data lake --> data lake analytics. It fell apart on the data lake analytics side.

      • matt commented

        Add support for seekable streams. The Apache.Avro library currently relies on the ability to seek a stream when reading OCF files.

        I have added a feature request to Apache as well to support non-seekable streams.
        https://issues.apache.org/jira/browse/AVRO-2098

        Hopefully the two features converge and we get options for both kinds of streams.

      • Mayo commented

        Using Data Lake Analytics to process event hub capture files (Avro) is a huge use case, and right now it's a fairly awful experience on the Data Lake Analytics side.

        There are multiple versions of the MS Avro libraries floating around, with different bugs (e.g. seekable vs. non-seekable streams). None of them currently handles the empty Avro file (header but no blocks) sent by event hub capture. It's a mess.

        We almost had a wow moment with event hub capture --> data lake --> data lake analytics. It fell apart on the data lake analytics side.

        Please implement this. Please.

      • Kiran Kolli commented

        We are considering Avro files instead of JSON, since an Avro file carries its schema with it. It would be great if we had built-in support for Avro extraction and output from U-SQL.

      • Iain commented

        Glad to have the example, but it would be even better to have this built in. Because it's now a native format in Azure (e.g. Event Hubs Archive), you'd expect the integration between Azure products to be smooth and out-of-the-box.

      • Andrew Sears commented

        It would be very helpful to have an Avro JSON extractor that is compatible with the Stream Analytics Archive format.
