How can we improve Microsoft Azure Data Lake?

Excel Extractor

An extractor that pulls data from the worksheets in an Excel workbook!

48 votes

Jayme Edwards shared this idea

7 comments

  • Nelson Williams commented

    Using the DocumentFormat.OpenXml library, I am running into out-of-memory errors on particularly large files. I have a spreadsheet with over 260K rows. Is there a way to deal with these errors?
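
    Out-of-memory errors on very large sheets typically come from loading the whole worksheet into an in-memory tree at once. A common workaround is to stream the sheet XML row by row instead; in the .NET Open XML SDK that is the role of the SAX-style `OpenXmlReader`. The Python sketch below shows the same streaming idea using only the standard library. It assumes the worksheet lives at `xl/worksheets/sheet1.xml` and shared strings at `xl/sharedStrings.xml` (the usual layout, but the authoritative mapping is in the workbook relationships), and the function names `load_shared_strings`, `cell_value`, and `iter_rows` are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace (transitional) used by worksheet and shared-string parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def load_shared_strings(zf):
    """Read the shared-string table; usually far smaller than the sheet itself."""
    try:
        data = zf.read("xl/sharedStrings.xml")
    except KeyError:
        return []  # the workbook has no shared-string part
    root = ET.fromstring(data)
    # A string item may be split into several <t> runs (rich text).
    return ["".join(t.text or "" for t in si.iter(NS + "t"))
            for si in root.iter(NS + "si")]


def cell_value(c, shared):
    """Decode one <c> element into a string (or None for an empty cell)."""
    kind = c.get("t")
    if kind == "inlineStr":
        return "".join(t.text or "" for t in c.iter(NS + "t"))
    v = c.find(NS + "v")
    if v is None:
        return None
    if kind == "s":
        return shared[int(v.text)]  # index into the shared-string table
    return v.text  # numbers, booleans, etc. are left as raw text


def iter_rows(source, sheet_part="xl/worksheets/sheet1.xml"):
    """Yield one list of cell values per <row>, streaming the sheet XML.

    `source` is a path or binary file object; `sheet_part` is an assumed default.
    Cell positions (the `r` attribute) are ignored, so skipped empty cells
    do not leave gaps in the yielded lists.
    """
    with zipfile.ZipFile(source) as zf:
        shared = load_shared_strings(zf)
        with zf.open(sheet_part) as fh:
            sheet_data = None
            for event, elem in ET.iterparse(fh, events=("start", "end")):
                if event == "start":
                    if elem.tag == NS + "sheetData":
                        sheet_data = elem  # parent of every <row>
                    continue
                if elem.tag != NS + "row":
                    continue
                yield [cell_value(c, shared) for c in elem.findall(NS + "c")]
                # Drop finished rows so the tree never holds the whole sheet.
                if sheet_data is not None:
                    sheet_data.clear()
```

    Usage would look like `for row in iter_rows("big.xlsx"): ...`. Memory then stays roughly proportional to one row plus the shared-string table rather than to the entire sheet; dates and number formats would still need separate handling.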

  • Nelson Williams commented

    Aaaaaaand... please disregard my previous comment.

    The DocumentFormat.OpenXml package can be added through the NuGet Package Manager:

    PM> Install-Package DocumentFormat.OpenXml -Version 2.8.1

  • Nelson Williams commented

    Is there a chance we can get the source files for DocumentFormat.OpenXml, or at least the compiled DLL? The extractor won't work without it.

  • Kory Skistad commented

    I've worked in the financial/mortgage industry for nearly 20 years, and if there is one constant, it's Excel. Through the move from DTS to SSIS and on to various other ETL technologies, getting Excel data into databases (turning "desktop" data into "enterprise" data) has always been a challenge. It still baffles me that this remains a problem Microsoft hasn't solved. Why are we still using the Jet driver? Excel has a very logical hierarchy: Workbook->Sheet->Range. That should be enough to build a driver that navigates this hierarchy and lets SSIS, U-SQL, and any other Microsoft product target exactly the area we need to pull data from in an Excel document. Named ranges make it even more intuitive.
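
    As an illustration of the Workbook->Sheet->Range addressing described above, the Python sketch below splits a reference such as `'My Sheet'!B2:C3` into its sheet name and A1-style range, converts the range to zero-based row/column bounds, and slices it out of a sheet that has already been read into a list of rows. The helper names (`split_sheet`, `col_to_index`, `parse_cell`, `parse_range`, `slice_range`) are hypothetical, and resolving a named range to its A1 reference (stored among the workbook's defined names) is not shown.

```python
import re

# An A1-style cell reference: optional $, column letters, optional $, 1-based row.
CELL_RE = re.compile(r"^\$?([A-Za-z]+)\$?(\d+)$")


def split_sheet(ref):
    """Split "'My Sheet'!B2:C3" into ('My Sheet', 'B2:C3').

    Surrounding quotes are stripped; doubled quotes inside a name are not unescaped.
    """
    sheet, sep, addr = ref.rpartition("!")
    if not sep:
        return None, ref  # no sheet qualifier
    return sheet.strip("'"), addr


def col_to_index(letters):
    """Convert column letters to a zero-based index: A -> 0, Z -> 25, AA -> 26."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_cell(ref):
    """Return (row, col), both zero-based, for a reference like 'C3' or '$C$3'."""
    m = CELL_RE.match(ref)
    if not m:
        raise ValueError(f"not an A1 cell reference: {ref!r}")
    return int(m.group(2)) - 1, col_to_index(m.group(1))


def parse_range(ref):
    """Return (top, left, bottom, right), zero-based and inclusive, for 'B2:C3'."""
    start, _, end = ref.partition(":")
    r1, c1 = parse_cell(start)
    r2, c2 = parse_cell(end or start)  # a single cell is a 1x1 range
    return min(r1, r2), min(c1, c2), max(r1, r2), max(c1, c2)


def slice_range(rows, ref):
    """Cut the rectangle named by `ref` out of a list of row lists."""
    top, left, bottom, right = parse_range(ref)
    return [row[left:right + 1] for row in rows[top:bottom + 1]]
```

    For example, `slice_range([[1, 2, 3], [4, 5, 6], [7, 8, 9]], "B2:C3")` returns `[[5, 6], [8, 9]]`.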

    I started using Excel at version 3, around 1991, when I worked for a bank. Now, almost 30 years later, I am still trying to find a simple way to deal with this data. Face it, Microsoft: Excel is not going away. Let's make a killer driver that works with your other tools and technologies to tap this valuable resource.

  • Michael Rys commented

    Thanks, Jayme.

    Note that U-SQL can read most CSV and TSV files generated by Excel, provided they have no header row and no CR/LF inside field values. XLSX files are harder to support: they are a compressed (ZIP) archive of XML files, which makes it rather difficult to process them with good performance.
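
    To make the "compressed archive of XML files" point concrete, the Python sketch below opens an XLSX package with the standard library and maps each sheet name to the ZIP member holding its cells, by reading `xl/workbook.xml` and its relationships part `xl/_rels/workbook.xml.rels`, following the Office Open XML packaging conventions. The function name `sheet_parts` is hypothetical, the transitional (not Strict) namespaces are assumed, and a robust reader would locate the workbook through `_rels/.rels` rather than assume the `xl/` folder.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Transitional OOXML namespaces (Strict documents use different URIs).
MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
DOC_REL = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"
PKG_REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def sheet_parts(source, workbook_part="xl/workbook.xml"):
    """Return {sheet name: ZIP member name}, in workbook order."""
    folder, name = posixpath.split(workbook_part)
    rels_part = posixpath.join(folder, "_rels", name + ".rels")
    with zipfile.ZipFile(source) as zf:
        rels = ET.fromstring(zf.read(rels_part))
        # Relationship Id -> target path (relative to the workbook's folder).
        targets = {r.get("Id"): r.get("Target")
                   for r in rels.iter(PKG_REL + "Relationship")}
        workbook = ET.fromstring(zf.read(workbook_part))
        result = {}
        for sheet in workbook.iter(MAIN + "sheet"):
            target = targets[sheet.get(DOC_REL + "id")]  # the r:id attribute
            if target.startswith("/"):
                path = target.lstrip("/")  # absolute path within the package
            else:
                path = posixpath.normpath(posixpath.join(folder, target))
            result[sheet.get("name")] = path
        return result
```

    The returned member name can then be handed to any XML reader; every sheet access still means decompressing and parsing XML, which is part of why XLSX is costlier to process than flat CSV/TSV.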

    We will look into it, though, if there are enough votes.
