Matthew

My feedback

  1. 2,328 votes
    57 comments  ·  Data Factory
    Matthew commented  · 

    Hey Nick, the last time I checked (which was a while ago, so maybe something has changed) you could integrate with public git sources, but some enterprises aren't comfortable storing source code in the cloud and are using on-premises solutions instead - for example, customers using on-premises TFS with integrated Git. To my understanding, there was no way to have the UI write back to that kind of source code repository, which is what this feature would make possible for UI development. With that said, we stopped waiting and just learned to write the pipelines with ARM template code.
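
    For anyone wondering what "pipelines as ARM template code" looks like in practice, here is a minimal sketch of an ADF v1-style pipeline definition; the dataset names, activity and dates are placeholders for illustration and are not taken from the comment above:

    {
        "name": "CopyPipeline",
        "properties": {
            "description": "Copy from one blob container to another on a daily schedule",
            "activities": [
                {
                    "name": "CopyBlobToBlob",
                    "type": "Copy",
                    "inputs": [ { "name": "InputDataset" } ],
                    "outputs": [ { "name": "OutputDataset" } ],
                    "typeProperties": {
                        "source": { "type": "BlobSource" },
                        "sink": { "type": "BlobSink" }
                    },
                    "scheduler": { "frequency": "Day", "interval": 1 }
                }
            ],
            "start": "2017-01-01T00:00:00Z",
            "end": "2017-12-31T00:00:00Z"
        }
    }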

    Matthew commented  · 

    https://www.purplefrogsystems.com/paul/2017/09/whats-new-in-azure-data-factory-version-2-adfv2/ - see the screenshots on this blog of the ADFv2 private preview showing the GUI authoring tool. Apparently it works together with git repos. Hoping for this to be released soon.

    Matthew supported this idea  · 
  2. 180 votes
    8 comments  ·  Storage » Blobs
    Matthew supported this idea  · 
    Matthew commented  · 

    Please can we see a Microsoft response on this? Not having this option is dangerous for enterprise rollouts of blob storage.

  3. 2 votes
    0 comments  ·  HDInsight » Security
    Matthew shared this idea  · 
  4. 1,176 votes
    46 comments  ·  HDInsight » Platform

    [Update] Thanks for your continued feedback on this capability! Rest assured that we are tracking this request closely along with several other platform capabilities our customers have requested. In the meantime, you can use the cluster scaling capability to adjust the HDInsight cluster size according to your varying compute needs. Azure Data Factory is another option you can explore for scheduling jobs with automatic creation and deletion of clusters: https://azure.microsoft.com/en-us/documentation/articles/data-factory-data-transformation-activities/

    Adnan Ijaz
    Program Manager
    Microsoft Azure HDInsight
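
    The cluster scaling mentioned in the update above ultimately comes down to changing the worker node count on the cluster. As a rough, illustrative sketch only (the fragment follows the Microsoft.HDInsight ARM resource schema; the node count and VM size are placeholders), the relevant piece of a cluster definition looks like this, and updating targetInstanceCount resizes the cluster:

    "computeProfile": {
        "roles": [
            {
                "name": "workernode",
                "targetInstanceCount": 4,    /* number of worker nodes - change this to scale the cluster */
                "hardwareProfile": { "vmSize": "Standard_D3_v2" }
            }
        ]
    }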

    Matthew supported this idea  · 
  5. 25 votes
    1 comment  ·  Data Factory
    Matthew commented  · 

    Hi, I thought I would share a solution we recently came up with after running into the same challenges. We have found that it is in fact possible to link HDInsight On Demand with Azure Data Lake Store directly, using Hive transformations.

    First, check out this article about Cloudera, which shows that you can reference Data Lake Store directly from Cloudera:

    https://azure.microsoft.com/en-us/blog/cloudera-adls/

    Note that they achieve this by adding service principal credentials to the core-site.xml as follows:

    <property>
        <name>dfs.adls.oauth2.client.id</name>
        <value>Application ID</value>
    </property>
    <property>
        <name>dfs.adls.oauth2.credential</name>
        <value>Authentication Key</value>
    </property>
    <property>
        <name>dfs.adls.oauth2.refresh.url</name>
        <value>https://login.microsoftonline.com/<Tenant ID>/oauth2/token</value>
    </property>
    <property>
        <name>dfs.adls.oauth2.access.token.provider.type</name>
        <value>ClientCredential</value>
    </property>

    If you have tried switching from blob storage to Data Lake Store with HDI On Demand, you will be familiar with the error that references "dfs.adls.oauth2". We have found that the Service Principal ID and its associated key can be injected into core-site.xml on the HDI On Demand cluster. This is accomplished by adding the following to your HDInsight On Demand linked service definition, under the coreConfiguration section:

    "osType": "linux", /* must be Linux for Hadoop 3.5 or greater */
    "version": "3.6", /* we must use Hadoop 3.5 or greater to support encrypted storage*/
    "clusterSize": 1, /* size of the HDI cluster – bigger faster but costs more*/
    "coreConfiguration": { /*config for Service Principal to access Data Lake Store */
    "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
    "fs.adl.oauth2.client.id": "[parameters('ADLSServicePrincipalID')]",
    "fs.adl.oauth2.credential": "[parameters('ADLSServicePrincipalKey')]",
    "fs.adl.oauth2.refresh.url": "https://login.microsoftonline.com/[parameters('tenantID')]/oauth2/token&quot;
    },
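
    For context (this sketch is not part of the original comment), here is roughly where that coreConfiguration fragment sits within a full ADF v1 HDInsightOnDemand linked service. The surrounding property names are assumed from the on-demand HDInsight linked service schema of that era, the storage linked service name is a placeholder, and the cluster-creation service principal is shown reusing the same parameters purely for brevity:

    {
        "name": "HDInsightOnDemandLinkedService",
        "properties": {
            "type": "HDInsightOnDemand",
            "typeProperties": {
                "osType": "linux",
                "version": "3.6",
                "clusterSize": 1,
                "timeToLive": "00:30:00",
                "linkedServiceName": "AzureStorageLinkedService",
                "hostSubscriptionId": "[parameters('subscriptionID')]",
                "tenant": "[parameters('tenantID')]",
                "servicePrincipalId": "[parameters('ADLSServicePrincipalID')]",
                "servicePrincipalKey": "[parameters('ADLSServicePrincipalKey')]",
                "coreConfiguration": {
                    "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
                    "fs.adl.oauth2.client.id": "[parameters('ADLSServicePrincipalID')]",
                    "fs.adl.oauth2.credential": "[parameters('ADLSServicePrincipalKey')]",
                    "fs.adl.oauth2.refresh.url": "https://login.microsoftonline.com/[parameters('tenantID')]/oauth2/token"
                }
            }
        }
    }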

    You're welcome :-)

    Matthew C
