Other distributions of the Zeppelin notebook include the %pyspark interpreter. The one on HDInsight has only %spark, %sql, %dep, and %md. It would be really nice to have %pyspark.
24 votes
Thanks for the feedback! This is a common request, and we are jointly investigating with the Azure Storage team how to bring this support to HDInsight.
Microsoft Azure HDInsight
A previously working Jupyter Notebook fails with the exception "Java gateway process exited before sending the driver its port number".
At that point, the pyspark source contains the comment "In Windows, ensure the Java child processes do not linger after Python has exited.".
Even restarting the HDInsight instance doesn't fix the issue.
4 votes
Currently, if you Submit query1.hql, the Hive Job Summary pane pops up, and I can monitor it to see whether the query succeeded and to view the results.
If in the meantime I Submit query2.hql, it replaces query1 in the Hive Job Summary pane. As far as I can see, there's no way to get back to the query1 job summary.
I wish the Hive Job Summary pane were attached to the bottom of the HQL window, as it is in most other SQL query tools in Visual Studio. Then we could have one results pane per .hql file.
4 votes
Allow Flume to stream data directly to HDInsight
5 votes
In our setup we're dealing with data that has complex schemas, so we're using a custom-built JSON SerDe with Hive, downloaded from https://github.com/rcongiu/Hive-JSON-Serde. Each time HDInsight is updated to a newer version, we run into issues related to this SerDe. It would be nice if MS could provide a SerDe that is tested and supported whenever a new HDInsight distribution is released.
54 votes