Implement Python bindings for the azure-sqldb-spark connector
The azure-sqldb-spark Spark connector (https://github.com/Azure/azure-sqldb-spark) provides support for Spark on Scala, but does not currently provide Python bindings.
Python-based Spark applications can still connect to MSSQL/Azure SQL databases over a JDBC connection, but this approach does not support bulk inserts and is therefore quite slow when persisting large Spark DataFrames to MSSQL.
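For reference, the JDBC route from PySpark looks roughly like the sketch below. It only uses Spark's built-in JDBC data source options (url, dbtable, user, password, driver, batchsize). The helper names (build_jdbc_url, jdbc_write_options, write_via_jdbc) and every server, database, table, and credential value are made up for illustration, and the Microsoft JDBC driver jar is assumed to be on the cluster classpath.

```python
# Minimal sketch of the JDBC write path described above.
# All helper names and connection values are hypothetical placeholders.

SQLSERVER_DRIVER = "com.microsoft.sqlserver.jdbc.SQLServerDriver"


def build_jdbc_url(server, database, port=1433):
    """Build a SQL Server JDBC URL from its parts."""
    return f"jdbc:sqlserver://{server}:{port};databaseName={database}"


def jdbc_write_options(url, table, user, password, batch_size=1000):
    """Collect the options handed to Spark's built-in JDBC data source."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": SQLSERVER_DRIVER,
        # Rows per JDBC batch; this tunes batched INSERTs, it is not bulk copy.
        "batchsize": str(batch_size),
    }


def write_via_jdbc(df, options, mode="append"):
    """Write a Spark DataFrame to the target table through JDBC."""
    df.write.format("jdbc").options(**options).mode(mode).save()


# Usage inside a PySpark job (df is an existing Spark DataFrame):
#   url = build_jdbc_url("myserver.database.windows.net", "mydb")
#   opts = jdbc_write_options(url, "dbo.my_table", "my_user", "my_password")
#   write_via_jdbc(df, opts)
```

Raising batchsize can help somewhat, but the write still goes through ordinary INSERT statements rather than the bulk copy path the Scala connector exposes, which is the gap this request is about.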
It would be useful if PySpark applications could take advantage of the bulk insert capabilities provided by the azure-sqldb-spark Scala package. From the repository README: "Comparing to the built-in Spark connector, this connector provides the ability to bulk insert data into SQL databases. It can outperform row by row insertion with 10x to 20x faster performance."
This is definitely needed. The JDBC connection method (see https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/sql-databases) only supports pushdown of SELECT statements.
This is a major headache: a PySpark application currently cannot execute stored procedures or perform DDL operations without installing pyodbc and its associated drivers, and performing bulk inserts is essentially impossible.
Arvind Ravish commented
This is becoming an ask from many customers. Can the engineering team prioritize it?