SQL Server - Big Data Clusters
Summary:
-
Add Delta Lake OSS to Spark API
From https://github.com/delta-io/delta
Add functionality to the Spark engines to better handle upserts/merges and data versioning/CDC inside the BDC cluster. Audit history would be a plus.
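To make the request concrete, here is a minimal sketch of the semantics being asked for: an upsert (merge) keyed on a primary key, a retained snapshot per commit for versioned reads, and an audit history. This is a toy in-memory illustration in plain Python, not Delta Lake's API; the class name `VersionedTable` and its methods are hypothetical.

```python
from copy import deepcopy


class VersionedTable:
    """Toy in-memory table (hypothetical, NOT the Delta Lake API).

    Every merge commits a new snapshot, so older versions stay readable.
    """

    def __init__(self, key="id"):
        self.key = key
        self._versions = [{}]  # version 0 is the empty table
        self.history = []      # audit log: one entry per commit

    def merge(self, rows):
        """Upsert rows: update when the key already exists, insert otherwise."""
        current = deepcopy(self._versions[-1])
        inserted = updated = 0
        for row in rows:
            k = row[self.key]
            if k in current:
                current[k].update(row)
                updated += 1
            else:
                current[k] = dict(row)
                inserted += 1
        self._versions.append(current)
        entry = {
            "version": len(self._versions) - 1,
            "inserted": inserted,
            "updated": updated,
        }
        self.history.append(entry)
        return entry

    def read(self, version=None):
        """Return the rows of a given version (latest by default), sorted by key."""
        snapshot = self._versions[-1 if version is None else version]
        return sorted(snapshot.values(), key=lambda r: r[self.key])


table = VersionedTable()
table.merge([{"id": 1, "status": "new"}])
table.merge([{"id": 1, "status": "shipped"}, {"id": 2, "status": "new"}])
```

Reading `table.read(1)` returns the table as it was after the first commit, while `table.history` records how many rows each commit inserted and updated.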
15 votes -
Add PolyBase feature (CREATE EXTERNAL TABLE AS SELECT) to SQL Server 2019 and Big Data Clusters
Please add the PolyBase feature (CREATE EXTERNAL TABLE AS SELECT) which is only available on Azure Synapse Analytics (SQLDW) and PDW right now.
We require this on SQL 2019 and Big Data Clusters for creating a data hub/catalog from curated views that are federated across many SQL Server or Big Data Clusters platforms. This would extend the data virtualization capability to views, not just tables, with SQL Server as the source.
11 votes -
10 votes
-
Backup of storage and data pool
The ability to back up the entire cluster, including the storage and data pools, would be good to have.
8 votes -
Volume Management for SQL Server Big Data Clusters on AKS
Currently it is possible to customise storage volume sizes and storage classes (AKS, Kubernetes) per pool (master, data, storage pool).
But right now only 1 (one) volume per pod is available. It would be a game-changing feature if Big Data Clusters had volume management to add, resize and change storage classes for multiple volumes per pod for all pool types.
https://docs.microsoft.com/en-us/azure/aks/concepts-storage#storage-classes
8 votes -
Add the ability to specify labels on the PVCs of each Pool
It would be nice to have the ability to add labels at the PVC level of each pool in the deployment JSON files, the same way you specify the storage class.
These labels can then be used inside K8s for volume placement strategies.
7 votes -
Enhance execution plans for queries against data pool
Currently, queries against data pool tables only show as "remote query" (which makes sense, as it's an external table), but for the data pool it should be possible to drill down to the next level and analyse the query running on the data pool.
6 votes -
BDC Bulk Data Loading utility
Many customers coming from data warehouse appliances are used to having a bulk data loading utility. To speed up migration to the BDC platform, it would be nice to have a native BDC utility included instead of having to write our own.
6 votes -
support sharding
It would be good to add support for sharding.
6 votes -
expose dfs.datanode.provided.volume.readthrough=false in bdc.json
This parameter can be set in the hdfs-site.xml file within the storage pool pods (I believe). It's useful for high-performance on-premises S3 storage that does not require caching in HDFS. My ask is that this parameter be made settable inside the bdc.json file.
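For reference, a setting in hdfs-site.xml is a `<property>` element with a `<name>` and a `<value>` child. The sketch below renders the entry for the parameter named above using Python's standard library; the helper name `hdfs_property` is hypothetical, and how bdc.json would expose the setting is left open because that is the subject of this request.

```python
import xml.etree.ElementTree as ET


def hdfs_property(name, value):
    """Render one Hadoop <property> element as it would appear in hdfs-site.xml."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


snippet = hdfs_property("dfs.datanode.provided.volume.readthrough", "false")
print(snippet)
```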
5 votes -
Kubernetes cluster validation
I would like to suggest that a cluster validation step is added to azdata dc create, such that a number of "Pre-flight checks" are performed prior to big data cluster deployment, to include things such as:
- Checking that the cluster nodes are running supported versions of Linux
- Checking that Kubernetes is at least version 1.13
- Checking that a persistent volume can be created
- Checking that each worker node has sufficient space to store the bdc images
- For AD integration, checking that any bdcuser, the bdc admin account, bdcusers group and bdadmins group all exist . . . and that the…
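A few of the checks listed above could be sketched roughly as follows. This is only an illustration of the idea, not how azdata works: the input dictionary, its keys (`k8s_version`, `supported_os`, `required_image_gb`, `nodes`) and the function names are all hypothetical, and the list of supported Linux versions is supplied by the caller rather than assumed here.

```python
import re

MIN_K8S_VERSION = (1, 13)  # minimum version named in the request


def parse_k8s_version(text):
    """Parse 'v1.16.3' or '1.16' into a (major, minor) tuple."""
    match = re.match(r"v?(\d+)\.(\d+)", text)
    if not match:
        raise ValueError(f"unrecognised Kubernetes version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def run_preflight(cluster):
    """Return a list of human-readable failures; an empty list means all checks passed.

    `cluster` is a hypothetical description of the target cluster.
    """
    failures = []
    if parse_k8s_version(cluster["k8s_version"]) < MIN_K8S_VERSION:
        failures.append(
            f"Kubernetes {cluster['k8s_version']} is older than "
            f"{MIN_K8S_VERSION[0]}.{MIN_K8S_VERSION[1]}"
        )
    for node in cluster["nodes"]:
        if node["os"] not in cluster["supported_os"]:
            failures.append(f"{node['name']}: unsupported OS {node['os']}")
        if node["free_disk_gb"] < cluster["required_image_gb"]:
            failures.append(f"{node['name']}: not enough free space for the bdc images")
    return failures


example = {
    "k8s_version": "v1.12.8",
    "supported_os": ["example-linux-1"],  # placeholder value, supplied by the caller
    "required_image_gb": 50,              # placeholder value
    "nodes": [
        {"name": "node-a", "os": "example-linux-1", "free_disk_gb": 120},
        {"name": "node-b", "os": "other-linux", "free_disk_gb": 10},
    ],
}
for failure in run_preflight(example):
    print(failure)
```

Collecting every failure before reporting, rather than stopping at the first one, lets an administrator fix all problems in a single pass before retrying the deployment.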
5 votes -
Run BDC Notebook Code Using ADF Directly (Automated Way)
SQL 2019 (BDC) now supports Jupyter-style notebooks. Can we write code and run it in an automated way using ADF (Azure Data Factory), as we can in Azure Databricks? If that support is not available now, HOW SOON can we get the feature?
5 votes -
More access availability to Ceph-based cluster/pools
Ceph is supported with K8s via CSI (RADOS Block Device - RBD) as a storage class (and persistent volumes/claims) for containers, but it would be nice to see direct object/block access (beyond the general S3 RADOS gateway options) from the Storage/Data Pool perspective.
5 votes -
Job Scheduling / Orchestration in BDC
Are there any plans to introduce orchestration in BDC? I would like to be able to schedule Spark jobs, for example.
4 votes -
4 votes
-
Add support for SQL SCOM Management Pack to monitor at least Master Instance
It would be important to add to the SQL SCOM Management Pack the capability to monitor at least the master instance, and to add to the SQL SCOM MP guide the steps to install and configure SCOM to monitor the master instance.
4 votes -
scale
Being able to take advantage of the container cluster to dynamically scale the number of containers for each role.
4 votes -
BDC with AD integration
Currently only one BDC is allowed per AD domain. Can we get multiple BDCs on the same AD domain?
4 votes -
Problem with enabling SQL Agent on multi-node BDC with HA
Currently, if you attempt to enable the SQL Agent service on a multi-node BDC (CU4, with HA configured), the local instances are enabled, but the HA listener isn't. You get an error pop-up when attempting to log into the HA listener instance (port 31433) using SSMS; see the attached file for a screenshot.
4 votes
Thank you for raising this issue. The team will investigate this and get back to you.
-
4 votes