SQL Server - Big Data Clusters

Summary:


SQL Server Big Data Clusters is the multi-cloud, open data platform for analytics at any scale. Big Data Clusters (BDC) unites SQL Server with Apache Spark to deliver the best compute engines available for analytics in a single, easy-to-use deployment. With these engines, BDC is the ideal data platform for AI, ML, M/R, Streaming, BI, T-SQL, and Spark. Delivered as part of the SQL Server 2019 release, BDC is a cloud-native solution orchestrated by Kubernetes. Our mission is to accelerate, delight and empower our users as they quench their thirst for data-driven insights.
 
More information about SQL Server Big Data Clusters is available in the documentation.

  1. Add Delta Lake OSS to Spark API

    From https://github.com/delta-io/delta

    Add functionality to the Spark engine to better handle upserts/merges and data versioning/CDC inside the BDC cluster. Audit history is a plus.

    7 votes
    0 comments  ·  Apache Spark
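    The upsert and versioning behaviour asked for here can be illustrated with a toy, in-memory model: a MERGE updates rows whose key already exists and inserts the rest, and every commit is kept as a snapshot together with an audit entry. This is a minimal sketch under that assumption; `VersionedTable`, `merge` and `as_of` are hypothetical names and are not the Delta Lake API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class VersionedTable:
    """Toy keyed table (hypothetical, not Delta Lake) that keeps a snapshot
    and an audit entry for every MERGE commit."""
    key: str
    rows: Dict[Any, dict] = field(default_factory=dict)
    history: List[dict] = field(default_factory=list)
    snapshots: List[Dict[Any, dict]] = field(default_factory=list)

    def merge(self, updates: List[dict]) -> None:
        """Upsert: update rows whose key already exists, insert the rest."""
        inserted = updated = 0
        for row in updates:
            k = row[self.key]
            if k in self.rows:
                # Build a new dict so earlier snapshots are never mutated.
                self.rows[k] = {**self.rows[k], **row}
                updated += 1
            else:
                self.rows[k] = dict(row)
                inserted += 1
        # Copy the current state so older versions stay queryable ("time travel").
        self.snapshots.append({k: dict(v) for k, v in self.rows.items()})
        # Audit history: one entry per commit.
        self.history.append({
            "version": len(self.snapshots) - 1,
            "operation": "MERGE",
            "inserted": inserted,
            "updated": updated,
        })

    def as_of(self, version: int) -> Dict[Any, dict]:
        """Return the table state as it was after the given commit."""
        return self.snapshots[version]


table = VersionedTable(key="id")
table.merge([{"id": 1, "qty": 5}, {"id": 2, "qty": 3}])
table.merge([{"id": 2, "qty": 7}, {"id": 3, "qty": 1}])
```

    After the second merge, row 2 carries the new quantity while `table.as_of(0)` still returns the original one, and `table.history` records how many rows each commit inserted and updated.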
  2. Volume Management for SQL Server Big Data Clusters on AKS

    Currently it is possible to customise storage volume sizes and storage classes (AKS, Kubernetes) per pool (master, data, storage pool), but only one volume per pod is available. A volume management feature for Big Data Clusters that could add, resize and change storage classes for multiple volumes per pod, for all pool types, would be a game changer.

    https://docs.microsoft.com/en-us/sql/big-data-cluster/concept-data-persistence?view=sql-server-ver15#customize-storage-configurations-for-each-pool

    https://docs.microsoft.com/en-us/azure/aks/concepts-storage#storage-classes

    6 votes
    0 comments  ·  Platform
  3. Add PolyBase feature (CREATE EXTERNAL TABLE AS SELECT) to SQL Server 2019 and Big Data Clusters

    Please add the PolyBase feature (CREATE EXTERNAL TABLE AS SELECT) which is only available on Azure Synapse Analytics (SQLDW) and PDW right now.

    We require this on SQL Server 2019 and Big Data Clusters for creating a data hub/catalog from curated views that are federated across many SQL Server or Big Data Clusters platforms. This would extend the data virtualization capability to views, not just tables, with SQL Server as the source.

    5 votes
    0 comments  ·  Data Virtualization
  4. BDC Bulk Data Loading utility

    Many customers coming from data warehouse appliances are used to having a bulk data loading utility. To speed up migration to the BDC platform, it would be a nice feature to have a native BDC utility included instead of having to write our own.

    5 votes
    1 comment  ·  Platform
  5. Run BDC Notebook Code Using ADF Directly (Automated Way)

    SQL Server 2019 (BDC) now supports Jupyter-style notebooks. Can we write notebook code and run it in an automated way using ADF (Azure Data Factory), as we can with Azure Databricks? If that support is not available now, how soon can we get the feature?

    5 votes
    0 comments  ·  Other
  6. More access availability to Ceph-based clusters/pools

    Ceph is supported with K8s via CSI (RADOS Block Device, RBD) as a storage class (and persistent volumes/claims) for containers, but it would be nice to see direct object/block access (beyond the general S3 RADOS Gateway options) from the Storage/Data Pool perspective.

    5 votes
    2 comments  ·  Storage Pool
  7. Expose dfs.datanode.provided.volume.readthrough=false in bdc.json

    This parameter can be set in the hdfs-site.xml file within the storage pool pods (I believe). It is useful for high-performance on-premises S3 storage that does not require caching in HDFS. My ask is that this parameter be made settable in the bdc.json file.

    4 votes
    started  ·  0 comments  ·  Storage Pool
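    For reference, a setting like the one named above is written in hdfs-site.xml using Hadoop's standard `<configuration>`/`<property>` name/value layout. The sketch below only renders that fragment with the standard library; the helper names are hypothetical, and it makes no assumption about how bdc.json would expose the setting, which is what the request asks for.

```python
import xml.etree.ElementTree as ET


def hadoop_property(name: str, value: str) -> ET.Element:
    """Build one Hadoop-style <property><name/><value/></property> element."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


def render_site_xml(settings: dict) -> str:
    """Render a dict of settings as a *-site.xml <configuration> fragment."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        root.append(hadoop_property(name, value))
    return ET.tostring(root, encoding="unicode")


xml_text = render_site_xml({"dfs.datanode.provided.volume.readthrough": "false"})
print(xml_text)
```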
  8. Kubernetes cluster validation

    I would like to suggest that a cluster validation step is added to azdata dc create, such that a number of "Pre-flight checks" are performed prior to big data cluster deployment, to include things such as:


    1. Checking that the cluster nodes are running supported versions of Linux

    2. Checking that Kubernetes is at least version 1.13

    3. Checking that a persistent volume can be created

    4. Checking that each worker node has sufficient space to store the BDC images

    5. For AD integration, checking that any bdcuser, the BDC admin account, bdcusers group and bdadmins group all exist . . . and that the…

    4 votes
    0 comments  ·  Kubernetes
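    Two of the proposed pre-flight checks (the Kubernetes version and free disk space) can be sketched as plain functions. This is a hypothetical illustration, not an existing azdata feature: the version string is assumed to come from the cluster (for example the server version reported by `kubectl version`), and a real check would have to measure free space on each worker node rather than on the machine running the script.

```python
import re
import shutil
from typing import Tuple

MIN_K8S_VERSION = (1, 13)  # minimum version named in the request


def parse_k8s_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.16.3' or '1.13+'."""
    m = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if not m:
        raise ValueError(f"unrecognised Kubernetes version: {version!r}")
    return int(m.group(1)), int(m.group(2))


def check_k8s_version(version: str, minimum: Tuple[int, int] = MIN_K8S_VERSION) -> bool:
    """True if the version is at least `minimum` (compared as integer tuples)."""
    return parse_k8s_version(version) >= minimum


def check_free_space(path: str, required_bytes: int) -> bool:
    """True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


def preflight(server_version: str, image_dir: str, image_bytes: int) -> dict:
    """Run the sketched checks and report each result by name."""
    return {
        "kubernetes>=1.13": check_k8s_version(server_version),
        "image space": check_free_space(image_dir, image_bytes),
    }
```

    Comparing versions as integer tuples avoids the string-comparison trap in which "1.9" sorts after "1.13".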
  9. Enhance execution plans for queries against data pool

    Currently, queries against data pool tables only show up as "remote query" in the execution plan (which makes sense, as these are external tables), but for the data pool it should be possible to drill down to the next level and analyse the query running on the data pool.

    4 votes
    0 comments  ·  Data Pool
  10. Support sharding

    It would be good to add support for sharding.

    4 votes
    0 comments  ·  SQL Server
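    As an illustration of what key-based sharding involves, the sketch below routes each row to a shard by hashing its sharding key. It is a minimal, hypothetical example (`shard_for` is not an existing API) and says nothing about how BDC would implement the feature.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and therefore not stable across runs or machines.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

    Plain modulo hashing moves most keys when the number of shards changes; consistent hashing is the usual alternative when shards are added or removed often.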
  11. 3 votes
    0 comments  ·  Data Pool
  12. 3 votes
    0 comments  ·  SQL Server
  13. Add support for SQL SCOM Management Pack to monitor at least Master Instance

    It would be important to add to the SQL SCOM Management Pack the capability to monitor at least the master instance, and to add to the SQL SCOM MP guide the steps for installing and configuring SCOM to monitor the master instance.

    3 votes
    0 comments  ·  SQL Server
  14. Backup of storage and data pool

    The ability to back up the entire cluster, the storage pool and the data pool would be good to have.

    3 votes
    0 comments  ·  Storage Pool
  15. Need samples or documentation for how to setup HDFS tiering with Cloudera

    I know it is possible to configure HDFS tiering with S3, ADLS, and Cloudera. It has been nearly impossible to find any clear documentation on how to set up HDFS tiering for Cloudera. I'd like to see some materials added to the GitHub SQL Samples site for this. https://github.com/microsoft/sql-server-samples/tree/master/samples/features/sql-big-data-cluster

    3 votes
    0 comments  ·  Data Virtualization
  16. Anaconda in BDC for Spark and ML Services

    Anaconda Enterprise lets you manage packages on a cluster and on your computer and keep them in sync.

    Can BDC include this, or some other way to manage packages for Spark?

    3 votes
    0 comments  ·  Apache Spark
  17. Scale

    Being able to take advantage of the container cluster to dynamically scale the number of containers for each role.

    2 votes
    0 comments  ·  Other
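    Kubernetes already offers this kind of dynamic scaling for ordinary workloads through the Horizontal Pod Autoscaler, whose documented rule is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below implements only that rule with min/max clamping; whether BDC pools could be scaled this way is the open question in the request, and `desired_replicas` is a hypothetical name. The real controller also applies a tolerance and stabilization logic, which are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """HPA-style scaling rule: scale replicas in proportion to metric load.

    The result is clamped to [min_replicas, max_replicas].
    """
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

    For example, 3 replicas running at 90% CPU against a 60% target scale to ceil(4.5) = 5 replicas.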
  18. BDC with AD integration

    Currently only one BDC is allowed per AD domain. Can we have multiple BDCs on the same AD domain?

    2 votes
    started  ·  1 comment  ·  SQL Server
  19. Problem with enabling SQL Agent on multi-node BDC with HA

    Currently, if you attempt to enable the SQL Server Agent service on a multi-node BDC (CU4, with HA configured), the local instances are enabled, but the HA listener isn't. An error pops up when attempting to log in to the HA listener instance (port 31433) using SSMS; see the attached file for a screenshot.

    2 votes
    0 comments  ·  SQL Server
  20. 2 votes
    0 comments  ·  Platform

Feedback and Knowledge Base