Do you have a comment or suggestion to improve SQL Server? We’d love to hear it!

QUALIFY - Request for a New Filtering Clause

This entry describes a request for a new query filtering clause called QUALIFY in SQL Server.

As of SQL Server 2008 (including R2) T-SQL supports three standard filtering clauses based on predicates (ON, WHERE, HAVING), and one nonstandard filtering option based on a number/percent of rows and ordering (TOP). Window calculations (e.g., ranking and aggregate calculations that use the OVER clause, and hopefully others in the future) are only allowed in the SELECT and ORDER BY clauses of a query, and therefore you cannot refer to those directly in the filtering clauses that are based on predicates. For example, the following is not a valid statement:

SELECT col1, col2, col3
FROM dbo.T1
WHERE ROW_NUMBER() OVER(PARTITION BY col1 ORDER BY col2) BETWEEN 11 AND 20;

There is logic behind this restriction. Consider a table T1 with the following sample data and the query that follows:

col1
-----
100
150
300
350
900

SELECT col1
FROM dbo.T1
WHERE col1 >= 300 AND ROW_NUMBER() OVER(ORDER BY col1) <= 2;

Should the filter based on the ROW_NUMBER function be evaluated after col1 >= 300 or before it? If the former, you should get the output:

col1
-----
300
350

If the latter, you should get an empty set. But in SQL things get processed logically in an all-at-once fashion and order of evaluation of predicates shouldn�t matter. Due to this ambiguity SQL allows window calculations only in the SELECT and ORDER BY clauses of a query, making it clear that they are evaluated after the FROM, WHERE, GROUP BY and HAVING clauses.
Sure, you can achieve the filtering task indirectly by using a table expression:

WITH C AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY col1 ORDER BY col2) AS rownum,
col1, col2, col3
FROM dbo.T1
)
SELECT col1, col2, col3
FROM C
WHERE rownum BETWEEN 11 AND 20;

However, this adds a layer of complexity that if possible would be nice to avoid.
I propose implementing a new filtering clause called QUALIFY similar to the one implemented already by Teradata. In terms of logical query processing, QUALIFY fits after the SELECT clause, allowing it to refer to window calculations:

1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. QUALIFY
7. ORDER BY

If implemented, you will be able to refer to window calculations directly in the QUALIFY clause, like so:

SELECT col1, col2, col3
FROM dbo.T1
QUALIFY ROW_NUMBER() OVER(PARTITION BY col1 ORDER BY col2) BETWEEN 11 AND 20;

Or indirectly, like so:

SELECT ROW_NUMBER() OVER(PARTITION BY col1 ORDER BY col2) AS rownum,
col1, col2, col3
FROM dbo.T1
QUALIFY rownum BETWEEN 11 AND 20;

But this time without the need for a table expression.
Support for the QUALIFY clause would also help writing clearer code by avoiding the need to use the TOP option. TOP involves a lot of confusion surrounding the TOP ordering vs. presentation ordering. People use TOP to try to circumvent restrictions regarding the use of ORDER BY in table expressions, not realizing that unless the outermost query has an ORDER BY clause, presentation ordering is not guaranteed. The standard doesn�t support TOP, but does support TOP-like filtering clauses. However, those standard features have the same confusion aspects related to ordering.
In the past I proposed revising the TOP option to support an OVER clause with its own ordering specification instead of relying on the ORDER BY clause that traditionally serves a presentation purpose. And while we�re at it, why not also support a PARTITION BY clause. You can find my proposal here:

https://connect.microsoft.com/SQLServer/feedback/details/254390/over-clause-enhancement-request-top-over

I still feel that such a feature would be great since it would avoid the existing ambiguity and confusion, plus support a partitioning element that TOP doesn�t currently support.
I find that supporting the QUALIFY clause is even more important than the proposed revision to TOP since:
� It gives you all the functionality that a TOP OVER (and the top-like standard features) would give you.
� There�s no confusion around the ordering aspect.
� You have a partitioning element in window calculations and naturally it will be reflected in the filter.
� You can do much more than what TOP and the TOP-like standard features allow you to do, e.g., if SQL Server adds support in the future for other window calculations and enhance existing ones, those will be supported in the QUALIFY clause as well. For example, you will be able to filter by a running total or a sliding total.

If you also feel that this feature could be useful, please support it by casting your vote.

Cheers,
BG

1 vote
Sign in
Check!
(thinking…)
Reset
or sign in with
  • facebook
  • google
    Password icon
    Signed in as (Sign out)

    We’ll send you updates on this idea

    AdminMicrosoft SQL Server (Admin, Microsoft Azure) shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →
    unplanned  ·  AdminMicrosoft SQL Server (Admin, Microsoft Azure) responded  · 

    Upvotes: 371

    <=-=Feb 10 2010 1:31PM=-=>

    What a great suggestion – this would make writing clearer and more self-documenting code so much easier.

    <=-=Feb 15 2010 7:53AM=-=>

    I do these kinds of ROW_NUMBER() queries ALL the time. This would be a wonderful option. Please give this serious consideration. It would make my code much cleaner.

    <=-=Feb 16 2010 7:12AM=-=>

    This would be extremely useful. The OVER clause was really handy, but the CTE is a necessary evil if you want to get the most use out of row_number(). You have my vote.

    Thanks,
    Eric

    <=-=Feb 17 2010 4:09PM=-=>

    Hi Itzik,

    Thank you for your suggestion! This does sound like a very useful addition to T-SQL. We will be triaging the request and see if it can make it into a future release of SQL Server.

    Again, thanks for your feedback!
    - Tobias, SQL Server Engine

    <=-=Mar 2 2010 3:14AM=-=>

    With so much ANS standard SQL functionality missing, is it really worth spending time implementing a non-standard and functionally rather limited enhancement (QUALIFY doesn’t permit GROUP on the ROW_NUMBER() result, but I do this frequently), for which a full-function ANS-conformant workaround (nested table expression), that can be readily readable (WITH clause) already exists?

    <=-=Mar 4 2010 7:01PM=-=>

    Hi Itzik,

    Despite all the up votes, I have to argue against it.

    Consider these queries similar to your example:

    — Query Q
    SELECT col1, col2, col3
    FROM dbo.T1
    QUALIFY ROW_NUMBER() OVER BETWEEN 11 AND 20;

    — Query Q_RN
    SELECT col1, col2, col3, ROW_NUMBER() OVER AS rn
    FROM dbo.T1
    QUALIFY ROW_NUMBER() OVER BETWEEN 11 AND 20;

    Multiple choice question:
    For query Q_RN, what values appear in column [rn] of the result set?
    a) 1, 2, …, 10
    b) 11, 12, …, 20
    c) Query Q_RN is not permitted
    d) QUALIFY is not added to T-SQL

    Of course, if QUALIFY is implemented in T-SQL 20xx, we can wait and look in Books Online, but what do you think?Microsoft has to decide which it is before they can implement the new nonstandard language feature, so let’s think about it.

    Argument for a): We can currently add any row_number() expression as a column in the outermost select list of a query expression to get rows numbered starting at 1. Adding an additional row_number() column should always number the result set rows starting at 1.

    Argument against a): It will be impossible to include the QUALIFY expression in the result set in the obvious way. Of course, T-SQL programmers will try some very crazy things, and some might even work. Add 0+ in the select list?

    SELECT col1, col2, col3, 0+ROW_NUMBER() OVER
    FROM dbo.T1
    QUALIFY ROW_NUMBER() OVER BETWEEN 11 AND 20;

    Add a dummy item to the OVER clause to change the ranking expression in the SELECT list?

    SELECT col1, col2, col3, 0+ROW_NUMBER() OVER
    FROM dbo.T1
    QUALIFY ROW_NUMBER() OVER BETWEEN 11 AND 20;

    Even better, surround the qualified query with TOP (100) PERCENTORDER BY ROW_NUMBER() …, because we know ROW_NUMBER is allowed in ORDER BY and maybe this will work. (G-d help us here!)

    This is all very bad. We should choose answer b), I supposed.

    Arguments for b):
    1. Prevents complications from choosing answer a)
    2. It agrees with Itzik’s articles, which show SELECT before QUALIFY in the logical query processing order :)

    Arguments against b):
    1. Now it is impossible to add a row-number column to query Q, and
    2. The behavior of all ranking functions in the SELECT list becomes very confusing: they rank rows neither before nor after filtering, but after some (WHERE and HAVING) filtering, but before some (QUALIFY) filtering. If QUALIFY is evaluated between SELECT and ORDER BY, A ranking function in the ORDER BY clause will evaluate to something different than the same ranking function in the SELECT list. It will be impossible to display the values used for ordering in this case.

    Argument for c): Avoids figuring out whether a) or b) is correct.
    Argument against c): Makes QUALIFY much less useful. Your “indirect” use of QUALIFY would have to be prohibited also. Other things would be prohibited (like QUALIFY and ORDER BY that both use ranking functions).

    Arguments for d):

    QUALIFY does not provide any new capabilities (as myudkin already explained) QUALIFY will be very hard to document and whatever is decided, some results will be unintuitive (should row numbers start at 11, or should row numbers start at 1 when you asked for row numbers 11 through 20?), or many things will be prohibited, or Microsoft will not solve all the problems, and when QUALIFY is used in a view (if allowed), it will behave inconsistently, and various undocumented behaviors will have to be preserved in compatibility levels < 135.

    It takes a large international committee a long time to work out all the details of a new language feature. Microsoft already got TOP wrong, and had to deprecate GROUP BY ALL and SET ROWCOUNT. Still T-SQL is not a fully defined language. Getting QUALIFY right now and in combination with future features is too hard. I believe TOP..OVER is much safer, because there is no column materialized with it. It will not be easy, but TOP .. OVER repairs a badly designed proprietary extension – it doesn’t add a new and questionable one.

    We have a good situation now, where we must decide whether each ranking function expression belongs inside
    the CTE or outside the CTE. Clumsy, but precise. No chance of misinterpretation. We can get the 1..10 numbers or the 11..20 numbers from Q_RN, whichever we want.

    By the way, I was unable to find full documentation from Teradata, and in fact I found a couple of complaints that the documentation was insufficient. Perhaps there are Teradata users of Connect who can lend some insight into how Teradata answered the multiple choice question I asked.

    Steve Kass

    <=-=Mar 5 2010 7:29AM=-=>

    Hi Steve :)

    To me the answer is simple since what I had in mind was logically evaluating QUALIFY after SELECT and before ORDER BY. With this in mind, the answer to your question is a clear b.

    Yes, if the feature will be implemented new crazy options will be available for programmers that are focused on trouble-making instead of writing productive code ;) but thinking practically, the common use would most likely be sensible and prevent the need for table expressions.

    Let me try to respond the arguments against b:

    >> 1. Now it is impossible to add a row-number column to query Q, and <<

    True, assuming you mean a row number starting with 1. But that�s likely the less common case. Here if you need a result row number starting with 1 you should use a table expression. It�s important to address the common need as simply as possible, and allow the less common need to be achieved with a bit more effort.

    >> 2. The behavior of all ranking functions in the SELECT list becomes very confusing <<

    As long as the documentation is clear about logical query processing order (FROMWHEREGROUP BY→HAVINGSELECTQUALIFYORDER BY), I don�t see any room for confusion.

    >> they rank rows neither before nor after filtering, but after some (WHERE and HAVING) filtering, but before some (QUALIFY) filtering. <<

    This is no different than talking about GROUP BY. It is logically processed after the WHERE filter and before the HAVING filter. Is it so confusing? So why should it be any more confusing if the SELECT was processed after HAVING and before QUALIFY?

    >> If QUALIFY is evaluated between SELECT and ORDER BY, A ranking function in the ORDER BY clause will evaluate to something different than the same ranking function in the SELECT list. It will be impossible to display the values used for ordering in this case. <<

    Again, that�s correct, but I don�t find it to be confusing as long as the documentation is clear about logical query processing order. And as mentioned, I believe that the problematic cases are not the common ones, and with those, it�s reasonable to expect to put a bit more effort in using table expressions.

    BTW, think of the benefits in QUALIFY beyond window calculation. You can refer to target column aliases assigned to any expression, e.g.,

    SELECT custid, orderyear,
    CASE orderyear
    WHEN 2006 THEN 2006
    WHEN 2007 THEN 2007
    WHEN 2008 THEN 2008
    END AS val
    FROM dbo.PvtOrders
    CROSS JOIN
    (VALUES,(2007),(2008)) AS Y(orderyear)
    QUALIFY val IS NOT NULL;

    Yeah, like myudkin said, the same need is applicable to other pre-SELECT clauses like GROUP BY, but it�s a far more common need for filtering purposes.

    >> It takes a large international committee a long time to work out all the details of a new language feature. Microsoft already got TOP wrong <<

    So true, and you know I�m all in favor of adding TOP OVER and deprecating the existing TOP, but I also want to be practical. The likelihood that this will happen is very low, especially due to the fact that the standard SQL:2008 adds a TOP-like feature that (are you sitting?) suffers from the same design flows that the existing TOP does. Check out FETCH FIRST in SQL:2008:

    �A can also optionally contain a , which may limit the cardinality of the derived table. If a contains both an and a , then the rows in the derived table are first sorted according to the and then limited to the number of rows specified in the .�

    So it appears that the standard can get it wrong too. They added this FETCH FIRST option a

    0 comments

    Sign in
    Check!
    (thinking…)
    Reset
    or sign in with
    • facebook
    • google
      Password icon
      Signed in as (Sign out)
      Submitting...

      Feedback and Knowledge Base