Do you have a comment or suggestion to improve SQL Server? We’d love to hear it!

Add support for Row Pattern Recognition in T-SQL (SQL:2016 features R010 and R020)

The ISO/IEC 9075:2016 standard (aka SQL:2016) introduces support for Row Pattern Recognition (RPR) in SQL. Similar to using regular expressions to identify patterns in a string, RPR allows you to use regular expressions to identify patterns in a sequence of rows.
To me, it's the next step in the evolution of window functions. If you think that window functions are profound and useful, RPR is really going to bake your noodle.
RPR has limitless practical applications, including identifying patterns in stock market activity, handling time series, fraud detection, material handling, shipping applications, DNA sequencing, gaps and islands, top N per group, and many others.
Similar to window functions, RPR supports partitioning and ordering, which are the basis for finding a pattern in a given sequence of rows, within the indicated partitions, based on the indicated ordering. Also similar to window functions, this feature lends itself for good optimization, with the ability to rely on index ordering to avoid sorting the data.
You can find coverage of the feature including examples in a 90-page technical report ISO/IEC TR 19075-5, available for free here:
https://standards.iso.org/ittf/PubliclyAvailableStandards/c065143_ISO_IEC_TR_19075-5_2016.zip
You can also find it as part of the SQL:2016 Part 2: Foundation document, available for purchase here: https://www.iso.org/standard/63556.html
I also provide a summarized coverage of this feature in an article (publication date April 10, 2019) here:
https://sqlperformance.com/2019/04/t-sql-queries/row-pattern-recognition-in-sql

SQL:2016 defines two RPR-related features:
Feature R010, “Row pattern recognition: FROM clause”
Feature R020, “Row pattern recognition: WINDOW clause”
R010 defines a clause/table operator called MATCH_RECOGNIZE, which you apply to an input table or table expression in the FROM clause, and you get a virtual table as output. The context is similar to that of other table operators like APPLY, PIVOT and UNPIVOT. Here’s the general form of a query using this feature:

SELECT <select_list>
FROM <source_table>
MATCH_RECOGNIZE
(
<match_recognize specification>
) AS <table_alias>;

As an example adapted from the aforementioned technical report, suppose that you’re given a table dbo.Ticker, with columns symbol, tradedate and price. The following query identifies V shapes in the stock prices (period with strictly falling price followed by period with strictly increasing price):

SELECT
MR.symbol, MR.matchnum, MR.startdate, MR.startprice,
MR.bottomdate, MR.bottomprice, MR.enddate, MR.endprice, MR.maxprice
FROM dbo.Ticker
MATCH_RECOGNIZE
(
PARTITION BY symbol
ORDER BY tradedate
MEASURES
MATCH_NUMBER() AS matchnum,
A.tradedate AS startdate,
A.price AS startprice,
LAST(B.tradedate) AS bottomdate,
LAST(B.price) AS bottomprice,
LAST(C.tradedate) AS enddate, -- same as LAST(tradedate)
LAST(C.price) AS endprice,
MAX(U.price) AS maxprice -- same as MAX(price)
ONE ROW PER MATCH -- default
AFTER MATCH SKIP PAST LAST ROW -- default
PATTERN (A B+ C+)
SUBSET U = (A, B, C)
DEFINE
-- A defaults to True, matches any row
B AS B.price < PREV(B.price),
C AS C.price > PREV(C.price)
) AS MR;

The PARTITION BY clause defines that you want to handle each stock symbol separately.
The ORDER BY clause defines ordering based on trade date.
The MEASURES clause defines measures that you want to compute for each match for the pattern.
ONE ROW PER MATCH means that the result table will have one row per pattern match, similar to the outcome of grouping. The alternative is ALL ROWS PER MATCH where you want the detail rows returned per pattern match.
AFTER MATCH SKIP PAST LAST ROW means that once a match is found, you want the next attempt to find a match to start after the last row of the current match. There are other alternatives like looking for the next match in the row following the first row of the current match (SKIP TO NEXT ROW), and others.
The DEFINE clause defines row pattern variables representing the different subsequences of rows in the pattern. In the above example A represents any row as a starting point, B represents a subsequence of decreasing prices (B.price < PREV(B.price)), and C represents a subsequence of decreasing prices (C AS C.price > PREV(C.price)).
The PATTERN clause uses regular expressions to identify a pattern. In the above query the pattern is (A B+ C+), meaning (any row, followed by one or more rows with increasing prices, followed by one or more rows with decreasing prices). By default the quantifiers are greedy, but you can define them to be reluctant.
Here’s the expected output of this query:

symbol matchnum startdate startprice bottomdat bottomprice enddate endprice maxprice
------- --------- ----------- ---------- ----------- ----------- ----------- --------- ---------
STOCK1 1 2019-02-13 151.00 2019-02-18 142.00 2019-02-20 152.00 152.00
STOCK1 2 2019-02-27 154.00 2019-03-04 140.00 2019-03-06 143.00 154.00
STOCK2 1 2019-02-14 329.00 2019-02-18 325.00 2019-02-20 328.00 329.00
STOCK2 2 2019-02-21 326.00 2019-02-25 317.00 2019-02-27 325.00 326.00
STOCK2 3 2019-03-01 324.00 2019-03-05 319.00 2019-03-07 326.00 326.00

R020 uses row pattern recognition as part of the windowing OVER clause (or WINDOW clause when naming a window specification) to further restrict a window frame. Just like the window partition restrict the rows of the input table expression (FROM…WHERE…GROUP BY….HAVING), and the window frame further restricts the window partition, with R020 you further restrict the window frame to the first pattern match in the windowed sequence of rows. Here’s a query demonstrating this:

SELECT T.symbol, T.tradedate, T.price,
startdate OVER W, startprice OVER W,
bottomdate OVER W, bottomprice OVER W,
enddate OVER W, endprice OVER W,
maxprice OVER W
FROM dbo.Ticker T
WINDOW W AS
(
PARTITION BY symbol
ORDER BY tradedate
MEASURES
A.tradedate AS startdate,
A.price AS startprice,
LAST(B.tradedate) AS bottomdate,
LAST(B.price) AS bottomprice,
LAST(C.tradedate) AS enddate, -- same as LAST(tradedate)
LAST(C.price) AS endprice,
MAX(U.price) AS maxprice -- same as MAX(U.price)
ROWS BETWEEN CURRENT ROW -- RANGE and GROUPS are not permitted with row pattern matching in windows
AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL -- pattern must start at first row of full window frame; alternative is SEEK
PATTERN (A B+ C+)
SUBSET U = (A, B, C)
DEFINE
-- A defaults to True, matches any row
B AS B.price < PREV(B.price),
C AS C.price > PREV(C.price)
);

Currently, the only platform I know of that implements R010 is Oracle. I don’t know of any platform that implements R020 yet.
It’s completely understood that for Microsoft to add a feature to T-SQL—especially such a substantial one—is a pretty significant investment. But what’s great about features that are added to T-SQL is that they pretty much stay there forever. There’s a huge community hungry for T-SQL improvements such as this one.

Thanks for your consideration!
Itzik

240 votes
Sign in
(thinking…)
Sign in with: oidc
Signed in as (Sign out)

We’ll send you updates on this idea

Itzik Ben-Gan shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

7 comments

Sign in
(thinking…)
Sign in with: oidc
Signed in as (Sign out)
Submitting...
  • Chojrak commented  ·   ·  Flag as inappropriate

    Yeah, T-SQL needs a lot of improvements. It's actually one of the most miserable SQL dialects out there now in 2019. And that impacts developer's productivity a lot.

    It's more important as it's not only about on-prem version which I think is doomed by MS, but the same pathetic SQL is in all incarnations of Azure SQL products.

  • Itzik Ben-Gan commented  ·   ·  Flag as inappropriate

    Islands of price >= 150, tolerating gaps of fewer than 7 days with RPR:

    SELECT MR.symbol, MR.startdate, MR.enddate, MR.maxprice
    FROM (SELECT * FROM dbo.Ticker WHERE price >= 150) AS D
    MATCH_RECOGNIZE
    (
    PARTITION BY symbol
    ORDER BY tradedate
    MEASURES FIRST(tradedate) AS startdate, LAST(tradedate) AS enddate, MAX(price) AS maxprice
    PATTERN (A* B)
    DEFINE A AS DATEDIFF(day, A.tradedate, NEXT(A.tradedate)) < 7
    ) AS MR;

  • Magnus commented  ·   ·  Flag as inappropriate

    Hi, today i install regular expressions support in my sql server. however in the company sql server farm i am not allowed to use it. so using it is a razzle. I am a strong supporter of RPR, please add this! its super poweful in a number of ways when you learned to know it. Br Magnus

  • Martyn Jones commented  ·   ·  Flag as inappropriate

    This has the potential to be incredibly useful and a simpler alternative to a current manual process using R which we currently undertake.

  • Itzik Ben-Gan commented  ·   ·  Flag as inappropriate

    Packing intervals with RPR:

    SELECT MR.username, MR.starttime, MR.endtime
    FROM dbo.Sessions
    MATCH_RECOGNIZE
    (
    PARTITION BY username
    ORDER BY starttime, endtime, id
    MEASURES FIRST(starttime) AS starttime, MAX(endtime) AS endtime
    PATTERN (A* B)
    DEFINE A AS MAX(A.endtime) >= NEXT(A.starttime)
    ) AS MR;

  • Itzik Ben-Gan commented  ·   ·  Flag as inappropriate

    Correction:

    The DEFINE clause defines row pattern variables representing the different subsequences of rows in the pattern. In the above example A represents any row as a starting point, B represents a subsequence of decreasing prices (B.price < PREV(B.price)), and C represents a subsequence of increasing prices (C AS C.price > PREV(C.price)).
    The PATTERN clause uses regular expressions to identify a pattern. In the above query the pattern is (A B+ C+), meaning (any row, followed by one or more rows with decreasing prices, followed by one or more rows with increasing prices).

Feedback and Knowledge Base