Do you have a comment or suggestion to improve SQL Server? We’d love to hear it!

Add support for Row Pattern Recognition in T-SQL (SQL:2016 features R010 and R020)

The ISO/IEC 9075:2016 standard (aka SQL:2016) introduces support for Row Pattern Recognition (RPR) in SQL. Similar to using regular expressions to identify patterns in a string, RPR allows you to use regular expressions to identify patterns in a sequence of rows.
To me, it's the next step in the evolution of window functions. If you think that window functions are profound and useful, RPR is really going to bake your noodle.
RPR has limitless practical applications, including identifying patterns in stock market activity, handling time series, fraud detection, material handling, shipping applications, DNA sequencing, gaps and islands, top N per group, and many others.
Similar to window functions, RPR supports partitioning and ordering, which are the basis for finding a pattern in a given sequence of rows, within the indicated partitions, based on the indicated ordering. Also similar to window functions, this feature lends itself for good optimization, with the ability to rely on index ordering to avoid sorting the data.
You can find coverage of the feature including examples in a 90-page technical report ISO/IEC TR 19075-5, available for free here:
https://standards.iso.org/ittf/PubliclyAvailableStandards/c065143_ISO_IEC_TR_19075-5_2016.zip
You can also find it as part of the SQL:2016 Part 2: Foundation document, available for purchase here: https://www.iso.org/standard/63556.html
I also provide a summarized coverage of this feature in an article (publication date April 10, 2019) here:
https://sqlperformance.com/2019/04/t-sql-queries/row-pattern-recognition-in-sql

SQL:2016 defines two RPR-related features:
Feature R010, “Row pattern recognition: FROM clause”
Feature R020, “Row pattern recognition: WINDOW clause”
R010 defines a clause/table operator called MATCH_RECOGNIZE, which you apply to an input table or table expression in the FROM clause, and you get a virtual table as output. The context is similar to that of other table operators like APPLY, PIVOT and UNPIVOT. Here’s the general form of a query using this feature:

SELECT <select_list>
FROM <source_table>
MATCH_RECOGNIZE
(
<match_recognize specification>
) AS <table_alias>;

As an example adapted from the aforementioned technical report, suppose that you’re given a table dbo.Ticker, with columns symbol, tradedate and price. The following query identifies V shapes in the stock prices (period with strictly falling price followed by period with strictly increasing price):

SELECT
MR.symbol, MR.matchnum, MR.startdate, MR.startprice,
MR.bottomdate, MR.bottomprice, MR.enddate, MR.endprice, MR.maxprice
FROM dbo.Ticker
MATCH_RECOGNIZE
(
PARTITION BY symbol
ORDER BY tradedate
MEASURES
MATCH_NUMBER() AS matchnum,
A.tradedate AS startdate,
A.price AS startprice,
LAST(B.tradedate) AS bottomdate,
LAST(B.price) AS bottomprice,
LAST(C.tradedate) AS enddate, -- same as LAST(tradedate)
LAST(C.price) AS endprice,
MAX(U.price) AS maxprice -- same as MAX(price)
ONE ROW PER MATCH -- default
AFTER MATCH SKIP PAST LAST ROW -- default
PATTERN (A B+ C+)
SUBSET U = (A, B, C)
DEFINE
-- A defaults to True, matches any row
B AS B.price < PREV(B.price),
C AS C.price > PREV(C.price)
) AS MR;

The PARTITION BY clause defines that you want to handle each stock symbol separately.
The ORDER BY clause defines ordering based on trade date.
The MEASURES clause defines measures that you want to compute for each match for the pattern.
ONE ROW PER MATCH means that the result table will have one row per pattern match, similar to the outcome of grouping. The alternative is ALL ROWS PER MATCH where you want the detail rows returned per pattern match.
AFTER MATCH SKIP PAST LAST ROW means that once a match is found, you want the next attempt to find a match to start after the last row of the current match. There are other alternatives like looking for the next match in the row following the first row of the current match (SKIP TO NEXT ROW), and others.
The DEFINE clause defines row pattern variables representing the different subsequences of rows in the pattern. In the above example A represents any row as a starting point, B represents a subsequence of decreasing prices (B.price < PREV(B.price)), and C represents a subsequence of decreasing prices (C AS C.price > PREV(C.price)).
The PATTERN clause uses regular expressions to identify a pattern. In the above query the pattern is (A B+ C+), meaning (any row, followed by one or more rows with increasing prices, followed by one or more rows with decreasing prices). By default the quantifiers are greedy, but you can define them to be reluctant.
Here’s the expected output of this query:

symbol matchnum startdate startprice bottomdat bottomprice enddate endprice maxprice
------- --------- ----------- ---------- ----------- ----------- ----------- --------- ---------
STOCK1 1 2019-02-13 151.00 2019-02-18 142.00 2019-02-20 152.00 152.00
STOCK1 2 2019-02-27 154.00 2019-03-04 140.00 2019-03-06 143.00 154.00
STOCK2 1 2019-02-14 329.00 2019-02-18 325.00 2019-02-20 328.00 329.00
STOCK2 2 2019-02-21 326.00 2019-02-25 317.00 2019-02-27 325.00 326.00
STOCK2 3 2019-03-01 324.00 2019-03-05 319.00 2019-03-07 326.00 326.00

R020 uses row pattern recognition as part of the windowing OVER clause (or WINDOW clause when naming a window specification) to further restrict a window frame. Just like the window partition restrict the rows of the input table expression (FROM…WHERE…GROUP BY….HAVING), and the window frame further restricts the window partition, with R020 you further restrict the window frame to the first pattern match in the windowed sequence of rows. Here’s a query demonstrating this:

SELECT T.symbol, T.tradedate, T.price,
startdate OVER W, startprice OVER W,
bottomdate OVER W, bottomprice OVER W,
enddate OVER W, endprice OVER W,
maxprice OVER W
FROM dbo.Ticker T
WINDOW W AS
(
PARTITION BY symbol
ORDER BY tradedate
MEASURES
A.tradedate AS startdate,
A.price AS startprice,
LAST(B.tradedate) AS bottomdate,
LAST(B.price) AS bottomprice,
LAST(C.tradedate) AS enddate, -- same as LAST(tradedate)
LAST(C.price) AS endprice,
MAX(U.price) AS maxprice -- same as MAX(U.price)
ROWS BETWEEN CURRENT ROW -- RANGE and GROUPS are not permitted with row pattern matching in windows
AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL -- pattern must start at first row of full window frame; alternative is SEEK
PATTERN (A B+ C+)
SUBSET U = (A, B, C)
DEFINE
-- A defaults to True, matches any row
B AS B.price < PREV(B.price),
C AS C.price > PREV(C.price)
);

Currently, the only platform I know of that implements R010 is Oracle. I don’t know of any platform that implements R020 yet.
It’s completely understood that for Microsoft to add a feature to T-SQL—especially such a substantial one—is a pretty significant investment. But what’s great about features that are added to T-SQL is that they pretty much stay there forever. There’s a huge community hungry for T-SQL improvements such as this one.

Thanks for your consideration!
Itzik

126 votes
Sign in
(thinking…)
Password icon
Signed in as (Sign out)

We’ll send you updates on this idea

Itzik Ben-Gan shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

1 comment

Sign in
(thinking…)
Password icon
Signed in as (Sign out)
Submitting...
  • Itzik Ben-Gan commented  ·   ·  Flag as inappropriate

    Correction:

    The DEFINE clause defines row pattern variables representing the different subsequences of rows in the pattern. In the above example A represents any row as a starting point, B represents a subsequence of decreasing prices (B.price < PREV(B.price)), and C represents a subsequence of increasing prices (C AS C.price > PREV(C.price)).
    The PATTERN clause uses regular expressions to identify a pattern. In the above query the pattern is (A B+ C+), meaning (any row, followed by one or more rows with decreasing prices, followed by one or more rows with increasing prices).

Feedback and Knowledge Base