Add language support for null treatment clause (RESPECT NULLS | IGNORE NULLS) with offset window functions (LAG, LEAD, FIRST_VALUE
There�s a common need with offset window functions (LAG, LEAD, FIRSTVALUE, LASTVALUE) to request to ignore NULLs. A classic example is to return the last non-NULL in an ordered stream of values. Consider the following sample data:
DROP TABLE IF EXISTS dbo.T1;
CREATE TABLE dbo.T1
(
id INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
col1 INT NULL
);
INSERT INTO dbo.T1(id, col1) VALUES
( 2, NULL),
( 3, 10),
( 5, -1),
( 7, NULL),
(11, NULL),
(13, -12),
(17, NULL),
(19, NULL),
(23, 1759);
The id column represents the order of the events, and whenever the col1 value is non-NULL, this represents a value change, and from that point, the new value is considered the applicable one. You want to return the last non-NULL col1 value based on id ordering. For this sample data, the desired result is the following:
id col1 lastnonnull
2 NULL NULL
3 10 10
5 -1 -1
7 NULL -1
11 NULL -1
13 -12 -12
17 NULL -12
19 NULL -12
23 1759 1759
The SQL standard supports a <null treatment> clause with offset window functions, allowing you to specify RESPECT NULLS (default) | IGNORE NULLS. With this clause, the above task would be achieved like so:
SELECT id, col1,
LAG(col1) IGNORE NULLS OVER(ORDER BY id) AS lastnonnull
FROM dbo.T1;
Instead, currently we need to resort to more complex solutions such as this one:
WITH C AS
(
SELECT id, col1,
MAX(CASE WHEN col1 IS NOT NULL THEN id END)
OVER(ORDER BY id
ROWS UNBOUNDED PRECEDING) AS grp
FROM dbo.T1
)
SELECT id, col1,
MAX(col1) OVER(PARTITION BY grp
ORDER BY id
ROWS UNBOUNDED PRECEDING) AS lastnonnull
FROM C;
Thanks much for your consideration!
Cheers,
Itzik

Upvotes: 60
<=-=Mar 29 2017 11:32PM=-=>Just wanted to add that both Oracle and DB2 seem to support this feature:
http://docs.oracle.com/database/122/SQLRF/LAG.htm#SQLRF00652
https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm
Cheers,
Itzik
That last IBM link was for Informix!
The DB2 link is https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0023461.html
<=-=Jul 4 2017 3:30AM=-=>Thanks for reporting this request.
It is in our backlog, and we have planned to do something like this, but we cannot confirm when it would be completed.
Jovan
<=-=Jul 10 2017 11:54AM=-=>Thanks, Jovan; good to hear.
3 comments
-
simon sabin commented
I put this on par with the use of expressions in the COUNT to have conditional counts by having expressions return NULL when not required. This would allow for rows to be "ignored" for the windowed aggregates and solves many problems.
The other option is complex loop constructs that loop through the data managing state during the loop, this becomes super complex if wanting to return multiple rows, as one has to store results and return them.
There are options that are far from intuitive that result in complex queries with use of other windowed aggregates. -
Alexander Sharovarov commented
It very useful feature. It would allow the use of LAST_VALUE function for filling gaps in the data. A very common problem in data analysis and data science. If it is difficult to implement for all functions it should at least implemented for the LAST_VALUE function.
-
Koen Verbeeck commented
This is a must have. It would make writing window functions a lot easier when NULLs are involved.