Add a built-in table of numbers
There are several SQL problems that can be solved with a table of numbers or dates.
This is a simple one-column table with numbers from 1 and up. Typical common problems
solved by with such a function:
o Show me sales for all months, including months for which there were no sales.
o Iterate over all characters in a string in a set-based statements.
o Any other query which you need to drive with a consecutive seres.
It's easy to construct and fill such a table. However, there are a couple of problems
1) If you query needs more numbers than there are in the table, you
get incorrect results.
2) Same thing happens if some smartass delets rows in the middle - or adds
a zero to the table.
3) There are queries where the optimizer does not know what the table is for
and makes incorrect assumptions.
The latter calls for some explanation. Assume this function:
CREATE FUNCTION inline_split_me(@param nvarchar(MAX))
RETURNS TABLE AS
charindex(N',' COLLATE Slovenian_BIN2,
@param + N',', Number) -
))) AS Value
WHERE Number <= convert(int, len(@param))
AND substring(N',' + @param, Number, 1) = N',' COLLATE Slovenian_BIN2)
This function is used to convert a comma-separated list of numbers to a
table. If I say.
CREATE PROCEDURE get_company_names_inline @customers nvarchar(2000) AS
SELECT C.CustomerID, C.CompanyName
FROM Northwind..Customers C
JOIN inline_split_me(@customers) s ON C.CustomerID = s.Value
EXEC get_company_names_inline 'ALFKI,BONAP,CACTU,FRANK'
and have one million rows in my Numbers table, the optimizer thinks that
the Numbers table will return 300000 rows. Had Numbers been a
built-in table, the optimzier would have known the hit rate exactly.
Here is a further list, courtsey of Jeffery Moden:
Here are several things that I use it for.
1.� Continue to use it for a splitter because the STRING_SPLIT function doesn't return element positions nor guarantee any order.
2. Luhn mod 10 checksum for credit card, bank routing, and other similar checksum algorithms.
3. Relational multiplication where rows must be replicated based on a value within the row.
4. Solving complex problems such as many different overlapping dates, currently active rows, distributions of values over multiple months, etc, etc.
5.�Using it as a read-less "Pseudo-Cursor" (a row source with no SQL Level loop, every SELECT is a "Pseudo-Cursor") to generate millions of rows of test data in a very easy and intuitive manner in just seconds.
6. Many different string functions including numeric base translations, "initial caps", "data purification" functions such as returning all alpha or all numeric digit strings or removal of accidental control characters, unconventional splitters such as what I used to solve the conversion of million row Adjacency Lists to Nested Sets in less than a minute instead of days using the old push-stack method, removal of duplicated characters, special numeric checksum and encoding for GS1/Uuc/Ean-128 and other encoders/decoders, etc, etc, etc.�� There are several other non-delimited, character based places where performance would have suffered greatly had it not been for a Tally Table or Tally Function (numbers or sequence table or function for many of you).
That's certainly not the limit of what I used it for.� I just can't remember all the places I've used it.� It's one of those "imagination limited" tools where, to a good imagination, it's an unlimited tool.� Like I said, I�even use it as a simple "readless row source" to power some things that don't even need�a counter but still needs to act as a high performance loop.� Heh... and ask Itzik Ben-Gan about all the things he's used it for.
It would be really nice if it were a built in function that operated at machine language speeds, hopefully even faster than the "pseudo-cursors" that loop behind the scenes in every Insert, Select, Update, and Delete.
Upvotes: 278<=-=Feb 1 2008 7:18PM=-=>
Thanks for the valuable suggestion.
This seems more like adding the sequence support which we’re seriously considering for the next major release.
Meanwhile, would using identity column help?<=-=Feb 2 2008 2:11AM=-=>
It does not seem that you understood the request. This definitely has nothing to do with
IDENTITY. I am less versed about sequences, but I don’t think they will cut it either. If you think
it does, maybe you could provide an example? Take this problem: For the Orders table in
Northwind, write a query that lists the number of orders for all days in 1997. The result set should
include all 365 days, and list zero for days without a number.
This is a typical problem where you need a table of of numbers (or dates). While it’s easy to
create such a table, I argue in this request that there would be a number of advantages of
having such a table (function) builtin. For more details on the optimization problems, see my
article on Arrays and Lists in SQL Server, http://www.sommarskog.se/arrays-in-sql.html.
Where do you stop with the content of the numbers table?
1-100? 1-1000000? What if I need 10billion? What if I need numbers up 2^64?? More?
Seems like an awful waste of space for something that may or may not be used.
As per workaround, use a function. (would be nice if that function was built-in..maybe that’s what they mean by ‘sequence’?)<=-=Jul 3 2008 6:11AM=-=>
I would say 100000 by default
Another option would be if that 100000 is not good enough for you there would be a built in proc that will generate this table for you, you would have to pass in the start and end value and the proc would generate this table
This would be a great asset and I believe it to be worth the storage requirement. 100,000 is a good number to default to and if more numbers are needed; I am sure some sort of system sp that builds the table dynamically. The bottom line is a lot of people are not acclimated with the concept of a numbers table and are missing out of all the performance gains from using one. Having MS built-in numbers table not only exposes the numbers table, but also allows those who do not want to manage an additional table to reap the benefits.<=-=Jul 7 2008 5:11AM=-=>
Adding a built in table of numbers from 1 to 2 billion would certainly take up a lot of disk space especially if you consider the fact that most people never use one with a count over 8000 and those that do almost never use one over a million.
What I’d rather see is a whole number “sequence generator” where you could identify the StartValue, EndValue, and optional Increment which would default to 1. And, it should be able to order the values in an ascending or descending fashion as if it had a Clustered Primary Key doing so.<=-=Jul 15 2008 9:59AM=-=>
Why would anything need to be stored on disk? Make the query engine intelligently produce the numbers on demand. It would be an in-memory operation, perhaps not even needing all rows to be materialized, as in the case of splitting a string on a delimiter where the majority of the numbers are discarded.
The optimizer could pay attention to how the numbers are used and do an in-memory loop through the string rather than creating a table, temp or otherwise. The table is uniform and regular, and re-generating it each time in a tight +1 loop is probably faster than building the whole thing. It’s like a kind of native index where you know in advance where everything is because the lookup index is exactly equal to the lookup value.
Look at it this way. What would the query engine have to do to satisfy this query:
SELECT Num FROM Numbers WHERE Num = 123
It wouldn’t have to hit the Numbers table at all: the input IS the result. The answer can be given at parse time. Similarly,
SELECT Num FROM Numbers WHERE Num BETWEEN 1 AND 100000
Again, why hit the table at all? It already knows the desired result, the numbers 1- 100000. It would be far more efficient to just generate the values in memory than go read them somewhere. (give me 1st value: okay, got 1. Give me 2nd value, okay, got 2. …)
Many problems are easily solvable with a numbers table that are difficult or impossible otherwise.<=-=Aug 5 2008 12:58PM=-=>
Indeed, the table would only be logical, it would not be stored on disk.<=-=Nov 25 2010 11:53PM=-=>
A little bit off-topic, but could I also get the same thing for dates? It just seems so stupid to create table that has dates so that you can join it with other tables to get one row for each day even if data doesn’t exist for that day.<=-=Dec 14 2010 2:54PM=-=>
Please read feedback id 347442. Such a built-in function would have (potentially) significant performance impacts, lacking in user defined functions/solutions (mentioned feedback id has examples).<=-=Jan 1 2011 6:00PM=-=>
Glad to see this suggestion. This functionality would help with many of the perennial problems that come up on news groups and forums.<=-=Mar 14 2012 4:17PM=-=>
I see Postgres has the “generate_series” function for this need.<=-=Jun 11 2012 1:19PM=-=>
Neither identity nor sequence will help here. It is hard to believe you even read the question.
+1<=-=Nov 27 2012 12:04PM=-=>
Having had the “joy” of splitting a multi-XML-document proprietary 3rd-party supplied file of over 3MB, my Numbers table was 12M+ rows, just in case a heavy ordering day bumped the file size…
On SQL2012, using page-level compression, it consumes over 130MB.
If, as current best practices appear to recommend, our SQL Servers are stacked with huge quantities of RAM, 130MB in, say 120GB isn’t going to matter a great deal…
If Microsoft gave us a “standard set” of Numbers that is configurable, but which could be bumped on demand and shrunken automatically to the standard-set-size, memory pressure would be reduced while still delivering high-speed data-access.
My Dates table stretches from 2008 (birth of system) to 2099, comprising 33K rows of 32 columns (at present) and 16MB compressed, with alternate indexes. Again, a triflingly small memory “hog”.
I’m less persuaded that a Microsoft-supplied Dates table is a “need” simply because the slices and representations of dates in short/long/week/month/quarter, etc. are really site-specific (hence my 32 columns!).
+1 for the Numbers table…<=-=Mar 19 2013 1:14PM=-=>
Please, please, please.<=-=Nov 24 2014 9:27PM=-=>
Howdy there, Microsoft,
This suggestion is going to be 8 years old in 3 months. It would be wonderful to celebrate that anniversary with a CU for both 2012 and 2014 that contains this wonderful suggested tool. It shouldn’t take you long.
I’ll also suggest that it take two parameters so that folks can start it “0” if they need to (which is more common than you’d think).
Thank you for your time.<=-=Nov 26 2014 4:42AM=-=>
Personally, I think this relatively simple addition is long overdue. The Tally table has been called the Swiss Army Knife of SQL and I have to agree. It can be used for everything from splitting strings to date generation. An ITVF that’s built into the product would allow for optimization that’s probably beyond what DBAs can do. I’ve been using a table of numbers for years and consider it completely normal, as do many others in the SQL Server community. Please consider the addition of this feature to the product.<=-=Nov 26 2014 6:31AM=-=>
This would be handy even if it it were no larger than a single record and additional functionality were added to SQL Server such as “Connect BY” and so forth to allow for larger virtual numbers tables could be accessed. Many uses are out there for such a table and if there is any doubt, take a look at the DUAL table in Oracle.<=-=Dec 1 2014 7:55AM=-=>
This is fairly trivial to develop, but also the issues raised above (integrity) are potentially a problem in app code. Think SQL Injection or some other tampering that wouldn’t be the case in a native numbers table.
While this would be a nice backport, I would argue this is fairly minor work for SQL Server 2014+ (in an SP) and giving us a table with an integer’s worth of numbers, would be very handy. I would suggest that the table be a DMV visible in every database.<=-=Apr 2 2015 7:56AM=-=>
CREATE FUNCTION [dbo].[fn_Nums](@n AS BIGINT)
Nums AS OVER AS n FROM L5)
SELECT n FROM Nums WHERE n <= @n;
The easiest trick to generate table of numbers with OPENJSON:
return (select cast([key] as int) as number from OPENJSON( '[1' + replicate(',1',count-1)+’]’))
You will need SQL Server 2016 CTP3 or higher for this.<=-=Mar 5 2016 8:30AM=-=>
This should not be a real table with a fixed number of rows. This should be a generator TVP that can provide any number range, e.g.
select Number from Builtin_Numbers(1, 10000000)
It would be a pity if the built in numbers table would be restricted to some range, e.g. [1, 1M].
We also need such a generator for dates (all dates in a given range).<=-=Mar 16 2016 12:15AM=-=>
I really love to have a number sequence generator build into SQL Server as it removed a lo
Jeff Moden commented
Lordy. On the old CONNECT site, this suggestion had 278 upvotes. On this "new" site, it has only 96. Shows you how "good" the migration from the old site was.
This excellent suggestion has been "unplanned" for more than a decade. It would have taken less time to implement this as a high performance CLR based function that it took to create the mistake called String_Split() (it's missing some very necessary functionality that the public is well aware of). Can you please implement this idea and test it just to make sure it doesn't end up having performance issues like the FORMAT function does (which also needs to be fixed).