SQL, Auxiliary Table of Numbers

Heh... sorry I'm so late responding to an old post. And, yeah, I had to respond because the most popular answer (at the time, the Recursive CTE answer with the link to 14 different methods) on this thread is, ummm... performance challenged at best.

First, the article with the 14 different solutions is fine for seeing the different methods of creating a Numbers/Tally table on the fly but as pointed out in the article and in the cited thread, there's a very important quote...

"suggestions regarding efficiency and
performance are often subjective.
Regardless of how a query is being
used, the physical implementation
determines the efficiency of a query.
Therefore, rather than relying on
biased guidelines, it is imperative
that you test the query and determine
which one performs better."

Ironically, the article itself contains many subjective statements and "biased guidelines" such as "a recursive CTE can generate a number listing pretty efficiently" and "This is an efficient method of using WHILE loop from a newsgroup posting by Itzik Ben-Gan" (which I'm sure he posted just for comparison purposes). C'mon folks... Just mentioning Itzik's good name may lead some poor slob into actually using that horrible method. The author should practice what (s)he preaches and do a little performance testing before making such ridiculously incorrect statements, especially in the face of any scalability requirements.

With the thought of actually doing some testing before making any subjective claims about what any code does or what someone "likes", here's some code you can do your own testing with. Set up Profiler for the SPID you're running the test from and check it out for yourself... just do a "Search'n'Replace" of the number 1000000 for your "favorite" number and see...

--===== Test for 1000000 rows ==================================
GO
--===== Traditional RECURSIVE CTE method
WITH Tally (N) AS
(
    SELECT 1 UNION ALL
    SELECT 1 + N FROM Tally WHERE N < 1000000
)
SELECT N
  INTO #Tally1
  FROM Tally
OPTION (MAXRECURSION 0);
GO
--===== Traditional WHILE LOOP method
CREATE TABLE #Tally2 (N INT);
SET NOCOUNT ON;
DECLARE @Index INT;
SET @Index = 1;
WHILE @Index <= 1000000
BEGIN
    INSERT #Tally2 (N)
    VALUES (@Index);
    SET @Index = @Index + 1;
END;
GO
--===== Traditional CROSS JOIN table method
-- (object names are lower case so this also runs on case-sensitive collations)
SELECT TOP (1000000)
       ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS N
  INTO #Tally3
  FROM master.sys.all_columns ac1
 CROSS JOIN master.sys.all_columns ac2;
GO
--===== Itzik's CROSS JOINED CTE method
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
     E02(N) AS (SELECT 1 FROM E00 a, E00 b),
     E04(N) AS (SELECT 1 FROM E02 a, E02 b),
     E08(N) AS (SELECT 1 FROM E04 a, E04 b),
     E16(N) AS (SELECT 1 FROM E08 a, E08 b),
     E32(N) AS (SELECT 1 FROM E16 a, E16 b),
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
SELECT N
  INTO #Tally4
  FROM cteTally
 WHERE N <= 1000000;
GO
--===== Housekeeping
DROP TABLE #Tally1, #Tally2, #Tally3, #Tally4;
GO
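
If Profiler isn't handy, a rough alternative (my suggestion, not what was used for the numbers in this post) is to let the session report its own CPU, elapsed time, and logical reads with SET STATISTICS. The sketch below wraps just one of the set-based methods; the temp table name #TallyStats is made up for the example. I wouldn't wrap the While Loop this way, because statistics messages are emitted for every statement in the loop.

```sql
-- Report CPU/elapsed time and logical reads in the Messages tab.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Same CROSS JOIN technique as above, limited to 10000 rows.
-- #TallyStats is a hypothetical name used only for this sketch.
SELECT TOP (10000)
       ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS N
  INTO #TallyStats
  FROM master.sys.all_columns ac1
 CROSS JOIN master.sys.all_columns ac2;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

DROP TABLE #TallyStats;
```

The figures won't match Profiler exactly, but they should be good enough to compare one method against another on the same box.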

While we're at it, here are the numbers I get from SQL Profiler for the values of 100, 1000, 10000, 100000, and 1000000...

SPID TextData                                 Dur(ms)   CPU   Reads Writes
---- ---------------------------------------- ------- ----- ------- ------
  51 --===== Test for 100 rows ==============       8     0       0      0
  51 --===== Traditional RECURSIVE CTE method      16     0     868      0
  51 --===== Traditional WHILE LOOP method CR      73    16     175      2
  51 --===== Traditional CROSS JOIN table met      11     0      80      0
  51 --===== Itzik's CROSS JOINED CTE method        6     0      63      0
  51 --===== Housekeeping DROP TABLE #Tally        35    31     401      0

  51 --===== Test for 1000 rows =============       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method      47    47    8074      0
  51 --===== Traditional WHILE LOOP method CR      80    78    1085      0
  51 --===== Traditional CROSS JOIN table met       5     0      98      0
  51 --===== Itzik's CROSS JOINED CTE method        2     0      83      0
  51 --===== Housekeeping DROP TABLE #Tally         6    15     426      0

  51 --===== Test for 10000 rows ============       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method     434   344   80230     10
  51 --===== Traditional WHILE LOOP method CR     671   563   10240      9
  51 --===== Traditional CROSS JOIN table met      25    31     302     15
  51 --===== Itzik's CROSS JOINED CTE method       24     0     192     15
  51 --===== Housekeeping DROP TABLE #Tally         7    15     531      0

  51 --===== Test for 100000 rows ===========       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method    4143  3813  800260    154
  51 --===== Traditional WHILE LOOP method CR    5820  5547  101380    161
  51 --===== Traditional CROSS JOIN table met     160   140     479    211
  51 --===== Itzik's CROSS JOINED CTE method      153   141     276    204
  51 --===== Housekeeping DROP TABLE #Tally        10    15     761      0

  51 --===== Test for 1000000 rows ==========       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method   41349 37437 8001048   1601
  51 --===== Traditional WHILE LOOP method CR   59138 56141 1012785   1682
  51 --===== Traditional CROSS JOIN table met    1224  1219    2429   2101
  51 --===== Itzik's CROSS JOINED CTE method     1448  1328    1217   2095
  51 --===== Housekeeping DROP TABLE #Tally         8     0     415      0

As you can see, the Recursive CTE method is second worst, beaten only by the While Loop, for Duration and CPU, and it generates about 8 times as many logical reads (a form of memory pressure) as the While Loop. It's RBAR on steroids and should be avoided, at all costs, for any single-row calculations just as a While Loop should be avoided. There are places where recursion is quite valuable but this ISN'T one of them.

As a side bar, Mr. Denny is absolutely spot on... a correctly sized permanent Numbers or Tally table is the way to go for most things. What does correctly sized mean? Well, most people use a Tally table to generate dates or to do splits on VARCHAR(8000). If you create an 11,000 row Tally table with the correct clustered index on "N", you'll have enough rows to create more than 30 years worth of dates (I work with mortgages a fair bit so 30 years is a key number for me) and certainly enough to handle a VARCHAR(8000) split. Why is "right sizing" so important? If the Tally table is used a lot, it easily fits in cache which makes it blazingly fast without much pressure on memory at all.
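
For anyone who wants to see what that looks like, here's a minimal sketch of an 11,000 row permanent Tally table with the clustered index on "N", followed by the 30-years-of-dates usage. The names dbo.Tally and PK_Tally_N and the start date are just assumptions for the example, and the populate step borrows the CROSS JOIN method from the test code above.

```sql
-- Hypothetical names: dbo.Tally and PK_Tally_N are illustrative only.
CREATE TABLE dbo.Tally
(
    N INT NOT NULL,
    CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
);

-- Populate 1 through 11000 using the CROSS JOIN method.
INSERT INTO dbo.Tally (N)
SELECT TOP (11000)
       ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM master.sys.all_columns ac1
 CROSS JOIN master.sys.all_columns ac2;

-- Usage: one row per day for 30 years from an assumed start date.
DECLARE @StartDate DATETIME;
SET @StartDate = '20000101';

SELECT DATEADD(dd, t.N - 1, @StartDate) AS TheDate
  FROM dbo.Tally t
 WHERE t.N <= DATEDIFF(dd, @StartDate, DATEADD(yy, 30, @StartDate))
 ORDER BY t.N;
```

Thirty years is a little under 11,000 days, which is where the row count comes from.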

Last but not least, everyone knows that if you create a permanent Tally table, it doesn't much matter which method you use to build it because 1) it's only going to be made once and 2) if it's something like an 11,000 row table, all of the methods are going to run "good enough". So why all the indignation on my part about which method to use???

The answer is that some poor guy/gal who doesn't know any better and just needs to get his or her job done might see something like the Recursive CTE method and decide to use it for something much larger and much more frequently used than building a permanent Tally table and I'm trying to protect those people, the servers their code runs on, and the company that owns the data on those servers. Yeah... it's that big a deal. It should be for everyone else, as well. Teach the right way to do things instead of "good enough". Do some testing before posting or using something from a post or book... the life you save may, in fact, be your own especially if you think a recursive CTE is the way to go for something like this. ;-)

Thanks for listening...

What is the best way to create and populate a numbers table?

Here are some code examples taken from the web and from answers to this question.

For each method, I have modified the original code so that each uses the same table and column: NumbersTest and Number, with 10,000 rows or as close to that as possible. Also, I have provided links to the place of origin.

METHOD 1 here is a very slow looping method from here

avg 13.01 seconds

ran 3 times, removed highest; here are times in seconds: 12.42, 13.60

IF OBJECT_ID('NumbersTest', 'U') IS NOT NULL DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest(Number INT IDENTITY(1,1))
SET NOCOUNT ON
-- note: as written, this loop stops at 100,000 rows, not the 10,000 used by the other methods
WHILE COALESCE(SCOPE_IDENTITY(), 0) < 100000
BEGIN
    INSERT dbo.NumbersTest DEFAULT VALUES
END
SET NOCOUNT OFF
-- Add a primary key/clustered index to the numbers table
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 2 here is a much faster looping one from here

avg 1.1658 seconds

ran 11 times, removed highest; here are times in seconds: 1.117, 1.140, 1.203, 1.170, 1.173, 1.156, 1.203, 1.153, 1.173, 1.170

IF OBJECT_ID('NumbersTest', 'U') IS NOT NULL DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number INT NOT NULL);
DECLARE @i INT;
SELECT @i = 1;
SET NOCOUNT ON
WHILE @i <= 10000
BEGIN
    INSERT INTO dbo.NumbersTest(Number) VALUES (@i);
    SELECT @i = @i + 1;
END;
SET NOCOUNT OFF
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 3 here is a single INSERT based on code from here

avg 488.6 milliseconds

ran 11 times, removed highest; here are times in milliseconds: 686, 673, 623, 686, 343, 343, 376, 360, 343, 453

IF OBJECT_ID('NumbersTest', 'U') IS NOT NULL DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number int not null)
;WITH Nums(Number) AS
(
    SELECT 1 AS Number
    UNION ALL
    SELECT Number+1 FROM Nums WHERE Number<10000
)
INSERT INTO NumbersTest(Number)
SELECT Number FROM Nums OPTION(MAXRECURSION 10000)
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 4 here is a "semi-looping" method from here

avg 348.3 milliseconds (it was hard to get good timing because of the "GO" in the middle of the code, any suggestions would be appreciated)

ran 11 times, removed highest; here are times in milliseconds: 356, 360, 283, 346, 360, 376, 326, 373, 330, 373

IF OBJECT_ID('NumbersTest', 'U') IS NOT NULL DROP TABLE NumbersTest
IF OBJECT_ID('tempdb..#RunDate') IS NOT NULL DROP TABLE #RunDate
CREATE TABLE #RunDate (RunDate datetime)
INSERT INTO #RunDate VALUES(GETDATE())
CREATE TABLE NumbersTest (Number int NOT NULL);
INSERT NumbersTest values (1);
GO --required
INSERT NumbersTest SELECT Number + (SELECT COUNT(*) FROM NumbersTest) FROM NumbersTest
GO 14 --will create 16384 total rows
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
SELECT CONVERT(varchar(20),datediff(ms,RunDate,GETDATE()))+' milliseconds' FROM #RunDate
SELECT COUNT(*) FROM NumbersTest

METHOD 5 here is a single INSERT from Philip Kelley's answer

avg 92.7 milliseconds

ran 11 times, removed highest; here are times in milliseconds: 80, 96, 96, 93, 110, 110, 80, 76, 93, 93

IF OBJECT_ID('NumbersTest', 'U') IS NOT NULL DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number int not null)
;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  --I removed Pass5, since I'm only populating the Numbers table to 10,000
  Tally as (select row_number() over(order by C) as Number from Pass4)
INSERT NumbersTest
        (Number)
    SELECT Number
        FROM Tally
        WHERE Number <= 10000
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 6 here is a single INSERT from Mladen Prajdic's answer

avg 82.3 milliseconds

ran 11 times, removed highest; here are times in milliseconds: 80, 80, 93, 76, 93, 63, 93, 76, 93, 76

IF OBJECT_ID('NumbersTest', 'U') IS NOT NULL DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number int not null)
INSERT INTO NumbersTest(Number)
SELECT TOP 10000 row_number() over(order by t1.number) as N
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number);
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 7 here is a single INSERT based on the code from here

avg 56.3 milliseconds

ran 11 times, removed highest; here are times in milliseconds: 63, 50, 63, 46, 60, 63, 63, 46, 63, 46

IF OBJECT_ID('NumbersTest', 'U') IS NOT NULL DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
    INTO NumbersTest
    FROM sys.objects s1       --use sys.columns if you don't get enough rows returned to generate all the numbers you need
    CROSS JOIN sys.objects s2 --use sys.columns if you don't get enough rows returned to generate all the numbers you need
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

After looking at all these methods, I really like Method 7, which was the fastest and the code is fairly simple too.

MYSQL: Sequential Number Table

-- To use the bitwise solution you need a view of 2 to the power 25.
-- the following solution is derived from http://stackoverflow.com/questions/9751318/creating-a-numbers-table-in-mysql
-- the following solution ran in 43.8 seconds with the primary key, without it 4.56 seconds.

-- create a view that has 2 to the power 25 minus 1

-- 2 ^ 1
CREATE or replace VIEW `two_to_the_power_01_minus_1` AS select 0 AS `n` union all select 1 AS `1`;

-- 2 ^ 2
CREATE or replace VIEW `two_to_the_power_02_minus_1`
AS select
((`hi`.`n` << 1) | `lo`.`n`) AS `n`
from (`two_to_the_power_01_minus_1` `lo` join `two_to_the_power_01_minus_1` `hi`) ;

-- 2 ^ 4
CREATE or replace VIEW `two_to_the_power_04_minus_1`
AS select
((`hi`.`n` << 2 ) | `lo`.`n`) AS `n`
from (`two_to_the_power_02_minus_1` `lo` join `two_to_the_power_02_minus_1` `hi`) ;

-- 2 ^ 8
CREATE or replace VIEW `two_to_the_power_08_minus_1`
AS select
((`hi`.`n` << 4 ) | `lo`.`n`) AS `n`
from (`two_to_the_power_04_minus_1` `lo` join `two_to_the_power_04_minus_1` `hi`) ;

-- 2 ^ 12
CREATE or replace VIEW `two_to_the_power_12_minus_1`
AS select
((`hi`.`n` << 8 ) | `lo`.`n`) AS `n`
from (`two_to_the_power_08_minus_1` `lo` join `two_to_the_power_04_minus_1` `hi`) ;

-- 2 ^ 13
CREATE or replace VIEW `two_to_the_power_13_minus_1`
AS select
((`hi`.`n` << 1) | `lo`.`n`) AS `n`
from (`two_to_the_power_01_minus_1` `lo` join `two_to_the_power_12_minus_1` `hi`);



-- create a table to store the interim results for speed of retrieval
drop table if exists numbers_2_to_the_power_13_minus_1;

create table `numbers_2_to_the_power_13_minus_1` (
`i` int(11) unsigned
) ENGINE=myisam DEFAULT CHARSET=latin1 ;

-- faster 2 ^ 13
insert into numbers_2_to_the_power_13_minus_1( i )
select n from `two_to_the_power_13_minus_1` ;

-- faster 2 ^ 12
CREATE or replace view `numbers_2_to_the_power_12_minus_1`
AS select
`numbers_2_to_the_power_13_minus_1`.`i` AS `i`
from `numbers_2_to_the_power_13_minus_1`
where (`numbers_2_to_the_power_13_minus_1`.`i` < (1 << 12));

-- faster 2 ^ 25
CREATE or replace VIEW `numbers_2_to_the_power_25_minus_1`
AS select
((`hi`.`i` << 12) | `lo`.`i`) AS `i`
from (`numbers_2_to_the_power_12_minus_1` `lo` join `numbers_2_to_the_power_13_minus_1` `hi`);

-- create table for results

drop table if exists numbers ;

create table `numbers` (
`i` int(11) signed
, primary key(`i`)
) ENGINE=myisam DEFAULT CHARSET=latin1;

-- insert the numbers
insert into numbers(i)
select i from numbers_2_to_the_power_25_minus_1
where i <= 20000000 ;

drop view if exists numbers_2_to_the_power_25_minus_1 ;
drop view if exists numbers_2_to_the_power_12_minus_1 ;
drop table if exists numbers_2_to_the_power_13_minus_1 ;
drop view if exists two_to_the_power_13_minus_1 ;
drop view if exists two_to_the_power_12_minus_1 ;
drop view if exists two_to_the_power_08_minus_1 ;
drop view if exists two_to_the_power_04_minus_1 ;
drop view if exists two_to_the_power_02_minus_1 ;
drop view if exists two_to_the_power_01_minus_1 ;
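
To make the bitwise trick in the script above easier to follow, here is a small standalone sketch of a single doubling step, written with derived tables instead of views. Each side supplies one bit (0 or 1); shifting the "hi" side left by one position and OR-ing in the "lo" side yields every 2-bit value. The aliases `lo` and `hi` mirror the script; nothing here needs to be created or dropped.

```sql
-- One doubling step: combine two 1-bit sets into the 2-bit values 0..3.
SELECT ((hi.n << 1) | lo.n) AS n
FROM (SELECT 0 AS n UNION ALL SELECT 1) AS lo
JOIN (SELECT 0 AS n UNION ALL SELECT 1) AS hi
ORDER BY n;
```

The views in the script repeat the same idea with larger shifts (2, 4, 8, 12 bits), so each join multiplies the row count rather than adding to it.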

Creating an auxiliary table to improve performance on a large MySQL table?

Edit: I said before not to use partitions. But Bill is right that the way he described would work. Your only concern would be if you tried to select across the 101 partitions, then the whole thing would come to a standstill. If you don't intend to do this then partitioning would solve the problem. Fix your indexes first though.

Your primary problem is that MyISAM is not the best engine, and neither is InnoDB. TokuDB would be your best bet, but you'd have to install that on the server.

Now, you need to prune your indexes. This is the major reason for the slowness. Remove every index on columns that aren't part of your common SELECT statements. Add a multi-column index on exactly what is requested in the WHERE of your SELECT statements.

So, going by your posted SELECT statement, you want (in addition to your primary key) just one multi-column index on datetime, device.
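
As a rough sketch of that index change: the table name `readings` and the index names below are placeholders I've made up, since the real names aren't shown here; only the `datetime` and `device` columns come from the discussion.

```sql
-- Hypothetical table and index names; adjust to the real schema.
-- Check which indexes currently exist before pruning.
SHOW INDEX FROM readings;

-- Drop an index that no common SELECT uses (idx_unused is a placeholder).
ALTER TABLE readings DROP INDEX idx_unused;

-- Add the single multi-column index matching the WHERE clause.
ALTER TABLE readings ADD INDEX idx_datetime_device (`datetime`, `device`);
```

Running EXPLAIN on the posted SELECT afterwards should show whether the new index is actually being picked up.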

If you change to TokuDB the inserts will be much faster. If you stick with MyISAM, you could speed the whole thing up by using INSERT DELAYED instead of INSERT (note that INSERT DELAYED is deprecated as of MySQL 5.6 and ignored by later versions). The only issue with this is that the inserts will not be live, but will be added whenever MySQL decides there is not too much load.

Alternatively, if the above still does not help, your final option would be to use two tables: one table that you SELECT from, and another that you INSERT to. Once a day or so you would then copy the insert table to the select table, though this means the data in your select table could be up to 24 hours old.
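
One way the daily copy might look is sketched below. The table names are invented for the example, both tables are assumed to have identical columns, and keys are assumed not to collide between them. The insert table is swapped out with RENAME TABLE first so that rows arriving during the copy are not lost.

```sql
-- Hypothetical table names: readings_select (read side), readings_insert (write side).
-- 1) Create an empty replacement for the write-side table.
CREATE TABLE readings_insert_new LIKE readings_insert;

-- 2) Swap it in; both renames happen in one atomic statement.
RENAME TABLE readings_insert     TO readings_insert_old,
             readings_insert_new TO readings_insert;

-- 3) Move the accumulated rows to the read side, then clean up.
INSERT INTO readings_select SELECT * FROM readings_insert_old;
DROP TABLE readings_insert_old;
```

This could be scheduled from cron or the MySQL event scheduler, whichever is already in use.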

Other than that, you would have to completely change the table structure (I can't tell you how, because it depends on exactly what you are using it for) or use something other than MySQL for this. However, my above optimizations should work.

Generating a sequence in SQL Server

Just create an indexed permanent auxiliary numbers table and be done with it. This will outperform any other method.

See Jeff Moden's answer here for more details and a script to populate such a table. If for some reason that isn't an option, this should beat the recursive CTE according to the performance tests in the linked answer.

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
     E02(N) AS (SELECT 1 FROM E00 a, E00 b),
     E04(N) AS (SELECT 1 FROM E02 a, E02 b),
     E08(N) AS (SELECT 1 FROM E04 a, E04 b),
     E16(N) AS (SELECT 1 FROM E08 a, E08 b),
     E32(N) AS (SELECT 1 FROM E16 a, E16 b),
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
SELECT N FROM cteTally
WHERE N BETWEEN 10 AND 20

