SQL2014 -- insert performance/optimization

Hello everyone,
I have an insert performance quandary that has me stumped, and I was wondering whether any SQL mavens could offer some pointers -- this is on SQL 2014 RTM, Developer Edition. :-)

The I/O subsystem is otherwise fairly beefy: 4 SSDs in RAID 0 on a fairly decent hardware RAID card, achieving approx. 2,000 MB/sec in SQLIO RR, RS and WS baselining of the system, and some 800 MB/sec in WR, on a Xeon box (2x4 cores @ 3.5 GHz) with 128 GB of DDR4 RAM @ 2,100 MHz and power settings at max. Yet I am only getting about 20-30 MB/sec when inserting records into the simple, unpartitioned, uncompressed, unindexed, 5-column heap table below, in a database using either the bulk-logged or the simple recovery model:

CREATE TABLE [dbo].[bigTransactionHistory](
 [TransactionID] [bigint] IDENTITY(1,1) NOT NULL,
 [ProductID] [int] NOT NULL,
 [TransactionDate] [date] NULL,
 [Quantity] [int] NULL,
 [ActualCost] [money] NULL
) ON [PRIMARY]

Basically, I am using Adam Machanic's script, at http://sqlblog.com/blogs/adam_machanic/archive/2011/10/17/thinking-big-adventure.aspx, for a fairly large POC.

I am inserting some 31 million records at a time -- approx. 1.239 GB of data -- in approx. 1 min 4 sec, which seems fairly slow: roughly 20 MB/sec.

Without the slow RAND(CHECKSUM(NEWID())), the insert executes in about 15 seconds, but even then that is only about 80 MB/sec, a far cry from what I *think* I'm supposed to get. (For I/O verification: once the records are inserted, scanning the data does indeed reach approx. 2,000 MB/sec.)

Any thoughts on further tweaking the inserts to achieve *substantially* better performance? 
thanks a big bunch for any pointers,
Cos

Data population script:

INSERT INTO bigTransactionHistory (productid, transactiondate, quantity, actualcost)
SELECT
 p1.ProductID,
 x.TransactionDate,
 x.Quantity,
 CONVERT(MONEY, p1.ListPrice * x.Quantity * RAND(CHECKSUM(NEWID())) * 2) AS ActualCost
FROM
(
 SELECT
  p.ProductID,
  p.ListPrice,
  CASE
   WHEN p.productid % 26 = 0 THEN 26
   WHEN p.productid % 25 = 0 THEN 25
   WHEN p.productid % 24 = 0 THEN 24
   WHEN p.productid % 23 = 0 THEN 23
   WHEN p.productid % 22 = 0 THEN 22
   WHEN p.productid % 21 = 0 THEN 21
   WHEN p.productid % 20 = 0 THEN 20
   WHEN p.productid % 19 = 0 THEN 19
   WHEN p.productid % 18 = 0 THEN 18
   WHEN p.productid % 17 = 0 THEN 17
   WHEN p.productid % 16 = 0 THEN 16
   WHEN p.productid % 15 = 0 THEN 15
   WHEN p.productid % 14 = 0 THEN 14
   WHEN p.productid % 13 = 0 THEN 13
   WHEN p.productid % 12 = 0 THEN 12
   WHEN p.productid % 11 = 0 THEN 11
   WHEN p.productid % 10 = 0 THEN 10
   WHEN p.productid % 9 = 0 THEN 9
   WHEN p.productid % 8 = 0 THEN 8
   WHEN p.productid % 7 = 0 THEN 7
   WHEN p.productid % 6 = 0 THEN 6
   WHEN p.productid % 5 = 0 THEN 5
   WHEN p.productid % 4 = 0 THEN 4
   WHEN p.productid % 3 = 0 THEN 3
   WHEN p.productid % 2 = 0 THEN 2
   ELSE 1
  END AS ProductGroup
 FROM bigproduct p
) AS p1
CROSS APPLY
(
 SELECT
  transactionDate,
  CONVERT(INT, (RAND(CHECKSUM(NEWID())) * 100) + 1) AS Quantity
 FROM
 (
  SELECT
   DATEADD(dd, number * 2, '20050101') AS transactionDate,
   NTILE(p1.ProductGroup) OVER
   (
    ORDER BY number
   ) AS groupRange
  FROM master..spt_values
  WHERE
   type = 'p'
 ) AS z
 WHERE
  z.groupRange % 2 = 1
) AS x

February 15th, 2015 9:58am

It is not clear to me what your concern is. Is it the INSERT performance as such, or the time it takes to construct the dataset?

If it is only the INSERT performance, you could test by copying the data to a second table. I would expect that this runs faster.
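A minimal sketch of that test, with a hypothetical scratch table name, might look like this:

-- Create an empty copy of the heap, then time the INSERT alone,
-- with no data generation in the way. Table name is hypothetical.
SELECT TOP (0) ProductID, TransactionDate, Quantity, ActualCost
INTO dbo.bigTransactionHistory_copy
FROM dbo.bigTransactionHistory;

INSERT INTO dbo.bigTransactionHistory_copy (ProductID, TransactionDate, Quantity, ActualCost)
SELECT ProductID, TransactionDate, Quantity, ActualCost
FROM dbo.bigTransactionHistory;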

If it is the time it takes to construct the data that bothers you, there may be bottlenecks in the query plan, including things spilling to disk. Since you are on SQL 2014, you could test the effect of changing the compatibility level, to see if using the new or old cardinality estimator gives better performance.
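For reference, a sketch of toggling between the two estimators via the compatibility level (the database name is a placeholder):

-- Legacy (pre-2014) cardinality estimator
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 110;
-- New SQL Server 2014 cardinality estimator
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 120;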

I don't want to point out the obvious, but you did size the data file, the log file and tempdb in advance, didn't you?
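In case it helps, a sketch of pre-growing the user database files; the database name, logical file names and sizes are placeholders, and tempdb can be pre-sized the same way:

-- Pre-grow data and log files so the load does not stall on autogrowth.
ALTER DATABASE [YourDatabase] MODIFY FILE (NAME = N'YourDatabase_data', SIZE = 50GB);
ALTER DATABASE [YourDatabase] MODIFY FILE (NAME = N'YourDatabase_log',  SIZE = 20GB);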

Since I don't have the table bigproduct, I am not in a position to run any tests myself.

February 15th, 2015 10:19am

That's 484,000 transactions per second (TPS).
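To spell out the arithmetic: 31,000,000 rows / 64 seconds is roughly 484,000 rows per second, and 1.239 GB / 64 seconds gives the ~20 MB/sec figure quoted above.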

That is astonishingly fast!

You have to realize that SQL Server carries all kinds of thread-synchronization overhead because it assumes it will be serving many disparate workloads, all under ACID conditions. It was mostly designed for something in the single-digit thousands of TPS (or less) per core, scaling up roughly with the log of the number of cores (i.e., slower than linearly).

I'm curious, does this load pretty much onto just one CPU, or does it spread significantly across?

Have you looked at perfmon counters to see if it builds up any kind of disk queues, in spite of the SSD?

Thanks,

Josh

PS -- well, actually, it's a single transaction, isn't it? But even producing 484k rows/second via T-SQL is pretty good; lightning-fast CPU-bound operations are not what T-SQL is all about.



February 15th, 2015 11:52am

Hello,

It's a single sequential insert; of course it isn't going to push the system. Use something such as SSIS, Distributed Replay, etc., with multiple concurrent threads to bulk load the data. A single non-parallelized insert isn't going to push it too hard, as you've found out.

February 15th, 2015 12:52pm

thanks a big bunch for the tips, Erland! You're correct, this seems to be a problem of both how the data is constructed and the INSERT itself, and probably the compatibility level too, which I need to dig into further with the tips you and the other folks have provided!

thanks again,
Cos

February 15th, 2015 2:55pm

thanks for the help, Josh -- I think Erland has closed in on it :-)
February 15th, 2015 3:01pm

hi Sean, granted, I could spin off multiple T-SQL commands to push the system :-)
thank you,
Cos

February 15th, 2015 3:03pm

I think Sean is quite right in his point about the insert being single-threaded.

But it all depends on what your POC is all about.

February 15th, 2015 3:13pm

Also, since you are using SQL 2014, consider a memory-optimized (in-memory) table created with the SCHEMA_ONLY durability option as a staging table to absorb the high insertion rate. The main contention points at high insertion rates are the three Ls: locking, latching and logging. Memory-optimized tables eliminate the need for locking and latching and incur minimal logging, and creating one with SCHEMA_ONLY durability removes the data logging entirely, which makes it a perfect candidate for a staging table to absorb a high insertion rate.
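A rough sketch of such a staging table, assuming a MEMORY_OPTIMIZED_DATA filegroup has already been added to the database (the table name and bucket count are illustrative):

-- Non-durable (SCHEMA_ONLY) memory-optimized staging table; the data is never
-- logged or persisted, so it disappears on restart -- fine for a staging area.
CREATE TABLE dbo.bigTransactionHistory_stage
(
 TransactionID   BIGINT IDENTITY(1,1) NOT NULL
  PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 16777216),
 ProductID       INT   NOT NULL,
 TransactionDate DATE  NULL,
 Quantity        INT   NULL,
 ActualCost      MONEY NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

Rows landed here would then be moved to the disk-based table in batches.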
February 15th, 2015 4:42pm

absolutely right, Erland -- depends on the POC.

Basically, for now, I am trying to fill a database with about 10 billion records as quickly as possible, ultimately for an SSAS tabular-model prototype; right now, at about 5 billion records, I get overall response times of between 1 and 5 seconds for selected queries.

Just spawning multiple T-SQL sessions to populate data should do the trick, for now.
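As a rough sketch, each of those sessions could run a loop along these lines; the batch count is arbitrary and the loop body is just a placeholder for the population INSERT above:

-- Run this batch from several sessions concurrently so the inserts overlap.
DECLARE @batch INT = 0;
WHILE @batch < 10   -- arbitrary: ~10 x 31M-row batches per session
BEGIN
    -- paste the INSERT INTO bigTransactionHistory ... SELECT from the
    -- population script here, unchanged
    SET @batch += 1;
END;

Tools such as ostress from Microsoft's RML Utilities, or several sqlcmd windows, can fire the same batch from many connections.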

thanks again for the help!

February 15th, 2015 4:57pm

In that case, I would say that all you need is patience. :-)

It would have been a different thing if the purpose of the POC had been to show that you can load data with a certain speed.

February 15th, 2015 5:09pm

ha ha ha -- very inspired. :-) Needing patience -- I like that, Erland.

Yes, I will add a dash of patience to my ingredient list for the recipe. 

February 15th, 2015 8:43pm
