Number of distinct users in the last 3, 6, 9, 12 months SQL?

Hi,

I have table A(year int, month int, user varchar(20)), and I am trying to write a view that would show number of distinct users in the last 3 months, last 6 months, last 9 months and last 12 months (all from the most recent year/month) in following format:

                                       3 months     6 months     9 months     12  months

__________________________________________________________

number of distinct users        x                  y                  z                      w

Any ideas?

Tnx      

August 25th, 2015 9:48pm

Convert your year/month to a date and do date calculations to determine the intervals. Would be much nicer than attempting to do case against years and months
Free Windows Admin Tool Kit Click here and download it now
August 25th, 2015 10:54pm

Hi Pedja,

Please see below sample.

CREATE TABLE [A] (
    [Year] INTEGER NULL,
    [Month] INTEGER NULL,
    [User] VARCHAR(255) NULL);

GO

INSERT INTO A([Year],[Month],[User]) 
VALUES
(2014,8,'Giacomo'),
(2014,9,'Leila'),
(2014,10,'Melodie'),
(2014,11,'Cora'),
(2014,12,'Amanda'),
(2015,1,'Stone'),
(2015,2,'Kaitlin'),
(2015,3,'Gloria'),
(2015,4,'Petra'),
(2015,5,'Kirsten'),
(2015,6,'Cally'),
(2015,7,'Alexander'),
(2015,8,'Ciaran');


;WITH Cte AS
(
SELECT CAST(LTRIM(STR(Year))+'-'+LTRIM(STR(Month))+'-01' AS DATE) DT, [User] FROM A
)
,Cte2 AS
(
SELECT DT,[USER],(DENSE_RANK() OVER(ORDER BY DT)-1)/3 GrpBy FROM Cte WHERE DT>= DATEADD(MONTH,DATEDIFF(MONTH,0,GETDATE())-12,0)
)
,Cte3 AS
(
SELECT GrpBy,COUNT([User]) UserCnt FROM CTE2 GROUP BY GrpBy
)
,Cte4 AS
(
SELECT GrpBy,UserCnt2
 FROM CTE3 c3
CROSS APPLY
(
SELECT SUM(UserCnt) UserCnt2 FROM Cte3 WHERE GrpBy<=c3.GrpBy
) c  
)
SELECT [0] AS [3 Months],[1] AS [6 Months],[2] AS [9 Months],[3] AS [12 Months] FROM CTE4
PIVOT
(
MAX(UserCnt2) FOR GrpBy IN([0],[1],[2],[3])
)p

--DROP TABLE A 

 

If you have any question, feel free to let me know.
August 26th, 2015 2:13am

Convert your year/month to a date and do date calculations to determine the intervals. Would be much nicer than attempting to do case against years and months
Free Windows Admin Tool Kit Click here and download it now
August 26th, 2015 2:51am

Hi,

I am using Eric DDL with some more data (since we need to check the distinct work OK)

-- DDL+DML

CREATE TABLE [A] (
    [Year] INTEGER NULL,
    [Month] INTEGER NULL,
    [User] VARCHAR(255) NULL);

GO

truncate table A
INSERT INTO A([Year],[Month],[User]) 
VALUES
(2014,8,'Giacomo'),
(2014,9,'Leila'),
(2014,10,'Melodie'),
(2014,11,'Cora'),
(2014,12,'Amanda'),
(2015,1,'Stone'),
(2015,2,'Kaitlin'),
(2015,3,'Gloria'),
(2015,4,'Petra'),
(2015,5,'Kirsten'),
(2015,6,'Cally'),
(2015,7,'Alexander'),
(2015,8,'Ciaran'),

(2014,2,'Amanda'),
(2014,10,'Melodie'),
(2015,7,'Petra'),
(2015,7,'Ciaran');

select * from A

please try this:

declare @CurrentYear int = datepart(YEAR,GETDATE())
declare @CurrentMonth int = datepart(MONTH,GETDATE())
-- select @CurrentMonth,@CurrentYear
select 
	'number of distinct users', 
	[3 Months] = (
	select count(distinct [user])from A 
	where 
		([Year] = @CurrentYear and [Month] > @CurrentMonth-3) -- data from this year
		or 
		-- if @CurrentMonth < 3 then we need data from previous year as well:
		-- If the "@CurrentMonth > 3" then this part "12 + (@CurrentMonth - 3)" is more then 12 and will not return any row
		-- but If the "@CurrentMonth < 3" then this part "12 + (@CurrentMonth - 3)" is less then 12 and will return the missing data from previous year
		([Year] = @CurrentYear - 1 and [Month] > 12 + (@CurrentMonth - 3)) -- data from previous year
	), 
	-- same with 9 and 12
	[6 Months] = (
	select count(distinct [user]) from A 
	where 
		([Year] = @CurrentYear and [Month] > @CurrentMonth-6) -- data from this year
		or 
		([Year] = @CurrentYear - 1 and [Month] > 12 + (@CurrentMonth - 6)) -- data from previous year
	), 
	[9 Months] = (
	select count(distinct [user]) from A 
	where 
		([Year] = @CurrentYear and [Month] > @CurrentMonth-9) -- data from this year
		or 
		([Year] = @CurrentYear - 1 and [Month] > 12 + (@CurrentMonth - 9)) -- data from previous year
	), 
	[12 Months] = (
	select count(distinct [user]) from A 
	where 
		([Year] = @CurrentYear and [Month] > @CurrentMonth-12) -- data from this year
		or 
		([Year] = @CurrentYear - 1 and [Month] > 12 + (@CurrentMonth - 12)) -- data from previous year
	)

* My first solution do not using any calculation on the original columns [Year],[Month] so if you have (and you probably should have) index on these column the query will probably use the index (I am not saying it will use the index if there are)

 ** we have no information regarding your real DDL, therefore, I can not say which query will be best or if indexes are relevant in your case, or if this query will use indexes in your case :-(

Another solution can be using CONVERT, which In some cases you can just use this as well, but check which is best in your specific case!

declare @FirstDayOfTheMonth DATE = DATEADD(DAY,1-(DATEPART(DAY,GETDATE())),GETDATE())
SELECT 
	'number of distinct users',
	[3 Months] = (
		select COUNT(DISTINCT [User])
		from A
		where DATEFROMPARTS([Year],[Month],01) > DATEADD(MONTH,-3,@FirstDayOfTheMonth)
	),
	[6 Months] = (
		select COUNT(DISTINCT [User])
		from A
		where DATEFROMPARTS([Year],[Month],01) > DATEADD(MONTH,-6,@FirstDayOfTheMonth)
	),
	[9 Months] = (
		select COUNT(DISTINCT [User])
		from A
		where DATEFROMPARTS([Year],[Month],01) > DATEADD(MONTH,-9,@FirstDayOfTheMonth)
	),
	[12 Months] = (
		select COUNT(DISTINCT [User])
		from A
		where DATEFROMPARTS([Year],[Month],01) > DATEADD(MONTH,-12,@FirstDayOfTheMonth)
	)
GO

August 26th, 2015 5:04am

Hi RyanAB,

This is not always a good idea :-)

by converting/calculating the column before you use it for the filter, you might eliminate the indexes

if you have indexes on these columns (AND YOU SHOULD HAVE INDEXES probably),then by using filter on the calculated value, you must first calculate the value, the SQL Server will not have option to use the indexes since it must first scan all and do the calculation and only on the result of the calculation it can use the filter (it will need at least index scan instead of index seek pro

Free Windows Admin Tool Kit Click here and download it now
August 26th, 2015 5:12am

Try

declare @a table (
    [Year] INTEGER not null,
    [Month] INTEGER not NULL,
    [User] VARCHAR(255) not NULL);



INSERT INTO @a([Year],[Month],[User]) 
VALUES
(2014,8,'Giacomo'),
(2014,9,'Leila'),
(2014,9,'Jim'),
(2014,10,'Melodie'),
(2014,11,'Cora'),
(2014,12,'Amanda'),
(2014,12,'Melodie'),
(2015,1,'Stone'),
(2015,2,'Kaitlin'),
(2015,3,'Gloria'),
(2015,4,'Petra'),
(2015,5,'Kirsten'),
(2015,5,'Gloria'),
(2015,6,'Cally'),
(2015,6,'Jim'),
(2015,7,'Alexander'),
(2015,8,'Ciaran');

;with cte as (select [User] as UserId, DATEFROMPARTS([Year],[Month],1) as dt from @a)

select 
count(distinct (case when dt between dateadd(month,-3, CURRENT_TIMESTAMP) and CURRENT_TIMESTAMP then UserId end)) as last3MonthsCount,
count(distinct (case when dt between dateadd(month,-6, CURRENT_TIMESTAMP) and CURRENT_TIMESTAMP then UserId end)) as last6MonthsCount,
count(distinct (case when dt between dateadd(month,-9, CURRENT_TIMESTAMP) and CURRENT_TIMESTAMP then UserId end)) as last9MonthsCount
from cte 

August 26th, 2015 11:18am

Hi,

Please check the execution plan comparing

* the first query is my second solution and the second query is Naomi's.
Tested on SQL 2014

** I added to Naomi's solution the last colum to get 12 month as the OP asked

"count(distinct (case when dt between dateadd(month,-12, CURRENT_TIMESTAMP) and CURRENT_TIMESTAMP then UserId end)) as last12MonthsCount "

1. without indexes:

2. with index 

CREATE NONCLUSTERED INDEX A_User_Year_Month_Inx on A([user],[Year],[Month])

* you should check other DDL as well.

** You should check the IO as well!
Sometimes the execution plan percentage misleading.

Free Windows Admin Tool Kit Click here and download it now
August 26th, 2015 12:20pm

Select distinct Sum(Case when dtMonth=0 Then  1   End) Over() [3MONTH],
Sum(Case when dtMonth IN (0,1)  Then  1   End) Over() [6MONTH],
Sum(Case when dtMonth IN (0,1,2)  Then 1   End) Over() [9MONTH], 
Sum(Case when dtMonth IN (0,1,2,3)  Then 1   End) Over() [12MONTH] 
 from (
select  distinct [User], datediff(month,datefromparts([Year],[Month],1),getdate())/3 dtMonth 
from [A]
) t
 

August 26th, 2015 3:13pm

Hi RyanAB,

This is not always a good idea :-)

by converting/calculating the column before you use it for the filter, you might eliminate the indexes

if you have indexes on these columns (AND YOU SHOULD HAVE INDEXES probably),then by using filter on the calculated value, you must first calculate the value, the SQL Server will not have option to use the indexes since it must first scan all and do the calculation and only on the result of the calculation it can use the filter (it will need at least index scan instead of index seek pro

Free Windows Admin Tool Kit Click here and download it now
August 26th, 2015 4:42pm

I agree with you that for the long term you might need to think about a better solution (maybe even using different DDL) :-)

Unfortunately we don't have the real DDL here since the OP did not post it :-(
so we are just talking theoretically, and I had to mentioned that this solution is not always a good idea :-)


* I posted two full solutions code anyway which the first one do not use this approach

** "nicer" is not an issue here i think.
     I care about performance first :-)

Anyway, WELL DONE RyanAB
I
t is a solution to the question (we do not have to post full code each time), and till we will get DDL we can only talk about performance in

August 26th, 2015 5:18pm

Good day Jingyang Li

Yes, your query is faster but is not returning the same result :-)
Your solution return the same result as Eric's solution.
This is why I added several more rows from the start (to create duplicate rows).

Assuming that I understood the issue, then your code return duplicate rows.
Please compare the results in your code with the results in my and in Naomi codes.

This is what I got (and Naomi as well):

3 Months    6 Months    9 Months    12 Months
----------- ----------- ----------- -----------
4           6           9           12

This is what your code return on the same data (my DML as above):

3MONTH      6MONTH      9MONTH      12MONTH
----------- ----------- ----------- -----------
4           7           10          13

This s all the information i used ordered by date

---------- ----------
Amanda     2014-02-01
Giacomo    2014-08-01
Leila      2014-09-01
Melodie    2014-10-01
Melodie    2014-10-01
Cora       2014-11-01
Amanda     2014-12-01
Stone      2015-01-01
Kaitlin    2015-02-01
Gloria     2015-03-01
Petra      2015-04-01
Kirsten    2015-05-01
Cally      2015-06-01
Ciaran     2015-07-01
Alexander  2015-07-01
Petra      2015-07-01
Ciaran     2015-08-01

we supposed to get these rows for the last 6 month (8 rows return in the last 6 month):

Gloria     2015-03-01
Petra      2015-04-01

Kirsten    2015-05-01
Cally      2015-06-01
Ciaran     2015-07-01
Alexander  2015-07-01
Petra      2015-07-01
Ciaran     2015-08-01

but the user Ciaran has duplicate rows and the user Petra as well, therefore the last result should be 6 and not 7 for the last six month. If I notice correctly then your query only remove duplicate in the first 3 month therefore you cleared the duplication of the user Ciaran but stayed with 2 rows for the user Petra (therefore you got 8-1 = 7 instead 8-2 = 6)

* I hope I understood the OP question :-)
And that this is

Free Windows Admin Tool Kit Click here and download it now
August 26th, 2015 5:44pm

I don't think there is a plural form of the 'code' word. I can double check, but I am pretty sure there isn't one.

Check this page (the first google result)

http://english.stackexchange.com/questions/20455/is-it-wrong-to-use-the-word-codes-in-a-programming-conte

August 26th, 2015 6:10pm

Check the code. Don't use 'Codes' word at all, it's incorrect. I also added a link to the discussion about the usage.

I think in Hebrew there are lots of words having only plural or only singular form as well. Say, 'maim', 'shamaim' - only plural, right? I have to admit that my Hebrew is very weak, almost non-existent :(

Each language has its rules and exceptions.

August 26th, 2015 6:21pm

LOL :-)
I need to ask my father regarding the Hebrew (he used to teach Hebrew grammar and Literature).
I am not sure regarding these Hebrew examples as well :-)

Anyway, thanks for the fix, +1
so from now on "CODE" instead of "CODES" :-)

Thanks Naomi

Free Windows Admin Tool Kit Click here and download it now
August 26th, 2015 6:34pm

** "nicer" is not an issue here i think.
     I care about performance first :-)

There's always a trade-off between performance and maintainability. I agree something needs to perform well but if we're talking seconds, not really a major issue. I would take the ease of reading date manipulation over playing with years and months... I'd just make sure it's done in a way that still performs well
August 26th, 2015 6:58pm

When your database size is almost 100 TB (97TB to be more accurate, but it was over 100 TB for short time) performance is THE ISSUE always!

one second better, means that someone/something else can be executed better! You are not the only user in the database (at least in most of my cases)

0.03 second for 1 query that executed 10K times in one second is an issue!

:-)

But it is true that There's trade-off between performance and maintainab

Free Windows Admin Tool Kit Click here and download it now
August 26th, 2015 7:29pm

If the OP is querying 100TB then you have a point but until then... I'm not saying it's fine to take 5 minutes when loss of readability brings it to 5 seconds but don't get too hung up on performance when you could spend more time eeking a second out of a query than you'd ever gain from that improvement... that's for the OP (and crew) to determine where the trade off happens
August 26th, 2015 8:31pm

Please follow basic Netiquette and post the DDL we need to answer this. Follow industry AND ANSI/ISO standards in your data. You should follow ISO-11179 rules for naming data elements. I have tried to guess at corrections. You should follow ISO-8601 rules for displaying temporal data (https://xkcd.com/1179/). We need to know the data types, keys and constraints on the table. Avoid dialect in favor of ANSI/ISO Standard SQL; 

>> I have table A(year int, month int, user varchar(20)),<<

This is a rude, useless narrative without a key. And it is completely wrong. This is a classic design error called attribute splitting.  You split a temperate interval into two columns.  

A useful idiom is a month period calendar. It gives a name to a range of dates. The worst way would be to use temporal math; it tells the world you do not think in sets or understand declarative programming yet. Here is a skeleton:  

CREATE TABLE Month_Periods
(month_name CHAR(10) NOT NULL PRIMARY KEY
  CHECK(month_name LIKE '[12][0-9][0-9][0-9]-[0-1][0-9]-00'),
 month_start_date DATE NOT NULL,
 month_end_date DATE NOT NULL,
 CONSTRAINT date_ordering
 CHECK (month_start_date <= month_end_date),
 ordinal_period INTEGER NOT NULL UNIQUE 
  CHECK(ordinal_period > 0)
etc);

I like the MySQL convention of using double zeroes for months and years, That is 'yyyy-mm-00' for a month within a year and 'yyyy-00-00' for the whole year. The advantage is that it will sort with the ISO-8601 data format required by Standard SQL. 

>>  I am trying to write a view that would show number of distinct users in the last 3 months, last 6 months, last 9 months and last 12 months (all from the most recent year/month) in following format: << 

NO! we do not do display or report formatting in the database. 

CREATE TABLE Log
(user_id CHAR(10) NOT NULL,
 log_date DATE NOT NULL,
 PRIMARY KEY (user_id, log_date))
CREATE VIEW Month_Log_Summary
AS
SELECT M.month_name, COUNT(DISTINCT L.user_id) AS unique_user_cnt
  FROM Month_Periods AS M,
       Log AS L
 WHERE L.log_date BETWEEN M.month_start_date AND M.month_end_date)
GROUP BY  M.month_
Free Windows Admin Tool Kit Click here and download it now
August 26th, 2015 11:12pm

This is may be simplest query you can use.

Declare @a table(yr int,[month] int,[user] varchar(20))
insert into @A values(2015,7,'user1'),(2015,3,'user2'),(2015,2,'user3'),(2014,10,'user4'),(2014,8,'user5')

Select 
'Unique users',
count( distinct case when datediff(m,dt,getdate())<=3 then [User] else null  end) months3,
count(distinct case when datediff(m,dt,getdate())<=6 then [User] else null  end) months6,
count(distinct case when datediff(m,dt,getdate())<=12 then [User] else null  end) months12
from (Select * ,convert(date,cast(yr as varchar(4))+ '-'+ right('0'+ cast([month] as varchar(3)),2)+'-'+ +'01') dt
from @a) bc
where bc.dt >= dateadd(YY,-1,getdate())
Best luck

August 27th, 2015 1:11am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics