Best Way in MySQL or Rails to Get Avg Per Day Within a Specific Date Range


Is there a reason (other than the date one already mentioned) why you wouldn't use the built-in group function capabilities in ActiveRecord? You seem to be concerned about "post-processing", which I don't think is really something to worry about.

You're in Rails, so you should probably be looking for a Rails solution first[1]. My first thought would be to do something like

Product.average(:sales_price, :group => "DATE(created_at)", :conditions => ["merchant_id=?", 1])

which ActiveRecord turns into pretty much the SQL you described. Assuming there's a declared has_many association between Merchant and Product, you'd probably be better off using that, with something like:

ave_prices = Merchant.find(1).products.average(:sales_price, :group => "DATE(created_at)")

(I'm hoping that your description of the model as "products_sold" is some kind of transcription error, btw - if not, you're somewhat off-message with your class naming!)

After all that, you're back where you started, but you got there in a more conventional Rails way (and Rails really values conventions!). Now we need to fill in the gaps.

I'll assume you know your date range, let's say it's defined as all dates from from_date to to_date.

date_aves = (from_date..to_date).map{|dt| [dt, 0]}

That builds the complete list of dates as an array of [date, 0] pairs. We don't need the zero entries for dates where we got an average:

ave_price_dates = ave_prices.collect{|ave_price| ave_price[0]} # build an array of dates
date_aves.delete_if { |dt| ave_price_dates.index(dt[0]) } # remove zero entries for dates retrieved from DB
date_aves.concat(ave_prices) # add the query results
date_aves = date_aves.sort_by{|ave| ave[0] } # sort by date (sort_by returns a new array)

That lot looks a bit cluttered to me: I think it could be terser and cleaner. I'd investigate building a Hash or Struct rather than staying in arrays.
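As a minimal sketch of the Hash idea, assuming the grouped query result has already been turned into a Hash keyed by Date objects (the exact key type depends on your Rails version and database adapter), the gap filling collapses to one lookup per day. The helper name fill_missing_days and the sample data are hypothetical:

```ruby
require "date"

# Hypothetical helper: returns a Hash with one entry per day in the range,
# using 0 for any day that has no average in `averages`.
# `averages` is assumed to be a Hash of Date => average.
def fill_missing_days(averages, from_date, to_date)
  (from_date..to_date).to_h { |day| [day, averages.fetch(day, 0)] }
end

# Sample data standing in for the query result (assumed shape).
ave_prices = {
  Date.new(2024, 1, 1) => 10.5,
  Date.new(2024, 1, 3) => 7.25
}

filled = fill_missing_days(ave_prices, Date.new(2024, 1, 1), Date.new(2024, 1, 4))
filled.each { |day, avg| puts "#{day}: #{avg}" }
```

Because the range is walked in order, the result is already sorted by date and no delete/concat/sort step is needed.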


[1] I'm not saying don't use SQL - situations do occur where ActiveRecord can't generate the most efficient query and you fall back on find_by_sql. That's fine, it's supposed to be like that, but I think you should try to use it only as a last resort.

Generate array of daily avg values from db table (Rails)

Yes, that's a lot of unnecessary stuff when you can just go down to SQL to do it (I'm assuming you have a class called Round in your app):

class Round

DAILY_AVERAGE_SELECT = "SELECT
DATE(rounds.created_at) AS day_date,
AVG(rounds.audio_points) AS audio_points
FROM rounds
WHERE rounds.user_id = ?
GROUP BY DATE(rounds.created_at)
"

def self.daily_average(user_id)
connection.select_all(sanitize_sql_array([DAILY_AVERAGE_SELECT, user_id]), "daily-average")
end

end

Doing this directly in the database will be faster (and take less code) than doing it in Ruby as you're doing now.

Ruby on Rails average per day

Try using

Date.today.at_beginning_of_month.to_s(:db)

and

Date.today.tomorrow.to_s(:db)

instead.
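For reference, here is a rough plain-Ruby sketch of the two boundary values those calls produce, without ActiveSupport. The method name month_to_date_bounds is hypothetical, and the string format is assumed to match the :db format for dates (YYYY-MM-DD):

```ruby
require "date"

# Hypothetical helper: first day of the current month and tomorrow,
# both formatted as YYYY-MM-DD strings suitable for a date comparison in SQL.
def month_to_date_bounds(today = Date.today)
  first_of_month = Date.new(today.year, today.month, 1)
  tomorrow = today + 1
  [first_of_month.strftime("%Y-%m-%d"), tomorrow.strftime("%Y-%m-%d")]
end

from, to = month_to_date_bounds
puts "from #{from} up to (but not including) #{to}"
```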

Get average of last 7 days

I'm using the following as sample data:

CREATE TABLE some_data1 (unique_id text, date date, value integer);

INSERT INTO some_data1 (unique_id, date, value) VALUES
( 'a', '2014-03-20', 2),
( 'a', '2014-03-21', 2),
( 'a', '2014-03-22', 3),
( 'a', '2014-03-23', 5),
( 'a', '2014-03-24', 1),
( 'a', '2014-03-25', 0),
( 'a', '2014-03-26', 1),
( 'b', '2014-03-01', 1),
( 'b', '2014-03-02', 1),
( 'b', '2014-03-03', 1),
( 'b', '2014-03-04', 1),
( 'b', '2014-03-05', 1),
( 'b', '2014-03-06', 1),
( 'b', '2014-03-07', 1)

OPTION A: Using a common table expression (WITH) in PostgreSQL

with cte as (
select unique_id
,max(date) date
from some_data1
group by unique_id
)
select max(sd.unique_id),max(sd.date),avg(sd.value)
from some_data1 sd inner join cte using(unique_id)
where sd.date <= cte.date
and sd.date > cte.date - 7 -- keep only the 7 days ending at each id's latest date
group by cte.unique_id


OPTION B: To work in both PostgreSQL and MySQL

select max(sd.unique_id)
,max(sd.date)
,avg(sd.value)
from (
select unique_id
,max(date) date
from some_data1
group by unique_id
) cte inner join some_data1 sd using(unique_id)
where sd.date <= cte.date
and sd.date > cte.date - interval '7' day -- keep only the 7 days ending at each id's latest date
group by cte.unique_id

How to calculate the average per day for different years

Your decision to store a numerical increment range as text is not a good one, and ideally you should be storing the two points of the increment in separate columns. That being said, we can do some string olympics to work around this:

SELECT
YEAR(Year) AS Year,
COUNT(CASE WHEN 3 BETWEEN CAST(LEFT(Increment, CHARINDEX('-', Increment)-1) AS int) AND
CAST(RIGHT(Increment, LEN(Increment) - CHARINDEX('-', Increment)) AS int)
THEN 1 END) AS apple_3_cnt
FROM exampleTable
WHERE
TEXT LIKE '%apple%'
GROUP BY
YEAR(year);


Here I am aggregating by year, and then taking a conditional count of the records, for each year, where the apple increment range contains 3. To do this, I separate out the two ends of the increment range and convert them to integers.
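The same split-and-compare logic can be sketched in Ruby, which may make the string handling easier to follow. The method name increment_covers? and the sample rows are hypothetical, and the increment is assumed to have the form "low-high" with non-negative integers:

```ruby
# Hypothetical helper: true when n falls inside a text range such as "1-5".
def increment_covers?(increment, n)
  low, high = increment.split("-", 2).map { |part| Integer(part) }
  n.between?(low, high)
end

# Sample rows standing in for the table (assumed shape).
rows = [
  { year: 2016, increment: "1-5" },
  { year: 2016, increment: "4-9" },
  { year: 2017, increment: "2-3" },
  { year: 2017, increment: "3-10" }
]

# Per year, count the rows whose range contains 3 (the conditional count).
counts = rows.group_by { |row| row[:year] }
             .transform_values { |group| group.count { |row| increment_covers?(row[:increment], 3) } }

counts.each { |year, cnt| puts "#{year}: #{cnt}" }
```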

Edit:

Based on your updated table, we can try a simpler query:

SELECT
YEAR(Year) AS Year,
COUNT(CASE WHEN 3 BETWEEN setNum AND ExNum THEN 1 END) AS apple_3_cnt
FROM exampleTable
WHERE
TEXT LIKE '%apple%'
GROUP BY
YEAR(year);

Get records within specific year and month

Select * from dual 
where created_at BETWEEN '2017-09-01' AND '2017-09-30'

can be read as:

Select * from dual 
where created_at >= '2017-09-01 00:00:00'
AND created_at <= '2017-09-30 00:00:00'

And 30 Sep 2017 22:10:00 is not less than or equal to 2017-09-30 00:00:00, so that row is excluded.

Edit:

SQL Fiddle

MySQL 5.6 Schema Setup:

CREATE TABLE t
(`d` datetime)
;

INSERT INTO t
(`d`)
VALUES
('2017-09-30 00:00:00'),
('2017-09-30 02:02:02'),
('2017-09-20 12:12:12'),
('2017-09-08 21:21:21'),
('2017-09-08 00:00:00')
;

Query 1:

select date_format(d,'%Y-%m-%d %k:%i:%s') h from t
where d between '2017-09-08' and '2017-09-30'
order by d

Results:

|                   h |
|---------------------|
| 2017-09-08 0:00:00 |
| 2017-09-08 21:21:21 |
| 2017-09-20 12:12:12 |
| 2017-09-30 0:00:00 |

Query 2:

select date_format(d,'%Y-%m-%d %k:%i:%s') h from t
where d between '2017-09-08' and '2017-09-30 23:59:59'
order by d

Results:

|                   h |
|---------------------|
| 2017-09-08 0:00:00 |
| 2017-09-08 21:21:21 |
| 2017-09-20 12:12:12 |
| 2017-09-30 0:00:00 |
| 2017-09-30 2:02:02 |
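An alternative to appending 23:59:59 is a half-open range: created_at >= first day of the month AND created_at < first day of the next month. This is a different technique from the BETWEEN queries above, sketched here in plain Ruby. The names month_bounds and in_month? are hypothetical:

```ruby
require "date"

# Hypothetical helper: [first day of the month, first day of the next month].
def month_bounds(year, month)
  start = Date.new(year, month, 1)
  [start, start.next_month]
end

# Half-open check: includes every moment of the last day,
# excludes midnight of the first day of the following month.
def in_month?(time, year, month)
  from, to = month_bounds(year, month)
  time >= from.to_time && time < to.to_time
end

from, to = month_bounds(2017, 9)
# The same bounds expressed as a SQL condition (illustrative only).
puts "created_at >= '#{from}' AND created_at < '#{to}'"
puts in_month?(Time.new(2017, 9, 30, 22, 10, 0), 2017, 9)
```

The upper bound never needs to know how many days the month has or how precise the timestamps are.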

How to get average values for time intervals in Postgres

DB design

While you can work with separate date and time columns, there is really no advantage over a single timestamp column. I would adapt:

ALTER TABLE tbl ADD column ts timestamp;
UPDATE tbl SET ts = date + time; -- assuming actual date and time types
ALTER TABLE tbl DROP column date, DROP column time;

If date and time are not actual date and time data types, use to_timestamp(). Related:

  • Calculating Cumulative Sum in PostgreSQL
  • How to convert "string" to "timestamp without time zone"

Query

Then the query is a bit simpler:

SELECT *
FROM (
SELECT sn, generate_series(min(ts), max(ts), interval '5 min') AS ts
FROM tbl
WHERE sn = '4as11111111'
AND ts >= '2018-01-01'
AND ts < '2018-01-02'
GROUP BY 1
) grid
CROSS JOIN LATERAL (
SELECT round(avg(vin1), 2) AS vin1_av
, round(avg(vin2), 2) AS vin2_av
, round(avg(vin3), 2) AS vin3_av
FROM tbl
WHERE sn = grid.sn
AND ts >= grid.ts
AND ts < grid.ts + interval '5 min'
) avg;


Generate a grid of start times in the first subquery grid, running from the first to the last qualifying row in the given time frame.

Join to rows that fall in each partition with a LATERAL join and immediately aggregate averages in the subquery avg. Due to the aggregates, it always returns a row even if no entries are found. Averages default to NULL in this case.

The result includes all time slots between the first and last qualifying row in the given time frame. Various other result compositions would make sense, too, like including all time slots in the given time frame or just time slots with actual values. All are possible; I had to pick one interpretation.
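To make the grid-and-aggregate logic concrete, here is a small in-memory Ruby sketch of the same idea: build 5-minute slot start times from the first to the last reading, then average the readings in each slot, with nil for empty slots. It only illustrates the logic, not a replacement for the query. The method name slot_averages and the sample readings are hypothetical:

```ruby
SLOT_SECONDS = 5 * 60

# Hypothetical helper: returns [[slot_start, average_or_nil], ...]
# `readings` is assumed to be an array of { ts: Time, vin1: Float }.
def slot_averages(readings, slot = SLOT_SECONDS)
  return [] if readings.empty?

  times = readings.map { |r| r[:ts] }
  first, last = times.min, times.max

  # Grid of slot start times, like generate_series(min, max, '5 min').
  grid = []
  t = first
  while t <= last
    grid << t
    t += slot
  end

  # Average the readings in [start, start + slot); nil when the slot is empty.
  grid.map do |start|
    values = readings.select { |r| r[:ts] >= start && r[:ts] < start + slot }
                     .map { |r| r[:vin1] }
    avg = values.empty? ? nil : (values.sum.to_f / values.size).round(2)
    [start, avg]
  end
end

readings = [
  { ts: Time.utc(2018, 1, 1, 0, 0),  vin1: 1.0 },
  { ts: Time.utc(2018, 1, 1, 0, 2),  vin1: 2.0 },
  { ts: Time.utc(2018, 1, 1, 0, 11), vin1: 4.0 }
]

slot_averages(readings).each { |start, avg| puts "#{start}: #{avg.inspect}" }
```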

Index

At least have this multicolumn index:

CREATE INDEX foo_idx ON tbl (sn, ts);

Or on (sn, ts, vin1, vin2, vin3) to allow index-only scans - if some preconditions are met and especially if table rows are much wider than in the demo.

Closely related:

  • Slow LEFT JOIN on CTE with time intervals
  • Best way to count records by arbitrary time intervals in Rails+Postgres

Based on your original table

This version follows the request clarified in the comments, and the later update to the question that added the columns mac and loc. I assume you want separate averages per (mac, loc).

Here date and time are still separate columns, the vin* columns are type float, and time slots without rows are excluded.

The updated query also moves the set-returning function generate_series() to the FROM list, which is cleaner before Postgres 10:

SELECT t.mac, sn.sn, t.loc, ts.ts::time AS time, ts.ts::date AS date
, t.vin1_av, t.vin2_av, t.vin3_av
FROM (SELECT text '4as11111111') sn(sn) -- provide sn here once
CROSS JOIN LATERAL (
SELECT min(date+time) AS min_ts, max(date+time) AS max_ts
FROM tbl
WHERE sn = sn.sn
AND date+time >= '2018-01-01 0:0' -- provide time frame here
AND date+time < '2018-01-02 0:0'
) grid
CROSS JOIN LATERAL generate_series(min_ts, max_ts, interval '5 min') ts(ts)
CROSS JOIN LATERAL (
SELECT mac, loc
, round(avg(vin1)::numeric, 2) AS vin1_av -- cast to numeric for round()
, round(avg(vin2)::numeric, 2) AS vin2_av -- but rounding is optional
, round(avg(vin3)::numeric, 2) AS vin3_av
FROM tbl
WHERE sn = sn.sn
AND date+time >= ts.ts
AND date+time < ts.ts + interval '5 min'
GROUP BY mac, loc
HAVING count(*) > 0 -- exclude empty slots
) t;

Create a multicolumn expression index to support this:

CREATE INDEX bar_idx ON tbl (sn, (date+time));


But I would much rather use timestamp all along.

Get average for last month only

The answer depends on the exact definition of "last month" and the exact definition of "average count".

Assuming:

  • Your column is defined created_at timestamptz NOT NULL
  • You want the average number of rows per day - days without any rows count as 0.
  • Cover 30 days exactly, excluding today.

SELECT round(count(*)::numeric / 30, 2)  -- simple now with a fixed number of days
FROM tbl
WHERE created_at >= (now()::date - 30)
AND created_at < now()::date -- excl. today

Rounding is optional, but you need numeric instead of float to use round() this way.

The current day ("today") is excluded because it is still ongoing and would likely produce a lower, misleading average.

If "last month" is supposed to mean something else, you need to define it exactly. Months have between 28 and 31 days, so the term can mean various things. And since you obviously operate with timestamp or timestamptz, not date, you also need to be aware of possible implications of the time of day and the current time zone. The cast to date (or the definition of "day" in general) depends on your current timezone setting while operating with timestamptz.
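The arithmetic under the stated assumptions (exactly 30 days, today excluded, days without rows counting as 0) can be sketched in plain Ruby. The method name average_rows_per_day and the sample dates are hypothetical, and the rows are assumed to have already been reduced to their calendar date:

```ruby
require "date"

# Hypothetical helper: average number of rows per day over the `days` days
# before `today`. Days without rows count as 0 because the divisor is fixed.
def average_rows_per_day(created_dates, today, days = 30)
  from = today - days
  count = created_dates.count { |d| d >= from && d < today } # excludes today
  (count.to_f / days).round(2)
end

today = Date.new(2024, 3, 31)
created_dates = [
  today - 1, today - 1, today - 1, # three rows yesterday
  today,                           # ignored: today is still ongoing
  today - 30,                      # oldest day that still counts
  today - 31                       # ignored: outside the window
]

puts average_rows_per_day(created_dates, today)
```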

Related:

  • Ignoring timezones altogether in Rails and PostgreSQL
  • Select today's (since midnight) timestamps only
  • Subtract hours from the now() function

