Best way in MySQL or Rails to get AVG per day within a specific date range
Is there a reason (other than the date one already mentioned) why you wouldn't use the built-in group function capabilities in ActiveRecord? You seem to be concerned about "post-processing", which I don't think is really something to worry about.
You're in Rails, so you should probably be looking for a Rails solution first[1]. My first thought would be to do something like
Product.average(:sales_price, :group => "DATE(created_at)", :conditions => ["merchant_id=?", 1])
which ActiveRecord turns into pretty much the SQL you described. Assuming there's a declared has_many
association between Merchant and Product, you'd probably be better off using that, so something like:
ave_prices = Merchant.find(1).products.average(:sales_price, :group => "DATE(created_at)")
(I'm hoping that your description of the model as "products_sold" is some kind of transcription error, btw - if not, you're somewhat off-message with your class naming!)
After all that, you're back where you started, but you got there in a more conventional Rails way (and Rails really values conventions!). Now we need to fill in the gaps.
I'll assume you know your date range; let's say it's defined as all dates from from_date to to_date.
date_aves = (from_date..to_date).map{|dt| [dt, 0]}
That builds the complete list of dates as an array, each paired with a zero average. Next we drop the dates for which the query already returned an average:
ave_price_dates = ave_prices.collect{|ave_price| ave_price[0]} # build an array of dates
date_aves.delete_if { |dt| ave_price_dates.index(dt[0]) } # remove zero entries for dates retrieved from DB
date_aves.concat(ave_prices) # add the query results
date_aves = date_aves.sort_by{|ave| ave[0] } # sort by date (sort_by returns a new array)
That lot looks a bit cluttered to me: I think it could be terser and cleaner. I'd investigate building a Hash or Struct rather than staying in arrays.
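Following that thought, here's a minimal sketch of the Hash version, assuming ave_prices is the array of [date, average] pairs from the query (the dates and prices below are made-up stand-ins):

```ruby
require 'date'

# Hypothetical stand-ins for from_date, to_date and the query result.
from_date  = Date.new(2023, 1, 1)
to_date    = Date.new(2023, 1, 7)
ave_prices = [[Date.new(2023, 1, 3), 9.99], [Date.new(2023, 1, 5), 4.50]]

# Zero-fill every date in the range, then overwrite with the real averages;
# no delete_if/concat/sort dance needed.
daily = (from_date..to_date).map { |d| [d, 0] }.to_h
ave_prices.each { |date, avg| daily[date] = avg }
```

Iterating daily.sort then yields the date/average pairs in order.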
[1] I'm not saying don't use SQL - situations do occur where ActiveRecord can't generate the most efficient query and you fall back on find_by_sql. That's fine, it's supposed to be like that, but I think you should try to use it only as a last resort.
Generate array of daily avg values from db table (Rails)
Yes, that's a lot of unnecessary stuff when you can just go down to SQL to do it (I'm assuming you have a class called Round in your app):
class Round
  DAILY_AVERAGE_SELECT = "SELECT
                            DATE(rounds.created_at) AS day_date,
                            AVG(rounds.audio_points) AS audio_points
                          FROM rounds
                          WHERE rounds.user_id = ?
                          GROUP BY DATE(rounds.created_at)"

  def self.daily_average(user_id)
    connection.select_all(sanitize_sql_array([DAILY_AVERAGE_SELECT, user_id]), "daily-average")
  end
end
Doing this straight in the database will be faster (and involve less code) than doing it in Ruby as you're doing now.
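For contrast, a sketch of the in-Ruby version this replaces - the same GROUP BY / AVG shape done in memory (the struct and sample rows are made up, not from the question):

```ruby
require 'date'

# Made-up stand-in for rows of the rounds table.
RoundRow = Struct.new(:user_id, :created_at, :audio_points)

rounds = [
  RoundRow.new(1, DateTime.new(2023, 1, 1, 9),  10),
  RoundRow.new(1, DateTime.new(2023, 1, 1, 17), 20),
  RoundRow.new(1, DateTime.new(2023, 1, 2, 12), 30),
]

# Same logic as GROUP BY DATE(created_at) with AVG(audio_points), in memory.
daily_average = rounds
  .select { |r| r.user_id == 1 }
  .group_by { |r| r.created_at.to_date }
  .transform_values { |rs| rs.sum(&:audio_points).to_f / rs.size }
```

The database does all of this in one pass over the table, without materializing every row in Ruby first.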
ruby on rails average per day
Try using Date.today.at_beginning_of_month.to_s(:db) and Date.today.tomorrow.to_s(:db) instead.
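Those two helpers come from ActiveSupport; outside Rails, the same strings can be produced in plain Ruby (the fixed date below stands in for Date.today, and the :db format renders a date as YYYY-MM-DD):

```ruby
require 'date'

today = Date.new(2023, 5, 17)  # stand-in for Date.today

# Equivalents of at_beginning_of_month.to_s(:db) and tomorrow.to_s(:db).
beginning_of_month = Date.new(today.year, today.month, 1).strftime('%Y-%m-%d')
tomorrow           = (today + 1).strftime('%Y-%m-%d')
```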
Get average of last 7 days
I'm taking the following as a sample:
CREATE TABLE some_data1 (unique_id text, date date, value integer);
INSERT INTO some_data1 (unique_id, date, value) VALUES
( 'a', '2014-03-20', 2),
( 'a', '2014-03-21', 2),
( 'a', '2014-03-22', 3),
( 'a', '2014-03-23', 5),
( 'a', '2014-03-24', 1),
( 'a', '2014-03-25', 0),
( 'a', '2014-03-26', 1),
( 'b', '2014-03-01', 1),
( 'b', '2014-03-02', 1),
( 'b', '2014-03-03', 1),
( 'b', '2014-03-04', 1),
( 'b', '2014-03-05', 1),
( 'b', '2014-03-06', 1),
( 'b', '2014-03-07', 1);
OPTION A: using the PostgreSQL-specific WITH clause (CTE)
with cte as (
select unique_id
,max(date) date
from some_data1
group by unique_id
)
select max(sd.unique_id),max(sd.date),avg(sd.value)
from some_data1 sd inner join cte using(unique_id)
where sd.date <=cte.date
group by cte.unique_id
limit 7
> SQLFIDDLE DEMO
OPTION B: using a derived table, which works in both PostgreSQL and MySQL
select max(sd.unique_id)
,max(sd.date)
,avg(sd.value)
from (
select unique_id
,max(date) date
from some_data1
group by unique_id
) cte inner join some_data1 sd using(unique_id)
where sd.date <=cte.date
group by cte.unique_id
limit 7
> SQLFIDDLE DEMO
How to calculate the average per day for different years
Your decision to store a numerical increment range as text is not a good one, and ideally you should be storing the two points of the increment in separate columns. That being said, we can do some string olympics to work around this:
SELECT
YEAR(Year) AS Year,
COUNT(CASE WHEN 3 BETWEEN CAST(LEFT(Increment, CHARINDEX('-', Increment)-1) AS int) AND
CAST(RIGHT(Increment, LEN(Increment) - CHARINDEX('-', Increment)) AS int)
THEN 1 END) AS apple_3_cnt
FROM exampleTable
WHERE
TEXT LIKE '%apple%'
GROUP BY
YEAR(year);
Demo
Here I am aggregating by year and then taking a conditional count of records, for each year, where the apple increment range contains 3. To do this, I separate out the two ends of the increment range and convert them to integers.
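The split-and-compare step is easier to see outside SQL; a hypothetical Ruby helper doing the same thing (not part of the answer's query):

```ruby
# Split a "low-high" text increment and test whether a value falls inside it
# (mirrors the CHARINDEX/CAST string surgery in the query above).
def increment_contains?(increment, value)
  low, high = increment.split('-').map(&:to_i)
  (low..high).cover?(value)
end
```

For example, increment_contains?("2-5", 3) is true while increment_contains?("4-9", 3) is false.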
Edit:
Based on your updated table, we can try a simpler query:
SELECT
YEAR(Year) AS Year,
COUNT(CASE WHEN 3 BETWEEN setNum AND ExNum THEN 1 END) AS apple_3_cnt
FROM exampleTable
WHERE
TEXT LIKE '%apple%'
GROUP BY
YEAR(year);
Get records within specific year and month
Select * from dual
where created_at BETWEEN '2017-09-01' AND '2017-09-30'
Can be read as :
Select * from dual
where created_at >= '2017-09-01 00:00:00'
AND created_at <= '2017-09-30 00:00:00'
And 30 Sep 2017 22:10:00 is not less than or equal to 2017-09-30 00:00:00, so it is excluded.
...
edit :
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE t
(`d` datetime)
;
INSERT INTO t
(`d`)
VALUES
('2017-09-30 00:00:00'),
('2017-09-30 02:02:02'),
('2017-09-20 12:12:12'),
('2017-09-08 21:21:21'),
('2017-09-08 00:00:00')
;
Query 1:
select date_format(d,'%Y-%m-%d %k:%i:%s') h from t
where d between '2017-09-08' and '2017-09-30'
order by d
Results:
| h |
|---------------------|
| 2017-09-08 0:00:00 |
| 2017-09-08 21:21:21 |
| 2017-09-20 12:12:12 |
| 2017-09-30 0:00:00 |
Query 2:
select date_format(d,'%Y-%m-%d %k:%i:%s') h from t
where d between '2017-09-08' and '2017-09-30 23:59:59'
order by d
Results:
| h |
|---------------------|
| 2017-09-08 0:00:00 |
| 2017-09-08 21:21:21 |
| 2017-09-20 12:12:12 |
| 2017-09-30 0:00:00 |
| 2017-09-30 2:02:02 |
How to get average values for time intervals in Postgres
DB design
While you can work with separate date and time columns, there is really no advantage over a single timestamp column. I would adapt:
ALTER TABLE tbl ADD column ts timestamp;
UPDATE tbl SET ts = date + time; -- assuming actual date and time types
ALTER TABLE tbl DROP column date, DROP column time;
If date and time are not actual date and time data types, use to_timestamp(). Related:
- Calculating Cumulative Sum in PostgreSQL
- How to convert "string" to "timestamp without time zone"
Query
Then the query is a bit simpler:
SELECT *
FROM (
SELECT sn, generate_series(min(ts), max(ts), interval '5 min') AS ts
FROM tbl
WHERE sn = '4as11111111'
AND ts >= '2018-01-01'
AND ts < '2018-01-02'
GROUP BY 1
) grid
CROSS JOIN LATERAL (
SELECT round(avg(vin1), 2) AS vin1_av
, round(avg(vin2), 2) AS vin2_av
, round(avg(vin3), 2) AS vin3_av
FROM tbl
WHERE sn = grid.sn
AND ts >= grid.ts
AND ts < grid.ts + interval '5 min'
) avg;
db<>fiddle here
Generate a grid of start times in the first subquery grid, running from the first to the last qualifying row in the given time frame.
Join to rows that fall in each partition with a LATERAL join and immediately aggregate averages in the subquery avg. Due to the aggregates, it always returns a row even if no entries are found. Averages default to NULL in this case.
The result includes all time slots between the first and last qualifying row in the given time frame. Various other result compositions would make sense, too. Like including all times slots in the given time frame or just time slots with actual values. All possible, I had to pick one interpretation.
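The bucketing idea itself can be sketched in plain Ruby: floor each timestamp to its 5-minute slot and average per slot (the readings below are made up, and only one vin column is shown):

```ruby
require 'time'

# Hypothetical (timestamp, vin1) readings.
readings = [
  [Time.utc(2018, 1, 1, 0, 1), 1.0],
  [Time.utc(2018, 1, 1, 0, 4), 3.0],
  [Time.utc(2018, 1, 1, 0, 7), 5.0],
]

# Floor each timestamp to its 5-minute (300-second) slot, then average per slot.
slot = ->(t) { Time.at((t.to_i / 300) * 300).utc }
averages = readings
  .group_by { |t, _| slot.call(t) }
  .transform_values { |rows| (rows.sum { |_, v| v } / rows.size).round(2) }
```

Unlike the query, this only produces slots that contain rows - the grid subquery is what fills in the empty ones.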
Index
At least have this multicolumn index:
CREATE INDEX foo_idx ON tbl (sn, ts);
Or on (sn, ts, vin1, vin2, vin3) to allow index-only scans - if some preconditions are met, and especially if table rows are much wider than in the demo.
Closely related:
- Slow LEFT JOIN on CTE with time intervals
- Best way to count records by arbitrary time intervals in Rails+Postgres
Based on your original table
As requested and clarified in the comments, and per the later question update that added the columns mac and loc, I assume you want separate averages per (mac, loc). Here, date and time are still separate columns, the vin* columns are type float, and time slots without rows are excluded.
The updated query also moves the set-returning function generate_series() to the FROM list, which is cleaner before Postgres 10:
SELECT t.mac, sn.sn, t.loc, ts.ts::time AS time, ts.ts::date AS date
, t.vin1_av, t.vin2_av, t.vin3_av
FROM (SELECT text '4as11111111') sn(sn) -- provide sn here once
CROSS JOIN LATERAL (
SELECT min(date+time) AS min_ts, max(date+time) AS max_ts
FROM tbl
WHERE sn = sn.sn
AND date+time >= '2018-01-01 0:0' -- provide time frame here
AND date+time < '2018-01-02 0:0'
) grid
CROSS JOIN LATERAL generate_series(min_ts, max_ts, interval '5 min') ts(ts)
CROSS JOIN LATERAL (
SELECT mac, loc
, round(avg(vin1)::numeric, 2) AS vin1_av -- cast to numeric for round()
, round(avg(vin2)::numeric, 2) AS vin2_av -- but rounding is optional
, round(avg(vin3)::numeric, 2) AS vin3_av
FROM tbl
WHERE sn = sn.sn
AND date+time >= ts.ts
AND date+time < ts.ts + interval '5 min'
GROUP BY mac, loc
HAVING count(*) > 0 -- exclude empty slots
) t;
Create a multicolumn expression index to support this:
CREATE INDEX bar_idx ON tbl (sn, (date+time));
db<>fiddle here
But I would much rather use timestamp all along.
Get average for last month only
The answer depends on the exact definition of "last month" and the exact definition of "average count".
Assuming:
- Your column is defined
created_at timestamptz NOT NULL
- You want the average number of rows per day - days without any rows count as 0.
- Cover 30 days exactly, excluding today.
SELECT round(count(*)::numeric / 30, 2) -- simple now with a fixed number of days
FROM tbl
WHERE created_at >= (now()::date - 30)
AND created_at < now()::date -- excl. today
Rounding is optional, but you need numeric instead of float to use round() this way.
The current day ("today") is not included because it is still ongoing and would drag the average down misleadingly.
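The arithmetic itself is trivial once the window is fixed; a tiny sketch with a made-up row count:

```ruby
count = 137  # hypothetical row count for the 30-day window

# Divide by the fixed window length; use a float (numeric in SQL) so the
# division isn't truncated, then round to two decimals.
average_per_day = (count / 30.0).round(2)
```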
If "last month" is supposed to mean something else, you need to define it exactly; months have between 28 and 31 days, so it can mean various things. And since you obviously operate with timestamp or timestamptz, not date, you also need to be aware of possible implications of the time of day and the current time zone: the cast to date (or the definition of "day" in general) depends on your current timezone setting while operating with timestamptz.
Related:
- Ignoring timezones altogether in Rails and PostgreSQL
- Select today's (since midnight) timestamps only
- Subtract hours from the now() function