SQL-Style GROUP BY Aggregate Functions in jq (COUNT, SUM, etc.)

SQL aggregate function when COUNT(*) = 1, so there can be only one value

You could, instead, use a windowed COUNT and then filter based on that:

WITH CTE AS (
    SELECT ba.book_id,
           ba.author_id,
           COUNT(ba.book_id) OVER (PARTITION BY ba.book_id) AS Authors
    FROM dbo.book_authors ba)
SELECT c.book_id,
       c.author_id
FROM CTE c
WHERE c.Authors = 1;

An alternative method would be to use a correlated subquery:

SELECT ba.book_id,
       ba.author_id
FROM dbo.book_authors ba
WHERE EXISTS (SELECT 1
              FROM dbo.book_authors e
              WHERE e.book_id = ba.book_id
              GROUP BY e.book_id
              HAVING COUNT(*) = 1);

I have not tested the performance of either with a decent amount of data; however, I would hope that the correlated subquery, against a well-indexed table, would give you the better performance.
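
As a rough illustration of what "well indexed" could mean here (the index name, and the assumption that no suitable index already exists, are mine), something like the following would cover both queries:

CREATE INDEX IX_book_authors_book_id
    ON dbo.book_authors (book_id)   -- key column used by the PARTITION BY / correlation
    INCLUDE (author_id);            -- covers the selected column so no lookup is needed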

How do I sum the values in an array of maps in jq?

Unless your jq has inputs, you will have to slurp the objects up using the -s flag. Then you'll have to do a fair amount of manipulation:

  1. Each of the objects needs to be mapped out to key/value pairs
  2. Flatten the pairs to a single array
  3. Group up the pairs by key
  4. Map out each group accumulating the values to a single key/value pair
  5. Map the pairs back to an object
map(to_entries)                              # 1. objects -> arrays of key/value pairs
| add                                        # 2. flatten into one array of pairs
| group_by(.key)                             # 3. group the pairs by key
| map({                                      # 4. one pair per key, summing the values
      key: .[0].key,
      value: map(.value) | add
  })
| from_entries                               # 5. pairs back to an object
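
For instance, with a toy input of my own (the file names input.json and sum.jq are just placeholders for your data and for the filter above):

$ cat input.json
{"a": 1, "b": 2}
{"a": 3}
$ jq -s -f sum.jq input.json
{
  "a": 4,
  "b": 2
}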

With jq 1.5, this could be greatly improved: You can do away with slurping and just read the inputs directly.

$ jq -n '
reduce (inputs | to_entries[]) as {$key,$value} ({}; .[$key] += $value)
' input.json

Since we're simply accumulating all the values in each of the objects, it'll be easier to just run through the key/value pairs of all the inputs, and add them all up.

MySQL - any advantage using LIMIT 1 with an aggregate function like COUNT or SUM?

There is no reason to include LIMIT 1. An aggregation query with no GROUP BY always returns exactly one row. The optimizer knows this, so LIMIT provides zero additional information.

In fact, by putting a LIMIT in, you are being a bit misleading, because you are suggesting that it could return more than one row. At some later point, someone might ask "where is the GROUP BY?" or something like that.
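
For example (orders, customer_id and total are made-up names, purely for illustration), these two statements return exactly the same single row, so the LIMIT adds nothing:

SELECT COUNT(*) FROM orders;
SELECT COUNT(*) FROM orders LIMIT 1;

LIMIT only starts to matter once a GROUP BY can produce several rows:

SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id LIMIT 1;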

jq: count the number of items in JSON by a specific key

Here's one solution (assuming that the input is a stream of valid JSON objects and that you invoke jq with the -s option):

map({ItemId: .Properties.ItmId})              # extract the ItmId values
| group_by(.ItemId)                           # group by "ItemId"
| map({ItemId: .[0].ItemId, Count: length})   # store the counts
| .[]                                         # convert to a stream

A slightly more memory-efficient approach would be to use inputs, if your jq has it; in that case, use -n instead of -s, and replace the first line above by [inputs | {ItemId: .Properties.ItmId}], as shown below.
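
Putting that together (input.json is just a placeholder file name), the inputs-based variant of the whole filter would be:

$ jq -n '
  [inputs | {ItemId: .Properties.ItmId}]
  | group_by(.ItemId)
  | map({ItemId: .[0].ItemId, Count: length})
  | .[]
' input.json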

Efficient solution

The above solutions use the built-in group_by, which is convenient but leads to easily-avoided inefficiencies. Using the following counter makes it easy to write a very efficient solution:

def counter(stream):
  reduce stream as $s ({}; .[$s|tostring] += 1);

Using the -n command-line option and applying it as follows:

counter(inputs | .Properties.ItmId)

this leads to a dictionary of counts:

{
  "1694738780": 1,
  "1347809133": 1
}

Such a dictionary is probably more useful than the stream of singleton objects envisioned by the OP, but if such a stream is needed, one can modify the above as follows:

counter(inputs | .Properties.ItmId)
| to_entries[]
| {ItemId: (.key), Count: .value}
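
For completeness (again with input.json standing in for your actual file), the def and the filter can be combined into a single invocation:

$ jq -n '
  def counter(stream):
    reduce stream as $s ({}; .[$s|tostring] += 1);
  counter(inputs | .Properties.ItmId)
  | to_entries[]
  | {ItemId: .key, Count: .value}
' input.json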

How to Get Aggregate Data by Time Slice (sum, avg, min, max, etc.) in Rails 3

Unfortunately I've never used Postgres, so this solution is written for MySQL, but I think you can find the Postgres analogs (a rough sketch of one follows the usage example below).

class Counter < ActiveRecord::Base
  has_many :samples do
    # default 30 minutes
    def per_time_slice(slice = 30)
      start = "2000-01-01 00:00:00"
      self.select("*,
                   CONCAT(FLOOR(TIMESTAMPDIFF(MINUTE,'#{start}',created_at)/#{slice})*#{slice},
                          (FLOOR(TIMESTAMPDIFF(MINUTE,'#{start}',created_at)/#{slice})+1)*#{slice}) AS slice,
                   AVG(value) AS avg_value,
                   MIN(value) AS min_value,
                   MAX(value) AS max_value,
                   SUM(value) AS sum_value,
                   COUNT(value) AS count_value").
           group("slice").order("slice")
    end
  end
end

Usage

counter = find_some_counter
samples = counter.samples.per_time_slice(60).where(:name => "Bobby")
samples.map(&:avg_value)
samples.map(&:min_value)
samples.map(&:max_value)

etc
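
Since the answer only gestures at a Postgres analog, here is a rough, untested sketch of how the slice expression might translate, assuming a samples table with value and created_at columns and a fixed 30-minute slice (only the lower bound of each slice is computed, and the "*" is dropped because Postgres will not mix it with aggregates):

-- Postgres sketch: TIMESTAMPDIFF(MINUTE, ...) becomes an EPOCH difference in minutes
SELECT FLOOR(EXTRACT(EPOCH FROM (created_at - TIMESTAMP '2000-01-01 00:00:00')) / 60 / 30) * 30 AS slice,
       AVG(value)   AS avg_value,
       MIN(value)   AS min_value,
       MAX(value)   AS max_value,
       SUM(value)   AS sum_value,
       COUNT(value) AS count_value
FROM samples
GROUP BY slice
ORDER BY slice;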

How do I count up when there is a unique value, but keep the count the same when there is a duplicate value?

Tables are unordered sets, so for this to work you need a column that defines the order of the rows, so that you can check, in that order, whether the value of the column Question changes or not.

I don't see such a column in your sample data, so I will use SQLite's rowid.

Create a CTE that will return a column flag which indicates whether a row is the start of a new Question.

Finally, use the SUM() window function to get the result that you want:

WITH cte AS (
    SELECT Q.Question_id, Q.Question, PMA.part_model_ans, QP.part_total_marks, MA.answer_mark, Q.rowid,
           Q.Question <> LAG(Q.Question, 1, '') OVER (PARTITION BY Q.Question_id ORDER BY Q.rowid) AS flag
    FROM QUESTIONS Q
    LEFT JOIN QUESTIONS_PART QP ON QP.question_id = Q.question_id
    LEFT JOIN PART_MODEL_ANSWER PMA ON PMA.part_id = QP.part_id
    LEFT JOIN MODEL_ANSWER MA ON MA.question_id = Q.question_id
)
SELECT Question_id, Question, part_model_ans, part_total_marks, answer_mark,
       SUM(flag) OVER (PARTITION BY Question_id ORDER BY rowid) AS number
FROM cte
ORDER BY Question_id;

See a simplified demo.


