How to get the average of a column in MySQL
The built-in AVG function (an aggregate function) can be used like so:
select avg(rating) from table_name
Note that, like most aggregate functions, AVG excludes NULL values (the average of 1, 2, NULL is 1.5, not 1.0). Also, in MySQL the return type is DECIMAL if you're averaging DECIMAL or integer columns, so use the appropriate C# data type (for example, decimal rather than int) when reading the result.
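The NULL handling described above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the ratings table is hypothetical. Skipping NULLs in AVG is standard SQL behavior, but the DECIMAL return type mentioned above is MySQL-specific and is not reproduced here.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (rating INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO ratings (rating) VALUES (?)", [(1,), (2,), (None,)])

# AVG and COUNT(rating) skip the NULL row; COUNT(*) counts every row.
avg_rating, total_rows, non_null_rows = conn.execute(
    "SELECT AVG(rating), COUNT(*), COUNT(rating) FROM ratings"
).fetchone()

print(avg_rating)     # 1.5
print(total_rows)     # 3
print(non_null_rows)  # 2
```

If NULLs should count as zero instead, wrapping the column as AVG(COALESCE(rating, 0)) is one common way to get 1.0 here.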
How to find average of a particular field in Scala
You can simply do the following:
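import scala.util.Try

val text = sc.textFile("/neerja/input.txt")
// Split each line on tabs and parse the fifth field (index 4); unparsable values become 0.0
val fourth = text.map(line => line.split("\\t"))
  .map(arr => Try(arr(4).toDouble) getOrElse(0.0)).mean()
println(fourth)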
You should get the average of the subject in the fifth column (index 4).
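One thing worth noting about the getOrElse(0.0) fallback is that unparsable values are counted as zero, which pulls the average down. The following plain-Python sketch (not Spark) contrasts that policy with skipping bad values; the sample rows and the parse_or_none helper are hypothetical and only for illustration.

```python
from statistics import mean

# Hypothetical tab-separated rows: a serial number followed by four subject scores.
lines = [
    "1\t10.0\t20.0\t30.0\t40.0",
    "2\t11.0\t21.0\t31.0\tn/a",  # the fifth field cannot be parsed
    "3\t12.0\t22.0\t32.0\t60.0",
]

def parse_or_none(field):
    """Return the field as a float, or None if it cannot be parsed."""
    try:
        return float(field)
    except ValueError:
        return None

fifth = [parse_or_none(line.split("\t")[4]) for line in lines]

# Same policy as getOrElse(0.0): unparsable values count as zero.
mean_with_zero = mean(0.0 if v is None else v for v in fifth)
# Alternative policy: leave unparsable values out of the average.
mean_skipping = mean(v for v in fifth if v is not None)

print(mean_with_zero)  # about 33.33
print(mean_skipping)   # 50.0
```

Which policy is right depends on what a missing score means in the data; the original snippet uses the first one.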
Updated
If the average of all the subject columns is required, I would suggest creating a DataFrame. DataFrames are optimized RDDs, and many built-in functions are available for computation.
To create a DataFrame for the given data, you need a schema.
import org.apache.spark.sql.types.{DoubleType, IntegerType, StructField, StructType}
val schema = StructType(Seq(
StructField("Sn", IntegerType, true),
StructField("subject1", DoubleType, true),
StructField("subject2", DoubleType, true),
StructField("subject3", DoubleType, true),
StructField("subject4", DoubleType, true)
))
An RDD[Row] then needs to be created:
import org.apache.spark.sql.Row
import scala.util.Try

// Parse each subject with toDouble; unparsable values fall back to 0.0
val data = text.map(line => line.split("\\t"))
  .map(arr => Row.fromSeq(Seq(
    arr(0).toInt,
    Try(arr(1).toDouble) getOrElse(0.0),
    Try(arr(2).toDouble) getOrElse(0.0),
    Try(arr(3).toDouble) getOrElse(0.0),
    Try(arr(4).toDouble) getOrElse(0.0))))
Finally, the DataFrame is created:
val df = sqlContext.createDataFrame(data, schema)
The average of each column can be calculated using the mean function:
import org.apache.spark.sql.functions.mean

df.select(
  mean("subject1").as("averageOFS1"),
  mean("subject2").as("averageOFS2"),
  mean("subject3").as("averageOFS3"),
  mean("subject4").as("averageOFS4")
).show(false)
which should give you a DataFrame like:
+------------------+-----------------+-----------+-----------------+
|averageOFS1 |averageOFS2 |averageOFS3|averageOFS4 |
+------------------+-----------------+-----------+-----------------+
|21.796166666666668|4.661666666666666|5.24965 |7.919609688333335|
+------------------+-----------------+-----------+-----------------+
pandas get column average/mean
If you only want the mean of the weight column, select the column (which is a Series) and call .mean():
In [479]: df
Out[479]:
ID birthyear weight
0 619040 1962 0.123123
1 600161 1963 0.981742
2 25602033 1963 1.312312
3 624870 1987 0.942120
In [480]: df["weight"].mean()
Out[480]: 0.83982437500000007
Find the average of two combined columns in sql
By definition, AVG(col1) = SUM(col1)/COUNT(col1) and AVG(col2) = SUM(col2)/COUNT(col2). If neither column contains NULLs, both counts equal COUNT(*), and therefore (SUM(col1)+SUM(col2))/COUNT(*) = AVG(col1) + AVG(col2).
Also, because addition is commutative and associative, SUM(col1)+SUM(col2) = SUM(col1+col2), so (SUM(col1)+SUM(col2))/COUNT(*) = SUM(col1+col2)/COUNT(*), which is AVG(col1+col2).
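The identity can be checked numerically. The sketch below uses Python's built-in sqlite3 module with a hypothetical table t (both are assumptions for illustration, not part of the original answer). It also shows that the two expressions can differ once a NULL is present, because AVG skips NULLs per column while col1 + col2 is NULL for the whole row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 REAL, col2 REAL)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)])

query = "SELECT AVG(col1) + AVG(col2), AVG(col1 + col2) FROM t"

# No NULLs: the two expressions agree.
sum_of_avgs, avg_of_sums = conn.execute(query).fetchone()
print(sum_of_avgs, avg_of_sums)  # 22.0 22.0

# Add a row with a NULL in col2: AVG(col1) now uses 4 rows, AVG(col2) only 3,
# and col1 + col2 is NULL for the new row, so AVG(col1 + col2) ignores it.
conn.execute("INSERT INTO t VALUES (4.0, NULL)")
sum_of_avgs_null, avg_of_sums_null = conn.execute(query).fetchone()
print(sum_of_avgs_null, avg_of_sums_null)  # 22.5 22.0
```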
Calculate AVERAGE from 2 columns for each row in SQL
You need to add the fields together and divide by the number of fields. If your Average field is of DECIMAL type, you don't even need to call the ROUND function: any digits beyond the declared scale are reduced to that scale automatically when the value is stored:
UPDATE table_name
SET AVERAGE = (grade1 + grade2) / 2.0;
In your example you are only averaging two fields, so Average DECIMAL(3,1) would work for you: assuming the grades are integers, the fractional part can only ever be .0 or .5, so the ROUND function is not needed.
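As a rough illustration of the per-row update, here is a sketch using Python's built-in sqlite3 module. The grades table and its rows are hypothetical, and SQLite has no fixed-scale DECIMAL type, so the average column is declared REAL here. One caveat that is safe to state: in some databases (SQLite among them) dividing two integers performs integer division, so the sketch divides by 2.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; REAL stands in for a DECIMAL(3,1) column.
conn.execute(
    "CREATE TABLE grades (name TEXT, grade1 INTEGER, grade2 INTEGER, average REAL)"
)
conn.executemany(
    "INSERT INTO grades (name, grade1, grade2) VALUES (?, ?, ?)",
    [("a", 80, 90), ("b", 71, 76)],
)

# Add the fields and divide by the number of fields; 2.0 forces real division.
conn.execute("UPDATE grades SET average = (grade1 + grade2) / 2.0")
rows = conn.execute("SELECT name, average FROM grades ORDER BY name").fetchall()
print(rows)  # [('a', 85.0), ('b', 73.5)]

# With an integer divisor, SQLite truncates the result.
int_div = conn.execute("SELECT (71 + 76) / 2").fetchone()[0]
print(int_div)  # 73
```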
Row-wise average for a subset of columns with missing values
You can simply do:
df['avg'] = df.mean(axis=1)
Monday Tuesday Wednesday avg
Mike 42 NaN 12 27.000000
Jenna NaN NaN 15 15.000000
Jon 21 4 1 8.666667
because .mean() ignores missing values by default (see the docs).
To select a subset, you can do:
df['avg'] = df[['Monday', 'Tuesday']].mean(axis=1)
Monday Tuesday Wednesday avg
Mike 42 NaN 12 42.0
Jenna NaN NaN 15 NaN
Jon 21 4 1 12.5
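For completeness, here is a self-contained sketch that rebuilds the frame shown above and contrasts the default behavior with skipna=False, a documented parameter of DataFrame.mean. The variable names (days, avg_default, avg_strict) are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the output shown above.
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

days = ["Monday", "Tuesday", "Wednesday"]

# Default (skipna=True): missing values are left out of each row's average.
avg_default = df[days].mean(axis=1)
# skipna=False: any missing value in a row makes that row's average NaN.
avg_strict = df[days].mean(axis=1, skipna=False)

print(avg_default)
print(avg_strict)
```

Selecting the columns explicitly also keeps a previously added avg column out of the calculation if the line is run a second time.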
calculate average in separate column over period and group by date Standard SQL BigQuery
You can use coalesce to return the avg grouped by date and, if it's null, return the total average of the column instead, using a subquery:
select date, coalesce(avg(rate), (select avg(rate) from my_table))
from my_table
group by date
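To see the fallback in action, the same query can be run against a small table. This sketch uses Python's built-in sqlite3 module rather than BigQuery (an assumption so it runs anywhere), with hypothetical sample rows and an added avg_rate alias and ORDER BY for readable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (date TEXT, rate REAL)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("2020-01-01", 1.0),
        ("2020-01-01", 3.0),
        ("2020-01-02", None),  # this date has no usable rate
        ("2020-01-03", 8.0),
    ],
)

# Per-date average, falling back to the overall average when a date has only NULLs.
rows = conn.execute(
    """
    SELECT date, COALESCE(AVG(rate), (SELECT AVG(rate) FROM my_table)) AS avg_rate
    FROM my_table
    GROUP BY date
    ORDER BY date
    """
).fetchall()

print(rows)  # [('2020-01-01', 2.0), ('2020-01-02', 4.0), ('2020-01-03', 8.0)]
```

The overall average is (1 + 3 + 8) / 3 = 4.0, which is what the date with only NULL rates receives.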