Find the Average of Fields in the Columns

How to get the average of a column in MySQL

The built-in AVG aggregate function can be used like so:

select avg(rating) from table_name

Note that, like most aggregate functions, AVG excludes NULL values: the average of 1, 2, NULL is 1.5, not 1.0. Also, in MySQL the return type is DECIMAL when you average DECIMAL or integer columns, so read the result into a matching decimal type in your client code (for example, decimal in C#).
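To see the NULL rule in action, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for MySQL (the table and column names are taken from the query above; the three sample rows are made up). SQLite returns a float from AVG, so the remark about DECIMAL return types does not carry over to this sketch.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (rating INTEGER)")
conn.executemany(
    "INSERT INTO table_name (rating) VALUES (?)",
    [(1,), (2,), (None,)],  # None is stored as NULL
)

# AVG divides by the number of non-NULL ratings (2), not by the row count (3).
avg_rating = conn.execute("SELECT AVG(rating) FROM table_name").fetchone()[0]

# Counting NULL as 0 has to be requested explicitly, e.g. with COALESCE.
avg_with_zero = conn.execute(
    "SELECT AVG(COALESCE(rating, 0)) FROM table_name"
).fetchone()[0]

print(avg_rating, avg_with_zero)
conn.close()
```

If NULL ratings should count as zero in your case, the COALESCE form is one way to say so; otherwise the plain AVG is usually what you want.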

How to find average of a particular field in Scala

You can simply do the following:

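import scala.util.Try

val text = sc.textFile("/neerja/input.txt")

// Split each line on tabs and parse index 4 (the fifth field);
// a missing or non-numeric value becomes 0.0
val fourth = text.map(line => line.split("\\t"))
.map(arr => Try(arr(4).toDouble) getOrElse(0.0)).mean()

println(fourth)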

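You should get the average of the fifth column (index 4). Note that rows whose value is missing or not numeric are counted as 0.0, which pulls the average down.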
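How much the 0.0 fallback matters depends on the data. The following plain-Python sketch (not Spark) compares the two policies on a made-up tab-separated sample; the sample rows and the helper name field_or_none are hypothetical.

```python
import io

# Hypothetical sample standing in for input.txt:
# a serial number followed by four subject scores, tab-separated.
sample = io.StringIO(
    "1\t10.0\t2.0\t3.0\t8.0\n"
    "2\t20.0\t4.0\t5.0\tn/a\n"
    "3\t30.0\t6.0\t7.0\t4.0\n"
)

def field_or_none(fields, index):
    """Return fields[index] as a float, or None if it is missing or not numeric."""
    try:
        return float(fields[index])
    except (ValueError, IndexError):
        return None

fifth = [field_or_none(line.rstrip("\n").split("\t"), 4) for line in sample]

# Same policy as the Scala snippet: bad values count as 0.0.
mean_zero_filled = sum(v if v is not None else 0.0 for v in fifth) / len(fifth)

# Alternative policy: drop bad values before averaging.
valid = [v for v in fifth if v is not None]
mean_skipping = sum(valid) / len(valid)

print(mean_zero_filled, mean_skipping)
```

With one bad value out of three, the zero-filled mean is 4.0 while the mean over valid values is 6.0, so it is worth deciding which behaviour you actually want.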

Update

If the average of all the subject columns is required, I would suggest creating a dataframe. Dataframes are built on top of RDDs with additional optimizations, and many built-in functions are available for computation.

To create a dataframe for the given data you need a schema:

import org.apache.spark.sql.types.{DoubleType, IntegerType, StructField, StructType}
val schema = StructType(Seq(
StructField("Sn", IntegerType, true),
StructField("subject1", DoubleType, true),
StructField("subject2", DoubleType, true),
StructField("subject3", DoubleType, true),
StructField("subject4", DoubleType, true)
))

An RDD[Row] then needs to be created:

import org.apache.spark.sql.Row
import scala.util.Try

// Each subject field is parsed as a Double; unparseable values fall back to 0.0
val data = text.map(line => line.split("\\t"))
.map(arr => Row.fromSeq(Seq(arr(0).toInt, Try(arr(1).toDouble) getOrElse(0.0),Try(arr(2).toDouble) getOrElse(0.0),Try(arr(3).toDouble) getOrElse(0.0),Try(arr(4).toDouble) getOrElse(0.0))))

Finally, the dataframe is created:

val df = sqlContext.createDataFrame(data, schema)

The average of each column can then be calculated with the mean function:

import org.apache.spark.sql.functions.mean

df.select(
mean("subject1").as("averageOFS1"),
mean("subject2").as("averageOFS2"),
mean("subject3").as("averageOFS3"),
mean("subject4").as("averageOFS4")
).show(false)

which should give you a dataframe like this:

+------------------+-----------------+-----------+-----------------+
|averageOFS1       |averageOFS2      |averageOFS3|averageOFS4      |
+------------------+-----------------+-----------+-----------------+
|21.796166666666668|4.661666666666666|5.24965    |7.919609688333335|
+------------------+-----------------+-----------+-----------------+

pandas get column average/mean

If you only want the mean of the weight column, select the column (which is a Series) and call .mean():

In [479]: df
Out[479]:
         ID  birthyear    weight
0    619040       1962  0.123123
1    600161       1963  0.981742
2  25602033       1963  1.312312
3    624870       1987  0.942120

In [480]: df["weight"].mean()
Out[480]: 0.83982437500000007

Find the average of two combined columns in sql

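If neither column contains NULLs, AVG(col1) = SUM(col1)/COUNT(*) and AVG(col2) = SUM(col2)/COUNT(*), therefore (SUM(col1)+SUM(col2))/COUNT(*) = AVG(col1) + AVG(col2).

Also, because addition is commutative and associative, the terms can be regrouped row by row: (SUM(col1)+SUM(col2))/COUNT(*) = SUM(col1+col2)/COUNT(*), which is AVG(col1+col2). With NULLs the identity can fail, because AVG(col) divides by COUNT(col) rather than COUNT(*), and col1+col2 is NULL whenever either operand is NULL.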
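The identity, and the way NULLs affect it, can be checked with a small Python sketch. It assumes an in-memory SQLite database as a stand-in for your SQL engine; the table name t and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 REAL, col2 REAL)")

query = "SELECT AVG(col1) + AVG(col2), AVG(col1 + col2) FROM t"

# Case 1: no NULLs, so both expressions agree.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
sum_of_avgs, avg_of_sums = conn.execute(query).fetchone()

# Case 2: one row has a NULL in col2.
# AVG(col1) now uses 4 rows, AVG(col2) still uses 3,
# and col1 + col2 is NULL for the new row, so AVG(col1 + col2) skips it.
conn.execute("INSERT INTO t VALUES (?, ?)", (6, None))
sum_of_avgs_null, avg_of_sums_null = conn.execute(query).fetchone()

print(sum_of_avgs, avg_of_sums)            # equal without NULLs
print(sum_of_avgs_null, avg_of_sums_null)  # differ once a NULL appears
conn.close()
```

If your columns can contain NULLs, decide first whether a NULL should mean "skip this row" or "treat as zero", and wrap the columns in COALESCE accordingly.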

Calculate AVERAGE from 2 columns for each row in SQL

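You need to add the fields together and divide by the number of fields. If your Average field is of DECIMAL type, you don't even need the ROUND function: any digits beyond the declared scale are dropped when the value is stored (whether by rounding or by truncation depends on the database). Dividing by 2.0 rather than 2 keeps the fractional part in databases that use integer division for integer operands (SQL Server and PostgreSQL, for example):

UPDATE table_name
SET AVERAGE = (grade1 + grade2) / 2.0;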

In your example you are averaging only two fields, so, assuming the grades are integers, the fractional part can only ever be .0 or .5. A column declared as Average decimal(3,1) therefore works (as long as the average stays below 100), and the ROUND function is not needed.
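As a rough check of the row-wise update, here is a Python sketch using an in-memory SQLite database as a stand-in (an assumption; your engine may differ). The average column is declared REAL rather than decimal(3,1) because SQLite has no fixed-scale decimal type, and the two sample rows are made up. SQLite uses integer division for integer operands, so the sketch also shows the difference between dividing by 2 and by 2.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (grade1 INTEGER, grade2 INTEGER, average REAL)"
)
conn.executemany(
    "INSERT INTO table_name (grade1, grade2) VALUES (?, ?)",
    [(7, 8), (9, 5)],
)

# Dividing two integers by the integer 2: SQLite drops the fractional part.
truncated = [
    row[0]
    for row in conn.execute(
        "SELECT (grade1 + grade2) / 2 FROM table_name ORDER BY rowid"
    )
]

# Dividing by 2.0 forces floating-point division, so .5 is preserved.
conn.execute("UPDATE table_name SET average = (grade1 + grade2) / 2.0")
averages = [
    row[0]
    for row in conn.execute("SELECT average FROM table_name ORDER BY rowid")
]

print(truncated, averages)
conn.close()
```

In MySQL the / operator already returns a fractional result for integer operands, so the difference shown here applies to engines with integer division.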

Row-wise average for a subset of columns with missing values

You can simply take the mean across each row:

df['avg'] = df.mean(axis=1)

       Monday  Tuesday  Wednesday        avg
Mike       42      NaN         12  27.000000
Jenna     NaN      NaN         15  15.000000
Jon        21        4          1   8.666667

This works because .mean() ignores missing values by default (skipna=True).

To average only a subset of the columns, select them first:

df['avg'] = df[['Monday', 'Tuesday']].mean(axis=1)

       Monday  Tuesday  Wednesday   avg
Mike       42      NaN         12  42.0
Jenna     NaN      NaN         15   NaN
Jon        21        4          1  12.5
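To make the skip-missing rule concrete, here is a plain-Python sketch of the same arithmetic. It is not how pandas implements .mean(); it only mirrors the default behaviour shown above (missing values are dropped, and a row with nothing left yields NaN). The dictionary layout and the helper name row_mean are made up for illustration, with None standing in for NaN.

```python
import math

# Same data as the tables above, with None marking a missing value.
rows = {
    "Mike": {"Monday": 42, "Tuesday": None, "Wednesday": 12},
    "Jenna": {"Monday": None, "Tuesday": None, "Wednesday": 15},
    "Jon": {"Monday": 21, "Tuesday": 4, "Wednesday": 1},
}

def row_mean(row, columns=None):
    """Mean of the non-missing values in the chosen columns, or NaN if none remain."""
    keys = columns if columns is not None else list(row)
    present = [row[k] for k in keys if row[k] is not None]
    return sum(present) / len(present) if present else math.nan

# Average over all three days.
all_days = {name: row_mean(row) for name, row in rows.items()}

# Average over a subset of the columns only.
subset = {name: row_mean(row, ["Monday", "Tuesday"]) for name, row in rows.items()}

print(all_days)
print(subset)
```

Jenna has no value for Monday or Tuesday, so her subset average is NaN, matching the pandas output.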

calculate average in separate column over period and group by date Standard SQL BigQuery

You can use coalesce to return the average grouped by date and, where that average is NULL (every rate for the date is NULL), fall back to the overall average of the column computed in a subquery:

select date, coalesce(avg(rate), (select avg(rate) from my_table))
from my_table
group by date
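The fallback can be tried out with a short Python sketch. It assumes an in-memory SQLite database as a stand-in for BigQuery (both support COALESCE and scalar subqueries, but this is not BigQuery itself); the sample rows and the avg_rate alias are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (date TEXT, rate REAL)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("2021-01-01", 2.0),
        ("2021-01-01", 4.0),
        ("2021-01-02", None),  # this date has no usable rate
        ("2021-01-03", 9.0),
    ],
)

# Per-date average, falling back to the overall average when the
# per-date average is NULL.
rows = conn.execute(
    """
    SELECT date,
           COALESCE(AVG(rate), (SELECT AVG(rate) FROM my_table)) AS avg_rate
    FROM my_table
    GROUP BY date
    ORDER BY date
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

The overall average here is (2 + 4 + 9) / 3 = 5.0, which is what the date without any rate receives.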

