How to measure the execution time of a query on Spark
Update:
No, using the time package is not the best way to measure the execution time of Spark jobs. The most convenient and accurate way I know of is to use the Spark History Server.
On Bluemix, in your notebook, go to the "Palette" on the right side. Choose the "Environment" panel and you will see a link to the Spark History Server, where you can inspect the Spark jobs that were run, including their computation times.
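Besides the web UI, the History Server also serves job data through Spark's monitoring REST API (the /api/v1/applications/[app-id]/jobs endpoint). The following is a minimal sketch, assuming job records that carry submissionTime and completionTime fields in the "2020-01-01T10:00:00.000GMT" style. The helper name job_durations_seconds and the sample records are hypothetical, and fetching the JSON from your own server is left out.

```python
from datetime import datetime

# Timestamp layout assumed for the job records, e.g. "2020-01-01T10:00:00.000GMT"
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fGMT"


def job_durations_seconds(jobs):
    """Map jobId -> wall-clock duration in seconds for finished jobs."""
    durations = {}
    for job in jobs:
        # Jobs that are still running have no completionTime yet; skip them
        if "completionTime" not in job:
            continue
        submitted = datetime.strptime(job["submissionTime"], TIME_FORMAT)
        completed = datetime.strptime(job["completionTime"], TIME_FORMAT)
        durations[job["jobId"]] = (completed - submitted).total_seconds()
    return durations


# Hypothetical sample records for illustration, not real server output
sample_jobs = [
    {
        "jobId": 0,
        "submissionTime": "2020-01-01T10:00:00.000GMT",
        "completionTime": "2020-01-01T10:00:02.500GMT",
    },
    {"jobId": 1, "submissionTime": "2020-01-01T10:00:03.000GMT"},
]

durations = job_durations_seconds(sample_jobs)
print(durations)
```

These durations cover the time from job submission to completion as recorded by Spark, so they are not affected by lazy evaluation in the way a timer around a transformation in the driver would be.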
In PySpark groupBy, how do I calculate execution time by group?
If you don't want to print the execution time to stdout, you could return it as an extra column from the Pandas UDF instead, e.g.
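from datetime import datetime
from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf("my_col long, execution_time long", PandasUDFType.GROUPED_MAP)
def my_pandas_udf(pdf):
    start = datetime.now()
    # Some business logic
    # Store the elapsed time as whole milliseconds to match the "long" column type
    elapsed_ms = int((datetime.now() - start).total_seconds() * 1000)
    return pdf.assign(execution_time=elapsed_ms)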
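The timing part of the UDF above can be tried out without a Spark cluster. This is a minimal sketch, assuming a hypothetical helper timed_ms that wraps any callable and returns its result together with the elapsed time in whole milliseconds. It uses time.perf_counter, a monotonic clock, in place of datetime.now(), and business_logic is only a stand-in for the per-group work.

```python
import time


def timed_ms(func, *args, **kwargs):
    """Call func and return (result, elapsed time in whole milliseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    return result, elapsed_ms


def business_logic(values):
    # Hypothetical stand-in for the per-group work done inside the UDF
    time.sleep(0.05)
    return [v * 2 for v in values]


result, elapsed_ms = timed_ms(business_logic, [1, 2, 3])
print(result, elapsed_ms)
```

Inside a grouped-map UDF the same pattern would wrap the business logic, and the integer returned as elapsed_ms could be assigned to the execution_time column.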
Alternatively, to compute the average execution time in the driver application, you could accumulate the total execution time and the number of UDF calls inside the UDF with two Accumulators, e.g.
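from datetime import datetime
from pyspark.sql.functions import pandas_udf, PandasUDFType

# sc is the active SparkContext
udf_count = sc.accumulator(0)
total_udf_execution_time = sc.accumulator(0.0)  # seconds

@pandas_udf("my_col long", PandasUDFType.GROUPED_MAP)
def my_pandas_udf(pdf):
    start = datetime.now()
    # Some business logic
    udf_count.add(1)
    # Accumulators hold numbers, so add the elapsed time as seconds, not as a timedelta
    total_udf_execution_time.add((datetime.now() - start).total_seconds())
    return pdf

# Some Spark action to run business logic

# Mean execution time per UDF call, in seconds
mean_udf_execution_time = total_udf_execution_time.value / udf_count.value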
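One caveat with the final division: if no action has run yet, or the input had no groups, the call count is still 0 and the division raises ZeroDivisionError. Here is a minimal sketch of a guard, assuming a hypothetical helper mean_execution_time that takes the two accumulator values as plain numbers.

```python
def mean_execution_time(total_seconds, call_count):
    """Return the mean seconds per call, or None when no calls were recorded."""
    if call_count == 0:
        return None
    return total_seconds / call_count


# Plain numbers stand in for total_udf_execution_time.value and udf_count.value
print(mean_execution_time(1.5, 3))
print(mean_execution_time(0.0, 0))
```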