How to execute SQL scripts with Spark
Using Scala:
import scala.io.Source
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder
  .appName("execute-query-files")
  .master("local[*]") // since the jar will be executed locally
  .getOrCreate()
val sqlQuery = Source.fromFile("path/to/data.sql").mkString //read file
spark.sql(sqlQuery) //execute query
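Here spark is the SparkSession built above (or one that already exists in your environment). Note that spark.sql parses one statement per call, so the file should contain a single query.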
Execute SQL stored in dataframe using pyspark
You can do it as follows:
sqls = spark.sql("""
    select parameter_value
    from schema_name.table_params
    where project_name = 'some_projectname'
      and sub_project_name = 'some_sub_project'
      and parameter_name = 'extract_sql'
""").collect()
for sql in sqls:
    spark.sql(sql[0]).show()  # the first column of each row holds the SQL text
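If the parameter table can contain NULL or blank values, the loop above fails on the first one. The following is a minimal sketch of a more defensive variant, assuming the SQL text is always in the first column of the lookup result. The function name run_stored_queries and its parameters are hypothetical, not part of PySpark.

```python
def run_stored_queries(spark, lookup_sql):
    """Run every non-empty SQL string found in the first column of lookup_sql.

    `spark` is assumed to be an existing SparkSession; `lookup_sql` is the
    query that returns the stored SQL text (hypothetical helper).
    """
    rows = spark.sql(lookup_sql).collect()
    results = []
    for row in rows:
        query = row[0]
        if query is None or not query.strip():
            continue  # skip NULL or blank entries instead of failing
        results.append(spark.sql(query))
    return results
```

The function returns the DataFrames rather than calling show(), so the caller decides what to do with each result. Keep in mind that anyone who can write to the parameter table can run arbitrary SQL through this path.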
How to run sql query in PySpark notebook
Read the file from the data lake into a DataFrame, save it as a table with saveAsTable, and then query the table as shown below.
df = spark.read.load('abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<filename>', format='parquet')
df.write.mode("overwrite").saveAsTable("testdb.test2")
Using %%sql
%%sql
select * from testdb.test2
Using %%pyspark
%%pyspark
df = spark.sql("select * from testdb.test2")
display(df)
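If the data does not need to be persisted as a table, a temporary view may be enough. The sketch below assumes spark is the notebook's SparkSession and df is the DataFrame loaded earlier. The helper name query_via_temp_view and the view name in the usage comment are hypothetical.

```python
def query_via_temp_view(spark, df, view_name, query):
    """Register df as a session-scoped temporary view, then run query on it."""
    df.createOrReplaceTempView(view_name)  # visible only within this Spark session
    return spark.sql(query)

# Usage in a notebook cell (assumes `spark` and `df` already exist):
# result = query_via_temp_view(spark, df, "test2_view", "select * from test2_view")
# display(result)
```

Unlike saveAsTable, the view writes nothing to storage and disappears when the session ends.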
Execute SQL statement starting with WITH keyword in Spark
OK, so I just figured it out: for queries like this, use createStatement on a plain JDBC connection:
import java.sql._
// jdbcUrl and ch (the SQL text starting with WITH) are assumed to be defined elsewhere
val connection = DriverManager.getConnection(jdbcUrl)
val stmt1 = connection.createStatement()
val rs: ResultSet = stmt1.executeQuery(ch)
while (rs.next()) {
  println(rs.getString("col1"))
  // ...
}
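The same connect, create statement, execute, iterate pattern can be sketched in Python. This is only an illustration: it uses the standard-library sqlite3 module with an in-memory database as a stand-in for the real JDBC source, and the helper fetch_first_column and the CTE name letters are made up for the example.

```python
import sqlite3

def fetch_first_column(connection, query):
    """Execute query on a DB-API connection and return the first column of each row."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()

# In-memory SQLite database used as a stand-in for the real database.
connection = sqlite3.connect(":memory:")

# A query that starts with WITH is sent to the database unchanged.
query = """
WITH letters(col1) AS (
    SELECT 'a' UNION ALL SELECT 'b'
)
SELECT col1 FROM letters ORDER BY col1
"""

values = fetch_first_column(connection, query)
for value in values:
    print(value)
connection.close()
```

Because the statement goes straight to the database driver, nothing rewrites or wraps the query before the database parses it.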
How to run .sql file in PySpark
If you want to collect all query results into a list of DataFrames (assuming each line of the file is a single query):
with open('/path/to/file.sql', 'r') as f:
    query = f.readlines()
dfs = []
for line in query:
    if line.strip():  # skip blank lines, which spark.sql cannot parse
        dfs.append(spark.sql(line))
If you want to combine all DataFrames into one (assuming they all have the same schema):
from functools import reduce
df = reduce(lambda x, y: x.union(y), dfs)
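The one-query-per-line assumption breaks as soon as a query spans several lines. Below is a minimal sketch that instead splits the script on semicolons, assuming statements are semicolon-terminated. The names split_sql_statements and run_sql_script are hypothetical helpers. The splitter only understands single-quoted string literals, so it does not handle semicolons inside comments or quoted identifiers.

```python
def split_sql_statements(script):
    """Split a SQL script on semicolons that are outside single-quoted strings."""
    statements = []
    current = []
    in_string = False
    for ch in script:
        if ch == "'":
            # An escaped quote ('') toggles twice, so the state stays correct.
            in_string = not in_string
        if ch == ";" and not in_string:
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:  # last statement without a trailing semicolon
        statements.append(tail)
    return statements


def run_sql_script(spark, script):
    """Run each statement of script with spark.sql and return the DataFrames."""
    return [spark.sql(statement) for statement in split_sql_statements(script)]

# Usage (assumes `spark` is an existing SparkSession):
# with open('/path/to/file.sql', 'r') as f:
#     dfs = run_sql_script(spark, f.read())
```

For scripts that contain comments or more complex quoting, a real SQL parser would be a safer choice than this character scan.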
How does Spark SQL execute SQL query with joining operation?
The PostgreSQL database returns only a single result set for a single query. If you use valid SQL, that can be the joined result, or an empty result set if no records match your conditions.