I need help to convert the following Scala code to PySpark for parallel processing in a for
loop:
val data = Seq(a, b, c)
data.par.foreach { i =>
  val df = spark.read.parquet("gs://" + i + "-data")
  df.createOrReplaceTempView("people")
  val df2 = spark.sql("""select * from people""")
  df.show()
}
In PySpark you cannot do this with sparkContext.parallelize(data).foreach(...): the function passed to foreach runs on the executors, where the SparkSession (and therefore spark.read and spark.sql) is not available. The usual way to run several Spark jobs concurrently is to submit them from driver-side threads, for example with concurrent.futures:

from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession

spark = SparkSession.builder.master('yarn').appName('myAppName').getOrCreate()
spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")

data = [a, b, c]

def process(i):
    df = spark.read.parquet('gs://' + i + '-data')
    # Use a per-dataset view name so concurrent threads do not overwrite each other
    df.createOrReplaceTempView('people_' + i)
    df2 = spark.sql('select * from people_' + i)
    df.show()

# Submit one Spark job per dataset from a pool of driver threads;
# list() forces evaluation so any exception raised in a task surfaces here
with ThreadPoolExecutor(max_workers=len(data)) as pool:
    list(pool.map(process, data))
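
If the goal is simply to load all of the datasets rather than to run separate jobs per dataset, there may be no need for threading at all: DataFrameReader.parquet accepts multiple paths, so one read can cover all of them. A minimal sketch, assuming a, b, c are strings and the datasets share a compatible schema:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = [a, b, c]
paths = ['gs://' + i + '-data' for i in data]

# A single multi-path read; Spark parallelizes the scan across files itself
df = spark.read.parquet(*paths)
df.createOrReplaceTempView("people")
spark.sql("select * from people").show()

This runs as one Spark job, so the parallelism comes from Spark's own task scheduling rather than from driver-side threads.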