You are analyzing customer purchases in a Fabric notebook by using PySpark.
You have the following DataFrames:
transactions: Contains 10 million rows, each representing a transaction, in five columns named transaction_id, customer_id, product_id, amount, and date.
customers: Contains customer details in 1,000 rows and three columns named customer_id, name, and country.
You need to join the DataFrames on the customer_id column. The solution must minimize data shuffling.
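The key to minimizing shuffling here is the size gap between the two tables: 1,000 customer rows are small enough to copy to every executor, so the 10 million transaction rows never need to be repartitioned by customer_id. The following is a pure-Python simulation of that idea (a broadcast hash join), not Spark's actual implementation. The function name, the tuple layouts, and the sample rows are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical row layouts mirroring the column order described above.
Transaction = Tuple[int, int, int, float, str]  # (transaction_id, customer_id, product_id, amount, date)
Customer = Tuple[int, str, str]                  # (customer_id, name, country)


def broadcast_hash_join(
    partitions: List[List[Transaction]], customers: List[Customer]
) -> List[List[tuple]]:
    """Inner-join each transaction partition with the small customers table."""
    # Build side: the small table becomes a single hash map keyed on customer_id.
    # In Spark, this map is what gets broadcast to every executor.
    lookup: Dict[int, Customer] = {c[0]: c for c in customers}

    joined_partitions = []
    for part in partitions:
        # Probe side: each partition is joined locally against the lookup,
        # so no transaction row moves to another partition (no shuffle).
        out = []
        for t in part:
            match = lookup.get(t[1])
            if match is not None:  # inner join: drop transactions without a customer
                out.append(t + match[1:])  # append name and country
        joined_partitions.append(out)
    return joined_partitions


# Made-up sample data: two partitions of transactions, one unmatched customer_id (999).
transaction_partitions = [
    [(1, 101, 501, 25.0, "2024-01-05"), (2, 102, 502, 40.0, "2024-01-06")],
    [(3, 101, 503, 15.5, "2024-01-07"), (4, 999, 504, 9.99, "2024-01-08")],
]
customer_rows = [(101, "Ana", "PT"), (102, "Ben", "US")]

result = broadcast_hash_join(transaction_partitions, customer_rows)
for i, part in enumerate(result):
    print(i, part)
```

Note that the output keeps the original partition count: each partition is processed where it already lives, which is exactly the property a broadcast join gives the large table.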
You write the following code.
from pyspark.sql import functions as F
results =
Which code should you run to populate the results DataFrame?