Which of the following code blocks fails to return a new DataFrame that is the result of an inner join between DataFrame storesDF and DataFrame employeesDF on column storeId and column employeeId?
A.
storesDF.join(employeesDF, Seq(col("storeId"), col("employeeId")))
B.
storesDF.join(employeesDF, Seq("storeId", "employeeId"))
C.
storesDF.join(employeesDF, storesDF("storeId") === employeesDF("storeId") and storesDF("employeeId") === employeesDF("employeeId"))
D.
storesDF.join(employeesDF, Seq("storeId", "employeeId"), "inner")
E.
storesDF.alias("s").join(employeesDF.alias("e"), col("s.storeId") === col("e.storeId") and col("s.employeeId") === col("e.employeeId"))
The correct answer is: A.
storesDF.join(employeesDF, Seq(col("storeId"), col("employeeId")))
Explanation:
This code block fails because the join overload that accepts a sequence of join columns expects a Seq[String] of column names, not a Seq of Column objects, so Seq(col("storeId"), col("employeeId")) does not compile. Options B and D use the string-based form correctly (and "inner" is the default join type, so B and D are equivalent). Options C and E instead build an explicit Column join expression; `and` is a valid Column method in the Scala DataFrame API, equivalent to &&, so those expressions also produce the expected inner join.
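For reference, here is a minimal sketch of the working join forms. The local SparkSession setup and the sample storesDF/employeesDF data are assumptions added purely for illustration; only the join calls mirror the answer options.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col  // only referenced in the commented-out option A line

object JoinSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session and sample data, just to exercise the join forms.
    val spark = SparkSession.builder().master("local[*]").appName("joinSketch").getOrCreate()
    import spark.implicits._

    val storesDF = Seq((1, 10, "North"), (2, 20, "South")).toDF("storeId", "employeeId", "region")
    val employeesDF = Seq((1, 10, "Ava"), (2, 20, "Ben")).toDF("storeId", "employeeId", "name")

    // Option B/D style: usingColumns is a Seq[String]; "inner" is the default join type.
    storesDF.join(employeesDF, Seq("storeId", "employeeId"), "inner").show()

    // Option C style: a Column join expression; `and` is Column.and, equivalent to &&.
    storesDF.join(
      employeesDF,
      storesDF("storeId") === employeesDF("storeId") and storesDF("employeeId") === employeesDF("employeeId")
    ).show()

    // Option A style does not compile: Seq(col("storeId"), col("employeeId")) is a Seq[Column],
    // but the usingColumns overload expects a Seq[String].
    // storesDF.join(employeesDF, Seq(col("storeId"), col("employeeId")))

    spark.stop()
  }
}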