Which of the following code blocks fails to return a new DataFrame that is the result of an inner join between DataFrame storesDF and DataFrame employeesDF on column storeId and column employeeId?
A.
storesDF.join(employeesDF, Seq(col("storeId"), col("employeeId")))
B.
storesDF.join(employeesDF, Seq("storeId", "employeeId"))
C.
storesDF.join(employeesDF, storesDF("storeId") === employeesDF("storeId") and storesDF("employeeId") === employeesDF("employeeId"))
D.
storesDF.join(employeesDF, Seq("storeId", "employeeId"), "inner")
E.
storesDF.alias("s").join(employeesDF.alias("e"), col("s.storeId") === col("e.storeId") and col("s.employeeId") === col("e.employeeId"))
The correct answer is: A.
storesDF.join(employeesDF, Seq(col("storeId"), col("employeeId")))
Explanation:
This code block fails because the join overload that accepts a sequence of join columns expects a Seq[String] of column names, not a Seq of Column objects, so Seq(col("storeId"), col("employeeId")) does not compile. Options B and D use the string-based form correctly (and "inner" is the default join type, so B and D are equivalent). Options C and E instead build an explicit Column join expression; `and` is a valid Column method in the Scala DataFrame API, equivalent to &&, so those expressions also produce the expected inner join.
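For reference, here is a minimal sketch of the working join forms. The local SparkSession setup and the sample storesDF/employeesDF data are assumptions added purely for illustration; only the join calls mirror the answer options.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col  // only referenced in the commented-out option A line

object JoinSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session and sample data, just to exercise the join forms.
    val spark = SparkSession.builder().master("local[*]").appName("joinSketch").getOrCreate()
    import spark.implicits._

    val storesDF = Seq((1, 10, "North"), (2, 20, "South")).toDF("storeId", "employeeId", "region")
    val employeesDF = Seq((1, 10, "Ava"), (2, 20, "Ben")).toDF("storeId", "employeeId", "name")

    // Option B/D style: usingColumns is a Seq[String]; "inner" is the default join type.
    storesDF.join(employeesDF, Seq("storeId", "employeeId"), "inner").show()

    // Option C style: a Column join expression; `and` is Column.and, equivalent to &&.
    storesDF.join(
      employeesDF,
      storesDF("storeId") === employeesDF("storeId") and storesDF("employeeId") === employeesDF("employeeId")
    ).show()

    // Option A style does not compile: Seq(col("storeId"), col("employeeId")) is a Seq[Column],
    // but the usingColumns overload expects a Seq[String].
    // storesDF.join(employeesDF, Seq(col("storeId"), col("employeeId")))

    spark.stop()
  }
}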