Exam Certified Associate Developer for Apache Spark topic 1 question 32 discussion

Actual exam question from Databricks's Certified Associate Developer for Apache Spark

Question #: 32
Topic #: 1

The code block shown below contains an error. The code block is intended to return a new DataFrame with the mean of column sqft from DataFrame storesDF in column sqftMean. Identify the error.
Code block:
storesDF.agg(mean("sqft").alias("sqftMean"))

A. The argument to the mean() operation should be a Column object rather than a string column name.
B. The argument to the mean() operation should not be quoted.
C. The mean() operation is not a standalone function – it’s a method of the Column object.
D. The agg() operation is not appropriate here – the withColumn() operation should be used instead.
E. The only way to compute a mean of a column is with the mean() method from a DataFrame.

Suggested Answer: A

by 4be8126 at April 26, 2023, 10:02 a.m.

Comments

4be8126

Highly Voted 2 years, 2 months ago

Selected Answer: E

The code block shown is correct and should return a new DataFrame with the mean of column sqft from DataFrame storesDF in column sqftMean. Therefore, the answer is E - none of the options identify a valid error in the code block. Here's an explanation for each option:

A. The argument to the mean() operation can be either a Column object or a string column name, so there is no error in using a string column name in this case.

E. This option is incorrect because the code block shown is a valid way to compute the mean of a column using PySpark. Another way to compute the mean of a column is with the mean() method from a DataFrame, but that doesn't mean the code block shown is invalid.

upvoted 7 times

newusername

1 year, 8 months ago

wrong! A

upvoted 3 times

...

NirajBhise

Most Recent 2 months, 1 week ago

Selected Answer: A

The function mean() is part of pyspark.sql.functions, and it expects a Column object, not a string.

upvoted 1 times

...

sofiess

9 months, 1 week ago

The mean() function expects a Column object as an argument, which can be created using col("sqft"). Simply passing the column name as a string will result in an error.

upvoted 2 times

...

DanYanez

9 months, 1 week ago

The correct answer is A. The argument to the mean() operation should be a Column object rather than a string column name. In Spark DataFrames, the mean() function takes a Column object as its argument, not a string column name. To create a Column object from a string column name, you can use the col() function.

upvoted 1 times

...

ajayrtk

1 year, 4 months ago

The error in the code is A. The argument to the mean() operation should be a Column object rather than a string column name. In the provided code block, "sqft" is passed as a string column name to the mean() function. However, the correct approach is to use a Column object. This can be achieved by referencing the column using the storesDF DataFrame and the col() function. Here's the corrected code:

storesDF.agg(mean(col("sqft")).alias("sqftMean"))

upvoted 2 times

...

azurearch

1 year, 4 months ago

# Assumes an existing SparkSession named `spark` (e.g. in a Databricks notebook).
from pyspark.sql.functions import col, mean

students = [
    {'rollno': '001', 'name': 'sravan', 'sqft': 23, 'height': 5.79, 'weight': 67, 'address': 'guntur'},
    {'rollno': '002', 'name': 'ojaswi', 'sqft': 16, 'height': 3.79, 'weight': 34, 'address': 'hyd'},
]
storesDF = spark.createDataFrame(students)

# mean() is given the column name as a plain string here.
storesDF.agg(mean('sqft').alias('sqftMean')).show()

This works as well! Not sure which one is wrong, then.

upvoted 3 times

...

azure_bimonster

1 year, 5 months ago

Selected Answer: A

A is most likely correct here.

upvoted 2 times

...

Saurabh_prep

1 year, 6 months ago

Selected Answer: A

A should be the answer, going by the Databricks practice PDF. The mean() function should take a Column object as input.

upvoted 1 times

...

outwalker

1 year, 8 months ago

It appears that there might be some flexibility in how the mean function can be used with either a string column name or a col() function. However, the most accurate and recommended approach is to use the col() function to create a Column object explicitly. With this in mind, the best choice is:

A. The argument to the mean() operation should be a Column object rather than a string column name.

The mean function takes a Column object as an argument, not a string column name. To fix the error, the code block should be rewritten as storesDF.agg(mean(col("sqft")).alias("sqftMean")), where the col function is used to create a Column object from the string column name "sqft". While there might be situations where using a string column name works, following the standard practice of creating a Column object with col() ensures compatibility and clarity in code.

upvoted 1 times

...

juliom6

1 year, 8 months ago

Selected Answer: A

Correct answer is A:

# Assumes an existing SparkSession named `spark` (e.g. in a Databricks notebook).
from pyspark.sql.functions import col, mean

students = [
    {'rollno': '001', 'name': 'sravan', 'sqft': 23, 'height': 5.79, 'weight': 67, 'address': 'guntur'},
    {'rollno': '002', 'name': 'ojaswi', 'sqft': 16, 'height': 3.79, 'weight': 34, 'address': 'hyd'},
]
storesDF = spark.createDataFrame(students)

# mean() is given an explicit Column object built with col().
storesDF.agg(mean(col('sqft')).alias('sqftMean')).show()

upvoted 3 times

...

juadaves

1 year, 9 months ago

D. Use withColumn() for a new calculated column.

upvoted 1 times

...

thanab

1 year, 10 months ago

A. The error in the code block is A: the argument to the mean operation should be a Column object rather than a string column name. The mean function takes a Column object as an argument, not a string column name. To fix the error, the code block should be rewritten as storesDF.agg(mean(col("sqft")).alias("sqftMean")), where the col function is used to create a Column object from the string column name "sqft". Here is the corrected code:

storesDF.agg(mean(col("sqft")).alias("sqftMean"))

upvoted 2 times

juadaves

1 year, 9 months ago

storesDF.agg(mean("Value").alias("sqftMean")).show()

It works.

upvoted 1 times

...

halouanne

1 year, 11 months ago

The correct answer is: B. The argument to the mean() operation should not be quoted. In the context of Apache Spark, the mean function takes a column name as its argument. Therefore, you would write it without quotes. The corrected code line would look something like this:

upvoted 1 times

...

cookiemonster42

1 year, 11 months ago

Selected Answer: A

There's a similar question in the official Databricks samples, and the right answer there is:

Code block:
storesDF.__1__(__2__(__3__).alias("sqftMean"))

A. 1. agg 2. mean 3. col("sqft")

If we stick to this logic, the answer is A.

upvoted 3 times

...

zozoshanky

1 year, 11 months ago

df.agg(mean("amountpaid").alias("amountpaid")).show()
df.agg(mean(col("amountpaid")).alias("sqftMean")).show()

Both produce the result.

upvoted 1 times

...

Mohitsain

2 years ago

Selected Answer: D

agg is not required here.

upvoted 3 times

...