Exam DP-500 topic 1 question 108 discussion

Actual exam question from Microsoft's DP-500
Question #: 108
Topic #: 1

You are using a Python notebook in an Apache Spark pool in Azure Synapse Analytics.
You need to present the data distribution statistics from a DataFrame in a tabular view.
Which method should you invoke on the DataFrame?

  • A. freqItems
  • B. corr
  • C. summary
  • D. rollup
Suggested Answer: C

Comments

cherious
Highly Voted 2 years, 4 months ago
Selected Answer: C
The correct answer is summary. corr shows the correlation between columns and has nothing to do with data distribution statistics. Source: https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.summary.html
upvoted 9 times
...
per_ing
Highly Voted 2 years, 3 months ago
Duplicate of questions 62, 101 and 103, only with different answer options. I believe the correct answer is still "describe", even though that is not an option here.
upvoted 6 times
...
Deloro
Most Recent 1 year, 7 months ago
This question comes up often with different answer options; however, I believe the answer should always be summary. The .describe() function takes cols: String* (columns in the DataFrame) as optional arguments. The .summary() function takes statistics: String* (count, mean, stddev, etc.) as optional arguments.
upvoted 1 times
...
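A minimal sketch of the argument difference described above, assuming PySpark's `DataFrame.describe(*cols)` and `DataFrame.summary(*statistics)`. The helper names `describe_columns` and `summarize_columns` are hypothetical and exist only for illustration. Because `summary` takes statistic names rather than column names, the sketch narrows the columns with `select` first.

```python
def describe_columns(df, *cols):
    """describe() accepts column names; with none given it covers
    all numeric and string columns."""
    return df.describe(*cols)


def summarize_columns(df, cols, statistics):
    """summary() accepts statistic names, so restrict the columns
    with select() before calling it."""
    return df.select(*cols).summary(*statistics)
```

In a notebook this might be called as `summarize_columns(df, ["age"], ["count", "mean", "stddev"]).show()`, where `df` and the `age` column are assumed to exist.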
solref
2 years, 1 month ago
Selected Answer: C
DataFrame.summary(*statistics): https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.summary.html This is the same question as 62, 101 and 103, but in those cases the answer was "describe". The result is the same if you are only looking for DataFrame statistics, e.g. df.describe(['age']).show(): https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.describe.html
upvoted 1 times
...
ThariCD
2 years, 2 months ago
Selected Answer: C
It should definitely be summary or describe; either works. summary shows count, mean, stddev, min, max and quartiles: https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.summary.html describe shows count, mean, stddev, min and max: https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.describe.html#pyspark.sql.DataFrame.describe The difference seems to be that summary is newer and also includes the percentiles at 25%, 50% and 75%.
upvoted 1 times
...
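To make the comparison above concrete, here is a small sketch of the rows each method reports by default, based on the linked PySpark documentation. The constant and function names, the sample data and the column names are all made up for illustration. `demo` assumes a `spark` session object such as the one a Synapse notebook provides.

```python
# Default statistics per the linked PySpark docs (names here are illustrative).
DESCRIBE_STATS = ["count", "mean", "stddev", "min", "max"]
SUMMARY_STATS = ["count", "mean", "stddev", "min", "25%", "50%", "75%", "max"]


def extra_summary_stats():
    """Statistics that summary() reports by default but describe() does not."""
    return [s for s in SUMMARY_STATS if s not in DESCRIBE_STATS]


def demo(spark):
    """Print both tabular views for a tiny, made-up DataFrame."""
    df = spark.createDataFrame(
        [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)],
        ["id", "score"],
    )
    df.describe().show()  # count, mean, stddev, min, max
    df.summary().show()   # the same rows plus approximate quartiles
    # summary() can also be limited to chosen statistics:
    df.summary("count", "min", "25%", "75%", "max").show()
```

Calling `demo(spark)` in a notebook cell would print the three tables; `extra_summary_stats()` only lists the percentile rows that separate the two methods.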
JuanData
2 years, 2 months ago
Selected Answer: C
summary
upvoted 1 times
...
AshwinN1992
2 years, 4 months ago
Please confirm
upvoted 1 times
...