Exam DP-203 topic 1 question 49 discussion

Actual exam question from Microsoft's DP-203
Question #: 49
Topic #: 1

You have an Azure Synapse Analytics Apache Spark pool named Pool1.
You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1. The structure and data types vary by file.
You need to load the files into the tables. The solution must maintain the source data types.
What should you do?

  • A. Use a Conditional Split transformation in an Azure Synapse data flow.
  • B. Use a Get Metadata activity in Azure Data Factory.
  • C. Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics serverless SQL pool.
  • D. Load the data by using PySpark.
Suggested Answer: D

Comments

galacaw
Highly Voted 3 years ago
Should be D; it's about an Apache Spark pool, not a serverless SQL pool.
upvoted 40 times
...
Joanna0
Highly Voted 1 year, 4 months ago
Selected Answer: D
If your JSON files have a consistent structure and data types, then OPENROWSET is a good option. However, if your JSON files have varying structures and data types, then PySpark is the better option.
upvoted 6 times
...
EmnCours
Most Recent 5 months, 3 weeks ago
Selected Answer: D
Correct Answer: D
upvoted 1 times
...
vaibhavs120
9 months, 1 week ago
The answer is C because with an external table you can load the data while maintaining the source data types.
upvoted 1 times
...
e56bb91
10 months, 2 weeks ago
Selected Answer: D
ChatGPT 4o: Using PySpark in an Apache Spark pool within Azure Synapse Analytics is the most flexible and powerful way to handle JSON files with varying structures and data types. PySpark can infer the schema and handle complex data transformations, making it well suited for loading heterogeneous JSON data into tables while preserving the original data types.
upvoted 1 times
...
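[Editor's note] To make the schema-inference point in the comment above concrete, here is a minimal PySpark sketch. It assumes a Synapse notebook attached to Pool1, where spark is the preconfigured SparkSession; the storage account, container, and folder in the path are hypothetical placeholders.

# Minimal sketch (hypothetical ADLS Gen2 path).
# spark.read.json() infers the schema from the files themselves, so the
# source data types (string, long, double, boolean, nested struct, array)
# come through without being declared up front.
df = spark.read.json(
    "abfss://files@mydatalake.dfs.core.windows.net/raw/customers/"
)

# Inspect the inferred schema to confirm the source data types were kept.
df.printSchema()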
Okkier
10 months, 2 weeks ago
Selected Answer: D
When loading data into an Apache Spark pool, especially when dealing with inconsistent file structures, PySpark (the Python API for Spark) is generally a better choice than OPENROWSET. This is because PySpark offers greater flexibility, better performance, and more robust handling of varied and complex data structures.
upvoted 1 times
...
kldakdlsa
10 months, 3 weeks ago
should be D
upvoted 1 times
...
ellala
1 year, 7 months ago
Selected Answer: D
We have an "Azure Synapse Analytics Apache Spark pool"; therefore, we use Spark. There is no information about a serverless SQL pool.
upvoted 2 times
...
kkk5566
1 year, 8 months ago
Selected Answer: D
Should be D
upvoted 2 times
...
vctrhugo
1 year, 11 months ago
Selected Answer: D
PySpark provides a powerful and flexible programming interface for processing and loading data in Azure Synapse Analytics Apache Spark pools. With PySpark, you can leverage its JSON reader capabilities to infer the schema and maintain the source data types during the loading process.
upvoted 3 times
...
vctrhugo
1 year, 11 months ago
Selected Answer: D
To load JSON files from an Azure Data Lake Storage Gen2 container into tables in an Azure Synapse Analytics Apache Spark pool, you can use PySpark. PySpark provides a flexible and powerful framework for working with big data in Apache Spark. Therefore, the correct answer is: D. Load the data by using PySpark. You can use PySpark to read the JSON files from Azure Data Lake Storage Gen2, infer the schema, and load the data into tables in the Spark pool while maintaining the source data types. PySpark provides various functions and methods to handle JSON data and perform transformations as needed before loading it into tables.
upvoted 4 times
...
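[Editor's note] As a rough illustration of the end-to-end flow described in the comment above (read the JSON from ADLS Gen2, let Spark infer the schema, load the result into a table), here is a hedged sketch. The path and table name are hypothetical, and the multiLine option is only needed when each file holds a single multi-line JSON document rather than line-delimited JSON.

# Minimal sketch, assuming a Synapse notebook attached to Pool1 where
# spark is the active SparkSession; path and table name are placeholders.
df = (
    spark.read
    .option("multiLine", "true")  # only if the files are multi-line JSON documents
    .json("abfss://files@mydatalake.dfs.core.windows.net/raw/sales/")
)

# Write the DataFrame into a table in the Spark pool. The inferred schema,
# and therefore the source data types, is carried over to the table.
df.write.mode("overwrite").saveAsTable("sales")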
janaki
1 year, 12 months ago
Option D: Load the data by using PySpark
upvoted 1 times
...
henryphchan
2 years ago
Selected Answer: D
The question states that "You have an Azure Synapse Analytics Apache Spark pool named Pool1.", so this question is about a Spark pool.
upvoted 1 times
...
Victor_Kings
2 years, 1 month ago
Selected Answer: C
As stated by Microsoft, "Serverless SQL pool can automatically synchronize metadata from Apache Spark. A serverless SQL pool database will be created for each database existing in serverless Apache Spark pools." So even though the files in Azure Storage were created with Apache Spark, you can still query them using OPENROWSET with a serverless SQL pool: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-storage-files-spark-tables
upvoted 4 times
dgerok
1 year, 1 month ago
We are dealing with varying JSON. There is nothing about that scenario at the link you've provided. The correct answer is D...
upvoted 1 times
...
Tejashu
1 year, 5 months ago
As the question states that "You need to load the files into the tables", and we cannot load data through a serverless SQL pool, the answer should be D.
upvoted 2 times
...
...
esaade
2 years, 2 months ago
Selected Answer: D
To load JSON files from an Azure Data Lake Storage Gen2 container into the tables in an Apache Spark pool in Azure Synapse Analytics while maintaining the source data types, you should use PySpark.
upvoted 3 times
...
haidebelognime
2 years, 2 months ago
Selected Answer: D
PySpark is the Python API for Apache Spark, which is a distributed computing framework that can handle large-scale data processing.
upvoted 2 times
...
brzhanyu
2 years, 5 months ago
Selected Answer: D
Should be D; it's about an Apache Spark pool, not a serverless SQL pool.
upvoted 2 times
...