Exam DP-203 topic 1 question 49 discussion

Actual exam question from Microsoft's DP-203
Question #: 49
Topic #: 1

You have an Azure Synapse Analytics Apache Spark pool named Pool1.
You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1. The structure and data types vary by file.
You need to load the files into the tables. The solution must maintain the source data types.
What should you do?

  • A. Use a Conditional Split transformation in an Azure Synapse data flow.
  • B. Use a Get Metadata activity in Azure Data Factory.
  • C. Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics serverless SQL pool.
  • D. Load the data by using PySpark.
Suggested Answer: D

Comments

galacaw
Highly Voted 3 years ago
Should be D; it's about an Apache Spark pool, not a serverless SQL pool.
upvoted 40 times
...
Joanna0
Highly Voted 1 year, 4 months ago
Selected Answer: D
If your JSON files have a consistent structure and data types, then OPENROWSET is a good option. However, if your JSON files have varying structures and data types, then PySpark is the better option.
upvoted 6 times
...
EmnCours
Most Recent 5 months, 3 weeks ago
Selected Answer: D
Correct Answer: D
upvoted 1 times
...
vaibhavs120
9 months, 1 week ago
The answer is C because with an external table you can load the data while maintaining the source data types.
upvoted 1 times
...
e56bb91
10 months, 2 weeks ago
Selected Answer: D
ChatGPT 4o: Using PySpark in an Apache Spark pool within Azure Synapse Analytics is the most flexible and powerful way to handle JSON files with varying structures and data types. PySpark can infer the schema and handle complex data transformations, making it well suited for loading heterogeneous JSON data into tables while preserving the original data types.
upvoted 1 times
...
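[Editor's note] To make the schema-inference point in the comment above concrete, here is a minimal PySpark sketch. It assumes a Synapse notebook attached to Pool1, where spark is the preconfigured SparkSession; the storage account, container, and folder in the path are hypothetical placeholders.

# Minimal sketch (hypothetical ADLS Gen2 path).
# spark.read.json() infers the schema from the files themselves, so the
# source data types (string, long, double, boolean, nested struct, array)
# come through without being declared up front.
df = spark.read.json(
    "abfss://files@mydatalake.dfs.core.windows.net/raw/customers/"
)

# Inspect the inferred schema to confirm the source data types were kept.
df.printSchema()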
Okkier
10 months, 2 weeks ago
Selected Answer: D
When loading data into an Apache Spark pool, especially when dealing with inconsistent file structures, PySpark (the Python API for Spark) is generally a better choice than OPENROWSET. This is because PySpark offers greater flexibility, better performance, and more robust handling of varied and complex data structures.
upvoted 1 times
...
kldakdlsa
10 months, 3 weeks ago
should be D
upvoted 1 times
...
ellala
1 year, 7 months ago
Selected Answer: D
We have an "Azure Synapse Analytics Apache Spark pool"; therefore, we use Spark. There is no information about a serverless SQL pool.
upvoted 2 times
...
kkk5566
1 year, 8 months ago
Selected Answer: D
Should be D
upvoted 2 times
...
vctrhugo
1 year, 11 months ago
Selected Answer: D
PySpark provides a powerful and flexible programming interface for processing and loading data in Azure Synapse Analytics Apache Spark pools. With PySpark, you can leverage its JSON reader capabilities to infer the schema and maintain the source data types during the loading process.
upvoted 3 times
...
vctrhugo
1 year, 11 months ago
Selected Answer: D
To load JSON files from an Azure Data Lake Storage Gen2 container into tables in an Azure Synapse Analytics Apache Spark pool, you can use PySpark. PySpark provides a flexible and powerful framework for working with big data in Apache Spark. Therefore, the correct answer is: D. Load the data by using PySpark. You can use PySpark to read the JSON files from Azure Data Lake Storage Gen2, infer the schema, and load the data into tables in the Spark pool while maintaining the source data types. PySpark provides various functions and methods to handle JSON data and perform transformations as needed before loading it into tables.
upvoted 4 times
...
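[Editor's note] As a rough illustration of the end-to-end flow described in the comment above (read the JSON from ADLS Gen2, let Spark infer the schema, load the result into a table), here is a hedged sketch. The path and table name are hypothetical, and the multiLine option is only needed when each file holds a single multi-line JSON document rather than line-delimited JSON.

# Minimal sketch, assuming a Synapse notebook attached to Pool1 where
# spark is the active SparkSession; path and table name are placeholders.
df = (
    spark.read
    .option("multiLine", "true")  # only if the files are multi-line JSON documents
    .json("abfss://files@mydatalake.dfs.core.windows.net/raw/sales/")
)

# Write the DataFrame into a table in the Spark pool. The inferred schema,
# and therefore the source data types, is carried over to the table.
df.write.mode("overwrite").saveAsTable("sales")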
janaki
1 year, 12 months ago
Option D: Load the data by using PySpark
upvoted 1 times
...
henryphchan
2 years ago
Selected Answer: D
The question states that "You have an Azure Synapse Analytics Apache Spark pool named Pool1.", so this question is about a Spark pool.
upvoted 1 times
...
Victor_Kings
2 years, 1 month ago
Selected Answer: C
As stated by Microsoft, "Serverless SQL pool can automatically synchronize metadata from Apache Spark. A serverless SQL pool database will be created for each database existing in serverless Apache Spark pools." So even though the files in Azure Storage were created with Apache Spark, you can still query them using OPENROWSET with a serverless SQL pool: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-storage-files-spark-tables
upvoted 4 times
dgerok
1 year, 1 month ago
We are dealing with varying JSON. There is nothing about that scenario at the link you've provided. The correct answer is D...
upvoted 1 times
...
Tejashu
1 year, 5 months ago
As the question states that "You need to load the files into the tables", and we cannot load data through a serverless SQL pool, the answer should be D.
upvoted 2 times
...
...
esaade
2 years, 2 months ago
Selected Answer: D
To load JSON files from an Azure Data Lake Storage Gen2 container into the tables in an Apache Spark pool in Azure Synapse Analytics while maintaining the source data types, you should use PySpark.
upvoted 3 times
...
haidebelognime
2 years, 2 months ago
Selected Answer: D
PySpark is the Python API for Apache Spark, which is a distributed computing framework that can handle large-scale data processing.
upvoted 2 times
...
brzhanyu
2 years, 5 months ago
Selected Answer: D
Should be D; it's about an Apache Spark pool, not a serverless SQL pool.
upvoted 2 times
...