
Exam DP-100 topic 6 question 4 discussion

Actual exam question from Microsoft's DP-100
Question #: 4
Topic #: 7

You need to implement a feature engineering strategy for the crowd sentiment local models.
What should you do?

  • A. Apply an analysis of variance (ANOVA).
  • B. Apply a Pearson correlation coefficient.
  • C. Apply a Spearman correlation coefficient.
  • D. Apply a linear discriminant analysis.
Suggested Answer: D
The linear discriminant analysis method works only on continuous variables, not categorical or ordinal variables.
Linear discriminant analysis is similar to analysis of variance (ANOVA) in that it works by comparing the means of the variables.
Scenario:
Data scientists must build notebooks in a local environment using automatic feature engineering and model building in machine learning pipelines.
Experiments for local crowd sentiment models must combine local penalty detection data.
All shared features for local models are continuous variables.
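A minimal sketch of the suggested approach: linear discriminant analysis fit on continuous features with a categorical target, as in the scenario's local models. The data, class labels, and component count here are invented purely for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Three sentiment classes, four continuous shared features each;
# class means are shifted so the groups are separable.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 4)) for c in range(3)])
y = np.repeat([0, 1, 2], 50)

# LDA compares class means (like ANOVA) and projects the continuous
# features onto at most n_classes - 1 discriminant components.
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)  # (150, 2)
```

Note that the target must be categorical while the input features are continuous, which matches the scenario's constraint that all shared features for local models are continuous variables.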
Incorrect Answers:
B: The Pearson correlation coefficient, sometimes called Pearson's R test, is a statistical value that measures the linear relationship between two variables. By examining the coefficient values, you can infer something about the strength of the relationship between the two variables, and whether they are positively correlated or negatively correlated.
C: Spearman's correlation coefficient is a nonparametric measure of statistical dependence between two variables, designed for use with non-parametric and non-normally distributed data, and is sometimes denoted by the Greek letter rho. It expresses the degree to which two variables are monotonically related; because it operates on ranks, it can also be used with ordinal variables, which is why it is also called the Spearman rank correlation.
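The contrast between the two coefficients can be seen with `scipy.stats`: on a monotonic but non-linear relationship, Spearman's rho is exactly 1 while Pearson's r falls below 1. The data below are invented for the sketch.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1.0, 11.0)  # 1..10
y = x ** 3                 # monotonic but strongly non-linear

r, _ = pearsonr(x, y)      # measures the *linear* relationship
rho, _ = spearmanr(x, y)   # measures the *monotonic* (rank) relationship

print(round(rho, 3))  # 1.0: the ranks agree perfectly
print(r < 1.0)        # True: Pearson is high here but not 1,
                      # because the relationship is not linear
```

This is why Spearman is the usual choice for ordinal or non-normally distributed data, while Pearson assumes a linear relationship between continuous variables.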
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/fisher-linear-discriminant-analysis
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/compute-linear-correlation

Comments

haby
11 months, 3 weeks ago
A, B, and C are filter-based feature selection methods, while LDA is a dimensionality-reduction method that works with a categorical target. In this case, I will take D.
upvoted 1 time
ferren
1 year, 1 month ago
ChatGPT says D.
upvoted 1 time
phdykd
1 year, 4 months ago
In the context of feature engineering for the crowd sentiment local models, which include audio data and need to detect similar sounds, a Pearson correlation coefficient (B) would be a suitable strategy. The Pearson correlation coefficient measures the linear relationship between two datasets, which could be valuable in this scenario to understand which features most strongly correlate with positive or negative crowd sentiment. This could involve correlations between specific sound features in the audio data and the sentiment label. While the other techniques mentioned (ANOVA, Spearman correlation coefficient, and linear discriminant analysis) can be useful in certain circumstances, the Pearson correlation coefficient is more relevant in this scenario where you might be dealing with continuous features (like sound frequencies or volumes) and you are interested in linear relationships with the target variable (sentiment).
upvoted 2 times
ning
2 years, 5 months ago
MLP combined with LDA: as mentioned in the question, an MLP (multiple layers) is used for sentiment analysis, so my guess is LDA for feature reduction, which here is called feature engineering. None of the other options relate to feature reduction.
upvoted 1 time
prashantjoge
3 years, 6 months ago
These questions seem to be based on Machine Learning Studio (classic). Is this still in the syllabus?
upvoted 2 times
prashantjoge
3 years, 6 months ago
This method is often used for dimensionality reduction, because it projects a set of features onto a smaller feature space while preserving the information that discriminates between classes. This not only reduces computational costs for a given classification task, but can help prevent overfitting. To generate the scores, you provide a label column and set of numerical feature columns as inputs. The algorithm determines the optimal combination of the input columns that linearly separates each group of data while minimizing the distances within each group. The module returns a dataset containing the compact, transformed features, along with a transformation that you can save and apply to another dataset.
upvoted 2 times
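The reuse aspect described in the comment above — learning a projection once and applying it to another dataset — can be sketched with scikit-learn's LDA (data and shapes here are invented):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Training set: three classes, six numerical feature columns,
# with class means offset so the groups are linearly separable.
X_train = rng.normal(size=(120, 6)) + np.repeat([0, 2, 4], 40)[:, None]
y_train = np.repeat([0, 1, 2], 40)

# Fit learns the optimal linear combination of the input columns
# that separates the groups while minimizing within-group distances.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X_train, y_train)

# The fitted object holds the transformation, so it can be applied
# to another dataset with the same feature columns.
X_new = rng.normal(size=(5, 6))
print(lda.transform(X_new).shape)  # (5, 2)
```

The fitted estimator can also be serialized (e.g. with `joblib`) and reloaded later, which corresponds to the "save and apply to another dataset" behavior the classic Studio module provided.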
Community vote distribution: A (35%), C (25%), B (20%), Other