Exam DP-100 topic 5 question 8 discussion

Actual exam question from Microsoft's DP-100
Question #: 8
Topic #: 5

DRAG DROP -
You are producing a multiple linear regression model in Azure Machine Learning Studio.
Several independent variables are highly correlated.
You need to select appropriate methods for conducting effective feature engineering on all the data.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:

Suggested Answer:
Step 1: Use the Filter Based Feature Selection module
Filter Based Feature Selection identifies the features in a dataset with the greatest predictive power.
The module outputs a dataset that contains the best feature columns, as ranked by predictive power. It also outputs the names of the features and their scores from the selected metric.
Step 2: Build a counting transform
A counting transform creates a transformation that turns count tables into features, so that you can apply the transformation to multiple datasets.
Step 3: Test the hypothesis using t-Test
Reference:
https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/studio-module-reference/filter-based-feature-selection
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/build-counting-transform
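
As a rough illustration of what filter-based selection does, the sketch below scores each feature by its absolute Pearson correlation with the target and keeps the top k. This is not the Studio module's implementation: the function names (pearson, rank_features) and the toy data are hypothetical, and Pearson correlation is only one of the scoring metrics such a module can use.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uninformative
    return cov / math.sqrt(var_x * var_y)

def rank_features(features, target, k):
    """Return the k (name, score) pairs with the largest |r| against the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Hypothetical toy data: price is an exact multiple of sqft.
features = {
    "sqft": [50, 60, 80, 100, 120],
    "rooms": [2, 2, 3, 4, 5],
    "noise": [3, 1, 4, 1, 5],
}
price = [150, 180, 240, 300, 360]

top = rank_features(features, price, k=2)
print(top)
```

Note that scoring features against the target in this way does not by itself remove features that are correlated with each other; in the toy data, sqft and rooms both rank highly even though they carry overlapping information.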

Comments

lucazav
Highly Voted 4 years, 1 month ago
I would simply do the following: 1. Remove duplicate rows (to fix any duplicate issue) 2. Use the Filter Based Feature Selection module (to filter out overly correlated features) 3. Build a counting transform (to add new engineered features)
upvoted 26 times
slashssab
3 years, 1 month ago
Removing duplicate rows is useful in regression because duplicated data adds bias to your model, so I think @lucazav is correct.
upvoted 4 times
...
bruce
3 years, 8 months ago
They haven't mentioned anything about duplicate data here, so I think the answer will be: 1. Use the Filter Based Feature Selection module 2. Compute linear correlation 3. Build a counting transform
upvoted 12 times
...
...
modschegiebsch
Highly Voted 4 years, 6 months ago
What kind of hypothesis am I supposed to test here? I would go with the correlations as well.
upvoted 11 times
...
haby
Most Recent 11 months, 3 weeks ago
Compute linear correlation - Use the Filter Based Feature Selection module - Test the hypothesis using t-Test
upvoted 2 times
...
phdykd
1 year, 4 months ago
To effectively perform feature engineering on highly correlated independent variables in a multiple linear regression model in Azure Machine Learning Studio, you should take the following actions in sequence:
Compute linear correlation (e): Calculate the correlation between the independent variables to understand the degree of correlation between them.
Use the Filter Based Feature Selection module (c): Utilize this module to select the most relevant features while considering their correlations. This step helps in reducing multicollinearity and selecting a subset of features that contribute the most to the model.
Test the hypothesis using t-Test (d): After selecting the features, perform hypothesis testing using t-Tests to validate the statistical significance of the chosen features in relation to the dependent variable.
upvoted 2 times
...
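
One possible reading of the t-test step in the comment above is a significance test on the correlation between a single feature and the dependent variable. The sketch below computes the usual statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The function name, the toy data, and the choice of a two-sided 5% level are assumptions made for illustration; this is not the Studio t-Test module.

```python
import math

def correlation_t_statistic(xs, ys):
    """Return (t, degrees_of_freedom) for testing H0: no linear correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    r = cov / math.sqrt(var_x * var_y)  # assumes neither column is constant
    df = n - 2
    denom = 1 - r * r
    if denom <= 0:
        # Perfect (or numerically perfect) correlation: statistic is unbounded.
        return math.copysign(math.inf, r), df
    return r * math.sqrt(df / denom), df

# Hypothetical toy data with r = 0.8 and only five observations.
feature = [1, 2, 3, 4, 5]
target = [2, 1, 4, 3, 5]

t, df = correlation_t_statistic(feature, target)

# Two-sided 5% critical value of Student's t for 3 degrees of freedom,
# taken from a standard t table.
CRITICAL_T_DF3 = 3.182
significant = abs(t) > CRITICAL_T_DF3
print(t, df, significant)
```

With so few observations, even a correlation of 0.8 does not clear the 5% threshold, which shows why this kind of test says more about sample evidence than about multicollinearity among the features.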
phdykd
1 year, 4 months ago
Compute linear correlation: By computing linear correlation between variables, you can identify pairs of variables that are highly correlated. These are the ones causing multicollinearity.
Use the Filter Based Feature Selection module: Azure Machine Learning Studio provides this module to automatically select important features. It can help eliminate redundant features, i.e., features that are highly correlated with each other, which helps reduce multicollinearity.
Remove duplicate rows: As mentioned before, removing duplicate rows is a good practice in general, not necessarily to handle high correlation among variables. However, in some cases, duplicate rows may contribute to multicollinearity, especially when they form a significant proportion of the dataset.
upvoted 1 times
...
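
The idea of identifying highly correlated pairs of variables, as described in the comment above, can be sketched in a few lines. This is a minimal example under stated assumptions: the function names, the 0.9 threshold, and the toy columns are all hypothetical, and the code does not reproduce the Compute Linear Correlation module.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: no linear relationship to measure
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(features, threshold=0.9):
    """List (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs

# Hypothetical toy data: sqm is sqft converted to square metres and rounded,
# so the two columns are nearly redundant; age is unrelated to both.
features = {
    "sqft": [50, 60, 80, 100, 120],
    "sqm": [4.6, 5.6, 7.4, 9.3, 11.1],
    "age": [30, 5, 12, 40, 8],
}

pairs = correlated_pairs(features)
print(pairs)
```

A flagged pair such as sqft and sqm is a candidate for dropping one of the two columns before fitting the regression.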
phdykd
1 year, 9 months ago
Compute linear correlation, use the Filter Based Feature Selection module, build a counting transform. The rest of them are not directly related to feature engineering for highly correlated independent variables in a multiple linear regression model.
upvoted 1 times
...
ning
2 years, 5 months ago
I cannot find either of these anywhere in Designer: 1. Build a counting transform 2. Compute linear correlation. I guess this is an outdated question.
upvoted 1 times
...
dija123
2 years, 11 months ago
I find the given answer is correct
upvoted 1 times
...
ljljljlj
3 years, 4 months ago
On exam 2021/7/10
upvoted 7 times
...
slash_nyk
3 years, 5 months ago
do we know the answer to this question?
upvoted 6 times
...
ralucabala
3 years, 7 months ago
Filter Based Feature Selection has only 2 options in Designer: PearsonCorrelation and ChiSquare test
upvoted 2 times
...
Alexandra
4 years, 5 months ago
Linear correlation measures correlation only between two variables. I think that is why the t-test is more suitable, as the requirement is to measure correlation on the whole dataset.
upvoted 5 times
ning
2 years, 5 months ago
No, this will be a linear correlation matrix, the same thing applied to PCA ...
upvoted 1 times
...
...
davo123
4 years, 6 months ago
Is this the correct answer? I do not see any of the solutions in the references. One could compute the correlations as well
upvoted 6 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted