Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 AND the value in column customerSatisfaction is greater than or equal to 30?
A.
storesDF.filter(col("sqft") <= 25000 and col("customerSatisfaction") >= 30)
B.
storesDF.filter(col("sqft") <= 25000 or col("customerSatisfaction") >= 30)
C.
storesDF.filter(sqft) <= 25000 and customerSatisfaction >= 30)
D.
storesDF.filter(col("sqft") <= 25000 & col("customerSatisfaction") >= 30)
E.
storesDF.filter(sqft <= 25000) & customerSatisfaction >= 30)
In PySpark, all of the options are wrong as written, because each condition inside the filter must be wrapped in parentheses. D is the intended answer, and it should read: storesDF.filter((col("sqft") <= 25000) & (col("customerSatisfaction") >= 30))
It's D:
https://sparkbyexamples.com/spark/spark-and-or-not-operators/
PySpark Logical operations use the bitwise operators:
& for and
| for or
~ for not
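To see why the parentheses matter, here is a minimal sketch that uses only Python's standard ast module (no Spark needed) to show how Python groups the two forms of the expression. The names sqft and satisfaction are placeholders that are only parsed, never evaluated, and the helper top_level_node is made up for this illustration.

```python
import ast


def top_level_node(expr: str) -> str:
    """Return the class name of the outermost AST node of a Python expression."""
    return type(ast.parse(expr, mode="eval").body).__name__


# Without parentheses: `&` binds tighter than `<=` and `>=`, so Python reads
# this as the chained comparison  sqft <= (25000 & satisfaction) >= 30.
unparenthesized = top_level_node("sqft <= 25000 & satisfaction >= 30")

# With parentheses: the outermost operation is `&` applied to two comparisons,
# which is the shape a combined filter condition needs.
parenthesized = top_level_node("(sqft <= 25000) & (satisfaction >= 30)")

print(unparenthesized)  # Compare
print(parenthesized)    # BinOp
```

A chained comparison is evaluated with an implicit `and`, which is most likely why option D as printed fails in PySpark just like option A: `and` needs a plain True/False value, and a Column expression cannot be converted to one.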