Use filter in dplyr conditional on an if statement in R
You could do
library(dplyr)
y <- ""
data.frame(x = 1:5) %>%
  {if (y == "") filter(., x > 3) else filter(., x < 3)} %>%
  tail(1)
or
data.frame(x = 1:5) %>%
  filter(if (y == "") x > 3 else x < 3) %>%
  tail(1)
or even store your pipe as a reusable function, in the vein of
mypipe <- . %>% tail(1) %>% print
data.frame(x = 1:5) %>% mypipe
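As a quick sanity check, here is a minimal sketch comparing the first two forms on the same toy data. The names `df`, `res_braces` and `res_inline` are made up for illustration and are not part of the original answer.

```r
library(dplyr)

y <- ""
df <- data.frame(x = 1:5)

# Form 1: pick one of two complete filter() calls inside braces
res_braces <- df %>%
  {if (y == "") filter(., x > 3) else filter(., x < 3)}

# Form 2: pick the condition inside a single filter() call
res_inline <- df %>%
  filter(if (y == "") x > 3 else x < 3)

# Both should keep the same rows (x = 4 and x = 5 when y is "")
all.equal(res_braces, res_inline)
```

The braces form is handy when the branches are different verbs altogether; the inline form is shorter when only the condition changes.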
R: IF statement in dplyr::filter requires ELSE otherwise fails?
We can return TRUE in the else branch, which selects all rows when the condition is FALSE, regardless of the values in the column we are testing.
library(dplyr)
a <- NA
mtcars %>% filter(if(!is.na(a)) cyl == a else TRUE)
And to answer your question: yes, if requires an else part here, because without it the expression returns NULL whenever the condition is FALSE, and that fails in filter. See this example:
num <- 2
a <- if(num > 1) 'yes'
a
#[1] "yes"
a <- if(num > 3) 'yes'
a
#NULL
Hence when you use
a <- NA
mtcars %>% filter(if(!is.na(a)) cyl == a)
What actually happens is
mtcars %>% filter(NULL)
which returns the same error message.
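If the optional condition is needed in several places, it can be wrapped in a small function. This is only a sketch: `filter_cyl` is a hypothetical helper name, and it assumes `a` is either NA or a single value.

```r
library(dplyr)

# Hypothetical helper: filter on cyl only when a value for `a` is supplied
filter_cyl <- function(data, a = NA) {
  data %>% filter(if (!is.na(a)) cyl == a else TRUE)
}

nrow(filter_cyl(mtcars))         # no value supplied: every row is kept
nrow(filter_cyl(mtcars, a = 4))  # only rows with cyl == 4
```

The else TRUE branch is what keeps the call valid when nothing should be filtered.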
How to filter a grouped dataframe with a conditional statement using dplyr?
To count the number of unique values we can use n_distinct
and filter the rows based on that.
library(dplyr)
df %>%
  group_by(country, year) %>%
  filter(if (n_distinct(version) == 2) version == 'versionA' else TRUE)
# country year version
# <fct> <dbl> <fct>
#1 country1 2011 versionA
#2 country2 2011 versionA
#3 country3 2011 versionB
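The question's `df` is not shown here, so the following is a sketch on made-up data with the same column names. The values are assumptions chosen so that two countries have both versions and one has only versionB.

```r
library(dplyr)

# Hypothetical input: the original data is not shown in the answer
df <- data.frame(
  country = c("country1", "country1", "country2", "country2", "country3"),
  year    = 2011,
  version = c("versionA", "versionB", "versionA", "versionB", "versionB")
)

res <- df %>%
  group_by(country, year) %>%
  # Groups with both versions keep only versionA; other groups keep every row
  filter(if (n_distinct(version) == 2) version == "versionA" else TRUE) %>%
  ungroup()

res
```

Because `n_distinct(version)` is evaluated once per group, the plain `if` sees a single TRUE or FALSE and is safe to use here.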
conditional filtering based on grouped data in R using dplyr
Here's another method that selects directly using math rather than %in%
data %>% filter(col * sign((group < 3) - 0.5) > 0)
#> # A tibble: 76 x 3
#> group year col
#> <int> <int> <dbl>
#> 1 2 1985 2.20
#> 2 3 1986 -0.205
#> 3 4 1991 -2.10
#> 4 3 1994 -0.113
#> 5 2 1997 1.90
#> 6 1 2000 1.37
#> 7 3 2002 -0.805
#> 8 4 2003 -0.535
#> 9 1 2004 0.792
#> 10 3 2006 -1.28
#> # ... with 66 more rows
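To see why the arithmetic works: `(group < 3) - 0.5` is 0.5 when the test is TRUE and -0.5 when it is FALSE, so `sign()` turns it into +1 or -1. Rows with group < 3 are therefore kept when col is positive, and the remaining rows when col is negative. A minimal sketch on made-up data (the values below are assumptions, not the question's data) comparing it with an explicit `ifelse` version:

```r
library(dplyr)

# Hypothetical data using the same column names as above
data <- tibble(
  group = c(1L, 2L, 3L, 4L, 1L, 3L),
  col   = c(1.5, -0.3, -2.0, 0.7, -1.1, 0.4)
)

# Sign trick: multiply col by +1 (group < 3) or -1 (otherwise)
by_sign <- data %>% filter(col * sign((group < 3) - 0.5) > 0)

# Same rule spelled out with a vectorised ifelse()
by_logic <- data %>% filter(ifelse(group < 3, col > 0, col < 0))

all.equal(by_sign, by_logic)
```

The `ifelse` form is longer but arguably easier to read; both drop rows where col is exactly 0.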
R filter rows such that one column is conditional on two other columns
df %>%
  group_by(id) %>%
  filter(any(n1 == 1), any(n2 == 1))
# A tibble: 6 x 3
# Groups: id [3]
id n1 n2
<chr> <dbl> <dbl>
1 firm a 1 0
2 firm b 1 0
3 firm e 1 0
4 firm a 0 1
5 firm e 0 1
6 firm b 0 1
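The input `df` is not shown, so here is a sketch on hypothetical data. The firms "firm c" and "firm d" are invented to show groups that get dropped because they satisfy only one of the two conditions, or neither.

```r
library(dplyr)

# Hypothetical input: firm c has n1 == 1 but never n2 == 1; firm d has neither n1 == 1
df <- tibble(
  id = c("firm a", "firm b", "firm c", "firm a", "firm d", "firm b"),
  n1 = c(1, 1, 1, 0, 0, 0),
  n2 = c(0, 0, 0, 1, 1, 1)
)

res <- df %>%
  group_by(id) %>%
  # Keep a whole group only if some row has n1 == 1 and some row has n2 == 1
  filter(any(n1 == 1), any(n2 == 1)) %>%
  ungroup()

res
```

Each `any()` collapses the group to a single TRUE or FALSE, so the filter keeps or drops all rows of a group together.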
Conditional filtering using tidyverse
As @docendo-discimus pointed out in the comments, the following solutions work. I also used rlang::has_name instead of "a" %in% names(.).
This Q&A contains the original idea: Conditionally apply pipeline step depending on external value.
df1 %>%
  filter(if (has_name(., "a")) a == 1 else TRUE)
# A tibble: 2 x 2
a b
<int> <chr>
1 1 a
2 1 b
df2 %>%
  filter(if (has_name(., "a")) a == 1 else TRUE)
# A tibble: 4 x 1
b
<chr>
1 a
2 a
3 b
4 b
Or alternatively, by using {}:
df1 %>%
  {if (has_name(., "a")) filter(., a == 1L) else .}
# A tibble: 2 x 2
a b
<int> <chr>
1 1 a
2 1 b
df2 %>%
  {if (has_name(., "a")) filter(., a == 1L) else .}
# A tibble: 4 x 1
b
<chr>
1 a
2 a
3 b
4 b
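The same check can also live in a small function, which avoids relying on the `.` placeholder. This is a sketch under assumptions: `keep_a1` is a hypothetical helper name, and `df1` and `df2` below are made-up inputs shaped to match the printed results.

```r
library(dplyr)

# Hypothetical inputs: one with column `a`, one without
df1 <- tibble(a = c(1L, 1L, 2L, 2L), b = c("a", "b", "a", "b"))
df2 <- tibble(b = c("a", "a", "b", "b"))

# Hypothetical helper: filter on `a` only when that column exists
keep_a1 <- function(data) {
  if (rlang::has_name(data, "a")) filter(data, a == 1L) else data
}

keep_a1(df1)  # column present: rows with a == 1
keep_a1(df2)  # column absent: data returned unchanged
```

`rlang::has_name()` takes the object first and the name second, so the data frame has to be passed explicitly.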
if else with filter R
Your attempt was very close, but there appear to be some syntax issues; this should solve your problem:
library(tidyverse)
df1 <- data.frame(
  sample_id = c('SB024', '3666-01', '3666-01', '3666-02'),
  FAO = c(100, 50, 3, 5)
)
df1 %>%
  filter(ifelse(str_detect(sample_id, "3666"), FAO >= 4, FAO > 20))
#> sample_id FAO
#> 1 SB024 100
#> 2 3666-01 50
#> 3 3666-02 5
df1 %>%
  filter(ifelse(str_detect(sample_id, "XXXX"), FAO >= 4, FAO > 20))
#> sample_id FAO
#> 1 SB024 100
#> 2 3666-01 50
Created on 2021-11-05 by the reprex package (v2.0.1)
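Note that `ifelse()` is vectorised and picks a threshold row by row, whereas a plain `if` only looks at a single TRUE or FALSE. The same rule can also be written with ordinary logical operators. A minimal sketch reusing the data from above; the names `is_3666` and `res` are made up for illustration:

```r
library(dplyr)
library(stringr)

df1 <- data.frame(
  sample_id = c("SB024", "3666-01", "3666-01", "3666-02"),
  FAO = c(100, 50, 3, 5)
)

res <- df1 %>%
  mutate(is_3666 = str_detect(sample_id, "3666")) %>%
  # Matching samples need FAO >= 4, all others need FAO > 20
  filter((is_3666 & FAO >= 4) | (!is_3666 & FAO > 20)) %>%
  select(-is_3666)

res
```

This form spells out both branches explicitly, which can be easier to extend when a third rule is added.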