Use dynamic name for new column/variable in `dplyr`
Since you are dynamically building a variable name as a character value, it makes more sense to do assignment using standard data.frame indexing which allows for character values for column names. For example:
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
df[[varname]] <- with(df, Petal.Width * n)
df
}
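As a usage sketch (base R only, no dplyr needed), the function can be applied repeatedly to accumulate several dynamically named columns. The loop range 2:4 is an arbitrary choice for illustration.

```r
# Base R only: build the column name as a string and assign with [[<-
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  df[[varname]] <- df$Petal.Width * n
  df
}

# Apply the function for several multipliers, adding one column per pass
result <- iris
for (i in 2:4) {
  result <- multipetal(result, i)
}
names(result)
```

Each pass returns a modified copy, so the result has to be reassigned inside the loop.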
The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.
dplyr version >= 1.0
With dplyr 1.0 or later, you can use the syntax from the glue package in the name on the left-hand side of :=. Here the {} in the name grabs the value by evaluating the expression inside.
multipetal <- function(df, n) {
mutate(df, "petal.{n}" := Petal.Width * n)
}
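Because the braces are evaluated as R code, they can hold an expression rather than just a bare variable. A minimal sketch, assuming dplyr >= 1.0; the function name scaled_petal and the "x" in the column name are made up for illustration.

```r
library(dplyr)

# Hypothetical variant: the text inside {} is evaluated, so arithmetic works too
scaled_petal <- function(df, n) {
  mutate(df, "petal.x{n * 10}" := Petal.Width * n)
}

out <- scaled_petal(head(iris, 3), 2)
names(out)
```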
If you are passing a column name to your function, you can use {{}} in the string as well as for the column name:
meanofcol <- function(df, col) {
mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)
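The same pattern works for any derived column whose name should follow the column that was passed in. A small sketch, assuming dplyr >= 1.0; center_col and the "_centered" suffix are hypothetical names.

```r
library(dplyr)

# Hypothetical helper: the new column's name is built from the column passed in
center_col <- function(df, col) {
  mutate(df, "{{col}}_centered" := {{ col }} - mean({{ col }}))
}

out <- center_col(iris, Sepal.Length)
names(out)
```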
dplyr version >= 0.7
Starting with version 0.7, dplyr allows you to use := to dynamically assign parameter names. You can write your function as:
# --- dplyr version 0.7+---
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
mutate(df, !!varname := Petal.Width * n)
}
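On a current installation the 0.7-style unquoting and the glue-style name should be interchangeable. The sketch below, assuming dplyr >= 1.0 is installed, compares the two on iris.

```r
library(dplyr)

n <- 2
varname <- paste("petal", n, sep = ".")

# 0.7-style: unquote a string on the left-hand side of :=
a <- mutate(iris, !!varname := Petal.Width * n)

# 1.0-style: glue syntax inside the name
b <- mutate(iris, "{varname}" := Petal.Width * n)

identical(a, b)
```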
For more information, see the documentation available from vignette("programming", "dplyr").
dplyr (>=0.3 & <0.7)
Slightly earlier versions of dplyr (>=0.3, <0.7) encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (vignette("nse")).
So here, the answer is to use mutate_() rather than mutate():
# --- dplyr version >=0.3, <0.7 ---
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
varval <- lazyeval::interp(~Petal.Width * n, n=n)
mutate_(df, .dots= setNames(list(varval), varname))
}
dplyr < 0.3
Note that this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setNames:
# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
do.call("mutate", pp)
}
Dynamically name a new variable / column within a custom function dplyr mutate and paste
We may pass the arguments unquoted and use {{}} for evaluation
my_fun <- function(dataf, V1, V2){
dataf %>%
dplyr::mutate("{{V1}}_{{V2}}" := paste0(format({{V1}}, big.mark = ",") ,
'\n(' , format({{V2}}, big.mark = ",") , ')'))
}
-testing
my_fun(df, speed1, n1)
string speed1 speed2 n1 n2 speed1_n1
1 car 7886.962 3218.585 37 83 7,886.962\n(37)
2 train 9534.978 5524.649 98 34 9,534.978\n(98)
3 bike 6984.790 9476.838 60 55 6,984.790\n(60)
4 plain 6543.198 2638.609 9 53 6,543.198\n( 9)
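A stripped-down sketch of the same naming trick on a built-in dataset, since the df above is not defined in this excerpt. ratio_col and the "_per_" infix are hypothetical; dplyr >= 1.0 is assumed.

```r
library(dplyr)

# Two unquoted arguments, both reused in the new column's name
ratio_col <- function(dataf, V1, V2) {
  dataf %>%
    mutate("{{V1}}_per_{{V2}}" := {{ V1 }} / {{ V2 }})
}

out <- ratio_col(mtcars, mpg, cyl)
names(out)
```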
Dynamic variables names in dplyr function across multiple columns
We could use .names in across to rename
mean_fun_multicols <- function(data, group_cols, summary_cols) {
data %>%
group_by(across({{group_cols}})) %>%
summarise(across({{ summary_cols }},
~ mean(., na.rm = TRUE), .names = "mean_{.col}"), .groups = "drop")
}
-testing
mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))
# A tibble: 8 × 4
cyl gear mean_mpg mean_wt
<dbl> <dbl> <dbl> <dbl>
1 4 3 21.5 2.46
2 4 4 26.9 2.38
3 4 5 28.2 1.83
4 6 3 19.8 3.34
5 6 4 19.8 3.09
6 6 5 19.7 2.77
7 8 3 15.0 4.10
8 8 5 15.4 3.37
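When more than one summary function is applied, the {.fn} placeholder can be combined with {.col} in .names. A sketch under the assumption of dplyr >= 1.0; summary_multicols is a hypothetical variant of the function above.

```r
library(dplyr)

# Hypothetical variant: two summary functions, names built from both placeholders
summary_multicols <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise(
      across({{ summary_cols }},
             list(mean = ~ mean(.x, na.rm = TRUE), sd = ~ sd(.x, na.rm = TRUE)),
             .names = "{.fn}_{.col}"),
      .groups = "drop"
    )
}

out <- summary_multicols(mtcars, cyl, c(mpg, wt))
names(out)
```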
NOTE: In the tidyverse, := is mainly used when a single column is being assigned.
If we use the OP's function, we are assigning multiple columns to a single column, and this returns a tibble instead of a normal column. We may need to unpack it:
library(tidyr)
> mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt)) %>% str
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
grouped_df [8 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
$ cyl : num [1:8] 4 4 4 6 6 6 8 8
$ gear : num [1:8] 3 4 5 3 4 5 3 5
$ mean_c(mpg, wt): tibble [8 × 2] (S3: tbl_df/tbl/data.frame)
..$ mpg: num [1:8] 21.5 26.9 28.2 19.8 19.8 ...
..$ wt : num [1:8] 2.46 2.38 1.83 3.34 3.09 ...
- attr(*, "groups")= tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
..$ cyl : num [1:3] 4 6 8
..$ .rows: list<int> [1:3]
.. ..$ : int [1:3] 1 2 3
.. ..$ : int [1:3] 4 5 6
.. ..$ : int [1:2] 7 8
.. ..@ ptype: int(0)
..- attr(*, ".drop")= logi TRUE
> mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt)) %>%
unpack(where(is_tibble))
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
# A tibble: 8 × 4
# Groups: cyl [3]
cyl gear mpg wt
<dbl> <dbl> <dbl> <dbl>
1 4 3 21.5 2.46
2 4 4 26.9 2.38
3 4 5 28.2 1.83
4 6 3 19.8 3.34
5 6 4 19.8 3.09
6 6 5 19.7 2.77
7 8 3 15.0 4.10
8 8 5 15.4 3.37
Use dynamically generated column names in dplyr
In the OP's code, we need select
library(dplyr)
df %>%
rowwise() %>%
mutate(consensus = max(table(unlist(select(cur_data(), all_of(select_vars))))))
-output
# A tibble: 10 x 4
# Rowwise:
var1 var2 var3 consensus
<int> <int> <int> <int>
1 1 1 1 2
2 1 1 3 1
3 1 2 1 2
4 1 2 1 2
5 2 2 2 2
6 2 3 3 1
7 2 3 2 2
8 1 1 1 2
9 3 1 2 1
10 3 3 2 1
Or just subset from cur_data(), which would only return the data, keeping the group attributes
df %>%
rowwise %>%
mutate(consensus = max(table(unlist(cur_data()[select_vars]))))
# A tibble: 10 x 4
# Rowwise:
var1 var2 var3 consensus
<int> <int> <int> <int>
1 1 1 1 2
2 1 1 3 1
3 1 2 1 2
4 1 2 1 2
5 2 2 2 2
6 2 3 3 1
7 2 3 2 2
8 1 1 1 2
9 3 1 2 1
10 3 3 2 1
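For rowwise work on a character vector of column names, c_across() with all_of() is another option that avoids cur_data() (which, as far as I know, was deprecated in dplyr 1.1.0 in favour of pick()). The df and select_vars below are small made-up stand-ins, since the question's objects are not defined in this excerpt.

```r
library(dplyr)

# Hypothetical stand-ins for the question's objects
df <- tibble(var1 = c(1L, 1L, 2L),
             var2 = c(1L, 2L, 3L),
             var3 = c(1L, 3L, 3L))
select_vars <- c("var1", "var3")

out <- df %>%
  rowwise() %>%
  # c_across() collects the selected columns of the current row into one vector
  mutate(consensus = max(table(c_across(all_of(select_vars))))) %>%
  ungroup()

out$consensus
```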
Or using pmap
library(purrr)
df %>%
mutate(consensus = pmap_dbl(cur_data()[select_vars], ~ max(table(c(...)))))
# A tibble: 10 x 4
var1 var2 var3 consensus
<int> <int> <int> <dbl>
1 1 1 1 2
2 1 1 3 1
3 1 2 1 2
4 1 2 1 2
5 2 2 2 2
6 2 3 3 1
7 2 3 2 2
8 1 1 1 2
9 3 1 2 1
10 3 3 2 1
As these are rowwise operations, we can gain some efficiency by using collapse functions
library(collapse)
tfm(df, consensus = dapply(slt(df, select_vars), MARGIN = 1,
FUN = function(x) fmax(tabulate(x))))
# A tibble: 10 x 4
var1 var2 var3 consensus
* <int> <int> <int> <int>
1 1 1 1 2
2 1 1 3 1
3 1 2 1 2
4 1 2 1 2
5 2 2 2 2
6 2 3 3 1
7 2 3 2 2
8 1 1 1 2
9 3 1 2 1
10 3 3 2 1
Benchmarks
As noted above, collapse is faster (run on a slightly bigger dataset)
df1 <- df[rep(seq_len(nrow(df)), 1e5), ]
system.time({
tfm(df1, consensus = dapply(slt(df1, select_vars), MARGIN = 1,
FUN = function(x) fmax(tabulate(x))))
})
#user system elapsed
# 5.257 0.123 5.323
system.time({
df1 %>%
mutate(consensus = pmap_dbl(cur_data()[select_vars], ~ max(table(c(...)))))
})
#user system elapsed
# 54.813 0.517 55.246
The rowwise operation took too long, so execution was stopped
system.time({
df1 %>%
rowwise() %>%
mutate(consensus = max(table(unlist(select(cur_data(), all_of(select_vars))))))
})
Timing stopped at: 575.5 3.342 581.3
Mutate a dynamic column name with conditions using other dynamic column names
Use get to retrieve the column value instead
library(tidyverse)
d <- mtcars %>% tibble
fld_name <- "mpg"
other_fld_name <- "cyl"
d %>% mutate(!!fld_name := ifelse(get(other_fld_name) < 5 ,NA, get(fld_name)))
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 NA 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 NA 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 NA 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows
Created on 2021-06-22 by the reprex package (v2.0.0)
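An alternative to get is the .data pronoun, which looks names up only in the data frame rather than also in the surrounding environments. A minimal sketch of the same operation, assuming dplyr >= 1.0 for the glue-style name:

```r
library(dplyr)

fld_name <- "mpg"
other_fld_name <- "cyl"

out <- as_tibble(mtcars) %>%
  # .data[[...]] fetches a column by its name stored in a string
  mutate("{fld_name}" := ifelse(.data[[other_fld_name]] < 5, NA, .data[[fld_name]]))

sum(is.na(out$mpg))
```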
creating and accessing dynamic column names within dplyr functions
Use across with the .names argument, or, if foo_cnt etc. with an underscore is acceptable, just omit the .names argument, since that is the default.
library(dplyr)
library(tibble)
do.some.stuff.2 <- function(data, col) {
cnt <- function(x) cumsum(!is.na(x))
mx <- function(x) cummax(cumsum(x))
mu <- function(x) cumsum(x) / cnt(x)
data %>%
select(date, {{col}}) %>%
filter(!is.na(date) & !is.na({{col}})) %>%
mutate(across({{col}}, lst(cnt, sum=cumsum, max=mx, mu), .names = "{.col}.{.fn}" ))
}
# test
do.some.stuff.2(example, foo)
giving:
# A tibble: 6 x 6
date foo foo.cnt foo.sum foo.max foo.mu
<date> <dbl> <int> <dbl> <dbl> <dbl>
1 2021-02-11 -0.000202 1 -0.000202 -0.000202 -0.000202
2 2021-02-12 0.363 2 0.363 0.363 0.181
3 2021-02-13 1.27 3 1.63 1.63 0.543
4 2021-02-14 1.50 4 3.13 3.13 0.781
5 2021-02-15 1.00 5 4.13 4.13 0.826
6 2021-02-16 -0.458 6 3.67 4.13 0.612
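To see the default naming mentioned above, the sketch below omits .names on a tiny made-up tibble (toy is a hypothetical stand-in for the question's example data).

```r
library(dplyr)

toy <- tibble(foo = c(1, 3, 2))  # hypothetical stand-in data

out <- toy %>%
  # no .names given, so columns are named "{.col}_{.fn}"
  mutate(across(foo, list(sum = cumsum, max = cummax)))

names(out)
```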
dplyr mutate using dynamic variable name while respecting group_by
I don't know much about pluck, so I can't say what went wrong, but I would go for this, and it works:
iris %>%
group_by(Species) %>%
mutate(
!! varname :=
mean(!!as.name(varname), na.rm = TRUE) /
max(sd(!!as.name(varname)),
minsd[varname])
)
Let me know if this isn't what you were looking for.
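The snippet above relies on varname and minsd from the question, which are not shown here. A self-contained sketch of the same grouped calculation, swapping in across() with cur_column() instead of !!as.name() so that several columns can be handled at once; the column names and the 0.5 floors are assumptions.

```r
library(dplyr)

vars  <- c("Sepal.Length", "Petal.Length")          # assumed column names
minsd <- c(Sepal.Length = 0.5, Petal.Length = 0.5)  # assumed sd floors

out <- iris %>%
  group_by(Species) %>%
  mutate(across(
    all_of(vars),
    # cur_column() gives the name of the column currently being processed
    ~ mean(.x, na.rm = TRUE) / max(sd(.x, na.rm = TRUE), minsd[[cur_column()]])
  )) %>%
  ungroup()
```

Because the calculation runs inside group_by(), each species gets its own mean and standard deviation.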
How to pass dynamic column names in dplyr into custom function?
Using the latest version of dplyr (>=0.7), you can use the rlang !! (bang-bang) operator.
library(tidyverse)
from <- "Stand1971"
to <- "Stand1987"
data %>%
mutate(diff=(!!as.name(from))-(!!as.name(to)))
You just need to convert the strings to names with as.name and then insert them into the expression. Unfortunately, I seem to need a few more parentheses than I would like, because the !! operator seems to fall in a weird place in the order of operations.
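Since the question asks about a custom function, here is one way the idea might be wrapped up, using the .data pronoun in place of !!as.name(), which also sidesteps the precedence question. add_diff, its out argument, and the values in stands are all made up for illustration; dplyr >= 1.0 is assumed.

```r
library(dplyr)

# Hypothetical wrapper: column names arrive as strings
add_diff <- function(data, from, to, out = "diff") {
  mutate(data, "{out}" := .data[[from]] - .data[[to]])
}

stands <- tibble(Stand1971 = c(10, 20), Stand1987 = c(4, 25))  # made-up values
res <- add_diff(stands, "Stand1971", "Stand1987")
res$diff
```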
Original answer, dplyr (0.3-<0.7):
From the NSE vignette (vignette("nse","dplyr")), use lazyeval's interp() function:
library(lazyeval)
from <- "Stand1971"
to <- "Stand1987"
data %>%
mutate_(diff=interp(~from - to, from=as.name(from), to=as.name(to)))
Dplyr - Mutate dynamically named variables using other dynamically named variables
Here, we don't need enquo/quo_name for 'year', as we are passing a numeric value. The output of paste will be of class character; using sym from rlang (as @joran mentioned), it can be converted to a symbol and evaluated with !!. Make sure to add parentheses around '!! calc1_header' and '!! calc2_header' so that each object is evaluated on its own
my_fun <- function(df, year) {
total_header <- paste("total", year, sep = "_")
calc1_header <- rlang::sym(paste("value1", year, sep = "_"))
calc2_header <- rlang::sym(paste("value2", year, sep = "_"))
df %>%
mutate(!!total_header := multiplier * (!!calc1_header) + (!!calc2_header))
}
my_fun(df1, 2016)
# ID multiplier value1_2015 value2_2015 value1_2016 value2_2016 total_2016
#1 1 0.5 2 3 1 4 4.5
#2 2 1.0 2 4 4 5 9.0
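To make the example reproducible, the input can be rebuilt from the printed output above and the function chained for both years. The df1 below is reconstructed from that output, not taken from the original question.

```r
library(dplyr)

# Input reconstructed from the printed result above
df1 <- data.frame(ID = 1:2, multiplier = c(0.5, 1),
                  value1_2015 = c(2, 2), value2_2015 = c(3, 4),
                  value1_2016 = c(1, 4), value2_2016 = c(4, 5))

my_fun <- function(df, year) {
  total_header <- paste("total", year, sep = "_")
  calc1_header <- rlang::sym(paste("value1", year, sep = "_"))
  calc2_header <- rlang::sym(paste("value2", year, sep = "_"))
  df %>%
    mutate(!!total_header := multiplier * (!!calc1_header) + (!!calc2_header))
}

# One call per year, each adding its own total_<year> column
res <- df1 %>% my_fun(2015) %>% my_fun(2016)
res[, c("total_2015", "total_2016")]
```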