Subtracting two columns to give a new column in R
As @Bryan Hanson was saying in the above comment, your syntax and data organization relates more to a data frame. I would treat your data as a data frame and simply use the syntax you provided earlier:
> data <- data.frame(A = c(1,2,3,4), B = c(2,2,2,2))
> data$C <- (data$A - data$B)
> data
A B C
1 1 2 -1
2 2 2 0
3 3 2 1
4 4 2 2
How to subtract two columns using tidyverse mutate with columns named by external variables
We may use .data
to subset the column as a vector. The all_of/any_of
are used along with across
to loop across the columns
library(dplyr)
iris %>%
mutate(calculation = .data[[var1]] - .data[[var2]])%>%
head
-output
Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
1 5.1 3.5 1.4 0.2 setosa 1.6
2 4.9 3.0 1.4 0.2 setosa 1.9
3 4.7 3.2 1.3 0.2 setosa 1.5
4 4.6 3.1 1.5 0.2 setosa 1.5
5 5.0 3.6 1.4 0.2 setosa 1.4
6 5.4 3.9 1.7 0.4 setosa 1.5
Or may also use cur_data()
iris %>%
head %>%
mutate(calculation = cur_data()[[var1]] - cur_data()[[var2]])
-output
Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
1 5.1 3.5 1.4 0.2 setosa 1.6
2 4.9 3.0 1.4 0.2 setosa 1.9
3 4.7 3.2 1.3 0.2 setosa 1.5
4 4.6 3.1 1.5 0.2 setosa 1.5
5 5.0 3.6 1.4 0.2 setosa 1.4
6 5.4 3.9 1.7 0.4 setosa 1.5
Or another option is to pass both the variables in across
, and then reduce
with -
library(purrr)
iris %>%
head %>%
mutate(calculation = reduce(across(all_of(c(var1, var2))), `-`))
-output
Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
1 5.1 3.5 1.4 0.2 setosa 1.6
2 4.9 3.0 1.4 0.2 setosa 1.9
3 4.7 3.2 1.3 0.2 setosa 1.5
4 4.6 3.1 1.5 0.2 setosa 1.5
5 5.0 3.6 1.4 0.2 setosa 1.4
6 5.4 3.9 1.7 0.4 setosa 1.5
Or could convert to sym
bol and evaluate (!!
)
iris %>%
head %>%
mutate(calculation = !! rlang::sym(var1) - !! rlang::sym(var2))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
1 5.1 3.5 1.4 0.2 setosa 1.6
2 4.9 3.0 1.4 0.2 setosa 1.9
3 4.7 3.2 1.3 0.2 setosa 1.5
4 4.6 3.1 1.5 0.2 setosa 1.5
5 5.0 3.6 1.4 0.2 setosa 1.4
6 5.4 3.9 1.7 0.4 setosa 1.5
Or if we want to use all_of
in across
, just subset the column with [[
iris %>%
head %>%
mutate(calculation = across(all_of(var1))[[1]] -
across(all_of(var2))[[1]])
Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
1 5.1 3.5 1.4 0.2 setosa 1.6
2 4.9 3.0 1.4 0.2 setosa 1.9
3 4.7 3.2 1.3 0.2 setosa 1.5
4 4.6 3.1 1.5 0.2 setosa 1.5
5 5.0 3.6 1.4 0.2 setosa 1.4
6 5.4 3.9 1.7 0.4 setosa 1.5
The reason we need to subset is because, across
by default will update the original column when the .names
is not present. The calculation
will be a data.frame with a single column
out <- iris %>%
head %>%
mutate(calculation = across(all_of(var1)) -
across(all_of(var2)))
out
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length
1 5.1 3.5 1.4 0.2 setosa 1.6
2 4.9 3.0 1.4 0.2 setosa 1.9
3 4.7 3.2 1.3 0.2 setosa 1.5
4 4.6 3.1 1.5 0.2 setosa 1.5
5 5.0 3.6 1.4 0.2 setosa 1.4
6 5.4 3.9 1.7 0.4 setosa 1.5
str(out)
data.frame': 6 obs. of 6 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1
$ calculation :'data.frame': 6 obs. of 1 variable:
..$ Sepal.Length: num 1.6 1.9 1.5 1.5 1.4 1.5
R: How to repeatedly subtract specific columns from different series of columns, and output to a new dataframe?
Probably others have better ways - but here is one possibility.
- load two libraries and set
dfOld
to data.table
library(data.table)
library(magrittr)
setDT(dfOld)
- get information about the columns, and make into a list.
lv = names(dfOld)[-1][seq(1,ncol(dfOld)-1)%%4>0]
lv = split(lv, ceiling(seq_along(lv)/3))
names(lv) = names(dfOld)[-1][seq(1,ncol(dfOld)-1)%%4==0]
lv
looks like this:
> lv
$D
[1] "A" "B" "C"
$H
[1] "E" "F" "G"
- This is a bit convoluted, but basically, I'm taking each of the elements of the
lv
list, and I'm reshaping columns fromdfOld
, so I can do all subtractions at once. Then I'm retaining only the variables I need, and binding each of the resulting list of data.tables into a single datatable usingrbindlist
res =rbindlist(lapply(names(lv), function(x) {
melt(dfOld,id=c("ID", x),measure.vars = lv[[x]]) %>%
.[,`:=`(nc=value-get(x),variable=paste0(variable,"-",x))] %>%
.[,.(ID,variable,nc)]
}))
- Last step is simple - just
dcast
back
dcast(res,ID~variable, value.var="nc")
Output
ID A-D B-D C-D E-H F-H G-H
1: 1 -66 -65 -63 -33 2 -30
2: 2 -4 -3 -1 -4 -3 -1
3: 3 -4 -3 -1 34 -3 -1
4: 4 3 0 0 3 0 0
5: 5 3 3 3 3 47 3
6: 6 1 0 -4 1 0 -4
7: 7 0 -6 -2 0 -6 -2
8: 8 -8 -2 -5 -8 -2 -5
9: 9 -69 -78 -72 -69 -18 -72
10: 10 5 1 6 5 1 6
Iterative function to subtract columns from a specific column in a dataframe and have the values appear in a new column
This will do what you want. Notice that myfun
treats the first column as special, as per your example.
# example data
df <- data.frame(
Sample = paste0("s00", 1:4),
g1 = 5:8,
g2 = 10:13,
g3 = 15:18,
g4 = 20:23,
g5 = 25:28,
stringsAsFactors = FALSE
)
# function to do what you want
myfun <- function(x, df) {
mat <- df[[x]] - as.matrix(df[ , names(df)[-1]]) #subtract all cols from x
colnames(mat) <- paste0(names(df)[-1], "dt") #give these new cols names
df <- cbind(df, mat) #add new cols to dataframe
df <- df[ , c(1, order(names(df)[-1])+1)] #reorder cols
return(df)
}
# test it
myfun("g3", df)
# result
Sample g1 g1dt g2 g2dt g3 g3dt g4 g4dt g5 g5dt
1 s001 5 10 10 5 15 0 20 -5 25 -10
2 s002 6 10 11 5 16 0 21 -5 26 -10
3 s003 7 10 12 5 17 0 22 -5 27 -10
4 s004 8 10 13 5 18 0 23 -5 28 -10
Subtract columns in R
You might want something like
# this is your data
df <- data.frame("2014"=c(100,54,200),
"2015"=c(200,2,221),
"2016"=c(342,523,225),
"2017"=c(532,121,229),
"2018"=c(65,200,250),
row.names=c("Brad","Tom","Uma"))
df
# X2014 X2015 X2016 X2017 X2018
# Brad 100 200 342 532 65
# Tom 54 2 523 121 200
# Uma 200 221 225 229 250
...and then you do this
df[,-1] - df[,-5]
# X2015 X2016 X2017 X2018
# Brad 100 142 190 -467
# Tom -52 521 -402 79
# Uma 21 4 4 21
The df[,-1]
gives every column of your data.frame except the first one, and the df[,-5]
gives every column except the fifth... from there, it's simple subtraction.
Subtracting values in one column based on two other columns in R
We can do a group by diff
with a condition to check with an if/else
library(dplyr)
df %>%
group_by(site, ID) %>%
summarise(diff = if(length(cover) == 1) NA
else diff(cover), .groups = 'drop')
-output
# A tibble: 4 x 3
site ID diff
<chr> <chr> <dbl>
1 blue C NA
2 green B -5
3 red A -5
4 red B -5
OP's original code is not working because one of the grouping doesn't have both 'after' and 'before' in 'time'. We may need to specify an if/else
condition for that
How do you subtract two columns in a data frame IF they exist in the data frame?
It's always easier with sample data, but let's make a little example here that matches the description of your data:
df <- data.frame(columnxbeg = 1:5, columnxend = 6:10,
columnybeg = 2:6, columnyend = 8:12)
df
#> columnxbeg columnxend columnybeg columnyend
#> 1 1 6 2 8
#> 2 2 7 3 9
#> 3 3 8 4 10
#> 4 4 9 5 11
#> 5 5 10 6 12
To do this in a single pipeline, we need to find the columns that have the suffix "beg" and the columns with the suffix "end", ensure they are in the correct order, subtract them, and bind them onto the existing data:
library(tidyverse)
df %>%
bind_cols(((df %>%
select(ends_with("beg")) %>%
select(order(names(.)))) -
(df %>%
select(ends_with("end")) %>%
select(order(names(.))))) %>%
rename_with(~str_replace(.x, "beg", "diff")))
#> columnxbeg columnxend columnybeg columnyend columnxdiff columnydiff
#> 1 1 6 2 8 -5 -6
#> 2 2 7 3 9 -5 -6
#> 3 3 8 4 10 -5 -6
#> 4 4 9 5 11 -5 -6
#> 5 5 10 6 12 -5 -6
This will work with any number of columns, as long as they have a consistent naming pattern.
Created on 2022-06-06 by the reprex package (v2.0.1)
subtract columns in the datatable in pairs
You can divide the dataframe in half and subtract the second part with the first one and assign new columns names.
n <- ncol(df)
col1 <- 1:(n/2)
col2 <- (n/2 + 1):n
new_col_name <- paste(names(df)[col2], names(df)[col1], sep = '-')
df[new_col_name] <- df[col2] - df[col1]
head(df)
# h1 w1 e1 h2 w2 e2 h2-h1 w2-w1 e2-e1
#1 49.43 149.6 150.2 49.39 149.4 150.1 -0.03665 -0.193458 -0.09741
#2 50.10 149.7 150.8 49.03 149.6 149.6 -1.07812 -0.053813 -1.25975
#3 50.05 149.8 150.7 48.42 149.8 151.0 -1.62448 -0.007319 0.32304
#4 49.77 149.7 148.8 49.92 148.7 149.1 0.15132 -1.005730 0.23139
#5 49.44 149.9 151.0 48.39 150.9 150.0 -1.04673 0.977863 -0.97748
#6 49.58 148.8 151.1 50.41 150.6 148.6 0.83088 1.800697 -2.52930
Related Topics
Multiplying All Columns in Dataframe by Single Column
Changing from Upper to Lower Case in Several Data Frames
Order Bars in Ggplot2 Bar Graph
Difference Between Require() and Library()
Controlling Number of Decimal Digits in Print Output in R
Ggplot2 - Bar Plot With Both Stack and Dodge
How to Convert a List Consisting of Vector of Different Lengths to a Usable Data Frame in R
Plot Multiple Boxplot in One Graph
How to Find the Difference in Value in Every Two Consecutive Rows in R
R Markdown - Changing Font Size and Font Type in HTML Output
Transpose/Reshape Dataframe Without "Timevar" from Long to Wide Format
Why Is '[' Better Than 'Subset'
Linear Regression and Group by in R
In R, How to Get an Object'S Name After It Is Sent to a Function