Group-by operation for another column R
Update on OP request (see comments): just replace summarise with mutate:
library(dplyr)
df %>%
  group_by(user) %>%
  mutate(Smallest_time1 = min(time_1, na.rm = TRUE))
    user score time_1 time_2 Smallest_time1
   <dbl> <dbl>  <dbl>  <dbl>          <dbl>
 1     1     1    130     NA            120
 2     1     0     NA    742            120
 3     1     1    120     NA            120
 4     1     1    245     NA            120
 5     2     0     NA    812            841
 6     2     0     NA    212            841
 7     2     0     NA    214            841
 8     2     1    841     NA            841
 9     3     0     NA    919            612
10     3     0     NA    528            612
11     3     1    721     NA            612
12     3     1    612     NA            612
We could use min() inside summarise with the na.rm = TRUE argument:
library(dplyr)
df %>%
group_by(user) %>%
summarise(Smallest_time1 = min(time_1, na.rm= TRUE))
   user Smallest_time1
  <dbl>          <dbl>
1     1            120
2     2            841
3     3            612
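To make the difference between the two verbs concrete, here is a minimal sketch that rebuilds df from the rows printed above (the data frame construction is an assumption, since the original question's data is not shown) and runs both versions side by side.

```r
library(dplyr)

# Data rebuilt from the printed output above (an assumption about the original df)
df <- data.frame(
  user   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  score  = c(1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1),
  time_1 = c(130, NA, 120, 245, NA, NA, NA, 841, NA, NA, 721, 612),
  time_2 = c(NA, 742, NA, NA, 812, 212, 214, NA, 919, 528, NA, NA)
)

# summarise collapses each group to a single row
per_group <- df %>%
  group_by(user) %>%
  summarise(Smallest_time1 = min(time_1, na.rm = TRUE))

# mutate keeps every row and repeats the group minimum on each of them
per_row <- df %>%
  group_by(user) %>%
  mutate(Smallest_time1 = min(time_1, na.rm = TRUE)) %>%
  ungroup()

nrow(per_group)  # one row per user
nrow(per_row)    # same number of rows as df
```

Note that if a group had no non-missing time_1 values at all, min(..., na.rm = TRUE) would return Inf with a warning, so such groups may need separate handling.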
Mutate a grouped value (like a conditional mean)
Use group_by before mutate to create the mean column by group, instead of creating a summarised dataset and then joining it back to the original data:
library(dplyr)
mtcars %>%
group_by(cyl, carb) %>%
mutate(var1 = mean(mpg)) %>%
ungroup %>%
head
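For comparison, the two-step route that this answer avoids can be sketched as follows. The object names (with_mutate, group_means, with_join) are illustrative only; the sketch checks that both routes yield the same var1 column.

```r
library(dplyr)

# One step: grouped mutate
with_mutate <- mtcars %>%
  group_by(cyl, carb) %>%
  mutate(var1 = mean(mpg)) %>%
  ungroup()

# Two steps: summarise to one row per group, then join back
group_means <- mtcars %>%
  group_by(cyl, carb) %>%
  summarise(var1 = mean(mpg), .groups = "drop")

with_join <- left_join(mtcars, group_means, by = c("cyl", "carb"))

# Both approaches should give the same per-row group means
all.equal(with_mutate$var1, with_join$var1)
```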
R Calculate sum of values by unique column PAIRS (B-A and A-B) while keeping both pairs
This may also be done with pmin/pmax to create a grouping column:
library(dplyr)
library(stringr)
df1 %>%
group_by(Date, grp = str_c(pmin(ID1, ID2), pmax(ID1, ID2))) %>%
mutate(Sum = sum(Count)) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 6 × 5
  Date  ID1   ID2   Count   Sum
  <chr> <chr> <chr> <int> <int>
1 12-1  A     B         1     2
2 12-1  B     A         1     2
3 12-1  D     E         1     3
4 12-1  E     D         2     3
5 12-2  Y     Z         2     5
6 12-2  Z     Y         3     5
data
df1 <- structure(list(Date = c("12-1", "12-1", "12-1", "12-1", "12-2",
"12-2"), ID1 = c("A", "B", "D", "E", "Y", "Z"), ID2 = c("B",
"A", "E", "D", "Z", "Y"), Count = c(1L, 1L, 1L, 2L, 2L, 3L)),
class = "data.frame", row.names = c(NA,
-6L))
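To see why pmin/pmax gives both orderings of a pair the same key, here is a base R sketch of the same idea. It swaps paste and ave in for str_c and the dplyr pipeline, and the "_" separator is an added assumption to avoid accidental key collisions (for example "AB" + "C" versus "A" + "BC").

```r
df1 <- data.frame(
  Date  = c("12-1", "12-1", "12-1", "12-1", "12-2", "12-2"),
  ID1   = c("A", "B", "D", "E", "Y", "Z"),
  ID2   = c("B", "A", "E", "D", "Z", "Y"),
  Count = c(1L, 1L, 1L, 2L, 2L, 3L)
)

# pmin/pmax pick the alphabetically smaller/larger ID row by row,
# so A-B and B-A both map to the key "A_B"
key <- paste(pmin(df1$ID1, df1$ID2), pmax(df1$ID1, df1$ID2), sep = "_")
key

# Sum Count within each Date/key combination, keeping all rows
df1$Sum <- ave(df1$Count, df1$Date, key, FUN = sum)
df1
```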
Adding a column of means by group to original data
This is what the ave function is for.
df1$Y.New <- ave(df1$Y, df1$X)
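A small self-contained sketch, using made-up values for X and Y (the original question's data is not shown), illustrates what ave returns. By default it applies mean within each group; another function can be passed through the FUN argument, which must be named.

```r
# Hypothetical data: X is the grouping column, Y the values
df1 <- data.frame(
  X = c("a", "a", "b", "b", "b"),
  Y = c(1, 3, 2, 4, 6)
)

# Group mean repeated on every row of the group (FUN defaults to mean)
df1$Y.New <- ave(df1$Y, df1$X)

# A different summary: FUN has to be passed by name
df1$Y.Max <- ave(df1$Y, df1$X, FUN = max)

df1
```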
R - Grouping values within a df
Using data in a data.table, we can perform operations on variables by a grouping variable (in by=), then assign the result back to the data using the data.table assignment operator :=
library(data.table)
setDT(df)
df[, "family_income" := sum(income), by = id_family]
The data.table data structure is a pumped up version of the R data.frame, giving added functionality and efficiency gains. If DT is your data.table, DT[i, j, by] is the notation showing how we can use i to sort or subset data, j for selecting or computing on variables, and by to perform j-operations on groups. For example, for cars with over 100 horsepower, what is the mean fuel efficiency for automatic (0) and manual (1) cars?
dtcars <- data.table(mtcars)
dtcars[hp>100, mean(mpg), by=am]
Returns:
> dtcars[hp>100, mean(mpg), by=am]
am V1
1: 1 20.61429
2: 0 16.06875
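Putting the two pieces together, the following sketch uses a small invented table (the id_family and income values are assumptions, since the question's data is not shown) to contrast := with by, which adds a column in place, against a plain j expression with by, which returns one row per group.

```r
library(data.table)

# Hypothetical household data
df <- data.table(
  id_family = c(1, 1, 2, 2, 2),
  income    = c(100, 200, 50, 60, 70)
)

# := adds the group total to every row, modifying df by reference
df[, family_income := sum(income), by = id_family]

# Without :=, the result is a new table with one row per group;
# i (income > 55) filters rows before the grouped computation
res <- df[income > 55, .(mean_income = mean(income)), by = id_family]

df
res
```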
Create new column that takes the sum of another column values and group by condition in R
In dplyr, summary functions are usually used inside summarise, which returns one row per group. However, with group_by, mutate and ungroup, you can instead add the group summary as a column to the original data.
newdf <- df %>%
group_by(Building) %>%
mutate(PopSum = sum(Population, na.rm=TRUE)) %>%
ungroup()
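The trailing ungroup() matters for whatever comes next in the pipeline. The sketch below, on invented Building and Population values, shows that a later summarise still works per group if the grouping is left in place.

```r
library(dplyr)

# Hypothetical data with one missing population value
df <- data.frame(
  Building   = c("A", "A", "B"),
  Population = c(10, NA, 5)
)

still_grouped <- df %>%
  group_by(Building) %>%
  mutate(PopSum = sum(Population, na.rm = TRUE))

newdf <- still_grouped %>% ungroup()

is_grouped_df(still_grouped)  # grouping is still attached
is_grouped_df(newdf)          # grouping has been removed

# The same summarise call behaves differently on each object
by_building <- summarise(still_grouped, total = sum(Population, na.rm = TRUE))
overall     <- summarise(newdf, total = sum(Population, na.rm = TRUE))
```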
Calculating mean by group using dplyr in R
We can use group_by with mutate:
library(dplyr)
df <- df %>%
group_by(class) %>%
mutate(Mean = mean(x)) %>%
ungroup
-output
df
# A tibble: 6 x 3
        x class    Mean
    <dbl> <dbl>   <dbl>
1  2.43       1  1.05
2  0.0625     1  1.05
3  0.669      1  1.05
4  0.195      2 -0.0550
5  0.285      2 -0.0550
6 -0.644      2 -0.0550
data
df <- data.frame(x, class)