Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)
I believe you've loaded plyr after dplyr, which is why you are getting an overall summary instead of a grouped summary.
This is what happens with plyr loaded last.
library(dplyr)
library(plyr)
df %>%
  group_by(DRUG, FED) %>%
  summarize(mean = mean(AUC0t, na.rm = TRUE),
            low  = CI90lo(AUC0t),
            high = CI90hi(AUC0t),
            min  = min(AUC0t, na.rm = TRUE),
            max  = max(AUC0t, na.rm = TRUE),
            sd   = sd(AUC0t, na.rm = TRUE))
mean low high min max sd
1 150 105 195 100 200 50
Now remove plyr and try again and you get the grouped summary.
detach(package:plyr)
df %>%
  group_by(DRUG, FED) %>%
  summarize(mean = mean(AUC0t, na.rm = TRUE),
            low  = CI90lo(AUC0t),
            high = CI90hi(AUC0t),
            min  = min(AUC0t, na.rm = TRUE),
            max  = max(AUC0t, na.rm = TRUE),
            sd   = sd(AUC0t, na.rm = TRUE))
Source: local data frame [4 x 8]
Groups: DRUG
DRUG FED mean low high min max sd
1 0 0 150 150 150 150 150 NaN
2 0 1 NaN NA NA NA NA NaN
3 1 0 100 100 100 100 100 NaN
4 1 1 200 200 200 200 200 NaN
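One quick way to confirm which package a bare summarize call resolves to is to ask R for the environment the function lives in. This is a minimal sketch that assumes only dplyr is attached; the variable name owner is just for illustration.

```r
library(dplyr)

# Look up the namespace that owns whatever `summarize` currently resolves to.
# With only dplyr attached this is "dplyr"; if plyr were attached afterwards,
# the same call would report "plyr" instead.
owner <- environmentName(environment(summarize))
print(owner)
```

If this reports plyr, grouped summaries will collapse to a single row as shown above.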
dplyr: group_by + summarize not working as expected
Using $ extracts the whole column, so the grouping is ignored. Instead, use the unquoted column name, which gives only the values of Frequency within each Category:
library(dplyr)
table %>%
  group_by(Category) %>%
  summarize(meanfrequency = mean(Frequency))
# A tibble: 3 x 2
# Category meanfrequency
# <chr> <dbl>
#1 First 2
#2 Second 4.33
#3 Third 1.5
If we use table$Frequency inside the chain, it behaves just as it does outside: it returns the whole column regardless of the grouping. Also, R is case-sensitive, so it must be table$Frequency, not table$frequency.
mean(table$Frequency)
Also, table is the name of a base R function (and class), so it is better not to use it as an object name.
Data
table <- structure(list(Category = c("First", "First", "Second", "First",
"Third", "Third", "Second", "First", "Second"), Frequency = c(1L,
4L, 6L, 1L, 1L, 2L, 6L, 2L, 1L)), class = "data.frame", row.names = c(NA,
-9L))
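To make the difference concrete, here is a small sketch that runs both forms on the same data. The object is named freq_df (a hypothetical name, chosen to avoid masking the base table function); everything else comes from the example above.

```r
library(dplyr)

# Same data as above, under a name that does not clash with base::table.
freq_df <- data.frame(
  Category  = c("First", "First", "Second", "First", "Third",
                "Third", "Second", "First", "Second"),
  Frequency = c(1L, 4L, 6L, 1L, 1L, 2L, 6L, 2L, 1L)
)

# Unquoted column name: mean() sees only the rows of the current group.
grouped <- freq_df %>%
  group_by(Category) %>%
  summarize(meanfrequency = mean(Frequency))

# freq_df$Frequency: mean() sees the full column every time,
# so every group gets the same overall mean.
whole_column <- freq_df %>%
  group_by(Category) %>%
  summarize(meanfrequency = mean(freq_df$Frequency))

print(grouped)
print(whole_column)
```

The second result still has one row per Category, but all rows carry the same value, which is the symptom described in the question.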
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
The problem here is that you are loading dplyr first and then plyr, so plyr's function summarise is masking dplyr's function summarise. When that happens you get this warning:
library(plyr)
Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------
Attaching package: ‘plyr’
The following objects are masked from ‘package:dplyr’:
arrange, desc, failwith, id, mutate, summarise, summarize
So in order for your code to work, either detach plyr with detach(package:plyr), or restart R and load plyr first and then dplyr (or load only dplyr):
library(dplyr)
dfx %>%
  group_by(group, sex) %>%
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
Source: local data frame [6 x 4]
Groups: group
group sex mean sd
1 A F 41.51 8.24
2 A M 32.23 11.85
3 B F 38.79 11.93
4 B M 31.00 7.92
5 C F 24.97 7.46
6 C M 36.17 9.11
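The detach step can be written so that it is safe to run whether or not plyr is currently attached. This is only a sketch using base R's search() and detach(); the variable name pkg is illustrative.

```r
# Detach plyr only if it is on the search path, so the line does not
# raise an error in a fresh session where plyr was never attached.
pkg <- "package:plyr"
if (pkg %in% search()) {
  detach(pkg, character.only = TRUE)
}

# After this, plyr no longer appears on the search path.
print(pkg %in% search())
```

Note that detaching only removes plyr from the search path; a restart with the packages loaded in the right order is the cleaner fix.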
Or you can explicitly call dplyr's summarise in your code, so the right function will be called no matter how you load the packages:
dfx %>%
  group_by(group, sex) %>%
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
dplyr issues when using group_by(multiple variables)
Taking Dickoa's answer one step further: as Hadley says, "summarise peels off a single layer of grouping". It peels off the grouping variables in the reverse of the order in which you applied them, so you can just use
mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt)) %>%
  summarise(newvar2 = sum(newvar) + 5)
Note that this will give a different answer if you use group_by(gear, cyl) in the second line.
And to get your first attempt working:
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt))
df2 <- df1 %>%
  group_by(cyl) %>%
  summarise(newvar2 = sum(newvar) + 5)
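The "peeling" can be observed directly by inspecting the grouping variables after each summarise. This sketch assumes a dplyr version that provides group_vars(); the names step1 and step2 are illustrative.

```r
library(dplyr)

# First summarise: grouped by cyl and gear going in,
# and the last grouping variable (gear) is dropped coming out.
step1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt))
print(group_vars(step1))   # "cyl"

# Second summarise: works per cyl and drops the remaining layer.
step2 <- step1 %>%
  summarise(newvar2 = sum(newvar) + 5)
print(group_vars(step2))   # no grouping variables left
print(step2)
```

Recent dplyr versions may print a message about the resulting grouping after the first summarise; that message is informational and does not change the result.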
tidyverse-dplyr summarise not operating as expected
As said in the comments, the problem is that the plyr version of summarise is loaded after dplyr's, so when you call summarise you get the wrong one. You should try to load plyr first (or, much better, not load it at all), but you can also play it safe by being explicit about which version of summarise you want.
library(tidyverse)
DF = data.frame(COLUMN_NAME = c("PARTYID","PARTYID","AGE","AGE","SALESID","SALES"),
                DATA_TYPE = c("char","tinyint","int","smallint","varchar","numeric"))
# bad:
DF %>%
  group_by(COLUMN_NAME) %>%
  plyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) &
                                  !(all(grepl("char", DATA_TYPE)))))
# good:
DF %>%
  group_by(COLUMN_NAME) %>%
  dplyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) &
                                   !(all(grepl("char", DATA_TYPE)))))
If you really need plyr loaded as well as dplyr, it is a good idea to call summarise this way, and to do the same for the other key conflicts such as mutate. But it is better to avoid having both loaded at once.
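The same explicit-namespace habit carries over to mutate. The sketch below reuses the DF from above and adds a per-group row count; the column name n_types is hypothetical and only there to show that the grouping is respected.

```r
library(dplyr)

DF <- data.frame(
  COLUMN_NAME = c("PARTYID", "PARTYID", "AGE", "AGE", "SALESID", "SALES"),
  DATA_TYPE   = c("char", "tinyint", "int", "smallint", "varchar", "numeric")
)

# Namespacing each verb means the dplyr versions are used
# even if plyr happens to be attached later in the session.
counted <- DF %>%
  dplyr::group_by(COLUMN_NAME) %>%
  dplyr::mutate(n_types = dplyr::n()) %>%
  dplyr::ungroup()

print(counted)
```

Each row gets the number of rows in its own COLUMN_NAME group, which is what a grouped mutate should produce.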
group_by doesn't work properly on retrosheet data
This is because you have both the dplyr and plyr packages loaded at the same time. dplyr's summarize function is masked by the one from the plyr package.
Try this:
ll_data_frame %>%
  group_by(DayOfWeek) %>%
  dplyr::summarize(R = sum(HomeRunsScore))
ll_data_frame %>%
  group_by(VisitingTeam) %>%
  dplyr::summarize(R = sum(HomeRunsScore))