Count number of records returned by group by
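You can get both counts in one query by adding a second COUNT with an OVER clause. Because window functions are evaluated after GROUP BY, COUNT(*) OVER () returns the number of groups, that is, the number of rows the grouped query returns: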
select
count(*) RecordsPerGroup,
COUNT(*) OVER () AS TotalRecords
from temptable
group by column_1, column_2, column_3, column_4
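As a rough check of how the two counts differ, here is a minimal sketch that runs the same query shape against an in-memory SQLite database through Python's sqlite3 module. The engine choice is an assumption (window functions need SQLite 3.25 or later), and the table contents and the reduction to two grouping columns are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: two distinct (column_1, column_2) combinations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temptable (column_1 TEXT, column_2 TEXT)")
conn.executemany(
    "INSERT INTO temptable VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")],
)

# The window function runs after GROUP BY, so COUNT(*) OVER () counts
# the grouped result rows, not the rows of the underlying table.
rows = conn.execute(
    """
    SELECT column_1, column_2,
           COUNT(*)         AS RecordsPerGroup,
           COUNT(*) OVER () AS TotalRecords
    FROM temptable
    GROUP BY column_1, column_2
    ORDER BY column_1
    """
).fetchall()

for row in rows:
    print(row)
```

With this data, each result row carries its own group size (3 and 2) and the same TotalRecords value of 2, the number of groups.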
Count number of rows within each group
Current best practice (tidyverse) is:
library(dplyr)
df1 %>% count(Year, Month)
Count number of rows in a data frame in R based on group
Here's an example that shows how table(.) (or, more closely matching your desired output, data.frame(table(.))) does what it sounds like you are asking for.
Note also how to share reproducible sample data in a way that others can copy and paste into their session.
Here's the (reproducible) sample data:
mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L),
MONTH.YEAR = c("JAN. 2012", "JAN. 2012",
"FEB. 2012", "FEB. 2012",
"MAR. 2012"),
VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)),
.Names = c("ID", "MONTH.YEAR", "VALUE"),
class = "data.frame", row.names = c(NA, -5L))
mydf
# ID MONTH.YEAR VALUE
# 1 110 JAN. 2012 1000
# 2 111 JAN. 2012 2000
# 3 121 FEB. 2012 3000
# 4 131 FEB. 2012 4000
# 5 141 MAR. 2012 5000
Here's the calculation of the number of rows per group, in two output display formats:
table(mydf$MONTH.YEAR)
#
# FEB. 2012 JAN. 2012 MAR. 2012
# 2 2 1
data.frame(table(mydf$MONTH.YEAR))
# Var1 Freq
# 1 FEB. 2012 2
# 2 JAN. 2012 2
# 3 MAR. 2012 1
Count rows within each group when condition is satisfied in SQL Server
You can do this with two levels of aggregation:
select id, count(*) howManyMonths
from (
select id
from mytable
group by id, year(date), month(date)
having avg(1.0 * isFull) > 0.6
) t
group by id
The subquery aggregates by id, year and month, and uses a having clause to keep only the groups that meet the success rate (avg() comes in handy for this). The outer query counts how many months passed the target rate for each id.
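The two-level aggregation can be tried out with a minimal sketch in Python's sqlite3 module. This is an assumption-laden port, not the SQL Server original: SQLite has no year() or month() functions, so the sketch groups on strftime('%Y-%m', date) instead, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date TEXT, isFull INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        # id 1, January: 3 of 4 rows full (0.75 > 0.6), month passes
        (1, "2021-01-05", 1), (1, "2021-01-12", 1),
        (1, "2021-01-19", 1), (1, "2021-01-26", 0),
        # id 1, February: 1 of 2 rows full (0.5), month fails
        (1, "2021-02-02", 1), (1, "2021-02-09", 0),
        # id 2, January: 2 of 2 rows full, month passes
        (2, "2021-01-05", 1), (2, "2021-01-12", 1),
        # id 2, March: 1 of 1 rows full, month passes
        (2, "2021-03-03", 1),
    ],
)

# Inner query: one row per (id, month) that beats the 0.6 rate.
# Outer query: count those surviving months per id.
rows = conn.execute(
    """
    SELECT id, COUNT(*) AS howManyMonths
    FROM (
        SELECT id
        FROM mytable
        GROUP BY id, strftime('%Y-%m', date)
        HAVING AVG(1.0 * isFull) > 0.6
    ) AS t
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(rows)
```

One consequence of this shape is that an id with no passing month does not appear in the result at all, rather than appearing with a count of 0.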
How can I count the number of rows within each group using SQL?
Based on your description, you need something like this query:
SELECT T.SITE, T.DATE, COUNT(*) AS CNTT
FROM YOUR_TABLE AS T
GROUP BY T.SITE, T.DATE
How can I count the number of rows per group in Pandas?
For pandas 0.25+, use named aggregation:
df.groupby(['year_of_award']).agg(number_of_rows=('award', 'count'))
For older versions, aggregate and then rename the resulting column:
df.groupby(['year_of_award']).agg({'award': 'count'}).rename(columns={'award': 'number_of_rows'})
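One detail worth checking on your own data: 'count' counts only the non-missing values of the chosen column, while size() counts every row in the group. The sketch below uses a small made-up frame (the values are hypothetical) to show where the two diverge.

```python
import pandas as pd

# Hypothetical data: one award value is missing in 2020.
df = pd.DataFrame(
    {
        "year_of_award": [2019, 2019, 2020, 2020, 2020],
        "award": ["A", "B", "C", None, "D"],
    }
)

grouped = df.groupby("year_of_award")

# count skips missing values in the 'award' column.
non_null = grouped.agg(number_of_rows=("award", "count"))

# size() counts every row in the group, missing values included.
all_rows = grouped.size().rename("number_of_rows")

print(non_null["number_of_rows"].to_dict())
print(all_rows.to_dict())
```

If the column has no missing values, both approaches give the same numbers; size() is the safer choice when the goal is literally the number of rows per group.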
How to count how many rows inside a group by group meets a certain criteria
I would suggest using CASE WHEN (standard ISO SQL syntax), as in this example:
SELECT a.category,
SUM(CASE WHEN a.is_interesting = 1 THEN 1 END) AS conditional_count,
COUNT(*) group_count
FROM a
GROUP BY a.category
This will sum up values of 1 and null values (when the condition is false), which comes down to actually counting the records that meet the condition.
However, this returns null when no records meet the condition. If you want 0 in that case, you can either wrap the SUM like this:
COALESCE(SUM(CASE WHEN a.is_interesting = 1 THEN 1 END), 0)
or, shorter, use COUNT instead of SUM:
COUNT(CASE WHEN a.is_interesting = 1 THEN 1 END)
For COUNT it does not matter what value you put in the THEN clause, as long as it is not null. It counts the rows where the expression is not null.
Adding an ELSE 0 clause also makes SUM return 0 in most cases:
SUM(CASE WHEN a.is_interesting = 1 THEN 1 ELSE 0 END)
There is, however, one boundary case where that SUM will still return null: when there is no GROUP BY clause and no records meet the WHERE clause. For instance:
SELECT SUM(CASE WHEN 1 = 1 THEN 1 ELSE 0 END)
FROM a
WHERE 1 = 0
will return null, while the COUNT or COALESCE versions will still return 0.
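The variants discussed above can be compared side by side with a minimal sketch in Python's sqlite3 module. SQLite is an assumption here, since the answer targets no specific engine, and the table contents are invented; SQL null shows up as None in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (category TEXT, is_interesting INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    # 'toys' has no interesting rows, so it exercises the null case.
    [("books", 1), ("books", 0), ("books", 1), ("toys", 0), ("toys", 0)],
)

per_group = conn.execute(
    """
    SELECT category,
           SUM(CASE WHEN is_interesting = 1 THEN 1 END)              AS sum_no_else,
           COALESCE(SUM(CASE WHEN is_interesting = 1 THEN 1 END), 0) AS sum_coalesce,
           COUNT(CASE WHEN is_interesting = 1 THEN 1 END)            AS count_case,
           SUM(CASE WHEN is_interesting = 1 THEN 1 ELSE 0 END)       AS sum_else_zero,
           COUNT(*)                                                  AS group_count
    FROM a
    GROUP BY category
    ORDER BY category
    """
).fetchall()

# Boundary case: no GROUP BY and no row passes the WHERE clause.
no_rows = conn.execute(
    """
    SELECT SUM(CASE WHEN 1 = 1 THEN 1 ELSE 0 END),
           COUNT(CASE WHEN 1 = 1 THEN 1 END)
    FROM a
    WHERE 1 = 0
    """
).fetchone()

print(per_group)
print(no_rows)
```

For the 'toys' group only the plain SUM without ELSE yields None; in the boundary query the SUM with ELSE 0 yields None as well, while COUNT still yields 0.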
Add a column that count number of rows until the first 1, by group in R
df <- data.frame(Group=c(1,1,1,1,2,2),
var1=c(1,0,0,1,1,1),
var2=c(0,0,1,1,0,0),
var3=c(0,1,0,0,0,1))
This works for any number of variables as long as the structure is the same as in the example (i.e. Group plus any number of variables that are 0 or 1):
library(dplyr)
library(tidyr)
df %>%
mutate(rownr = row_number()) %>%
pivot_longer(-c(Group, rownr)) %>%
group_by(Group, name) %>%
mutate(out = cumsum(value != 1 & (cumsum(value) < 1)) + 1,
out = ifelse(max(out) > n(), 0, max(out))) %>%
pivot_wider(names_from = name, values_from = c(value, out)) %>%
select(-rownr)
Returns:
Group value_var1 value_var2 value_var3 out_var1 out_var2 out_var3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 0 1 3 2
2 1 0 0 1 1 3 2
3 1 0 1 0 1 3 2
4 1 1 1 0 1 3 2
5 2 1 0 0 1 0 2
6 2 1 0 1 1 0 2
How can I count a number of conditional rows within r dplyr mutate?
Here is a dplyr-only solution. The trick is to subtract the running count of X (cumsum(Product == "X")) from the total count of X (sum(Product == "X")) within each Customer group:
library(dplyr)
df %>%
arrange(Customer, Date) %>%
group_by(Customer) %>%
mutate(nSubsqX1 = sum(Product=="X") - cumsum(Product=="X"))
Date Customer Product nSubsqX1
<date> <chr> <chr> <int>
1 2020-05-18 A X 0
2 2020-02-10 B X 5
3 2020-02-12 B Y 5
4 2020-03-04 B Z 5
5 2020-03-29 B X 4
6 2020-04-08 B X 3
7 2020-04-30 B X 2
8 2020-05-13 B X 1
9 2020-05-23 B Y 1
10 2020-07-02 B Y 1
11 2020-08-26 B Y 1
12 2020-12-06 B X 0
13 2020-01-31 C X 3
14 2020-09-19 C X 2
15 2020-10-13 C X 1
16 2020-11-11 C X 0
17 2020-12-26 C Y 0
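The arithmetic behind the sum-minus-cumsum trick can be traced by hand on a single group. The sketch below does this in plain Python for one hypothetical customer whose purchases are already sorted by date; the product list is invented for illustration.

```python
from itertools import accumulate

# Hypothetical purchases for one customer, already sorted by date.
products = ["X", "Y", "Z", "X", "X"]

is_x = [int(p == "X") for p in products]  # 1 where the product is X
total_x = sum(is_x)                       # plays the role of sum(Product == "X")
running_x = list(accumulate(is_x))        # plays the role of cumsum(Product == "X")

# Xs that come strictly after each row = total minus the running count so far.
remaining_x = [total_x - r for r in running_x]

print(running_x)
print(remaining_x)
```

Because the running count already includes the current row, an X row does not count itself as a subsequent X.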
If the number of rows in a group exceeds X number of observations, randomly sample X number of rows
Here is one way: group by the group column and put a condition in slice. If the number of rows in the group (n()) is at least X, sample X of the row indices (row_number()) without replacement; otherwise keep every row of the group, here in shuffled order via sample(row_number()):
library(dplyr)
X <- 2
df %>%
group_by(group) %>%
slice(if(n() >= X) sample(row_number(), X, replace = FALSE) else
sample(row_number())) %>%
ungroup()
Output:
# A tibble: 5 × 2
id group
<int> <int>
1 10 1
2 8 2
3 4 2
4 1 3
5 9 3
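The same cap-per-group idea can be sketched outside R. The following plain-Python version is an illustration only: the helper name sample_per_group, its parameters, and the sample rows are all hypothetical, and it mirrors the logic of taking X rows when a group has at least X and every row (shuffled) otherwise.

```python
import random
from collections import defaultdict


def sample_per_group(rows, key, x, seed=None):
    """Return at most x randomly chosen rows from each group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    sampled = []
    for members in groups.values():
        # Take x rows when the group is large enough, else all of them;
        # sampling len(members) items returns the group in shuffled order.
        k = min(x, len(members))
        sampled.extend(rng.sample(members, k))
    return sampled


# Hypothetical data: group 1 has 1 row, group 2 has 3, group 3 has 4.
rows = [
    {"id": i, "group": g}
    for i, g in enumerate([1, 2, 2, 2, 3, 3, 3, 3], start=1)
]

result = sample_per_group(rows, key="group", x=2, seed=42)
print(result)
```

Passing a seed makes the draw repeatable, which helps when checking that no group exceeds the cap.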