count unique combinations of values
The count function in the plyr package will do that task.
> df
  ID value.1 value.2 value.3 value.4
1  1       M       D       F       A
2  2       F       M       G       B
3  3       M       D       F       A
4  4       L       D       E       B
> library(plyr)
> count(df[, -1])
  value.1 value.2 value.3 value.4 freq
1       F       M       G       B    1
2       L       D       E       B    1
3       M       D       F       A    2
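For readers working in Python rather than R, the same tally can be sketched with the standard library alone. This is a minimal sketch and not part of the original answer; the rows list is a hand-copied stand-in for the df shown above, with the ID column already dropped.

```python
from collections import Counter

# Stand-in for df[, -1]: one tuple per row, ID column omitted.
rows = [
    ("M", "D", "F", "A"),
    ("F", "M", "G", "B"),
    ("M", "D", "F", "A"),
    ("L", "D", "E", "B"),
]

# Each distinct tuple is one combination; Counter tallies how often it occurs.
freq = Counter(rows)

# Print combinations in sorted order, followed by their frequency.
for combo, n in sorted(freq.items()):
    print(*combo, n)
```

Tuples are hashable, which is what lets a whole row act as a single counting key.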
Counting unique combinations of values across multiple columns regardless of order?
Assuming the character / doesn't show up in any of the offer names, you can do:
select count(distinct offer_combo) as distinct_offers
from (
select listagg(offer, '/') within group (order by offer) as offer_combo
from (
select customer_id, offer_1 as offer from t
union all select customer_id, offer_2 from t
union all select customer_id, offer_3 from t
) x
group by customer_id
) y
Result:
DISTINCT_OFFERS
---------------
2
See running example at db<>fiddle.
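The normalisation the query relies on (sort each customer's offers, join them with a separator, then count the distinct strings) can be sketched in Python. The table contents below are hypothetical, since the original t is not shown, and the sketch assumes that no offer is missing.

```python
# Hypothetical rows of (customer_id, offer_1, offer_2, offer_3).
t = [
    (1, "A", "B", "C"),
    (2, "C", "A", "B"),  # same offers as customer 1, different column order
    (3, "A", "B", "D"),
]

# Sorting before joining makes the key independent of column order.
# As in the SQL, "/" must not appear inside any offer name.
offer_combos = {"/".join(sorted(offers)) for _, *offers in t}

distinct_offers = len(offer_combos)
print(distinct_offers)
```

Customers 1 and 2 collapse to the same key, so only two distinct combinations remain in this made-up data.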
Count unique combinations regardless of column order
Another solution, using .groupby:
x = (
df1.groupby(df1.apply(lambda x: tuple(sorted(x)), axis=1))
.agg(A=("A", "first"), B=("B", "first"), count=("B", "size"))
.reset_index(drop=True)
)
print(x)
Prints:
A B count
0 cat bunny 1
1 bunny mouse 2
2 dog cat 3
3 mouse dog 1
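To make the grouping key explicit, here is a plain-Python sketch of what the snippet above does: the sorted pair is the key, the first row seen supplies the displayed A and B values, and every matching row increments the count. The rows are hypothetical (df1 is not shown in the answer) and were chosen only so that the result lines up with the printed table.

```python
# Hypothetical (A, B) rows standing in for df1.
rows = [
    ("cat", "bunny"),
    ("bunny", "mouse"),
    ("dog", "cat"),
    ("mouse", "bunny"),
    ("cat", "dog"),
    ("mouse", "dog"),
    ("dog", "cat"),
]

groups = {}
for a, b in rows:
    key = tuple(sorted((a, b)))  # order-independent key
    if key not in groups:
        # Remember the first row seen for this key, like agg(..., "first").
        groups[key] = {"A": a, "B": b, "count": 0}
    groups[key]["count"] += 1

# groupby sorts by key by default, so iterate in sorted key order.
for key in sorted(groups):
    g = groups[key]
    print(g["A"], g["B"], g["count"])
```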
How to count unique combinations of values in selected columns in pandas data frame including frequencies with the value of 0?
Use Series.reindex with MultiIndex.from_product:
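import pandas as pd

# count the rows for each (Colour, TOY_ID) pair that occurs in the data
s = df.groupby(['Colour', 'TOY_ID']).size()
# reindex over every possible pair, filling the missing ones with 0
s = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)
print(s)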
Colour  TOY_ID
Blue    31490.0       50
        31569.0       50
        50360636.0    20
        50366678.0     0
Green   31490.0       17
        31569.0        0
        50360636.0     0
        50366678.0    10
Yellow  31490.0        0
        31569.0        0
        50360636.0    25
        50366678.0     9
Name: a, dtype: int64
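The idea behind the reindex step (take the Cartesian product of the observed levels and fill unseen pairs with 0) can also be sketched without pandas. The observations below are hypothetical and far smaller than the data in the question.

```python
from collections import Counter
from itertools import product

# Hypothetical (Colour, TOY_ID) observations.
observations = [
    ("Blue", 31490),
    ("Blue", 31490),
    ("Green", 31490),
    ("Yellow", 50360636),
]

counts = Counter(observations)

# The distinct values seen in each column, analogous to s.index.levels.
colours = sorted({colour for colour, _ in observations})
toy_ids = sorted({toy_id for _, toy_id in observations})

# Every possible pair gets an entry; a Counter returns 0 for unseen keys.
full = {pair: counts[pair] for pair in product(colours, toy_ids)}

for (colour, toy_id), n in full.items():
    print(colour, toy_id, n)
```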
count unique combinations of variable values in an R dataframe column
An option with the tidyverse: group by 'id', paste the 'status' values of each group together, and count the resulting strings.
library(dplyr)
library(stringr)
df %>%
group_by(id) %>%
summarise(status = str_c(status, collapse="")) %>%
count(status)
# A tibble: 4 x 2
# status n
# <chr> <int>
#1 abc 2
#2 b 1
#3 bc 2
#4 bcd 2
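The same two steps (collapse each id's statuses into one string, then count the strings) can be sketched in Python. The records below are hypothetical, since the original df is not shown, so the counts do not match the tibble above.

```python
from collections import Counter, defaultdict

# Hypothetical (id, status) records, in row order.
records = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "b"),
    (3, "b"), (3, "c"),
    (4, "a"), (4, "b"), (4, "c"),
]

# Step 1: collect the statuses of each id, preserving row order.
by_id = defaultdict(list)
for id_, status in records:
    by_id[id_].append(status)

# Step 2: join each id's statuses and count identical strings.
status_counts = Counter("".join(statuses) for statuses in by_id.values())

for combo, n in sorted(status_counts.items()):
    print(combo, n)
```

As in the R version, the joined string depends on row order, so "bc" and "cb" would count as different combinations unless the statuses are sorted first.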
Add a count of unique combinations across rows in pandas
Yes, I think you can use:
cols = df.columns.difference(['id']).tolist()
# should work like
# cols = ['cat_1', 'cat_2', 'cat_3', 'cat_4', 'cat_5', 'cat_6', 'cat_7']
df = df.groupby(cols, sort=False).size().reset_index(name='count')
print(df)
cat_1 cat_2 cat_3 cat_4 count
0 Chips Null Null Null 1
1 Chips Avocado Null Null 1
2 Chips Pasta Null Null 2
3 Chips Pasta Cheese Null 1
4 Chips Sauce Cheese Null 1
5 Pasta Null Null Null 2
6 Pasta Bread Null Null 2
7 Pasta Cheese Null Null 1
Count unique combinations and summarize other columns in a new one
We could return the remaining column as a list column:
library(data.table)
dt[, .(N = .N, new_col = .(d)), by = .(a, b, c)]
a b c N new_col
<char> <char> <char> <int> <list>
1: 1a 1b 1c 2 n1,n2
2: 2a 2b 2c 4 n1,n2,n3,n4
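For comparison, a plain-Python sketch of the same summary: one entry per (a, b, c) combination, holding the row count and the collected d values. The rows are hypothetical, chosen to be consistent with the output above because dt itself is not shown.

```python
# Hypothetical (a, b, c, d) rows standing in for dt.
rows = [
    ("1a", "1b", "1c", "n1"),
    ("1a", "1b", "1c", "n2"),
    ("2a", "2b", "2c", "n1"),
    ("2a", "2b", "2c", "n2"),
    ("2a", "2b", "2c", "n3"),
    ("2a", "2b", "2c", "n4"),
]

summary = {}
for a, b, c, d in rows:
    # Create the entry on first sight of a combination, then update it.
    entry = summary.setdefault((a, b, c), {"N": 0, "new_col": []})
    entry["N"] += 1
    entry["new_col"].append(d)

for key, entry in summary.items():
    print(*key, entry["N"], ",".join(entry["new_col"]))
```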
unique combinations of values in selected columns in pandas data frame and count
You can groupby on cols 'A' and 'B', call size, and then reset_index and rename the generated column:
In [26]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[26]:
A B count
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
update
A little explanation: grouping on the two columns collects the rows where the A and B values are the same. We then call size, which returns the number of rows in each group:
In[202]:
df1.groupby(['A','B']).size()
Out[202]:
A    B
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64
So now, to restore the grouped columns, we call reset_index:
In[203]:
df1.groupby(['A','B']).size().reset_index()
Out[203]:
A B 0
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
This restores the grouped columns, but the size aggregation ends up in a generated column named 0, so we have to rename it:
In[204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[204]:
A B count
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
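groupby does accept the argument as_index, which we could have set to False so that it doesn't make the grouped columns the index. In the pandas version used here, however, size still returns a Series, so you would still have to restore the index and so on (from pandas 1.1 onward, this call returns a DataFrame with a size column instead):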
In[205]:
df1.groupby(['A','B'], as_index=False).size()
Out[205]:
A    B
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64