Count Unique Combinations of Values

count unique combinations of values

The count function in the plyr package does this: it returns each distinct combination of values together with its frequency.

> df
  ID   value.1   value.2   value.3 value.4
1  1     M         D         F           A
2  2     F         M         G           B
3  3     M         D         F           A
4  4     L         D         E           B
> library(plyr)
> count(df[, -1])
    value.1   value.2   value.3 value.4 freq
1     F         M         G           B    1
2     L         D         E           B    1
3     M         D         F           A    2
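For comparison, the same tally can be sketched in plain Python by treating each row (without the ID) as a tuple and counting with collections.Counter. The rows are copied from the example df above; the variable names are only illustrative.

```python
from collections import Counter

# Rows from the example data frame, with the ID column already dropped.
rows = [
    ("M", "D", "F", "A"),
    ("F", "M", "G", "B"),
    ("M", "D", "F", "A"),
    ("L", "D", "E", "B"),
]

# Each distinct tuple is one combination; Counter tallies how often it occurs.
freq = Counter(rows)

# Print the combinations in sorted order, followed by their frequency.
for combo, n in sorted(freq.items()):
    print(*combo, n)
```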

Counting unique combinations of values across multiple columns regardless of order?

Assuming the character / doesn't show up in any of the offer names, you can do:

select count(distinct offer_combo) as distinct_offers
from (
  select listagg(offer, '/') within group (order by offer) as offer_combo
  from (
    select customer_id, offer_1 as offer from t
    union all select customer_id, offer_2 from t
    union all select customer_id, offer_3 from t
  ) x
  group by customer_id
) y

Result:

DISTINCT_OFFERS
---------------
2

See running example at db<>fiddle.
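The idea behind the query (unpivot the offers, sort them per customer, then count the distinct sorted results) can also be sketched in Python. The sample rows below are hypothetical and only stand in for table t.

```python
# Hypothetical sample rows: (customer_id, offer_1, offer_2, offer_3).
customers = [
    (1, "A", "B", "C"),
    (2, "C", "A", "B"),  # same offers as customer 1, in a different order
    (3, "A", "B", "D"),
]

# Sorting the offers gives every ordering of the same offers one canonical key.
combos = {tuple(sorted(offers)) for _customer_id, *offers in customers}

distinct_offers = len(combos)
print(distinct_offers)
```

Because the key is a tuple rather than a concatenated string, this variant does not depend on a delimiter that is absent from the offer names.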

Count unique combinations regardless of column order

Another solution, using .groupby:

x = (
    df1.groupby(df1.apply(lambda x: tuple(sorted(x)), axis=1))
    .agg(A=("A", "first"), B=("B", "first"), count=("B", "size"))
    .reset_index(drop=True)
)
print(x)

Prints:

       A      B  count
0    cat  bunny      1
1  bunny  mouse      2
2    dog    cat      3
3  mouse    dog      1
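To try this end to end, here is a self-contained sketch. The input df1 is hypothetical: it was chosen so that the result matches the output printed above, since the original frame is not shown.

```python
import pandas as pd

# Hypothetical input, picked to be consistent with the printed result.
df1 = pd.DataFrame({
    "A": ["cat", "bunny", "mouse", "dog", "cat", "dog", "mouse"],
    "B": ["bunny", "mouse", "bunny", "cat", "dog", "cat", "dog"],
})

# One order-independent key per row: the sorted pair of values.
key = df1.apply(lambda row: tuple(sorted(row)), axis=1)

x = (
    df1.groupby(key)
    # Keep the first A/B seen in each group and count the rows in it.
    .agg(A=("A", "first"), B=("B", "first"), count=("B", "size"))
    .reset_index(drop=True)
)
print(x)
```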

How to count unique combinations of values in selected columns in pandas data frame including frequencies with the value of 0?

Use Series.reindex with MultiIndex.from_product:

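import pandas as pd

# Count the rows in each observed (Colour, TOY_ID) pair.
s = df.groupby(['Colour', 'TOY_ID']).size()

# Reindex over every possible pair; pairs that never occur get a count of 0.
s = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)
print(s)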
Colour  TOY_ID    
Blue    31490.0       50
        31569.0       50
        50360636.0    20
        50366678.0     0
Green   31490.0       17
        31569.0        0
        50360636.0     0
        50366678.0    10
Yellow  31490.0        0
        31569.0        0
        50360636.0    25
        50366678.0     9
Name: a, dtype: int64
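A smaller, self-contained sketch of the same reindexing step may make the effect easier to see. The data here is made up for illustration, and passing names= is an optional extra that just keeps the level names explicit.

```python
import pandas as pd

# Hypothetical data: not every Colour/TOY_ID pair occurs.
df = pd.DataFrame({
    "Colour": ["Blue", "Blue", "Green", "Yellow"],
    "TOY_ID": [1, 2, 1, 2],
})

# Sizes of the pairs that actually occur.
s = df.groupby(["Colour", "TOY_ID"]).size()

# Cartesian product of the observed level values, keeping the level names.
full = pd.MultiIndex.from_product(s.index.levels, names=s.index.names)

# Missing pairs (Green/2 and Yellow/1 here) are filled with 0.
s = s.reindex(full, fill_value=0)
print(s)
```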

count unique combinations of variable values in an R dataframe column

An option with tidyverse: group by 'id', paste the 'status' values of each group into a single string, and then count how often each resulting string occurs.

library(dplyr)
library(stringr)
df %>% 
   group_by(id) %>% 
   summarise(status = str_c(status, collapse="")) %>% 
   count(status)
# A tibble: 4 x 2
#  status     n
#  <chr>  <int>
#1 abc        2
#2 b          1
#3 bc         2
#4 bcd        2
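The same group, paste, and count steps can be sketched in plain Python. The (id, status) records are hypothetical, and the names are only illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical (id, status) rows, in the order they appear in the data.
records = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "b"),
    (3, "b"), (3, "c"),
    (4, "a"), (4, "b"), (4, "c"),
]

# Step 1: collect the statuses of each id, preserving row order.
by_id = defaultdict(list)
for id_, status in records:
    by_id[id_].append(status)

# Step 2: paste each id's statuses together and count the resulting strings.
pattern_counts = Counter("".join(statuses) for statuses in by_id.values())
print(pattern_counts)
```

Note that the statuses are concatenated in row order, so "ab" and "ba" would count as different combinations; sort each group first if order should not matter.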

Add a count unique combinations across rows in pandas

Yes, you can group by every column except id and take the size of each group:

cols = df.columns.difference(['id']).tolist()
# this should be equivalent to
# cols = ['cat_1','cat_2', 'cat_3', 'cat_4', 'cat_5', 'cat_6', 'cat_7']
df = df.groupby(cols, sort=False).size().reset_index(name='count')
print(df)
   cat_1    cat_2   cat_3 cat_4  count
0  Chips     Null    Null  Null      1
1  Chips  Avocado    Null  Null      1
2  Chips    Pasta    Null  Null      2
3  Chips    Pasta  Cheese  Null      1
4  Chips    Sauce  Cheese  Null      1
5  Pasta     Null    Null  Null      2
6  Pasta    Bread    Null  Null      2
7  Pasta   Cheese    Null  Null      1
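One caveat worth a quick sketch: the output above contains the string "Null". If the missing categories were real NaN values instead, groupby would drop those rows by default. Assuming pandas 1.1 or later, dropna=False keeps them. The small frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where a missing category is a real NaN, not the string "Null".
df = pd.DataFrame({
    "id": [1, 2, 3],
    "cat_1": ["Chips", "Pasta", "Chips"],
    "cat_2": ["Pasta", np.nan, "Pasta"],
})

cols = df.columns.difference(["id"]).tolist()

# Default behaviour: rows with NaN in a grouping column are left out.
default_counts = df.groupby(cols, sort=False).size().reset_index(name="count")

# dropna=False (pandas >= 1.1) keeps the combination that contains NaN.
counts = df.groupby(cols, sort=False, dropna=False).size().reset_index(name="count")
print(counts)
```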

Count unique combinations in and summarize other columns in new one

We could return the values of the other column as a list column, next to the group count:

library(data.table)
dt[, .(N = .N, new_col = .(d)), by = .(a, b, c)]
        a      b      c     N     new_col
   <char> <char> <char> <int>      <list>
1:     1a     1b     1c     2       n1,n2
2:     2a     2b     2c     4 n1,n2,n3,n4
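For readers working in pandas rather than data.table, a rough equivalent might look like the sketch below. The frame is rebuilt from the output above, and the column names N and new_col simply mirror the data.table result.

```python
import pandas as pd

# Data reconstructed from the result shown above.
dt = pd.DataFrame({
    "a": ["1a", "1a", "2a", "2a", "2a", "2a"],
    "b": ["1b", "1b", "2b", "2b", "2b", "2b"],
    "c": ["1c", "1c", "2c", "2c", "2c", "2c"],
    "d": ["n1", "n2", "n1", "n2", "n3", "n4"],
})

out = (
    dt.groupby(["a", "b", "c"])
    # N is the group size; new_col gathers the d values of the group in a list.
    .agg(N=("d", "size"), new_col=("d", list))
    .reset_index()
)
print(out)
```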

unique combinations of values in selected columns in pandas data frame and count

You can group by columns 'A' and 'B', call size, then call reset_index and rename the generated column:

In [26]:

df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[26]:
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3

Update

A little explanation: grouping on the two columns puts rows with the same A and B values into the same group, and size then returns the number of rows in each group:

In[202]:
df1.groupby(['A','B']).size()

Out[202]: 
A    B  
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64

So now to restore the grouped columns, we call reset_index:

In[203]:
df1.groupby(['A','B']).size().reset_index()

Out[203]: 
     A    B  0
0   no   no  1
1   no  yes  2
2  yes   no  4
3  yes  yes  3

This turns the index levels back into regular columns, but the sizes end up in a generated column named 0, so we have to rename it:

In[204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})

Out[204]: 
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3

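groupby also accepts the argument as_index, which we could have set to False so that the grouped columns do not become the index. In the pandas version used here, size still returns a Series indexed by A and B, so you would have to restore the index anyway (from pandas 1.1 onward, size with as_index=False returns a DataFrame with a size column instead):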

In[205]:
df1.groupby(['A','B'], as_index=False).size()

Out[205]: 
A    B  
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64
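As a shorter variant of the steps above, reset_index accepts a name for the column that holds the Series values, which makes the separate rename unnecessary. The df1 below is hypothetical and was chosen to reproduce the counts shown in this answer.

```python
import pandas as pd

# Hypothetical input that yields the counts shown above (1, 2, 4, 3).
df1 = pd.DataFrame({
    "A": ["no", "no", "no", "yes", "yes", "yes", "yes", "yes", "yes", "yes"],
    "B": ["no", "yes", "yes", "no", "no", "no", "no", "yes", "yes", "yes"],
})

# size() gives a Series; name= labels its values when the index is reset.
counts = df1.groupby(["A", "B"]).size().reset_index(name="count")
print(counts)
```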
