Calculate Max Value Across Multiple Columns by Multiple Groups

A solution using data.table: find the maximum across columns 3:5 (the Score columns) within each combination of ID and Group.

library(data.table)
setDT(d)
d[, .(Max = do.call(max, .SD)), .SDcols = 3:5, .(ID, Group)]

   ID Group Max
1: a1   abc  11
2: a1   def   5
3: a2   def  11

Data:

d <- structure(list(
  ID     = structure(c(1L, 1L, 1L, 2L), .Label = c("a1", "a2"), class = "factor"),
  Group  = structure(c(1L, 1L, 2L, 2L), .Label = c("abc", "def"), class = "factor"),
  Score1 = c(10L, 0L, 0L, 5L),
  Score2 = c(0L, 0L, 5L, 10L),
  Score3 = c(0L, 11L, 2L, 11L)
), class = "data.frame", row.names = c(NA, -4L))
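
For comparison, the same computation can be sketched in pandas. This is only an illustrative translation of the data.table call, not part of the original answer: it takes the row-wise maximum of the Score columns first and then the maximum of that per (ID, Group). The variable names `score_cols` and `result` are made up for the example.

```python
import pandas as pd

# Same four rows as the R data frame `d` in the data.table answer.
d = pd.DataFrame({
    "ID": ["a1", "a1", "a1", "a2"],
    "Group": ["abc", "abc", "def", "def"],
    "Score1": [10, 0, 0, 5],
    "Score2": [0, 0, 5, 10],
    "Score3": [0, 11, 2, 11],
})

score_cols = ["Score1", "Score2", "Score3"]

# Row-wise max over the score columns, then the max of that per (ID, Group).
result = (
    d.assign(Max=d[score_cols].max(axis=1))
     .groupby(["ID", "Group"], as_index=False)["Max"]
     .max()
)
print(result)
```

The maximum of row-wise maxima equals the maximum over all score cells in the group, so this should agree with the data.table result.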

Select the row with the maximum value in each group based on multiple columns in R dplyr

We can compute the row-wise maximum of the 'count' columns with pmax, group by 'col1', and keep the rows whose 'Max' value equals the maximum of 'Max' within the group.

library(dplyr)
df1 %>% 
 mutate(Max = pmax(count_col1, count_col2) ) %>%
 group_by(col1) %>%
 filter(Max == max(Max)) %>%
 ungroup %>%
 select(-Max)

Output:

# A tibble: 3 × 4
  col1   col2   count_col1 count_col2
  <chr>  <chr>       <dbl>      <dbl>
1 apple  aple            1          4
2 banana banan           4          1
3 banana bananb          4          1

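We may also use slice_max, which keeps ties by default:

# purrr::invoke() is deprecated as of purrr 1.0.0; base R do.call() does the same job here
df1 %>%
  group_by(col1) %>%
  slice_max(do.call(pmax, across(starts_with("count")))) %>%
  ungroup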
# A tibble: 3 × 4
  col1   col2   count_col1 count_col2
  <chr>  <chr>       <dbl>      <dbl>
1 apple  aple            1          4
2 banana banan           4          1
3 banana bananb          4          1
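
A rough pandas counterpart of the pmax-and-filter approach is sketched below. The input frame is hypothetical: the source shows only the filtered output, so the two extra rows (`appl` and `bnn`) are invented so that something gets filtered out. Ties within a group are kept, as in the dplyr version.

```python
import pandas as pd

# Hypothetical input; only the three rows in the output above come from the source.
df1 = pd.DataFrame({
    "col1": ["apple", "apple", "banana", "banana", "banana"],
    "col2": ["aple", "appl", "banan", "bananb", "bnn"],
    "count_col1": [1, 2, 4, 4, 2],
    "count_col2": [4, 3, 1, 1, 2],
})

count_cols = [c for c in df1.columns if c.startswith("count")]

# Row-wise max of the count columns (the pmax step).
row_max = df1[count_cols].max(axis=1)

# Largest row-wise max within each col1 group, broadcast back to every row.
group_max = row_max.groupby(df1["col1"]).transform("max")

# Keep rows that attain their group's maximum (ties included).
out = df1[row_max == group_max].reset_index(drop=True)
print(out)
```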

How to get the max value of a multiple column group-by pandas?

If you need the bookid and conceptid for the maximum weight, try this:

annotations.loc[annotations.groupby(['bookid'], sort=False)['weight'].idxmax(), ['bookid', 'conceptid', 'weight']]

Note: .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so use .loc for this lookup.
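
A small self-contained sketch of the idxmax lookup using .loc follows. The `annotations` data is made up for illustration; only the column names come from the answer.

```python
import pandas as pd

# Hypothetical data; the column names follow the answer above.
annotations = pd.DataFrame({
    "bookid": [1, 1, 2, 2, 2],
    "conceptid": [10, 11, 20, 21, 22],
    "weight": [0.3, 0.9, 0.5, 0.1, 0.7],
})

# Index label of the row with the largest weight in each bookid group.
best_idx = annotations.groupby("bookid", sort=False)["weight"].idxmax()

# Label-based selection of those rows and the columns of interest.
best = annotations.loc[best_idx, ["bookid", "conceptid", "weight"]]
print(best)
```

Note that idxmax returns only the first row when several rows in a group share the maximum weight.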

Find maximum value of one column based on group_by multiple other columns

We can use slice_max instead of summarise to return all the columns after the select step

library(dplyr)
df_k %>%
  group_by(COUNTRY, date_start) %>%
  select(-code) %>%
  slice_max(order_by = ord, n = 1)

If we need to create a new column, use mutate

df_k %>%
    group_by(COUNTRY, date_start) %>%
    select(-code) %>%
    mutate(ordMax = max(ord, na.rm = TRUE)) %>%
    ungroup

Python: get max and min values across multiple columns while grouping a DataFrame

You can melt the DataFrame so that you consider either 'actual' or 'budget' when calculating the min or max. Then group the melted DataFrame and merge back.

id_vars = ['measure', 'measure_group', 'route']

df1 = (df.melt(id_vars=id_vars, value_vars=['actual', 'budget'])
         .groupby(id_vars)['value']
         .agg(['min', 'max']))

df = df.merge(df1, how='left', on=id_vars)

   measure    measure_group route      year  actual  budget  min   max
0       AC  electrification     A  20182019     103      99   99   122
1       AC  electrification     A  20192020     110     122   99   122
2       AC  electrification     B  20182019       9      10    9    55
3       AC  electrification     B  20192020      55      50    9    55
4       HV  electrification     A  20182019       2      10    2    15
5       HV  electrification     A  20192020       7      15    2    15
6       HV  electrification     B  20182019      67      10   10   115
7       HV  electrification     B  20192020     100     115   10   115
8     cat1            track     A  20182019      10      15   10   111
9     cat1            track     A  20192020     111      25   10   111
10    cat1            track     B  20182019      55      16   16   175
11    cat1            track     B  20192020      75     175   16   175
12    cat2            track     A  20182019      84       5    5  1005
13    cat2            track     A  20192020     125    1005    5  1005
14    cat2            track     B  20182019       7       4    4    25
15    cat2            track     B  20192020      15      25    4    25
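
As an alternative sketch that avoids the melt and merge, the row-wise min and max can be taken first and then broadcast per group with transform. This is a variation on the answer rather than the answer's own method; the frame below reuses only the first four rows of the table above, and `value_cols`, `keys` and `out` are names chosen for the example.

```python
import pandas as pd

# First four rows of the table above (measure AC only).
df = pd.DataFrame({
    "measure": ["AC"] * 4,
    "measure_group": ["electrification"] * 4,
    "route": ["A", "A", "B", "B"],
    "year": [20182019, 20192020, 20182019, 20192020],
    "actual": [103, 110, 9, 55],
    "budget": [99, 122, 10, 50],
})

id_vars = ["measure", "measure_group", "route"]
value_cols = ["actual", "budget"]
keys = [df[c] for c in id_vars]

# Row-wise min/max across actual and budget, then the group-wise min/max
# of those values, aligned back to the original rows.
out = df.assign(
    min=df[value_cols].min(axis=1).groupby(keys).transform("min"),
    max=df[value_cols].max(axis=1).groupby(keys).transform("max"),
)
print(out)
```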

Multiple column groupby with pandas to find maximum value for each group

I would do it by using merge on the grouped data.

Based on this data:

df = pd.DataFrame({'Feature':['age']*9+['talk']*9,
                   'value':(['No']*3+['Yes']*3+['[Null]']*3)*2,
                   'frequency':[2700,1707,83,222,15,8,323,8,5,20,170,500,210,1500,809,234,43,85],
                   'label':['N','P','O']*6})

Using:

df_1 = df.groupby(['Feature','value'], as_index=False)['frequency'].max().merge(df, on=['Feature','value','frequency'])
df_1

Outputs:

  Feature   value  frequency label
0     age      No       2700     N
1     age     Yes        222     N
2     age  [Null]        323     N
3    talk      No        500     O
4    talk     Yes       1500     P
5    talk  [Null]        234     N

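Adding the extra column (the group sum minus the group maximum) can be done via a simple assignment. It works positionally, so it relies on df_1 having exactly one row per group, in the same sorted order as the groupby result: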

df_1['sum_no_max'] = df.groupby(['Feature','value'])['frequency'].sum().values - df_1['frequency'].values

Finally outputting:

  Feature   value  frequency label  sum_no_max
0     age      No       2700     N        1790
1     age     Yes        222     N          23
2     age  [Null]        323     N          13
3    talk      No        500     O         190
4    talk     Yes       1500     P        1019
5    talk  [Null]        234     N         128
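
Because that assignment lines values up by position, a variant that aligns by row label may be more robust. The sketch below is one such variant, not the original answer's code: it computes the group sum with transform before selecting the maximum rows with idxmax. The names `keys` and `with_sum` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Feature": ["age"] * 9 + ["talk"] * 9,
    "value": (["No"] * 3 + ["Yes"] * 3 + ["[Null]"] * 3) * 2,
    "frequency": [2700, 1707, 83, 222, 15, 8, 323, 8, 5,
                  20, 170, 500, 210, 1500, 809, 234, 43, 85],
    "label": ["N", "P", "O"] * 6,
})

keys = ["Feature", "value"]

# Group sum minus the row's own frequency, computed for every row.
with_sum = df.assign(
    sum_no_max=df.groupby(keys)["frequency"].transform("sum") - df["frequency"]
)

# Keep the row holding the maximum frequency of each group
# (idxmax keeps only the first row if the maximum is tied).
df_1 = with_sum.loc[with_sum.groupby(keys)["frequency"].idxmax()].reset_index(drop=True)
print(df_1)
```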

SQL MAX of multiple columns?

This is an old answer and broken in many ways.

See https://stackoverflow.com/a/6871572/194653, which has far more upvotes, works with SQL Server 2008+, and handles NULLs, etc.

Original but problematic answer:

Well, you can use a CASE expression:

SELECT
    CASE
        WHEN Date1 >= Date2 AND Date1 >= Date3 THEN Date1
        WHEN Date2 >= Date1 AND Date2 >= Date3 THEN Date2
        WHEN Date3 >= Date1 AND Date3 >= Date2 THEN Date3
        ELSE                                        Date1
    END AS MostRecentDate
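
One concrete way this CASE expression misbehaves is with NULLs: any comparison involving NULL is not true, so a NULL in Date1 falls through to the ELSE branch and the result is NULL even when the other dates are set. The sketch below demonstrates that by running the same expression in SQLite through Python's sqlite3 module. SQLite stands in for SQL Server here, and the `events` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; ISO-formatted date strings compare correctly as text.
conn.execute("CREATE TABLE events (Date1 TEXT, Date2 TEXT, Date3 TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2020-01-01", "2021-05-01", "2019-03-01"),  # no NULLs
        (None, "2021-05-01", "2019-03-01"),          # Date1 is NULL
    ],
)

rows = conn.execute("""
    SELECT
        CASE
            WHEN Date1 >= Date2 AND Date1 >= Date3 THEN Date1
            WHEN Date2 >= Date1 AND Date2 >= Date3 THEN Date2
            WHEN Date3 >= Date1 AND Date3 >= Date2 THEN Date3
            ELSE                                        Date1
        END AS MostRecentDate
    FROM events
    ORDER BY rowid
""").fetchall()
conn.close()

# The first row yields the latest date; the second yields NULL (None in Python).
print(rows)
```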
