Calculate Max Value Across Multiple Columns by Multiple Groups

Calculate max value across multiple columns by multiple groups

Solution using data.table. Find max value on 3:5 columns (Score columns) by ID and Group.

library(data.table)
setDT(d)
d[, .(Max = do.call(max, .SD)), .SDcols = 3:5, .(ID, Group)]

ID Group Max
1: a1 abc 11
2: a1 def 5
3: a2 def 11

Data:

d <- structure(list(ID = structure(c(1L, 1L, 1L, 2L), .Label = c("a1", 
"a2"), class = "factor"), Group = structure(c(1L, 1L, 2L, 2L), .Label =
c("abc",
"def"), class = "factor"), Score1 = c(10L, 0L, 0L, 5L), Score2 = c(0L,
0L, 5L, 10L), Score3 = c(0L, 11L, 2L, 11L)), class = "data.frame", row.names =
c(NA,
-4L))

Select the row with the maximum value in each group based on multiple columns in R dplyr

We may get rowwise max of the 'count' columns with pmax, grouped by 'col1', filter the rows where the max value of 'Max' column is.

library(dplyr)
df1 %>%
mutate(Max = pmax(count_col1, count_col2) ) %>%
group_by(col1) %>%
filter(Max == max(Max)) %>%
ungroup %>%
select(-Max)

-output

# A tibble: 3 × 4
col1 col2 count_col1 count_col2
<chr> <chr> <dbl> <dbl>
1 apple aple 1 4
2 banana banan 4 1
3 banana bananb 4 1

We may also use slice_max

library(purrr)
df1 %>%
group_by(col1) %>%
slice_max(invoke(pmax, across(starts_with("count")))) %>%
ungroup
# A tibble: 3 × 4
col1 col2 count_col1 count_col2
<chr> <chr> <dbl> <dbl>
1 apple aple 1 4
2 banana banan 4 1
3 banana bananb 4 1

How to get the max value of a multiple column group-by pandas?

If you need the bookid and conceptid for the maximum weight, try this

annotations.ix[annotations.groupby(['bookid'], sort=False)['weight'].idxmax()][['bookid', 'conceptid', 'weight']]

Note: Since Pandas v0.20 ix has been deprecated. Use .loc instead.

Find maximum value of one column based on group_by multiple other columns

We can use slice_max instead of summarise to return all the columns after the select step

library(dplyr)
df_k %>%
group_by(COUNTRY, date_start) %>%
select(-code) %>%
slice_max(order_by = 'ord', n = 1)

If we need to create a new column, use mutate

df_k %>%
group_by(COUNTRY, date_start) %>%
select(-code) %>%
mutate(ordMax = max(ord, na.rm = TRUE)) %>%
ungroup

python get max and min values across mutiple columns while grouping a dataframe

You can melt the DataFrame so that you consider either 'actual' or 'budget' when calculating the min or max. Then group the melted DataFrame and merge back.

id_vars = ['measure', 'measure_group', 'route']

df1 = (df.melt(id_vars=id_vars, value_vars=['actual', 'budget'])
.groupby(id_vars)['value']
.agg(['min', 'max']))

df = df.merge(df1, how='left', on=id_vars)


   measure    measure_group route      year  actual  budget  min   max
0 AC electrification A 20182019 103 99 99 122
1 AC electrification A 20192020 110 122 99 122
2 AC electrification B 20182019 9 10 9 55
3 AC electrification B 20192020 55 50 9 55
4 HV electrification A 20182019 2 10 2 15
5 HV electrification A 20192020 7 15 2 15
6 HV electrification B 20182019 67 10 10 115
7 HV electrification B 20192020 100 115 10 115
8 cat1 track A 20182019 10 15 10 111
9 cat1 track A 20192020 111 25 10 111
10 cat1 track B 20182019 55 16 16 175
11 cat1 track B 20192020 75 175 16 175
12 cat2 track A 20182019 84 5 5 1005
13 cat2 track A 20192020 125 1005 5 1005
14 cat2 track B 20182019 7 4 4 25
15 cat2 track B 20192020 15 25 4 25

Multiple column groupby with pandas to find maximum value for each group

I would do it by using merge on the grouped data.

Based on this data:

df = pd.DataFrame({'Feature':['age']*9+['talk']*9,
'value':(['No']*3+['Yes']*3+['[Null]']*3)*2,
'frequency':[2700,1707,83,222,15,8,323,8,5,20,170,500,210,1500,809,234,43,85],
'label':['N','P','O']*6})

Using:

df.groupby(['Feature','value'],as_index=False)['frequency'].max().merge(df,on=['Feature','Value','frequency'])

Outputs:

  Feature   value  frequency label
0 age No 2700 N
1 age Yes 222 N
2 age [Null] 323 N
3 talk No 500 O
4 talk Yes 1500 P
5 talk [Null] 234 N

Adding the extra column can be done via a simple assignment:

df_1['sum_no_max'] = df.groupby(['Feature','value'])['frequency'].sum().values - df_1['frequency'].values

Finally outputting:

  Feature   value  frequency label  sum_no_max
0 age No 2700 N 1790
1 age Yes 222 N 23
2 age [Null] 323 N 13
3 talk No 500 O 190
4 talk Yes 1500 P 1019
5 talk [Null] 234 N 128

SQL MAX of multiple columns?

This is an old answer and broken in many way.

See https://stackoverflow.com/a/6871572/194653 which has way more upvotes and works with sql server 2008+ and handles nulls, etc.

Original but problematic answer:

Well, you can use the CASE statement:

SELECT
CASE
WHEN Date1 >= Date2 AND Date1 >= Date3 THEN Date1
WHEN Date2 >= Date1 AND Date2 >= Date3 THEN Date2
WHEN Date3 >= Date1 AND Date3 >= Date2 THEN Date3
ELSE Date1
END AS MostRecentDate


Related Topics



Leave a reply



Submit