Calculate max value across multiple columns by multiple groups
Solution using data.table. Find the max value across columns 3:5 (the Score columns) by ID and Group.
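library(data.table)
setDT(d)  # convert the data.frame (defined under "Data:" below) to a data.table by reference
# do.call(max, .SD) takes the max over all selected Score columns in each group
d[, .(Max = do.call(max, .SD)), .SDcols = 3:5, by = .(ID, Group)]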
ID Group Max
1: a1 abc 11
2: a1 def 5
3: a2 def 11
Data:
d <- structure(list(
  ID = structure(c(1L, 1L, 1L, 2L), .Label = c("a1", "a2"), class = "factor"),
  Group = structure(c(1L, 1L, 2L, 2L), .Label = c("abc", "def"), class = "factor"),
  Score1 = c(10L, 0L, 0L, 5L),
  Score2 = c(0L, 0L, 5L, 10L),
  Score3 = c(0L, 11L, 2L, 11L)),
  class = "data.frame", row.names = c(NA, -4L))
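The following plain-Python sketch (standard library only, not data.table) mirrors what the data.table call computes on the same four rows. It groups by (ID, Group) and takes the max over every Score value in the group, which is what do.call(max, .SD) does. The helper name group_max_across is hypothetical.

```python
# Same four rows as `d` above: (ID, Group, Score1, Score2, Score3)
rows = [
    ("a1", "abc", 10, 0, 0),
    ("a1", "abc", 0, 0, 11),
    ("a1", "def", 0, 5, 2),
    ("a2", "def", 5, 10, 11),
]

def group_max_across(rows, n_keys=2):
    """Max over all value columns per group key (hypothetical helper)."""
    best = {}
    for row in rows:
        key, values = row[:n_keys], row[n_keys:]
        row_max = max(values)                             # max across the Score columns of one row
        best[key] = max(best.get(key, row_max), row_max)  # then max across rows of the group
    return best

print(group_max_across(rows))
# {('a1', 'abc'): 11, ('a1', 'def'): 5, ('a2', 'def'): 11}
```

The result matches the data.table output: 11, 5 and 11.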
Select the row with the maximum value in each group based on multiple columns in R dplyr
We can get the row-wise max of the 'count' columns with pmax. Then, grouped by 'col1', we filter the rows where 'Max' equals the group's maximum.
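library(dplyr)
df1 %>%
  mutate(Max = pmax(count_col1, count_col2)) %>%  # row-wise max of the count columns
  group_by(col1) %>%
  filter(Max == max(Max)) %>%                     # keep every row that hits the group max (ties kept)
  ungroup() %>%
  select(-Max)                                    # drop the helper column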
Output:
# A tibble: 3 × 4
col1 col2 count_col1 count_col2
<chr> <chr> <dbl> <dbl>
1 apple aple 1 4
2 banana banan 4 1
3 banana bananb 4 1
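The same three steps can be sketched in plain Python: a row-wise max, then the group max of that value, then a filter. The original df1 is not shown, so the input rows here are hypothetical. They are the three output rows plus two made-up rows that should be dropped. The helper name rows_at_group_max is also hypothetical.

```python
# Hypothetical df1 rows (the original df1 is not shown): (col1, col2, count_col1, count_col2)
df1 = [
    ("apple", "aple", 1, 4),
    ("apple", "appl", 2, 1),     # made-up row that should be filtered out
    ("banana", "banan", 4, 1),
    ("banana", "bananb", 4, 1),
    ("banana", "bann", 0, 3),    # made-up row that should be filtered out
]

def rows_at_group_max(rows):
    # Step 1: row-wise max of the count columns (like pmax)
    row_max = [max(r[2:]) for r in rows]
    # Step 2: the largest row-wise max within each col1 group
    group_max = {}
    for r, m in zip(rows, row_max):
        group_max[r[0]] = max(group_max.get(r[0], m), m)
    # Step 3: keep every row equal to its group's max (ties kept, like filter())
    return [r for r, m in zip(rows, row_max) if m == group_max[r[0]]]

for row in rows_at_group_max(df1):
    print(row)
```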
We can also use slice_max:
library(dplyr)
df1 %>%
  group_by(col1) %>%
  # do.call(pmax, ...) replaces purrr::invoke(), which is deprecated in recent purrr
  slice_max(do.call(pmax, across(starts_with("count")))) %>%
  ungroup()
# A tibble: 3 × 4
col1 col2 count_col1 count_col2
<chr> <chr> <dbl> <dbl>
1 apple aple 1 4
2 banana banan 4 1
3 banana bananb 4 1
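Both banana rows appear because slice_max() keeps ties by default (with_ties = TRUE). Below is a rough plain-Python stand-in for group_by() plus slice_max(n = 1). The helper name and its with_ties flag are hypothetical and only mimic the dplyr argument. Row order may differ from dplyr's.

```python
def slice_max_per_group(rows, key, order_by, with_ties=True):
    """Rough stand-in for group_by(key) + slice_max(order_by, n = 1) (hypothetical helper)."""
    groups = {}
    for r in rows:
        groups.setdefault(key(r), []).append(r)
    out = []
    for members in groups.values():
        top = max(order_by(r) for r in members)
        winners = [r for r in members if order_by(r) == top]
        out.extend(winners if with_ties else winners[:1])
    return out

rows = [("apple", "aple", 1, 4), ("banana", "banan", 4, 1), ("banana", "bananb", 4, 1)]
pick = dict(key=lambda r: r[0], order_by=lambda r: max(r[2:]))

print(len(slice_max_per_group(rows, **pick)))                   # 3: both banana rows kept
print(len(slice_max_per_group(rows, with_ties=False, **pick)))  # 2: one row per group
```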
How to get the max value of a multiple column group-by pandas?
If you need the bookid and conceptid for the maximum weight, try this:
annotations.loc[annotations.groupby(['bookid'], sort=False)['weight'].idxmax(), ['bookid', 'conceptid', 'weight']]
Note: .ix has been deprecated since pandas 0.20 and was removed in pandas 1.0. Use .loc instead.
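groupby(...).idxmax() returns, for each bookid, the index label of the first row holding the maximum weight. Indexing with those labels then pulls out whole rows. The plain-Python sketch below shows the same "first max row per group" selection. The annotation rows and the helper name are hypothetical.

```python
# Hypothetical annotation rows: (bookid, conceptid, weight)
annotations = [
    (1, "c1", 0.2),
    (1, "c2", 0.9),
    (2, "c3", 0.5),
    (2, "c4", 0.5),   # tie: idxmax keeps the first occurrence (c3)
]

def first_max_row_per_group(rows, key_idx=0, val_idx=2):
    best = {}  # group key -> row holding the max so far
    for row in rows:
        k = row[key_idx]
        # strict '>' keeps the earliest row when values tie, like idxmax
        if k not in best or row[val_idx] > best[k][val_idx]:
            best[k] = row
    return list(best.values())

print(first_max_row_per_group(annotations))
# [(1, 'c2', 0.9), (2, 'c3', 0.5)]
```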
Find maximum value of one column based on group_by multiple other columns
We can use slice_max instead of summarise to return all the columns after the select step:
library(dplyr)
df_k %>%
  group_by(COUNTRY, date_start) %>%
  select(-code) %>%
  slice_max(order_by = ord, n = 1)  # ord unquoted; 'ord' would order by a constant string
If we need to create a new column instead, use mutate:
df_k %>%
group_by(COUNTRY, date_start) %>%
select(-code) %>%
mutate(ordMax = max(ord, na.rm = TRUE)) %>%
ungroup
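Unlike slice_max, the mutate version keeps every row and attaches the group maximum to each one. The plain-Python sketch below does the same, with None standing in for NA and skipped as na.rm = TRUE does. The rows are hypothetical, since df_k is not shown. For an all-NA group this sketch returns None, whereas R's max(na.rm = TRUE) would return -Inf with a warning.

```python
# Hypothetical df_k rows: (COUNTRY, date_start, ord); None stands in for NA
df_k = [
    ("FR", "2020-01", 3),
    ("FR", "2020-01", None),
    ("FR", "2020-01", 7),
    ("DE", "2020-01", 2),
]

def add_group_max(rows, n_keys=2, val_idx=2):
    group_max = {}
    for r in rows:
        v = r[val_idx]
        if v is None:  # skip missing values, like na.rm = TRUE
            continue
        k = r[:n_keys]
        group_max[k] = v if k not in group_max else max(group_max[k], v)
    # every row keeps its columns and gains the group max (None if the group is all NA)
    return [r + (group_max.get(r[:n_keys]),) for r in rows]

for row in add_group_max(df_k):
    print(row)
```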
python get max and min values across multiple columns while grouping a dataframe
You can melt the DataFrame so that either 'actual' or 'budget' counts when calculating the min or max. Then group the melted DataFrame and merge the result back.
id_vars = ['measure', 'measure_group', 'route']
df1 = (df.melt(id_vars=id_vars, value_vars=['actual', 'budget'])
.groupby(id_vars)['value']
.agg(['min', 'max']))
df = df.merge(df1, how='left', on=id_vars)
measure measure_group route year actual budget min max
0 AC electrification A 20182019 103 99 99 122
1 AC electrification A 20192020 110 122 99 122
2 AC electrification B 20182019 9 10 9 55
3 AC electrification B 20192020 55 50 9 55
4 HV electrification A 20182019 2 10 2 15
5 HV electrification A 20192020 7 15 2 15
6 HV electrification B 20182019 67 10 10 115
7 HV electrification B 20192020 100 115 10 115
8 cat1 track A 20182019 10 15 10 111
9 cat1 track A 20192020 111 25 10 111
10 cat1 track B 20182019 55 16 16 175
11 cat1 track B 20192020 75 175 16 175
12 cat2 track A 20182019 84 5 5 1005
13 cat2 track A 20192020 125 1005 5 1005
14 cat2 track B 20182019 7 4 4 25
15 cat2 track B 20192020 15 25 4 25
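The melt, group and merge-back steps amount to this: pool the 'actual' and 'budget' values of every row in a group, take their min and max, and append both to each original row. The plain-Python sketch below uses the first four rows of the table above. The helper name add_min_max is hypothetical.

```python
# First four rows of the table above: (measure, measure_group, route, year, actual, budget)
df = [
    ("AC", "electrification", "A", 20182019, 103, 99),
    ("AC", "electrification", "A", 20192020, 110, 122),
    ("AC", "electrification", "B", 20182019, 9, 10),
    ("AC", "electrification", "B", 20192020, 55, 50),
]

def add_min_max(rows, n_keys=3, value_idx=(4, 5)):
    # "melt": pool the actual and budget values of all rows in the same group
    pooled = {}
    for r in rows:
        pooled.setdefault(r[:n_keys], []).extend(r[i] for i in value_idx)
    # "merge back": append the group's (min, max) to each original row
    return [r + (min(pooled[r[:n_keys]]), max(pooled[r[:n_keys]])) for r in rows]

for row in add_min_max(df):
    print(row)
```

The appended pairs, (99, 122) for route A and (9, 55) for route B, match the min and max columns in the table.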
Multiple column groupby with pandas to find maximum value for each group
I would do it by using merge on the grouped data.
Based on this data:
import pandas as pd

df = pd.DataFrame({'Feature': ['age']*9 + ['talk']*9,
                   'value': (['No']*3 + ['Yes']*3 + ['[Null]']*3)*2,
                   'frequency': [2700, 1707, 83, 222, 15, 8, 323, 8, 5, 20, 170, 500, 210, 1500, 809, 234, 43, 85],
                   'label': ['N', 'P', 'O']*6})
Using:
df_1 = (df.groupby(['Feature', 'value'], as_index=False)['frequency'].max()
          .merge(df, on=['Feature', 'value', 'frequency']))
Outputs:
Feature value frequency label
0 age No 2700 N
1 age Yes 222 N
2 age [Null] 323 N
3 talk No 500 O
4 talk Yes 1500 P
5 talk [Null] 234 N
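The extra column, each group's total frequency minus its maximum, can be added with a simple assignment. This relies on df_1 and the groupby sum listing the groups in the same sorted order: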
df_1['sum_no_max'] = df.groupby(['Feature','value'])['frequency'].sum().values - df_1['frequency'].values
Final output:
Feature value frequency label sum_no_max
0 age No 2700 N 1790
1 age Yes 222 N 23
2 age [Null] 323 N 13
3 talk No 500 O 190
4 talk Yes 1500 P 1019
5 talk [Null] 234 N 128
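As a cross-check, the plain-Python sketch below reproduces this final table from the same data. It keeps the max-frequency row per (Feature, value) and subtracts that max from the group total. One difference: on tied maxima it keeps only the first row, whereas the merge-based pandas version returns every tied row.

```python
# Same data as the DataFrame above, as (Feature, value, frequency, label) tuples
features = ["age"] * 9 + ["talk"] * 9
values = (["No"] * 3 + ["Yes"] * 3 + ["[Null]"] * 3) * 2
freqs = [2700, 1707, 83, 222, 15, 8, 323, 8, 5, 20, 170, 500, 210, 1500, 809, 234, 43, 85]
labels = ["N", "P", "O"] * 6
rows = list(zip(features, values, freqs, labels))

best, total = {}, {}
for feat, val, freq, label in rows:
    key = (feat, val)
    total[key] = total.get(key, 0) + freq
    # keep the row with the highest frequency (first one on ties)
    if key not in best or freq > best[key][2]:
        best[key] = (feat, val, freq, label)

# Each max row plus the group's total minus that max ("sum_no_max")
result = [b + (total[k] - b[2],) for k, b in best.items()]
for row in result:
    print(row)
```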
SQL MAX of multiple columns?
This is an old answer and is broken in many ways.
See https://stackoverflow.com/a/6871572/194653, which has far more upvotes, works with SQL Server 2008+, and handles NULLs, etc.
Original but problematic answer:
Well, you can use the CASE statement:
SELECT
CASE
WHEN Date1 >= Date2 AND Date1 >= Date3 THEN Date1
WHEN Date2 >= Date1 AND Date2 >= Date3 THEN Date2
WHEN Date3 >= Date1 AND Date3 >= Date2 THEN Date3
ELSE Date1
END AS MostRecentDate
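One reason the CASE version is fragile is NULL handling. The Python sketch below models SQL's three-valued logic: None plays NULL, any comparison involving NULL is "unknown", and a WHEN branch fires only when its condition is true. It is an illustrative model, not SQL, with integers standing in for dates and hypothetical helper names. It contrasts the CASE chain with a NULL-skipping maximum, which is how SQL's aggregate MAX treats NULLs.

```python
def ge(a, b):
    """SQL-style a >= b: None (unknown) if either side is NULL."""
    return None if a is None or b is None else a >= b

def and3(a, b):
    """SQL three-valued AND: false wins, then unknown, else true."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def case_max(d1, d2, d3):
    # Mirrors the CASE expression above; a WHEN fires only when its condition is true
    if and3(ge(d1, d2), ge(d1, d3)) is True:
        return d1
    if and3(ge(d2, d1), ge(d2, d3)) is True:
        return d2
    if and3(ge(d3, d1), ge(d3, d2)) is True:
        return d3
    return d1  # ELSE Date1

def null_skipping_max(*dates):
    present = [d for d in dates if d is not None]
    return max(present) if present else None

print(case_max(1, None, 5))           # 1: wrong, the ELSE branch fires
print(null_skipping_max(1, None, 5))  # 5
```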