SQL select only rows with max value on a column
At first glance...
All you need is a GROUP BY clause with the MAX aggregate function:
SELECT id, MAX(rev)
FROM YourTable
GROUP BY id
It's never that simple, is it?
I just noticed you need the content column as well.
This is a very common question in SQL: find the whole data for the row with some max value in a column per some group identifier. I heard that a lot during my career. Actually, it was one of the questions I answered in my current job's technical interview.
It is, actually, so common that the Stack Overflow community has created a single tag just to deal with questions like that: greatest-n-per-group.
Basically, you have two approaches to solve that problem:
Joining with a simple group-identifier, max-value-in-group sub-query
In this approach, you first find the group-identifier, max-value-in-group pair (already solved above) in a sub-query. Then you join your table to the sub-query with equality on both group-identifier and max-value-in-group:
SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
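To see the sub-query join in action, here is a minimal sketch that runs it on an in-memory SQLite database through Python's sqlite3 module. The sample rows are made up for illustration, and the ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# Hypothetical sample data: three revisions for id 1, one for id 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, contents TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "first"), (1, 2, "second"), (2, 1, "only"), (1, 3, "third")],
)

# The sub-query finds (id, max rev); the join pulls back the full row.
join_query = """
SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) AS rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
ORDER BY a.id  -- added only to make the output order deterministic
"""
latest = conn.execute(join_query).fetchall()
print(latest)  # [(1, 3, 'third'), (2, 1, 'only')]
```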
Left Joining with self, tweaking join conditions and filters
In this approach, you left join the table with itself. Equality goes on the group-identifier. Then, 2 smart moves:
- The second join condition requires the left side value to be less than the right side value.
- When you do step 1, the row(s) that actually have the max value will have NULL in the right side (it's a LEFT JOIN, remember?). Then, we filter the joined result, showing only the rows where the right side is NULL.
So you end up with:
SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;
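The NULL trick is easier to follow when you look at the joined rows before filtering. The sketch below, again on an in-memory SQLite database with made-up rows, first prints the raw self-join and then the filtered result. The ORDER BY clauses are only there to keep the output stable.

```python
import sqlite3

# Hypothetical sample data: three revisions for id 1, one for id 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, contents TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "first"), (1, 2, "second"), (1, 3, "third"), (2, 1, "only")],
)

# Unfiltered self-join: each row is paired with every higher rev in its group.
# Rows that already hold the max rev find no partner, so b.rev is NULL (None).
pairs = conn.execute("""
    SELECT a.id, a.rev, b.rev
    FROM YourTable a
    LEFT OUTER JOIN YourTable b
        ON a.id = b.id AND a.rev < b.rev
    ORDER BY a.id, a.rev, b.rev
""").fetchall()
print(pairs)
# [(1, 1, 2), (1, 1, 3), (1, 2, 3), (1, 3, None), (2, 1, None)]

# Keeping only the unmatched rows leaves the max-rev row of each group.
latest = conn.execute("""
    SELECT a.*
    FROM YourTable a
    LEFT OUTER JOIN YourTable b
        ON a.id = b.id AND a.rev < b.rev
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()
print(latest)  # [(1, 3, 'third'), (2, 1, 'only')]
```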
Conclusion
Both approaches bring the exact same result.
If you have two rows with max-value-in-group for group-identifier, both rows will be in the result in both approaches.
Both approaches are ANSI SQL compatible and thus will work with your favorite RDBMS, regardless of its "flavor".
Both approaches are also performance friendly; however, your mileage may vary (RDBMS, DB structure, indexes, etc.). So when you pick one approach over the other, benchmark. And make sure you pick the one that makes the most sense to you.
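A quick way to check the tie behaviour yourself is to run both queries against data that contains a tie. This is a small sketch on SQLite with hypothetical rows where id 1 has two rows sharing the highest rev.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, contents TEXT)")
# Hypothetical data: id 1 has two rows tied at rev = 3.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 3, "b"), (1, 3, "c"), (2, 5, "d")],
)

subquery_join = conn.execute("""
    SELECT a.id, a.rev, a.contents
    FROM YourTable a
    INNER JOIN (
        SELECT id, MAX(rev) AS rev FROM YourTable GROUP BY id
    ) b ON a.id = b.id AND a.rev = b.rev
""").fetchall()

self_left_join = conn.execute("""
    SELECT a.id, a.rev, a.contents
    FROM YourTable a
    LEFT OUTER JOIN YourTable b
        ON a.id = b.id AND a.rev < b.rev
    WHERE b.id IS NULL
""").fetchall()

# Sort in Python because neither query guarantees an output order.
print(sorted(subquery_join))   # [(1, 3, 'b'), (1, 3, 'c'), (2, 5, 'd')]
print(sorted(self_left_join))  # [(1, 3, 'b'), (1, 3, 'c'), (2, 5, 'd')]
```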
Get records with max value for each group of grouped SQL results
There's a super-simple way to do this in MySQL:
select *
from (select * from mytable order by `Group`, age desc, Person) x
group by `Group`
This works because MySQL allows you to not aggregate non-group-by columns, in which case MySQL, in practice, just returns the first row of each group. The solution is to first order the data such that for each group the row you want is first, then group by the columns you want the value for.
You avoid complicated subqueries that try to find the max() etc., and also the problem of returning multiple rows when more than one row has the same maximum value (as the other answers would do).
Note: This is a MySQL-only solution. All other databases I know will throw an SQL syntax error with the message "non aggregated columns are not listed in the group by clause" or similar. Because this solution uses undocumented behavior, the more cautious may want to include a test to assert that it keeps working should a future version of MySQL change this behavior.
Version 5.7 update:
Since version 5.7, the sql-mode setting includes ONLY_FULL_GROUP_BY by default, so to make this work you must not have this option (edit the option file for the server to remove this setting).
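The idea behind the trick (order the rows so the wanted one comes first in each group, then keep the first row per group) does not depend on MySQL. Here is a plain-Python sketch of that idea, not of MySQL's internals; the rows are hypothetical and only reuse the Group, age and Person column names from the query above.

```python
# Hypothetical rows mirroring the columns used above: (Group, age, Person).
mytable = [
    ("A", 30, "Bob"),
    ("A", 35, "Alice"),
    ("B", 22, "Dan"),
    ("B", 22, "Carl"),
    ("A", 35, "Zed"),
]

# Equivalent of ORDER BY `Group`, age DESC, Person.
ordered = sorted(mytable, key=lambda r: (r[0], -r[1], r[2]))

# Keep only the first row seen for each group.
first_per_group = {}
for row in ordered:
    first_per_group.setdefault(row[0], row)

result = list(first_per_group.values())
print(result)  # [('A', 35, 'Alice'), ('B', 22, 'Carl')]
```

Unlike the GROUP BY shortcut, this version states the tie-break explicitly (Person ascending), so exactly one predictable row comes back per group.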
Select the row with the maximum value in each group based on multiple columns in R dplyr
We may get the row-wise max of the 'count' columns with pmax, group by 'col1', and filter the rows where 'Max' equals the maximum of 'Max' within the group.
library(dplyr)
df1 %>%
mutate(Max = pmax(count_col1, count_col2) ) %>%
group_by(col1) %>%
filter(Max == max(Max)) %>%
ungroup %>%
select(-Max)
-output
# A tibble: 3 × 4
col1 col2 count_col1 count_col2
<chr> <chr> <dbl> <dbl>
1 apple aple 1 4
2 banana banan 4 1
3 banana bananb 4 1
We may also use slice_max
library(purrr)
df1 %>%
group_by(col1) %>%
slice_max(invoke(pmax, across(starts_with("count")))) %>%
ungroup
# A tibble: 3 × 4
col1 col2 count_col1 count_col2
<chr> <chr> <dbl> <dbl>
1 apple aple 1 4
2 banana banan 4 1
3 banana bananb 4 1
Select the row with the maximum value in each group
Here's a data.table solution:
require(data.table) ## 1.9.2
group <- as.data.table(group)
If you want to keep all the entries corresponding to max values of pt within each group:
group[group[, .I[pt == max(pt)], by=Subject]$V1]
# Subject pt Event
# 1: 1 5 2
# 2: 2 17 2
# 3: 3 5 2
If you'd like just the first max value of pt:
group[group[, .I[which.max(pt)], by=Subject]$V1]
# Subject pt Event
# 1: 1 5 2
# 2: 2 17 2
# 3: 3 5 2
In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.
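The two variants only differ once a group contains a tie. The following plain-Python sketch (not data.table, and with made-up rows that include a tie for Subject 1) shows what each selection rule keeps.

```python
# Hypothetical rows (Subject, pt, Event); Subject 1 has two rows tied at pt = 5.
group = [(1, 2, 1), (1, 5, 2), (1, 5, 3), (2, 17, 2), (2, 1, 1)]

# Maximum pt per Subject.
max_pt = {}
for subject, pt, _event in group:
    max_pt[subject] = max(pt, max_pt.get(subject, pt))

# Rule 1: keep every row whose pt equals its group's max (like pt == max(pt)).
all_max = [row for row in group if row[1] == max_pt[row[0]]]

# Rule 2: keep only the first such row per group (like which.max(pt)).
first_seen = {}
for row in all_max:
    first_seen.setdefault(row[0], row)
first_max = list(first_seen.values())

print(all_max)    # [(1, 5, 2), (1, 5, 3), (2, 17, 2)]
print(first_max)  # [(1, 5, 2), (2, 17, 2)]
```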
Return the row with max value for each group
A perfect use case for DISTINCT ON:
SELECT DISTINCT ON (realm, race) *
FROM tbl
ORDER BY realm, race, total DESC;
Notably, the query has no GROUP BY at all.
Assuming total is NOT NULL, else append NULLS LAST.
In case of a tie, the winner is arbitrary unless you add more ORDER BY items to break the tie.
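DISTINCT ON is a PostgreSQL extension, so it cannot be run elsewhere as written. For comparison, here is a sketch of a different technique, ROW_NUMBER() filtered to 1, that returns one row per (realm, race) on engines without DISTINCT ON. It runs on SQLite through Python and assumes SQLite 3.25 or newer for window-function support; the player column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# realm, race and total come from the query above; player is a made-up column.
conn.execute("CREATE TABLE tbl (realm TEXT, race TEXT, total INTEGER, player TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        ("north", "elf", 10, "a"),
        ("north", "elf", 25, "b"),
        ("north", "orc", 7, "c"),
        ("south", "elf", 3, "d"),
        ("south", "elf", 3, "e"),  # tie on total with player 'd'
    ],
)

# Number rows inside each (realm, race) group, highest total first.
# The extra "player" sort key breaks ties so the winner is not arbitrary.
winners = conn.execute("""
    SELECT realm, race, total, player
    FROM (
        SELECT tbl.*,
               ROW_NUMBER() OVER (
                   PARTITION BY realm, race
                   ORDER BY total DESC, player
               ) AS rn
        FROM tbl
    ) AS ranked
    WHERE rn = 1
    ORDER BY realm, race
""").fetchall()
print(winners)
# [('north', 'elf', 25, 'b'), ('north', 'orc', 7, 'c'), ('south', 'elf', 3, 'd')]
```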
Detailed explanation:
- Select first row in each GROUP BY group?
select rows with max value from group
You can use the RANK (or DENSE_RANK) analytic functions to find the maximum value(s) within a group:
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( id, col1, col2, col3, col4, col5, date_col ) AS
SELECT 1, 1, 1, 1, 1, 1, DATE '2015-11-13' FROM DUAL
UNION ALL SELECT 2, 1, 1, 1, 1, 2, DATE '2015-11-12' FROM DUAL
UNION ALL SELECT 3, 1, 1, 1, 1, 3, DATE '2015-11-11' FROM DUAL
UNION ALL SELECT 4, 1, 1, 1, 1, 4, DATE '2015-11-13' FROM DUAL
UNION ALL SELECT 5, 1, 1, 1, 1, 5, DATE '2015-11-12' FROM DUAL
UNION ALL SELECT 5, 1, 1, 1, 1, 5, DATE '2015-11-12' FROM DUAL
UNION ALL SELECT 6, 1, 1, 1, 2, 1, DATE '2015-11-12' FROM DUAL
UNION ALL SELECT 7, 1, 1, 1, 2, 2, DATE '2015-11-13' FROM DUAL
UNION ALL SELECT 8, 1, 1, 1, 2, 3, DATE '2015-11-11' FROM DUAL
UNION ALL SELECT 9, 1, 1, 1, 2, 4, DATE '2015-11-12' FROM DUAL
UNION ALL SELECT 10, 1, 1, 1, 2, 5, DATE '2015-11-13' FROM DUAL
Query 1:
SELECT *
FROM (
SELECT t.*,
RANK() OVER ( PARTITION BY col1, col2, col3, col4 ORDER BY date_col DESC ) AS rnk
FROM table_name t
)
WHERE rnk = 1
Results:
| ID | COL1 | COL2 | COL3 | COL4 | COL5 | DATE_COL | RNK |
|----|------|------|------|------|------|----------------------------|-----|
| 1 | 1 | 1 | 1 | 1 | 1 | November, 13 2015 00:00:00 | 1 |
| 4 | 1 | 1 | 1 | 1 | 4 | November, 13 2015 00:00:00 | 1 |
| 7 | 1 | 1 | 1 | 2 | 2 | November, 13 2015 00:00:00 | 1 |
| 10 | 1 | 1 | 1 | 2 | 5 | November, 13 2015 00:00:00 | 1 |
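The same ranking query can be tried without an Oracle instance. The sketch below runs it on SQLite through Python, assuming SQLite 3.25 or newer for window functions. To keep it short it uses only the id, col4 and date_col columns of the sample data (the columns that vary between groups), with dates stored as ISO strings and the duplicated id 5 row entered once. It also contrasts RANK with ROW_NUMBER to show the difference on ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, col4 INTEGER, date_col TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        (1, 1, "2015-11-13"), (2, 1, "2015-11-12"), (3, 1, "2015-11-11"),
        (4, 1, "2015-11-13"), (5, 1, "2015-11-12"), (6, 2, "2015-11-12"),
        (7, 2, "2015-11-13"), (8, 2, "2015-11-11"), (9, 2, "2015-11-12"),
        (10, 2, "2015-11-13"),
    ],
)

def top_ids(window_function, order_by):
    """Return ids ranked first within each col4 group by the given function."""
    sql = f"""
        SELECT id FROM (
            SELECT t.*,
                   {window_function}() OVER (
                       PARTITION BY col4 ORDER BY {order_by}
                   ) AS rnk
            FROM table_name t
        ) AS ranked
        WHERE rnk = 1
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql)]

# RANK gives tied rows the same rank, so every row on the latest date is kept.
rank_ids = top_ids("RANK", "date_col DESC")
print(rank_ids)  # [1, 4, 7, 10]

# ROW_NUMBER numbers rows uniquely, so exactly one row per group survives;
# the extra "id" sort key makes the choice among tied rows deterministic.
row_number_ids = top_ids("ROW_NUMBER", "date_col DESC, id")
print(row_number_ids)  # [1, 7]
```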
Get the row(s) which have the max value in groups using groupby
In [1]: df
Out[1]:
Sp Mt Value count
0 MM1 S1 a 3
1 MM1 S1 n 2
2 MM1 S3 cb 5
3 MM2 S3 mk 8
4 MM2 S4 bg 10
5 MM2 S4 dgd 1
6 MM4 S2 rd 2
7 MM4 S2 cb 2
8 MM4 S2 uyi 7
In [2]: df.groupby(['Mt'], sort=False)['count'].max()
Out[2]:
Mt
S1 3
S3 8
S4 10
S2 7
Name: count
To get the indices of the original DF you can do:
In [3]: idx = df.groupby(['Mt'])['count'].transform('max') == df['count']
In [4]: df[idx]
Out[4]:
Sp Mt Value count
0 MM1 S1 a 3
3 MM2 S3 mk 8
4 MM2 S4 bg 10
8 MM4 S2 uyi 7
Note that if you have multiple max values per group, all will be returned.
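If the transform step feels opaque, the following plain-Python sketch (deliberately not using pandas) mirrors what the mask computes: the per-group maximum is broadcast back to every row and compared with that row's own count. The rows are the ones from the DataFrame printed above.

```python
# Rows copied from the DataFrame shown above: (Sp, Mt, Value, count).
rows = [
    ("MM1", "S1", "a", 3),
    ("MM1", "S1", "n", 2),
    ("MM1", "S3", "cb", 5),
    ("MM2", "S3", "mk", 8),
    ("MM2", "S4", "bg", 10),
    ("MM2", "S4", "dgd", 1),
    ("MM4", "S2", "rd", 2),
    ("MM4", "S2", "cb", 2),
    ("MM4", "S2", "uyi", 7),
]

# Step 1: maximum count per Mt group.
group_max = {}
for _sp, mt, _value, count in rows:
    group_max[mt] = max(count, group_max.get(mt, count))

# Step 2: broadcast the group maximum back onto every row,
# producing one value per original row.
broadcast = [group_max[mt] for _sp, mt, _value, _count in rows]

# Step 3: a row is kept when its own count equals its group's maximum.
mask = [b == row[3] for b, row in zip(broadcast, rows)]
kept_index = [i for i, keep in enumerate(mask) if keep]

print(broadcast)   # [3, 3, 8, 8, 10, 10, 7, 7, 7]
print(kept_index)  # [0, 3, 4, 8]
```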
Update
On a hail mary chance that this is what the OP is requesting:
In [5]: df['count_max'] = df.groupby(['Mt'])['count'].transform('max')
In [6]: df
Out[6]:
Sp Mt Value count count_max
0 MM1 S1 a 3 3
1 MM1 S1 n 2 3
2 MM1 S3 cb 5 8
3 MM2 S3 mk 8 8
4 MM2 S4 bg 10 10
5 MM2 S4 dgd 1 10
6 MM4 S2 rd 2 7
7 MM4 S2 cb 2 7
8 MM4 S2 uyi 7 7
row with max value per group - SQLite
Join against that result to get the complete table records:
SELECT t1.*
FROM your_table t1
JOIN
(
    SELECT name, MAX(population) AS max_population
    FROM your_table
    WHERE name IN ('a', 'b', 'c')
    GROUP BY name
) t2 ON t1.name = t2.name
    AND t1.population = t2.max_population
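Since this question is about SQLite, the query can be tried directly with Python's built-in sqlite3 module. This is a minimal sketch: the year column and the sample rows are hypothetical, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# name and population come from the query above; year is a made-up extra column.
conn.execute("CREATE TABLE your_table (name TEXT, population INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        ("a", 100, 2000),
        ("a", 150, 2010),
        ("b", 80, 2000),
        ("c", 60, 2005),
        ("c", 60, 2010),   # tie for the max population of 'c'
        ("d", 999, 2000),  # excluded by the WHERE name IN (...) filter
    ],
)

records = conn.execute("""
    SELECT t1.*
    FROM your_table t1
    JOIN (
        SELECT name, MAX(population) AS max_population
        FROM your_table
        WHERE name IN ('a', 'b', 'c')
        GROUP BY name
    ) t2 ON t1.name = t2.name
        AND t1.population = t2.max_population
    ORDER BY t1.name, t1.year
""").fetchall()
print(records)
# [('a', 150, 2010), ('b', 80, 2000), ('c', 60, 2005), ('c', 60, 2010)]
```

Note that both tied rows for 'c' come back, which matches the behaviour of the sub-query join in general.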