R collapse multiple rows into 1 row - same columns
Since you mentioned in your comment that you would like a data.table
solution, you could use
library(data.table)
df <- data.table(record_numb,col_a,col_b,col_c)
df[, lapply(.SD, paste0, collapse=""), by=record_numb]
record_numb col_a col_b col_c
1: 1 123 234 543
2: 2 987 765 543
.SD basically says "take all the variables in my data.table except those in the by argument". In @Frank's answer, he reduces the set of variables using .SDcols. If you want to cast the variables to numeric, you can still do this in one line. Here is a chaining method.
df[, lapply(.SD, paste0, collapse=""), by=record_numb][, lapply(.SD, as.integer)]
The second "chain" casts all the variables as integers.
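For readers who want to run this end to end, here is a minimal sketch. The input vectors are hypothetical, chosen only so that the result matches the output shown above, and the use of .SDcols plus := to convert just the collapsed columns is one possible variation, not the only way to do it.

```r
library(data.table)

# Hypothetical input, chosen so the collapsed values match the output above
record_numb <- c(1, 1, 1, 2, 2, 2)
col_a <- c(1, 2, 3, 9, 8, 7)
col_b <- c(2, 3, 4, 7, 6, 5)
col_c <- c(5, 4, 3, 5, 4, 3)
df <- data.table(record_numb, col_a, col_b, col_c)

# .SDcols limits .SD to the named columns, so col_c is left out of the result
cols <- c("col_a", "col_b")
res <- df[, lapply(.SD, paste0, collapse = ""), by = record_numb, .SDcols = cols]

# Convert only the collapsed columns to integer; the grouping column is untouched
res[, (cols) := lapply(.SD, as.integer), .SDcols = cols]
print(res)
```

Unlike the chained version, this keeps record_numb in its original type, because only the columns named in cols are converted.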
In R, collapse over multiple logical rows of the same ID into 1 row
We can use any instead of paste, as any will check for any TRUE elements in each column, grouped by 'ID'.
library(data.table)
setDT(df)[, lapply(.SD, any), ID]
-output
# ID cardiovasc beta_blockers antibiotics
#1: a TRUE FALSE TRUE
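A self-contained sketch of the same idea follows. The data frame is hypothetical (the question's data is not reproduced here); the second ID "b" is added only to show that each group is reduced independently.

```r
library(data.table)

# Hypothetical data: several logical rows per ID
df <- data.frame(
  ID            = c("a", "a", "b", "b"),
  cardiovasc    = c(TRUE, FALSE, FALSE, FALSE),
  beta_blockers = c(FALSE, FALSE, FALSE, TRUE),
  antibiotics   = c(FALSE, TRUE, FALSE, FALSE)
)

# any() is TRUE if at least one value in the group is TRUE
res <- setDT(df)[, lapply(.SD, any), by = ID]
print(res)
```

If the columns can contain NA, passing na.rm = TRUE through lapply (lapply(.SD, any, na.rm = TRUE)) may be worth considering, since any() can otherwise return NA for a group with no TRUE values.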
How to collapse multiple rows with condition into one row using dplyr in r?
Here's another way to achieve the output.
library(tidyverse)
df %>%
mutate(value = str_extract(Description, "'\\w+'"),
Description = trimws(str_remove(Description, value))) %>%
group_by(Description, Category) %>%
summarise(ID = toString(ID),
value = sprintf("'%s'", toString(gsub("'", "", value)))) %>%
unite(Description, value, Description, sep = ' ')
# Description Category ID
# <chr> <chr> <chr>
#1 'foo' is a cat B 3
#2 'foo, bar' is a dog A 1, 2
#3 'bar' is a fish C 5
#4 'foo' is not a cat B 4
R collapsing multiple rows into one row by grouping multiple columns
Here is one option with dplyr
library(dplyr)
df %>%
group_by_at(groupColumns) %>%
summarise_at(vars(dataColumns), ~ if(all(is.na(.))) NA_real_
else na.omit(.))
# A tibble: 3 x 6
# Groups: TreatName, id [3]
# TreatName id Method drug1 drug2 drug3
# <fct> <fct> <fct> <dbl> <dbl> <dbl>
#1 Dynamic patient2 IV NA NA 56
#2 Static patient1 IV 34 7 NA
#3 Static patient2 IV NA NA 0
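The same collapse can be sketched with across, which newer versions of dplyr offer in place of the _at verbs. This is an illustration under assumptions: the tibble is hypothetical, and instead of na.omit it keeps the first non-missing value per group, which is a deliberate swap so that each group always yields exactly one row.

```r
library(dplyr)

# Hypothetical data: each drug value sits on its own row, padded with NA
df <- tibble(
  TreatName = c("Static", "Static", "Dynamic"),
  id        = c("patient1", "patient1", "patient2"),
  drug1     = c(34, NA, NA),
  drug2     = c(NA, 7, NA)
)
groupColumns <- c("TreatName", "id")
dataColumns  <- c("drug1", "drug2")

res <- df %>%
  group_by(across(all_of(groupColumns))) %>%
  summarise(
    across(all_of(dataColumns),
           # first non-missing value, or NA if the whole group is missing
           ~ if (all(is.na(.x))) NA_real_ else .x[!is.na(.x)][1]),
    .groups = "drop"
  )
print(res)
```

If a group can hold more than one non-missing value per column, this sketch silently keeps the first one, so check that assumption against your data.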
R collapse multiple rows into 1 row using specific function to date & character columns
It is not clear why we have to go through Map and get. After grouping by 'id', get the mean of 'date1' and paste the 'charval' together:
dt2[, .(date1 = mean(date1), charval = toString(charval)), id]
# id date1 charval
#1: 1 2009-01-02 aa, vv, ss
#2: 2 2009-01-05 a, b, c, d
Note: toString(x) is equivalent to paste(x, collapse=', '). To use a different separator, call paste directly:
dt2[, .(date1 = mean(date1), charval = paste(charval, collapse=";")), id]
# id date1 charval
#1: 1 2009-01-02 aa;vv;ss
#2: 2 2009-01-05 a;b;c;d
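The relationship between the two functions can be checked directly in base R. The vector x below is just an illustrative value.

```r
# Illustrative vector of strings
x <- c("aa", "vv", "ss")

# toString() joins the elements with ", "
joined_default <- toString(x)

# paste() with collapse lets you pick the separator
joined_comma <- paste(x, collapse = ", ")
joined_semi  <- paste(x, collapse = ";")

identical(joined_default, joined_comma)  # TRUE
```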
The OP's question is about using Map with get to call mean. This seems to be triggering the following check in mean.default:
if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
    warning("argument is not numeric or logical: returning NA")
    return(NA_real_)
}
It returns NA when it finds that 'date1' is of class Date, although it is stored as numeric. One option is to specify the envir in get. Another problem is the use of ifelse. It is better to use if/else, as there are only two elements.
dt2[, Map(function(x, y) if(x != "paste") get(x, envir = parent.frame())(y, na.rm = TRUE)
else paste(y, collapse=':'), setNames(c("mean", "paste"), names(.SD)), .SD), by = id]
# id date1 charval
#1: 1 2009-01-02 aa:vv:ss
#2: 2 2009-01-05 a:b:c:d
get is kind of tricky, but if we specify the correct environment, it works as expected:
get("mean")(dt2$date1)
#[1] "2009-01-04"
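Why the default method gives NA for a Date can be seen in base R alone. This is a small sketch with made-up dates: it shows that a Date is stored as a double but is not reported as numeric, so calling mean.default directly fails the check quoted earlier, while the generic mean dispatches to the Date method.

```r
# Made-up dates for illustration
d <- as.Date(c("2009-01-01", "2009-01-03"))

typeof(d)       # "double": the underlying storage is numeric
is.numeric(d)   # FALSE: the Date method for is.numeric says no

# The generic dispatches to the Date method and returns a Date
m_date <- mean(d)

# Calling the default method directly fails the numeric check and returns NA
m_default <- suppressWarnings(mean.default(d))
```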
Or, instead of an if/else test against the "paste" string, we can check the column class: if it is character, do the paste, or else return the mean.
dt2[, Map(function(x, y) if(is.character(y)) get(x)(y, collapse=":")
else get(x, envir = parent.frame())(y, na.rm = TRUE),
setNames(c("mean", "paste"), names(.SD)), .SD), by = id]
# id date1 charval
#1: 1 2009-01-02 aa:vv:ss
#2: 2 2009-01-05 a:b:c:d
Note that it is better to use the first approach (without Map and get), which avoids these hassles.
How to merge multiple rows into a single row for a single column?
As it is a tibble, we can make use of tidyverse functions (in newer versions of dplyr, we can use across with summarise).
library(dplyr)
library(stringr)
df %>%
group_by(Injury) %>%
summarise(across(everything(), ~ str_c(.x, collapse = "")))
Or with summarise_at
df %>%
group_by(Injury) %>%
summarise_at(vars(-group_cols()), str_c, collapse="")
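Here is a runnable sketch of the across version. The tibble and its Notes column are hypothetical stand-ins for the question's data, and the function is written as a lambda because passing extra arguments through across is deprecated in recent dplyr releases.

```r
library(dplyr)
library(stringr)

# Hypothetical tibble: the text for each Injury is spread over several rows
df <- tibble(
  Injury = c("A", "A", "B"),
  Notes  = c("left ", "arm", "knee")
)

res <- df %>%
  group_by(Injury) %>%
  # collapse every non-grouping column into a single string per group
  summarise(across(everything(), ~ str_c(.x, collapse = "")))
print(res)
```

One design point to keep in mind: str_c propagates missing values, so a group containing an NA collapses to NA, whereas paste would insert the literal text "NA".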