Expand ranges defined by from and to columns
You can use the plyr package:
library(plyr)
# presidents: one row per president, with 'name', 'from' and 'to' (years) columns
ddply(presidents, "name", summarise, year = seq(from, to))
# name year
# 1 Barack Obama 2009
# 2 Barack Obama 2010
# 3 Barack Obama 2011
# 4 Barack Obama 2012
# 5 Bill Clinton 1993
# 6 Bill Clinton 1994
# [...]
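If plyr isn't available, a rough base R analogue of the same split-apply-combine step might look like this sketch. The `presidents` values are illustrative, assumed to mirror the question's table (name, from, to). split() orders the groups alphabetically, which matches the ddply ordering in the output above.

```r
# Illustrative data, assumed to mirror the question's 'presidents' table
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

# Split by name (groups come out in alphabetical order, as with ddply),
# expand each group's from:to range, then bind the pieces back together
pieces <- lapply(split(presidents, presidents$name), function(d) {
  data.frame(name = d$name, year = seq(d$from, d$to))
})
expanded <- do.call(rbind, pieces)
rownames(expanded) <- NULL
head(expanded, 4)
```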
If it is important that the data be sorted by year, you can use the arrange function:
df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, year)  # plyr::arrange evaluates 'year' inside df, so df$year isn't needed
# name year
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# 3 Bill Clinton 1995
# [...]
# 21 Barack Obama 2011
# 22 Barack Obama 2012
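The same year sort can be sketched in base R with order(), assuming an illustrative `presidents` table that mirrors the question's. Here stack() builds the long table (columns `values` for the year and `ind` for the name). Because order() leaves ties in their original order, Bill Clinton's 2001 row stays ahead of George W. Bush's.

```r
# Illustrative data, assumed to mirror the question's 'presidents' table
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

# Named list of year vectors -> two-column data frame (values = year, ind = name)
years <- stack(setNames(Map(seq, presidents$from, presidents$to), presidents$name))

# Sort by year; ties (e.g. 2001) keep their original relative order
by_year <- years[order(years$values), ]
head(by_year, 10)
```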
Edit 1: Following @edgester's "Update 1", a more appropriate approach is to use adply to account for presidents with non-consecutive terms:
# 'foo' is the data frame from @edgester's "Update 1"; margin 1 makes adply work row by row
adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]
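To see why grouping by name breaks once a president serves non-consecutive terms, here is a small base R sketch. The `terms` table and the `expand_group()` helper are hypothetical. Grouping by name hands seq() two `from` values for the repeated name, which errors, while splitting by row (the same idea as adply(foo, 1, ...)) expands each term on its own.

```r
# Hypothetical terms table in which one name appears twice
terms <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

# Hypothetical helper: expand one group's from:to range
expand_group <- function(d) data.frame(name = d$name[1], year = seq(d$from, d$to))

# Grouping by name: the repeated name's group has two 'from' values, so seq() fails
by_name <- tryCatch(
  lapply(split(terms, terms$name), expand_group),
  error = function(e) conditionMessage(e)
)
print(by_name)  # the error message from seq()

# Grouping by row: every term gets its own range
by_row <- do.call(rbind, lapply(split(terms, seq_len(nrow(terms))), expand_group))
rownames(by_row) <- NULL
```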
Expand a data frame to have as many rows as the range of two columns in the original row
With dplyr, we can use rowwise with do:
library(dplyr)

df1 %>%
  rowwise() %>%                                                # treat each row as its own group
  do(data.frame(symbol = .$symbol, value = .$start:.$end)) %>% # one output row per value in start:end
  arrange(symbol)
# A tibble: 30 x 2
# symbol value
# <chr> <int>
# 1 a 7
# 2 a 8
# 3 a 9
# 4 a 10
# 5 a 11
# 6 i 8
# 7 i 9
# 8 i 10
# 9 i 11
#10 i 12
# ... with 20 more rows
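A vectorised base R alternative to rowwise()/do() is sketched below. It assumes a hypothetical `df1` with `symbol`, `start` and `end` columns (the values are illustrative), and it relies on the `from` argument of sequence(), which needs R >= 4.0.0.

```r
# Hypothetical input shaped like the question's df1
df1 <- data.frame(symbol = c("a", "i"), start = c(7L, 8L), end = c(11L, 12L))

n <- df1$end - df1$start + 1L              # length of each range
expanded <- data.frame(
  symbol = rep(df1$symbol, n),             # repeat each symbol once per value
  value  = sequence(n, from = df1$start)   # concatenated start:end runs
)
expanded
```

Building all runs at once avoids creating one small data frame per row.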
Expand number range to the individual numbers
I have a data.table solution in mind.
I assume that your label variable is unique per observation (row). Otherwise, you should use a row number to group your data.
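library(data.table)
df <- data.frame(start = c(10, 20), end = c(15, 33), label = c('ex1', 'ex2'))
setDT(df)  # convert to a data.table by reference
# within each label group, .SD holds that group's start and end values
df[, seq(.SD[['start']], .SD[['end']]), by = label]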
label V1
1: ex1 10
2: ex1 11
3: ex1 12
4: ex1 13
5: ex1 14
6: ex1 15
7: ex2 20
8: ex2 21
9: ex2 22
10: ex2 23
11: ex2 24
12: ex2 25
13: ex2 26
14: ex2 27
15: ex2 28
16: ex2 29
17: ex2 30
18: ex2 31
19: ex2 32
20: ex2 33
In terms of efficiency, it may be hard to find a faster solution than data.table, which is designed for this kind of task.
If you can't use label as a unique identifier, you can add a row number and group by it as well:
df[,'rn' := seq(.N)]
df[, seq(.SD[['start']], .SD[['end']]), by = c('rn','label')]
rn label V1
1: 1 ex1 10
2: 1 ex1 11
3: 1 ex1 12
4: 1 ex1 13
5: 1 ex1 14
6: 1 ex1 15
7: 2 ex2 20
8: 2 ex2 21
9: 2 ex2 22
10: 2 ex2 23
11: 2 ex2 24
12: 2 ex2 25
13: 2 ex2 26
14: 2 ex2 27
15: 2 ex2 28
16: 2 ex2 29
17: 2 ex2 30
18: 2 ex2 31
19: 2 ex2 32
20: 2 ex2 33
You can then drop the intermediate row number with df[, 'rn' := NULL].
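The row-number idea also works without data.table: repeat each row index once per value in its range, then index the original rows. This base R sketch uses a hypothetical copy of `df` in which both rows share the label 'ex1', so the label alone could not tell them apart.

```r
# Hypothetical data: both rows share a label, so 'label' can't identify them
df <- data.frame(start = c(10, 20), end = c(15, 33), label = c("ex1", "ex1"))

n <- df$end - df$start + 1                  # number of values in each range
idx <- rep(seq_len(nrow(df)), n)            # original row number, repeated per value
out <- df[idx, "label", drop = FALSE]       # carry the other columns along
out$rn <- idx
out$V1 <- df$start[idx] + sequence(n) - 1   # start, start + 1, ..., end
rownames(out) <- NULL
out
```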
Efficiency
data.table brings a good speedup (in this example, it does not matter much whether you group by one or two columns):
Unit: microseconds
                                                          expr      min       lq     mean   median       uq      max neval cld
                                 df %>% rowwise() %>% do(f(.)) 1549.408 1808.669 2309.332 2292.525 2555.888 7141.964   100   b
         df[, seq(.SD[["start"]], .SD[["end"]]), by = "label"] 1011.608 1302.249 1555.808 1490.542 1779.543 3061.487   100  a
df[, seq(.SD[["start"]], .SD[["end"]]), by = c("label", "rn")]  968.124 1095.703 1387.556 1253.023 1592.483 2953.598   100  a
If you want to go even faster, you can set a key (see ?setkeyv). If your data frame is large, this might bring substantial performance gains (in this small example it won't).
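To get a rough feel for why vectorised code pays off as the data grows, the following base R sketch times a per-row loop against a vectorised rep()/sequence() version on simulated data. The sizes and the helper names `per_row()` and `vectorised()` are hypothetical. Timings vary by machine, so no figures are claimed; the sketch only checks that both give the same rows.

```r
set.seed(1)
k <- 1000
big <- data.frame(id = seq_len(k), start = sample(1:100, k, replace = TRUE))
big$end <- big$start + sample(0:20, k, replace = TRUE)

# Hypothetical helper: expand one row at a time, then bind the pieces
per_row <- function(d) {
  do.call(rbind, lapply(seq_len(nrow(d)), function(i) {
    data.frame(id = d$id[i], value = seq(d$start[i], d$end[i]))
  }))
}

# Hypothetical helper: expand all rows at once (sequence(from =) needs R >= 4.0.0)
vectorised <- function(d) {
  n <- d$end - d$start + 1L
  data.frame(id = rep(d$id, n), value = sequence(n, from = d$start))
}

t_row <- system.time(r1 <- per_row(big))
t_vec <- system.time(r2 <- vectorised(big))
print(c(per_row = t_row[["elapsed"]], vectorised = t_vec[["elapsed"]]))
```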
Expand range of dates by another column by inserting rows in R
Here's a very pedestrian way of doing it:
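# Split into one-row data frames; any row whose 'between' date differs
# from its 'end' date is split into two rows
do.call(rbind, lapply(split(df, seq_along(df$idnum)), function(x) {
  if (x$between[1] == x$end[1]) return(x)  # nothing to split
  x <- x[c(1, 1), ]                        # duplicate the row
  x$end[1] <- x$between[1]                 # first part ends at 'between'
  x$start[2] <- x$between[1] + 1           # second part starts the day after
  x$between[2] <- NA
  x
}))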
#> idnum var start end between
#> 1.1 17 A 1993-03-01 1993-03-01 1993-03-01
#> 1.1.1 17 A 1993-03-02 1993-03-12 <NA>
#> 2.2 17 B 1993-01-02 1993-04-03 1993-04-03
#> 2.2.1 17 B 1993-04-04 1993-04-09 <NA>
#> 3 20 A 1993-02-01 1993-02-01 1993-02-01
#> 4.4 21 C 1993-05-09 1993-07-10 1993-07-10
#> 4.4.1 21 C 1993-07-11 1993-07-12 <NA>
Created on 2020-07-26 by the reprex package (v0.3.0)
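A vectorised variant of the same split is sketched below. The input `df` is reconstructed from the printed output above, which is an assumption since the question's data isn't shown here. Rows whose `between` differs from `end` are duplicated with rep(); the first copy is then cut off at `between` and the second copy starts the next day.

```r
# Input reconstructed from the printed output above (assumed, not the question's data)
df <- data.frame(
  idnum   = c(17, 17, 20, 21),
  var     = c("A", "B", "A", "C"),
  start   = as.Date(c("1993-03-01", "1993-01-02", "1993-02-01", "1993-05-09")),
  end     = as.Date(c("1993-03-12", "1993-04-09", "1993-02-01", "1993-07-12")),
  between = as.Date(c("1993-03-01", "1993-04-03", "1993-02-01", "1993-07-10"))
)

split_row <- df$between != df$end                   # rows that need two parts
idx <- rep(seq_len(nrow(df)), ifelse(split_row, 2, 1))
out <- df[idx, ]
first  <- !duplicated(idx) & split_row[idx]         # first copy of a split row
second <- duplicated(idx)                           # second copy of a split row
out$end[first]      <- out$between[first]           # first part ends at 'between'
out$start[second]   <- out$between[second] + 1      # second part starts the day after
out$between[second] <- NA
rownames(out) <- NULL
out
```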
Expand data set to fill in with sequential values in R
We can get the rowwise sequence from 'Score2_Min' to 'Score2_Max' with map2 in a list column and then unnest the list column:
library(dplyr)
library(tidyr)
library(purrr)
data %>%
transmute(Score1, Score2 = map2(Score2_Min, Score2_Max, `:`)) %>%
unnest(Score2)
# A tibble: 14 x 2
# Score1 Score2
# <dbl> <int>
# 1 286 108
# 2 286 109
# 3 286 110
# 4 286 111
# 5 287 112
# 6 287 113
# 7 288 112
# 8 288 113
# 9 289 112
#10 289 113
#11 290 112
#12 290 113
#13 291 112
#14 291 113
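The list-column-then-unnest idea can be spelled out in base R, which may help show what map2() and unnest() do under the hood. The `data` values are reconstructed from the printed output above, an assumption rather than the question's actual data.

```r
# Values reconstructed from the printed output above (assumed)
data <- data.frame(
  Score1     = c(286, 287, 288, 289, 290, 291),
  Score2_Min = c(108, 112, 112, 112, 112, 112),
  Score2_Max = c(111, 113, 113, 113, 113, 113)
)

# Step 1: a list column holding one integer vector per row (like map2(..., `:`))
data$Score2 <- Map(`:`, data$Score2_Min, data$Score2_Max)

# Step 2: "unnest" by repeating Score1 once per element and flattening the list
unnested <- data.frame(
  Score1 = rep(data$Score1, lengths(data$Score2)),
  Score2 = unlist(data$Score2)
)
unnested
```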
Split a column consisting of number range and use the resulting numbers as range values in R
We can split 'Speed' into two columns with separate, then create a sequence list column from the values of 'start' and 'end', and unnest the list column:
library(dplyr)
library(tidyr)
library(purrr)
df1 %>%
separate(Speed, into = c('start', 'end'), remove = FALSE, convert = TRUE) %>%
mutate(ActualSpeed = map2(start, end, `:`), start = NULL, end = NULL) %>%
unnest(c(ActualSpeed))
# A tibble: 101 x 3
# Speed SpeedLevel ActualSpeed
# <chr> <dbl> <int>
# 1 0-20 1 0
# 2 0-20 1 1
# 3 0-20 1 2
# 4 0-20 1 3
# 5 0-20 1 4
# 6 0-20 1 5
# 7 0-20 1 6
# 8 0-20 1 7
# 9 0-20 1 8
#10 0-20 1 9
# … with 91 more rows
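If tidyr isn't available, the "low-high" strings can be parsed with strsplit() instead. This is a sketch with hypothetical `Speed` values; it assumes every entry contains exactly one '-' separating two non-negative integers (a negative bound would break the split).

```r
# Hypothetical input: each 'Speed' entry is a "low-high" string
df1 <- data.frame(Speed = c("0-20", "21-50"), SpeedLevel = c(1, 2),
                  stringsAsFactors = FALSE)

# Split "0-20" into c("0", "20") and convert both bounds to integers
parts <- strsplit(df1$Speed, "-", fixed = TRUE)
lo <- as.integer(vapply(parts, `[`, "", 1))
hi <- as.integer(vapply(parts, `[`, "", 2))

# Repeat each row once per value in lo:hi, then attach the values
idx <- rep(seq_len(nrow(df1)), hi - lo + 1L)
out <- df1[idx, ]
out$ActualSpeed <- unlist(Map(`:`, lo, hi))
rownames(out) <- NULL
head(out)
```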