How to sort a data frame by alphabetic order of a character variable in R?
Well, I've got no problem here (sample() is random, so your starting order may differ):
df <- data.frame(v=1:5, x=sample(LETTERS[1:5],5))
df
# v x
# 1 1 D
# 2 2 A
# 3 3 B
# 4 4 C
# 5 5 E
df <- df[order(df$x),]
df
# v x
# 2 2 A
# 3 3 B
# 4 4 C
# 1 1 D
# 5 5 E
Order a data.table based on a character column with a specific (not alphabetical) order in mind
One possibility is to join on the preferred order:
DT[preferred.order, on="x"]
x y
1: b 2
2: a 3
3: c 1
Note that this requires that the preferred.order vector contain every element of DT$x and have no duplicates.
As an alternative, you could create a factor variable from DT$x with the preferred ordering and then use setorder to order DT by reference.
DT[, xFac := factor(x, levels=preferred.order)]
setorder(DT, xFac)
which returns
DT
x y xFac
1: b 2 b
2: a 3 a
3: c 1 c
Which method is preferable depends on the use case.
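The snippets above assume that DT and preferred.order already exist. Here is a minimal self-contained sketch of both approaches. The data values are made up so that the result matches the tables shown (the original DT is not given), and DT2 is a hypothetical copy used so that the by-reference steps leave DT untouched.

```r
library(data.table)

# Hypothetical data; the original DT is not shown in the answer.
DT <- data.table(x = c("a", "b", "c"), y = c(3, 2, 1))
preferred.order <- c("b", "a", "c")

# Option 1: join on the preferred order (returns a new data.table).
joined <- DT[preferred.order, on = "x"]

# Option 2: add a factor column to a copy and sort that copy by reference.
DT2 <- copy(DT)
DT2[, xFac := factor(x, levels = preferred.order)]
setorder(DT2, xFac)

print(joined)
print(DT2)
```

The join leaves DT unchanged and adds no helper column, while the factor route modifies the table in place and keeps xFac around for later use.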
Sort column in R: strings first (alphabetically), then numbers (numerically)
For a base R option:
df <- data.frame(Col2=c("100", "B", "A", "Z", "10", "4"), stringsAsFactors=FALSE)
df[order(grepl("^\\d+$", df$Col2), sprintf("%10s", df$Col2)), ]
[1] "A" "B" "Z" "4" "10" "100"
There are two sorting levels here. The first places letters before numbers. The second left-pads every value to 10 characters with spaces (sprintf("%10s") pads with spaces, not zeroes) and sorts ascending. This is effectively an ascending numeric sort for the numbers. The trick is to realize that number strings sort correctly as text if they all have the same width.
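To make the two keys visible, here is a small base R sketch on a plain vector. The names is_num, padded and num_val are illustrative only. Two things here are my additions rather than part of the answer: method = "radix", which compares strings in the C locale so the result does not depend on the session's collation rules, and the second variant, which converts the digit strings to numbers instead of padding them.

```r
x <- c("100", "B", "A", "Z", "10", "4")

# Key 1: FALSE for letters, TRUE for all-digit strings, so letters come first.
is_num <- grepl("^\\d+$", x)

# Key 2: right-align every value in a field of width 10 (the padding is spaces).
padded <- sprintf("%10s", x)

# method = "radix" is an assumption added here for locale-independent comparison.
sorted <- x[order(is_num, padded, method = "radix")]
print(sorted)

# Variant (not from the answer): sort the numbers numerically instead of padding.
num_val <- suppressWarnings(as.numeric(x))  # NA for the non-numeric strings
sorted2 <- x[order(is_num, num_val, x, method = "radix")]
print(sorted2)
```

The padding approach only holds while no value is longer than the chosen width; the numeric variant avoids that limit.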
dplyr: order columns alphabetically in R
Try this
df %>% select(noquote(order(colnames(df))))
or just
df[,order(colnames(df))]
Update Dec 2021
New versions of dplyr (>= 1.0.7) work without the noquote:
df %>% select(order(colnames(df)))
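For reference, a tiny base R sketch of the same idea on a made-up data frame (the column names are hypothetical). order(colnames(df)) returns the column positions in alphabetical order, and indexing with them rearranges the columns.

```r
# Hypothetical data frame with columns deliberately out of order.
df <- data.frame(b = 1:2, c = 3:4, a = 5:6)

# Positions of the columns once their names are sorted alphabetically.
idx <- order(colnames(df))

# drop = FALSE keeps a data frame even if df has a single column.
sorted_df <- df[, idx, drop = FALSE]
print(names(sorted_df))
```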
Order data frame rows according to vector with specific order
Try match:
df <- data.frame(name=letters[1:4], value=c(rep(TRUE, 2), rep(FALSE, 2)))
target <- c("b", "c", "a", "d")
df[match(target, df$name),]
name value
2 b TRUE
3 c FALSE
1 a TRUE
4 d FALSE
It will work as long as your target contains exactly the same elements as df$name, and neither contains duplicate values.
From ?match:
match returns a vector of the positions of (first) matches of its first argument in its second.
Therefore match finds the row numbers that match target's elements, and then we return df in that order.
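A short sketch of what match actually returns, using the same data. The names idx, idx_missing and alt are illustrative. The last line shows a related idiom that is not from the answer: ordering by match(df$name, target) keeps every row of df.

```r
df <- data.frame(name = letters[1:4], value = c(TRUE, TRUE, FALSE, FALSE))
target <- c("b", "c", "a", "d")

# Positions of target's elements within df$name.
idx <- match(target, df$name)
print(idx)  # 2 3 1 4

# A target value that is absent from df$name gives NA,
# and indexing rows with NA produces an all-NA row.
idx_missing <- match(c("b", "z"), df$name)
print(idx_missing)  # 2 NA

# Related idiom (an addition here): rank each row by where its name sits in target.
alt <- df[order(match(df$name, target)), ]
print(alt)
```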
How can I sort a dataframe by a predetermined order of factor levels in R?
We can specify the levels of 'group' as category_order and then use that in arrange:
library(dplyr)
df1 <- df %>%
  arrange(factor(group, levels = category_order))
df1
# group value
#1 tree 50
#2 house 2
#3 lake 1
#4 human 5
Or using fct_relevel
library(forcats)
df %>%
  arrange(fct_relevel(group, category_order))
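The question's df and category_order are not shown here, so the following is only a sketch with made-up input chosen to reproduce the output above. It uses base R's order() rather than dplyr, to show the underlying mechanism: a factor is stored as integer codes that follow its levels, so sorting on the factor sorts by position in category_order.

```r
# Hypothetical input; the original data is not given.
df <- data.frame(group = c("house", "human", "lake", "tree"),
                 value = c(2, 5, 1, 50),
                 stringsAsFactors = FALSE)
category_order <- c("tree", "house", "lake", "human")

# Integer codes follow the level order: tree = 1, house = 2, lake = 3, human = 4.
codes <- as.integer(factor(df$group, levels = category_order))

# Base R equivalent of arrange(factor(group, levels = category_order)).
df1 <- df[order(codes), ]
print(df1)
```

A group value that is missing from category_order becomes NA in the factor and is sorted to the end.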
Sorting multiple columns by first letter and by numbers in R
Another method using dplyr:
library(dplyr)
arrange(df, sub('_.+$', '', item), mean)
An alternative would be to use str_extract from stringr to extract only the first letter (together with its trailing underscore) from item:
library(stringr)
arrange(df, str_extract(item, '^._'), mean)
Result:
item mean
1 a_c 2
2 a_a 4
3 a_b 5
4 b_e 1
5 b_f 3
6 b_d 7
Data:
df <- structure(list(item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"
), mean = c(5L, 2L, 4L, 7L, 3L, 1L)), .Names = c("item", "mean"
), class = "data.frame", row.names = c(NA, -6L))
Notes:
sub('_.+$', '', item) creates a temporary variable by removing _ and everything after it from item. _.+$ matches a literal underscore (_) followed by any character one or more times (.+) at the end of the string ($).
str_extract(item, '^._') creates a temporary variable by extracting any one character (.) followed by a literal underscore (_) at the beginning of the string (^).
The neat thing about dplyr::arrange is that you can create a temporary sorting variable within the function and not have it included in the output.
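To see the temporary key itself, here is a base R sketch on the same data. It uses order() in place of arrange, which is a swap made here so the example needs no packages; key is an illustrative name.

```r
df <- data.frame(item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"),
                 mean = c(5L, 2L, 4L, 7L, 3L, 1L),
                 stringsAsFactors = FALSE)

# Strip the underscore and everything after it, leaving only the prefix.
key <- sub("_.+$", "", df$item)
print(key)  # "a" "a" "a" "b" "b" "b"

# Sort by the prefix first, then by mean; key never becomes a column of df.
sorted <- df[order(key, df$mean), ]
print(sorted)
```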