Replacing tied ranks by their average
We can just use rank from base R. The default for ties.method is "average".
x$freq.Freq <- rank(-x$freq.Freq)
x$freq.Freq
#[1] 1.0 2.5 2.5 4.0 6.0 6.0 6.0 8.0 9.0
How to get ranks with no gaps when there are ties among values?
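I can think of a quick function to do this. The for loop is not optimal, but it works :)
x <- c(1, 1, 2, 3, 4, 5, 8, 8)
foo <- function(x){
  su <- sort(unique(x))
  # write ranks to a separate vector so values of x are not overwritten mid-loop
  out <- integer(length(x))
  for (i in seq_along(su)) out[x == su[i]] <- i
  return(out)
}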
foo(x)
[1] 1 1 2 3 4 5 6 6
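The same "no gaps" ranking can be sketched with a lookup table instead of a loop over the data; in R the analogous one-liner is match(x, sort(unique(x))). The Python version below is only an illustration, and `dense_rank` is a hypothetical name rather than a library function.

```python
def dense_rank(values):
    """Rank values 1, 2, 3, ... with tied values sharing a rank and no gaps."""
    # Map each distinct value to its 1-based position among the sorted distinct values.
    position = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [position[v] for v in values]

print(dense_rank([1, 1, 2, 3, 4, 5, 8, 8]))
# [1, 1, 2, 3, 4, 5, 6, 6]
```

Because the ranks are looked up rather than written back into the input, the result does not depend on whether the values happen to collide with the rank numbers.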
rank and order in R
set.seed(1)
x <- sample(1:50, 30)
x
# [1] 14 19 28 43 10 41 42 29 27 3 9 7 44 15 48 18 25 33 13 34 47 39 49 4 30 46 1 40 20 8
rank(x)
# [1] 9 12 16 25 7 23 24 17 15 2 6 4 26 10 29 11 14 19 8 20 28 21 30 3 18 27 1 22 13 5
order(x)
# [1] 27 10 24 12 30 11 5 19 1 14 16 2 29 17 9 3 8 25 18 20 22 28 6 7 4 13 26 21 15 23
rank returns a vector with the "rank" of each value: the number in the first position is the 9th lowest. order returns the indices that would put the initial vector x in order.
The 27th value of x is the lowest, so 27 is the first element of order(x), and if you look at rank(x), the 27th element is 1.
x[order(x)]
# [1] 1 3 4 7 8 9 10 13 14 15 18 19 20 25 27 28 29 30 33 34 39 40 41 42 43 44 46 47 48 49
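The relationship between the two functions can also be seen by treating order as a permutation and inverting it. The sketch below uses Python with 0-based indices (R is 1-based) and assumes there are no ties; `order` and `rank_from_order` are illustrative names, and the input is the first five values of x above.

```python
def order(values):
    """0-based indices that would sort values (like R's order(), minus 1)."""
    return sorted(range(len(values)), key=values.__getitem__)

def rank_from_order(values):
    """0-based rank of each element, obtained by inverting the order permutation."""
    ranks = [0] * len(values)
    for position, index in enumerate(order(values)):
        ranks[index] = position
    return ranks

x = [14, 19, 28, 43, 10]
print(order(x))            # [4, 0, 1, 2, 3]
print(rank_from_order(x))  # [1, 2, 3, 4, 0]
```

Applying `order` twice gives the same result as `rank_from_order`, which is the identity order(order(x)) == rank(x) for tie-free input.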
Efficient method to calculate the rank vector of a list in Python
Using scipy, the function you are looking for is scipy.stats.rankdata:
In [13]: import scipy.stats as ss
In [19]: ss.rankdata([3, 1, 4, 15, 92])
Out[19]: array([ 2., 1., 3., 4., 5.])
In [20]: ss.rankdata([1, 2, 3, 3, 3, 4, 5])
Out[20]: array([ 1., 2., 4., 4., 4., 6., 7.])
The ranks start at 1, rather than 0 (as in your example), but then again, that's the way R's rank function works as well.
Here is a pure-Python equivalent of scipy's rankdata function:
def rank_simple(vector):
    # Indices that would sort the vector (an argsort)
    return sorted(range(len(vector)), key=vector.__getitem__)

def rankdata(a):
    n = len(a)
    ivec = rank_simple(a)
    svec = [a[rank] for rank in ivec]
    sumranks = 0
    dupcount = 0
    newarray = [0] * n
    for i in range(n):
        sumranks += i
        dupcount += 1
        # End of a run of equal values: give the whole run its average rank
        if i == n - 1 or svec[i] != svec[i + 1]:
            averank = sumranks / float(dupcount) + 1
            for j in range(i - dupcount + 1, i + 1):
                newarray[ivec[j]] = averank
            sumranks = 0
            dupcount = 0
    return newarray
print(rankdata([3, 1, 4, 15, 92]))
# [2.0, 1.0, 3.0, 4.0, 5.0]
print(rankdata([1, 2, 3, 3, 3, 4, 5]))
# [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0]
Create a mean rank for a rank-frequency data.frame in R
Sure, just group by frequency:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dt <- data.frame(frequency = c(64, 58, 54, 32, 29, 29, 25, 17, 17, 15, 12, 12, 10))
dt %>%
  arrange(desc(frequency)) %>%
  mutate(rank = row_number()) %>%
  group_by(frequency) %>%
  mutate(mean_rank = mean(rank)) %>%
  ungroup()
#> # A tibble: 13 × 3
#> frequency rank mean_rank
#> <dbl> <int> <dbl>
#> 1 64 1 1
#> 2 58 2 2
#> 3 54 3 3
#> 4 32 4 4
#> 5 29 5 5.5
#> 6 29 6 5.5
#> 7 25 7 7
#> 8 17 8 8.5
#> 9 17 9 8.5
#> 10 15 10 10
#> 11 12 11 11.5
#> 12 12 12 11.5
#> 13 10 13 13
R: Rank-function with two variables and ties.method random
Since order(order(x)) gives the same result as rank(x) when there are no ties (see Why does order(order(x)) equal rank(x) in R?), you could just do
order(order(y, z, runif(length(y))))
to get the rank values.
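The trick is that the random numbers act as a third sort key, so they only matter when both y and z are tied. A rough Python sketch of the same idea follows; the function name and the `seed` parameter are assumptions made for illustration, and the y and z values are copied from the output table further down.

```python
import random

def rank_two_keys_random(y, z, seed=None):
    """1-based ranks by (y, z), breaking exact ties in random order."""
    rng = random.Random(seed)
    # One random number per element, used only as the last sort key.
    tiebreak = [rng.random() for _ in y]
    order = sorted(range(len(y)), key=lambda i: (y[i], z[i], tiebreak[i]))
    # Invert the sort order to get the rank of each original element.
    ranks = [0] * len(y)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

y = [1, 4, 5, 5, 2, 8, 8, 1, 3, 3]
z = [0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3]
print(rank_two_keys_random(y, z, seed=1))
```

Rows 6 and 7 (both y = 8, z = 0.1) share ranks 9 and 10 in an order that depends on the seed, as do rows 9 and 10 with ranks 4 and 5; every other rank is fixed.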
Here's a more involved approach that allows you to use methods from ties.method. It requires dplyr:
library(dplyr)
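rank2 <- function(df, key1, key2, ties.method) {
  average <- function(x) mean(x)
  # shuffle by index: sample(x, length(x)) would draw from 1:x for a single-row group
  random <- function(x) x[sample.int(length(x))]
  df$r <- order(order(df[[key1]], df[[key2]]))
  # group_by_() is deprecated; across(all_of()) selects the key columns by name
  group_by(df, across(all_of(c(key1, key2)))) %>% mutate(rr = get(ties.method)(r))
}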
rank2(df, "y", "z", "average")
# Source: local data frame [10 x 5]
# Groups: y, z [8]
# x y z r rr
# <dbl> <dbl> <dbl> <int> <dbl>
# 1 1 1 0.2 1 1.0
# 2 2 4 0.8 6 6.0
# 3 3 5 0.5 8 8.0
# 4 4 5 0.4 7 7.0
# 5 5 2 0.2 3 3.0
# 6 6 8 0.1 9 9.5
# 7 7 8 0.1 10 9.5
# 8 8 1 0.7 2 2.0
# 9 9 3 0.3 4 4.5
# 10 10 3 0.3 5 4.5
Create ranking for vector of double
One way to do so would be using a multimap.
1. Place the items in a multimap mapping your objects to size_ts (the initial values are unimportant). You can do this with one line (use the ctor that takes iterators).
2. Loop (either plainly or using whatever from algorithm) and assign 0, 1, ... as the values.
3. Loop over the distinct keys. For each distinct key, call equal_range for the key, and set its values to the average (again, you can use stuff from algorithm for this). If tied ranks must keep a fractional average such as 2.5, store the values as double rather than size_t.
The overall complexity should be Theta(n log(n)), where n is the length of the vector.
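As a rough illustration of the equal_range step, here is a Python sketch in which a sorted list stands in for the multimap and bisect_left/bisect_right play the role of equal_range. It is not the C++ implementation described above, and `average_ranks` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right

def average_ranks(values):
    """1-based ranks where tied values get the mean of the positions they occupy."""
    ordered = sorted(values)              # O(n log n)
    ranks = []
    for v in values:
        lo = bisect_left(ordered, v)      # first sorted position holding v
        hi = bisect_right(ordered, v)     # one past the last position holding v
        # Positions lo..hi-1 are 0-based; their mean, shifted to 1-based, is the rank.
        ranks.append((lo + hi - 1) / 2 + 1)
    return ranks

print(average_ranks([1, 2, 3, 3, 3, 4, 5]))
# [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0]
```

Each lookup is O(log n), so the total stays at O(n log n), matching the bound stated above.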
replace subset of vector values with subset average
This is my attempt. I first calculate the average rank, then split the subjects of the same rank into rows.
library(tidyverse)
options(stringsAsFactors = FALSE)
subj <- c("A", "B", "C,D,E", "C,D,E", "C,D,E", "F", "G,H", "G,H", "I")
rank <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
df <- data.frame(rank, subj)
df %>%
  group_by(subj) %>%
  summarise(rank = mean(rank)) %>%
  rowwise() %>%
  do(tibble(subj = unlist(strsplit(.$subj, ",")), rank = .$rank)) %>%
  ungroup()
Output:
# A tibble: 9 × 2
subj rank
* <chr> <dbl>
1 A 1.0
2 B 2.0
3 C 4.0
4 D 4.0
5 E 4.0
6 F 6.0
7 G 7.5
8 H 7.5
9 I 9.0
Another approach:
m <- aggregate(rank~subj, data=df, mean)
# apply() passes each row as a character vector, so convert rank back to numeric
m <- apply(m, 1, function(x) data.frame(subj = unlist(strsplit(x[1], ",")), rank = as.numeric(x[2])))
m <- do.call(rbind, m)
rownames(m) <- NULL
m
Output:
subj rank
1 A 1.0
2 B 2.0
3 C 4.0
4 D 4.0
5 E 4.0
6 F 6.0
7 G 7.5
8 H 7.5
9 I 9.0