Drop unused factor levels in a subsetted data frame
All you need to do is apply factor() to the variable again after subsetting:
> subdf$letters
[1] a b c
Levels: a b c d e
> subdf$letters <- factor(subdf$letters)
> subdf$letters
[1] a b c
Levels: a b c
EDIT
From the factor page example:
factor(ff) # drops the levels that do not occur
For dropping levels from all factor columns in a dataframe, you can use:
subdf <- subset(df, numbers <= 3)
subdf[] <- lapply(subdf, function(x) if(is.factor(x)) factor(x) else x)
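For reference, here is a minimal self-contained sketch of the same idea. The data frame and its column names (letters, numbers) are made up for illustration and only assumed to resemble the original question's data:

```r
# Hypothetical example data: one factor column and one integer column
df <- data.frame(
  letters = factor(c("a", "b", "c", "d", "e")),
  numbers = 1:5
)

# Subsetting keeps all five levels on the factor column
subdf <- subset(df, numbers <= 3)
levels_before <- levels(subdf$letters)

# Re-apply factor() to every factor column; leave other columns untouched.
# Assigning into subdf[] keeps the result a data frame.
subdf[] <- lapply(subdf, function(x) if (is.factor(x)) factor(x) else x)
levels_after <- levels(subdf$letters)

print(levels_before)
print(levels_after)
```

The non-factor column passes through the else branch unchanged, so its type is preserved.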
How can I drop unused levels from a data frame?
R has a built-in function for this, available since version 2.12.0:
y <- droplevels(y)
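A short sketch of how this might look on a whole data frame. The data here is hypothetical; droplevels() has a data frame method, and its except argument takes the indices of columns whose levels should be left alone:

```r
# Hypothetical example data with two factor columns
df <- data.frame(
  letters = factor(c("a", "b", "c", "d", "e")),
  group   = factor(c("x", "x", "y", "y", "z")),
  numbers = 1:5
)
subdf <- subset(df, numbers <= 3)

# Drop unused levels from every factor column at once
dropped <- droplevels(subdf)

# Drop unused levels everywhere except column 2 (group)
kept <- droplevels(subdf, except = 2)

print(levels(dropped$letters))
print(levels(dropped$group))
print(levels(kept$group))
```

Using except can be handy when a column's full set of levels should survive, for example to keep plot legends or contingency tables consistent across subsets.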
Override [.data.frame to drop unused factor levels by default
I'd be really wary of changing the default behavior; you never know when another function you use depends on the usual default behavior. I'd instead write a function similar to your subsetDrop but for [, like
sel <- function(x, ...) droplevels(x[...])
Then
> d <- data.frame(a=factor(LETTERS[1:5]), b=factor(letters[1:5]))
> str(d[1:2,])
'data.frame': 2 obs. of 2 variables:
$ a: Factor w/ 5 levels "A","B","C","D",..: 1 2
$ b: Factor w/ 5 levels "a","b","c","d",..: 1 2
> str(sel(d,1:2,))
'data.frame': 2 obs. of 2 variables:
$ a: Factor w/ 2 levels "A","B": 1 2
$ b: Factor w/ 2 levels "a","b": 1 2
If you really want to change the default, you could do something like
foo <- `[.data.frame`
`[.data.frame` <- function(...) droplevels(foo(...))
but make sure you know how namespaces work: the override applies to anything called from the global environment, while the version in the base namespace is unchanged. That might be a good thing, but it's something you want to make sure you understand. After this change, the output is as you'd like.
> str(d[1:2,])
'data.frame': 2 obs. of 2 variables:
$ a: Factor w/ 2 levels "A","B": 1 2
$ b: Factor w/ 2 levels "a","b": 1 2
Dropping unused factor levels in data.table
We can use .SDcols to specify the columns of interest. It can take a vector of column names (of length 1 or more) or column indices. The .SD (Subset of Data.table) then contains only the columns specified in .SDcols. As there is only a single column, extract that column with [[, apply droplevels on the vector, and assign (:=) it back to the column of interest. Note the parentheses around the object identifier v1: they make data.table evaluate the object to get the value in it instead of creating a column named 'v1'.
x[, (v1) := droplevels(.SD[[1]]), .SDcols = v1]
Usually, the syntax would be
x[, (v1) := lapply(.SD, droplevels), .SDcols = v1]
It can take one column or multiple columns. The only reason to extract ([[) is that we know it is a single column.
Another option is get
x[, (v1) := droplevels(get(v1))]
where,
v1 <- "y"
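As a rough sketch of the multi-column form, assuming the data.table package is installed, the factor columns can be detected rather than named by hand. The table dt and the vector factor_cols are hypothetical names used only for this example:

```r
library(data.table)

# Hypothetical example table with two factor columns
dt <- data.table(
  y = factor(c("a", "b", "c")),
  z = factor(c("u", "v", "w")),
  n = 1:3
)

# Filtering rows leaves the unused levels in place
dt <- dt[n <= 2]

# Find the factor columns, then drop unused levels in each by reference
factor_cols <- names(dt)[vapply(dt, is.factor, logical(1))]
dt[, (factor_cols) := lapply(.SD, droplevels), .SDcols = factor_cols]

print(levels(dt$y))
print(levels(dt$z))
```

Because := modifies the table by reference, no reassignment of dt is needed after the last step.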
Subsetting a data.frame based on factor levels in a second data.frame
df.1[,unique(df.2$Var[which(df.2$Info=="X1")])]
A C
1 0.8924861 0.7149490854
2 0.5711894 0.7200819517
3 0.7049629 0.0004052017
4 0.9188677 0.5007302717
5 0.3440664 0.9138259818
6 0.8657903 0.2724015017
7 0.7631228 0.5686033906
8 0.8388003 0.7377064163
9 0.0796059 0.6196693045
10 0.5029824 0.8717568610
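One thing to watch for: R's indexing documentation notes that indexing by a factor is equivalent to indexing by its integer codes, not by the printed labels. If df.2$Var is a factor, converting it to character first is the safer choice. A small sketch with made-up data (the contents of df.1 and df.2 here are assumptions, not the original poster's data):

```r
set.seed(1)

# Hypothetical data: df.1 holds the values, df.2 maps column names to groups
df.1 <- data.frame(A = runif(3), B = runif(3), C = runif(3))
df.2 <- data.frame(
  Var  = factor(c("C", "A", "B")),
  Info = c("X1", "X1", "X2")
)

# Convert the factor to character so columns are selected by name
wanted <- unique(as.character(df.2$Var[df.2$Info == "X1"]))

# drop = FALSE keeps a data frame even if only one column matches
res <- df.1[, wanted, drop = FALSE]
print(names(res))
```

The which() call in the original is only needed if Info can contain NA; a plain logical comparison is enough otherwise.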
Change factor levels and rearrange dataframe
This mistake is easy to make. You have to supply the column vector to fct_relevel, like so:
library(dplyr, warn.conflicts = FALSE)
library(forcats)
df <-
  structure(
    list(layer = structure(
      1:5,
      .Label = c(
        'CEOS and managers',
        'Clerks and services',
        'Production',
        'Professionals',
        'Technicians'
      ),
      class = 'factor'
    )),
    row.names = c(NA, -5L),
    class = c('tbl_df', 'tbl', 'data.frame')
  )
df %>%
  mutate(layer = forcats::fct_relevel(
    layer, c(
      'CEOS and managers',
      'Professionals',
      'Technicians',
      'Clerks and services',
      'Production'))) %>%
  arrange(layer)
#> # A tibble: 5 x 1
#> layer
#> <fct>
#> 1 CEOS and managers
#> 2 Professionals
#> 3 Technicians
#> 4 Clerks and services
#> 5 Production
Created on 2021-01-11 by the reprex package (v0.3.0)
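If you would rather avoid the extra packages, the same reordering can be sketched in base R by passing the desired order to factor() through its levels argument. The names new_order and df_base are hypothetical, and this is an alternative to fct_relevel rather than what the answer above uses:

```r
# Same five categories as above, created in default (alphabetical) level order
layer <- factor(c(
  "CEOS and managers", "Clerks and services", "Production",
  "Professionals", "Technicians"
))

# Desired level order
new_order <- c(
  "CEOS and managers", "Professionals", "Technicians",
  "Clerks and services", "Production"
)

# Rebuild the factor with explicit levels, then sort rows by level order
df_base <- data.frame(layer = factor(layer, levels = new_order))
df_base <- df_base[order(df_base$layer), , drop = FALSE]

print(as.character(df_base$layer))
```

Sorting a factor with order() follows the level order, not the alphabetical order of the labels, which is why the rows come out in the requested sequence.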