R - Getting Characters After Symbol

Extract text after a symbol in R

x <- c('>>xyz>>hello>>mate 1', '>>xyz>>hello>>mate 2', '>>xyz>>mate 3', ' >>xyz>>mate 4' ,'>>xyz>>hello>>mate 5')
sub('.*>>', '', x)
#[1] "mate 1" "mate 2" "mate 3" "mate 4" "mate 5"
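Because .* is greedy, it runs up to the last >> in each string. The sketch below contrasts it with a lazy quantifier, which stops at the first >> instead. It uses a shortened copy of x, and it passes perl = TRUE so that the lazy .*? is handled by PCRE.

```r
x <- c('>>xyz>>hello>>mate 1', ' >>xyz>>mate 4')

# Greedy: .* runs to the LAST ">>", so only the final segment remains
sub('.*>>', '', x)
# returns "mate 1" "mate 4"

# Lazy: ^.*? stops at the FIRST ">>", so everything after it is kept
sub('^.*?>>', '', x, perl = TRUE)
# returns "xyz>>hello>>mate 1" "xyz>>mate 4"
```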

Get the characters after a certain pattern in R - regex

You may use

df <- data.frame(cat = c("c(\\\"BPT\\\", \"BP\")", "c(\"BP2\", \"BP\")", "c(\"BPT\", \"BP\")", "c(\"CN\", \"NC\")"))
df$cat <- as.character(df$cat)
unlist(lapply(gsub('\\', '', df$cat, fixed=TRUE), function(x) eval(parse(text=x))[[1]]))
## => [1] "BPT" "BP2" "BPT" "CN"


Notes

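  • gsub('\\', '', df$cat, fixed=TRUE) removes all backslashes. If you only want to remove the backslashes that come right before a double quote, use gsub('\\\"', '"', df$cat, fixed=TRUE) instead.
  • eval(parse(text=x))[[1]] parses each cleaned string as an R expression, evaluates it to a character vector and returns its first element.
  • lapply applies this to every element of the column, and unlist flattens the resulting list into a character vector. See Using sapply and lapply.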
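eval(parse()) runs each string as R code, which can be unsafe if the data come from an untrusted source. This regex-only sketch assumes that every entry has the form c("A", ...) once the backslashes are removed. The name first_item is only for illustration.

```r
df <- data.frame(cat = c("c(\\\"BPT\\\", \"BP\")", "c(\"BP2\", \"BP\")",
                         "c(\"BPT\", \"BP\")", "c(\"CN\", \"NC\")"),
                 stringsAsFactors = FALSE)

# Remove every backslash (fixed = TRUE treats '\\' as one literal backslash)
cleaned <- gsub('\\', '', df$cat, fixed = TRUE)

# Capture the text inside the first pair of double quotes after "c("
first_item <- sub('^c\\("([^"]*)".*', '\\1', cleaned)
first_item
# returns "BPT" "BP2" "BPT" "CN"
```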

How to extract everything after a specific string?

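Use str_extract. \\b is a zero-length token that matches a word boundary, i.e. the position between a word character and a non-word character such as -, so \\b\\w+$ extracts the last run of word characters at the end of the string: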

library(stringr)
str_extract(test, '\\b\\w+$')
# [1] "Pomme" "Poire" "Fraise"

We can also use a back reference with sub. \\1 refers to the string matched by the first capture group (.+), i.e. one or more characters following the last - (the greedy .+ before the - makes sure it is the last one):

sub('.+-(.+)', '\\1', test)
# [1] "Pomme" "Poire" "Fraise"

This also works with str_replace if stringr is already loaded:

library(stringr)
str_replace(test, '.+-(.+)', '\\1')
# [1] "Pomme" "Poire" "Fraise"

A third option is to split with strsplit and take the second element of each item in the resulting list (similar to word from @akrun's answer):

sapply(strsplit(test, '-'), `[`, 2)
# [1] "Pomme" "Poire" "Fraise"

stringr also has a str_split variant of this:

str_split(test, '-', simplify = TRUE)[,2]
# [1] "Pomme" "Poire" "Fraise"
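The test vector is not shown here. The sketch below uses a hypothetical vector that produces the same output, plus one extra entry with two hyphens to show where the approaches diverge. The regex versions keep the text after the last -, while the split versions return the second field.

```r
library(stringr)

# Hypothetical input: the question's real `test` vector is not shown
test <- c("fruit-Pomme", "fruit-Poire", "fruit-Fraise", "a-b-c")

str_extract(test, '\\b\\w+$')               # last word: ... "c"
sub('.+-(.+)', '\\1', test)                 # after the LAST "-": ... "c"
sapply(strsplit(test, '-'), `[`, 2)         # second field: ... "b"
str_split(test, '-', simplify = TRUE)[, 2]  # second field: ... "b"
```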

R get rid of string before/after special characters (pipe and >) using regex

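You can extract the text between > and | with a capture group. | is a regex metacharacter (alternation), so escape it with a backslash, written \\| inside an R string. > needs no escaping.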

sub('>(.*)\\|.*', '\\1', test)
#[1] "P01923" "P19405orf"
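The question's test vector is not shown. This sketch uses hypothetical inputs of a similar shape, and it also shows what happens when a string contains more than one |. The greedy (.*) captures up to the last pipe, while the negated class [^|]* stops at the first one.

```r
# Hypothetical inputs shaped like the question's (the real `test` is not shown)
test <- c(">P01923|abc", ">P19405orf|xyz", ">P01923|A|B")

# "\\|" is a literal pipe; greedy (.*) runs to the LAST pipe
sub('>(.*)\\|.*', '\\1', test)
# returns "P01923" "P19405orf" "P01923|A"

# Inside [...] the pipe is literal, so [^|]* stops at the FIRST pipe
sub('>([^|]*)\\|.*', '\\1', test)
# returns "P01923" "P19405orf" "P01923"
```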

R Returning all characters after the first underscore

In the pattern, replace .* (zero or more of any character; . is a metacharacter that matches any character) with [^_]* (zero or more characters that are not _), anchored to the start of the string with ^.

sub("^[^_]*_", "", x)
#[1] "binloop_v6" "binloopv2"

Without this change, a pattern such as .*_ is greedy: it matches up to the last _ in the string, so everything before that point is removed, returning 'v6' and 'binloopv2'.
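The sketch below compares both patterns on a hypothetical x that is consistent with the outputs quoted above. It also shows a regex-free variant using regexpr(fixed = TRUE) and substring(). If a string contains no _, regexpr returns -1, so substring(x, 0) leaves that string unchanged.

```r
# Hypothetical input consistent with the outputs quoted above
x <- c("a_binloop_v6", "b_binloopv2")

sub(".*_", "", x)        # greedy: strips up to the LAST "_"
# returns "v6" "binloopv2"
sub("^[^_]*_", "", x)    # negated class: strips up to the FIRST "_"
# returns "binloop_v6" "binloopv2"

# Regex-free alternative: locate the first "_" literally, keep what follows
pos <- regexpr("_", x, fixed = TRUE)
substring(x, pos + 1)
# returns "binloop_v6" "binloopv2"
```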


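Another option is word from stringr. Note that word(x, 2, sep = "_") returns only the second _-separated field, which is why the output below has "binloop" where sub gave "binloop_v6". word(x, 2, -1, sep = "_") keeps everything after the first underscore.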

library(stringr)
word(x, 2, sep = "_")
#[1] "binloop" "binloopv2"

Characters before/after a symbol

It could be that this suffices:

unlist(strsplit("xxx, yyy. zzz", "[,.]"))[2]                # " yyy" (keeps the leading space), or:
gsub(" ", "", unlist(strsplit("xxx, yyy. zzz", "[,.]")))[2]  # "yyy" (all spaces removed)
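gsub(" ", "", ...) removes every space, including any inside the piece itself. The sketch below shows two alternatives that only drop the whitespace next to the delimiters: trimws() on the pieces, or a split pattern that also consumes the whitespace after each symbol.

```r
s <- "xxx, yyy. zzz"

# Split on "," or ".", then trim only leading/trailing whitespace
parts <- strsplit(s, "[,.]")[[1]]    # "xxx" " yyy" " zzz"
trimws(parts)[2]
# returns "yyy"

# Or let the split pattern absorb the whitespace after each symbol
strsplit(s, "[,.]\\s*")[[1]][2]
# returns "yyy"
```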

Extract characters after the last appearance of a certain symbol in a vector

A possible solution, using stringr::str_extract:

  • $ means the end of the string.
  • \\d+ means one or more numeric digits.
  • (?<=\\.) is a lookbehind: it checks that the digits are preceded by a dot, without including the dot in the match.

You can learn more at: Lookahead and lookbehind regex tutorial

library(stringr)

x <- c("1.22.33.444","11.22.333.4","1e.3e.3444.45", "g.78.in.89")

str_extract(x, "(?<=\\.)\\d+$")

#> [1] "444" "4" "45" "89"
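For comparison, here is a base R sketch of the same extraction. sub() with a greedy .*\\. drops everything up to the last dot and keeps whatever follows, digits or not. The lookbehind version needs perl = TRUE because R's default regex engine does not support lookbehind. Unlike str_extract, which returns NA for elements with no match, regmatches() silently drops them.

```r
x <- c("1.22.33.444", "11.22.333.4", "1e.3e.3444.45", "g.78.in.89")

# Greedy ".*\\." removes everything up to and including the LAST dot
sub(".*\\.", "", x)
# returns "444" "4" "45" "89"

# Same lookbehind pattern in base R, using PCRE
regmatches(x, regexpr("(?<=\\.)\\d+$", x, perl = TRUE))
# returns "444" "4" "45" "89"
```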

Extract digits and the word that follows them from a character vector in R

Use a pattern that matches one or more digits (\\d+) followed by one or more whitespace characters (\\s+) and a word (\\w+):

library(stringr)
str_extract_all(my_text, "\\d+\\s+\\w+")[[1]]
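my_text is not shown here, so the sketch below uses a hypothetical sentence. It also shows str_match_all with capture groups, which is one way to keep the number and the word apart. Column 1 of each match matrix holds the full match, and the following columns hold the groups.

```r
library(stringr)

# Hypothetical input; the question's `my_text` is not shown
my_text <- "Order 12 apples and 7 pears today"

str_extract_all(my_text, "\\d+\\s+\\w+")[[1]]
# returns "12 apples" "7 pears"

# Capture groups separate the number from the word
m <- str_match_all(my_text, "(\\d+)\\s+(\\w+)")[[1]]
m[, 2]   # "12" "7"
m[, 3]   # "apples" "pears"
```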

