Split a string with delimiters but keep the delimiters in the result in C#
If you want each delimiter returned as its own element, use Regex.Split with the delimiter wrapped in a capturing group, e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
    Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you want to split a mathematical formula, you can use the following regex:
@"([*()\^\/]|(?<!E)[\+\-])"
The negative lookbehind (?<!E) skips a + or - that directly follows E, so constants like 1E-02 stay intact instead of being split into 1E, - and 02.
So:
Regex.Split("10E-02*x+sin(x)^2", @"([*()\^\/]|(?<!E)[\+\-])")
Yields (Regex.Split also returns an empty string between the adjacent delimiters ) and ^, omitted here):
10E-02
*
x
+
sin
(
x
)
^
2
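Java's String.split does not return captured groups the way Regex.Split does, so the same tokenization has to be done by hand there. The following is a minimal sketch, assuming a hypothetical helper named splitKeepingDelimiters that walks the matches with a Matcher. Unlike Regex.Split, it skips the empty string between adjacent delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormulaSplit {
    // Hypothetical helper: returns the text between matches and the matches
    // themselves, in order. Empty pieces between adjacent delimiters are skipped.
    static List<String> splitKeepingDelimiters(String input, Pattern delimiter) {
        List<String> parts = new ArrayList<>();
        Matcher m = delimiter.matcher(input);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                parts.add(input.substring(last, m.start()));
            }
            parts.add(m.group());
            last = m.end();
        }
        if (last < input.length()) {
            parts.add(input.substring(last));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Same idea as the C# pattern above; no capturing group is needed here.
        Pattern operators = Pattern.compile("[*()^/]|(?<!E)[+-]");
        System.out.println(splitKeepingDelimiters("10E-02*x+sin(x)^2", operators));
        // prints [10E-02, *, x, +, sin, (, x, ), ^, 2]
    }
}
```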
How to split a string, but also keep the delimiters?
You can use lookahead and lookbehind, which are features of regular expressions.
System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));
And you will get:
[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]
The last one is what you want.
((?<=;)|(?=;)) matches the empty string just before ; or just after ;.
EDIT: Fabian Steeg's comments on readability are valid. Readability is always a problem with regular expressions. One thing I do to make them more readable is to store the expression in a variable whose name says what it does. You can even put placeholders (e.g. %1$s) in it and use Java's String.format to replace them with the actual string you need; for example:
public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

public void someMethod() {
    final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
    ...
}
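One caveat with the placeholder approach: the delimiter is pasted into the regex as-is, so a delimiter such as . or | would be read as a metacharacter. A small sketch of one way around that, assuming a hypothetical wrapper named splitKeeping that quotes the delimiter with Pattern.quote first:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class WithDelimiter {
    static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Hypothetical wrapper: treats the delimiter as literal text, not as a regex.
    static String[] splitKeeping(String input, String literalDelimiter) {
        String quoted = Pattern.quote(literalDelimiter);
        return input.split(String.format(WITH_DELIMITER, quoted));
    }

    public static void main(String[] args) {
        // Without quoting, "." would match any character.
        System.out.println(Arrays.toString(splitKeeping("a.b.c", ".")));
        // prints [a, ., b, ., c]
    }
}
```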
split string with two delimiters and keep order and delimiter
I guess you could use preg_replace() to "format" the string first, inserting an item delimiter that is not used elsewhere in the text and is therefore safe to split on. In this case I'm using \t as the inserted delimiter.
$formatted_text = preg_replace('/ ?([-*]) /', "\t$1", $text);
$items_with_one_empty_in_front = explode("\t", $formatted_text);
var_dump($items_with_one_empty_in_front);
array(6) {
[0]=>
string(0) ""
[1]=>
string(4) "*aaa"
[2]=>
string(4) "-bbb"
[3]=>
string(4) "-ccc"
[4]=>
string(4) "*ddd"
[5]=>
string(4) "*eee"
}
You could then do this:
foreach (array_slice($items_with_one_empty_in_front, 1) as $i => $item) {
    if ($item[0] == '*') {
        echo "$i - Negative: ".substr($item, 1)."\n";
    }
    else if ($item[0] == '-') {
        echo "$i - Positive: ".substr($item, 1)."\n";
    }
}
Version 2:
$parts = explode(" ", $text);
$opwords = [
    '*' => 'Negative',
    '-' => 'Positive'
];
$i = 1;
while ($parts) {
    $op = array_shift($parts);
    $term = array_shift($parts);
    echo $i++ . " - " . $opwords[$op] . ": ". $term . "\n";
}
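For comparison, the same idea can be done with a single regex pass that captures the operator and the term together, so order is preserved without a placeholder delimiter. This is only a sketch in Java: the input string is an assumption (the original $text is not shown), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OperatorTerms {
    // Same operator-to-word mapping as in the PHP versions above.
    static final Map<String, String> OP_WORDS = Map.of("*", "Negative", "-", "Positive");

    // Hypothetical helper: finds "<operator> <term>" pairs in order of appearance.
    static List<String> describe(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile("([-*])\\s+(\\S+)").matcher(text);
        int i = 1;
        while (m.find()) {
            lines.add(i + " - " + OP_WORDS.get(m.group(1)) + ": " + m.group(2));
            i++;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumed input format: operator, space, term, repeated.
        describe("* aaa - bbb - ccc * ddd * eee").forEach(System.out::println);
    }
}
```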
Split String by Delimiter and Include Delimiter - Common Lisp
The problem is after the end condition of the do* loop. When variable i reaches the end of the string, the do* loop is exited but there is still a current-word which has not been added yet to words. When the end condition is met you need to add x to current-word and then current-word to words, before exiting the loop:
(defun split-string-with-delimiter (string delimiter)
  "Splits a string into a list of strings, with the delimiter still
in the resulting list."
  (let ((words nil)
        (current-word (make-adjustable-string "")))
    (do* ((i 0 (+ i 1))
          (x (char string i) (char string i)))
         ((>= (+ i 1) (length string))
          (progn (vector-push-extend x current-word)
                 (push current-word words)))
      (if (eql delimiter x)
          (unless (string= "" current-word)
            (push current-word words)
            (push (string delimiter) words)
            (setf current-word (make-adjustable-string "")))
          (vector-push-extend x current-word)))
    (nreverse words)))
However, note that this version is still buggy in that if the last character of string is a delimiter, this will be included into the last word, i.e. (split-string-with-delimiter "a.bc.def." #\.) => ("a" "." "bc" "." "def.")
I'll let you add this check.
In any case, you might want to make this more efficient by looking ahead for delimiter and extracting all the characters between the current i and the next delimiter at once as one single substring.
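The look-ahead idea from the last paragraph can be sketched as follows. The sketch is in Java rather than Common Lisp and is not a drop-in replacement for the function above. It jumps from delimiter to delimiter with indexOf and copies each run of characters as one substring, which also handles a trailing delimiter. One deliberate difference: it emits every delimiter, including consecutive ones.

```java
import java.util.ArrayList;
import java.util.List;

class SplitWithDelimiter {
    // Hypothetical helper: splits on a single character and keeps each
    // delimiter as its own list element.
    static List<String> split(String s, char delimiter) {
        List<String> words = new ArrayList<>();
        int start = 0;
        while (start < s.length()) {
            int next = s.indexOf(delimiter, start);
            if (next < 0) {
                // No more delimiters: the rest of the string is the last word.
                words.add(s.substring(start));
                break;
            }
            if (next > start) {
                words.add(s.substring(start, next));
            }
            words.add(String.valueOf(delimiter));
            start = next + 1;
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(split("a.bc.def.", '.'));
        // prints [a, ., bc, ., def, .]
    }
}
```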
Split String into Array keeping delimiter/separator in Swift
Suppose you are splitting the string by a separator called separator, you can do the following:
let result = yourString.components(separatedBy: separator) // first split
    .flatMap { [$0, separator] } // add the separator after each split
    .dropLast() // remove the last separator added
    .filter { $0 != "" } // remove empty strings
For example:
let result = " Hello World ".components(separatedBy: " ").flatMap { [$0, " "] }.dropLast().filter { $0 != "" }
print(result) // [" ", "Hello", " ", "World", " "]
How can I split a string in Java and retain the delimiters?
str.split("(?=[:;])")
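This will give you the desired array, only with an empty first item (on Java 7 and earlier; since Java 8, a zero-width match at the start of the string no longer produces an empty leading item). And: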
str.split("(?=\\b[:;])")
This will give the array without the empty first item.
- The key here is (?=X), which is a zero-width positive lookahead (a non-capturing construct); see the regex Pattern docs.
- [:;] means "either ; or :".
- \b is a word boundary; it is there so that the first : is not treated as a delimiter (since it is at the beginning of the sequence).
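A quick way to see what either pattern returns on your own JVM is to run both against a sample. The input below is an assumption, since the original string is not shown, and the class and method names are made up. The expected output in the comments is for Java 8 or later.

```java
import java.util.Arrays;

class LookaheadSplit {
    // Splits before every ':' or ';', keeping the delimiter at the start of each piece.
    static String[] splitBefore(String s) {
        return s.split("(?=[:;])");
    }

    // Same, but only where a word boundary precedes the delimiter.
    static String[] splitBeforeAtWordBoundary(String s) {
        return s.split("(?=\\b[:;])");
    }

    public static void main(String[] args) {
        String str = ":alpha;beta:gamma"; // assumed sample input
        System.out.println(Arrays.toString(splitBefore(str)));
        // Java 8+: [:alpha, ;beta, :gamma]
        System.out.println(Arrays.toString(splitBeforeAtWordBoundary(str)));
        // Java 8+: [:alpha, ;beta, :gamma]
    }
}
```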
Splitting string with character sequence as a delimiter
Faster than using String.split is Pattern.split: i.e., precompile the pattern and store it for subsequent use. If you use the same pattern all the time, and do a lot of splitting with it, it may be worth putting that pattern into a static field or something.
Also, if your pattern contains no regex metacharacters, you can pass in Pattern.LITERAL when creating the pattern. This is something you can't do with String.split. :-P
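A minimal sketch of what that could look like, assuming a two-character separator "||" and a hypothetical class named LiteralSplit. With Pattern.LITERAL the pipes are matched as plain text instead of as regex alternation.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplit {
    // Compiled once and reused. LITERAL makes "||" match the two characters
    // themselves rather than being parsed as a regex.
    private static final Pattern SEPARATOR = Pattern.compile("||", Pattern.LITERAL);

    static String[] fields(String line) {
        return SEPARATOR.split(line);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("a||b||c")));
        // prints [a, b, c]
    }
}
```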
string.split but keeping sequential matches
Note: This answer only addresses the parts of the question preceding its "update" paragraph, because it was written before the question was edited.
string.Split will produce n-1 empty parts for n consecutive separator characters. Since you want it to produce n empty parts instead, you are one tilde short wherever several of them occur consecutively. Add the "missing" tildes as follows before you perform the Split:
// using System.Text.RegularExpressions;
const string input = "~abc~~~123~~~hijkl~9";
string[] parts = Regex.Replace(input, "~~+", "$0~").Split('~');
// The Regex.Replace call appends one extra '~' to every run of two or more tildes.
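The same trick should carry over to Java, with one difference: String.split drops trailing empty strings unless it is given a negative limit. A sketch, with a hypothetical method name:

```java
import java.util.Arrays;

class TildeSplit {
    // Adds one '~' to every run of two or more tildes, then splits.
    // The limit of -1 keeps trailing empty strings.
    static String[] splitKeepingRuns(String input) {
        return input.replaceAll("~~+", "$0~").split("~", -1);
    }

    public static void main(String[] args) {
        String[] parts = splitKeepingRuns("~abc~~~123~~~hijkl~9");
        // Each run of three tildes now yields three empty parts.
        System.out.println(Arrays.toString(parts));
    }
}
```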