Oracle SQL: Extracting text between two characters
We can use REGEXP_SUBSTR
here:
SELECT regexp_substr('{ABCDE}{F1}', '\{([^}]+)\}', 1,1,NULL,1) AS output
FROM dual
Demo
This is a less commonly seen use of REGEXP_SUBSTR
, which uses a capture group, per this pattern:
\{([^}]+)\}
The sixth parameter says to return the first capture group.
Oracle Query to select only the text between two '-' characters
Since INSTR function allows to set what occurrence of any character you want, you can use this to identify the pieces to erase based on the "-" character:
select trim(
replace(
replace(
description,
SUBSTR(description, 1, INSTR(description, '-',1,1)), -- <-- first occurrence of "-"
null),
SUBSTR(description, INSTR(description, '-',1,2), -- <-- second occurrence of "-"
length(description)),
null)
) as result
from all_cases;
Assuming that all your rows follow the same pattern, update all in one call would look something like this:
update all_cases
set description = trim(
replace(
replace(description,
SUBSTR(description, 1, INSTR(description, '-',1,1)),
null),
SUBSTR(description, INSTR(description, '-',1,2), length(description)),
null)
);
Which acts just identifying the left and right pieces based in the "-" character and replacing them with null.
Oracle SQL: Extracting multiple text between two characters
You can use the 4th argument of REGEXP_SUBSTR
to specify an occurrence for matching.
To get a row for the 1st, 2nd, and 3rd occurrence, you can cross-join with a sub-query from dual.
WITH test_data AS (
SELECT 1 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.<A third sub-title>' AS content_data FROM dual UNION
SELECT 2 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.' AS content_data FROM dual
)
SELECT t.content_id,
REGEXP_SUBSTR(t.content_data, '<(.*?)>', 1, s.match_occurrence, 'i', 1) AS content_match
FROM test_data t
CROSS JOIN (
SELECT 1 AS match_occurrence FROM dual UNION
SELECT 2 AS match_occurrence FROM dual UNION
SELECT 3 AS match_occurrence FROM dual UNION
SELECT 4 AS match_occurrence FROM dual
/* ... etc, with the number of rows equal to the maximum number of matches that can appear */
) s
WHERE REGEXP_SUBSTR(t.content_data, '<.*?>', 1, s.match_occurrence) IS NOT NULL /* Only return records that have a match for the given occurrence */
ORDER BY t.content_id, s.match_occurrence
Borrowing the CONNECT_BY_LEVEL
from Barbaros' excellent answer, you could do it more concisely as:
WITH test_data AS (
SELECT 1 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.<A third sub-title>' AS content_data FROM dual UNION
SELECT 2 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.' AS content_data FROM dual
)
SELECT t.content_id,
REGEXP_SUBSTR(t.content_data, '<(.*?)>', 1, LEVEL, 'i', 1) AS content_match
FROM test_data t
CONNECT BY
LEVEL <= REGEXP_COUNT(t.content_data, '<.*?>')
AND PRIOR sys_guid() IS NOT NULL
AND PRIOR content_id = content_id
ORDER BY t.content_id, LEVEL
Note that the CONNECT_BY_LEVEL
method might be slower on large datasets, so I would avoid that if performance is a concern.
how to extract string between 2 strings
How about this? I added another value into the table, that doesn't contain the "Reasons removed" string.
Basically, it selects everything between "Reasons added" and "Reasons removed" if it exists; if not, takes everything from "Reasons added" to the end. CASE
takes care about that.
SQL> select * from test;
COL
----------------------------------------------------------------------------------------------
"Change step forward
Note added
Step changed from [OFF] to [ON]
Reasons added: test1 (Type), some test 2 (Type), test3 , sometest4& and5(Type)
Reasons removed: test6- test7(Type)"
"This is text without "Reas. remov."
Note added
Step changed from [OFF] to [ON]
Reasons added: test1 (Type), some test 2 (Type), test3 , and this would be the end"
SQL> select trim(substr(col,
2 instr(col, 'Reasons added') + 14,
3 --
4 case when instr(col, 'Reasons removed') > 0 then
5 instr(col, 'Reasons removed') - instr(col, 'Reasons added') - 14
6 else
7 length(col)
8 end)) result
9 from test;
RESULT
--------------------------------------------------------------------------------
test1 (Type), some test 2 (Type), test3 , sometest4& and5(Type)
test1 (Type), some test 2 (Type), test3 , and this would be the end"
SQL>
How do I extract data between two strings based on a pattern in Oracle SQL
Here is how I would do this. Note a couple of things:
- The output preserves newlines that existed in the input. You didn't
say anything about removing them; however, your output doesn't show
them. In any case - they can be removed, if needed, but that is an
unrelated process. - You say "word" but obviously you are using that in a sense different
from the common usage in regular expressions. In regexp, "word
characters" are only letters, digits and underscore; yet your
"words" include brackets, equal sign, and who knows what else. I interpreted the term "word" to mean any
sequence of consecutive non-whitespace characters.
Here is how we can recreate your data. When you ask a question here, this is how you should provide sample data - not as an image that we can't copy and paste in an SQL editor.
CREATE TABLE sample_data( col_a varchar2(20), col_b CLOB );
INSERT INTO sample_data VALUES
('12345', to_clob(
'Created:2/28/2019
Updated:1/19/2021
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[Location=BLAH].[City=BLAH]'));
INSERT INTO sample_data VALUES
('12346', to_clob(
'Created:2/28/2019
Updated:1/19/2021
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[SOC].[RAW]'));
commit;
Then here is the query and the output. Note that, depending on your interface (in my case: SQL Developer, which uses a SQL*Plus-like interface), you may need to change some settings so that the output is not truncated. In particular, in SQL*Plus, CLOB columns are truncated to 80 characters by default; I had to
set long 100
So - query and output:
select col_a, col_b,
regexp_substr(col_b, '(\s|^)(LIST:[^.]*?)\s+\S+\.', 1, 1, null, 2)
as result
from sample_data
;
COL_A COL_B RESULT
----- ------------------------------ ------------------------------
12345 Created:2/28/2019 LIST:[ABC][DEF][GHI]
Updated:1/19/2021 [LMNO][PQRST]
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[Location=BLAH].[City=BLAH]
12346 Created:2/28/2019 LIST:[ABC][DEF][GHI]
Updated:1/19/2021 [LMNO][PQRST]
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[SOC].[RAW]
The regular expression matches a single whitespace character or the beginning of the string ((\s|^)
), then the characters LIST:
followed by as few consecutive, non-period characters (this will match spaces and newline characters, in particular) as needed to allow a match - which continues with one or more whitespace characters, followed by a single word (string of 1 or more non-whitespace characters) and a literal period (\.
).
The expression we must return is enclosed in parentheses, so that we can return it from regexp_substr
. Such an expression is called a "capture group". The regexp includes another capture group, (\s|^)
, out of necessity (alternation), so the capture group we must return is the second in the regexp. This is what the last argument to regexp_substr
does: it instructs the function to return the second capture group.
Note a peculiar thing about the period (related to the much more general concept of escaping within bracket expressions): the period must be escaped to represent a literal period, rather than "any character", at the end of the regular expression; however, within the (negated) bracket expression [^.]*?
, the period - representing a literal period, not "any character" - is not escaped. Oracle follows the ERE (extended regular expressions) dialect of the POSIX standard, and that standard says that escape sequences are invalid within bracket expressions. This is different from other regular expression dialect, and confuses a lot of users.
Oracle SQL: Getting text between two sentences
A straight solution is to use substr function in conjunction with instr function. Or DBMS_LOB.instr and substr if the datatype of the column is a LOB.
documentation for dbms_lob.instr
for example:
select
substr(
mycolumn,
instr(mycolumn,'Pd to LOOK:'), --starting position
instr(mycolumn,'Pd to SEE (swap):') - instr(mycolumn,'Pd to LOOK:'), --amount to get
) as "Pd to LOOK:"
--, other expressions/columns
from
my_table;
Related Topics
How to Concatenate Many Rows With Same Id in SQL
Sqlcmd Not Able to Find a Library (Libmsodbcsql-17.0.So.1.1) That Is There
How to Check If a SQL Server String Is Null or Empty
Postgresql Error: Relation Already Exists
Using Node.Js (Express) and MySQL to Insert for a Timestamp
Two Rows With the Same Id and Two Different Values, Getting the Second Value into Another Column
Splitting String into Multiple Rows in Oracle
Mysql Query - Records Between Today and Last 30 Days
Sql Average from Multiple Columns
Sql: Update Column With Increment Numbers Based on 2 Columns
How to Convert This SQL Select to Linq Query
Sql Server Function to Return Minimum Date (January 1, 1753)
Select Only Rows With Max Date
How to Close Idle Connections in Postgresql Automatically
Left Join Without Duplicate Rows from Left Table