SQL Server: use all the words of a string as separate LIKE parameters (and all words should match) - string

I have a string containing a certain number of words (it may vary from 1 to many) and I need to find the records of a table which contains ALL those words in any order.
For instances, suppose that my input string is 'yellow blue red' and I have a table with the following records:
1 yellow brown white
2 red blue yellow
3 black blue red
The query should return the record 2.
I know that the basic approach should be something similar to this:
select * from mytable where colors like '%yellow%' and colors like '%blue%' and colors like '%red%'
However I am not being able to figure out how turn the words of the string into separate like parameters.
I have this code that splits the words of the string into a table, but now I am stuck:
DECLARE #mystring varchar(max) = 'yellow blue red';
DECLARE #terms TABLE (term varchar(max));
INSERT INTO #terms
SELECT Split.a.value('.', 'NVARCHAR(MAX)') term FROM (SELECT CAST('<X>'+REPLACE(#mystring, ' ', '</X><X>')+'</X>' AS XML) AS String) AS A CROSS APPLY String.nodes('/X') AS Split(a)
SELECT * FROM #terms
Any idea?

First, put that XML junk in a function:
CREATE FUNCTION dbo.SplitThem
(
#List NVARCHAR(MAX),
#Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN ( SELECT Item = y.i.value(N'(./text())[1]', N'nvarchar(4000)')
FROM ( SELECT x = CONVERT(XML, '<i>'
+ REPLACE(#List, #Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i));
Now you can extract the words in the table, join them to the words in the input string, and discard any that don't have the same count:
DECLARE #mystring varchar(max) = 'red yellow blue';
;WITH src AS
(
SELECT t.id, t.colors, fc = f.c, tc = COUNT(t.id)
FROM dbo.mytable AS t
CROSS APPLY dbo.SplitThem(t.colors, ' ') AS s
INNER JOIN (SELECT Item, c = COUNT(*) OVER()
FROM dbo.SplitThem(#mystring, ' ')) AS f
ON s.Item = f.Item
GROUP BY t.id, t.colors, f.c
)
SELECT * FROM src
WHERE fc = tc;
Output:
id
colors
fc
tc
2
red blue yellow
3
3
Example db<>fiddle
This disregards any possibility of duplicates on either side and ignores the larger overarching issue that this is the least optimal way possible to store sets of things. You have a relational database, use it! Surely you don't think the tags on this question are stored somewhere as the literal string
string sql-server-2012 sql-like
Of course not, these question:tag relationships are stored in a, well, relational table. Splitting strings is for the birds and those with all kinds of CPU and time to spare.

If you are storing a delimited list in a single column then you really need to normalize it out into a separate table.
But assuming you actually want to just do multiple free-form LIKE comparisons, you can do them against a list of values:
select *
from mytable t
where not exists (select 1
from (values
('%yellow%'),
('%blue%'),
('%red%')
) v(search)
where t.colors not like v.search
);
Ideally you should pass these values through as a Table Valued Parameter, then you just put that into your query
select *
from mytable t
where not exists (select 1
from #tmp v
where t.colors not like v.search
);
If you want to simulate an OR semantic rather than AND the change not exists to exists and not like to like.

Related

Power Query: How to delete duplicate characters from a string (eg. xzxxxzzzzxzzzzx-> leave only xz)?

I have a huge table in Power Query with text in cells that consist of multiple 'x's and 'z's. I want to deduplicate values so I have one x and one z only.
For example:
xzzzxxxzxz-> xz
zzzzzzzzzz-> z
The table is very big, so I don't want to create additional columns. Can you please help?
You can convert the string to a list of characters, make the list distinct (remove duplicates), sort (if desired), and then transform back to text.
= Table.TransformColumns(#"Previous Step", {{"ColumnName",
each Text.Combine( List.Sort( List.Distinct( Text.ToList(_) ) ) ),
type text}})

Using Large Look up table

Problem Statement :
I have two tables - Data (40 cols) and LookUp(2 cols) . I need to use col10 in data table with lookup table to extract the relevant value.
However I cannot make equi join . I need a join based on like/contains as values in lookup table contain only partial content of value in Data table not complete value. Hence some regex based matching is required.
Data Size :
Data Table : Approx - 2.3 billion entries (1 TB of data)
Look up Table : Approx 1.4 Million entries (50 MB of data)
Approach 1 :
1.Using the Database ( I am using Google Big Query) - A Join based on like take close to 3 hrs , yet it returns no result. I believe Regex based join leads to Cartesian join.
Using Apache Beam/Spark - I tried to construct a Trie for the lookup table which will then be shared/broadcast to worker nodes. However with this approach , I am getting OOM as I am creating too many Strings. I tried increasing memory to 4GB+ per worker node but to no avail.
I am using Trie to extract the longest matching prefix.
I am open to using other technologies like Apache spark , Redis etc.
Do suggest me on how can I go about handling this problem.
This processing needs to performed on a day-to-day basis , hence time and resources both needs to be optimized .
However I cannot make equi join
Below is just to give you an idea to explore for addressing in pure BigQuery your equi join related issue
It is based on an assumption I derived from your comments - and covers use-case when y ou are looking for the longest match from very right to the left - matches in the middle are not qualified
The approach is to revers both url (col10) and shortened_url (col2) fields and then SPLIT() them and UNNEST() with preserving positions
UNNEST(SPLIT(REVERSE(field), '.')) part WITH OFFSET position
With this done, now you can do equi join which potentially can address your issue at some extend.
SO, you JOIN by parts and positions then GROUP BY original url and shortened_url while leaving only those groups HAVING count of matches equal of count of parts in shorteded_url and finally you GROUP BY url and leaving only entry with highest number of matching parts
Hope this can help :o)
This is for BigQuery Standard SQL
#standardSQL
WITH data_table AS (
SELECT 'cn456.abcd.tech.com' url UNION ALL
SELECT 'cn457.abc.tech.com' UNION ALL
SELECT 'cn458.ab.com'
), lookup_table AS (
SELECT 'tech.com' shortened_url, 1 val UNION ALL
SELECT 'abcd.tech.com', 2
), data_table_parts AS (
SELECT url, x, y
FROM data_table, UNNEST(SPLIT(REVERSE(url), '.')) x WITH OFFSET y
), lookup_table_parts AS (
SELECT shortened_url, a, b, val,
ARRAY_LENGTH(SPLIT(REVERSE(shortened_url), '.')) len
FROM lookup_table, UNNEST(SPLIT(REVERSE(shortened_url), '.')) a WITH OFFSET b
)
SELECT url,
ARRAY_AGG(STRUCT(shortened_url, val) ORDER BY weight DESC LIMIT 1)[OFFSET(0)].*
FROM (
SELECT url, shortened_url, COUNT(1) weight, ANY_VALUE(val) val
FROM data_table_parts d
JOIN lookup_table_parts l
ON x = a AND y = b
GROUP BY url, shortened_url
HAVING weight = ANY_VALUE(len)
)
GROUP BY url
with result as
Row url shortened_url val
1 cn457.abc.tech.com tech.com 1
2 cn456.abcd.tech.com abcd.tech.com 2

Table comprehensions: get subset from internal table into another one

As stated in the topic, I want to have a conditioned subset of an internal
table inside another internal table.
Let us first look, what it may look like the old fashioned way.
DATA: lt_hugeresult TYPE tty_mytype,
lt_reducedresult TYPE tty_mytype.
SELECT "whatever" FROM "wherever"
INTO CORRESPONDING FIELDS OF TABLE lt_hugeresult
WHERE "any_wherecondition".
IF sy-subrc = 0.
lt_reducedresult[] = lt_hugeresult[].
DELETE lt_reducedresult WHERE col1 EQ 'a value'
AND col2 NE 'another value'
AND col3 EQ 'third value'.
.
.
.
ENDIF.
We all may know this.
Now I was reading about the table reducing stuff, which is introduced
with abap 7.40, appearently SP8.
Table Comprehensions – Building Tables Functionally
Table-driven:
VALUE tabletype( FOR line IN tab WHERE ( … )
( … line-… … line-… … )
)
For each selected line in the source table(s), construct a line in the result table. Generalization of value constructor from static to dynamic number of lines.
I was experimenting with that, but the results seem not really to fit,
perhaps I am doing it wrong, or I might even need the condition-driven approach.
So, how would it look like, if I want to write the above statement with table comprehension techniques ?
Until now I have this, delivering not that, what I need, and I have seen, that
it seems, as if the "not equal" is not possible...
DATA(reduced) = VALUE tty_mytype( FOR checkline IN lt_hugeresult
WHERE ( col1 = 'a value' )
( col2 = 'another value' )
( col3 = space )
).
Anyone having some hints ?
EDIT: Seems still not to work. Here is, as I do it:
Executable line:
Debugger results:
Wrong Reduced:
And what now ???
You could use the FILTER operator with the EXCEPT WHERE addition to filter out any rows that match the where clause:
lt_reducedresult = FILTER # ( lt_hugeresult EXCEPT WHERE col1 = 'a value'
AND col2 <> 'another value'
AND col3 = 'a third value' ).
Note that lt_hugeresult would have to be a sorted table, and the col1/col2/col3 need to be key components (you can specify a secondary key using the USING KEY addition).
The documentation for FILTER explicitly notes that:
Table filtering can also be performed using a table comprehension or a table reduction with an iteration expression for table iterations with FOR. The operator FILTER provides a shortened format for this special case and is more efficient to execute.
A table filter constructs the result row by row. If the result contains almost all rows in the source table, this method can be slower than copying the source table and deleting the surplus rows from the target table.
So your approach of using DELETE might actually be appropriate depending on the size of the table.
The Table Iterations may be a lot confusing when you use WHERE, because of parenthesis groups.
The "NOT EQUAL" condition is very well supported, as I show below in the solution of your first example. The issue you observe is due to misproper use of parenthesis groups.
You must absolutely define the whole logical expression after WHERE Inside ONE parenthesis group (one, or several elementary conditions separated by logical operators AND, OR, etc.)
After the parenthesis group for WHERE, you define usually only one parenthesis group which corresponds to the line to be added to the target internal table. You may define subsequent parenthesis groups, if for each line in the source internal table, you want to add several lines in the target internal table.
In your example, only the first parenthesis group applies to WHERE (either col1 = 'a value' in your first example, or insplot = _ilnum in your second example).
The subsequent parenthesis groups correspond to the lines to be added, i.e. 2 lines are added for each source line in the first example (one line with col2 = 'another value', and one line with col3 = space), and 3 lines are added for each source line in the second example (one line with inspoper = i_evaluation-inspoper, one line with inspchar = i_evaluation-inspchar, one line corresponding to the line of _single_results).
So, you should write your code as follows.
First example :
DATA(reduced) = VALUE tty_mytype( FOR checkline IN lt_hugeresult
WHERE ( col1 = 'a value'
AND col2 <> 'another value'
AND col3 = 'third value'
)
( checkline )
).
Second example :
DATA(singres) = VALUE tbapi2045d4( FOR checkline IN _single_results
WHERE ( insplot = _ilnum
AND inspoper = i_evaluation-inspoper
AND inspchar = i_evaluation-inspchar
)
( checkline )
).
I compared old-fashioned syntax of your above example with table comprehension technique and got exactly the same result.
Actually, your sample is not functional because it lacks row specification for constructed table reduced.
Try this one, which worked for me.
DATA(reduced) = VALUE tty_mytype( FOR checkline IN lt_hugeresult
WHERE ( col1 = 'a value' AND
col2 = 'another value' AND
col3 = space )
( checkline )
).
In the above sample we have the most basic type of result row specification where is is absolutely similar to source table. More sophisticated examples, where new table rows are evaluated with table iterations, can be found here.

WHERE variable = ( subquery ) in OpenSQL

I'm trying to retrieve rows from a table where a subquery matches an variable. However, it seems as if the WHERE clause only lets me compare fields of the selected tables against a constant, variable or subquery.
I would expect to write something like this:
DATA(lv_expected_lines) = 5.
SELECT partner contract_account
INTO TABLE lt_bp_ca
FROM table1 AS tab1
WHERE lv_expected_lines = (
SELECT COUNT(*)
FROM table2
WHERE partner = tab1~partner
AND contract_account = tab1~contract_account ).
But obviously this select treats my local variable as a field name and it gives me the error "Unknown column name "lv_expected_lines" until runtime, you cannot specify a field list."
But in standard SQL this is perfectly possible:
SELECT PARTNER, CONTRACT_ACCOUNT
FROM TABLE1 AS TAB1
WHERE 5 = (
SELECT COUNT(*)
FROM TABLE2
WHERE PARTNER = TAB1.PARTNER
AND CONTRACT_ACCOUNT = TAB1.CONTRACT_ACCOUNT );
So how can I replicate this logic in RSQL / Open SQL?
If there's no way I'll probably just write native SQL and be done with it.
The program below might lead you to an Open SQL solution. It uses the SAP demo tables to determines the plane types that are used on a specific number of flights.
REPORT zgertest_sub_query.
DATA: lt_planetypes TYPE STANDARD TABLE OF s_planetpp.
PARAMETERS: p_numf TYPE i DEFAULT 62.
START-OF-SELECTION.
SELECT planetype
INTO TABLE lt_planetypes
FROM sflight
GROUP BY planetype
HAVING COUNT( * ) EQ p_numf.
LOOP AT lt_planetypes INTO DATA(planetype).
WRITE: / planetype.
ENDLOOP.
It only works if you don't need to read fields from TAB1. If you do you will have to gather these with other selects while looping at your results.
For those dudes who found this question in 2020 I report that this construction is supported since ABAP 7.50. No workarounds are needed:
SELECT kunnr, vkorg
FROM vbak AS v
WHERE 5 = ( SELECT COUNT(*)
FROM vbap
WHERE kunnr = v~kunnr
AND vkorg = v~vkorg )
INTO TABLE #DATA(customers).
This select all customers who made 5 sales orders within some sales organization.
In ABAP there is no way to do the query as in NATIVE SQL.
I would advice not to use NATIVE SQL, instead give a try to SELECT/ENDSELECT statement.
DATA: ls_table1 type table1,
lt_table1 type table of table1,
lv_count type i.
SELECT PARTNER, CONTRACT_ACCOUNT
INTO ls_table1
FROM TABLE1.
SELECT COUNT(*)
INTO lv_count
FROM TABLE2
WHERE PARTNER = TAB1.PARTNER
AND CONTRACT_ACCOUNT = TAB1.CONTRACT_ACCOUNT.
CHECK lv_count EQ 5.
APPEND ls_table1 TO lt_table1.
ENDSELECT
Here you append to ls_table1 only those rows where count is equals to 5 in selection of table2.
Hope it helps.

Split a string of numbers passed to stored procedure and perform a lookup in a table [duplicate]

This question already has answers here:
Oracle LISTAGG() for querying use
(2 answers)
Closed 9 years ago.
I have to pass string of numbers ( like 234567, 678956, 345678) to a stored procedure, the SP will split that string by comma delimiter and take each value ( eg: 234567) and do a look up in another table and get the corresponding value from another column and build a string.
For instance if have a table, TableA with 3 columns Column1, Column2, and Column3 with data as follows:
1 123456 XYZ
2 345678 ABC
I would pass a string of numbers to a stored procedure, for instance '123456', '345678'. It would then split this sting of numbers and take the first number - 123456 and do a look up in TableA and get the matching value from Column3 - i.e. 'XYZ'.
I need to loop through the table with split string of numbers ('12345', '345678') and return the concatenated string - like "XYZ ABC"
I am trying to do it in Oracle 11g.
Any suggestions would be helpful.
It's almost always more efficient to do everything in a single statement if at all possible, i.e. don't use a function if you can avoid it.
There is a little trick you can use to solve this using REGEXP_SUBSTR() to turn your string into something usable.
with the_string as (
select '''123456'', ''345678''' as str
from dual
)
, the_values as (
select regexp_substr( regexp_replace(str, '[^[:digit:],]')
, '[^,]+', 1, level ) as val
from the_string
connect by regexp_substr( regexp_replace(str, '[^[:digit:],]')
, '[^,]+', 1, level ) is not null
)
select the_values.val, t1.c
from t1
join the_values
on t1.b = the_values.val
This works be removing everything but the digits you require and something to split them on, the comma. You then split it on the comma and use a hierarchical query to turn this into a column, which you can then use to join.
Here's a SQL Fiddle as a demonstration.
Please note that this is highly inefficient when used on large datasets. It would probably be better if you passed variables normally to your function...

Resources