SQLite: compare columns in two tables to find LIKE data - Linux

I am trying to compare data in one column that appears in two different tables. The two tables have a lot more columns, but for simplicity:
CREATE TABLE A(
    ID integer PRIMARY KEY AUTOINCREMENT,
    name char(20)
);
CREATE TABLE B(
    ID integer PRIMARY KEY AUTOINCREMENT,
    name char(20)
);
INSERT INTO A(name) VALUES ('John Smith');
INSERT INTO A(name) VALUES ('J Doe');
INSERT INTO A(name) VALUES ('Jane Smith');
INSERT INTO B(name) VALUES ('John Smith');
INSERT INTO B(name) VALUES ('J. Doe');
INSERT INTO B(name) VALUES ('jane smith');
Most of what I've found so far covers finding the differences between tables, but I haven't managed to find how to match up similar data. I am looking for something that will yield results like this:
Table A    | Table B
-----------+-----------
John Smith | John Smith
Jane Smith | jane smith
J Doe      | J. Doe
The following code matched up several names:
CREATE TABLE tblC (
    tblAName char(20),
    tblBName char(20)
);
INSERT INTO tblC (tblAName, tblBName)
SELECT
    tblA.name,
    tblB.name
FROM tblA
LEFT JOIN tblB ON tblA.name LIKE tblB.name;
However, I haven't figured out how to get the names that contain punctuation. This didn't work:
INSERT INTO tblC (tblAName, tblBName)
SELECT
tblA.name,
tblB.name
FROM tblA
LEFT JOIN on tblB WHERE tblA.name LIKE tblB.name
WHERE tblA.name LIKE "%Xxx%" OR "%X.%" tblB.name LIKE "%Xxx%" OR "%X.%";

To ignore certain characters, remove them with replace() before doing the comparisons. To ignore case, use LIKE, or COLLATE NOCASE:
SELECT A.name,
       B.name
FROM A
JOIN B ON replace(A.name, '.', '') LIKE
          replace(B.name, '.', '');
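For reference, a minimal sketch of the COLLATE NOCASE route against the same A and B tables, using plain equality instead of LIKE:
SELECT A.name,
       B.name
FROM A
JOIN B ON replace(A.name, '.', '') = replace(B.name, '.', '') COLLATE NOCASE;
Here COLLATE NOCASE makes the equality comparison case-insensitive, which is what LIKE already does by default for ASCII letters in SQLite.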

Related

SQL Server: use all the words of a string as separate LIKE parameters (and all words should match)

I have a string containing a certain number of words (it may vary from 1 to many) and I need to find the records of a table which contain ALL those words in any order.
For instance, suppose that my input string is 'yellow blue red' and I have a table with the following records:
1 yellow brown white
2 red blue yellow
3 black blue red
The query should return the record 2.
I know that the basic approach should be something similar to this:
select * from mytable where colors like '%yellow%' and colors like '%blue%' and colors like '%red%'
However, I am not able to figure out how to turn the words of the string into separate LIKE parameters.
I have this code that splits the words of the string into a table, but now I am stuck:
DECLARE @mystring varchar(max) = 'yellow blue red';
DECLARE @terms TABLE (term varchar(max));
INSERT INTO @terms
SELECT Split.a.value('.', 'NVARCHAR(MAX)') AS term
FROM (SELECT CAST('<X>' + REPLACE(@mystring, ' ', '</X><X>') + '</X>' AS XML) AS String) AS A
CROSS APPLY String.nodes('/X') AS Split(a);
SELECT * FROM @terms;
Any idea?
First, put that XML junk in a function:
CREATE FUNCTION dbo.SplitThem
(
    @List NVARCHAR(MAX),
    @Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN ( SELECT Item = y.i.value(N'(./text())[1]', N'nvarchar(4000)')
             FROM ( SELECT x = CONVERT(XML, '<i>'
                        + REPLACE(@List, @Delimiter, '</i><i>')
                        + '</i>').query('.')
                  ) AS a CROSS APPLY x.nodes('i') AS y(i));
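As a quick sanity check, calling the function on its own returns one row per word (output shown as comments):
SELECT Item FROM dbo.SplitThem(N'red yellow blue', N' ');
-- Item
-- ------
-- red
-- yellow
-- blue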
Now you can extract the words in the table, join them to the words in the input string, and discard any that don't have the same count:
DECLARE @mystring varchar(max) = 'red yellow blue';
;WITH src AS
(
    SELECT t.id, t.colors, fc = f.c, tc = COUNT(t.id)
    FROM dbo.mytable AS t
    CROSS APPLY dbo.SplitThem(t.colors, ' ') AS s
    INNER JOIN (SELECT Item, c = COUNT(*) OVER()
                FROM dbo.SplitThem(@mystring, ' ')) AS f
        ON s.Item = f.Item
    GROUP BY t.id, t.colors, f.c
)
SELECT * FROM src
WHERE fc = tc;
Output:
id | colors          | fc | tc
---+-----------------+----+----
 2 | red blue yellow |  3 |  3
Example db<>fiddle
This disregards any possibility of duplicates on either side and ignores the larger overarching issue that this is the least optimal way possible to store sets of things. You have a relational database, use it! Surely you don't think the tags on this question are stored somewhere as the literal string
string sql-server-2012 sql-like
Of course not, these question:tag relationships are stored in a, well, relational table. Splitting strings is for the birds and those with all kinds of CPU and time to spare.
If you are storing a delimited list in a single column then you really need to normalize it out into a separate table.
But assuming you actually want to just do multiple free-form LIKE comparisons, you can do them against a list of values; the double negative below keeps a row only when no search pattern fails to match, i.e. when all patterns match:
select *
from mytable t
where not exists (select 1
                  from (values
                            ('%yellow%'),
                            ('%blue%'),
                            ('%red%')
                       ) v(search)
                  where t.colors not like v.search
                 );
Ideally you should pass these values through as a Table Valued Parameter; then you just put that into your query:
select *
from mytable t
where not exists (select 1
                  from #tmp v
                  where t.colors not like v.search
                 );
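For the snippet above to run as-is, #tmp just needs to hold the patterns; a quick hypothetical setup (the temp table here stands in for the TVP):
create table #tmp (search varchar(100));
insert into #tmp (search) values ('%yellow%'), ('%blue%'), ('%red%');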
If you want to simulate OR semantics rather than AND, change not exists to exists and not like to like.
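Spelled out, that OR variant (match any one of the colors) looks like this:
select *
from mytable t
where exists (select 1
              from (values
                        ('%yellow%'),
                        ('%blue%'),
                        ('%red%')
                   ) v(search)
              where t.colors like v.search
             );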

How to reorder values in a row alphabetically using T-SQL?

I need to reorder the values in the rows of a table alphabetically, for example:
Id Values
--------------------------------
1 Banana, Apple, Oranges
2 Oranges, Melon, Cucumber
3 Cucumber, Banana, Apple
The expected output should be:
Id Values
--------------------------------
1 Apple, Banana, Oranges
2 Cucumber, Melon, Oranges
3 Apple, Banana, Cucumber
You can generate the data above using the following code:
CREATE TABLE [Table] (
    [Id] INT NOT NULL,
    [Values] VARCHAR(30) NOT NULL,
    CONSTRAINT [PK_Table_Id] PRIMARY KEY CLUSTERED ([Id])
);
GO
INSERT INTO [Table] ([Id], [Values])
VALUES (1, 'Banana, Apple, Oranges'),
       (2, 'Oranges, Melon, Cucumber'),
       (3, 'Cucumber, Banana, Apple');
If you are using SQL Server 2017 or later, you can use a combination of STRING_SPLIT and STRING_AGG:
WITH cte AS (
    -- STRING_SPLIT only accepts a single-character separator, so split on ','
    -- and TRIM the leading space that ', ' leaves behind
    SELECT Id, TRIM(value) AS value
    FROM [Table]
    CROSS APPLY STRING_SPLIT([Values], ',')
)
SELECT
    Id,
    STRING_AGG(value, ', ') WITHIN GROUP (ORDER BY value) AS [Values]
FROM cte
GROUP BY Id
ORDER BY Id;
However, I seriously suggest that you stop at my CTE step above, because storing CSV values in your table is a bad idea from the very beginning. Once you have each value per Id on a separate row, your data is already normalized, or at least much closer to it.
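That said, if you do need to rewrite the stored strings in place, the same split-and-aggregate can drive an UPDATE. A minimal sketch, again assuming SQL Server 2017+:
WITH split AS (
    SELECT Id, TRIM(value) AS word
    FROM [Table]
    CROSS APPLY STRING_SPLIT([Values], ',')
), ordered AS (
    SELECT Id,
           STRING_AGG(word, ', ') WITHIN GROUP (ORDER BY word) AS NewValues
    FROM split
    GROUP BY Id
)
UPDATE t
SET t.[Values] = o.NewValues
FROM [Table] AS t
INNER JOIN ordered AS o ON o.Id = t.Id;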

Need initial N characters of column in Postgres where N is unknown

I have one column in my table in Postgres, let's say employeeId. We do some modification based on the employee type and store the result in the DB. Basically, we append strings from these 4 strings ('ACR','AC','DCR','DC') after the employeeId. Any combination of these 4 strings can be appended. For example, EMPIDACRDC, EMPIDDCDCRAC etc. are valid combinations. I need to retrieve the EMPID from this. The EMPID length is not fixed, and the column is of varying length type. How can this be done in Postgres?
I am not entirely sure I understand the question, but regexp_replace() seems to do the trick:
with sample (employeeid) as (
    values
        ('1ACR'),
        ('2ACRDCR'),
        ('100DCRAC')
)
select employeeid,
       regexp_replace(employeeid, 'ACR|AC|DCR|DC.*$', '', 'gi') as clean_id
from sample
returns:
employeeid | clean_id
-----------+----------
 1ACR      | 1
 2ACRDCR   | 2
 100DCRAC  | 100
The regular expression matches any one of those strings (or, in the last alternative, DC followed by anything up to the end of the string), and each match is replaced with nothing. This however won't work if the actual empid itself contains any of those codes.
It would be much cleaner to store this information in two columns: one for the empid and one for those "codes".
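Since the codes are only ever appended at the end of the id, a pattern anchored to the end of the string is a bit less fragile. A minimal sketch against the same sample data:
with sample (employeeid) as (
    values
        ('1ACR'),
        ('2ACRDCR'),
        ('100DCRAC')
)
select employeeid,
       -- strip only a trailing run of codes; an 'AC' in the middle of a real id is left alone
       regexp_replace(employeeid, '(ACR|AC|DCR|DC)+$', '') as clean_id
from sample;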

Cassandra migrate int to bigint

What would be the easiest way to migrate an int to a bigint in Cassandra? I thought of creating a new column of type bigint and then running a script to basically set the value of that column = the value of the int column for all rows, and then dropping the original column and renaming the new column. However, I'd like to know if someone has a better alternative, because this approach just doesn't sit quite right with me.
You could ALTER your table and change your int column to a varint type. Check the documentation about ALTER TABLE, and the data types compatibility matrix.
The only other alternative is what you said: add a new column and populate it row by row. Dropping the old column is entirely optional: if you simply stop assigning values to it on insert, everything stays as it is, and new records won't consume space for it.
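A minimal sketch of that add-and-backfill route (keyspace, table, and column names are made up for illustration):
-- add the replacement column
ALTER TABLE mykeyspace.mytable ADD id_big bigint;

-- CQL has no UPDATE ... SET id_big = id_int, so the copy has to be driven
-- by a client-side script: read each row, then write the value back, e.g.
UPDATE mykeyspace.mytable SET id_big = 42 WHERE pk = 'some-key';

-- once everything is backfilled, the old column can be dropped
ALTER TABLE mykeyspace.mytable DROP id_int;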
You can ALTER your table in Cassandra to hold values bigger than int by using varint. See the example:
cassandra@cqlsh:demo> CREATE TABLE int_test (id int, name text, primary key(id));
cassandra@cqlsh:demo> SELECT * FROM int_test;
id | name
----+------
(0 rows)
cassandra@cqlsh:demo> INSERT INTO int_test (id, name) VALUES ( 215478936541111, 'abc');
cassandra@cqlsh:demo> SELECT * FROM int_test ;
id | name
---------------------+---------
215478936541111 | abc
(1 rows)
cassandra@cqlsh:demo> ALTER TABLE demo.int_test ALTER id TYPE varint;
cassandra@cqlsh:demo> INSERT INTO int_test (id, name) VALUES ( 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999, 'abcd');
cassandra@cqlsh:demo> SELECT * FROM int_test ;
id | name
------------------------------------------------------------------------------------------------------------------------------+---------
215478936541111 | abc
9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 | abcd
(2 rows)
cassandra@cqlsh:demo>

Insert data in map<text,text> in cassandra db

I have a column in my Cassandra database with type map<text,text>.
I insert the data into this table as:
INSERT INTO "Table1" (col1) VALUES ({'abc':'abc','hello':'world','flag':'true'});
So, in my code I can get the data as:
{
"abc":"abc",
"hello":"world",
"flag":"true"
}
But now I want it like this:
{
"abc":"abc",
"hello":"world",
"flag":{
"data":{ "hi":"cassandra"},
"working":"no"
}
}
For this, when I try the insert query, it says the value does not match the type map<text,text>.
How can I make this work?
The problem here (in your second example) is that the type of col1 is map<text,text>, but flag is a complex type and no longer matches that definition. One way to solve this would be to create individual TEXT columns for each property, as well as a user defined type for flag and the data it contains:
> CREATE TYPE flagtype (data map<text,text>, working text);
> CREATE TABLE table1 (abc text,
                       hello text,
                       flag frozen<flagtype>,
                       PRIMARY KEY (abc));
Then INSERTing the JSON text from your second example works:
> INSERT INTO table1 JSON '{"abc":"abc",
"hello":"world",
"flag":{"data":{"hi":"cassandra"},
"working":"no"}}';
> SELECT * FROM table1;
abc | flag | hello
-----+--------------------------------------------+-------
abc | {data: {'hi': 'cassandra'}, working: 'no'} | world
(1 rows)
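For comparison, the same kind of row can also be written without the JSON form, using a plain CQL UDT literal (a sketch; the 'abc2' key is made up):
> INSERT INTO table1 (abc, hello, flag)
  VALUES ('abc2', 'world', {data: {'hi': 'cassandra'}, working: 'no'});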
If you are stuck on using the map<text,text> type, and want the JSON sub-properties of the value to be treated as one large text string, you could try a simple table like this:
CREATE TABLE stackoverflow.table2 (
key1 text PRIMARY KEY,
col1 map<text, text>);
And on your INSERTs just escape out the inner quotes:
> INSERT INTO table2 JSON '{"key1":"1","col1":{"abc":"abc","hello":"world"}}';
> INSERT INTO table2 JSON '{"key1":"2","col1":{"abc":"abc","hello":"world",
"flag":"{\"data\":{\"hi\":\"cassandra\"},\"working\":\"no\"}"}}';
> SELECT * FROM table2;
key1 | col1
------+----------------------------------------------------------------------------------------
2 | {'abc': 'abc', 'flag': '{"data":{"hi":"cassandra"},"working":"no"}', 'hello': 'world'}
1 | {'abc': 'abc', 'hello': 'world'}
(2 rows)
That's a little hacky and will probably require some additional parsing on your application side. But it gets you around the problem of having to define each column.
