I use standard SQL and want to extract the third substring from the end.
Example Input: "Search-site-variable-brand-0-city-none-18053517"
Output: "city"
I just wanted to point out that if you plan to apply this transformation to multiple columns, it may be useful to pull the logic into a UDF. Here's an example of how to do that:
CREATE TEMP FUNCTION ThirdSubstringFromEnd(s STRING) AS ((
  SELECT arr[SAFE_OFFSET(ARRAY_LENGTH(arr) - 3)]
  FROM (
    SELECT SPLIT(s, '-') AS arr
  )
));
WITH Input AS (
  SELECT 'Search-site-variable-brand-0-city-none-18053517' AS str UNION ALL
  SELECT 'a-b' UNION ALL
  SELECT 'w-x-yyy-z'
)
SELECT
  str,
  ThirdSubstringFromEnd(str) AS third_substring_from_end
FROM Input;
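Assuming the three sample rows above, the result should look roughly like this (SAFE_OFFSET returns NULL for 'a-b', which has too few parts):
str                                              third_substring_from_end
-----------------------------------------------  ------------------------
Search-site-variable-brand-0-city-none-18053517  city
a-b                                              NULL
w-x-yyy-z                                        x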
This might do the trick:
WITH data AS (
  SELECT "Search-site-variable-brand-0-city-none-18053517" AS Input
)
SELECT
  CASE
    WHEN ARRAY_LENGTH(SPLIT(Input, '-')) >= 3
    THEN SPLIT(Input, '-')[OFFSET(ARRAY_LENGTH(SPLIT(Input, '-')) - 3)]
  END AS word
FROM data
It returns NULL when the string doesn't have enough parts to split, such as an empty string.
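For example, a short input like 'a-b' falls through the CASE guard and yields NULL; a minimal sketch with the same query shape:
WITH data AS (
  SELECT "a-b" AS Input
)
SELECT
  CASE
    WHEN ARRAY_LENGTH(SPLIT(Input, '-')) >= 3
    THEN SPLIT(Input, '-')[OFFSET(ARRAY_LENGTH(SPLIT(Input, '-')) - 3)]
  END AS word  -- NULL: "a-b" splits into only two parts
FROM data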
A few more variations for BigQuery Standard SQL:
#standardSQL
WITH YourTable AS (
  SELECT 'Search-site-variable-brand-0-city-none-18053517' AS Input UNION ALL
  SELECT 'Second-substring-from-the-end-in-Google-BigQuery' UNION ALL
  SELECT 'bigQuery-assign-a-value-to-table-1-based-on-table-2' UNION ALL
  SELECT 'Error-Message-Too-many-sources-provided-15285-Limit-is-10000' UNION ALL
  SELECT 'Google-Bigquery-data-import-from-Google-Analytics-360' UNION ALL
  SELECT 'Bigquery-Partitioning-data-past-2000-limit'
)
SELECT
  Input,
  REVERSE(SPLIT(REVERSE(Input), '-')[SAFE_ORDINAL(3)]) AS Output_1,
  ARRAY_REVERSE(SPLIT(Input, '-'))[SAFE_ORDINAL(3)] AS Output_2
FROM YourTable
The "ARRAY_REVERSE" function works wonders in this scenario.
WITH input AS (
  SELECT "Search-site-variable-brand-0-city-none-18053517" AS to_reverse_string
)
SELECT ARRAY_REVERSE(SPLIT(to_reverse_string, "-"))[SAFE_OFFSET(2)]
FROM input
Using PostgreSQL I can get multiple rows of JSON objects.
select (select ROW_TO_JSON(_) from (select c.name, c.age) as _) as jsonresult from employee as c
This gives me this result:
{"age":65,"name":"NAME"}
{"age":21,"name":"SURNAME"}
But in SQL Server, when I use the FOR JSON AUTO clause, it gives me an array of JSON objects instead of multiple rows.
select c.name, c.age from customer c FOR JSON AUTO
[{"age":65,"name":"NAME"},{"age":21,"name":"SURNAME"}]
How can I get the same result format in SQL Server?
By constructing separate JSON in each individual row:
SELECT (SELECT [age], [name] FOR JSON PATH, WITHOUT_ARRAY_WRAPPER)
FROM customer
There is an alternative form that doesn't require you to know the table structure (but likely has worse performance because it may generate a large intermediate JSON):
SELECT [value] FROM OPENJSON(
(SELECT * FROM customer FOR JSON PATH)
)
No need to know the structure, and better performance:
SELECT c.id, jdata.*
FROM customer c
CROSS APPLY (
  SELECT * FROM customer jc WHERE jc.id = c.id FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
) jdata (jdata)
Same as Barak Yellin's answer, but lazier:
1 - Create this procedure:
CREATE PROC PRC_SELECT_JSON(@TBL VARCHAR(100), @COLS VARCHAR(1000)='D.*') AS BEGIN
  EXEC('
    SELECT X.O FROM ' + @TBL + ' D
    CROSS APPLY (
      SELECT ' + @COLS + '
      FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
    ) X (O)
  ')
END
2 - Call it with either all columns (the default) or specific columns:
CREATE TABLE #TEST ( X INT, Y VARCHAR(10), Z DATE )
INSERT #TEST VALUES (123, 'TEST1', GETDATE())
INSERT #TEST VALUES (124, 'TEST2', GETDATE())
EXEC PRC_SELECT_JSON #TEST
EXEC PRC_SELECT_JSON #TEST, 'X, Y'
If you're using PHP, add SET NOCOUNT ON; as the first statement in the procedure (it suppresses the extra "rows affected" messages, which can confuse some drivers).
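A sketch of the same procedure with that statement added:
CREATE PROC PRC_SELECT_JSON(@TBL VARCHAR(100), @COLS VARCHAR(1000)='D.*') AS BEGIN
  SET NOCOUNT ON;  -- suppress the "rows affected" messages
  EXEC('
    SELECT X.O FROM ' + @TBL + ' D
    CROSS APPLY (
      SELECT ' + @COLS + '
      FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
    ) X (O)
  ')
END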
I have a situation where a string has a replaceable placeholder that appears several times. For example:
Thi[$] is a strin[$] I am [$]ew to Or[$]cle
Now I need to replace the [$] occurrences with s, g, n, a respectively.
How can I do that? Please help.
There is a special PL/SQL function UTL_LMS.FORMAT_MESSAGE:
You can use it in an inline PL/SQL function:
with function format(
  str in varchar2
  ,s1  in varchar2 default null
  ,s2  in varchar2 default null
  ,s3  in varchar2 default null
  ,s4  in varchar2 default null
  ,s5  in varchar2 default null
  ,s6  in varchar2 default null
  ,s7  in varchar2 default null
  ,s8  in varchar2 default null
  ,s9  in varchar2 default null
  ,s10 in varchar2 default null
) return varchar2
as
begin
  return utl_lms.format_message(replace(str, '[$]', '%s'), s1, s2, s3, s4, s5, s6, s7, s8, s9, s10);
end;
select format('Thi[$] is a strin[$] I am [$]ew to Or[$]cle', 's', 'g', 'n', 'a') as res
from dual;
Result:
RES
-------------------------------------
This is a string I am new to Oracle
Here is a hand-rolled solution using a recursive WITH clause, and INSTR and SUBSTR functions to chop the string and inject the relevant letter at each juncture.
with rcte(str, sigils, occ) as (
  select 'Thi[$] is a strin[$] I am [$]ew to Or[$]cle' as str
       , 'sgna' as sigils
       , 0 as occ
  from dual
  union all
  select substr(str, 1, instr(str, '[$]', 1, 1) - 1) || substr(sigils, occ + 1, 1) || substr(str, instr(str, '[$]', 1, 1) + 3) as str
       , sigils
       , occ + 1 as occ
  from rcte
  where occ <= length(sigils)
)
select *
from rcte
where occ = length(sigils)
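The row that survives the final filter (occ equal to the number of sigils) should look like this:
STR                                  SIGILS  OCC
-----------------------------------  ------  ---
This is a string I am new to Oracle  sgna      4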
Here is a working demo on db<>fiddle.
However, it looks like #sayanm has provided a neater solution.
Consider this method, which lets the lookup values be table-based. See the comments within. The original string is split into rows using the placeholder as a delimiter, then the rows are put back together with LISTAGG, joining on the piece order to the lookup table.
It is table-driven and works with as many placeholders as you want, though of course the order matters, just as with the other answers.
-- First CTE just sets up the source data
WITH tbl(str) AS (
  SELECT 'Thi[$] is a strin[$] I am [$]ew to Or[$]cle' FROM dual
),
-- Lookup table. It does not have to be a CTE here; it could be a normal
-- table in the database.
tbl_sub_values(ID, VALUE) AS (
  SELECT 1, 's' FROM dual UNION ALL
  SELECT 2, 'g' FROM dual UNION ALL
  SELECT 3, 'n' FROM dual UNION ALL
  SELECT 4, 'a' FROM dual
),
-- Split the source data using the placeholder as a delimiter
tbl_split(piece_id, str) AS (
  SELECT LEVEL AS piece_id, REGEXP_SUBSTR(t.str, '(.*?)(\[\$\]|$)', 1, LEVEL, NULL, 1)
  FROM tbl t
  CONNECT BY LEVEL <= REGEXP_COUNT(t.str, '[$]') + 1
)
-- select * from tbl_split;
-- Put the string back together, joining with the lookup table
SELECT LISTAGG(str || tsv.value) WITHIN GROUP (ORDER BY piece_id) AS STRING
FROM tbl_split ts
LEFT JOIN tbl_sub_values tsv
  ON ts.piece_id = tsv.id;
STRING
--------------------------------------------------------------------------------
This is a string I am new to Oracle
I have a column with abbreviations separated by spaces, like this:
'BG MSG'
Also, there's another table with substitutions
target replacement
----------------------
'BG', 'Brick Galvan'
'MSG', 'Mosaic Galvan'
The goal is to apply all the substitutions to the abbreviations to obtain something like
'Brick Galvan Mosaic Galvan' from 'BG MSG'
I know I could do
replace( replace('BG MSG', 'BG', 'Brick Galvan'), 'MSG', 'Mosaic Galvan')
But imagine there are hundreds of substitutions, and they can change from one day to the next. The resulting query will be hideous to maintain.
I mean, I could do a code generator that will create the query with all the nested replaces, but I'm looking for something more elegant and postgres-native.
I've found solutions like this one: How to replace multiple special characters in Postgres 9.5, but they seem to work only for single characters.
Let's say your tables look like this:
create table my_table(id serial primary key, abbrevs text);
insert into my_table (abbrevs) values
('BG MSG');
create table substitutions(target text, replacement text);
insert into substitutions values
('BG', 'Brick Galvan'),
('MSG', 'Mosaic Galvan');
You can get each abbreviation in a single row:
select id, unnest(string_to_array(abbrevs, ' ')) as abbrev
from my_table
id | abbrev
----+--------
1 | BG
1 | MSG
(2 rows)
and use them to join the substitution table and get full names:
select id, string_agg(replacement, ' ') as full_names
from (
select id, unnest(string_to_array(abbrevs, ' ')) as abbrev
from my_table
) t
join substitutions on abbrev = target
group by id
id | full_names
----+----------------------------
1 | Brick Galvan Mosaic Galvan
(1 row)
Db<>fiddle.
The nested REPLACE approach would work, but it is quite ugly, right?
SELECT REPLACE(REPLACE(REPLACE(REPLACE(…
Even after careful formatting to make it look readable, the best you can get is:
SELECT
  REPLACE(
    REPLACE(
      REPLACE(
        REPLACE(...
On the other hand, you might just use the LATERAL JOIN solution, which uses more characters but is definitely more readable.
-- Input: BG, MSG
-- Output: Brick Galvan, Mosaic Galvan
SELECT msg.Materials
FROM (SELECT 'BG, MSG' AS Materials) mt
INNER JOIN LATERAL (SELECT REPLACE(mt.Materials::text, 'BG', 'Brick Galvan') AS Materials) bg ON true
INNER JOIN LATERAL (SELECT REPLACE(bg.Materials::text, 'MSG', 'Mosaic Galvan') AS Materials) msg ON true;
I am trying to replace certain customer names in my data.
Using Google BigQuery SQL, I was able to transform one part of the string into another via the REPLACE function for one particular string:
Replace(CustomerName, 'ABC', 'XYZ')
However, I have a couple more where I would need to use the REPLACE function, such as
Replace(CustomerName, 'PLO', 'Rustic')
Replace(CustomerName, 'Kix', 'BowWow')
and so on.
I've tried doing
Replace(CustomerName, 'ABC', 'XYZ') OR Replace(CustomerName, 'PLO', 'Rustic') OR Replace(CustomerName, 'Kix', 'BowWow')
but that got me an error message.
I've also tried
Replace(CustomerName, 'ABC', 'XYZ') AND Replace(CustomerName, 'PLO', 'Rustic') AND Replace(CustomerName, 'Kix', 'BowWow')
but that also got me an error message.
I am able to just use a CASE WHEN statement and hardcode each one, but I'm wondering if there is a better/faster way to just use REPLACE instead.
Thanks for your help.
The CASE WHEN option is pretty reasonable. Another option is to chain them together:
REPLACE(
  REPLACE(
    REPLACE(
      CustomerName,
      'ABC',
      'XYZ'),
    'PLO',
    'Rustic'),
  'Kix',
  'BowWow')
Which one you pick really depends on the exact scenario. The chained REPLACE calls are probably faster, but they could overlap in weird ways (e.g., if the output of one replacement matches the input of a subsequent one). The CASE WHEN approach avoids that issue, but it's probably more expensive because you need to do one operation to find the substring and another to actually replace it.
Note that when you're using AND or OR, you're trying to combine the string output of REPLACE as if it were a boolean, which is why it's failing.
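For reference, a rough sketch of the CASE WHEN alternative mentioned in the question (the table name YourTable and the LIKE conditions are assumptions):
SELECT
  CASE
    WHEN CustomerName LIKE '%ABC%' THEN REPLACE(CustomerName, 'ABC', 'XYZ')
    WHEN CustomerName LIKE '%PLO%' THEN REPLACE(CustomerName, 'PLO', 'Rustic')
    WHEN CustomerName LIKE '%Kix%' THEN REPLACE(CustomerName, 'Kix', 'BowWow')
    ELSE CustomerName
  END AS CustomerName
FROM YourTable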
When you have quite a number of replacements, chaining REPLACEs becomes impractical and annoying manual work.
The query below addresses this (assuming you maintain a lookup table with Word, Replacement pairs):
SELECT CustomerName, fixedCustomerName FROM JS(
  // input table
  (
    SELECT
      CustomerName, Replacements
    FROM YourTable
    CROSS JOIN (
      SELECT
        GROUP_CONCAT_UNQUOTED(CONCAT(Word, ',', Replacement), ';') AS Replacements
      FROM ReplacementLookup
    )
  ),
  // input columns
  CustomerName, Replacements,
  // output schema
  "[
    {name: 'CustomerName', type: 'string'},
    {name: 'fixedCustomerName', type: 'string'}
  ]",
  // function
  "function(r, emit){
    var Replacements = r.Replacements.split(';');
    var fixedCustomerName = r.CustomerName;
    for (var i = 0; i < Replacements.length; i++) {
      var pat = new RegExp(Replacements[i].split(',')[0], 'gi');
      fixedCustomerName = fixedCustomerName.replace(pat, Replacements[i].split(',')[1]);
    }
    emit({CustomerName: r.CustomerName, fixedCustomerName: fixedCustomerName});
  }"
)
You can test it using the example below:
SELECT CustomerName, fixedCustomerName FROM JS(
  // input table
  (
    SELECT
      CustomerName, Replacements
    FROM (
      SELECT CustomerName FROM
        (SELECT '1234ABC567' AS CustomerName),
        (SELECT '12 34 PLO 56' AS CustomerName),
        (SELECT 'Kix' AS CustomerName),
        (SELECT '98 ABC PLO Kix ABC 76 XYZ 54' AS CustomerName),
        (SELECT 'ABCQweKIX' AS CustomerName)
    ) YourTable
    CROSS JOIN (
      SELECT
        GROUP_CONCAT_UNQUOTED(CONCAT(Word, ',', Replacement), ';') AS Replacements
      FROM (
        SELECT Word, Replacement FROM
          (SELECT 'XYZ' AS Word, 'QWE' AS Replacement),
          (SELECT 'ABC' AS Word, 'XYZ' AS Replacement),
          (SELECT 'PLO' AS Word, 'Rustic' AS Replacement),
          (SELECT 'Kix' AS Word, 'BowWow' AS Replacement)
      )
    ) ReplacementLookup
  ),
  // input columns
  CustomerName, Replacements,
  // output schema
  "[
    {name: 'CustomerName', type: 'string'},
    {name: 'fixedCustomerName', type: 'string'}
  ]",
  // function
  "function(r, emit){
    var Replacements = r.Replacements.split(';');
    var fixedCustomerName = r.CustomerName;
    for (var i = 0; i < Replacements.length; i++) {
      var pat = new RegExp(Replacements[i].split(',')[0], 'gi');
      fixedCustomerName = fixedCustomerName.replace(pat, Replacements[i].split(',')[1]);
    }
    emit({CustomerName: r.CustomerName, fixedCustomerName: fixedCustomerName});
  }"
)
Please note: there is still an issue if the result of one replacement matches the input of a subsequent replacement.
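A minimal illustration of that kind of overlap with plain chained REPLACEs (the order here is deliberately chosen to trigger it):
-- 'ABC' is first rewritten to 'XYZ', and the next REPLACE then
-- rewrites that freshly produced 'XYZ' into 'QWE'.
SELECT REPLACE(REPLACE('ABC', 'ABC', 'XYZ'), 'XYZ', 'QWE') AS result  -- 'QWE'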
I believe there are multiple ways to tackle this problem, and it depends on the size of your dataset, the practicality of simply making a guiding table by hand and uploading it to BigQuery, and the granularity of the data you want to replace.
If your values are very granular, you can create a table with "from" and "to" values on different columns, and join that table with your main table, and retrieve those values very cleanly.
# Replace the support_table CTE with your actual lookup table
WITH support_table AS (
  SELECT "ABC" AS OldValue, "XYZ" AS NewValue
)
SELECT main_table.OldValue, support_table.NewValue
FROM main_table
JOIN support_table ON main_table.OldValue = support_table.OldValue
Now, if you want to replace a big list of different values with something, you can use REGEXP_REPLACE with a string containing all possible values.
If you have a very big list of items, you can use STRING_AGG over a table with all the values you want to replace, or skip the STRING_AGG step and create that string by hand.
Both of the snippets below result in "item1|item2|item3"; choose whichever is faster for you.
# Replace the values_to_replace table with your actual table
WITH values_to_replace AS (
  SELECT "item1" AS ColumnWithItemsToReplace
  UNION ALL
  SELECT "item2"
  UNION ALL
  SELECT "item3"
)
SELECT STRING_AGG(ColumnWithItemsToReplace, "|") FROM values_to_replace

SELECT r"item1|item2|item3"
STRING_AGG will retrieve all the values from a table or query and concatenate them using a separator of choice. If you use the pipe separator, you will be able to create a string like "item1|item2|item3|..."
For a regular expression, the pipe counts as "or", which means that the regex will interpret the string as "item1 or item2 or item3". Thus, if you pass that generated string to REGEXP_REPLACE as the values to be replaced, it will be considered valid.
Example code below:
REGEXP_REPLACE(
column_to_replace
,(SELECT STRING_AGG(ColumnWithItemsToReplace,"|") FROM `YourTable`)
,"Replacer"
)
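Putting the pieces together, a self-contained sketch (the table and column names here are made up for illustration):
WITH values_to_replace AS (
  SELECT "item1" AS ColumnWithItemsToReplace UNION ALL
  SELECT "item2" UNION ALL
  SELECT "item3"
), main_table AS (
  SELECT "prefix item2 suffix" AS column_to_replace
)
SELECT
  REGEXP_REPLACE(
    column_to_replace,
    (SELECT STRING_AGG(ColumnWithItemsToReplace, "|") FROM values_to_replace),
    "Replacer"
  ) AS replaced  -- "prefix Replacer suffix"
FROM main_table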
Hope it helps.
I need to run a query between two tables and find non-matching fields.
Table 1's locations field has "my String".
Table 2's locations field has "MY string".
They are equal as text but not by capitalization; I need this comparison to return false.
Given the following data:
DECLARE #TableOne TABLE
(
[ID] TINYINT
,[Value] VARCHAR(12)
)
DECLARE #TableTwo TABLE
(
[ID] TINYINT
,[Value] VARCHAR(12)
)
INSERT INTO #TableOne ([ID], [Value])
VALUES (1,'my String')
INSERT INTO #TableTwo ([ID], [Value])
VALUES (1,'MY String')
You can set a case-sensitive collation like this:
SELECT [TO].[Value]
,[TW].[Value]
FROM #TableOne [TO]
INNER JOIN #TableTwo [TW]
ON [TO].[ID] = [TW].[ID]
AND [TO].[Value] <> [TW].[Value]
COLLATE Latin1_General_CS_AS
or use HASH functions like this:
SELECT [TO].[Value]
,[TW].[Value]
FROM #TableOne [TO]
INNER JOIN #TableTwo [TW]
ON [TO].[ID] = [TW].[ID]
WHERE HASHBYTES('SHA1', [TO].[Value]) <> HASHBYTES('SHA1', [TW].[Value])
DECLARE #Table1 AS TABLE (FieldName VARCHAR(100))
DECLARE #Table2 AS TABLE (FieldName VARCHAR(100))
INSERT INTO #Table1 (FieldName) VALUES ('MY Location')
INSERT INTO #Table2 (FieldName) VALUES ('My Location')
With the default case-insensitive collation, the join matches and returns results:
SELECT * FROM #Table1 AS T1
INNER JOIN #Table2 AS T2
ON T1.FieldName = T2.FieldName
With a case-sensitive collation specified, it will not match:
SELECT * FROM #Table1 AS T1
INNER JOIN #Table2 AS T2
ON T1.FieldName = T2.FieldName COLLATE Latin1_General_CS_AS_KS_WS
For more details, see the Microsoft article on collation.