BigQuery UDF remove from array. How do I anonymously reference an element in a struct? - struct

This removes the elements from the array, but it renames the field to v.
Is there a better way to do this?
OR
Can I get the first element of the struct (ex. t[0])?
CREATE TEMP FUNCTION remove(arr ANY TYPE, val ANY TYPE) AS ((
SELECT ARRAY_AGG(t)
FROM (SELECT * FROM UNNEST(arr) v) t
WHERE t.v <> val
));

Is there a better way to do this?
Use the trimmed-down version below:
CREATE TEMP FUNCTION remove(arr ANY TYPE, val ANY TYPE) AS ((
SELECT ARRAY_AGG(v)
FROM UNNEST(arr) v
WHERE v <> val
));

Removing an element (val) from an array (arr) is possible, as you showed. Selecting t.v inside ARRAY_AGG, rather than the whole row t, gives the expected output without wrapping each element in a struct with a field named v.
CREATE TEMP FUNCTION remove(arr ANY TYPE, val ANY TYPE) AS ((
SELECT ARRAY_AGG(t.v)
FROM (SELECT * FROM UNNEST(arr) v) t
WHERE t.v <> val
));
select remove([1,2,3,4,5],3) as array_without_3

Extracting the first element of a struct is also possible, although only with an ugly workaround:
CREATE TEMP FUNCTION firstelement(arr string)
returns string
LANGUAGE js as
"""
var tmp=JSON.parse(arr);
var var1=arr.substring(2,arr.search(':')-1);
if(!var1.includes('{')) {
return tmp[var1];
}
""";
Select firstelement(TO_JSON_STRING(A)) as A, cast(firstelement(TO_JSON_STRING(B)) as int64) as B,
from (
select struct( "4a" as z,2 as b , 9 as bb) as A, struct( 4 as z,2 as b , 9 as bb) as B
);
First, the struct has to be converted to a JSON string. The function finds the name of the first entry and stores it in var1. The string is then parsed into a JSON object, and the element named var1 is read out and returned by the function as a string. If the value was a number, a CAST is needed in the SELECT statement.

Related

Optimized Lua Table Searching

I have a Lua table:
local tableDatabase = {
name = uniqueName,
class = val_one_to_eight, --not unique
value = mostly_but_not_guaranteed_unique_int}
This table can be sorted by any of the above and may contain a very large data set.
Right now, in order to insert, I'm just iterating through the table with ipairs until I find:
insertedvalue.uniqueName > tableDatabase.uniqueName
--(or comparing the other parms instead if they are the selected sort order.)
I need this function to work super fast. Is there a search algorithm someone could recommend for finding the index into the table to insert or some method I could use that would work on a lua table to optimize this for speed of insertions?
As far as I know, for a strictly ordered structure you can use binary search or a similar algorithm.
The lua-users wiki provides a ready-to-use function.
Why don't you create an index on name? If it is not fast enough, you can make __index less generic, i.e. hardcode the single index on name.
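To illustrate the binary-search suggestion, here is a minimal sketch in Python rather than Lua. The names find_insert_index and insert_sorted are hypothetical, and the records are assumed to be dictionaries already sorted by the chosen field. The search takes O(log n) comparisons, but inserting into the middle of an array still shifts the following elements, just as table.insert does in Lua.

```python
def find_insert_index(records, record, field):
    """Return the position where `record` keeps `records` sorted by `field`."""
    lo, hi = 0, len(records)
    while lo < hi:
        mid = (lo + hi) // 2
        if records[mid][field] <= record[field]:
            lo = mid + 1   # the new record belongs to the right of mid
        else:
            hi = mid       # the new record belongs at mid or to its left
    return lo


def insert_sorted(records, record, field):
    """Insert `record` in place, preserving the sort order on `field`."""
    records.insert(find_insert_index(records, record, field), record)


# Hypothetical sample data, sorted by name.
database = [
    {"name": "alpha", "class": 1, "value": 10},
    {"name": "delta", "class": 2, "value": 40},
]
insert_sorted(database, {"name": "bravo", "class": 3, "value": 20}, "name")
print([r["name"] for r in database])
```

Records with equal keys are placed after the existing ones, so insertion order is preserved among duplicates such as class.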
-- Returns a table. ... is a list of fields, for which unique indices should be created:
function indexedTable (...)
local t = {
__indices = {},
__insert = function (self, value) -- instead of table.insert.
self [#self + 1] = value -- implicitly calls metamethod __newindex.
end
}
-- Initialise indices:
for _, index in ipairs {...} do
t.__indices [index] = {}
end
setmetatable (t, {
-- Allow t [{name = 'unique'}]:
__index = function (t, key)
if type (key) == 'table' then
for index_key, index_value in pairs (key) do
local value = t.__indices [index_key] [index_value]
if value then
return value
end
end
else
return rawget (t, key)
end
end,
-- Updates all indices on t [k] = v, but doesn't work with table.insert, so use t:__insert
__newindex = function (t, key, value)
-- insert uniqueness constraint here, if you want.
for index_key, index in pairs (t.__indices) do
index [value [index_key]] = value
end
rawset (t, key, value)
end
})
return t
end
-- Test:
local tableDatabase = indexedTable ('name')
-- Not table.insert, as it is not customizable via metamethods:
tableDatabase:__insert {
name = 'unique1',
class = 1,
value = 'somewhat unique'
}
tableDatabase:__insert {
name = 'unique2',
class = 2,
value = 'somewhat unique'
}
tableDatabase:__insert {
name = 'unique3',
class = 2,
value = 'somewhat unique but not absolutely'
}
local unique2 = tableDatabase [{name = 'unique2'}] -- index search.
print (unique2.name, unique2.class, unique2.value)

How do I filter an array against a list of strings in Slick

I've got a column containing an array of varchar, and a list of search strings that I want to match against the column. If any of the search strings match any substring in the column strings, I want to return the row.
So for example if the column contains:
row 1: ['butter', 'water', 'eggs']
row 2: ['apples', 'oranges']
row 3: ['chubby', 'skinny']
And my search strings are:
Set("ter", "hub")
I want my filtered results to include row 1 and row 3, but not row 2.
If I were writing this in plain Scala I'd do something like:
val rows = [the rows containing my column]
val search = Set("ter", "hub")
rows.filter(r => search.exists(se => r.myColumn.exists(s => s.contains(se))))
Is there some way of doing this in Slick so the filtering gets done on the DB side before returning the results? Some combination of LIKE and ANY, maybe? I'm a little fuzzy on the mechanics of filtering an array against another array in SQL in the first place.
While I'm not convinced that this is the best way to do it, I've put together a solution that uses Regex. First, I concatenate the search terms into a simple regular expression:
val matcher = search.mkString(".*(","|",").*") // i.e. '.*(ter|hub).*'
Then I concatenate the array in the table column using an implicit SimpleExpression:
implicit class StringConcat(s: Rep[List[String]]){
def stringConcat: Rep[String] = {
val expr = SimpleExpression.unary[List[String], String] { (s, qb) =>
qb.sqlBuilder += "array_to_string("
qb.expr(s)
qb.sqlBuilder += ", ',')"
}
expr.apply(s)
}
}
Finally, I build a regex query using another implicit SimpleExpression:
implicit class RegexQuery(s: Rep[String]) {
def regexQ(p: Rep[String]): Rep[Boolean] = {
val expr = SimpleExpression.binary[String,String,Boolean] { (s, p, qb) =>
qb.expr(s)
qb.sqlBuilder += " ~* "
qb.expr(p)
}
expr.apply(s,p)
}
}
And I can then perform my match like:
myTable.filter(row => row.myColumn.stringConcat.regexQ(matcher))
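As a rough illustration of what this query computes, here is a Python sketch of the same logic running in memory instead of in the database. The helper names build_matcher and row_matches are hypothetical. It assumes the search terms should be treated as literal text, so they are escaped here; note that Python's re.escape follows Python's regex syntax, which may not match PostgreSQL's escaping rules exactly. PostgreSQL's ~* operator is an unanchored, case-insensitive match, so the leading and trailing .* are not needed.

```python
import re


def build_matcher(search_terms):
    """Join the search terms into one alternation, escaping regex metacharacters."""
    return "(" + "|".join(re.escape(term) for term in sorted(search_terms)) + ")"


def row_matches(column_values, search_terms):
    """Mimic array_to_string(col, ',') ~* pattern for a single row."""
    joined = ",".join(column_values)
    return re.search(build_matcher(search_terms), joined, re.IGNORECASE) is not None


rows = [
    ["butter", "water", "eggs"],
    ["apples", "oranges"],
    ["chubby", "skinny"],
]
search = {"ter", "hub"}
matching = [row for row in rows if row_matches(row, search)]
print(matching)
```

One caveat of joining with a comma: a search term that itself contains a comma could match across the boundary between two array elements.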
Hope that helps someone out, and if you have a better way of doing it let me know.
Edit to add:
If you're looking for exact matches, and not partial matches, you can use the array overlap operator, like:
myColumn && '{"water","oranges"}'
In Slick (with the slick-pg array extensions) this is the @& operator, like
.filter(table => table.myColumn @& myMatchList)

How to compare None Python and NULL SQLite?

I scrape a site and write the results to a database. Some elements may be None, which is not a problem: in the database they are stored as NULL.
The problem is that before writing to the database, I check whether the record already exists:
cursor = await db.execute('SELECT * FROM MATCHES WHERE (a="{}" AND b="{}" AND c="{}")'.format(a, b, c))
The problem is that the SELECT compares against a = 'None' (the string) rather than a = None.
This is strange, because inserting works fine:
INSERT INTO MATCHES (a, b, c) VALUES (?,?,?), (a, b, c)
In this case, if an element is None, it is stored as NULL in the database.
I tried another option:
SELECT * FROM MATCHES WHERE (a=? AND b=? AND c=?), (a, b, c)
But it works like the first.
What can I do?
A comparison with = never evaluates to true when either side is NULL, so use the "NULL-safe" operator IS instead:
SELECT * FROM MATCHES WHERE (a IS ? AND b IS ? AND c IS ?);
CREATE TABLE tab(i INT, j INT);
INSERT INTO tab(i, j) VALUES (1,1),(NULL,NULL),(1,2);
SELECT *
FROM tab
WHERE i = j;
SELECT *
FROM tab
WHERE i IS j;
db-fiddle.com demo
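A minimal sketch of the same idea from Python, using the standard-library sqlite3 module with an in-memory database instead of the async driver from the question. The table layout and the helper names find_with_is and find_with_equals are assumptions made for illustration.

```python
import sqlite3


def find_with_is(conn, a, b, c):
    """Look up a row with the NULL-safe IS operator and bound parameters."""
    cur = conn.execute(
        "SELECT a, b, c FROM matches WHERE a IS ? AND b IS ? AND c IS ?",
        (a, b, c),
    )
    return cur.fetchone()


def find_with_equals(conn, a, b, c):
    """Same lookup with =, which never matches a NULL column."""
    cur = conn.execute(
        "SELECT a, b, c FROM matches WHERE a = ? AND b = ? AND c = ?",
        (a, b, c),
    )
    return cur.fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (a TEXT, b TEXT, c TEXT)")
# None is stored as NULL when passed as a bound parameter.
conn.execute("INSERT INTO matches (a, b, c) VALUES (?, ?, ?)", ("x", None, "z"))

print(find_with_is(conn, "x", None, "z"))      # ('x', None, 'z')
print(find_with_equals(conn, "x", None, "z"))  # None
```

Passing the values as parameters, rather than formatting them into the SQL string, also keeps None from being turned into the text 'None'.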

Substitute substring within a string bidirectionally [duplicate]

Given a string M that contains terms A and B, I would like to substitute every A with B and every B with A to form M'. Naively, one would try replacing A with B and then B with A, but in that case M' consists only of A. I can think of replacing the terms and recording their positions so that the terms do not get replaced again. This works when we only have A and B to replace. But if we need to substitute more than two terms and they are of different lengths, it gets tricky.
So I thought about doing this:
We are given M as input string and R = [(x1, y1), (x2, y2), ... (xn, yn)] as terms to replace, where we replace xi for yi for all i.
With M, Initiate L = [(M, false)] to be a list of (string * boolean) tuple where false means that this string has not been replaced.
Search for occurrences of xi in each member L(i) of L whose second term is false. Partition L(i) into [(pre, false), (xi, false), (post, false)], and map to [(pre, false), (yi, true), (post, false)], where pre and post are the strings before and after xi. Flatten L.
Repeat the above until R is exhausted.
Concatenate the first element of each tuple of L to form M'.
Is there a more efficient way of doing this?
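For reference, the partition procedure described above could be sketched in Python roughly as follows. The function name substitute_by_partition is hypothetical, and empty segments are dropped as a simplification.

```python
def substitute_by_partition(text, pairs):
    """Replace each x with y from `pairs`, never touching already replaced text."""
    # Each segment is (string, replaced); replaced segments are frozen.
    segments = [(text, False)]
    for old, new in pairs:
        if not old:
            continue  # an empty search term would match everywhere
        next_segments = []
        for chunk, replaced in segments:
            if replaced:
                next_segments.append((chunk, True))
                continue
            # Split the untouched chunk around every occurrence of `old`.
            for i, part in enumerate(chunk.split(old)):
                if i > 0:
                    next_segments.append((new, True))
                if part:
                    next_segments.append((part, False))
        segments = next_segments
    return "".join(chunk for chunk, _ in segments)


pairs = [("foo", "bar"), ("bar", "foo"), ("baz", "quz"), ("no", "match")]
print(substitute_by_partition("foobazbar123foo match", pairs))
# barquzfoo123bar match
```

Each pair rescans all of the untouched text, so the cost grows with the number of pairs times the length of the string.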
Here's a regex solution:
var M = 'foobazbar123foo match';
var pairs = {
'foo': 'bar',
'bar': 'foo',
'baz': 'quz',
'no': 'match'
};
var re = new RegExp(Object.keys(pairs).join('|'),'g');
alert(M.replace(re, function(m) { return pairs[m]; }));
Note: This is a demonstration / POC. A real implementation would have to handle proper escaping of the lookup strings.
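As a sketch of what handling the escaping might look like, here is the same single-pass idea in Python. The function name swap_terms is hypothetical. re.escape treats every lookup string literally, and sorting the keys longest first is an added assumption so that a longer term wins over a shorter term that is its prefix.

```python
import re


def swap_terms(text, pairs):
    """Replace every key of `pairs` with its value in a single pass."""
    if not pairs:
        return text
    # Longest keys first, each escaped so it is matched literally.
    keys = sorted(pairs, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: pairs[match.group(0)], text)


pairs = {"foo": "bar", "bar": "foo", "baz": "quz", "no": "match"}
print(swap_terms("foobazbar123foo match", pairs))
# barquzfoo123bar match
```

Because the string is scanned once and replaced text is never rescanned, a term cannot be swapped back by a later pair.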
Another approach is to replace strings by intermediate temporary strings (symbols) and then replace symbols by their original counterparts. So the transformation 'foo' => 'bar' can be transformed in two steps as, say, 'foo' => '___1' => 'bar'. The other transformation 'bar' ==> 'foo' will then become 'bar' ==> '___2' ==> 'foo'. This will prevent the mixup you describe.
Sample python code for the same example as the other answer follows:
import re
def substr(string):
    repdict = {"foo": "bar", "bar": "foo", "baz": "quz", "no": "match"}
    tmpdict = dict()
    count = 0
    for left, right in repdict.items():
        # Placeholder that is later turned into `right`.
        tmpleft = "___" + str(count)
        tmpdict[tmpleft] = right
        count = count + 1
        # Placeholder that is later turned into `left`.
        tmpright = "___" + str(count)
        tmpdict[tmpright] = left
        count = count + 1
        # re.escape keeps the terms from being read as regex syntax.
        string = re.sub(re.escape(left), tmpleft, string)
        string = re.sub(re.escape(right), tmpright, string)
    # Second pass: turn every placeholder into its final text.
    for tmpleft, tmpright in tmpdict.items():
        string = re.sub(tmpleft, tmpright, string)
    print(string)
>>> substr("foobazbar123foo match")
barquzfoo123bar no

scala assigning string and array of values

I'm trying to assign a string followed by an array of scores.
I defined some categories
case class CategoryScore( //Define Category Score class
val food: Int,
val tech: Int,
val service: Int,
val fashion: Int)
and mapped them to some keys so that a String such as the name of a product would be followed by the case class of scores.
var keywordscores:Map[String, CategoryScore] = Map() //keyword scores
keywordscores += ("amazon",CategoryScore(1,9,1,4)) //Tried to add score for a string, does not work
am I missing something here?
scala> keywordscores += ("amazon" -> CategoryScore(1,9,1,4))
or (note the extra parentheses)
scala> keywordscores += (("amazon", CategoryScore(1,9,1,4)))
The reason for that is that + is defined as +(kvs: (A, B)*): Map[A, B], meaning it can take any number of (key,value) pairs, leading to += (k,v) being ambiguous.
The a -> b notation removes this ambiguity (and it's much nicer to read).
Entries are added to a map like this:
keywordscores += ("amazon" -> CategoryScore(1,9,1,4))
With a mutable Map you can also update/insert entries as follows:
val keywordscores: collection.mutable.Map[String, CategoryScore] = collection.mutable.Map()
keywordscores("amazon") = CategoryScore(1,9,1,4)
Here a new entry with key "amazon" is inserted; a subsequent call with the same key will update the value.

Resources