BigQuery: update struct to NULL in repeated field (nested)

In Google BigQuery, I'm trying to do an update on a repeated field.
For comparison, this works (or at least is flagged as valid), but of course doesn't actually change the field:
UPDATE my.table t
SET my_field = ARRAY(
SELECT AS STRUCT g.foo, g.bar, g.struct_to_set_null
FROM unnest(t.groups) as g
), ... FROM ... etc
Setting struct_to_set_null to null gives an error:
UPDATE my.table t
SET my_field = ARRAY(
SELECT AS STRUCT g.foo, g.bar, null as struct_to_set_null
FROM unnest(t.groups) as g
), ... FROM ... etc
Value of type ARRAY<STRUCT<... (really long and cut off) cannot be assigned to groups, which has type <ARRAY,STRUCT<... (same, really long, cut off)
I can see that the field in question is of type RECORD and NULLABLE, so I would think setting it to null is allowed. Is there a trick to getting this to work?

The problem is that BigQuery isn't inferring the type of the struct field just from the NULL literal; you need to be a bit more explicit. Here is an example:
CREATE TABLE tmp_elliottb.UpdateTable (
my_field ARRAY<STRUCT<foo INT64, bar INT64, struct_to_set_null STRUCT<x STRING, y BOOL, z INT64>>>
);
UPDATE tmp_elliottb.UpdateTable
SET my_field = ARRAY(
SELECT AS STRUCT foo, bar, NULL AS struct_to_set_null FROM UNNEST(my_field)
)
WHERE true;
This gives me:
Value of type ARRAY<STRUCT<foo INT64, bar INT64, struct_to_set_null INT64>> cannot be assigned to my_field, which has type ARRAY<STRUCT<foo INT64, bar INT64, struct_to_set_null STRUCT<x STRING, y BOOL, z INT64>>> at [4:16]
What I can do instead is to use an IF expression that produces NULL, but has struct_to_set_null on one of the branches in order to force the output type that I want:
UPDATE tmp_elliottb.UpdateTable
SET my_field = ARRAY(
SELECT AS STRUCT
foo, bar,
IF(false, struct_to_set_null, NULL) AS struct_to_set_null
FROM UNNEST(my_field)
)
WHERE true;
Or, alternatively, I can use SELECT * REPLACE:
UPDATE tmp_elliottb.UpdateTable
SET my_field = ARRAY(
SELECT AS STRUCT * REPLACE (IF(false, struct_to_set_null, NULL) AS struct_to_set_null )
FROM UNNEST(my_field)
)
WHERE true;

A repeated field is an ARRAY type, so it cannot be set to NULL.
Currently, BigQuery has the following two limitations with respect to NULLs and ARRAYs:
BigQuery raises an error if the query result has ARRAYs that contain NULL elements, although such ARRAYs can be used inside the query.
BigQuery translates a NULL ARRAY into an empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.


Diesel: Adding result of subqueries

Given the following tables:
accounts (
id INTEGER,
opening_balance INTEGER
)
transactions (
debit INTEGER,
credit INTEGER,
amount INTEGER,
foreign key (debit) references accounts (id),
foreign key (credit) references accounts (id)
)
I want to execute the following SQL query:
select
id,
opening_balance
+ (select sum(amount) from transactions where debit = accounts.id)
- (select sum(amount) from transactions where credit = accounts.id)
from accounts;
I tried something like this:
accounts
.select((
id,
opening_balance
+ transactions::table
.select(sum(transactions::amount))
.filter(transactions::debit.eq(id))
- transactions::table
.select(sum(transactions::amount))
.filter(transactions::credit.eq(id)),
))
While the individual parts of this query work fine, I cannot get this to compile.
the trait bound
`diesel::query_builder::SelectStatement<schema::transactions::table, diesel::query_builder::select_clause::SelectClause<aggregate_folding::sum::sum<diesel::sql_types::Integer, schema::transactions::columns::amount>>, diesel::query_builder::distinct_clause::NoDistinctClause, diesel::query_builder::where_clause::WhereClause<diesel::expression::operators::Eq<schema::transactions::columns::debit, schema::fiscal_year_accounts::columns::account_id>>>: diesel::Expression`
is not satisfied
required because of the requirements on the impl of `AsExpression<diesel::sql_types::Integer>` for
`diesel::query_builder::SelectStatement<schema::transactions::table, diesel::query_builder::select_clause::SelectClause<aggregate_folding::sum::sum<diesel::sql_types::Integer, schema::transactions::columns::amount>>, diesel::query_builder::distinct_clause::NoDistinctClause, diesel::query_builder::where_clause::WhereClause<diesel::expression::operators::Eq<schema::transactions::columns::debit, schema::fiscal_year_accounts::columns::account_id>>>`
The + and - operators work with static values, but how can I get them to work with subqueries?

First of all: always provide a complete minimal example of your problem. This includes the exact versions of all crates used, all imports needed for your code to actually produce this error message, the complete error message with all help and note statements and, in Diesel's case, the generated schema file.
To answer your question: you are missing two calls to .single_value(), which is required to convert a query into a subquery that can be used as an expression. Both subqueries return a Nullable<BigInt>, so opening_balance must have a matching type.
For the sake of completeness, see the working code below:
#[macro_use]
extern crate diesel;
use diesel::prelude::*;
table! {
accounts {
id -> Integer,
// changed to `BigInt` as a sum of `Integer` returns a `BigInt`
opening_balance -> BigInt,
}
}
table! {
transactions {
id -> Integer,
amount -> Integer,
debit -> Integer,
credit -> Integer,
}
}
allow_tables_to_appear_in_same_query!(accounts, transactions);
fn test() {
use self::accounts::dsl::*;
use diesel::dsl::sum;
let _q = accounts.select((
id,
opening_balance.nullable() // call `.nullable()` here to explicitly mark it as potentially nullable
+ transactions::table
.select(sum(transactions::amount))
.filter(transactions::debit.eq(id))
.single_value() // call `.single_value()` here to transform this query into a subquery
- transactions::table
.select(sum(transactions::amount))
.filter(transactions::credit.eq(id))
.single_value(), // call `.single_value()` here to transform this query into a subquery
));
}

How to get the index number of a type from an Array type in typescript?

Consider the following TypeScript code:
type Data = [string, number, symbol]; // Array of types
// If I want to access 'symbol' I would do this:
type Value = Data[2]; //--> symbol
// I need to get the index of the 'symbol' which is 2
// How to create something like this:
type Index = GetIndex<Data, symbol>;
I want to know if there is a possibility to get the index of symbol type in the type 'Data'.

This solution returns the key as a string (the string "2" instead of the number 2).
Given an array A and a value type T, we use a mapped type to check which keys of A have values that match T. If the type is correct, we return that key and otherwise we return never. That gives us a mapped tuple [never, never, "2"] representing matching and non-matching keys. We want just the values, not the tuple, so we add [number] at the end of our type which gives us the union of all elements in the tuple -- in this case it is just "2" as never is ignored here.
type GetIndex<A extends any[], T> = {
[K in keyof A]: A[K] extends T ? K : never;
}[number]
type Index = GetIndex<Data, symbol>; // Index is "2"
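If a numeric index is needed rather than the string key, one possible extension is to convert each matching key with a template literal type. This is only a sketch: it assumes TypeScript 4.8 or later (for `infer N extends number`), and the names GetNumericIndex and SymbolIndex are made up for illustration; they are not part of the solution above.

```typescript
type Data = [string, number, symbol];

// Hypothetical variant of GetIndex: the same mapped-type idea, but each
// matching key ("0", "1", "2", ...) is converted to a numeric literal type.
// Assumes TypeScript 4.8+ for `infer N extends number`.
type GetNumericIndex<A extends readonly unknown[], T> = {
  [K in keyof A]: A[K] extends T
    ? K extends `${infer N extends number}`
      ? N
      : never
    : never;
}[number];

type SymbolIndex = GetNumericIndex<Data, symbol>;

// This assignment only compiles if SymbolIndex accepts the value 2.
const symbolIndex: SymbolIndex = 2;
```

As with GetIndex, a type that occurs more than once in the tuple yields a union of all matching indices.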

Apache Spark SQL: COALESCE NULL array into empty struct array

I have a query:
SELECT foo FROM bar;
Here foo is an array of structs that can be NULL. I would like to coalesce it into an empty array.
SELECT COALESCE(foo, array()) FROM bar;
When I do that I get an error that there is a mismatch.
cannot resolve 'coalesce(foo, array())' due to data type mismatch: input to function coalesce should all be the same type, ...
The struct has about 25 fields so if possible I don't want to manually define them unless there is no other way. The reason why I don't want NULL is because when I write it to JSON I want the field to be an empty array and with NULL it's missing completely.

How do I write queries with a null constraint in pg-promise properly?

When writing Postgres queries, constraints are usually written like
WHERE a = $(a) or WHERE b IN $(b:csv) if you know it's a list. However, if a value is null, the constraint would have to be written WHERE x IS NULL. Is it possible to get the query to auto-format if the value is null or not?
Say I might want to find rows WHERE c = 1. If I know c is 1, I write the query like
db.oneOrNone('SELECT * FROM blah WHERE c = $(c)', { c })
But if c turns out to be null, the query would have to become ...WHERE c IS NULL.
Would it be possible to construct a general query like WHERE $(c), and it would automatically format to WHERE c = 1 if c is 1, and WHERE c IS NULL if c is set to null?

You can use Custom Type Formatting to help with dynamic queries:
const valueOrNull = (col, value) => ({
rawType: true,
toPostgres: () => pgp.as.format(`$1:name ${value === null ? 'IS NULL' : '= $2'}`,
[col, value])
});
Then you can pass it in as a formatting value:
db.oneOrNone('SELECT * FROM blah WHERE $[cnd]', { cnd: valueOrNull('col', 123) })
UPDATE
Or you can use custom formatting just for the value itself:
const eqOrNull = value => ({
rawType: true,
toPostgres: () => pgp.as.format(`${value === null ? 'IS NULL' : '= $1'}`, value)
});
Usage examples:
db.oneOrNone('SELECT * FROM blah WHERE $1:name $2', ['col', eqOrNull(123)])
//=> SELECT * FROM blah WHERE "col" = 123
db.oneOrNone('SELECT * FROM blah WHERE $1:name $2', ['col', eqOrNull(null)])
//=> SELECT * FROM blah WHERE "col" IS NULL
Note that for simplicity I didn't include a check for undefined, but you will most likely want one, because undefined is also formatted as null internally.
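The undefined check could look like the following sketch. It only shows the choice of template text, so it runs without a database connection. The name conditionTemplate is hypothetical; the string it returns is meant to be passed to pgp.as.format together with [col, value], as valueOrNull does above.

```typescript
// Hypothetical helper: picks the template text that valueOrNull would format.
// undefined is treated like null here, because both end up as NULL in the query.
function conditionTemplate(value: unknown): string {
  const isMissing = value === null || value === undefined;
  return isMissing ? '$1:name IS NULL' : '$1:name = $2';
}

console.log(conditionTemplate(123));       // $1:name = $2
console.log(conditionTemplate(null));      // $1:name IS NULL
console.log(conditionTemplate(undefined)); // $1:name IS NULL
```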

A very useful alternative to modifying the query depending on whether the value is NULL is to use IS [NOT] DISTINCT FROM. From the reference:
For non-null inputs, IS DISTINCT FROM is the same as the <> operator. However, if both inputs are null it returns false, and if only one input is null it returns true. Similarly, IS NOT DISTINCT FROM is identical to = for non-null inputs, but it returns true when both inputs are null, and false when only one input is null. Thus, these predicates effectively act as though null were a normal data value, rather than “unknown”.
In short, instead of =, use IS NOT DISTINCT FROM, and instead of <>, use IS DISTINCT FROM.
This becomes especially useful when comparing two columns, either of which may be null.
Note that IS [NOT] DISTINCT FROM cannot use indexes, so certain queries may perform poorly.
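The difference between = and IS NOT DISTINCT FROM described in the quoted reference can be modelled in a few lines of TypeScript. This is only an illustration of the semantics, not how PostgreSQL implements them; SqlValue, sqlEquals and isNotDistinctFrom are made-up names, and the "unknown" result of a comparison with NULL is represented here as null.

```typescript
type SqlValue = number | string | null;

// Models SQL `a = b`: the result is unknown (null here) if either side is NULL.
function sqlEquals(a: SqlValue, b: SqlValue): boolean | null {
  if (a === null || b === null) return null;
  return a === b;
}

// Models SQL `a IS NOT DISTINCT FROM b`: NULL behaves like an ordinary value,
// so the result is always true or false.
function isNotDistinctFrom(a: SqlValue, b: SqlValue): boolean {
  if (a === null && b === null) return true;
  if (a === null || b === null) return false;
  return a === b;
}

console.log(sqlEquals(1, 1));               // true
console.log(sqlEquals(null, null));         // null (unknown, so the row is filtered out)
console.log(isNotDistinctFrom(null, null)); // true
console.log(isNotDistinctFrom(1, null));    // false
```

In a WHERE clause, only rows for which the condition is true are returned, which is why c = NULL never matches while c IS NOT DISTINCT FROM NULL does.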

Cassandra: aggregate function INITCOND with complex user defined type

What can be used as the INITCOND for user-defined aggregate functions in Cassandra? I have only seen examples with simple types (e.g. tuple).
I have the following type for the state object in the aggregation function:
create type avg_type_1 (
accum tuple<text,int,double>, // source, count, sum
avg_map map<text,double> // source, average
);
When I omit INITCOND I get a Java NullPointerException.

The following works for the UDT in the question:
INITCOND ((null, 0, 0.0), null)
The accum field (tuple): the first element (source, String) is set to null, the second element (count, int) is set to 0 (zero), and the third and final element (sum, double) is set to 0.0 (zero).
The avg_map field (map): set to null (no map yet).
The fields can also be referred to by name, as the following (from describe ...) shows.
INITCOND {accum: (null, 0, 0.0), avg_map: null};
For named fields, curly brackets "{}" are used (as they are represented as a map).
Lastly, here is an example to also initialize the map.
INITCOND {accum: (null, 0, 0.0), avg_map: {'i1': 23.5, 'i2': 1.2}};
