Diesel: Adding result of subqueries - rust

Given the following tables:
accounts (
    id INTEGER,
    opening_balance INTEGER
)

transactions (
    debit INTEGER,
    credit INTEGER,
    amount INTEGER,
    foreign key debit references accounts (id),
    foreign key credit references accounts (id)
)
I want to execute the following SQL query:
select
    id,
    opening_balance
        + (select sum(amount) from transactions where debit = accounts.id)
        - (select sum(amount) from transactions where credit = accounts.id)
from accounts;
I tried something like this:
accounts
    .select((
        id,
        opening_balance
            + transactions::table
                .select(sum(transactions::amount))
                .filter(transactions::debit.eq(id))
            - transactions::table
                .select(sum(transactions::amount))
                .filter(transactions::credit.eq(id)),
    ))
While the individual parts of this query work fine, I cannot get this to compile.
the trait bound
`diesel::query_builder::SelectStatement<schema::transactions::table, diesel::query_builder::select_clause::SelectClause<aggregate_folding::sum::sum<diesel::sql_types::Integer, schema::transactions::columns::amount>>, diesel::query_builder::distinct_clause::NoDistinctClause, diesel::query_builder::where_clause::WhereClause<diesel::expression::operators::Eq<schema::transactions::columns::debit, schema::fiscal_year_accounts::columns::account_id>>>: diesel::Expression`
is not satisfied
required because of the requirements on the impl of `AsExpression<diesel::sql_types::Integer>` for
`diesel::query_builder::SelectStatement<schema::transactions::table, diesel::query_builder::select_clause::SelectClause<aggregate_folding::sum::sum<diesel::sql_types::Integer, schema::transactions::columns::amount>>, diesel::query_builder::distinct_clause::NoDistinctClause, diesel::query_builder::where_clause::WhereClause<diesel::expression::operators::Eq<schema::transactions::columns::debit, schema::fiscal_year_accounts::columns::account_id>>>`
The + and - operators work with static values, but how can I get them to work with subqueries?

First of all: always provide a complete minimal example of your problem. This includes the exact versions of all used crates, all relevant imports needed to actually produce this error message, the complete error message with all help and note statements, and, in Diesel's case, the generated schema file.
To answer your question: you are missing two calls to .single_value(), which is required to convert a query into a subquery that can be used as an expression. Both subqueries return a Nullable<BigInt>, so opening_balance must have a matching type.
For the sake of completeness, see the working code below:
#[macro_use]
extern crate diesel;

use diesel::prelude::*;

table! {
    accounts {
        id -> Integer,
        // changed to `BigInt` as a sum of `Integer` returns a `BigInt`
        opening_balance -> BigInt,
    }
}

table! {
    transactions {
        id -> Integer,
        amount -> Integer,
        debit -> Integer,
        credit -> Integer,
    }
}

allow_tables_to_appear_in_same_query!(accounts, transactions);

fn test() {
    use self::accounts::dsl::*;
    use diesel::dsl::sum;

    let _q = accounts.select((
        id,
        opening_balance.nullable() // call `.nullable()` to explicitly mark the column as potentially null
            + transactions::table
                .select(sum(transactions::amount))
                .filter(transactions::debit.eq(id))
                .single_value() // call `.single_value()` to turn this query into a subquery
            - transactions::table
                .select(sum(transactions::amount))
                .filter(transactions::credit.eq(id))
                .single_value(), // call `.single_value()` to turn this query into a subquery
    ));
}

Related

Count and return value if match

I am just starting out with Rust and NEAR and trying to create a simple function that counts how many NFTs have been minted with a particular substring.
My NFTs have a token_id that contains either randomstring-tier1 or randomstring-tier2, and rather than returning the total amount minted, I want to know the count for each tier.
I have this very basic function that returns the total count.
pub fn nft_total_supply(&self) -> U128 {
    // return the length of the token metadata by ID
    U128(self.token_metadata_by_id.len() as u128)
}
But I don't have a good enough understanding of how to check the token_id for a particular sub-string.
I was trying:
pub fn check_nft_minted_by_tier1(
    &self,
    token_id
) -> u128 {
    if token_id.contains("tier1").count() {
        U128(tier1.len() as u128)
    }
}
But this doesn't work.
My personal recommendation is to have this information stored in state; that way you can quickly access it with little computation, and it's also scalable. If the list of NFTs grows very large, you might not have enough gas to loop through each one and check the token ID for the substring.
I would store something in state that either keeps track of the number of NFTs with each tier, or keeps track of the token IDs for each tier. That way you can expand later and maybe get the metadata as well rather than just the total number of NFTs.
This set of token IDs can be populated upon minting an NFT. At the end of the mint function, you can add a quick check to see if the token ID contains the sub-string and then add it to the list if it does. You can reference this post to see how you can check for a sub-string.
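Here is a minimal sketch of the counter variant; the field and method names (tier1_count, record_tier, nft_supply_for_tier1) are hypothetical and not part of the NFT standard, and the usual #[near_bindgen]/serialization attributes are omitted:

use near_sdk::json_types::U128;

pub struct Contract {
    // ... existing fields such as token_metadata_by_id ...
    tier1_count: u128,
    tier2_count: u128,
}

impl Contract {
    // call this at the end of the mint function, once the token ID is known
    fn record_tier(&mut self, token_id: &str) {
        if token_id.contains("tier1") {
            self.tier1_count += 1;
        } else if token_id.contains("tier2") {
            self.tier2_count += 1;
        }
    }

    // cheap view method: no iteration over token_metadata_by_id, so the cost stays flat
    pub fn nft_supply_for_tier1(&self) -> U128 {
        U128(self.tier1_count)
    }
}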

Big query update struct to null in repeated field

In Google bigquery, I'm trying to do an update on a repeated field.
For comparison, this works (or at least is flagged as valid), but of course isn't actually updating the field.
UPDATE my.table t
SET my_field = ARRAY(
    SELECT AS STRUCT g.foo, g.bar, g.struct_to_set_null
    FROM unnest(t.groups) as g
), ... FROM ... etc
Setting struct_to_set_null to null gives an error:
UPDATE my.table t
SET my_field = ARRAY(
    SELECT AS STRUCT g.foo, g.bar, null as struct_to_set_null
    FROM unnest(t.groups) as g
), ... FROM ... etc
Value of type ARRAY<STRUCT<... (really long and cut off) cannot be assigned to groups, which has type ARRAY<STRUCT<... (same, really long, cut off)
I can see that the field in question is of type RECORD and NULLABLE, so I would think setting it to null is allowed. Is there a trick to getting this to work?
The problem is that BigQuery isn't inferring the type of the struct field just from the NULL literal; you need to be a bit more explicit. Here is an example:
CREATE TABLE tmp_elliottb.UpdateTable (
    my_field ARRAY<STRUCT<foo INT64, bar INT64, struct_to_set_null STRUCT<x STRING, y BOOL, z INT64>>>
);
UPDATE tmp_elliottb.UpdateTable
SET my_field = ARRAY(
    SELECT AS STRUCT foo, bar, NULL AS struct_to_set_null FROM UNNEST(my_field)
)
WHERE true;
This gives me:
Value of type ARRAY<STRUCT<foo INT64, bar INT64, struct_to_set_null INT64>> cannot be assigned to my_field, which has type ARRAY<STRUCT<foo INT64, bar INT64, struct_to_set_null STRUCT<x STRING, y BOOL, z INT64>>> at [4:16]
What I can do instead is to use an IF expression that produces NULL, but has struct_to_set_null on one of the branches in order to force the output type that I want:
UPDATE tmp_elliottb.UpdateTable
SET my_field = ARRAY(
    SELECT AS STRUCT
        foo, bar,
        IF(false, struct_to_set_null, NULL) AS struct_to_set_null
    FROM UNNEST(my_field)
)
WHERE true;
Or, alternatively, I can use SELECT * REPLACE:
UPDATE tmp_elliottb.UpdateTable
SET my_field = ARRAY(
    SELECT AS STRUCT * REPLACE (IF(false, struct_to_set_null, NULL) AS struct_to_set_null)
    FROM UNNEST(my_field)
)
WHERE true;
Repeated is an ARRAY type, so it cannot be set to NULL.
Currently, BigQuery has the following two limitations with respect to NULLs and ARRAYs:
BigQuery raises an error if a query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
BigQuery translates a NULL ARRAY into an empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.
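A quick way to see the first limitation in action (a sketch; the array literal stands in for any query result):

-- fine inside a query: the NULL element never reaches the result
SELECT ARRAY_LENGTH(arr) FROM (SELECT [1, NULL, 3] AS arr);  -- returns 3

-- but selecting the array into the result itself raises an error about a NULL array element
SELECT [1, NULL, 3] AS arr;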

How to get X% percentile in Cassandra

Consider a table with structure:
CREATE TABLE statistics (
    name text,
    when timestamp,
    value int,
    PRIMARY KEY ((name, when))
);
What is the best way to calculate, for example, 50% value percentile by name?
I thought about:
a) writing a custom aggregate function + a query like:
SELECT PERCENTILE(value, 0.5) FROM statistics WHERE name = '...'
b) counting elements by name first
SELECT COUNT(value) FROM statistics WHERE name = '...'
then finding the (count * 0.5)th row value with paging, once sorted by value ascending. Say, if count is 100, it will be the 50th row.
c) your ideas
I'm not sure whether case A can handle the task. Case B might be tricky when there is an odd number of rows.
As long as you always provide name, this is fine, since everything stays within one partition; without specifying the partition this request would be very expensive. I am assuming you mean ((name), when), not ((name, when)), in your table; otherwise what you're asking is impossible without full table scans (using Hadoop or Spark).
The UDA would work, but it can be expensive unless you're willing to accept an approximation. To make it perfectly accurate you need two passes (i.e. a count, then a second pass to go X into the set; but since there is no isolation this isn't going to be perfect either). So if you need it perfectly accurate, your best bet is probably to pull the entire statistics[name] partition locally, or to have the UDA build up the entire set (or most of it) in a map before calculating (not recommended if partitions get at all large), i.e.:
CREATE OR REPLACE FUNCTION all(state tuple<double, map<int, int>>, val int, percentile double)
CALLED ON NULL INPUT RETURNS tuple<double, map<int, int>> LANGUAGE java AS '
    java.util.Map<Integer, Integer> m = state.getMap(1, Integer.class, Integer.class);
    m.put(m.size(), val);
    state.setMap(1, m);
    state.setDouble(0, percentile);
    return state;';

CREATE OR REPLACE FUNCTION calcAllPercentile (state tuple<double, map<int, int>>)
CALLED ON NULL INPUT RETURNS int LANGUAGE java AS '
    // sort the collected values first; rows arrive in clustering order, not value order
    java.util.List<Integer> values =
        new java.util.ArrayList<Integer>(state.getMap(1, Integer.class, Integer.class).values());
    java.util.Collections.sort(values);
    if (values.isEmpty()) return null;
    int offset = (int) (values.size() * state.getDouble(0));
    return values.get(offset);';

CREATE AGGREGATE IF NOT EXISTS percentile (int, double)
SFUNC all STYPE tuple<double, map<int, int>>
FINALFUNC calcAllPercentile
INITCOND (0.0, {});
If you're willing to accept an approximation, you can use a sampling reservoir: say you store 1024 elements, and as your UDA receives elements you replace elements in the reservoir with a steadily decreasing probability (Vitter's algorithm R). This is pretty easy to implement, and if your data set is expected to have a normal distribution it will give you a decent approximation; if your data set does not have a normal distribution, it can be pretty far off. With a normal distribution there are actually a lot of other options as well, but algorithm R is, I think, the easiest to implement in a UDA. Like:
CREATE OR REPLACE FUNCTION reservoir (state tuple<int, double, map<int, int>>, val int, percentile double)
CALLED ON NULL INPUT RETURNS tuple<int, double, map<int, int>> LANGUAGE java AS '
    java.util.Map<Integer, Integer> m = state.getMap(2, Integer.class, Integer.class);
    int current = state.getInt(0) + 1; // 1-based count of elements seen so far
    if (current <= 1024) {
        // fill the reservoir (slots 0..1023)
        m.put(current - 1, val);
    } else {
        // replace a random slot with gradually decreasing probability (1024/current)
        int replace = (int) (java.lang.Math.random() * current);
        if (replace < 1024) {
            m.put(replace, val);
        }
    }
    state.setMap(2, m);
    state.setDouble(1, percentile);
    state.setInt(0, current);
    return state;';

CREATE OR REPLACE FUNCTION calcApproxPercentile (state tuple<int, double, map<int, int>>)
CALLED ON NULL INPUT RETURNS int LANGUAGE java AS '
    // sort the sampled values before indexing; the reservoir itself is unordered
    java.util.List<Integer> values =
        new java.util.ArrayList<Integer>(state.getMap(2, Integer.class, Integer.class).values());
    java.util.Collections.sort(values);
    if (values.isEmpty()) return 0;
    int offset = (int) (values.size() * state.getDouble(1));
    return values.get(offset);';

CREATE AGGREGATE IF NOT EXISTS percentile_approx (int, double)
SFUNC reservoir STYPE tuple<int, double, map<int, int>>
FINALFUNC calcApproxPercentile
INITCOND (0, 0.0, {});
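Both aggregates are then invoked like any other aggregate; for example, assuming the ((name), when) key discussed above:

SELECT percentile_approx(value, 0.5) FROM statistics WHERE name = '...';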
In the above, the exact percentile function will get slower sooner; playing with the size of the sampler can give you more or less accuracy, but make it too large and you start to impact performance. Generally a UDA over more than 10k values (even for simple functions like count) starts to fail. It is important to recognize in these scenarios, too, that while the query returns a single value, there is a ton of work behind it, so a lot of these queries or much concurrency will put a lot of pressure on your coordinators. This does require a version >3.8 (I would recommend 3.11.latest+) for CASSANDRA-10783.
Note: I make no promises that I haven't missed an off-by-one error in the example UDAs; I did not test them fully, but they should be close enough that you can make them work from there.

How do I get an Option<T> instead of an Option<Vec<T>> from a Diesel query which only returns 1 or 0 records?

I'm querying for existing records in a table called messages; this query is then used as part of a 'find or create' function:
fn find_msg_by_uuid<'a>(conn: &PgConnection, msg_uuid: &Uuid) -> Option<Vec<Message>> {
    use schema::messages::dsl::*;
    use diesel::OptionalExtension;

    messages.filter(uuid.eq(msg_uuid))
        .limit(1)
        .load::<Message>(conn)
        .optional().unwrap()
}
I've made this optional, as finding a record and finding none are both valid outcomes in this scenario. As a result, this query might return a Vec with one Message or an empty Vec, so I always end up checking whether the Vec is empty, using code like this:
let extant_messages = find_msg_by_uuid(conn, message_uuid);
if !extant_messages.unwrap().is_empty() { ... }
and then, if it isn't empty, taking the first Message in the Vec as my found message, using code like:
let found_message = find_msg_by_uuid(conn, message_uuid).unwrap()[0];
I always take the first element in the Vec, since the records are unique, so the query will only ever return 1 or 0 records.
This feels kind of messy to me and seems to take too many steps. It feels as if, when there is a record matching the query, it should return Option<Message>, not Option<Vec<Message>>, with None when there is no matching record.
As mentioned in the comments, use first:
Attempts to load a single record. Returns Ok(record) if found, and Err(NotFound) if no results are returned. If the query truly is optional, you can call .optional() on the result of this to get a Result<Option<U>>.
fn find_msg_by_uuid<'a>(conn: &PgConnection, msg_uuid: &Uuid) -> Option<Message> {
    use schema::messages::dsl::*;
    use diesel::OptionalExtension;

    messages
        .filter(uuid.eq(msg_uuid))
        .first(conn)
        .optional()
        .unwrap()
}
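With that signature, the is_empty()/[0] handling from the question collapses into ordinary Option handling; a minimal sketch:

// no more checking the Vec and indexing into it
match find_msg_by_uuid(conn, message_uuid) {
    Some(found_message) => { /* use the existing message */ }
    None => { /* no match, create one instead */ }
}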

Relational override on 'objects'?

I have a signature
sig Test {
    a: Int,
    b: Int,
    c: Int
}
If I have two instances (atoms?) of this (x, y: Test), can I define a relation between these where only some fields have changed, without having to list all the other fields as equal?
I want to avoid having to list all unchanged fields, as this can be error-prone assuming I have many fields.
Currently I am using x.(a+b+c) = y.(a+next[b]+c), but would like to use something like x = y ++ (b->next[y.b]).
From what I understand about Alloy, I think the answer is no: you cannot talk about all the relations some atom is involved in without explicitly naming those relations. But some experts may correct me if I'm wrong.
