PromQL join on different label names - promql

I need to join two PromQL query results on label names.
I found a good article: https://www.robustperception.io/left-joins-in-promql
I see that the on (foo) clause means a.foo = b.foo.
Is it possible to implement an a.foo = b.bar join in PromQL?

I've implemented it by using label_replace to rename the label.
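For example, a sketch along these lines (a and b being the two metrics from the question; add group_left/group_right as your cardinality requires) copies b's bar label into a new foo label so both sides can be matched with on (foo):

a * on (foo) label_replace(b, "foo", "$1", "bar", "(.*)")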

Related

JOOQ: multisetAgg or toSet filtering out NULL

Quite often, the new feature of multisetAgg is used along with LEFT JOINs.
Let's say I have a user dimension table and a paid_subscriptions fact table. I want to query a specific user with all of their paid subscriptions and, for each subscription, do some processing (like sending an email or whatever).
I would write some JOOQ like this:
ctx
    .select(row(
        USER.ID,
        USER.USERNAME,
        multisetAgg(PAIDSUBSCRIPTIONS.SUBNAME).as("subscr").convertFrom(r -> r.intoSet(Record1::value1))
    ).mapping(MyUserWithSubscriptionPOJO::new))
    .from(USER)
    .leftJoin(PAIDSUBSCRIPTIONS).onKey()
    .where(someCondition)
    .groupBy(USER)
    .fetch(Record1::value1);
The problem here is that multisetAgg produces a Set which can contain null as an element.
I either have to filter out the null subscriptions I don't care about after the jOOQ select, or I have to rewrite my query with something like this:
multisetAgg(PAIDSUBSCRIPTIONS.SUBNAME).as("subscr").convertFrom(r -> {
    final Set<String> res = r.intoSet(Record1::value1);
    res.remove(null); // remove possible nulls
    return res;
})
Both don't look too nice in code.
I wonder if there is a better approach to write this with less code, or even automatic filtering of null values, or some other kind of syntactic sugar available in jOOQ? After all, I think it is quite a common use case, especially considering that often enough I end up with some Java 8 style stream processing of my left-joined collection where the first step is to filter out null, which is something I often forget :)
You're asking for a few things here:
SET instead of MULTISET (will be addressed with #12033)
Adding NULL filtering (is already possible with FILTER)
The implied idea that such NULL values could be removed automatically (might be addressed with #13776)
SET instead of MULTISET
The SQL standard has some notions of a SET as opposed to MULTISET or ARRAY. For example:
#13795
It isn't as powerful as MULTISET, and it doesn't have to be, because usually, just by adding DISTINCT you can turn any MULTISET into a SET. Nevertheless, Informix (possibly the most powerful ORDBMS) does have SET data types and constructors:
LIST (ARRAY)
MULTISET
SET
So, we might add support for this in the future, perhaps. I'm not sure yet of its utility, as opposed to using DISTINCT with MULTISET (already possible) or MULTISET_AGG (possible soon):
#12033
Adding NULL filtering
You already have the FILTER clause to do this directly in SQL. It's a SQL standard and supported by jOOQ natively, or via CASE emulations. A native SQL example, as supported by e.g. PostgreSQL:
select
t.a,
json_agg(u.c),
json_agg(u.c) filter (where u.b is not null)
from (values (1), (2)) t (a)
left join (values (2, 'a'),(2, 'b'),(3, 'c'),(3, 'd')) u (b, c) on t.a = u.b
group by t.a
Producing:
|a |json_agg |json_agg |
|---|----------|----------|
|1 |[null] | |
|2 |["a", "b"]|["a", "b"]|
So, just write:
multisetAgg(PAIDSUBSCRIPTIONS.SUBNAME).filter(PAIDSUBSCRIPTIONS.ID.isNotNull())
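Applied to the query from the question, that could look roughly like this (just a sketch; PAIDSUBSCRIPTIONS.ID stands in for any column that is non-null whenever a subscription row was actually joined):

ctx
    .select(row(
        USER.ID,
        USER.USERNAME,
        multisetAgg(PAIDSUBSCRIPTIONS.SUBNAME)
            .filter(PAIDSUBSCRIPTIONS.ID.isNotNull())  // skip the NULL coming from unmatched left join rows
            .as("subscr")
            .convertFrom(r -> r.intoSet(Record1::value1))
    ).mapping(MyUserWithSubscriptionPOJO::new))
    .from(USER)
    .leftJoin(PAIDSUBSCRIPTIONS).onKey()
    .where(someCondition)
    .groupBy(USER)
    .fetch(Record1::value1);

That way the Set never contains null in the first place, and the manual res.remove(null) step becomes unnecessary.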
The implied idea that such NULL values could be removed automatically
Note, I understand that you'd probably like this to be done automatically. There's a thorough discussion on that subject here: #13776. As always, it's a desirable thing that is far from easy to implement consistently.
I'm positive that this will be done eventually, but it's a very big change.

SQLite3: Simulate RIGHT OUTER JOIN with LEFT OUTER JOINs without being able to change table order

I am new to SQL and have recently started using joins in my code. The data I wish to retrieve can be obtained with the following SQL statement. However, as you know, SQLite3 does not support RIGHT OUTER and FULL OUTER JOINs.
Therefore I would like to rewrite this statement using only LEFT OUTER JOINs, as these are the only ones supported; any help would be appreciated.
Before you go ahead and mark this question as a duplicate, I have looked at answers to other similar questions, but none have explained the general rules for rearranging queries to use LEFT JOINs only.
I also think this particular example is slightly different in the sense that the table (periods) cannot be joined with either of the tables (teacher_subjects, classroom_subjects) without first joining the (class_placement) table.
FROM P
LEFT JOIN CP
ON P.PID = CP.PID
RIGHT JOIN CS
ON CP.CID = CS.CID
RIGHT JOIN TS
ON CP.TID = TS.TID
WHERE (CP.CID IS NULL
AND CP.TID IS NULL)
ORDER BY P.PID;
Unsurprisingly, the error I get from running this query is:
sqlite3.OperationalError: RIGHT and FULL OUTER JOINs are not currently supported
Sorry in advance if I am being really stupid but if you require any extra information please ask. Many Thanks.
Ignoring column order, x right join y on c is the same as y left join x on c. This identity is commonly stated explicitly. (But you can also just apply the definitions of the joins to your original expression to get subexpressions with the values you want.) You can read the FROM grammar to see how you can parenthesize or use subqueries for precedence. Applying the identity to your query, we get ts left join (cs left join (p left join cp on x) on y) on z.
Similarly, ignoring column order, x full join y on c is the same as y full join x on c. Expressing a full join in terms of left and right joins is a frequently asked duplicate.
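As a sketch (the SELECT list is omitted, as in the question; the join conditions are taken from your query), the rewritten, SQLite-friendly form would be roughly:

SELECT ...
FROM TS
LEFT JOIN (CS
    LEFT JOIN (P
        LEFT JOIN CP ON P.PID = CP.PID)
    ON CP.CID = CS.CID)
ON CP.TID = TS.TID
WHERE CP.CID IS NULL
  AND CP.TID IS NULL
ORDER BY P.PID;

If your SQLite version does not accept parenthesized joins like this, wrap the inner join in a subquery (with an alias) instead.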

Spark Dataframe / SQL - Complex enriching nested data

Context
I have an example of event source data in a dataframe input as shown below.
SOURCE
where eventOccurredTime is a String type. This comes from the source and I want to retain it in its original string form (with nanosecond precision).
I want to use the string to enrich the record with some extra date/time typed data for downstream usage. Below is an example:
TARGET
Now, as a one-off, I can execute some Spark SQL on the dataframe as shown below to get the result I want:
import org.apache.spark.sql.DataFrame

def transformDF(): DataFrame = {
  spark.sql(
    s"""
    SELECT
      id,
      struct(
        event.eventCategory,
        event.eventName,
        event.eventOccurredTime,
        struct (
          CAST(date_format(event.eventOccurredTime,"yyyy-MM-dd'T'HH:mm:ss.SSS") AS TIMESTAMP) AS eventOccurredTimestampUTC,
          CAST(date_format(event.eventOccurredTime,"yyyy-MM-dd'T'HH:mm:ss.SSS") AS DATE) AS eventOccurredDateUTC,
          unix_timestamp(substring(event.eventOccurredTime,1,23),"yyyy-MM-dd'T'HH:mm:ss.SSS") * 1000 AS eventOccurredTimestampMillis,
          datesDim.dateSeq AS eventOccurredDateDimSeq
        ) AS eventOccurredTimeDim,
NOTE: This is a snippet; for the full event, I have to do this explicitly in the long SQL 20 times, once for each of the 20 string dates.
Some things to point out:
unix_timestamp(substring(event.eventOccurredTime,1,23)
Above, I found I had to substring a date that had nano precision, or unix_timestamp would return null; hence the substring.
xDim.xTimestampUTC
xDim.xDateUTC
xDim.xTimestampMillis
xDim.xDateDimSeq
Above is the pattern / naming convention for the 4 nested xDim struct fields to derive; they are present in the predefined Spark schema that the JSON is read with to create the source dataframe.
datesDim.dateSeq AS eventOccurredDateDimSeq
To get the 'eventOccurredDateDimSeq' field above, I need to join to a dates dimension table 'datesDim' (static, with an hourly grain), where dateSeq is the 'key' of the hourly bucket this date falls into, and datesDim.UTC is defined to the hour:
LEFT OUTER JOIN datesDim ON
CAST(date_format(event.eventOccurredTime,"yyyy-MM-dd'T'HH:00:00") AS TIMESTAMP) = datesDim.UTC
The table is globally available in the spark cluster so should be quick to look up, but I need to do this for every date enrichment in the payloads and they will have different dates.
dateDimensionDF.write.mode("overwrite").saveAsTable("datesDim")
The general schema pattern is that if there is a string date whose field name is:
x
...there is an 'xDim' struct equivalent that immediately follows it in schema order, as described below:
xDim.xTimestampUTC
xDim.xDateUTC
xDim.xTimestampMillis
xDim.xDateDimSeq
As mentioned with the snippet, although in the image above I am only showing 'eventOccurredTime', there are more of these throughout the schema, at lower levels too, that need the same transformation pattern applied.
Problem:
So I have the Spark SQL (the full monty the snippet came from) to do this as a one-off for 1 event type, and it is a large, explicit SQL statement that applies the time functions and joins I showed. But here is my problem I need help with.
So I want to try and create a more generic, functionally oriented, reusable solution that traverses a nested dataframe and applies this transformation pattern as described above, 'where it needs to'.
How do I define 'where it needs to'?
Perhaps the naming convention is a good start: traverse the DF's schema, look for any struct fields that match the xDim ('Dim' suffix) pattern, use the preceding 'x' field as the input, and populate the xDim.* values in line with the naming pattern as described? (See the sketch below.)
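A minimal sketch of that traversal, assuming the naming convention holds exactly as described (findDimPairs and its matching rules are mine, for illustration only):

import org.apache.spark.sql.types._

// Walk the schema and collect (stringFieldPath, dimStructPath) pairs wherever a
// String field `x` is immediately followed by a struct field named `xDim`.
def findDimPairs(schema: StructType, path: Seq[String] = Nil): Seq[(String, String)] = {
  val here = schema.fields.sliding(2).collect {
    case Array(x, dim)
      if x.dataType == StringType &&
         dim.name == x.name + "Dim" &&
         dim.dataType.isInstanceOf[StructType] =>
      ((path :+ x.name).mkString("."), (path :+ dim.name).mkString("."))
  }.toSeq
  val nested = schema.fields.collect {
    case StructField(name, st: StructType, _, _) => findDimPairs(st, path :+ name)
  }.toSeq.flatten
  here ++ nested
}

// Each (x, xDim) pair could then drive a generated selectExpr / withColumn rewrite that
// applies the CAST / unix_timestamp expressions and the datesDim join from the SQL above.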
How, in a function, do I best join on the registered datesDim table (it's static, remember) so that it performs well?
Solution?
I think one or more UDFs are needed (we use Scala), maybe by themselves or as fragments within SQL, but I'm not sure. Ensuring the datesDim lookup performs well is key, I think.
Or maybe there is another way?
Note: I am working with DataFrames / Spark SQL, not Datasets, but options for each are welcome.
Databricks
NOTE: I'm actually using the Databricks platform for this, so for those versed in SQL 'higher-order functions' in Databricks:
https://docs.databricks.com/spark/latest/spark-sql/higher-order-functions-lambda-functions.html
...is there a slick option here using 'TRANSFORM' as a SQL HOF (perhaps registering a utility UDF and using it with transform)?
Awesome, thanks Spark community for your help! Sorry this is a long post setting the scene.

How to avoid redundant association with composite relationship

I have a composite relationship between two objects (A & B) (A is composed of many Bs). Now another class (C) has a one-to-many association relationship to class 'B'. I would like to be able to retrieve all instances of class (A) from class (C).
How do I do this without creating redundant associations? Since 'C' basically has a list of 'Bs', I can't just iterate over all of them, asking each what its 'A' is, and eventually return a list of 'As' to 'C'.
I really hope someone out there understands this and doesn't find it completely confusing!
Thanks
Update:
Dataset has a list of defined variables. An activity can select a subset of variables from each dataset and give some attributes to them, hence an association class is used. Now if I want to be able to retrieve from an Activity instance the datasets it is registered with, how do I achieve this in UML and in object implementation?
According to your task, it is IMPOSSIBLE to get all B's from all C's, because nothing states that every B belongs to some C.
On the contrary, since A has a composition of B's (notice, A IS NOT a composition, A HAS a composition of B's, for A can contain plenty of other things, too), and every B MUST belong to some A object, you can easily get all B's from all A's. Just build the list of B's as a set so you don't get duplicate values.
But even if the B-A association includes a B->A navigation, you cannot get all A's from the B's, because some A's can be EMPTY. You'll never reach them from B's.
So, you cannot get all A's from C's, for TWO important reasons, and NO redundant association will help.
As for the question added after "Update":
For getting everything from the variables, use
Dataset <---- Variable ---> Activity // This variant is the easiest for adding associations.
For getting the connected datasets from an activity,
Dataset <--- Variable <----- Activity
But please notice, this is not an update; it is a DIFFERENT question.
I assume your diagram would look something like this:
If C has a reference to B, and B has a reference to A, then it should be no problem navigating to A from C. There is no need for any additional redundant relationships.

How do I structure a SELECT query for the following

Hoping that someone here will be able to provide some MySQL advice...
I am working on a categorical searchtag system. I have tables like the following:
EXERCISES
exerciseID
exerciseTitle
SEARCHTAGS
searchtagID
parentID ( -> searchtagID)
searchtag
EXERCISESEARCHTAGS
exerciseID (Foreign key -> EXERCISES)
searchtagID (Foreign key -> SEARCHTAGS)
Searchtags can be arranged in an arbitrarily deep tree. So for example I might have a tree of searchtags that looks like this...
Body Parts
Head
Neck
Arm
Shoulder
Elbow
Leg
Hip
Knee
Muscles
Pecs
Biceps
Triceps
Now...
I want to select all of the searchtags in ONE branch of the tree that reference at least ONE record in the subset of records referenced by a SINGLE searchtag in a DIFFERENT branch of the tree.
For example, let's say the searchtag "Arm" points to a subset of exercises. If any of the exercises in that subset are also referenced by searchtags from the "Muscles" branch of SEARCHTAGS, I would like to select for them. So my query could potentially return "Biceps," "Triceps".
Two questions:
1) What would the SELECT query for something like this look like? (If such a thing is even possible without creating a lot of slowdown. I'm not sure where to start...)
2) Is there anything I should do to tweak my data structure to ensure this query will continue to run fast, even as the tables get big?
Thanks in advance for your help, it's much appreciated.
An idea: consider using a cache table that saves all ancestor relationships in your searchtags:
CREATE TABLE SEARCHTAGRELATIONS (
parentID INT,
descendantID INT
);
Also include the tag itself as both parent and descendant (so, for the searchtag with id 1, the relations table includes a row (1,1)).
That way, you no longer have to walk the parent/child hierarchy recursively and can join against a flat table. Assuming "Muscles" has the ID 5,
SELECT descendantID FROM SEARCHTAGRELATIONS WHERE parentID=5
returns all searchtags contained in muscles.
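Building on that, a sketch of the query from question 1 (IDs are assumed for illustration: 5 for "Muscles", 10 for "Arm"; substitute your real IDs) would be:

SELECT DISTINCT s.searchtagID, s.searchtag
FROM SEARCHTAGRELATIONS r                           -- all descendants of "Muscles" (id 5)
JOIN SEARCHTAGS s            ON s.searchtagID = r.descendantID
JOIN EXERCISESEARCHTAGS est  ON est.searchtagID = s.searchtagID
JOIN EXERCISESEARCHTAGS arm  ON arm.exerciseID  = est.exerciseID
                            AND arm.searchtagID = 10           -- exercises also tagged "Arm"
WHERE r.parentID = 5;

This returns every tag under "Muscles" that references at least one exercise that is also referenced by "Arm".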
Alternatively, use modified preorder tree traversal, also known as the nested set model. It requires two fields (left and right) instead of one (parent id), and makes certain operations harder, but makes selecting whole branches much easier.
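For illustration, with hypothetical lft/rgt columns added to SEARCHTAGS, selecting a whole branch becomes a single range join:

SELECT child.searchtagID, child.searchtag
FROM SEARCHTAGS parent
JOIN SEARCHTAGS child
  ON child.lft BETWEEN parent.lft AND parent.rgt   -- descendants lie inside the parent's interval
WHERE parent.searchtagID = 5;                      -- e.g. the "Muscles" branch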
