Suppose I have created a model named abc.
As satellites I have: H_ABC, S_ABC, SB_ABC.
In H_ABC I have two natural keys, key1 and key2,
and in SB_ABC I have 3 payload columns. 2 of them come from S_ABC,
but one of them comes from H_ABC.
I have tried to use, for instance:
{{
satellite(
source_model='H_ABC',
payload_columns=[
some columns
]
)
}}
{{
satellite(
source_model='S_ABC',
payload_columns=[
some columns
]
)
}}
S_ABC is fine, but I can't use the one column that I need from the first one.
My question: can I reference two different satellites in one satellite?
Deduplication is one of the basic and important data cleaning techniques.
There are a number of ways to do it in a data flow.
I currently deduplicate with an Aggregate transformation: the key columns that need to be unique (consider "Firstname" and "LastName") go in Group by, and in the Aggregates tab I use a column pattern like name != 'Firstname' && name != 'LastName' with $$ as the column name and first($$) as the value.
The problem with this method is that if 200 of 300 columns have to be treated as unique columns, it is very tedious to include all 200 columns in my column pattern.
Can anyone suggest a better, optimised deduplication process in a data flow for the above situation?
I tried to reproduce the deduplication process using a data flow. Below is the approach.
The list of columns to group by is given in a data flow parameter.
In this repro, three columns are given. This can be extended as required.
Parameter Name: Par1
Type: String
Default value: 'col1,col2,col3'
The source has the following columns:
(Group By columns: col1, col2, col3;
Aggregate column: col4)
Then an Aggregate transformation is added. In Group by,
sha2(256,byNames(split($Par1,','))) is entered as the column expression and the column is named groupbycolumn.
In Aggregates, click + and choose Add column pattern next to column1, then delete column1. Enter true() as the matching condition. Then click the undefined column expression and enter $$ as the column name expression and first($$) as the value expression.
Output of the aggregation:
Data is grouped by col1, col2 and col3, and the first value of col4 is taken for every col1, col2, col3 combination.
Then, using a Select transformation, groupbycolumn can be removed from the above output before copying to the sink.
Reference: Microsoft documentation on Mapping data flow script (Azure Data Factory, Microsoft Learn).
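To make the logic of the approach above easier to follow outside the data flow designer, here is a minimal Python sketch of the same idea: hash the values of the parameter-driven key columns, group on that hash, and keep the first value of every column per group. The function name dedupe and the sample rows are hypothetical, and the sketch only mirrors the logic; it is not how the data flow engine is implemented.

```python
import hashlib


def dedupe(rows, key_columns):
    """Keep the first row seen for each distinct combination of key_columns."""
    first_seen = {}
    for row in rows:
        # Hash the key values, similar in spirit to
        # sha2(256, byNames(split($Par1, ','))) in the data flow.
        # A separator avoids collisions such as ("ab", "c") vs ("a", "bc").
        joined = "\x1f".join(str(row[c]) for c in key_columns)
        digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
        # setdefault keeps the first row per hash, like first($$).
        first_seen.setdefault(digest, row)
    return list(first_seen.values())


# Hypothetical parameter and sample data
par1 = "col1,col2,col3"
rows = [
    {"col1": "a", "col2": "b", "col3": "c", "col4": 10},
    {"col1": "a", "col2": "b", "col3": "c", "col4": 20},
    {"col1": "x", "col2": "y", "col3": "z", "col4": 30},
]
deduped = dedupe(rows, par1.split(","))
```

Because the key columns come from a single comma-separated parameter, extending the key to 200 columns only means changing the parameter value, not the pattern.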
I have a column called TAG_
Data in the TAG_ column could look like the below:
STV-123456
TV-12456
ME-666666
I want to create two computed columns
One that shows the first part of TAG_ before the hyphen
STV
TV
ME
One that shows the second part of TAG_ after the hyphen
123456
12456
666666
This shouldn't be hard but the light bulb is not on yet. Please help.
try this:
SELECT SUBSTRING(TAG_ ,0,CHARINDEX('-',TAG_ ,0)) AS before,
SUBSTRING(TAG_ ,CHARINDEX('-',TAG_ ,0)+1,LEN(TAG_ )) AS after from testtable
and the result for the sample values above:
before | after
-------+-------
STV    | 123456
TV     | 12456
ME     | 666666
Hope this helps!
Example for MySQL, syntax is likely different for other vendors:
create table t
( tag_ text not null
, fst text generated always as (substr(tag_, 1, locate('-', tag_)-1)) stored
, snd text generated always as (substr(tag_, locate('-', tag_)+1)) stored
);
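Both answers rely on the same idea: find the first hyphen, take everything before it, and take everything after it. As a language-neutral illustration of that logic, here is a minimal Python sketch; the function name split_tag is hypothetical and the tags are the sample values from the question.

```python
def split_tag(tag):
    """Split a tag at its first hyphen into (before, after)."""
    # partition returns (tag, "", "") when there is no hyphen,
    # so a tag without a hyphen yields an empty "after" part.
    before, _sep, after = tag.partition("-")
    return before, after


tags = ["STV-123456", "TV-12456", "ME-666666"]
parts = [split_tag(t) for t in tags]
```

Splitting only at the first hyphen means any later hyphens stay in the second part.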
I am looking for the best way to store and retrieve an array of data. The solution I am currently implementing uses a many-to-many relationship as follows.
venue_themes
user_id  style   environment
A1A2     formal  indoor
A2B2     formal  outdoor
theme_setting_to_setting_enum
id  user_id  setting_enum_id
1   A1A2     1
2   A1A2     3
3   A2B2     1
4   A2B2     2
setting_enum
id  value
1   garden
2   beach
3   golf course
4   backyard
The query I currently have is:
SELECT vt.user_id, vt.style, vt.environment, se.value FROM venue_themes vt JOIN theme_settings_to_setting_enum ts ON vt.user_id = ts.user_id JOIN setting_enum se ON ts.setting_enum_id = se.id GROUP BY vt.user_id, ts.id, se.id;
This works but it returns multiple rows with the same data other than my setting enum values.
An example return is :
user_id  style   environment  value
AAAA     formal  indoor       beach
AAAA     formal  indoor       backyard
AAAA     formal  indoor       tent
This is fine but seems excessive if I have many values. What I really want my data to look like is:
user_id  style   environment  value
AAAA     formal  indoor       beach, backyard, tent
Ideally I would have my values returned in an array or something similar so I don't have to build a function to manipulate the returned data.
You can remove se.id from the GROUP BY clause, and use STRING_AGG() to generate the CSV string:
SELECT vt.user_id, vt.style, vt.environment, STRING_AGG(se.value, ', ') se_values
FROM venue_themes vt
JOIN theme_settings_to_setting_enum ts ON vt.user_id = ts.user_id
JOIN setting_enum se ON ts.setting_enum_id = se.id
GROUP BY vt.user_id;
Assuming that user_id is the primary key of venue_themes, it is sufficient to have just this column in the GROUP BY clause (other columns of the table are functionally dependent on the primary key).
You can control the order in which values are aggregated in the string with an ORDER BY clause:
STRING_AGG(se.value, ', ' ORDER BY se.id) se_values
If you want an array instead of a CSV string, then use ARRAY_AGG():
ARRAY_AGG(se.value ORDER BY se.id) se_values
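To show what the aggregation produces, here is a minimal Python sketch of the same grouping done in application code, using the sample rows from the question. It is only an illustration of the result shape (one row per user with a list of values); the variable names are hypothetical, and doing the aggregation in the database as shown above avoids this extra step.

```python
from collections import defaultdict

# Sample data from the question
venue_themes = [
    {"user_id": "A1A2", "style": "formal", "environment": "indoor"},
    {"user_id": "A2B2", "style": "formal", "environment": "outdoor"},
]
theme_settings = [  # (user_id, setting_enum_id)
    ("A1A2", 1), ("A1A2", 3), ("A2B2", 1), ("A2B2", 2),
]
setting_enum = {1: "garden", 2: "beach", 3: "golf course", 4: "backyard"}

# Collect the enum values per user, ordered by enum id
# (the counterpart of ORDER BY se.id inside the aggregate).
values_by_user = defaultdict(list)
for user_id, enum_id in sorted(theme_settings, key=lambda t: t[1]):
    values_by_user[user_id].append(setting_enum[enum_id])

# One row per venue theme, with the values as a list (like ARRAY_AGG)
result = [{**vt, "se_values": values_by_user[vt["user_id"]]} for vt in venue_themes]

# The CSV form (like STRING_AGG) is a join over the same list
csv_result = {r["user_id"]: ", ".join(r["se_values"]) for r in result}
```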
As the title states, I am trying to merge 2 tables. I want a join where the rows from the first table are always kept and the matching rows from the second table are added to them. I believe this is known as a nested join.
Unfortunately, it only allows 1 key to 1 key matching, whereas I need to match 1 key in table 1 against 2 keys in table 2.
Here is an example
Table1:
Group
..
..
Time
Date
Table2:
Group 1
Group 2
..
..
..
Other Info
What I want is where "Group = Group 1 OR Group = Group 2" and display the matching row from table 2 nested into Table 1
I looked at the following example but I must be confused by the syntax because it doesn't seem to be working for me.
How to join two tables in PowerQuery with one of many columns matching?
After further investigation of the answer I linked earlier, here is an explanation of it:
Table.AddColumn(Source, "Name_of_Column",
(Q1) => Table.SelectRows(Query2,
each Q1[Col_from_q1] = [Col_from_q2] or Q1[Col_from_q1] = [2_Col_from_q2]
)
)
This worked for me. It adds an extra column that needs to be expanded to get all the values from the second table. One caveat: I haven't tested how it treats multiple matches. Based on how NestedJoin behaves, I would assume that expanding will duplicate the matching rows of the first table.
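The logic of that expression can be sketched in Python to make the matching rule explicit: for every row of the first table, collect the rows of the second table where either key column equals the first table's key. The function names, the Matches column, and the sample rows below are hypothetical, and the sketch describes only its own behavior, not Power Query's.

```python
def nested_join_any(left_rows, right_rows, left_key, right_keys, new_column):
    """Attach to each left row the list of right rows matching on ANY right key."""
    result = []
    for left in left_rows:
        matches = [
            r for r in right_rows
            if any(left[left_key] == r[k] for k in right_keys)
        ]
        result.append({**left, new_column: matches})
    return result


def expand(nested_rows, new_column):
    """Flatten the nested column: one output row per match.

    In this sketch a left row without matches is kept once,
    without any columns from the right table.
    """
    flat = []
    for row in nested_rows:
        base = {k: v for k, v in row.items() if k != new_column}
        for match in row[new_column] or [{}]:
            flat.append({**base, **match})
    return flat


# Hypothetical sample data following the layout in the question
table1 = [{"Group": "A"}, {"Group": "B"}, {"Group": "C"}]
table2 = [
    {"Group 1": "A", "Group 2": "X", "Other Info": "first"},
    {"Group 1": "Y", "Group 2": "A", "Other Info": "second"},
    {"Group 1": "B", "Group 2": "B", "Other Info": "third"},
]
nested = nested_join_any(table1, table2, "Group", ["Group 1", "Group 2"], "Matches")
flat = expand(nested, "Matches")
```

In this sketch, group A has two matches, so expanding yields two rows for it, which is the duplication assumed above.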
I have one column in my table in Postgres let's say employeeId. We do some modification based on the employee type and store it in DB. Basically, we append strings from these 4 strings ('ACR','AC','DCR','DC'). Now we can have any combination of these 4 strings appended after employeeId. For example, EMPIDACRDC, EMPIDDCDCRAC etc. These are valid combinations. I need to retrieve EMPID from this. EMPID length is not fixed. The column is of varying length type. How can this be done in Postgres?
I am not entirely sure I understand the question, but regexp_replace() seems to do the trick:
with sample (employeeid) as (
values
('1ACR'),
('2ACRDCR'),
('100DCRAC')
)
select employeeid,
regexp_replace(employeeid, '(ACR|AC|DCR|DC).*$', '', 'gi') as clean_id
from sample
returns:
employeeid | clean_id
-----------+---------
1ACR | 1
2ACRDCR | 2
100DCRAC | 100
The regular expression says "any of those strings, followed by any characters up to the end of the string", and that is then replaced with nothing. However, this won't work if the actual empid itself contains any of the codes that are appended.
It would be much cleaner to store this information in two columns: one for the empid and one for those "codes".
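As a variation on the same idea, here is a minimal Python sketch that strips only a trailing run of the four codes instead of everything after the first code. This is a swapped-in, slightly stricter pattern, not the query from the answer; the function name strip_codes is hypothetical. It still cannot tell apart an id that itself ends in one of the codes.

```python
import re

# One or more of the known codes, anchored at the end of the string.
# Longer alternatives come first so "ACR" is tried before "AC".
TRAILING_CODES = re.compile(r"(?:ACR|AC|DCR|DC)+$", re.IGNORECASE)


def strip_codes(employee_id):
    """Remove the trailing run of ACR/AC/DCR/DC codes from employee_id."""
    return TRAILING_CODES.sub("", employee_id)


samples = ["EMPIDACRDC", "EMPIDDCDCRAC", "1ACR", "2ACRDCR", "100DCRAC"]
cleaned = [strip_codes(s) for s in samples]
```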