I'm trying to take the output value of a Lookup activity and pass it to a stored procedure activity that inserts that value into a database table. After execution, I see only the first character of the output value inserted in the table.
I checked the output of the Lookup activity and the input of the stored procedure activity, and both contain the full values, but the insert writes only the first character of the values.
I am currently using Azure Data Factory to retrieve a fixed-length file from blob storage and trying to import the records into my database.
Fixed-length.txt
0202212161707
1Tom
1Kelvin
1Michael
23
The first row is the header record, which starts with '0' and is followed by the creation time.
The following rows are the detail records, which start with '1' and are followed by a user name.
The last row is the end record, which starts with '2' and is followed by the count of the detail records.
However, I want to validate that the data in the file is correct before I insert those records. I would like to check that the checksum is correct first, and only then insert all the records that start with '1'.
Currently, I insert those records line by line into a SQL DB and run a stored procedure to perform the tasks. Is it possible to utilize Azure Data Factory to do this? Thank you.
I reproduced your scenario by following the steps below.
First, take a Lookup activity to read all the data from the file, so a filter can be applied to that data.
Then take a Set variable activity and get the last character of the last row (e.g. 3 from 23) with the dynamic expression below.
@last(activity('Lookup1').output.value[sub(length(activity('Lookup1').output.value),1)].Prop_0)
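A worked reading of that expression against the sample file above (the Lookup output has 5 rows; storing the result in a variable named sum is an assumption that matches the If condition used later):
activity('Lookup1').output.value[sub(5,1)].Prop_0  ->  '23'  (Prop_0 of the last row)
last('23')                                         ->  '3'   (the expected number of detail records)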
Then take a Filter activity to keep only the rows with the '1' prefix, using the items value and condition below.
items : @activity('Lookup1').output.value
condition : @startswith(item().Prop_0,'1')
After the filter, take a ForEach activity to append those values to an array.
Inside the ForEach activity, take an Append variable activity; it builds an array from the filtered values, as sketched below.
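A minimal sketch of those settings, assuming the Filter activity is named Filter1 and the array variable is named username (matching the If condition that follows):
ForEach items                   : @activity('Filter1').output.value
Append variable username, value : @item().Prop_0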
Now take an If condition activity with an expression that checks whether the value of the Set variable and the length of the appended array variable are the same.
@equals(int(variables('sum')),length(variables('username')))
Then, inside the True branch, add your Copy activity to copy the data when the condition is true.
My Sample Output:
0202212161707
1Tom
1Kelvin
23
For the above data, control goes to the False branch (the end record says 3, but there are only 2 detail records).
0202212161707
1Tom
1Kelvin
1Michael
23
For the above data, control goes to the True branch (the end record says 3 and there are 3 detail records).
I only need to take the time part from a timestamp-type source attribute and load it into a dedicated SQL pool table (time datatype column). But I can't find a time function within the expression builder in ADF; is there a way I can do it?
What did I do?
I took the time part from the source attribute using substring and then tried to load it into the destination table. When I do, the destination table gets null values, because the column at the destination table is set to the time datatype.
I tried to reproduce this and got the same issue. The following is a demonstration of the same. I have a table called mydemo as shown below.
CREATE TABLE [dbo].[mydemo]
(
    id int NOT NULL,
    my_date date,
    my_time time
)
WITH
(
    DISTRIBUTION = HASH (id),
    CLUSTERED COLUMNSTORE INDEX
)
GO
The following is my source data in my dataflow.
time is not a recognized datatype in Azure data flows (date and timestamp are accepted). Therefore, the data flow fails to convert the string produced by substring(<timestamp_col>,12,5) into the time type.
For a better understanding, you can load your sink table as a source in the data flow. The time column will be read as 1900-01-01 12:34:56 when the time value in the table row is 12:34:56.
-- my table row
insert into mydemo values(200,'2022-08-18','12:34:56')
So, instead of using substring(<timestamp_col>,12,5), which returns a value like 00:01, use concat('1900-01-01 ',substring(<timestamp_col>,12,8)), which returns 1900-01-01 00:01:00.
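As a quick worked example (assuming the source timestamp value for the sample row above is '2022-08-18 12:34:56'):
substring('2022-08-18 12:34:56',12,8)                        ->  '12:34:56'
concat('1900-01-01 ',substring('2022-08-18 12:34:56',12,8))  ->  '1900-01-01 12:34:56'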
Configure the sink and mapping, and look at the resulting data in the data preview. Now, the Azure data flow will be able to insert the values successfully and give the desired results.
The following is the output after successful insertion of the record into the dedicated SQL pool table.
NOTE: You can construct any valid yyyy-MM-dd hh:mm:ss value by using a date of your choice in concat('<yyyy-MM-dd> ',substring(<timestamp_col>,12,8)) in place of 1900-01-01 hh:mm:ss in the derived column transformation.
We are trying to copy a parquet file from blob storage to a Postgres table. The problem is that my source parquet has some columns with number arrays, which ADF complains are not supported; if I change them to a string datatype, Postgres says it is expecting a number array.
Is there some solution or workaround to tackle this?
The workaround for the problem would be to change the type of those columns from array type to string in your Postgres table. This can be done using the following code:
ALTER TABLE <table_name> ALTER COLUMN <column_name> TYPE text;
I have taken a sample table, player1, consisting of 2 array columns: position (integer array) and role (text array). The column types can be changed as follows:
ALTER TABLE player1 ALTER COLUMN position TYPE varchar(40);
ALTER TABLE player1 ALTER COLUMN role TYPE varchar(40);
After changing the type of these columns, the table looks like this.
You can now complete the copy activity in ADF without getting any errors.
If there are any existing records, the array-type values are converted to string type, and the copy activity still completes without any errors. The following is an example of this case.
Initial table data (array type columns): https://i.stack.imgur.com/O6ErV.png
Convert to String type: https://i.stack.imgur.com/Xy69B.png
After using ADF copy activity: https://i.stack.imgur.com/U8pFg.png
NOTE:
Considering you have changed the array columns to string type in the source file, if you can make changes such that the lists of values are enclosed within {} rather than [], then you can convert the column types back to array types using an ALTER query, as sketched after this note.
If the lists of elements are enclosed within [] and you try to convert the columns back to array types in your table, it throws the following error.
ERROR: malformed array literal: "[1,1,0]"
DETAIL: Missing "]" after array dimensions.
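A minimal sketch of that conversion, assuming the stored values use the {1,1,0} style and reusing the sample player1 columns from above (a USING cast is required because Postgres will not convert text to an array implicitly):
-- convert the varchar columns back to array types
ALTER TABLE player1 ALTER COLUMN position TYPE integer[] USING position::integer[];
ALTER TABLE player1 ALTER COLUMN role TYPE text[] USING role::text[];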
I wanted to implement SCD Type 2 logic, but using dynamic tables and dynamic key fields from a config table. My challenge is passing the data flow parameter as the sink key column for my Alter Row activity: it does not take the parameter value and always gives the error 'invalid key column name'. I tried picking the data flow parameter in the expression builder for the sink key column and passing the value from the Alter Row transformation, and I have also named the field after the parameter in the select statement. Any help or suggestion is highly appreciated.
Please see the images below.
Sample of how I wanted to pass dynamic values in the sink mapping
Trying to give the dynamic value to the key column
You have "List of columns" selected, so ADF is looking for a column in your target table that is literally called "$TargetPK1Parameter".
Change the selector to "Custom expression" and enter a string array parameter. The parameter can be an array of strings that represent names of key columns in your target table.
It should look something like this:
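A minimal sketch, using a hypothetical data flow parameter (the parameter and column names below are illustrative, not from the original pipeline):
data flow parameter                  : TargetPKColumns, type string[], default ['CustomerID','CompanyCode']
sink Key columns (Custom expression) : $TargetPKColumns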
I encountered a similar problem when trying to pass a composite key, parameterized, as part of the update method to the sink. The approach below allows me to fully parameterise my data flow, and it handles both composite keys and single-column keys.
Here's how the data looks in my config table:
UpsertKeyColumn = DOMNAME,DDLANGUAGE,AS4LOCAL,VALPOS,AS4VERS
A parameter value is set on the data flow:
Upsert_Key_Column = @item().UpsertKeyColumn
Finally, in the Sink settings, Custom expression is selected for Key columns and the following expression is entered: split($upsert_key_column,',')
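For the config value shown above, that expression resolves to the string array the sink expects:
split('DOMNAME,DDLANGUAGE,AS4LOCAL,VALPOS,AS4VERS', ',')  ->  ['DOMNAME','DDLANGUAGE','AS4LOCAL','VALPOS','AS4VERS']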
Hello, I am getting rows like the ones below.
ID, NAME,EMAIL,PHONENUMBER
123,ABC, qwe#poi.com|asd#lkj.com, 3636|7363
234,DEF,sjs#djd.com|sndir#fmei.com|cmrjje#fmcj.com,5845|4958|5959
Each person can have multiple emails and phone numbers, separated by |. The first email and the first phone are linked, the second email and the second phone are linked, and so on, so they need to end up in the same record. Can I split each record into multiple rows with one email and one phone per record?
We need to use a data flow to achieve that. I created a test; the overall architecture and debug results are as follows:
My source dataset is a text file in Azure Data Lake Gen2. Source1 and Source2 use this same data source.
At the DerivedColumn1 activity, we can select the EMAIL column and enter the expression split(EMAIL,'|') to split this column into an array.
At the Flatten1 activity, select EMAIL[] as Unroll by and Unroll root.
At the SurrogateKey1 activity, enter ROW_NO as the key column and 1 as the start value.
The data preview is as follows:
Source2 is the same as Source1, so we jump to the DerivedColumn2 activity, where we can select the PHONENUMBER column and enter the expression split(PHONENUMBER,'|') to split this column into an array.
At the Flatten2 activity, select PHONENUMBER[] as Unroll by and Unroll root.
At the SurrogateKey2 activity, enter ROW_NO as the key column and 1 as the start value. The data preview is as follows:
At the Join1 activity, we can inner join these two streams on the key column ROW_NO.
The data preview is as follows:
At the Select1 activity, we can select the columns that we need.
The data preview is as follows:
Then we can sink the result to our destination.
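For the sample rows in the question, the result that gets sunk should look like this (derived from the split, flatten and join steps above, with one email and one phone per row):
ID,NAME,EMAIL,PHONENUMBER
123,ABC,qwe#poi.com,3636
123,ABC,asd#lkj.com,7363
234,DEF,sjs#djd.com,5845
234,DEF,sndir#fmei.com,4958
234,DEF,cmrjje#fmcj.com,5959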
That's all.