Highlight the updated values when file data or version gets updated in Azure Data Factory

I have created a dataflow in Azure Data Factory to find the differences between two files; both files have the same columns and structure. I would like to find/highlight the changes to each value, rather than getting whole updated rows as output, which is what I currently get.
Example:
Current output -

No. | Name | Email
1   | Jack | jack@email.com

Desired output -

No. | Name | Email
    | Jack |
I would like the unchanged values in the row to come through as NA or blank, or the changed values to be highlighted in some way.
Dataflow - [screenshot of the data flow]
Thank you for the assistance

I think you should use a CRC function (something like crc32(columns())) to get a hash of each row in both files, then join on hash1 = hash2; that should give you all the rows which have not been updated and have an exact match.
For the rows with some changes, you can instead use a join on hash1 != hash2, but for that you will need a unique identifier.
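For illustration, here is a minimal Python sketch of the same idea, assuming two hypothetical files old.csv and new.csv with the columns from the question and using No. as the unique identifier: CRC32 row hashes skip the exact matches, and in changed rows the unchanged cells are replaced with NA.

    import csv
    import zlib

    KEY = "No."  # assumed unique identifier column, per the answer above

    def load_rows(path):
        # Read a CSV into {key: row_dict}; file names here are hypothetical.
        with open(path, newline="") as f:
            return {row[KEY]: row for row in csv.DictReader(f)}

    def row_hash(row):
        # CRC32 over the whole row, analogous to crc32(columns()).
        return zlib.crc32("|".join(row.values()).encode())

    old_rows = load_rows("old.csv")
    new_rows = load_rows("new.csv")

    for key, new in new_rows.items():
        old = old_rows.get(key)
        if old is None or row_hash(old) == row_hash(new):
            continue  # brand-new row or exact match: nothing to highlight
        # Changed row: keep the changed cells, blank the unchanged ones as NA
        diff = {col: (val if col == KEY or old.get(col) != val else "NA")
                for col, val in new.items()}
        print(diff)  # e.g. {'No.': '1', 'Name': 'Jack', 'Email': 'NA'}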

Related

I want to compare the row count of two different Excel files in SSIS and get an email alert

I am looking for a way to compare the row count of two Excel files in SSIS, and if the row count of one of the files is >= the row count of the second, I would like to receive an email informing me of this. Is this something I can do in Visual Studio, and if so, how?
I'd structure it like this
I have 4 SSIS variables defined. Two of them will be used in the data flows to capture the number of rows generated from the sources.
The other two have Expressions applied to them to calculate values.
#[User::RowCountFile1] > #[User::RowCountFile2]
That generates a true/false value that I will use in Send Email to determine whether there is any work (email) to be done.
Since I'm lazy, I also used an Expression to generate the body of the email
"The value of File1 is " + (DT_WSTR,20) #[User::RowCountFile1] + " and File2 is " + (DT_WSTR,20) #[User::RowCountFile2]
Both data flow tasks look like this
The final configuration is to add an Expression to the Send Email task and change the Disable property to be driven by our #[User::IsFile1BiggerThan2] variable.
The first solution is: read each Excel file and load it into a data table, then run a query to compare the two data tables, then send the email.
The second solution is: when you read each file, select the row count in the query itself, bind the counts into value1 and value2, then compare them.
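Outside SSIS, that second suggestion can be sketched in a few lines of Python; this assumes the openpyxl package, hypothetical file names, and a local SMTP relay:

    import smtplib
    from email.message import EmailMessage

    from openpyxl import load_workbook  # assumes openpyxl is installed

    def row_count(path):
        # Row count of the first sheet, opened read-only for speed.
        return load_workbook(path, read_only=True).active.max_row

    count1 = row_count("File1.xlsx")  # hypothetical file names
    count2 = row_count("File2.xlsx")

    if count1 >= count2:
        msg = EmailMessage()
        msg["Subject"] = "Row count alert"
        msg["From"] = "etl@example.com"
        msg["To"] = "me@example.com"
        msg.set_content(f"The value of File1 is {count1} and File2 is {count2}")
        with smtplib.SMTP("localhost") as smtp:  # assumes a local SMTP relay
            smtp.send_message(msg)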

Relabelling large amounts of data in Excel

In a CSV file: I want to relabel 433,000+ rows of IDs that look like "e904ab64a642efcd25f4a43cb729701646d4bf7a4ed0bacbae9d85127978606a" into simpler ID codes. For each of these unique IDs there are 4-5 rows of data. I really don't want to "find and replace" each of them, because there are more than 2,000 unique IDs. Is there any function in Excel that can help me do that? Otherwise, any recommendations of what programs I can use?
If the IDs are always on consecutive lines, you can follow these steps (a script version of the same idea follows the list):
1. Store the ID before replacement.
2. Replace it with your simpler ID (store that as well).
3. Go to the next line.
4. Check whether the ID is the same as the one stored from the previous line.
5. If yes, use the same replacement ID as on the previous line; if no, start again from step 1.
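As a rough Python sketch of those steps (the ID column name and file names are hypothetical):

    import csv

    # Streaming version of the steps above: IDs on consecutive rows share
    # one simple replacement code, so a single pass is enough.
    with open("data.csv", newline="") as src, \
         open("relabelled.csv", "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        prev_id, code, next_code = None, None, 1
        for row in reader:
            if row["ID"] != prev_id:         # a new block of 4-5 rows starts
                prev_id = row["ID"]          # step 1: store the ID
                code = f"ID{next_code:04d}"  # step 2: assign a simpler code
                next_code += 1
            row["ID"] = code                 # steps 4-5: reuse within a block
            writer.writerow(row)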
If you are happy doing this manually (since your tags do not currently include vba), then here is a simple approach (sketched in code after the list):
1. Create a unique list of IDs, for example by creating a 1-column PivotTable.
2. Next to each unique ID, put your simplified ID (however you are creating that - is there an algorithm, or could it just be =ROW()?).
3. Insert a column in the original sheet, adjacent to the ID column.
4. Use a VLOOKUP to find the matching simplified ID (e.g. =VLOOKUP(A1,'New IDs'!$A:$B,2,FALSE)).
5. When it has finished calculating, copy the simplified IDs and Paste Special as Values.
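In code, the unique-list-plus-VLOOKUP approach maps directly onto a dictionary lookup; a small pandas sketch, with hypothetical file and column names:

    import pandas as pd

    df = pd.read_csv("data.csv")  # hypothetical file and column names

    # Build the "unique list" once, number it, and look each row up in it,
    # which is what the PivotTable + VLOOKUP combination does in the sheet.
    mapping = {long_id: f"ID{i + 1}"
               for i, long_id in enumerate(df["ID"].drop_duplicates())}
    df["SimpleID"] = df["ID"].map(mapping)

    df.to_csv("relabelled.csv", index=False)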

Pentaho, how to pull data from cells

I'm a new user of Pentaho and a fairly weak user of Excel; what I need Pentaho to do is described in the image. At the step right before the conclusion I have several cells with different data.
I need to merge them together into 1 cell with all the right data. I tried Normaliser/De-Normaliser and I couldn't get either to work properly.
In Excel, what I do is basically pull the data up the columns to the cell I want, based on a key which is common to those lines.
Let me know if someone needs further information.
In the transformation I receive a formatted text file as input. Up until step 25 (obs) I'm reading only the first line of the text, which is where most of the information I need is located. By the pattern, there are up to 9 other possible lines in each entry; some entries have up to 23 lines, others only 6. Most of the data I can extract from line 1, but I also need data from 2 other lines, which the "obs" step extracts with formulas by comparing the 2 initial digits and then cutting the string I need from those lines. The thing is, before the "filter rows" step those information cells are not aggregated on the same line. I need them all to be on the same line, as in the first image I posted, but I cannot find the step that does so, or I don't have the knowledge to make such a step work properly.
If you need more information please let me know.
I'm using this many steps because at some point I'll add triggers and validations to most of them to ensure data integrity.
Found the answer myself. First I had to use a Group By step with a key that is present in all lines of the same "block" of cells. Then another problem surfaced: the top line of the block contained information I needed, but it didn't have the Group By key, so I had to use the Get Previous Row Field step to have those values present before the Group By step. Hope I helped.
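The same fix is easy to picture outside Pentaho; a small pandas sketch with made-up column names, collapsing each block of rows onto one line by key:

    import pandas as pd

    # Each "block" shares a key, but every row carries a different piece
    # of information; grouping by the key merges them onto a single line.
    df = pd.DataFrame({
        "key":  ["A1", "A1", "A1", "B2", "B2"],
        "obs1": ["x",  None, None, "p",  None],
        "obs2": [None, "y",  None, None, "q"],
        "obs3": [None, None, "z",  None, None],
    })

    merged = df.groupby("key", as_index=False).first()  # first non-null per column
    print(merged)  # one row per key: A1 carries x, y and z on a single line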

Reading an Excel sheet using ADO/ODBC in Delphi 7

I'm trying to read an Excel sheet from an XLS or XLSX file in memory using Delphi 7. When possible I use automation to read the cells one by one, but when Excel is not installed, I revert to using the ADO/ODBC Jet driver.
I connect using either
Provider=Microsoft.Jet.OLEDB.4.0; Data Source=file.xls;Extended Properties="Excel 8.0;Persist Security Info=False;IMEX=1;HDR=No";
Provider=Microsoft.ACE.OLEDB.12.0; Data Source=file.xlsx;Extended Properties="Excel 12.0;Persist Security Info=False;IMEX=1;HDR=No";
My problem then is that when I use the following query:
SELECT * FROM [SheetName$]
the returned results do not contain the empty rows or empty columns, so if the sheet contains such rows or columns, the following cells are shifted and do not end up in their correct position. I need the sheet to be loaded "as is", i.e. to know exactly what cell position each value comes from.
I tried to read the cells one by one by issuing one query of the form
SELECT F1 FROM `SheetName$A1:A1`
but now the driver returns an error saying "There is data outside the selected region". By the way, I had to use backticks to enclose the name, because using brackets like [SheetName$A1:A1] produced a syntax error message.
Is there a way to tell the driver to select the sheet as-is, without skipping blanks? Or maybe a way to know from which cell position each value is returned?
For internal policy reasons (I know they are bad but I do not decide these), it is not possible to use a third party library, I really need this to work from standard Delphi 7 components.
I assume that if your data is, say, in the range B2:D10, you want to include column A as an empty column? Is that correct? If that's the case, then your data set, when you read the sheet (SELECT * FROM [SheetName$]), would also return 1 million rows by 16K columns!
Can you not execute a query like: SELECT * FROM [SheetName$B2:D10] and use the ADO GetRows function to get an array - which will give you the size of the data. Then you can index into the array to get what data you want?
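For illustration only, that range-plus-array suggestion looks roughly like this in Python through the same ODBC driver (the driver name and path are assumptions, and fetchall() plays the role of ADO's GetRows):

    import pyodbc  # assumes the Microsoft Excel ODBC driver is installed

    conn = pyodbc.connect(
        r"Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};"
        r"DBQ=C:\data\file.xlsx;"  # hypothetical path
    )
    # Query an explicit range so the driver cannot shift cells around.
    rows = conn.cursor().execute("SELECT * FROM [SheetName$B2:D10]").fetchall()
    for r in rows:
        print(list(r))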
OK, the correct answer is: use a third-party library, no matter what your boss says. Do not even try ODBC/ADO to load arbitrary Excel files; you will hit a wall sooner or later.
It may work for Excel files that contain a single data table, but not when you want to cherry-pick data in a sheet primarily made for human consumption (i.e. where a single column contains some cells with introductory text, some with numerical data, some with comments, etc...)
Using IMEX=1, empty lines and empty columns are ignored.
Using IMEX=0, empty lines are sometimes no longer ignored, but some of the first non-empty cells are treated as field names instead of data, even though HDR=No. It would not work anyway, since values in a column are of mixed types.
Explicitly looping across cells and issuing SELECT * FROM [SheetName$A1:A1] works until you reach an empty cell; then you get access violations like:
Access violation at address 1B30B3E3 in module 'msexcl40.dll'. Read of address 00000000
I'm too old to want to guess the appropriate value to use so that it works, until someone comes up with yet another mix of data in a column. Sorry for having wasted everybody's time.

How to split one row into multiple rows in Talend

I need help migrating one row in my old DB to multiple rows in my new DB.
I have data like:

OID | CUSTOMER_NAME | DOB        | ADDRESS
1   | XYZ           | 03/04/1987 | ABC

In my new DB I store the data as KEY/VALUE pairs, like:

OID | KEY           | VALUE
1   | CUSTOMER_NAME | XYZ
1   | DOB           | 03/04/1987
1   | ADDRESS       | ABC

Can someone please help me do this using the Talend tool?
You can use tMap with multiple outputs linked to the same output as one possible solution here, but it is not dynamic. Why not split the single row into multiple rows in the source select query itself?
If you want to use the tMap option, see the steps below (a compact sketch of the same reshape follows them):
tOracleInput (or any other input) --> tMap --> tOutput/tLogRow
Take this row as input to the tMap component, and in the tMap create one output group, say out_01.
In out_01, drag and link the OID and CUSTOMER_NAME columns from the input.
Now create another output group, out_02, in the same tMap; when the "add an output" dialog comes up, select "create join table from" and pick the out_01 group in the dropdown, so that the output rows from out_02 also go to out_01.
The tMap will then have only one output group, out_01, containing the rows from both out_01 and out_02. In out_02, drag and link the OID and DOB columns.
Similarly, repeat this for out_03 and link the OID and ADDRESS columns.
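The reshape those tMap steps produce is a classic wide-to-long pivot; for comparison, here is the same transformation as a short pandas sketch using the columns from the question:

    import pandas as pd

    df = pd.DataFrame([[1, "XYZ", "03/04/1987", "ABC"]],
                      columns=["OID", "CUSTOMER_NAME", "DOB", "ADDRESS"])

    # melt() turns each remaining column into its own KEY/VALUE row.
    kv = df.melt(id_vars="OID", var_name="KEY", value_name="VALUE")
    print(kv)
    #    OID            KEY       VALUE
    # 0    1  CUSTOMER_NAME         XYZ
    # 1    1            DOB  03/04/1987
    # 2    1        ADDRESS         ABC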
Use tSplitRow to do it. Please see below.
Talend job:
output:
After spending an hour or two I found a solution using Talend, without writing a single line of Java code.
If you follow all my steps, you will get the desired result.
Note: I took your inputs as the source for this development, so actual values may differ.
Add a tMap after your input source.
Concatenate the source columns into a single column, separated by commas, and add a semicolon at the end of the concatenated column; see the image for more details.
After the tMap, add a tNormalize component and configure it as in the image.
Add a tDenormalize component and configure it as in the image.
Add a tExtractDelimitedFields component and configure it as shown in the image.
Add another tMap and configure it as shown in the image.
Now you have two output flows, so add another tNormalize component for each output.
Configure the first tNormalize component as shown in the image.
Configure the second tNormalize component with the settings shown in the image.
Our final job will look like the image below.
After doing all these things you will have this output.
Now you can create another sub-job to process these outputs, join them, and create a new one as per your requirements.
tOracleInput (or any other input) --> tSplitRow --> tOutput/tLogRow
Snap1
Snap2
You can use tPivotToColumnsDelimited; read more about it on the Talend Help Center.
This component will rotate your table on the basis of a specified row.
Thanks.
