Removing specific rows in an Excel file using Azure Data Factory - azure

I have a set of excel files inside ADLS. The format looks similar to the one below:
The first 4 rows are always the document header information, and the last 3 are 2 empty rows followed by the end-of-document indicator. The number of employee rows in between varies. I would like to delete the first 4 rows and the last 3 rows using ADF.
Can anyone help me with what the expressions should be in the Derived Column / Select transformation?

My Excel file:
Source Data set settings (give A5 in range and select first row as header):
Make sure to refresh schema in the source data set.
After the schema refresh, previewing the source data shows all rows from row 5 onward. This still includes the footer, which we can filter out in a data flow.
Next, add a Filter transformation with the expression below:
!startsWith(sno,'dummy') && sno!=''
This filters out the rows whose sno starts with 'dummy' (in your case, the end-of-document indicator). It also drops the empty rows by checking sno!=''.
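For readers who want to check the logic outside ADF, here is a minimal Python sketch of the same idea: skip the first 4 rows, treat the next row as the header, then drop blank rows and the end-of-document row. The function name trim_sheet, the sample rows, and the 'dummy' marker text are illustrative assumptions, not part of the ADF solution.

```python
def trim_sheet(rows, header_rows=4, footer_marker="dummy", key_index=0):
    """Drop the leading document header, blank rows, and the footer row."""
    body = rows[header_rows:]                 # same effect as starting the range at A5
    column_names, data = body[0], body[1:]    # first remaining row is the header
    kept = []
    for row in data:
        key = str(row[key_index] or "")
        # Mirrors the filter: !startsWith(sno,'dummy') && sno!=''
        if key != "" and not key.startswith(footer_marker):
            kept.append(row)
    return column_names, kept


# Hypothetical sheet: 4 header rows, column names, data, 2 blanks, footer.
sheet = [
    ["Report title", ""],
    ["Generated on", "2024-01-01"],
    ["Department", "Sales"],
    ["", ""],
    ["sno", "name"],
    ["1", "Ann"],
    ["2", "Bob"],
    ["", ""],
    ["", ""],
    ["dummy end of document", ""],
]
column_names, employees = trim_sheet(sheet)
```

Because the footer is matched by its content rather than by position, the number of employee rows does not need to be known in advance.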
Final Preview after filter:

How about this? Under the 'Source' tab, choose the number of lines you want to skip.

Related

Excel Function to Exclude rows based on certain values

I want to exclude rows in an Excel table based on certain values.
For example:
I need to exclude all rows where column A is equal to any of these numbers (5840, 4302, 4432, and so on).
The table is too large to filter by hand for only the data that I want.
One way is to use Excel's Table feature together with the FILTER() spreadsheet function. NB. You will need a relatively recent Excel version for this. Using a Table provides some extra useful functionality (such as automatically adding rows and allowing reference by column name).
The OP's input data may already be a Table, if so, this first step can be skipped.
Put the input and filter list into Tables (see the Excel help page). After creating the Tables, I used the Table Design menu (which appears in the menu bar when a cell in the Table is selected) to turn off the row banding format and header filters. This is also where you can rename the Tables. I have named them "Input" and "Exclude".
For the filtered output, choose where you want the output to start (cell H3 in my example), and enter a formula to copy the headers: =Input[#Headers]. Of course you can copy and paste the headers manually if you like. Here I've used the Format Painter to copy across the cell formats for the headers.
In the next cell down (H4 in my example), enter this formula: =FILTER(Input,(LEN(Input[ID])>0) * ISERROR(MATCH(Input[ID],Exclude[IDs to exclude],0))).
You should be able to add or delete new rows (right-click in the Table and choose Delete) in both the Input and Exclude tables, and the output should react (if you have Calculation set to Automatic).
NB. The Output range is NOT a Table. Excel doesn't let you convert dynamic ranges into Tables.
EDIT: If you don't want to use Tables, you can simply supply the ranges as the parameters to the FILTER function.
In this example: =FILTER(B4:D13,(LEN(B4:B13)>0)*ISERROR(MATCH(B4:B13,F4:F5,0)))
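To make the two conditions in the FILTER formula explicit, here is a rough Python equivalent: keep a row only if its ID is non-empty (the LEN(...)>0 part) and is not found in the exclusion list (the ISERROR(MATCH(...)) part). The function name and sample values are illustrative assumptions.

```python
def filter_excluded(rows, exclude_ids, id_index=0):
    """Keep rows whose ID is non-empty and not in the exclusion list."""
    excluded = set(exclude_ids)
    return [
        row for row in rows
        if row[id_index] not in (None, "")    # LEN(Input[ID]) > 0
        and row[id_index] not in excluded     # ISERROR(MATCH(..., Exclude, 0))
    ]


# Hypothetical input rows: (ID, description)
input_rows = [
    (5840, "alpha"),
    (1001, "beta"),
    ("", "blank id"),
    (4302, "gamma"),
    (2002, "delta"),
]
output_rows = filter_excluded(input_rows, [5840, 4302, 4432])
```

As in Excel, the comparison is type-sensitive: an ID stored as the text "5840" would not match the number 5840.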

Excel Power Pivot combining multiple sources into column

I have multiple similar data files with a large overlap of similar rows. I'd like to combine them so that a given column from each set appears in a distinct column in a new table. Essentially this is very similar to a standard pivot table where the source is a column field and the values of the field are those of the original files where present.
So for 2 source files:
File 1
and File 2
I'd like to end up with:
So all the common data is in the row and there is one column for each file containing the "Status" (or blank if that row isn't present).
I want to have this as a data source that I can then pivot. Is this possible? I know how to combine the files into a single source using Get Data -> From Folder and I know I can then pivot that data, but it doesn't get me to the final solution above.
Assuming you have 2 separate queries bringing in data from the 2 source files listed above, the first step is to add a 'File' column to each of them, i.e. Table.AddColumn(#"Previous Step", "File", each "File 1/2/3 etc", type text), where the literal is "File 1", "File 2", and so on for each query, so that you end up with:
and
Then append the 2 adjusted tables to give you this:
Select the 'File' column, go to Transform => Pivot Column, and in the pivot window choose 'Status' as the Values column and Don't Aggregate as the Aggregate Function.
This gives you your desired result.
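The tag, append, and pivot steps above can be sketched in plain Python to show what the result looks like. The key columns "ID" and "Name" and the sample rows are assumptions for illustration (the original tables are not shown here); only the "Status" column and the "File 1"/"File 2" labels come from the question.

```python
def pivot_status(tables, key_fields=("ID", "Name"), value_field="Status"):
    """Combine tagged tables into one row per key with one column per file."""
    file_names = list(tables)
    combined = {}
    for file_name, rows in tables.items():        # "append" with a File tag
        for row in rows:
            key = tuple(row[field] for field in key_fields)
            if key not in combined:
                out = dict(zip(key_fields, key))
                # Start with a blank (None) cell for every file.
                out.update({name: None for name in file_names})
                combined[key] = out
            combined[key][file_name] = row[value_field]   # "pivot", no aggregation
    return list(combined.values())


# Hypothetical source data for the two files.
tables = {
    "File 1": [
        {"ID": 1, "Name": "A", "Status": "Open"},
        {"ID": 2, "Name": "B", "Status": "Open"},
    ],
    "File 2": [
        {"ID": 1, "Name": "A", "Status": "Closed"},
        {"ID": 3, "Name": "C", "Status": "New"},
    ],
}
result = pivot_status(tables)
```

Unlike Don't Aggregate in Power Query, which raises an error when a key appears twice within the same file, this sketch silently keeps the last value, so duplicate keys would need to be handled first.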

Transforming Excel worksheet with multiple table in FME

I need to transform Excel files to ESRI FileGDB using FME.
The problem is that my Excel worksheets contain more than one table.
Example: row 1 holds the attributes of the first table, and rows 2 to 4 contain the values.
Row 6 holds the attributes of the second table, and the next 45 rows are the values.
And the same thing for the third table.
These row positions can change; the attributes of the second table could be at any row.
I think the best solution would be a process that splits the .xls file into three different files, so I can transform them directly into ESRI format.
Is there a transformer that could perform this task, or should I code it myself in Python?
PS: This process will be called from a REST service, so I can't do this manually. Also, the column names will always be the same.
Thanks
FME reads the Excel rows in order, so I would add a Counter transformer after reading the Excel file.
The column names don't change, so you can check at which row (the number given by the Counter) each new table begins.
Then it is just a matter of filtering the features with a TestFilter.
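Since the question also asks about coding this in Python, here is a minimal sketch of the same idea outside FME: number the rows as they are read, start a new table whenever a row matches one of the known header rows, and skip blank rows. The function names, header values, and sample rows are illustrative assumptions; this is not FME's API.

```python
def trim(row):
    """Return the row as a tuple without trailing empty cells."""
    cells = list(row)
    while cells and cells[-1] in (None, ""):
        cells.pop()
    return tuple(cells)


def split_tables(rows, headers):
    """Group rows under the most recent known header row."""
    known_headers = {tuple(header) for header in headers}
    tables = {}
    current = None
    for row_number, row in enumerate(rows, start=1):   # plays the Counter's role
        cells = trim(row)
        if cells in known_headers:                     # a new table begins here
            current = cells
            tables[current] = []
        elif cells and current is not None:            # skip blank separator rows
            tables[current].append((row_number, cells))
    return tables


# Hypothetical worksheet with two stacked tables and a blank row between them.
worksheet = [
    ["id", "name", ""],
    [1, "a", ""],
    [2, "b", ""],
    ["", "", ""],
    ["code", "x", "y"],
    ["p", 1.0, 2.0],
    ["q", 3.0, 4.0],
]
tables = split_tables(worksheet, [("id", "name"), ("code", "x", "y")])
```

Matching on header content rather than on fixed row numbers is what lets the second and third tables start at any row.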

How do I add a blank subtotal row after each different value in a specific column in data that has been filtered with an advanced filter?

I am getting an Excel 2010 Workbook with chunks of data. There are a variable number of blank rows between each chunk of data. Here's what I do with the data using macros:
I copy the data from the source workbook into a workbook with my macros.
I remove the blank rows.
I then sort the data with 4 sort criteria.
I then use an advanced filter to extract 6 of the 26 different types of data.
I then use the VBA code from "How to automatically insert a blank row after a group of data" in a macro to add a blank row after each unique value in Column A. However, I get numerous blank rows, because it appears to add the blank rows based on the original data, not the filtered data:
What I need is a way to add a blank row after each unique value in Column A, after the data is filtered, so that I can add sub-totals and counts.
You might copy the filtered data to a separate spreadsheet tab then do the insert.
Otherwise you will have to use the SpecialCells property mentioned above, which is much more complex.
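The copy-then-insert approach can be illustrated language-neutrally with a short Python sketch: first materialise only the rows that pass the filter, then add a blank row whenever the value in the first column changes (and after the final group). The function name, the filter predicate, and the sample data are assumptions for illustration; in VBA the same two steps would operate on the copied worksheet.

```python
def add_blank_rows(rows, is_visible, key_index=0):
    """Insert a blank row after each group of equal keys in the filtered rows."""
    filtered = [row for row in rows if is_visible(row)]   # the "copy" step
    result = []
    for position, row in enumerate(filtered):
        result.append(row)
        is_last = position == len(filtered) - 1
        # Compare against the next *filtered* row, never the hidden ones.
        if is_last or filtered[position + 1][key_index] != row[key_index]:
            result.append([""] * len(row))
    return result


# Hypothetical sorted data: column A is the type, column B a value.
data = [["A", 1], ["A", 2], ["B", 3], ["C", 4], ["C", 5]]
with_blanks = add_blank_rows(data, lambda row: row[0] in {"A", "C"})
```

Working on the filtered copy is what avoids the extra blank rows: group boundaries are detected only between rows that are actually kept.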

How can I split a cell into two rows?

Please see the following picture. I don't know how to create two rows inside a single data cell in Cognos.
You can insert a table with 2 rows and a single column into the Version query item cell. Then you can populate the 2 rows with the appropriate fields from the data source.
Please see the screenshot below, where I inserted a table and populated fields in the 2 rows.
This report has been tested and runs fine.
