How to insert the column name into the destination table in SSIS? - excel

As shown in the image, I have an Excel sheet that contains 32 tables one after the other (I have shown 2 tables in the image), and the table count may grow, but the metadata is the same for all the tables. Each table has two columns: one header is constant (Name) and the other changes (TPA, TPB, etc.), but the column positions never change.
Now the problem is how to capture the header and insert it as the T_type value in the destination table.
The number of rows in each table is not fixed (so we can't use cell references).

The problem as I understand it
I believe you have data in Excel that looks approximately like this:
Name | TPA
abc | x
...
Name | TPB
acz | p
The data can be described as blocks. A block is bounded by a starting row with the value Name in it; the next cell on that row contains a value that applies to all subsequent rows.
After the header row, you will need to pull out the key-value pairs and write them, plus the table name, into your destination.
The metadata remains consistent, it's just that the source data is all banjaxed.
Resolution
This is exactly the problem I had to overcome when I wrote an SSIS Excel source via a script component. We had to source our data feeds from reports instead of clean tabular data. Using that approach, you would simply define your equivalent of the ParseSample method, and there in the foreach loop (line 71 of ExcelParser) you'd put in the logic: a block is everything from a field with a value of 'Name' until you encounter an empty row.
In approximate C#:
// Enumerate all of the source data rows
string tableName = null;
foreach (DataRow row in sourceData.Rows)
{
    // Assign values to local variables
    string col0 = Convert.ToString(row[0]);
    string col1 = Convert.ToString(row[1]);

    if (col0 == "Name")
    {
        // Header row: start of a new block, so capture the table name
        tableName = col1;
    }
    else if (string.IsNullOrEmpty(col0))
    {
        // Empty row: end of the current block, do nothing
    }
    else
    {
        // Data row: emit name, table name, and value
        DataRow newRow = dataTable.NewRow();
        newRow[0] = col0;
        newRow[1] = tableName;
        newRow[2] = col1;
        dataTable.Rows.Add(newRow);
    }
}
If you want to simplify the matter, you can put all the parsing logic in ScriptMain and dispense with all the data table nonsense.
The upside is there'd be less code; the downside is that debugging scripts is the devil in SSIS pre-2012. It's still kludgey in 2012, but it's better than the nothing that came before it.
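For illustration, a minimal sketch of that script-source variant (CreateNewOutputRows is the real override for a script source; LoadExcelRows and the Output0Buffer column names Name/TType/Value are assumptions for this sketch, not part of the original ExcelParser):
// Sketch only: parse the blocks directly in the script source.
// LoadExcelRows() is an assumed helper that yields the raw worksheet
// rows (e.g. via OleDb); the output columns are defined on Output 0.
public override void CreateNewOutputRows()
{
    string tableName = null;

    foreach (DataRow row in LoadExcelRows())
    {
        string col0 = Convert.ToString(row[0]);
        string col1 = Convert.ToString(row[1]);

        if (col0 == "Name")
        {
            // Header row: starts a new block, capture the table name
            tableName = col1;
        }
        else if (!string.IsNullOrEmpty(col0))
        {
            // Data row: emit name, table name, and value downstream
            Output0Buffer.AddRow();
            Output0Buffer.Name = col0;
            Output0Buffer.TType = tableName;
            Output0Buffer.Value = col1;
        }
        // Empty rows simply end the current block; nothing to emit
    }
}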

Related

How to get the data from the previous row in Azure Data Factory

I am working on transforming data in Azure Data Factory. I have a source file that contains data like this:
ABC Code-01
DEF
GHI
JKL Code-02
MNO
I need to make the data look like this in the sink file:
ABC Code-01
DEF Code-01
GHI Code-01
JKL Code-02
MNO Code-02
You can achieve this using the fill-down concept available in Azure Data Factory data flows. The code snippet is shown below.
Note: the code snippet assumes that you have already added a source transformation to the data flow.
Steps:
Add the source and link it to the source file (I have generated a file with your sample data).
Edit the data flow script (available at the top-right corner) to add the code.
Add the code snippet after the source, as shown.
source1 derive(dummy = 1) ~> DerivedColumn
DerivedColumn keyGenerate(output(sk as long),
    startAt: 1L) ~> SurrogateKey
SurrogateKey window(over(dummy),
    asc(sk, true),
    Rating2 = coalesce(Rating, last(Rating, true()))) ~> Window1
After adding the code to the script, the data flow generates 3 transformations:
a. A Derived Column transformation with a new dummy column holding the constant 1.
b. A Surrogate Key transformation to generate a key value for each row, starting at 1.
c. A Window transformation to perform window-based aggregation. Here the code adds the predefined last() clause to take the previous row's non-null value when the current row's value is NULL. (Rating in the snippet is the placeholder column name from the generic example; in this scenario the column to fill down is Column1Right, as shown below.)
For more information on the Window transformation, refer to https://learn.microsoft.com/en-us/azure/data-factory/data-flow-window
As the values arrive as a single column in the source, I added additional columns in the Derived Column transformation to split the single source column into 2 columns.
Substitute NULL values where the column value is blank; if it is left blank, the last() clause will not recognize it as NULL and will not substitute previous values:
case(length(dropLeft(Column_1,4)) >1, dropLeft(Column_1,4), toString(null()))
Preview of the Derived Column: Column_1 is the source raw data, dummy is the constant-1 column generated by the code snippet, and Column1Left & Column1Right store the values after splitting the raw data (Column_1).
Note: Column1Right blank values are replaced with NULLs.
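A sketch of that split in data flow script, extending the derive() step generated earlier (it assumes, as in the sample data, that the left-hand token is always three characters):
source1 derive(dummy = 1,
    Column1Left = left(Column_1, 3),
    Column1Right = case(length(dropLeft(Column_1, 4)) > 1,
        dropLeft(Column_1, 4),
        toString(null()))) ~> DerivedColumn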
In the Window transformation:
a. Over – this partitions the source data based on the column provided. As there are no other columns to use as a partition column, add the dummy column generated by the Derived Column.
b. Sort – this sorts the source data based on the sort column. Add the surrogate key column to sort the incoming source data.
c. Window columns – here, provide the expression to copy the non-null value from previous rows only when the current value is null:
coalesce(Column1Right, last(Column1Right, true()))
d. Data preview of the Window transformation: Column1Right null values are replaced by the previous non-null values, based on the expression added in Window columns.
A second Derived Column transformation is added to concatenate Column1Left and Column1Right into a single column.
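In data flow script that can be as small as one derive() (a sketch; it assumes the Window transformation's output column is still named Column1Right):
Window1 derive(Result = concat(Column1Left, ' ', Column1Right)) ~> DerivedColumn2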
A Select transformation is added to pass only the required columns to the sink and remove the unwanted ones (this is optional).
The sink output after the fill-down process then shows the data in the requested format.

Cleaning Excel Table using VBA without impacting the entire table and formatting

Hi, I am trying to write VBA for Excel to clean up data elements that have extra information, without impacting the other elements.
I am writing VBA for the first time, and my table is in the middle of the sheet.
(Screenshots: given table and requested output.)
I think your question was not clear in regard to the "steps" that you want to perform on your data (i.e. the exact logic or transformation that needs to be applied).
Based purely on your images and your comment, I make the "steps" to be:
Split any customer IDs in column valueC into multiple rows.
If column valueC does not contain customer IDs (i.e. is blank or contains non-customer ID text), leave it untouched.
My answer uses Power Query instead of VBA. If you are interested in trying it out, in Excel try clicking Data > Get Data > From Other Sources > Blank Query, then click Advanced Editor near the top-left, copy-paste the code below, then click Done.
You might need to change the name of the table in the first line of the code (below), as it was "Table1" for me, but I imagine yours is named something else. Also, the code below is case-sensitive. So if there is no column named exactly valueC, then you will get an error.
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    fxProcessSomeText = (textToProcess as any) =>
        let
            canBeSplit = Text.StartsWith(textToProcess, "### customer id"),
            result =
                if textToProcess is null then null
                else if canBeSplit then Text.Split(Text.BetweenDelimiters(textToProcess, "### customer id", " ###"), ",")
                else {textToProcess}
        in
            result,
    invokeFunction = Table.TransformColumns(Source, {{"valueC", fxProcessSomeText}}),
    expanded = Table.ExpandListColumn(invokeFunction, "valueC"),
    reindex =
        let
            removeIndex = Table.RemoveColumns(expanded, {"index"}),
            addIndex = Table.AddIndexColumn(removeIndex, "index", 1, 1),
            moveIndex = Table.ReorderColumns(addIndex, List.Distinct(List.InsertRange(Table.ColumnNames(addIndex), 0, {"index"})))
        in
            moveIndex
in
    reindex
My output table contains more rows than yours. Also, the value in column valueA, row 11 is 1415 for me (it is 1234 in your requested output). Not sure if this is a mistake in your example, or if I'm missing some logic.

How to validate source data using a Range validation table in Excel Power query

Can anyone please help me with this?
I am trying to validate the source data in 3 columns (BU, Act, Dept) against the range validation data [BU - Beginning Act - End Act - Beginning Dept - End Dept] (screenshot attached).
BU is a single column in the validation table, but the Act & Dept columns are range columns. I need to check whether the BU - Act - Dept combination exists in the range data.
So does Power Query have any functionality to validate the source data using a nested join?
This needs to be done in Excel Power Query and not formulas, because the validation range file has more than 1 million records.
I'm not sure how efficient this is, but it works in principle.
I'll assume you have both tables loaded into the Query Editor with names Source and Validation.
First, choose the Source query and merge in the Validation table, matching on Source[Unit] = Validation[BU] for a left outer join.
Once merged, expand all the columns except Validation[BU]. This will give you a table with more rows since it will pull over every row in Validation that corresponds to the Source[Unit].
Now you can write a validation Status column. Add Column > Custom Column:
= if [Account] >= [Beg Act] and [Account] <= [End Act] and
     [Dept] >= [Beg Dept] and [Dept] <= [End Dept]
  then "Valid" else "Invalid"
Now that you have this column, group by the first three columns and take the max over the new custom column, Status. This should reduce your table back to its original size and give "Valid" in the Status column if that row matched any of the conditions that were pulled over from the Validation table, otherwise "Invalid".
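Put together, the whole query might look roughly like this (a sketch only; it assumes queries named Source and Validation with the column names used above):
let
    merged = Table.NestedJoin(Source, {"Unit"}, Validation, {"BU"}, "V", JoinKind.LeftOuter),
    expanded = Table.ExpandTableColumn(merged, "V", {"Beg Act", "End Act", "Beg Dept", "End Dept"}),
    withStatus = Table.AddColumn(expanded, "Status", each
        if [Account] >= [Beg Act] and [Account] <= [End Act]
            and [Dept] >= [Beg Dept] and [Dept] <= [End Dept]
        then "Valid" else "Invalid"),
    // "Valid" sorts after "Invalid", so List.Max keeps "Valid" when any row matched
    grouped = Table.Group(withStatus, {"Unit", "Account", "Dept"},
        {{"Status", each List.Max([Status]), type text}})
in
    grouped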

How to keep row and column headers when applying an operation in MATLAB

I have a data set stored in an Excel file. When I import the data using the MATLAB function:
A = xlsread(filename)
matrix A only stores the numeric values of my table. When I use another function such as:
B = readtable(filename)
the table contains the complete data, including the row and column headers, but when I apply an operation to it such as:
Bnorm = normc(B)
it is unable to perform the normalization because of the row and column headers.
My questions are:
Is there any way to avoid the row and column headers in table B?
Is there any way to store the row and column headers when reading the table with xlsread, such that
column headers = the first row of the file
row headers = the first column of the file
Thanks for any suggestions.
(Screenshots: dataset table; normalized matrix after applying xlsread.)
The answers to your specific questions are:
With a table, you can avoid row labels but column labels always exist.
As per the doc for xlsread, the first output is the numeric data, and the second output is the text data, which in this case would include your header information.
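For example (a sketch, assuming the headers occupy the first row and first column of the sheet):
% Numeric block and text cells come back as separate outputs
[num, txt] = xlsread(filename);
colHeaders = txt(1, 2:end);   % first row, skipping the corner cell
rowHeaders = txt(2:end, 1);   % first column
Anorm = normc(num);           % normalize the numeric data only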
But, in this case, you just need to learn how to work with tables properly. You want something like,
>> Bnorm = normc(B{:,2:end});
which extracts all the numeric elements of table B and uses them as input to normc.
If you want the result to be a table, then use
Bnorm = B;
Bnorm{:,2:end} = normc(B{:,2:end});

Table comprehensions: get subset from internal table into another one

As stated in the title, I want to put a conditional subset of an internal table into another internal table.
Let us first look at what it may look like the old-fashioned way.
DATA: lt_hugeresult    TYPE tty_mytype,
      lt_reducedresult TYPE tty_mytype.

SELECT "whatever" FROM "wherever"
  INTO CORRESPONDING FIELDS OF TABLE lt_hugeresult
  WHERE "any_wherecondition".

IF sy-subrc = 0.
  lt_reducedresult[] = lt_hugeresult[].
  DELETE lt_reducedresult WHERE col1 EQ 'a value'
                            AND col2 NE 'another value'
                            AND col3 EQ 'third value'.
  ...
ENDIF.
We all may know this.
Now I was reading about the table reduction features introduced with ABAP 7.40, apparently SP8:
Table Comprehensions – Building Tables Functionally
Table-driven:
VALUE tabletype( FOR line IN tab WHERE ( … )
                 ( … line-… … line-… … )
               )
For each selected line in the source table(s), construct a line in the result table. Generalization of value constructor from static to dynamic number of lines.
I was experimenting with that, but the results did not really fit; perhaps I am doing it wrong, or I might even need the condition-driven approach.
So, how would it look if I wanted to write the above statement with table comprehension techniques?
Until now I have this, which does not deliver what I need, and it seems as if the "not equal" is not possible...
DATA(reduced) = VALUE tty_mytype( FOR checkline IN lt_hugeresult
                                  WHERE ( col1 = 'a value' )
                                  ( col2 = 'another value' )
                                  ( col3 = space ) ).
Does anyone have some hints?
EDIT: It still seems not to work. Here is what I do:
(Screenshots: executable line; debugger results; wrong reduced result.)
And what now???
You could use the FILTER operator with the EXCEPT WHERE addition to filter out any rows that match the where clause:
lt_reducedresult = FILTER #( lt_hugeresult EXCEPT WHERE col1 = 'a value'
                                                    AND col2 <> 'another value'
                                                    AND col3 = 'a third value' ).
Note that lt_hugeresult would have to be a sorted table, and the col1/col2/col3 need to be key components (you can specify a secondary key using the USING KEY addition).
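For example, declarations along these lines would satisfy those requirements (a sketch; the component types and the secondary key name k_cols are assumptions):
TYPES: BEGIN OF ty_myline,
         col1 TYPE string,
         col2 TYPE string,
         col3 TYPE string,
       END OF ty_myline,
       " standard table with a secondary sorted key over the filter columns
       tty_mytype TYPE STANDARD TABLE OF ty_myline WITH EMPTY KEY
                  WITH NON-UNIQUE SORTED KEY k_cols COMPONENTS col1 col2 col3.

lt_reducedresult = FILTER #( lt_hugeresult EXCEPT USING KEY k_cols
                             WHERE col1 = 'a value'
                               AND col2 <> 'another value'
                               AND col3 = 'a third value' ).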
The documentation for FILTER explicitly notes that:
Table filtering can also be performed using a table comprehension or a table reduction with an iteration expression for table iterations with FOR. The operator FILTER provides a shortened format for this special case and is more efficient to execute.
A table filter constructs the result row by row. If the result contains almost all rows in the source table, this method can be slower than copying the source table and deleting the surplus rows from the target table.
So your approach of using DELETE might actually be appropriate depending on the size of the table.
The table iterations can be quite confusing when you use WHERE, because of the parenthesis groups.
The "not equal" condition is very well supported, as I show below in the solution for your first example. The issue you observe is due to improper use of parenthesis groups.
You must define the whole logical expression after WHERE inside ONE parenthesis group (one or several elementary conditions separated by logical operators AND, OR, etc.).
After the parenthesis group for WHERE, you usually define only one parenthesis group, which corresponds to the line to be added to the target internal table. You may define subsequent parenthesis groups if, for each line in the source internal table, you want to add several lines to the target internal table.
In your example, only the first parenthesis group applies to the WHERE (either col1 = 'a value' in your first example, or insplot = _ilnum in your second example).
The subsequent parenthesis groups correspond to the lines to be added, i.e. 2 lines are added for each source line in the first example (one line with col2 = 'another value' and one line with col3 = space), and 3 lines are added for each source line in the second example (one line with inspoper = i_evaluation-inspoper, one line with inspchar = i_evaluation-inspchar, and one line corresponding to the line of _single_results).
So, you should write your code as follows.
First example:
DATA(reduced) = VALUE tty_mytype( FOR checkline IN lt_hugeresult
                                  WHERE ( col1 = 'a value'
                                      AND col2 <> 'another value'
                                      AND col3 = 'third value' )
                                  ( checkline ) ).
Second example:
DATA(singres) = VALUE tbapi2045d4( FOR checkline IN _single_results
                                   WHERE ( insplot = _ilnum
                                       AND inspoper = i_evaluation-inspoper
                                       AND inspchar = i_evaluation-inspchar )
                                   ( checkline ) ).
I compared the old-fashioned syntax of your above example with the table comprehension technique and got exactly the same result.
Actually, your sample is not functional because it lacks a row specification for the constructed table reduced.
Try this one, which worked for me.
DATA(reduced) = VALUE tty_mytype( FOR checkline IN lt_hugeresult
                                  WHERE ( col1 = 'a value' AND
                                          col2 = 'another value' AND
                                          col3 = space )
                                  ( checkline ) ).
In the above sample we have the most basic type of result row specification, where it is absolutely identical to the source line. More sophisticated examples, where new table rows are evaluated with table iterations, can be found here.
