Counting the number of columns in Excel using Talend

I would like to know how I can count the number of columns of an Excel sheet using Talend.

If you want to count the number of columns in an Excel file, please follow the instructions below.
Use the tFileInputExcel component.
Configure the component with a limit of 1 row. The important part is to define a single schema column named 'excel_header' (you can use any name) and change its type to 'Dynamic'. This will fetch the full row from the Excel file.
Now add a second component, tJavaRow, and link the Excel component to it through a 'main' row.
Sync your schema in the tJavaRow component and add the code mentioned below.
System.out.println("Counter ::: " + input_row.excel_header.getColumnCount());
Now your job will count the columns for you.
Note: You can store this counter value in any context variable and reuse it.
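For example, a minimal tJavaRow sketch along those lines, assuming a context variable named columnCount (type Integer) has been defined in the job:

// Read the dynamic column count from the incoming row and keep it for later components.
// 'excel_header' is the Dynamic schema column defined above; 'columnCount' is a hypothetical context variable.
int count = input_row.excel_header.getColumnCount();
context.columnCount = count;
System.out.println("Counter ::: " + count);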
Thank you

tFileInputExcel (read just 1 row, i.e. the column header record) ----> tFileOutputDelimited
tFileInputDelimited (read the file created above) --> tNormalize --> tFilterRow_1 (filter out null & blank rows) --> tJava_1
In the tJava_1 component, you can get the number of rows by using (Integer)globalMap.get("tFilterRow_1_NB_LINE_OK").
This row count actually represents the number of columns in your Excel file.
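As a rough sketch, the tJava_1 code could look like this (assuming the component names match the flow above):

// Number of rows that passed the filter; in this job that equals the number of Excel columns.
Integer columnCount = (Integer) globalMap.get("tFilterRow_1_NB_LINE_OK");
System.out.println("Number of Excel columns: " + columnCount);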

Related

Generate a multicolumn table using docxtpl

I have a series of data (in a 2-dimensional list, 'CombinedTable') that I need to use to populate a table in an MS Word template. The table has 7 columns, so I attempted the following using the docxtpl module:
context = {
    'tpl_modules1': CombinedTable[0],
    'tpl_modules2': CombinedTable[2],
    'tpl_modules3': CombinedTable[4],
    'tpl_modules4': CombinedTable[6],
    'tpl_modules5': CombinedTable[8],
    'tpl_modules6': CombinedTable[10],
    'tpl_modules7': CombinedTable[12]
}
tpl.render(context)
tpl.save(FilePath + FileName)
Not the most elegant solution, I know, but I am just trying to get this working. Unfortunately, using this code with the following template results in the tpl_modules7 data being written into all columns, rather than just the 7th.
Does anyone have advice on how to resolve this? I attempted to create a for loop through the columns as well as the rows, but was unsuccessful in writing anything to the doc (it was saved as a blank, empty doc).
The CombinedTable variable is a list of 12 lists (one for each column in the template, although only 7 contain data). Each of these 12 lists contains another list of cell data whose length is equal to the number of rows to be written to the table in that column. This means that the number of rows written to varies for each column.
EDIT: Looking more closely at the docs, it states that I cannot use %tr multiple times in the same row. I assume I will then have to use a loop through %tc and %tr (which I tried and couldn't get working). Any advice on how to implement this, especially on the Word document side? Thanks!
I was able to resolve this satisfactorily for my requirements; however, my solution may not suit everyone. I simply set up 7 different tables in a document with 7 columns and adjusted the margins/borders to suit the dimensions I required for the tables. Each of the 7 tables had the same docxtpl syntax as the image in my question, with the small buffer columns between them being replaced by columns in the Word document.

I want to compare the row count of two different Excel files in SSIS and get an email alert

I am looking for a way to compare the row count of two Excel files in SSIS, and if the row count of one of the files is >= the row count of the second, I would like to receive an email informing me of this. Is this something I can do in Visual Studio, and if so, how?
I'd structure it like this
I have 4 SSIS variables defined. Two of them will be used in the data flows to capture the number of rows generated from the sources.
The other two have Expressions applied to them to calculate values.
#[User::RowCountFile1] > #[User::RowCountFile2]
That generates a true/false value that I will use in Send Email to determine whether there is any work (email) to be done.
Since I'm lazy, I also used an Expression to generate the body of the email
"The value of File1 is " + (DT_WSTR,20) #[User::RowCountFile1] + " and File2 is " + (DT_WSTR,20) #[User::RowCountFile2]
Both data flow tasks look like this
The final configuration is to add an Expression to the Send Email task and change the Disable property to be driven by our #[User::IsFile1BiggerThan2] variable.
The first solution is: read each Excel file and load it into a data table, then run a query to compare the two data tables, then send the email.
The second solution is: when you read each file, select the row count in the query, bind the results into value1 and value2, and then compare them.

SQL - Linked Server with Excel imports values as NULL

I have been successfully using a linked server in SQL Server Management Studio to import a file from Excel which has four columns.
The Excel document looks like (no TOOL means blank cell, rows 6-199)
TDS    HOLDER    TOOL
1      3         1187
2      4         09812
3      5         9082
4      2         ----
5      76        ----
6      9
7      1
...    ...
200    18        CT-2989
201    98        CT-9871
When I import it as is, it grabs the cells with the numbers at the top and the cells that contain ----, but when it gets to the cells which are blank it prints NULL for the rest of the data, which is incorrect.
When I alter my Excel document so that the 'CT' values are at the top, it will grab all of the proper CT and TL values in column 3.
The problem is with the SQL Server Import and Export Wizard. It uses the data in the top few rows of the spreadsheet to decide on the data types in each column. When your Tools column has numbers at the top, the wizard decides the data type of the column is float. When the column had "CT-2989" at the top, it chooses a char type. Once it has chosen the float type, it will ignore CT-2989 because it isn't convertible to a floating point number.

The simplest solution to the problem is to arrange your Excel spreadsheet with a dummy row at the very top which gives the wizard the proper types for each column. For example, make the first data cell in the Tools column "abcdefg", assuming the rest of the data in that column consists of up to 7 alphanumeric characters. Once your data has been imported into SSMS, delete the dummy row.
If you go to the "Review Data Type Mapping" page of the wizard, it will show that the Tools column has been detected as containing float data when the numeric data is at the top of the spreadsheet. Note that even if the destination type for the Tools column is nvarchar, the wizard makes its own decisions regarding the source type.
There are other solutions using openrowset() and SSIS, but this one is quick and simple.
Here the problem is with OLE DB, which is unable to handle mixed data (numbers + text), so there is no real solution, only a few hacks, some of which are already mentioned above. I just want to add a few more:
1) In the Excel sheet, keep the data consistent and maintain distinct columns depending on the data type, e.g. text, numeric or whole numbers, fractional numbers, etc.
2) Before importing, break the sheet down into multiple sheets based on data type so that OLE DB won't get confused.
3) Import the Excel sheet into MS Access so that all the data gets a data type, then import it into SQL Server; this handles NULLs very wisely too.
Save the worksheet as .CSV and import it as a flat file from the Import task; when reviewing the data, uncheck the data types indicator.

Break-Down Data in Excel without VBA (Formula Only)

Many times, I am required to provide some type of break-down to the customers - an example is shown in the attached figure.
I have a table of data ("TABLE DATA", which is some type of pivot), and the customer provides its official form, whose structure must be preserved (highlighted in yellow). Basically, I need to separate the cost details of CODE "A" and CODE "B" into 2 separate sections.
The customer requires me to provide details for each individual part (the example shows Part A, "Break-Down Part A").
Is there any way to put "ITEM" from "TABLE DATA" into Code A and Code B? The rest can be solved with VLOOKUP (Price, Quantity). Note: "ITEM" contains non-duplicated values. Thank you very much.
Number your rows in the breakout using =1 and =A1+1 and then just use the formula ="B-ITEM"&TEXT(A1,"000"). If you want to skip making a counter column you could use ="B-ITEM"&TEXT(ROW()-1,"000") to just use the current row number (minus 1 or however many you need).
If your items aren't sequential like that, but are still unique, I would recommend adding counters on the original tab similar to what you have, which would let you quickly find the 5th A or 7th B: something that counts the previous instances of your current type and then adds 1. For row 6 you could use =COUNTIF(A$1:A5,A6)+1.

XLConnect setCellFormula usage

I was going through the XLConnect package reference manual. I came across a function called setCellFormula and its example. I was trying to implement this for data in a text file imported into Excel.
What I wish to do is the following
1. Import data of 200 rows and 400 columns into the Excel file.
2. For each row of data containing 400 columns, compute the average, median, minimum and maximum and place the results in another sheet. I want to use setCellFormula in case I want to edit the cells directly in Excel, as there are associated functions like idx2col, col2idx, etc. I find that it is easy to do for a single row. Is there any possibility to do this for all rows in a single statement?
3. Please note that the number of rows and columns can vary based on the type of data available.
Please let me know how to proceed with this.
